On a modern multi-socket, multi-core machine each socket is largely its own independent computer, and the whole machine behaves like a small NUMA cluster. That means each process normally runs on a particular node and that's where its memory lives; splitting memory between nodes or bouncing a process between nodes reduces performance. My guess is that threading will scale weirdly once you need more cores than a single socket has, whereas a multi-process model can keep more memory local to the socket each process is running on.
I would suggest starting with a multi-process model, because memory protection gives you better fault tolerance, and then considering threading if the multi-process version doesn't test out for concurrent performance.
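To make the multi-process idea concrete, here is a rough sketch of one worker process per NUMA node, each pinned to that node's CPUs. It is Linux-only and makes several assumptions: the node layout is read from /sys/devices/system/node, pinning uses os.sched_setaffinity, and memory locality relies on the kernel's default habit of allocating on the node where the process is running rather than on any explicit memory binding. The names parse_cpulist, numa_nodes and worker are made up for illustration, and the workload is a placeholder.

```python
import glob
import os
from multiprocessing import Pool


def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes():
    """Return one CPU list per NUMA node, or a single group if sysfs lists none."""
    nodes = []
    for path in glob.glob("/sys/devices/system/node/node[0-9]*/cpulist"):
        with open(path) as f:
            cpus = parse_cpulist(f.read())
        if cpus:  # skip nodes that have memory but no CPUs
            nodes.append(cpus)
    if not nodes:
        # Assumption: no NUMA info available, so treat the machine as one node.
        nodes = [list(range(os.cpu_count() or 1))]
    return nodes


def worker(cpus):
    """Pin this process to one node's CPUs, then allocate and use memory."""
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, cpus)  # 0 means the calling process
        except OSError:
            pass  # e.g. CPUs not permitted in this container; run unpinned
    # Placeholder workload: memory allocated after pinning is expected to
    # come from the local node under the default allocation policy.
    data = list(range(100_000))
    return sum(data)


if __name__ == "__main__":
    nodes = numa_nodes()
    with Pool(len(nodes)) as pool:
        print(pool.map(worker, nodes))
```

Whether this actually beats a threaded version is something you would have to measure on your own hardware. A tool like numactl can do the same pinning from the command line without touching the code.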