> For one modern architecture there is a big hit after 32 sockets
32 sockets * 6 (true) cores/socket = a 192-core system.
At that sort of scale, I'll bet that locking overhead is at least as big a problem as the memory access times.
Meanwhile, 'commodity' NUMA keeps creeping up the scale. What is it now, 8 sockets * 6 cores = 48-core systems (*2 or more if you count hyperthreaded 'cores')?