Large systems pay a bigger penalty for using distant nodes than small ones do: bus backplane latency is not your friend (;-)). Smaller systems with bus lengths in the millimeters don't pay so great a penalty.
For one modern architecture there is a big hit after 32 sockets even when using a backplane derived from Cray's lowest-latency design. The speed of light needs improvement!
Ancient mainframes used a radial design to avoid having to be NUMA, at the expense of having an exceedingly complicated, multi-ported "system controller" where we'd put a bus.
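To put rough numbers on the speed-of-light remark, here is a minimal sketch of one-way propagation delay versus bus length. The velocity factor of 0.5 and the two example lengths are assumptions picked for illustration, not measurements of any real backplane, and the function name is hypothetical.

```python
# Speed of light in vacuum, metres per second (exact by definition).
C_VACUUM_M_PER_S = 299_792_458


def one_way_delay_ns(length_m, velocity_factor=0.5):
    """One-way propagation delay in nanoseconds over a link of length_m.

    velocity_factor is the assumed fraction of c at which the signal
    travels in the medium; 0.5 is an illustrative guess, not a spec.
    """
    return length_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


# Both lengths are assumed values, chosen only to contrast the two scales.
for label, length_m in [("5 mm on-board bus", 0.005), ("2 m backplane", 2.0)]:
    print(f"{label}: {one_way_delay_ns(length_m):.3f} ns one way")
```

This counts wire delay only; a real remote access adds switching, arbitration, and a return trip on top, so the gap in practice is larger than the sketch suggests.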
Posted Mar 18, 2012 0:56 UTC (Sun) by dlang (✭ supporter ✭, #313)
> For one modern architecture there is a big hit after 32 sockets
32 sockets * 6 (true) cores/socket = a 192-core system
at that sort of scale, I'll bet that locking overhead is at least as big a problem as the memory access times.
now, 'commodity' NUMA keeps creeping up the scale; what is it now, 8 sockets * 6 cores = 48-core systems (*2 or more if you want to count hyperthread 'cores')?
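The core-count arithmetic above can be sketched as follows; the helper name and its `threads_per_core` parameter are made up for illustration, with 2 threads per core standing in for the "*2" hyperthreading case.

```python
def logical_cpus(sockets, cores_per_socket, threads_per_core=1):
    """Total CPUs the scheduler would see for a given topology."""
    return sockets * cores_per_socket * threads_per_core


print(logical_cpus(32, 6))     # the large system: 192 true cores
print(logical_cpus(8, 6))      # the 'commodity' box: 48 true cores
print(logical_cpus(8, 6, 2))   # same box counting 2 hyperthreads per core: 96
```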
you're a bit low....
Posted Mar 30, 2012 21:55 UTC (Fri) by cbf123 (guest, #74020)