memory mirroring?

Posted Mar 17, 2012 12:24 UTC (Sat) by dlang (✭ supporter ✭, #313)
In reply to: memory mirroring? by khim
Parent article: Toward better NUMA scheduling

True, current NUMA machines are almost all multi-socket AMD64 systems, whose interconnects are fast enough that the machine is usable if you ignore NUMA and just treat it as an SMP machine.

Historic NUMA machines had MUCH slower interconnects; the best comparison in modern systems would be connecting your CPU nodes together with high-speed networks.

There are still some people building such machines (I think the current Cray systems are in this category), but when you get to interconnects that are that expensive, you are frequently better off segmenting the system and running it as if it were a cluster of systems, or (the more common case) just building a cluster of commodity systems instead of the monster NUMA system in the first place.

I think that if AMD hadn't introduced NUMA to the commodity desktop/server with the Opteron, NUMA would be so rare that the overhead and complexity of its logic wouldn't be acceptable in the kernel.

There are some applications that really are hard to split across a multi-machine cluster, and for those, NUMA (including RDMA setups that tie multiple commodity machines together) is the right tool for the task. But they are pretty rare; it's almost always worth re-architecting the application to avoid this requirement.



memory mirroring?

Posted Mar 17, 2012 23:07 UTC (Sat) by davecb (subscriber, #1574)

Large systems pay bigger penalties for using distant nodes than small ones do: bus backplane latency is not your friend (;-)). Smaller systems, with bus lengths measured in millimeters, don't pay so great a penalty.

For one modern architecture there is a big hit after 32 sockets even when using a backplane derived from Cray's lowest-latency design. The speed of light needs improvement!

Ancient mainframes used a radial design to avoid having to be NUMA, at the expense of having an exceedingly complicated, multi-ported "system controller" where we'd put a bus.

--dave

memory mirroring?

Posted Mar 18, 2012 0:56 UTC (Sun) by dlang (✭ supporter ✭, #313)

> For one modern architecture there is a big hit after 32 sockets

32 sockets * 6 (true) cores/socket = 192-core system

at that sort of scale, I'll bet that locking overhead is at least as big a problem as the memory access times.

now, 'commodity' NUMA keeps creeping up the scale. What is it now, 8 sockets * 6 cores = 48-core systems (*2 or more if you want to include hyperthread 'cores')?

you're a bit low....

Posted Mar 30, 2012 21:55 UTC (Fri) by cbf123 (guest, #74020)

Current high-end Xeons have 8 "real" cores.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds