Posted Mar 18, 2012 19:08 UTC (Sun) by khim
In reply to: Hardware?
Parent article: Toward better NUMA scheduling
AFAIK no kernel code is needed for operation of the CPU caches, since the BIOS does all the setup (with the exception of marking uncacheable memory ranges on some systems).
This is only true if you never use DMA and never play tricks with page tables. Since the kernel does both, it includes a huge amount of code whose job is to keep all the data in sync.
As for DMA, surely the system could manage that automatically as well?
To do that, it basically needs to virtualize all memory accesses by all devices. Yes, it's doable, but it will slow everything down and will either hog VT-x/AMD-V or introduce yet another emulation level (which will require a specialized CPU or separate emulation chips). Not a good idea: large NUMA systems are exactly where things like KVM are most valuable.
like PCIe DMA is cache coherent for L1/L2/L3
Fail. PCIe DMA is not cache coherent for L1/L2/L3. It's the responsibility of the kernel to make sure everything works correctly despite the fact that the IOMMU may have a different setup from the MMU in the CPU, and despite the fact that DMA moves data to main memory without bothering to do anything with the CPU caches.
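A toy model may make the bookkeeping described above concrete. This is only a sketch under assumptions, not kernel code: it models one write-back CPU cache and a device that reads and writes main memory directly, with no hardware snooping. The class ToySystem and all of its methods are hypothetical; the names sync_for_device and sync_for_cpu are loosely borrowed from the Linux DMA API (dma_sync_single_for_device / dma_sync_single_for_cpu) but do not reproduce its signatures or behavior.

```python
class ToySystem:
    """Toy model (hypothetical): main memory, one write-back CPU cache, no snooping."""

    def __init__(self, size):
        self.memory = [0] * size   # main memory: the only thing the device sees
        self.cache = {}            # addr -> value currently held in the CPU cache
        self.dirty = set()         # addrs modified in the cache but not written back

    def cpu_read(self, addr):
        # A miss fills the cache from memory; a hit never looks at memory again.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def cpu_write(self, addr, value):
        # Write-back cache: the new value stays in the cache until flushed.
        self.cache[addr] = value
        self.dirty.add(addr)

    def device_dma_read(self, addr):
        # The device bypasses the CPU cache entirely.
        return self.memory[addr]

    def device_dma_write(self, addr, value):
        # The device bypasses the CPU cache entirely.
        self.memory[addr] = value

    def sync_for_device(self, addr):
        # Software flush: write a dirty line back before the device reads it.
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def sync_for_cpu(self, addr):
        # Software invalidate: drop the cached copy so the next read hits memory.
        self.cache.pop(addr, None)
        self.dirty.discard(addr)


def demo():
    box = ToySystem(4)

    # CPU -> device: without a flush the device reads the old memory contents.
    box.cpu_write(0, 11)
    stale_for_device = box.device_dma_read(0)
    box.sync_for_device(0)
    fresh_for_device = box.device_dma_read(0)

    # Device -> CPU: without an invalidate the CPU keeps reading its cached copy.
    box.cpu_read(1)
    box.device_dma_write(1, 22)
    stale_for_cpu = box.cpu_read(1)
    box.sync_for_cpu(1)
    fresh_for_cpu = box.cpu_read(1)

    return stale_for_device, fresh_for_device, stale_for_cpu, fresh_for_cpu


if __name__ == "__main__":
    print(demo())
```

In this model both directions go wrong until software steps in: the flush and the invalidate are exactly the kind of work that has to be done by code somewhere when the hardware does not do it.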
To clarify, a simple way to do this is to just add a few gigabytes of per-node L4 cache (in standalone chips), and use the same cache-coherency mechanism for it used for the L3 level.
Success. The “cache-coherency mechanism used for the L3 level” is part of the OS kernel, and yes, it's possible to add transparent handling of memory from different NUMA nodes to it. You don't need anything at the hardware level for that - this was my point.
The advantage could be that memory movement would happen by specialized hardware in parallel with CPU operation.
Impossible. Contemporary systems attach memory directly to the CPU. This means that any such mechanism will slow down regular memory accesses, which will probably make the whole scheme quite pointless.
Basically, this is a nice idea that looks fine on paper but requires a radical redesign of everything (kernel, CPU, chipset, etc.) from the ground up, which makes it pointless in practice.