
Hardware?


Posted Mar 18, 2012 18:15 UTC (Sun) by slashdot (guest, #22014)
In reply to: Hardware? by khim
Parent article: Toward better NUMA scheduling

AFAIK no kernel code is needed for operation of the CPU caches, since the BIOS does all the setup (with the exception of marking uncacheable memory ranges on some systems).

As for DMA, surely the system could manage that automatically as well?

That is, IOMMUs would map to the 64-bit automatically managed address space, and the system would move memory for DMA just like it does for CPU access. And just like PCIe DMA is cache coherent for L1/L2/L3, it could be cache coherent for this hypothetical L4 cache.

To clarify, a simple way to do this is to just add a few gigabytes of per-node L4 cache (in standalone chips), and use the same cache-coherency mechanism for it that is used for the L3 level.

The advantage could be that memory movement would happen by specialized hardware in parallel with CPU operation.



Hardware?

Posted Mar 18, 2012 19:08 UTC (Sun) by khim (subscriber, #9252)

> AFAIK no kernel code is needed for operation of the CPU caches, since the BIOS does all the setup (with the exception of marking uncacheable memory ranges on some systems).

This is only true if you never use DMA and never play tricks with page tables. Since the kernel does both, it includes a huge amount of code whose job is to keep all the data in sync.

> As for DMA, surely the system could manage that automatically as well?

To do that it basically needs to virtualize all memory accesses by all devices. Yes, it's doable, but it'll slow everything down and will either hog VT-x/AMD-V or introduce yet another emulation level (which will require a specialized CPU or separate emulation chips). Not a good idea: large NUMA systems are exactly where things like KVM are most valuable.

> like PCIe DMA is cache coherent for L1/L2/L3

Fail. PCIe DMA is not cache coherent for L1/L2/L3. It's the responsibility of the kernel to make sure everything works correctly despite the fact that the IOMMU may have a different setup from the MMU in the CPU, and despite the fact that DMA moves data to main memory without bothering to do anything with the CPU caches.
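
The scenario described here can be sketched with a toy userspace model in C. This is not kernel code. It assumes a single write-back cache line and a device that does not snoop the cache, and every name in it (toy_system, cpu_read, dma_write, cache_invalidate and so on) is invented for illustration. Whether a given platform's PCIe DMA really behaves this way is exactly what is disputed in this thread; on platforms where DMA is not coherent, the Linux DMA API (for example dma_sync_single_for_cpu() and dma_sync_single_for_device()) is what performs the corresponding cache maintenance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 8

/* Toy model (hypothetical): one CPU cache line in front of one line of RAM. */
struct toy_system {
    uint8_t ram[LINE_SIZE];   /* main memory: the only thing the device sees */
    uint8_t cache[LINE_SIZE]; /* the CPU's private copy of that line */
    bool valid;               /* cache currently holds a copy */
    bool dirty;               /* cached copy is newer than RAM */
};

/* CPU read: served from the cache if valid, otherwise filled from RAM. */
static uint8_t cpu_read(struct toy_system *s, int off)
{
    assert(off >= 0 && off < LINE_SIZE);
    if (!s->valid) {
        memcpy(s->cache, s->ram, LINE_SIZE);
        s->valid = true;
        s->dirty = false;
    }
    return s->cache[off];
}

/* CPU write: write-back policy, so RAM is not updated here. */
static void cpu_write(struct toy_system *s, int off, uint8_t v)
{
    (void)cpu_read(s, off); /* allocate the line first */
    s->cache[off] = v;
    s->dirty = true;
}

/* Non-snooping device (assumption of this model): touches RAM only. */
static void dma_write(struct toy_system *s, int off, uint8_t v) { s->ram[off] = v; }
static uint8_t dma_read(const struct toy_system *s, int off) { return s->ram[off]; }

/* Software cache maintenance, the part the kernel would be responsible for. */
static void cache_writeback(struct toy_system *s)
{
    if (s->valid && s->dirty) {
        memcpy(s->ram, s->cache, LINE_SIZE);
        s->dirty = false;
    }
}

static void cache_invalidate(struct toy_system *s)
{
    s->valid = false;
    s->dirty = false;
}

/* What the CPU reads after the device DMA-writes 0x42 into a cached line. */
static uint8_t read_after_dma(bool invalidate_first)
{
    struct toy_system s = {0};
    (void)cpu_read(&s, 0);    /* CPU caches the line (all zeros) */
    dma_write(&s, 0, 0x42);   /* device updates RAM behind the cache */
    if (invalidate_first)
        cache_invalidate(&s); /* drop the stale copy before reading */
    return cpu_read(&s, 0);
}

/* What the device reads after the CPU writes 0x17 to the line. */
static uint8_t dma_read_after_cpu_write(bool writeback_first)
{
    struct toy_system s = {0};
    cpu_write(&s, 0, 0x17);   /* new value lives only in the cache */
    if (writeback_first)
        cache_writeback(&s);  /* push it to RAM before the device looks */
    return dma_read(&s, 0);
}
```

In this model, skipping the invalidate leaves the CPU reading the stale zero instead of 0x42, and skipping the write-back leaves the device reading zero instead of 0x17. Hardware snooping, where present, removes the need for both steps, which is the point under dispute here.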

> To clarify, a simple way to do this is to just add a few gigabytes of per-node L4 cache (in standalone chips), and use the same cache-coherency mechanism for it that is used for the L3 level.

Success. The “cache-coherency mechanism used for the L3 level” is part of the OS kernel, and yes, it's possible to add transparent handling of memory from different NUMA nodes to it. You don't need anything at the hardware level for that; this was my point.

> The advantage could be that memory movement would happen by specialized hardware in parallel with CPU operation.

Impossible. Contemporary systems attach memory directly to the CPU, which means that any such mechanism will slow down regular memory accesses, and that will probably make the whole scheme quite pointless.

Basically, this is a nice idea which looks fine on paper but requires a radical redesign of everything (kernel, CPU, chipset, etc.) from the ground up, which makes it pointless in practice.

Hardware?

Posted Mar 18, 2012 21:20 UTC (Sun) by slashdot (guest, #22014)

> To do that it basically needs to virtualize all memory accesses by all devices

Which the hardware already does where IOMMUs are present...

> Fail. PCIe DMA is not cache coherent for L1/L2/L3

Uh?

PCI and PCIe are definitely cache coherent (or more precisely, they support it, although you can tell devices to not snoop caches).

> Success. “Cache-coherency mechanism used for the L3 level” is part of OS kernel

What?!?

This is totally false, and is simply a ridiculous claim.

The cache coherence of L3 caches in x86 SMP systems certainly isn't managed by the kernel!

Hardware?

Posted Mar 19, 2012 4:21 UTC (Mon) by khim (subscriber, #9252)

> PCI and PCIe are definitely cache coherent (or more precisely, they support it, although you can tell devices to not snoop caches).

PCI does not support it at all (initially it was an optional part of the standard, but since nobody bothered to implement it, later versions just removed it completely), and PCIe does not recommend using it on large and/or busy systems (and NUMA systems tend to be both large and busy).

> Which the hardware already does where IOMMUs are present...

Nope. The IOMMU only hides physical addresses from hardware devices. The OS kernel is in charge and must keep everything in sync. The IOMMU's presence is quite visible to device drivers. Since you want something that is not visible to the kernel at all, you need yet another level of indirection.
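
As a rough illustration of "only hides physical addresses", here is a toy C model of an IOMMU as a plain translation table. It is a sketch under simplifying assumptions (4 KiB pages, a tiny flat table, no permissions or caching of translations), and toy_iommu, iommu_map, iommu_unmap and iommu_translate are hypothetical names, not a real kernel or hardware interface. The point it shows is that the table only changes when software reprograms it.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12  /* 4 KiB pages (assumption of this sketch) */
#define IOMMU_PAGES 16  /* size of the toy I/O virtual address space, in pages */

/* Toy IOMMU (hypothetical): a flat table from I/O virtual page to physical page. */
struct toy_iommu {
    uint64_t phys_page[IOMMU_PAGES];
    bool present[IOMMU_PAGES];
};

/* Software (the "kernel" in this model) installs or replaces a mapping. */
static bool iommu_map(struct toy_iommu *m, uint64_t iova, uint64_t phys)
{
    uint64_t idx = iova >> PAGE_SHIFT;
    if (idx >= IOMMU_PAGES)
        return false;
    m->phys_page[idx] = phys >> PAGE_SHIFT;
    m->present[idx] = true;
    return true;
}

/* Software removes a mapping; later device accesses to it will fault. */
static void iommu_unmap(struct toy_iommu *m, uint64_t iova)
{
    uint64_t idx = iova >> PAGE_SHIFT;
    if (idx < IOMMU_PAGES)
        m->present[idx] = false;
}

/* What happens on a device access: translate, or report a fault. */
static bool iommu_translate(const struct toy_iommu *m, uint64_t iova, uint64_t *phys)
{
    uint64_t idx = iova >> PAGE_SHIFT;
    if (idx >= IOMMU_PAGES || !m->present[idx])
        return false; /* I/O page fault */
    *phys = (m->phys_page[idx] << PAGE_SHIFT) | (iova & ((1ULL << PAGE_SHIFT) - 1));
    return true;
}
```

If a page were migrated to another NUMA node in this model, iommu_translate() would keep returning the old physical address until something called iommu_map() again with the new one. Nothing in the table updates itself, which is why the mapping stays the kernel's job here.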

> Success. “Cache-coherency mechanism used for the L3 level” is part of OS kernel

> What?!?
>
> This is totally false, and is simply a ridiculous claim.
>
> The cache coherence of L3 caches in x86 SMP systems certainly isn't managed by the kernel!

See above. If your system includes some hardware which does not care about L3 cache coherence (contemporary systems tend to include at least a few PCI devices, and on busy systems you don't want to use built-in PCIe cache snooping because it eats a significant amount of inter-CPU bandwidth, which is scarce on such systems), then your kernel is in charge of keeping the L3 cache and main memory in sync.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds