As for DMA, surely the system could manage that automatically as well?
That is, IOMMUs would map into the automatically managed 64-bit address space, and the system would move memory for DMA just as it does for CPU access. Just as PCIe DMA is cache-coherent with L1/L2/L3, it could be cache-coherent with this hypothetical L4 cache.
To clarify, a simple way to do this would be to add a few gigabytes of per-node L4 cache (in standalone chips) and reuse the cache-coherency mechanism already used at the L3 level.
The advantage could be that memory movement would be done by specialized hardware, in parallel with CPU operation.
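
To make the idea concrete, here is a toy software model of what I mean; it is only a sketch, not a description of any real hardware. Everything in it (ToyL4Cache, dma_write, cpu_read, one value per "line", LRU eviction) is made up for illustration. The point is that CPU and DMA accesses go through the same cache, so neither side needs an explicit flush or copy, and the cache itself handles fills and write-backs to the slow tier.

```python
from collections import OrderedDict


class ToyL4Cache:
    """Hypothetical write-back cache sitting in front of a slower memory tier."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # addr -> (value, dirty), kept in LRU order
        self.slow = {}              # stand-in for the slow tier behind the cache
        self.fills = 0              # lines brought in from the slow tier
        self.writebacks = 0         # dirty lines pushed out to the slow tier

    def _install(self, addr, value, dirty):
        # Insert or refresh a line, then evict least-recently-used lines
        # until the cache is back within capacity.
        self.lines[addr] = (value, dirty)
        self.lines.move_to_end(addr)
        while len(self.lines) > self.capacity:
            old_addr, (old_value, old_dirty) = self.lines.popitem(last=False)
            if old_dirty:
                self.slow[old_addr] = old_value
                self.writebacks += 1

    def read(self, addr):
        if addr in self.lines:
            value, dirty = self.lines[addr]
        else:
            # Miss: the "hardware" fetches the line from the slow tier.
            value, dirty = self.slow.get(addr, 0), False
            self.fills += 1
        self._install(addr, value, dirty)
        return value

    def write(self, addr, value):
        self._install(addr, value, True)


def dma_write(cache, base, data):
    # A device writes through the same coherent cache the CPU uses.
    for offset, value in enumerate(data):
        cache.write(base + offset, value)


def cpu_read(cache, base, count):
    # The CPU sees the device's data without any explicit flush or copy.
    return [cache.read(base + offset) for offset in range(count)]
```

For example, with a two-line cache, a three-value dma_write followed by a cpu_read of the same range returns the written values even though part of the data was evicted to the slow tier in between; the fills and writebacks counters show how much movement the cache did on its own.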
Copyright © 2017, Eklektix, Inc.