memory mirroring?
Posted Mar 24, 2012 4:37 UTC (Sat) by jzbiciak (guest, #5246)
In reply to: memory mirroring? by dlang
Parent article: Toward better NUMA scheduling
I was thinking about this earlier. If a page from a shared library (e.g. libc) is hot enough to be truly important, then multiple tasks will have pulled it into at least the shared L3 on a modern processor. You don't need further duplication at the NUMA-node level then.
If the page is shared, but not hot, then the cost of missing on it won't register very highly on the performance of the app, because it's a small portion of its run time.
So, that leaves us with these weird middle-ground pages that are shared and moderately used (i.e. neither hot nor cold, or only hot in sporadic bursts), but whose users are so spread out and diffuse that they can't manage to keep copies resident in the on-chip caches. It seems like those are the pages that will truly benefit from duplication.
All that said, the crossover thresholds that determine the size and impact of this weird middle ground are a function of the cost of the remote fetch (larger latency/less bandwidth makes this middle-ground window larger) and the size of the last-level-of-cache-before-NUMA (smaller size makes this middle-ground window larger). Modern systems seem to be working to close this gap from both sides, with increasing L3 sizes, and an emphasis on moderating the chip-to-chip latency while increasing the chip-to-chip bandwidth.
Or am I thinking about this wrongly?
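To put rough numbers on the three cases above, here is a toy model in Python. It is only a sketch under loud assumptions: the function name `extra_stall_fraction`, the reference rates, the miss rates, and the 150 ns / 90 ns remote and local latencies are all made up for illustration, and every remote miss is treated as a fully exposed stall. The last-level miss rate stands in for the effect of L3 size.

```python
def extra_stall_fraction(refs_per_sec, llc_miss_rate, remote_ns=150.0, local_ns=90.0):
    """Fraction of run time lost to fetching a page remotely instead of locally.

    Crude model: each reference that misses the last-level cache pays the
    full remote-minus-local latency difference, with no overlap.
    """
    extra_ns = remote_ns - local_ns
    return refs_per_sec * llc_miss_rate * extra_ns * 1e-9


# Hypothetical (references per second, last-level miss rate) for each case.
CASES = {
    "hot": (50_000_000, 0.0001),   # heavily used, so it stays resident in L3
    "cold": (1_000, 1.0),          # always misses, but is rarely touched
    "middle": (2_000_000, 0.5),    # moderately used, often evicted
}

for name, (refs, miss) in CASES.items():
    print(f"{name:>6}: {extra_stall_fraction(refs, miss):.4%} of run time")
```

With these invented inputs only the middle case loses a noticeable share of run time to remote fetches. Raising `remote_ns` or the miss rate (a smaller L3) grows that share, which is the window-widening effect described above.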
Posted Mar 24, 2012 8:47 UTC (Sat)
by dlang (guest, #313)
Posted Mar 24, 2012 15:46 UTC (Sat)
by jzbiciak (guest, #5246)
That said, library/shared pages will still get referenced at least somewhat regularly by all of the processors on the NUMA node, and so the LRU will prevent the hottest lines from getting evicted. If you assume non-random replacement (which, unfortunately, you can't with certain recent processors), the hot library pages will remain near the front of the LRU, so only the back of the LRU gets cycled.
(The "unfortunately you can't" comment applies to recent ARM Cortex-A series processors, which have a highly associative shared L2 ("That's good!") with random replacement in lieu of an LRU ("That's bad!").)
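The LRU-versus-random point can be checked with a small simulation. This is a minimal Python sketch, not a model of any real processor: it assumes a single fully associative cache, a handful of "hot" lines referenced at random, and a stream of "cold" lines that are never reused. The function name `hot_hit_rate` and all parameter values are hypothetical.

```python
import random
from collections import OrderedDict


def hot_hit_rate(policy, capacity=64, hot_lines=8, steps=20000, hot_prob=0.5, seed=0):
    """Hit rate seen by the hot lines under 'lru' or 'random' replacement."""
    workload = random.Random(seed)      # drives the reference stream
    chooser = random.Random(seed + 1)   # drives random victim selection
    cache = OrderedDict()               # key order = recency, oldest first
    next_cold = hot_lines               # cold line ids are never reused
    hot_hits = hot_refs = 0
    for _ in range(steps):
        is_hot = workload.random() < hot_prob
        if is_hot:
            line = workload.randrange(hot_lines)
            hot_refs += 1
        else:
            line = next_cold
            next_cold += 1
        if line in cache:
            hot_hits += is_hot          # only hot lines can ever hit here
            cache.move_to_end(line)     # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)            # evict least recently used
            else:
                del cache[chooser.choice(list(cache))]  # evict a random line
        cache[line] = None
    return hot_hits / hot_refs


for policy in ("lru", "random"):
    print(policy, round(hot_hit_rate(policy), 3))
```

In this setup the hot lines keep getting moved to the front under LRU, so only the streaming lines at the back get cycled out. Under random replacement every miss has some chance of evicting a hot line, so the hot lines miss more often.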