The multi-generational LRU and ELRU

Posted Apr 5, 2021 7:15 UTC (Mon) by hsu (guest, #151475)
Parent article: The multi-generational LRU

I presented very similar algorithm with further improvements in my master's thesis in 1995, and did some analysis and performance evaluation through simulation models on it. I called the algorithm ELRU, for Extended Least Recently Used algorithm. The letter E also conveniently presents the data structure in abstract form.

Please check my thesis for more stuff. Some quick comments:

- I put the new pages to "insert list", not the top list (youngest generation). If new pages are put on youngest generation list, with smaller memories would essentially reduce the algorithm LRU case anyway, when a large sequential access would happen. Insert list would be dynamically defined, and I simulated different configurations (see ELRU Treshold). Simulation graphics on page 60 show why this is important at least for my simulated workloads.

- Further, for pages which are known to be of a specific type, they can be insert to lists representing different priorities. If the kernel makes a wrong prediction, it likely gets corrected by accesses bumping the pages to upper list. I think I did not do simulations for this though.

- I moved the accessed pages to one list higher. Not straight to youngest generation. If the access pattern is very hierachical, which it often case with filesystems, you get better results with this. See simulation graphs. I think putting active pages back to youngest list would end up corresponding ELRU(n, 0), which has substantially lower performance.

- See priorities, 3.3 in thesis, may be relevant in page cache. I did not simulate it during my work, further testing might be useful. It would be simple to implement, to allow high priority stuff to have better chance to stay in memory.

- I think for simulations I used fixed list set, but it can be dynamic, new lists are created when needed. I do not remember if we did dynamic list allocation in real implementation/

We also implemented the algorithm as part of the Shadows Database Project at Helsinki University of Technology (HUT), now part of Aalto Univerity in Helsinki. Shadow paging (similar concept to what is now called "copy on write") creates strongly hierarchical workloads due to page tables and metadata accessed very frequently. Same might be true for other copy on write filesystems as well, such as ZFS, depending on the implementation. We considered implementing a filesystem using code base created during Shadows project but it did not materialize due to other stuff taking all the time. The code can be released under public source license, though it is database & userland stuff. Shadows had some useful ideas which I do not have seen anywhere else, such as RAID model works with disks of different sizes, works well with flash without wear levelling as it naturally does large block sequential writes through garbage collection.

I think the Master's thesis was published in CS department publication, so it should count as prior art in case of IPR issues. I did not file patents at time, so it is possibly safe to use.

I also did some literature research during the project so there are some references in the Master's thesis as well, caching algorithms are pretty old field, already was 25 years ago. The oldest stuff might not be googleable, I spent a lot of time in dusty libraries back then. This can be further prior art source.

The full Master's thesis is at

http://www.suonsivu.net/masters-thesis-Heikki-Suonsivu-19...

BibTex and URN can be found at

https://aaltodoc.aalto.fi/handle/123456789/83375