
I think MGLRU isn't ready for a lot of uses unless it keeps the paired lists


Posted Mar 6, 2026 18:30 UTC (Fri) by PeeWee (subscriber, #175777)
In reply to: I think MGLRU isn't ready for a lot of uses unless it keeps the paired lists by jthill
Parent article: Reconsidering the multi-generational LRU

I don't think tmpfs pages are considered file-backed, so your swappiness setting wouldn't be trying to protect them anyway.
No, that's not why I mentioned my liberal use of tmpfs at all; it's mounted noswap, which I forgot to say. It was meant to convey the memory pressure it induces. Otherwise I'd never see any swapping, ever, because I keep my system on a diet. I should also have mentioned why swappiness=180. That's because I am using LUKS full disk encryption, which makes file page refaults significantly more expensive, even from the fast-ish NVMe SSD. On top of that I have BTRFS with transparent compression, with some FUSE-based mergerfs mounts layered above it, which adds even more cost whenever files are read from there, i.e. all my media files. Page cache is almost exclusively clean most of the time, so evicting from there is as cheap as it gets. But that's only half the story once one considers the possibility of refaults. Zswap, while it does incur a penalty on reclaim, is practically free in terms of refault cost compared to LUKS->BTRFS->mergerfs. That holds only as long as I don't hit the zswap pool limit, of course, but that (almost) never happens.
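To make the swappiness=180 choice concrete, here is a minimal sketch of how the setting splits reclaim pressure between anonymous and file pages. It assumes only the documented 0..200 range, where 100 means both kinds of refault are treated as equally expensive. The function name `scan_weights` is made up for illustration, and the real kernel additionally scales these weights by the reclaim cost it observes on each list, which this toy model ignores.

```python
def scan_weights(swappiness: int) -> tuple[float, float]:
    """Return (anon_share, file_share) of reclaim pressure.

    Simplified model: only the static swappiness weighting is shown,
    not the kernel's dynamic cost feedback.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be in 0..200")
    anon_prio = swappiness          # weight for going after anon pages
    file_prio = 200 - swappiness    # weight for going after file pages
    total = anon_prio + file_prio   # always 200
    return anon_prio / total, file_prio / total


for value in (60, 100, 180):
    anon, file = scan_weights(value)
    print(f"swappiness={value}: anon {anon:.0%}, file {file:.0%}")
```

Under this model, 180 pushes roughly nine tenths of the pressure onto anon pages, which matches the intent above: swapping to zswap is cheap, while refaulting through LUKS, BTRFS and mergerfs is not.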

I got into swappiness tuning when I only had an HDD and was pushing limits. I found I disliked deferring slowdowns to workload-switching time. If I needed to do like ten minutes of image editing and it was a little slower, say madeupnumber30 seconds extra, then go back to what I was doing before, the needed files being all still cached and ready is very gratifying. Like, when *I'm* done with the interruption, my computer's got my back and the real work is all still up to speed, instead of making me sit and wait for 10 seconds while it rereads all the stuff it evicted to save me those 30 I didn't notice and might have spent anyway.

tl;dr: whether the computer's ready for my next trick a little before or well before I'm ready to perform it makes no difference to me, but if it's not ready when I am, that matters.

So yeah, it's going to be workload-specific, and my impression from the article was people with specific workloads didn't like mglru's observed behavior. So if the idea is to settle on just one reclaim algorithm I think pick the one that's easily tunable to avoid frustrating waiting humans.
According to this lengthy article, which was referenced here, swappiness fits even better into a more precise picture:
> > For phones and laptops, executable pages are frequently evicted
> > despite the fact that there are many less recently used anon pages.
> > Major faults on executable pages cause "janks" (slow UI renderings)
> > and negatively impact user experience.
>
> This is not because of the inactive/active scheme but rather because
> of the anon/file split, which has evolved over the years to just not
> swap onto iop-anemic rotational drives.
>
> We ran into the same issue at FB too, where even with painfully
> obvious anon candidates and a fast paging backend the kernel would
> happily thrash on the page cache instead.
>
> There has been significant work in this area recently to address this
> (see commit 5df741963d52506a985b14c4bcd9a25beb9d1981). We've added
> extensive testing and production time onto these patches since and
> have not found the kernel to be thrashing executables or be reluctant
> to go after anonymous pages anymore.
>
> I wonder if your observation takes these recent changes into account?

Again, I agree with all you said above. And I can confirm your series
has generally fixed the problem for the following test case.

When our most common 4GB Chromebook model is zram-ing under memory
pressure, the size of the file lru is
  ~80MB without that series
  ~120MB with that series
  ~140MB with this series

User experience is acceptable as long as the size is above 100MB. For
optimal user experience, the size is 200MB. But we do not expect the
optimal user experience under memory pressure.
I hope there is enough context. The gist here is that they wanted to mitigate executable page thrashing, i.e. refaulting program code, on Android and Chrome OS, which was rooted in the kernel's PTSD-induced reluctance to swap to spinning rust. Then Facebook fixed that, and MGLRU improves on it further. That seems to contradict the claim that swappiness is rendered useless. Further up in that message Zhao explains that MGLRU actually puts file and anon pages on an equal footing in terms of age-based eviction. Reclaim will look at the oldest generation of both types, treating them all just as ordinary pages. Only then will swappiness factor into the selection from the candidates found. And I think that MGLRU seemingly evicting file pages overly aggressively just shows that most file pages are cold. Oh, and file pages keep getting put on the oldest generation. They also need to "earn" their place on the next-younger generation LRU list by climbing above tier 0 inside gen 0. But refaults are also still tracked by shadow entries, so pages will refault to the tier they were evicted from. And only when their tier has a higher refault rate than tier 0 will they move to the younger generation. I hope that's roughly correct.
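The tier rule described above can be sketched as a toy model. This is only an illustration of my description, not the kernel's code: `TierStats` and `tiers_to_protect` are hypothetical names, the numbers are invented, and as far as I understand the kernel uses a feedback loop over these rates rather than the plain comparison shown here.

```python
from dataclasses import dataclass


@dataclass
class TierStats:
    evicted: int    # pages evicted from this tier
    refaulted: int  # of those, how many came back (seen via shadow entries)

    def refault_rate(self) -> float:
        # Avoid dividing by zero for tiers that have seen no evictions.
        return self.refaulted / self.evicted if self.evicted else 0.0


def tiers_to_protect(tiers: list[TierStats]) -> list[int]:
    """Return indexes of tiers whose refault rate exceeds tier 0's.

    Pages in those tiers would be moved to the younger generation
    instead of being evicted (assumption: simple comparison only).
    """
    base = tiers[0].refault_rate()
    return [i for i, t in enumerate(tiers[1:], start=1)
            if t.refault_rate() > base]


# Invented example counters for tiers 0..3 of the oldest generation.
stats = [TierStats(1000, 50), TierStats(200, 8),
         TierStats(100, 30), TierStats(0, 0)]
print(tiers_to_protect(stats))  # prints [2]
```

Here only tier 2 refaults more often than tier 0 (30% against 5%), so only its pages would be spared in this model.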




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds