LWN: Comments on "Multi-generational LRU: the next generation"
https://lwn.net/Articles/856931/
This is a special feed containing comments posted to the individual LWN article titled "Multi-generational LRU: the next generation".

Hi-Angel (https://lwn.net/Articles/858940/), Thu, 10 Jun 2021 11:46:23 +0000:

I have two comments for this thread, one regarding HDDs and one regarding SSDs.

First, for HDDs: I'm the one who posted the testimonial that v3 of the patches refers to, and I am using an HDD. So, just in case anyone's wondering about behavior on HDDs specifically, there it is.

Second, for SSDs: I see here an attitude that once you put swap on an SSD, all problems are gone. My experience says otherwise.

My girlfriend has a 2013 MacBook with an SSD, 4 GB of RAM, and zswap. She always had her swap partition on the SSD. Before I tried the patches on her laptop, she also had frequent swap storms, and her overall experience was pretty bad.

After I configured her system to use the multigenerational LRU patches (v2), her experience improved a lot. Now lag only starts appearing when her swap usage reaches around 7-8 GB (I don't know why exactly that size).

So, for anyone out there thinking that putting swap on an SSD will magically remove the need for a memory-reclaim rework: that ain't true.

anton (https://lwn.net/Articles/857603/), Fri, 28 May 2021 18:17:22 +0000:

My recommendation is to have no swap if the backing device is an HDD. Why? If the system needs so much RAM that it starts swapping, it becomes unusable anyway.

With an SSD, swap may be more viable. If you swap rarely, don't worry about the limited number of writes an SSD can take; if you swap often, buy more RAM.

yuzhao@google.com (https://lwn.net/Articles/857422/), Thu, 27 May 2021 07:29:20 +0000:

The building blocks are similar: access recency, access frequency, and shadows/ghosts.

The fundamental arguments are different: the multigenerational LRU argues that pages that have been used only once (when we can be sure of it) are always the best choice to evict, no matter how recently they were used, because 1) even if some of them are not, they will be protected upon a second access; and 2) the cost of ascertaining whether the rest are better candidates is higher (to do so we would probably have to scan more than half a million pages on a 4 GB laptop under moderate pressure, and there is no guarantee we'd make better choices after having done so).

Essentially, ZFS ARC is a *cache* replacement implementation, which discovers page accesses reactively via hooks. For *page* replacement, we have to scan page tables proactively, which makes discovering page accesses far more difficult and expensive.
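The "protected upon a second access" argument above is easy to see in a toy model. The following is a minimal illustrative sketch in C, not the kernel's implementation: it assumes a per-page count of fd-driven accesses and derives a tier as roughly log2 of that count, so used-once pages sit in tier 0 and are reclaimed first, while one further access lifts them out of that pool.

```c
/* Toy model of tier-based protection: a hypothetical sketch, not the
 * kernel's data structures. Used-once pages occupy tier 0 and are the
 * first eviction candidates; a second access promotes them. */
#include <stdio.h>

struct page_info {
    unsigned int refs;      /* accesses observed via file descriptors */
};

/* Tier grows roughly as log2 of the reference count, so each higher
 * tier needs about twice the accesses of the one below it. */
static unsigned int tier_of(const struct page_info *p)
{
    unsigned int tier = 0;
    for (unsigned int r = p->refs; r > 1; r >>= 1)
        tier++;
    return tier;
}

static int evict_first(const struct page_info *p)
{
    return tier_of(p) == 0;     /* used-once pages go first */
}

int main(void)
{
    struct page_info p = { .refs = 1 };
    printf("used once:  evict first? %d\n", evict_first(&p));
    p.refs++;                   /* the second access protects the page */
    printf("used twice: evict first? %d\n", evict_first(&p));
    return 0;
}
```

The point of the sketch is only that a single extra access suffices to move a page out of the evict-first tier; the real patch set tracks this per generation and refines it with refault feedback.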
nybble41 (https://lwn.net/Articles/857401/), Wed, 26 May 2021 21:43:09 +0000:

I think the problem here is the "number of accesses" metric. All that matters for predicting how likely the page is to be needed in the future is whether the page *was* accessed (over some predetermined interval), not the number of read() calls or the number of bytes read.

> … is a 4096-byte read from a page the same number of accesses as 4096 1-byte reads from the page, or as 1 1-byte read from the page?

In my opinion: both. A single 4096-byte read, 4096 separate 1-byte reads, and a single 1-byte read (all within a single sample interval) should all be weighted the same when deciding whether to keep the page in RAM. Of course the final decision should be based on multiple sample intervals, not just one. A better metric might be how long the page has gone without any access versus how many times its data has been faulted back into RAM after being discarded.

iabervon (https://lwn.net/Articles/857385/), Wed, 26 May 2021 18:48:25 +0000:

It's a little hard to compare numbers of fd-based accesses with numbers of direct accesses; you generally read a bunch of data with one syscall and then do multiple reads out of anonymous memory, but you don't generally bother to copy parts of an mmapped file into anonymous memory.

How you perform your accesses kind of has to matter for the metric to make sense, regardless: is a 4096-byte read from a page the same number of accesses as 4096 1-byte reads from the page, or as 1 1-byte read from the page?

intelfx (https://lwn.net/Articles/857274/), Wed, 26 May 2021 12:36:37 +0000:

I may be ignorant (and quite likely am), but isn't this thing basically ZFS's ARC with extra steps?

garloff (https://lwn.net/Articles/857267/), Wed, 26 May 2021 06:56:10 +0000:

Doesn't x86 have an accessed bit in the PTE that the CPU sets on access? Lazily scanning pages for this bit, counting it, and clearing it again would seem like a way to approximate non-fd page accesses. Maybe the kernel already does this?

yuzhao@google.com (https://lwn.net/Articles/857255/), Wed, 26 May 2021 04:01:30 +0000:

There are lots of people who can't afford high-memory laptops. I implore you not to assume your quality of life holds for the rest of the world.

yuzhao@google.com (https://lwn.net/Articles/857253/), Wed, 26 May 2021 03:40:13 +0000:

In practice, there is no way to track how many times a page has been accessed via the page tables mapping it.

From https://lwn.net/ml/linux-kernel/CAOUHufbz_f4EjtDsMkmEBbQphXj3ET+X6SM8JUPQ4b2jJmUzvA@mail.gmail.com/:

Remark 1: a refault, a page fault, or a buffered read is exactly one memory reference. A page-table reference as we count it, i.e., the accessed bit being set, could be one or a thousand memory references. So the accessed bit for a mapped page and PageReferenced() for an unmapped page may carry different weights.
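The interval-based metric nybble41 describes is essentially the classic "aging" approximation of LRU. Here is a hedged, self-contained C sketch under the assumption of a per-page history byte (the kernel keeps no such field): each sampling tick shifts the history right and records in the top bit whether the page was touched at all during the interval, so repeated 1-byte reads and one 4096-byte read within an interval weigh exactly the same.

```c
/* Classic "aging" approximation, purely illustrative: one bit per
 * sampling interval, most recent in the MSB. The per-page history
 * byte here is an assumption, not a kernel field. */
#include <stdint.h>
#include <stdio.h>

struct page_hist {
    uint8_t age;    /* access history over the last 8 intervals */
};

/* Run once per sampling interval, after reading and clearing the
 * page's accessed bit: only touched-or-not matters, never a count. */
static void age_tick(struct page_hist *p, int touched)
{
    p->age = (uint8_t)((p->age >> 1) | (touched ? 0x80u : 0u));
}

int main(void)
{
    struct page_hist hot = {0}, cold = {0};
    for (int i = 0; i < 8; i++) {
        age_tick(&hot, 1);          /* touched every interval */
        age_tick(&cold, i == 0);    /* touched once, 8 intervals ago */
    }
    /* the smaller value is the better eviction candidate */
    printf("hot=0x%02x cold=0x%02x\n", hot.age, cold.age);
    return 0;
}
```

Comparing the resulting bytes orders pages by recency-weighted use without ever counting individual reads, which sidesteps iabervon's 4096-one-byte-reads question entirely.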
MattBBaker (https://lwn.net/Articles/857227/), Tue, 25 May 2021 17:30:09 +0000:

I would argue the opposite. With disk space as fast and cheap as it is now, why not have a 1:1 mapping between swap and RAM, with /proc/sys/vm/swappiness set to 100? With a kernel that believes in overcommitting memory, being able to blow away the entire working set in an instant is a positive. My quality of life has vastly improved on my Linux workstation since I gave it gobs of swap, since I no longer worry about it thrashing a tiny swap file while the OOM killer desperately looks for an alternative to Firefox to knife.

mwsealey (https://lwn.net/Articles/857220/), Tue, 25 May 2021 16:52:23 +0000:

The kernel already does this on reclaim as a last resort, so...

comicfans (https://lwn.net/Articles/857187/), Tue, 25 May 2021 14:41:23 +0000:

> Windows XP and spinning rust used to grind to a halt on heavy memory use

Maybe I'm lucky, but I never hit an XP halt, not even once, with a 1.7 GHz Celeron + 845G + 256 MB RAM + HDD. IIRC I ran many apps that definitely exceeded 256 MB; it got slower, but it never stopped responding.

> and Win10 puts a device out of action for hours upgrading and patching itself if you still have a rotating hard disk.

We all know Windows Update isn't good, but we're talking about high memory pressure, not a bad system update.

> I've disabled swap and bought large amounts of RAM so that apps can't get stroppy about memory pressure. Only recently have I configured zswap but it's not noticeably changed my experience.

That doesn't apply if the laptop has non-replaceable RAM, and a memory leak (perhaps accidental) can eat as much RAM as you have. Of course better hardware can solve many software problems, but should that stop the kernel from improving the experience on old hardware? As long as slower-but-bigger storage is cheaper than faster-but-smaller storage, swap will always be needed (that's why bcache/lvm-cache exist).

k3ninho (https://lwn.net/Articles/857178/), Tue, 25 May 2021 12:05:17 +0000:

> While using Windows, I've never seen the system hang due to high memory usage, but such problems can easily make Linux "hang": the GUI/terminal doesn't respond to any key in a reasonable time, SSH times out, you can't kill the bad app, and anything you do just makes it slower. You may wait hours (or forever) for it to recover, or just hard-reset. I've hit such problems multiple times.

Windows XP and spinning rust used to grind to a halt on heavy memory use, and Win10 puts a device out of action for hours upgrading and patching itself if you still have a rotating hard disk.

That's more about the latency of storage access. What is the current advice about RAM and swap? I ask because, ever since I've had NAND-based SSDs, with their block-overwrite concerns, I've disabled swap and bought large amounts of RAM so that apps can't get stroppy about memory pressure. Only recently have I configured zswap, but it hasn't noticeably changed my experience.

I think this should involve running user-experience items at elevated nice levels and using the alt-sysrq keys to safely OOM-kill and then unmount filesystems if you can't recover the device.

Are we still advocating for swap in 2021?

K3n.

epa (https://lwn.net/Articles/857171/), Tue, 25 May 2021 07:32:37 +0000:

I guess the kernel could randomly sprinkle access restrictions over a selection of userspace pages, on architectures that support it. When a process tries to access such a page, it gets a memory-protection fault. The kernel updates a usage count for the page, removes the restriction, and lets the process continue. As long as only one in a million page accesses gets faulted this way, it might not noticeably affect performance. The question is whether it's possible, and whether it would give high-quality information on which pages are being used.
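epa's sampling idea can be prototyped in userspace. A sketch, assuming Linux and a single sampled page; note that calling mprotect() from a signal handler is not guaranteed async-signal-safe by POSIX (it works on Linux in practice), and an in-kernel implementation would look quite different:

```c
/* Userspace sketch: revoke access to a page, count the fault when the
 * program touches it, restore access, and let the instruction retry.
 * Hypothetical illustration only; an in-kernel sampler would differ. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t samples;   /* accesses observed so far */
static long pagesz;

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    /* align the faulting address down to its page */
    uintptr_t base = (uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1);
    samples++;                          /* record one observed access */
    /* restore permissions so the faulting access can complete */
    mprotect((void *)base, (size_t)pagesz, PROT_READ | PROT_WRITE);
}

int main(void)
{
    pagesz = sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    mprotect(page, (size_t)pagesz, PROT_NONE);  /* "sprinkle" one restriction */
    page[123] = 1;                              /* faults once, then proceeds */

    printf("observed accesses: %d\n", (int)samples);
    return 0;
}
```

Scaled to one page in a million, as epa suggests, the fault cost would be amortized; the open question remains whether such sparse samples say enough about the working set.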
comicfans (https://lwn.net/Articles/857169/), Tue, 25 May 2021 03:50:26 +0000:

If you browse lots of web pages, you can easily hit the swap-storm problem (even on a powerful laptop). That may be the fault of a (possibly memory-leaking) browser or a bad OOM killer, but the kernel does need improvement in such situations.

While using Windows, I've never seen the system hang due to high memory usage, but such problems can easily make Linux "hang": the GUI/terminal doesn't respond to any key in a reasonable time, SSH times out, you can't kill the bad app, and anything you do just makes it slower. You may wait hours (or forever) for it to recover, or just hard-reset. I've hit such problems multiple times.

Taken from the patch mail:

> ...8G RAM + zswap + swap... lots of apps opened ... LSP/chats/browsers... gets quickly to a point of SWAP-storms... system lags heavily and is barely usable. ... migrated from 5.11.15 kernel to 5.12 + the LRU patchset... Till now I had *not a single SWAP-storm*, and mind you I got 3.4G in SWAP. I was never getting to the point of 3G in SWAP before without a single SWAP-storm.

dxin (https://lwn.net/Articles/857163/), Tue, 25 May 2021 01:56:46 +0000:

I like the technical aspect of this work, but doesn't it feel like users under tight memory constraints are steering the development? Yes, I know they are the majority now, e.g. Android phones and cloud providers, but what's the frequency of page-outs on our laptops? One page per week?

This is like the opposite of the I/O scheduler situation, where users with very fast NVMe drives steer development.

NYKevin (https://lwn.net/Articles/857145/), Mon, 24 May 2021 18:58:03 +0000:

The wording of the article suggests that this is not a policy decision so much as a "we measure what we can measure" decision. Nobody wants to fire an interrupt handler on every single (userspace) memory access. If an access came in via a page fault, that would imply the page was previously not in memory and did not have a frequency count in the first place. If it got refaulted, then it should be counted, and the article says that it is counted.

Frankly, I'm not sure I understand the change you are proposing.
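The refault counting NYKevin mentions relies on the "shadows/ghosts" building block from yuzhao's first comment above. A minimal sketch with invented names (the kernel keeps its shadow entries in the page cache itself, not in anything like this struct): eviction leaves a marker stamped with a reclaim clock, so a later fault on the same slot is recognized and counted as a refault.

```c
/* Illustrative shadow-entry model with invented names. On eviction
 * the slot keeps a marker instead of going empty, so a later fault
 * on the same offset is recognized and counted as a refault. */
#include <stdio.h>

enum slot_state { SLOT_EMPTY, SLOT_PRESENT, SLOT_SHADOW };

struct slot {
    enum slot_state state;
    unsigned long evict_seq;    /* reclaim clock at eviction time */
};

static unsigned long reclaim_seq;   /* advances with every eviction */
static unsigned long refaults;

static void evict(struct slot *s)
{
    s->state = SLOT_SHADOW;         /* leave a ghost behind */
    s->evict_seq = ++reclaim_seq;
}

static void fault_in(struct slot *s)
{
    if (s->state == SLOT_SHADOW) {
        /* the page came back after being discarded: one refault,
         * and the distance hints at how premature the eviction was */
        refaults++;
        printf("refault, distance=%lu\n", reclaim_seq - s->evict_seq);
    }
    s->state = SLOT_PRESENT;
}

int main(void)
{
    struct slot a = { SLOT_EMPTY, 0 };
    fault_in(&a);   /* first use: not a refault */
    evict(&a);
    fault_in(&a);   /* faulted back in: counted as a refault */
    printf("total refaults: %lu\n", refaults);
    return 0;
}
```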
dancol (https://lwn.net/Articles/857141/), Mon, 24 May 2021 18:00:44 +0000:

> Specifically, tiers are a way of sorting the pages in a generation by the frequency of accesses — but only accesses made by way of file descriptors.

[Twitch] Please, God, no! An access to a page should be an access to a page regardless of whether it came from a file-descriptor operation, from a page fault, from a kernel file server, or from any other source. The kernel should not be making policy decisions based on how an application chooses to spell its resource access!