Toward a swap abstraction layer

Posted May 24, 2023 12:30 UTC (Wed) by xecycle (subscriber, #140261)
In reply to: Toward a swap abstraction layer by farnz
Parent article: Toward a swap abstraction layer

> by removing swap as an option, you force the kernel to evict code

This can be right or wrong, depending on other factors. E.g. on single-purpose systems I can go extreme: memlock all code sections; I can do this together with cgroup-limiting all userspace to use some less memory than total physical memory. Removing swap basically means to lock all anon pages; I can still manually lock file-backed pages.

This, IMO, can be a valid protection scheme, and it can work without swap.

Btw "in order to avoid OOM" is what the kernel is programmed to do; which makes sense, as a general-purpose kernel it must try to allow programs to proceed, in the absence of configured limits. But this may be not the top-priority of an end user. In the desktop use case (where the word "responsiveness" makes sense), I would prefer something that can quickly identify a misbehaving program and kill it, ideally without other programs' code dropped or anon swapped out. Let's imagine this protection scheme:
- I pick some apps, declare them as important, run them in a cgroup with memory.min protections, all code sections locked;
- for all other apps run those in another cgroup that is limited to use total RAM - some margin for the kernel - dedicated amount given to above.
This way, with or without swap, with or without an early OOM daemon, those important apps should get equal protection after all. "Unimportant" apps can benefit from swap, yes, but I can simply declare them "unimportant".

Of course I don't think there are many people doing this on their desktops, but I think similar practice is already used elsewhere, e.g. pod QoS in k8s.

Again, I'm arguing that users actually want a good protection scheme, and a good scheme for some use case may work well without swap at all. After all, what to do with diskless systems?

Toward a swap abstraction layer

Posted May 24, 2023 13:56 UTC (Wed) by farnz (subscriber, #17727) [Link]

Sure, in special-purpose systems you can memlock all important code so that it can't be evicted from the page cache, and you can limit userspace so that it can't overcommit. This works fine with or without swap.

But the underlying problem we have is that fork either requires a huge amount of wasted swap/memory, or overcommit. If you allow overcommit, then you have to have some way of handling what happens when you get it wrong, and run out of memory.

Your scheme isn't something that's commonly used, precisely because of the overcommit issue - you've just declared that important apps can't overcommit, and thus can't reliably use fork. What is common is setting up OOM controls so that if there's a shortage of memory, unimportant apps get killed first, and unimportant apps get paged out in preference to important apps.

But on a desktop, pretty much everything you're running is an important application - you don't bother running something if you don't want it to work. So you need a system that works well for a case where nothing is unimportant - and that's the case where you have a small amount of swap, and an early OOM daemon that kills programs if they start entering swap thrash.

And it's not useful to keep all code sections in memory - my desktop is currently perfectly responsive, and yet significant chunks of running applications are not paged in - start up code, error handling code for errors I'm never going to hit, POP3 code in my e-mail client, X11 support in my GUI libraries for Wayland-native programs, Wayland support in GUI libraries for programs that still use X11 and more. None of that needs to be loaded into memory, since it's never going to run; and by locking it into memory, you stop my applications from using that memory for something more useful to me.

The goal the kernel has is to page out only the parts of file-backed data that won't be accessed again soon, and to page out anonymous data that won't be accessed again soon. It needs a bit of guidance (the swappiness parameter) to help it out in deciding between the two, but it won't page something out unless it's rarely accessed - and by definition, that's the stuff that you don't need for responsiveness, since if you did, it'd be frequently accessed.