
Down: Debunking zswap and zram myths

Chris Down has posted a detailed look at how the kernel's zswap and zram subsystems work — and how they differ.

Most people think of zswap and zram simply as two different flavours of the same thing: compressed swap. At a surface level, that's correct – both compress pages that would otherwise end up on disk – but they make fundamentally different bets about how the kernel should handle memory pressure, and picking the wrong one for your situation can actively make things worse than having no swap at all.



I've never seen such a comprehensive and detailed explanation of zram and zswap

Posted Mar 25, 2026 20:13 UTC (Wed) by Fantu (guest, #162182) (4 responses)

I’ve read quite a bit about zram and zswap in the past, but nothing like this, complete and detailed.
Even though it's long, it's worth reading. Many thanks to Chris Down for the excellent work.

I've never seen such a comprehensive and detailed explanation of zram and zswap

Posted Mar 26, 2026 0:21 UTC (Thu) by lordsutch (guest, #53) (3 responses)

Agreed! I appreciate the detailed explanations but also the bottom line that you probably just want to let zswap do its thing and stop using zram for swap unless you're on an embedded system.

I've never seen such a comprehensive and detailed explanation of zram and zswap

Posted Mar 26, 2026 12:54 UTC (Thu) by epa (subscriber, #39769) (2 responses)

An alternative, which I think he didn't explicitly address, is to create an ordinary uncompressed ramdisk and swap to that using zswap.

I've never seen such a comprehensive and detailed explanation of zram and zswap

Posted Mar 26, 2026 14:46 UTC (Thu) by patrakov (subscriber, #97174) (1 response)

Let me address this: it doesn't work in a useful way. The maximum amount of data that the kernel can accept into swap is the total size of all swap devices. In your case, it would be the size of your ramdisk. So the ramdisk will sit in memory, occupying some part of it (let's say, 2 GB), and this memory will not be available for any other use. Yet, in this situation, despite the compression, you cannot free more than 2 GB of RAM, as zswap only intercepts swap writes and replaces them with compressed-memory writes. 2 GB lost on the ramdisk, 2 GB saved by swapping, 0.66 GB spent on storing the compressed pages (assuming 3x compression), net 0.66 GB loss.
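The arithmetic in the comment above can be checked with a few lines of Python. This is a minimal sketch of the commenter's simplified model only: it assumes the ramdisk's full size is resident in RAM and that zswap compresses at a fixed ratio; the function name is hypothetical.

```python
def ramdisk_zswap_net_gb(ramdisk_gb: float, compression_ratio: float) -> float:
    """Net RAM change in GB (negative = loss) when zswap sits in front of a
    RAM-backed ramdisk used as swap, per the commenter's simplified model."""
    pinned = ramdisk_gb                    # ramdisk memory, unavailable for anything else
    freed = ramdisk_gb                     # swap can never hold more than the device size
    pool = ramdisk_gb / compression_ratio  # compressed copies that zswap keeps in RAM
    return freed - pinned - pool


if __name__ == "__main__":
    # 2 GB ramdisk, 3x compression: prints -0.67 (the comment truncates 2/3 to 0.66)
    print(round(ramdisk_zswap_net_gb(2, 3), 2))
```

Under this model the result is negative for any compression ratio: the pinned ramdisk cancels out the pages freed by swapping, leaving only the cost of the compressed pool.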

I've never seen such a comprehensive and detailed explanation of zram and zswap

Posted Mar 26, 2026 15:35 UTC (Thu) by epa (subscriber, #39769)

Thanks for the explanation. That's why zram exists, then: it can promise a bigger block device based on some estimate of compression ratio. (Then there must be some hard stop when the zram device is full earlier than expected.)
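A toy model can make the "bigger block device" idea concrete. In real zram, the `disksize` attribute sets the device's capacity in uncompressed bytes, while the optional `mem_limit` attribute caps how much RAM the compressed data may occupy; once that cap is reached, further writes fail. The sketch below is a simplified Python illustration, not kernel code: the class is hypothetical, it assumes a fixed compression ratio, and it ignores allocator overhead, metadata, and zram's special handling of same-filled pages.

```python
from dataclasses import dataclass


@dataclass
class ZramModel:
    """Hypothetical, simplified model of a zram device (not kernel code)."""
    disksize: int       # advertised capacity in uncompressed bytes
    mem_limit: int = 0  # cap on RAM used for compressed data; 0 means no cap
    stored: int = 0     # uncompressed bytes currently stored
    used: int = 0       # compressed bytes currently held in RAM

    def write(self, nbytes: int, ratio: float) -> bool:
        """Store nbytes that compress by `ratio`; return False if the write fails."""
        compressed = int(nbytes / ratio)
        if self.stored + nbytes > self.disksize:
            return False  # out of advertised (uncompressed) space
        if self.mem_limit and self.used + compressed > self.mem_limit:
            return False  # RAM cap reached: the "hard stop" before the device looks full
        self.stored += nbytes
        self.used += compressed
        return True


if __name__ == "__main__":
    GiB, MiB = 1 << 30, 1 << 20
    # A 4 GiB device capped at 1 GiB of RAM, but the data only compresses 2x.
    z = ZramModel(disksize=4 * GiB, mem_limit=1 * GiB)
    while z.write(1 * MiB, ratio=2.0):
        pass
    print(z.stored // MiB, z.used // MiB)  # 2048 1024
```

In this run the device stops accepting data after 2 GiB, half of its advertised size, because the compression estimate behind `disksize` was too optimistic for the configured memory cap.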

Finally an HTML version

Posted Mar 26, 2026 10:29 UTC (Thu) by PeeWee (subscriber, #175777)

So he must have finally gotten around to turning that PDF (now defunct), which I linked to in past comments, into a proper HTML document. Or he found the original the PDF was made from, which was just a print-out of some HTML output that cut off some of the longer code comments.

That's great news! Thanks for sharing it!

Use cases may vary

Posted Mar 26, 2026 23:45 UTC (Thu) by gutschke (subscriber, #27910) (1 response)

My main use case for compressed swapping is on embedded devices, where I don't want any disk writes if at all possible. These devices usually have an SD card and in principle can write to it. But they will only do so for data that rarely changes and that incurs few writes (e.g. user-provided settings that get configured once, but that do need to persist). Everything else is optimized for running out of RAM.

The root filesystem is an overlayfs on top of a tmpfs, and any service that produces a lot of write operations by default has been configured so that it no longer does this. This generally works great even with relatively small embedded devices, results in fast reboot times, and makes for very reliable devices that can handle random power cycles without requiring human intervention to check on the device's health. If you never write to the SD card, it is much less likely that the data or the SD card's internal data structures will get corrupted.
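The overlayfs-on-tmpfs root described above can be sketched as the mount sequence below. This is a hypothetical illustration, not the commenter's actual setup: it only builds the command lines (running them requires root, typically from an initramfs before switching to the merged root), it assumes the read-only lower root is already mounted, and the directory names and tmpfs size are assumptions. overlayfs requires `upperdir` and `workdir` to be on the same filesystem, which is why both live on the tmpfs.

```python
def overlay_root_commands(lower: str, rw_dir: str, merged: str,
                          tmpfs_size: str = "64M") -> list[list[str]]:
    """Return argv lists that mount a tmpfs-backed overlay over a read-only lower root."""
    upper, work = f"{rw_dir}/upper", f"{rw_dir}/work"
    return [
        # All writes end up in RAM; nothing is written to the SD card.
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", rw_dir],
        # upperdir and workdir must be on the same filesystem (here, the tmpfs).
        ["mkdir", "-p", upper, work],
        ["mount", "-t", "overlay", "overlay", "-o",
         f"lowerdir={lower},upperdir={upper},workdir={work}", merged],
    ]


if __name__ == "__main__":
    for argv in overlay_root_commands("/ro", "/rw", "/newroot"):
        print(" ".join(argv))
```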

I typically add zram support, not because I expect that I will need it (ideally, the amount of RAM has been provisioned to handle all expected use cases), but because it can give me a little more headroom when things don't go as I expect. Maybe there was a network hiccup and some rarely tested code path ended up blowing up memory usage? Maybe the user did something really unexpected and our working set is much bigger than anything we have seen before? It is hard to predict what could go wrong, but that's the nature of unexpected behavior. In the best case, zram handles the peak load and everything soon returns to normal. In the worst case, the device crashes and either a watchdog reboots it or a user power cycles it. That's no worse than what would have happened without zram. So, despite its well-documented shortcomings, zram seems to be the right tool for the job.

I wouldn't mind switching to zswap, but the need for an actual backing store defeats the purpose that I have in mind. All I need is a kernel feature that starts compressing dirty RAM in order to eke out a few more usable pages every once in a blue moon.
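For reference, setting up swap on zram by hand goes through zram's sysfs interface, as described in the kernel's zram admin guide: choose a compression algorithm, set `disksize`, optionally set `mem_limit`, then run `mkswap` and `swapon`. The Python sketch below only builds that sequence rather than executing it. It assumes the `zram` module is already loaded so that `/dev/zram0` exists and that `lz4` is available in the running kernel; the device number, sizes, and priority are illustrative assumptions.

```python
from typing import Optional


def zram_swap_plan(dev: int = 0, disksize: str = "512M", algorithm: str = "lz4",
                   mem_limit: Optional[str] = None,
                   priority: int = 100) -> list[tuple[str, ...]]:
    """Build (not execute) the steps that turn /dev/zram<dev> into a swap device."""
    sysfs = f"/sys/block/zram{dev}"
    steps = [
        # The compression algorithm must be chosen before disksize initializes the device.
        ("write", f"{sysfs}/comp_algorithm", algorithm),
        # disksize is the uncompressed capacity advertised to the swap layer.
        ("write", f"{sysfs}/disksize", disksize),
    ]
    if mem_limit is not None:
        # Optional cap on the RAM that the compressed data may use.
        steps.append(("write", f"{sysfs}/mem_limit", mem_limit))
    steps += [
        ("run", "mkswap", f"/dev/zram{dev}"),
        # A high priority makes the kernel use zram before any lower-priority swap.
        ("run", "swapon", "-p", str(priority), f"/dev/zram{dev}"),
    ]
    return steps


if __name__ == "__main__":
    for kind, *args in zram_swap_plan(mem_limit="128M"):
        if kind == "write":
            print(f"echo {args[1]} > {args[0]}")
        else:
            print(" ".join(args))
```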

Use cases may vary

Posted Mar 27, 2026 6:41 UTC (Fri) by PeeWee (subscriber, #175777)

So, despite its well-documented shortcomings, zram seems to be the right tool for the job.
Yes, your use case is a perfect fit for swap on zram, just as the article says.
I wouldn't mind switching to zswap, but the need for an actual backing store defeats the purpose that I have in mind. All I need is a kernel feature that starts compressing dirty RAM in order to eke out a few more usable pages every once in a blue moon.
With recent work on virtual swap space, the need for real swap to make zswap work will be a thing of the past. But I think that doesn't matter much for cases like yours. There may be some, possibly minor, reduction in overhead because there will be no need for the zram block device. But there will also be virtually no hassle involved in setting up compressed RAM, because with zswap it's just a matter of flicking a runtime switch or setting zswap.enabled=1 on the kernel command line; no more zram device creation and mkswap. So maybe you don't need to migrate existing deployments to zswap, but you can simply use zswap for new ones, once those patches have been merged, that is. I am almost certain that will happen; the removal of the RFC tag with v4 is a good sign.
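The runtime switch mentioned above is the `enabled` parameter under `/sys/module/zswap/parameters/`: reading it shows `Y` or `N`, and writing `1` or `Y` to it as root turns zswap on. The Python sketch below inspects those parameters; the function names are hypothetical, and the exact set of parameter files varies by kernel version.

```python
from pathlib import Path


def zswap_params(params_dir: str = "/sys/module/zswap/parameters") -> dict[str, str]:
    """Read every zswap module parameter into a dict; empty if zswap is absent."""
    d = Path(params_dir)
    if not d.is_dir():
        # No zswap in this kernel (e.g. CONFIG_ZSWAP unset).
        return {}
    return {p.name: p.read_text().strip() for p in sorted(d.iterdir()) if p.is_file()}


def zswap_enabled(params: dict[str, str]) -> bool:
    """Boolean module parameters read back as 'Y' or 'N'."""
    return params.get("enabled") == "Y"


if __name__ == "__main__":
    params = zswap_params()
    print("zswap enabled:", zswap_enabled(params))
    for name, value in params.items():
        print(f"  {name} = {value}")
```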

No zswap in Debian cloud kernel

Posted Mar 27, 2026 9:47 UTC (Fri) by cypherpunks2 (guest, #152408) (2 responses)

Frustratingly, CONFIG_ZSWAP is unset in the Debian cloud kernel, so I've had to use zram with writeback to emulate zswap (poorly) across a number of resource-constrained VMs.
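Whether a given kernel was built with zswap can be checked from its build configuration, which Debian installs as `/boot/config-<release>`. In that format an enabled option appears as `CONFIG_ZSWAP=y` (or `=m` for a module), and an unset one as the comment `# CONFIG_ZSWAP is not set`. A minimal Python sketch follows; the helper name is hypothetical, and kernels without such a file may expose the same text through `/proc/config.gz` instead.

```python
import os
from typing import Optional


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of `option` (e.g. 'y', 'm'), or None if unset or absent."""
    prefix = option + "="  # the '=' keeps CONFIG_ZSWAP from matching CONFIG_ZSWAP_*
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
        if line == f"# {option} is not set":
            return None
    return None


if __name__ == "__main__":
    path = f"/boot/config-{os.uname().release}"
    with open(path) as f:
        value = kconfig_value(f.read(), "CONFIG_ZSWAP")
    print("CONFIG_ZSWAP:", value or "not set")
```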

No zswap in Debian cloud kernel

Posted Mar 28, 2026 1:06 UTC (Sat) by gioele (subscriber, #61675) (1 response)

Request to enable CONFIG_ZSWAP in Debian's cloud kernel, for those interested in it: <https://bugs.debian.org/1132098>.

No zswap in Debian cloud kernel

Posted Apr 23, 2026 22:22 UTC (Thu) by gioele (subscriber, #61675)

Update: Debian will enable CONFIG_ZSWAP in cloud images starting with the packages for Linux version 7.0.

Fedora installer that enables zswap by default

Posted Mar 27, 2026 11:14 UTC (Fri) by kaputtix (guest, #182947) (3 responses)

Many thanks, this was very illuminating. I've updated our alternative Fedora installer, which now creates an on-drive swap partition, disables systemd-oomd and enables zswap by default. (It doesn't have a storage module for btrfs yet, so I hope someone can contribute more storage options.)

Fedora installer that enables zswap by default

Posted Mar 27, 2026 20:57 UTC (Fri) by PeeWee (subscriber, #175777) (1 response)

I've updated our alternative Fedora installer, which now creates an on-drive swap partition, disables systemd-oomd and enables zswap by default.
(emphasis added)

But why disable systemd-oomd? There is no conflict with zswap. It just enables actual writeback to the now-existing swap space and thus relieves memory pressure even more, so systemd-oomd will be all the happier for it. I think you should reconsider, because the kernel's OOM killer is often very late to the party, well past the patience threshold of desktop users. Having read more of Down's writing (or just reading between the lines of the article above) and watched some of his presentations (in one of them he mentioned machines thrashing for more than 30 minutes(!) before the kernel OOM killer stepped in), I am pretty certain that Meta (his employer) is using some variation of a userspace OOM daemon. Better to have it and not need it than to need it and not have it, as the saying goes.
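Userspace OOM daemons such as systemd-oomd base their decisions on pressure stall information (PSI) rather than waiting for the kernel to run out of memory. The system-wide memory figures live in `/proc/pressure/memory`: a `some` line and a `full` line, each with 10-, 60- and 300-second averages (percent of time stalled) and a cumulative `total` in microseconds. The Python sketch below parses that format and applies a single threshold. It only illustrates the idea and is not systemd-oomd's actual policy, which works per cgroup and also considers swap usage and how long pressure persists; the 20% threshold and the function names are hypothetical.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text such as the contents of /proc/pressure/memory."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()  # 'some' or 'full', then key=value pairs
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result


def under_pressure(psi: dict[str, dict[str, float]], threshold: float = 20.0,
                   kind: str = "full", window: str = "avg10") -> bool:
    """True if the chosen stall average exceeds `threshold` percent."""
    return psi.get(kind, {}).get(window, 0.0) > threshold


if __name__ == "__main__":
    with open("/proc/pressure/memory") as f:
        psi = parse_psi(f.read())
    print(psi.get("full"), "act" if under_pressure(psi) else "ok")
```

Acting on `full` pressure (all non-idle tasks stalled on memory) rather than `some` makes the sketch react only to severe thrashing; a real daemon would also require the pressure to persist before killing anything.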

I don't know the project and am also far removed from Fedora, being a Debian/Ubuntu user myself, but since the official Fedora installer configures swap on zram (IIRC), I just want to make sure that it is no longer enabled with your modified installer. Just double-checking, since you didn't explicitly say anything about that. ;)

Fedora installer that enables zswap by default

Posted Mar 29, 2026 16:42 UTC (Sun) by kaputtix (guest, #182947)

Thank you, good point; I didn't think much about this. I will consider enabling systemd-oomd again, or maybe making it configurable. By the way, the project has moved to Codeberg.

Fedora installer that enables zswap by default

Posted Mar 27, 2026 22:50 UTC (Fri) by PeeWee (subscriber, #175777)

P.S.: I've just found this while following some links in Down's article:
Over the time we (kernel MM community) have implicitly decided to keep the kernel oom-killer very conservative as adding more heuristics in the reclaim/oom path makes the kernel more unreliable and punt the aggressiveness of oom-killing to the userspace as a policy. All major Linux deployments have started using userspace oom-killers like systemd-oomd, Android's LMKD, fb-oomd or some internal alternatives. That provides more flexibility to define the aggressiveness of oom-killing based on your business needs.

Though userspace oom-killers are prone to reliability issues (oom-killer getting stuck in reclaim or not getting enough CPU), so we (Roman) are working on adding support for a BPF-based oom-killer where we think we can do oom policies more reliably.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds