
Compression on the backing device?

Posted Feb 20, 2026 1:25 UTC (Fri) by PeeWee (subscriber, #175777)
Parent article: Modernizing swapping: virtual swap spaces

The one problem I have with zswap is that it cannot push out compressed folios as-is. The explanation for this is covered briefly in this article: it "hides behind frontswap". I don't see virtual swap space changing anything about that, because once a folio gets pushed to the backing device, the swap subsystem needs to be able to swap it in as if there were no zswap to begin with.

On a more general note, why has nobody ever thought of compressed swap on the actual backing device? Experience with transparent filesystem compression suggests performance gains for rather little CPU time, since the actual I/O is reduced by the compression ratio. LZ4 is an essentially free I/O booster: very fast at a decent compression ratio. I also think that Zstandard fits that bill at even greater compression ratios, albeit at higher CPU cost, so why not invest some of the time spent I/O-waiting on making that wait shorter and gain effective capacity? In the case of zswap that work will already have been done, so it's just the writeback that improves without additional penalty. Arguably, decompression upon writeback is an unfair penalty for zswap; it's just that in real-world workloads it doesn't figure because it happens very rarely and, with the proactive shrinker, at (more) opportune occasions. This could also eliminate LRU inversions when the zpool limit gets hit hard, because to get to the backing store a folio would have to go through compression first, while the LRU entry gets written back to make room for it in the zpool.

As it stands, zswap is a great improvement but, due to its implementation details (frontswap), it cannot go further. Couldn't this virtual swap space just point to zswap entries regardless of whether they are in RAM or on the backing device? If such an entry is encountered, have zswap decompress it, i.e. go through frontswap in reverse? This is much like when zswap gets disabled at runtime: new folios will just go directly to and from the backing device, but the already compressed ones are kept and treated accordingly. In the long run, I think zswap could become the default that way. Don't want compression? Set some dummy compressor to get legacy behavior. Is all this just too much complexity/overhead, or do kernel devs have such an approach on their wish list as well?



Compression on the backing device?

Posted Feb 20, 2026 14:36 UTC (Fri) by intelfx (subscriber, #130118) (5 responses)

> The one problem I have with zswap is that it cannot push out compressed folios as-is. The explanation for this is covered briefly in this article: hides behind frontswap

Frontswap is a name I haven't heard in a while.

I think it's been quite long since the kernel rid itself of both cleancache and frontswap as general-purpose abstractions, with zswap absorbing the remnants of the latter and growing into a standalone mechanism.

Compression on the backing device?

Posted Feb 20, 2026 22:19 UTC (Fri) by PeeWee (subscriber, #175777) (4 responses)

Conceptually it is still the same approach, is it not? The point is that one can't write back compressed folios. That makes LRU inversions upon hitting the zpool limit even worse, because they need to be decompressed before writeback while being passed left, right and center by rejected new ones. I think that was the impetus for proactive shrinking.

Compression on the backing device?

Posted Feb 21, 2026 15:54 UTC (Sat) by intelfx (subscriber, #130118) (3 responses)

> Conceptually it is still the same approach, is it not? The point is <...>

It might be at the moment, but my point is that it is certainly not true that "due to its implementation details (frontswap), it cannot go further" as frontswap does not exist.

Compression on the backing device?

Posted Feb 24, 2026 23:15 UTC (Tue) by PeeWee (subscriber, #175777) (2 responses)

So, zswap essentially assimilated frontswap, because it was the only user left. But unless that design changes, i.e. unless it stops fooling the rest of the kernel into believing that its zpool contents reside on actual swap space (which is why the allocation happens before even touching the pages/folios in question), it can indeed not go further. From what I understand, this virtual swap space just takes the foolery one step further by also pretending that the space actually exists.

Then why not let zswap assimilate the swap allocator, outright? Then it'd only need to allocate on writeback, i.e. just in time, as plain old swap does; no need for virtual swap. And when writebacks are few and far between it's really not an issue to have decompression in that path. For instance, I currently have ~1G in a ~300M zpool, which are mostly cold and got zswapped on some memory spikes induced by my liberal use of tmpfs (mounted with noswap) so something had to give. There was some to and fro, so I estimate that ~3G went through zswap and only ~5000 pages were reject_compress_fail - interestingly, I never see reject_compress_poor. And that's a cumulative counter that never decrements because zswap never sees those pages again - swap-in from the backing store is none of its concern; another quirk born from the "frontswap" design.
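For anyone wanting to reproduce figures like the "~1G in a ~300M zpool" above, here is a minimal sketch. It assumes the zswap counters are exposed as plain integer files under /sys/kernel/debug/zswap (debugfs mounted, root access), that `stored_pages` counts base pages and `pool_total_size` is in bytes, and that pages are 4 KiB; the helper names are made up for illustration.

```python
from pathlib import Path

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def read_zswap_stats(stats_dir="/sys/kernel/debug/zswap"):
    """Read every integer counter file in the given directory into a dict."""
    stats = {}
    for entry in Path(stats_dir).iterdir():
        try:
            stats[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or non-numeric entries
    return stats


def compression_ratio(stats, page_size=PAGE_SIZE):
    """Uncompressed bytes held in zswap divided by the pool's size in bytes."""
    pool_bytes = stats.get("pool_total_size", 0)
    if pool_bytes == 0:
        return None  # nothing stored, ratio undefined
    return stats["stored_pages"] * page_size / pool_bytes
```

With 262144 stored pages (1 GiB) and a 300 MiB pool, `compression_ratio` comes out at roughly 3.4. Counters such as `reject_compress_fail` would show up in the same dict if the running kernel exposes them.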

Also see the comment by Jonathan Corbet about the next iteration and the linked content, "ghost swap" in particular.

Compression on the backing device?

Posted Feb 25, 2026 2:05 UTC (Wed) by intelfx (subscriber, #130118) (1 response)

> Then why not let zswap assimilate the swap allocator, outright?

No objection from me.

Like I said several times already, I am merely remarking that while the present limitations of zswap may have _occurred_ due to it being originally a client of frontswap, nothing says it "cannot go further" than that. You have just suggested one way it, indeed, *can*.

Compression on the backing device?

Posted Feb 25, 2026 8:17 UTC (Wed) by PeeWee (subscriber, #175777)

Got ya. You were right, all along. And thanks for bringing me up to speed on that front (pun intended). ;)

Compression on the backing device?

Posted Feb 22, 2026 2:48 UTC (Sun) by cesarb (subscriber, #6266) (2 responses)

> Experience with transparent filesystem compression suggests performance gains for rather little CPU time, since the actual I/O is reduced by the compression ratio.

I don't think it would be worth it, at least for normal 4K pages, since AFAIK the unit of I/O in modern systems is also 4K; that is, even if you could compress a page's contents to a single byte, you'd still have to write a full 4K.

(IIRC, the transparent filesystem compression in btrfs works in blocks of 128K)

Compression on the backing device?

Posted Feb 23, 2026 0:39 UTC (Mon) by PeeWee (subscriber, #175777)

I was thinking more towards mTHPs, which have a higher potential compression ratio. On x86_64, anonymous mTHPs start at 16K (for completeness: shared memory mTHPs start at 8K for some reason). Given that I see compression ratios >3 with plain 4K pages when using zstd, one compressed 16K mTHP might just fit inside a single 4K block, for example, but would need no more than two 4K blocks. That'd be a >100% increase in I/O throughput, on average.
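The block arithmetic behind both points (a compressed 4K page still costs one 4K write, while a 16K folio at ratio 3 shrinks from four blocks to two) can be sketched as follows. This is only a back-of-the-envelope model: it assumes the device does I/O in 4 KiB units and that a compressed folio is stored on its own, not packed together with others; the function names are illustrative.

```python
import math

BLOCK_SIZE = 4096  # assumption: backing device I/O granularity of 4 KiB


def blocks_written(folio_bytes, ratio, block_size=BLOCK_SIZE):
    """Blocks needed to store one folio after compressing it by `ratio`."""
    compressed = math.ceil(folio_bytes / ratio)
    return max(1, math.ceil(compressed / block_size))


def io_reduction(folio_bytes, ratio, block_size=BLOCK_SIZE):
    """Factor by which compression reduces the number of blocks written."""
    uncompressed_blocks = math.ceil(folio_bytes / block_size)
    return uncompressed_blocks / blocks_written(folio_bytes, ratio, block_size)
```

For a single 4K page, `io_reduction(4096, 3.0)` is 1.0, i.e. no gain. For a 16K folio, `io_reduction(16384, 3.0)` is 2.0, and it reaches 4.0 once the ratio hits 4.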

Clustered pages for swap

Posted Feb 23, 2026 11:42 UTC (Mon) by farnz (subscriber, #17727)

The usual way round that is to not swap page-by-page, but to try and select groups of pages to swap together; this has other benefits (since you're paying one set of swap subsystem and I/O overheads per swap page group), but has to be balanced against paging out part of your working set.

This involves a certain amount of activity tracking - has the page been referenced recently? - but you can then do things like swapping in units of 16 pages (64 KiB) at a time most of the time, except when a range of pages contains a recently accessed page (where you switch to page-at-a-time to avoid problems). Complexity is traded off here; you spend more CPU time on which pages to swap, in return for doing less I/O while swapping and getting more clean pages each time you decide to swap at all.

Note that it's very rare for the system to only need one more clean page before it settles into a steady state, unless you've entered swap thrash anyway; in practice, you're going to be swapping out chunks of memory that aren't part of the working set if you're swapping at all.

There's also a related trick to consider; if you can mark pages that you've swapped out as "do not re-read from swap", you can write out 16 pages, mark 15 of them as clean, and remove one from RAM. Then, if the other 15 are read, you can avoid removing them from RAM, and if they're written, you can mark the swap page as "discard on decompression". This can be a win in some cases - e.g. if you turn 64 KiB of I/O into 32 KiB of I/O and have no more than 8 pages that you avoid removing from RAM, you win.
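One way to read the break-even in that last example, as a rough sketch: the group write costs a fixed amount of compressed I/O, while the alternative is one uncompressed 4 KiB write per page that actually leaves RAM. The model and the function name are my own simplification, not anything taken from a real swap implementation.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages


def clustered_swap_wins(group_pages, compressed_io_kib, pages_kept_in_ram):
    """True if one compressed group write costs no more I/O than writing
    each page that actually leaves RAM individually and uncompressed."""
    evicted = group_pages - pages_kept_in_ram
    per_page_io_kib = evicted * PAGE_KIB
    return compressed_io_kib <= per_page_io_kib
```

With 16 pages written as 32 KiB, `clustered_swap_wins(16, 32, 8)` is True (32 KiB either way), while keeping a ninth page in RAM tips it to False.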

Writing back compressed pages

Posted Feb 24, 2026 14:21 UTC (Tue) by farnz (subscriber, #17727) (10 responses)

Reading this week's second half of the merge window article, I noticed this patch for zram to enable compressed page writeback.

That puts compressed pages on the actual backing device - pages first get "swapped" to zram, where they're compressed and placed in the zram memory pool. If the zram memory pool fills, the compressed pages in zram are written as-is to zram's writeback device (along with incompressible pages).

Writing back compressed pages

Posted Feb 24, 2026 22:35 UTC (Tue) by PeeWee (subscriber, #175777)

But zram writeback is not really automatic [PDF] (page 7), and least of all LRU-based, because all the swap subsystem sees is yet another swap space and all zram sees is yet another block. A low-cost polling approach, i.e. long(-ish) sleep intervals, might be too late to the party when sudden memory spikes happen. Maybe one could hack something together that monitors PSI (pressure stall information), but then you are already reinventing kernel wheels.
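To make the PSI idea concrete, here is a small sketch of the monitoring half only. It assumes the usual /proc/pressure/memory layout ("some" and "full" lines with avg10, avg60, avg300 and total fields); the 10% threshold and the function names are arbitrary placeholders, and what to do once pressure is high (for instance, kicking off zram writeback) is deliberately left out.

```python
def parse_psi(text):
    """Parse PSI text into {'some': {...}, 'full': {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def memory_pressure_high(text, threshold=10.0):
    """True if the 10-second 'some' average is at or above `threshold` percent.
    The threshold is a hypothetical tuning knob, not a recommended value."""
    return parse_psi(text)["some"]["avg10"] >= threshold
```

A daemon would feed it the file contents, e.g. `memory_pressure_high(open("/proc/pressure/memory").read())`. PSI also supports poll()-based triggers, which would avoid the sleep-interval problem, but that is beyond this sketch.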

Writing back compressed pages

Posted Feb 25, 2026 0:25 UTC (Wed) by PeeWee (subscriber, #175777) (8 responses)

One thing just popped into my head. Why not "unify" zram and tmpfs? Just make tmpfs zswappable. My last encounters with tmpfs and incompressible data on it have led me to the conclusion that it "bypasses" frontswap; my guess is that it's simply unaware of it, since it has seniority by some margin. Because the data was incompressible, I was expecting lots and lots of reject_compress_fail pages being counted in the zswap stats (this was before I discovered that the noswap mount option had been added to tmpfs), but I didn't see any. Swap space was filling up, all right, but zswap never saw those pages, apparently.

I don't really see the appeal of zram as a general-purpose block device. I believe someone wanted a compressed tmpfs. And then they realized, by running mkswap on it, that one can get "compressed RAM" on the cheap. While that is not unreasonable, using it that way has some side effects. For instance, one should not have real swap space alongside it (at lower priority, of course), because of LRU inversions when the zram swap is full. OK, those inversions are on the swap subsystem; they happen without the need for zram in the mix. For instance, I have 4G of swap at pri=2 on an NVMe SSD and an additional 16G at pri=1 on a SATA SSD for when I really push things. Obviously, when the 4G are full, the much fresher pages go to the slower 16G space. It's all hypothetical now, mostly because I've since enabled zswap. This is only meant to show that LRU inversions are not an inherent swap-on-zram problem; they existed long before zram did.
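The inversion with two prioritized swap areas can be shown with a toy model. It assumes pages always go to the highest-priority device that still has room and that slots are never freed or migrated; real swap allocation is more involved, and the device names and sizes are placeholders.

```python
def place_pages(num_pages, devices):
    """Assign pages, in swap-out order, to the highest-priority device with
    free space. `devices` is a list of (name, priority, capacity_in_pages)."""
    ordered = sorted(devices, key=lambda dev: dev[1], reverse=True)
    free = {name: capacity for name, _prio, capacity in ordered}
    placement = []
    for _page in range(num_pages):
        for name, _prio, _capacity in ordered:
            if free[name] > 0:
                free[name] -= 1
                placement.append(name)
                break
        else:
            raise MemoryError("all swap devices are full")
    return placement
```

With `[("sata", 1, 4), ("nvme", 2, 2)]` and five pages, the first two (oldest, coldest) pages land on "nvme" and the three most recently swapped-out ones end up on the slower "sata" device, which is the inversion described above.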

I think, for the purposes of zram, plain tmpfs with a path through zswap would suffice, unless I am missing some real use cases that require the block layer. Think of it: folios read from zram need to be decompressed so the reader can use them. That means that some pages exist in compressed and uncompressed form at the same time, wasting memory; or maybe they replace one another? Doesn't really matter. My point is that tmpfs was almost there, but from the other direction: pages live uncompressed in the page cache until they get swapped out. Just make that swapping go through zswap and zram is obsolete. The added bonus is that hot pages, regardless of whether they are anonymous or tmpfs or whatever, naturally stay in the "hot" uncompressed region of RAM, and the rest get (z)swapped out, as per usual. Whereas with swap on zram, the pages are already considered cold. And other uses of zram cannot easily make use of swap, because that might just be the same device they are coming from; the kernel says: reclaim pages from zram0, and the swap subsystem writes pages to swap0, which resides on zram0! And just like that, you ruptured the space-time continuum by letting Marty McFly prevent his father from getting together with his mother, or some such. ;)

And, of course, that zswap quirk of requiring pre-allocated space that is (almost) never touched needs to be fixed. But, in my head at least, it could all be way simpler that way. I think the authors of these patches may suffer from tunnel vision, so I am just throwing this out there as food for thought.

Writing back compressed pages

Posted Feb 25, 2026 1:58 UTC (Wed) by intelfx (subscriber, #130118) (7 responses)

> Just make tmpfs zswappable.

Tmpfs certainly is {,z}swappable. Perhaps your system was misconfigured.

> I don't really see the appeal of zram as a general-purpose block device. I believe someone wanted a compressed tmpfs. And then they realized, by running mkswap on it, that one can get "compressed RAM" on the cheap. While that is not unreasonable, using it that way, has some side effects.

Yes. Trying to pass zram as a swap device, IMO, is a pretty blatant abuse of mm that only flies because practical deployments rarely get to exercise the interesting corner cases (in addition to the priority inversion concerns that you mention). For one, the swap subsystem is not designed around fallibility of swap devices. What would happen, for instance, if you configure a zram device with a limit on physical RAM utilization which is subsequently hit before the declared logical capacity is used up (for example, due to the data being incompressible)? Your guess is as good as mine.

Writing back compressed pages

Posted Feb 25, 2026 4:12 UTC (Wed) by PeeWee (subscriber, #175777) (6 responses)

> > Just make tmpfs zswappable.
>
> Tmpfs certainly is {,z}swappable. Perhaps your system was misconfigured.

Are you absolutely positive? I do know that it's swappable (says so on the box); that's why I had been using it. Not so sure about the zswappable part, though. As I said, I was expecting lots of rejected pages on occasions when I put lots of incompressible data on tmpfs, but they did not show in the respective zswap stat counters, all the while swap was filling, with said zswap counters barely changing.

Maybe it's a more recent change? I haven't followed any changes since I discovered the noswap mount option. Since my heavy tmpfs usage involves incompressible data exclusively, I now prefer capping the tmpfs size somewhere close to but below the physical RAM - was double that before - and have other pages zswapped in their place, when the need arises. Otherwise, it'd be reject_compress_fail galore for tmpfs pages, anyway, and I'd like to save as much I/O as possible from happening. I'll tolerate the odd rejected page, but not on the gigabyte order of magnitude, on top of the outcome being clear before zswap even tried. My use case is only slightly worse off for the limitation of available space.
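To illustrate why incompressible data is a lost cause for a compressed cache, here is a toy check. It uses zlib purely as a stand-in for lz4/zstd because it ships with Python, and the acceptance rule (store only if the compressed copy is smaller than the page) is a simplification of my own, not zswap's actual criteria.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def worth_storing_compressed(page):
    """True if compressing the page saves any space at all."""
    return len(zlib.compress(page)) < len(page)


zero_page = bytes(PAGE_SIZE)         # highly compressible
random_page = os.urandom(PAGE_SIZE)  # stands in for already-compressed data
```

Here `worth_storing_compressed(zero_page)` is True, while the random page gains nothing from compression and would only burn CPU before being written out anyway.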

Writing back compressed pages

Posted Feb 25, 2026 5:25 UTC (Wed) by intelfx (subscriber, #130118) (5 responses)

> Are you absolutely positive?

Entirely.

> Maybe it's a more recent change?

Depends on how recent we are talking. Linux 5.10 old enough?

Writing back compressed pages

Posted Feb 25, 2026 6:05 UTC (Wed) by PeeWee (subscriber, #175777) (4 responses)

> > Are you absolutely positive?
>
> Entirely.

Found it:
		error = swap_writeout(folio, plug);
		if (error != AOP_WRITEPAGE_ACTIVATE) {
			/* folio has been unlocked */
			return error;
		}


		/*
		 * The intention here is to avoid holding on to the swap when
		 * zswap was unable to compress and unable to writeback; but
		 * it will be appropriate if other reactivate cases are added.
		 */

> > Maybe it's a more recent change?
>
> Depends on how recent we are talking. Linux 5.10 old enough?

That may very well be the case. I am running Ubuntu LTS, which is not exactly bleeding edge.

Thanks for finally giving me a definitive answer to that question!

Writing back compressed pages

Posted Feb 25, 2026 9:31 UTC (Wed) by intelfx (subscriber, #130118) (3 responses)

It wasn't really "definitive", more like an upper bound that I could give relatively quickly... Tmpfs has been zswappable since at least Linux 4.9 (stretch); this was a good excuse to play around with a clustered Incus and imagebuilder, but I won't bother checking further.

> I am running Ubuntu LTS which is not exactly bleeding edge.

So I'd say unless your Ubuntu LTS is like 14.04, you've just got it misconfigured somehow.

Writing back compressed pages

Posted Feb 25, 2026 9:35 UTC (Wed) by intelfx (subscriber, #130118) (1 response)

> So I'd say unless your Ubuntu LTS is like 14.04, you've just got it misconfigured somehow.

(That said, given that you say your workloads are incompressible, it's all purely academic anyway. Oh well, still an excuse to build some images.)

Writing back compressed pages

Posted Feb 25, 2026 11:44 UTC (Wed) by PeeWee (subscriber, #175777)

I've just realized that my memory might also be clouded by my tinkering with memory.zswap.writeback=0. But I am pretty certain I saw that behavior before that cgroup knob even existed. And there seem to have been problems with accounting & cgroup control. I don't know how relevant they would have been for this case, but there is a set of commits under that umbrella linked to from the kernelnewbies page for the v5.19 release. I found those by accident while researching when writeback disabling was introduced, to narrow the time frame.

Anyway, I am starting to feel like I am abusing this thread/forum, so I'll leave it at that. I am very grateful for all your input and effort! Now I can explore some more use cases I had ruled out before. And if, against expectations, I do see the erroneous behavior, I'll know for sure that it must be an error of some kind.

Writing back compressed pages

Posted Feb 25, 2026 11:50 UTC (Wed) by PeeWee (subscriber, #175777)

> It wasn't really "definitive"

Oh, now I get it. The "definitive" referred to the question of whether tmpfs somehow bypasses zswap. It does not matter too much when it was made zswappable.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds