
Compressed swap

By Jonathan Corbet
March 26, 2014
2014 LSFMM Summit
There are a number of projects oriented around improving memory utilization through the compression of memory contents. Two of these, zswap and zram, have found their way into the mainline kernel; they both aim to replace swapping with compressed, in-memory storage of data. They differ in an important way, though. Zram acts like a special block device which can be used as a swap device; zswap, instead, uses the "frontswap" hooks to try to avoid swapping altogether.
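
For reference, the frontswap interface that zswap plugs into looks roughly like this in kernels of that era (lightly simplified from include/linux/frontswap.h):

    struct frontswap_ops {
        void (*init)(unsigned type);      /* swap area was just swapon'ed */
        int (*store)(unsigned type, pgoff_t offset, struct page *page);
        int (*load)(unsigned type, pgoff_t offset, struct page *page);
        void (*invalidate_page)(unsigned type, pgoff_t offset);
        void (*invalidate_area)(unsigned type); /* area was swapoff'ed */
    };

A store() call that fails simply falls through to the real swap device, which is how zswap can intercept swapping without replacing it outright.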

Bob Liu led a session to talk about this technology with a specific focus on zswap and a performance problem he has encountered with it. Zswap stores "swapped" data by compressing it and placing the result in a special RAM zone maintained by the "zbud" memory allocator. When the zbud pool fills, zswap must respond by evicting pages from that area and pushing them out to a real swap device. That involves decompressing the data, then writing the resulting pages to the swap device. That can slow things down significantly.
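
In outline, the eviction path looks something like the following sketch; the helper names are hypothetical, not the actual mm/zswap.c functions:

    /* Sketch only; helper names are made up for illustration. */
    static int zswap_evict_entry(struct zswap_entry *entry)
    {
            struct page *page = alloc_page(GFP_KERNEL);

            if (!page)
                    return -ENOMEM;

            /* Inflate the compressed blob back into a full page... */
            zswap_decompress_entry(entry, page);

            /* ...then push that page out through the ordinary swap
             * path; this is the step that makes pool exhaustion
             * expensive. */
            return zswap_write_to_swap(entry, page);
    }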

Bob had a couple of options that he asked the group to consider. One of those was to turn zswap into a write-through cache; any pages stored in zswap would also be written to the swap device at the same time. That would allow the instant eviction of pages from zswap; since they already exist on the swap device, no further effort would be required. The cost, of course, would be in the form of increased swap device I/O and space usage.
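
As a sketch (hypothetical helpers again), a write-through store would pay the device I/O at store time, so that eviction later reduces to freeing the compressed copy:

    static int zswap_store_writethrough(unsigned type, pgoff_t offset,
                                        struct page *page)
    {
            int ret;

            /* Keep the compressed copy in the zbud pool as before. */
            ret = zswap_compress_and_store(type, offset, page);
            if (ret)
                    return ret;

            /* Also write the uncompressed page to the real swap
             * device now, paying the extra I/O and swap-slot cost
             * up front. */
            return zswap_write_to_device(type, offset, page);
    }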

The second option would be to make the zswap area dynamic in size. It is currently a fixed-size region of memory. If it were dynamic, it could grow in response to increased demand. Of course, there would be limits to that growth, after which it would still be necessary to evict pages from the zswap area.
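
A dynamic policy might amount to little more than a growth check against a ceiling, along these lines (pool_limit_percent is a hypothetical tunable; totalram_pages is the kernel's count of RAM pages):

    /* Hypothetical: let the pool grow on demand up to a cap
     * expressed as a percentage of total RAM; past the cap,
     * eviction is still required. */
    static bool zswap_pool_may_grow(unsigned long pool_pages)
    {
            unsigned long limit = totalram_pages * pool_limit_percent / 100;

            return pool_pages < limit;
    }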

Bob may have hoped for guidance with regard to which direction he should take, but he did not get it. Instead, Mel Gorman made the point that neither zram nor zswap has been well analyzed to quantify the benefits they provide to a running system. When people do run benchmarks, they tend to choose tests like SPECjbb which, he said, is not well suited to the job. Or they pick kernel compiles, which is even worse.

What the compressed swapping subsystems really need, he said, is better demonstration workloads. In fact, they need those workloads so badly that no changes to the behavior of these subsystems will be considered until those workloads have been provided. So the real next step for developers working with compressed swapping is not to worry about how the system responds to pool exhaustion — at least, not until a better way to quantify the performance impact of any changes has been found.

[Your editor would like to thank the Linux Foundation for supporting his travel to the Summit.]



Compressed swap

Posted Mar 27, 2014 12:31 UTC (Thu) by ibukanov (subscriber, #3942)

> That involves decompressing the data, then writing the resulting pages to the swap device. That can slow things down significantly.

Hm, why cannot zswap write compressed pages on the disk avoiding the need for decompression?

> The cost, of course, would be in the form of increased swap device I/O and space usage.

This is unacceptable if one uses a cheap flash device, as it introduces too much wear. The current implementation prevents that nicely.

> What the compressed swapping subsystems really need, he said, is better demonstration workloads.

From personal experience, zram works very nicely on a Chromebook, allowing several Eclipse instances to run alongside a browser with multiple tabs open on a 2 GB machine with no swap. So I wonder why nobody at Google bothers to provide those workloads...

Compressed swap

Posted Mar 27, 2014 18:07 UTC (Thu) by nybble41 (subscriber, #55106)

> Hm, why cannot zswap write compressed pages on the disk avoiding the need for decompression?

That was my thought as well. Instead of evicting (possibly multiple) compressed pages into uncompressed swap, why not just evict one of the pages holding compressed data instead? Besides avoiding decompression, you'd save swap space and I/O bandwidth, and avoid the issue of needing to write out several pages of previously compressed data to make room for one new page.

Compressed swap

Posted Mar 28, 2014 20:05 UTC (Fri) by gfa (subscriber, #53331)

> Hm, why cannot zswap write compressed pages on the disk avoiding the need for decompression?

While it would be **awesome**, it would need changes to the code which retrieves pages from swap, and it would make zswap non-optional (at least at runtime). Or, at the least, you wouldn't be able to mix zswap and regular swap on the same system unless you marked the compressed pages as compressed.

A simple solution would be to run two swap devices, one for compressed pages and the other for non-compressed pages.

Compressed swap

Posted Mar 28, 2014 22:31 UTC (Fri) by nybble41 (subscriber, #55106)

> it would need changes on the code which retrieves the pages from swap...
> you wouldn't be able to mix zswap and regular swap on the same system

Why is that? What I had in mind seemed fairly simple to integrate, though I'm not a Linux kernel developer and may be missing something. The idea was that when frontswap hands a page to zswap and there isn't enough room for it in zswap's RAM, zswap picks one of its less-recently-used compressed pages and hands it off to the normal swap code, storing whatever token it gets back to identify the page for later retrieval. That frees up the page to store the new compressed data. Later, if and when frontswap asks for one of the old pages' data back, zswap can request the compressed page back from the normal swap subsystem and handle the request as it normally would.
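
Concretely, the bookkeeping might be no more than a swap-slot token per evicted page; a rough sketch of the store side (all names hypothetical):

    /* Hypothetical sketch: evict one page of compressed data as-is. */
    static int zswap_push_compressed(struct zswap_backing_page *p)
    {
            swp_entry_t token;
            int ret;

            /* Hand the compressed page to the swap layer with no
             * decompression step at all. */
            ret = swap_store_raw(p->page, &token);
            if (ret)
                    return ret;

            /* Remember where it went for later retrieval. */
            p->token = token;
            p->on_disk = true;
            return 0;
    }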

Compressed swap

Posted Apr 14, 2014 2:30 UTC (Mon) by kmeyer (subscriber, #50720)

> The idea was that when frontswap hands a page to zswap and there isn't enough room for it in zswap's RAM, zswap picks one of its less-recently-used compressed pages and hands it off to the normal swap code, storing whatever token it gets back to identify the page for later retrieval.

Usually (I think) the page is unmapped from the page table and swapped out. Then, when that virtual address is accessed, a hardware fault occurs because the page-table entry is no longer present, and the kernel's fault handler brings the page back in from swap.
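
Condensed heavily, the path being described has this shape (paraphrased stand-ins, not verbatim kernel code):

    static void handle_swap_fault(struct vm_area_struct *vma,
                                  unsigned long addr, pte_t pte)
    {
            /* A not-present but non-empty PTE holds a swap entry. */
            if (!pte_present(pte) && !pte_none(pte)) {
                    swp_entry_t entry = pte_to_swp_entry(pte);

                    /* In the real kernel, do_swap_page() reads the
                     * page back (frontswap/zswap gets first shot)
                     * and re-establishes the mapping; the helpers
                     * below are stand-ins for that work. */
                    struct page *page = swap_read_back(entry);

                    map_page_at(vma, addr, page);
            }
    }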

> That frees up the page to store the new compressed data. Later, if and when frontswap asks for one of the old pages' data back, zswap can request the compressed page back from the normal swap subsystem and handle the request as it normally would.

So, the way this would actually be implemented (I think) is allowing pages in zswap's backing store to be swapped, but only when requested by zswap's front-end.

Because you don't want to accidentally access those backing pages and bring them back in and trigger more swapping elsewhere, this would require some additional complexity in zswap to track which backing pages are actually available and which need to be paged in.

You might also have some terrible / recursive locking and code flow going on between the memory request -> no clean pages -> need to swap a page -> pager <-> zswap <-> pager again(!); I'm unfamiliar with the area but it sounds like a nightmare (that zswap likely tries very hard to avoid right now with zbud).

It may be possible, but it would probably be a pain in the ass to implement and would have room for latent bugs.

Compressed swap

Posted Apr 14, 2014 22:36 UTC (Mon) by nybble41 (subscriber, #55106)

> ... this would require some additional complexity in zswap to track which backing pages are actually available and which need to be paged in.

I agree that zswap would need to keep track of its pages. However, there shouldn't be any risk of accidentally triggering swapping. The compressed pages would only be read from swap at zswap's request.

> You might also have some terrible / recursive locking and code flow going on ...

I don't think so. The requests should be unidirectional, not recursive. The zswap RAM would be reserved and managed by zswap, not the pager. This assumes that zswap can access the swap interface directly the same way the pager does; if the pager and swap are tightly integrated then some refactoring would be in order.

Memory request -> no clean pages -> pager -> zswap -> write existing page directly to swap -> compress new page -> free source page -> resume thread.

Page fault -> pager -> zswap -> read compressed page back from swap into preallocated buffer -> decompress into target RAM -> resume thread.
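
The second flow in sketch form, mirroring the earlier store-side sketch (hypothetical names):

    /* Hypothetical load path for a compressed page pushed to disk. */
    static int zswap_load_pushed(struct zswap_backing_page *p,
                                 struct page *dest)
    {
            int ret;

            if (p->on_disk) {
                    /* Read the compressed page back into a
                     * preallocated buffer; only zswap ever issues
                     * this read, so no extra swapping is triggered. */
                    ret = swap_load_raw(p->token, p->page);
                    if (ret)
                            return ret;
                    p->on_disk = false;
            }

            /* Decompress into the faulting task's target page. */
            return zswap_decompress_page(p, dest);
    }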

