LSFMM: In-kernel memory compression

By Jonathan Corbet
April 23, 2013

LSFMM Summit 2013

Compressed memory (the use of compression to compact memory contents in RAM as an alternative to swapping those contents to disk) has been the subject of extensive discussion on the kernel mailing lists. In particular, there has been a lot of debate over the relative merits of two approaches, known as zswap and zcache. Zswap developer Seth Jennings and zcache hacker Dan Magenheimer came to LSFMM 2013 in an attempt to reach an agreement on the way forward, an attempt that appears to have been successful.

Seth started by presenting compressed swap and the problems that it is trying to solve. Readers who are not familiar with this concept may want to read Seth's zswap article and this article by Dan for the relevant background information.

Andrew Morton noted that the whole thing makes his head spin. From there, the discussion quickly homed in on one of the more contentious sub-issues related to in-kernel memory compression: the need for a special-purpose memory allocator. There is a widespread sentiment among memory management developers that we have plenty of allocators already; adding another would increase maintenance costs significantly. In this case, though, there seems to be a legitimate need; in a sense, the "memory allocator" under discussion here is more of a bin-packing system that stores compressed pages of memory. The slab allocators are very good at packing fixed-size allocations, but compressed pages vary widely in size, so the existing allocators do not work well for in-kernel compression.
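To make the size variation concrete, here is a small user-space sketch that compresses three page-sized buffers and reports the results. It assumes a 4096-byte page and uses Python's zlib as a stand-in compressor; neither the compressor nor the sample contents come from the session.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def compressed_size(page: bytes) -> int:
    """Return the zlib-compressed size of one page-sized buffer."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")
    return len(zlib.compress(page))


# Three hypothetical page contents with very different compressibility.
pages = {
    "zero-filled": bytes(PAGE_SIZE),
    "repeating text": (b"in-kernel memory compression " * 200)[:PAGE_SIZE],
    "random data": os.urandom(PAGE_SIZE),
}

sizes = {name: compressed_size(page) for name, page in pages.items()}
for name, size in sizes.items():
    print(f"{name:>15}: {size} bytes")
```

The zero-filled page shrinks to a few dozen bytes while the random page does not shrink at all, which is why an allocator built around a handful of fixed object sizes is a poor fit for this workload.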

Agreement on the need for a new allocator is not the same as agreement on which allocator should be used, though. Zswap uses "zsmalloc," which uses some clever schemes to pack compressed pages efficiently. That efficiency comes at a cost, though: freeing a page used by zsmalloc is difficult; it can require pushing an unknown number of pages out to the real swap device and can take an unknown amount of time. The "zbud" allocator used with zcache, instead, is inefficient: it can pack at most two compressed pages into a physical page, but the cost of freeing that physical page is small and known ahead of time.
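The tradeoff can be illustrated with a toy model. The sketch below is not the zbud or zsmalloc code; it assumes a 4096-byte page, a simple first-fit policy, and hypothetical names throughout. It compares a pool limited to two buffers per page against the lower bound that a perfectly dense packer could reach.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class TwoPerPagePool:
    """Toy pool that places at most two compressed buffers in each page."""

    def __init__(self):
        self.pages = []  # each entry lists the one or two buffer sizes it holds

    def store(self, size: int) -> None:
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("buffer must fit in one page")
        # First fit: reuse a page that holds one buffer and has enough room.
        for page in self.pages:
            if len(page) == 1 and page[0] + size <= PAGE_SIZE:
                page.append(size)
                return
        self.pages.append([size])

    def worst_case_writebacks(self) -> int:
        """Most buffers that must be pushed out to free any single page."""
        return max((len(page) for page in self.pages), default=0)


def dense_page_count(sizes) -> int:
    """Lower bound on pages needed if buffers could be packed with no waste."""
    return -(-sum(sizes) // PAGE_SIZE)  # ceiling division


buffer_sizes = [1000] * 8  # eight hypothetical compressed pages
pool = TwoPerPagePool()
for size in buffer_sizes:
    pool.store(size)

print("two-per-page pool uses", len(pool.pages), "pages")
print("dense lower bound is", dense_page_count(buffer_sizes), "pages")
print("worst-case writebacks per freed page:", pool.worst_case_writebacks())
```

With eight 1000-byte buffers, the two-per-page pool needs four pages where a dense packer could manage with two, but freeing any one of its pages never requires evicting more than two buffers.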

Hugh Dickins questioned the design of zbud, observing that it was strange to design that kind of limitation into the system from the start. But Mel Gorman came out strongly in favor of zbud despite its "awful packing properties." He sees the unpredictability at the heart of zsmalloc as a long-term source of bugs and strange memory management behavior. What he would really like, he said, would be a modular interface to the allocation layer so that different solutions could be used at different sites.
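One way to picture such a modular layer is as a small interface that the compression front end calls without knowing which allocator sits behind it. The following is a minimal user-space sketch of that idea; the class and method names (`CompressedPagePool`, `DictPool`, `CompressedStore`) are hypothetical, and zlib stands in for whatever compressor the kernel would use.

```python
import zlib
from abc import ABC, abstractmethod


class CompressedPagePool(ABC):
    """Hypothetical interface between the compression layer and an allocator."""

    @abstractmethod
    def store(self, data: bytes) -> int:
        """Store a compressed buffer and return an opaque handle."""

    @abstractmethod
    def load(self, handle: int) -> bytes:
        """Return the compressed buffer for a handle."""

    @abstractmethod
    def free(self, handle: int) -> None:
        """Release the buffer for a handle."""


class DictPool(CompressedPagePool):
    """Trivial backend; a denser or more predictable one could replace it."""

    def __init__(self):
        self._buffers = {}
        self._next_handle = 0

    def store(self, data: bytes) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = data
        return handle

    def load(self, handle: int) -> bytes:
        return self._buffers[handle]

    def free(self, handle: int) -> None:
        del self._buffers[handle]


class CompressedStore:
    """Front end that compresses pages and delegates storage to any pool."""

    def __init__(self, pool: CompressedPagePool):
        self.pool = pool
        self.handles = {}  # page key -> pool handle

    def put(self, key: int, page: bytes) -> None:
        if key in self.handles:
            self.pool.free(self.handles[key])
        self.handles[key] = self.pool.store(zlib.compress(page))

    def get(self, key: int) -> bytes:
        return zlib.decompress(self.pool.load(self.handles[key]))

    def drop(self, key: int) -> None:
        self.pool.free(self.handles.pop(key))


store = CompressedStore(DictPool())
store.put(7, b"A" * 4096)
print(store.get(7) == b"A" * 4096)
```

Because `CompressedStore` only touches the three interface methods, a site that values predictability and one that values density could each supply a different pool without changes to the front end.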

From there the developers worked toward a consensus on how to handle the conflict between these two projects. There was some concern raised about the complexity of zcache, though the extra features provided by that complexity (primarily the ability to store compressed page cache pages) were appreciated. One of the data structures used by zcache was described in the session as "a table of hash tables of red-black trees of radix trees." So it is not surprising that the relative simplicity of zswap looked appealing. As Mel put it, if zswap cannot be made to work well, zcache is hopeless, so perhaps the best course is to start with zswap, possibly with a modular allocator interface.

Hugh added that compression of page cache (file) pages may be appealing, but the filesystem developers do not seem to be that interested in zcache in general. So he agreed that it might make better sense to start with zswap, perhaps adding zcache features over time. Dan said that he would agree to merging zswap as long as there was an explicit understanding that zswap is not the end of development in this area; there is, he said, a lot more work to be done to gain the full benefits of in-kernel compression. In other words, he would plan to submit patches to increase the functionality of zswap over time.

There was further discussion on various details, including writeback (the process by which compressed pages are uncompressed and written to the "real" swap device). Zswap does it in the zsmalloc allocator, which is seen as being the wrong place; the separate thread used by zcache looked better to some developers and was suggested as being a good first feature to port over to zswap. Hugh complained that writeback decisions should be made at a higher level altogether, though.

Michel Lespinasse said that zswap, using the zsmalloc allocator, would work well at Google. They don't run with "real" swap at all, so the problematic writeback behavior associated with zsmalloc would not be experienced there. Zswap is not designed to run in this mode (it expects there to be a swap device to use as a backing store), but Mel suggested creating a fake swap device that would fail all requests as a possible solution there.
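As a rough illustration of that suggestion, the sketch below models a compressed store sitting in front of a backing device that rejects every write. It is a toy model only: the names (`FailingSwapDevice`, `CompressedSwapFront`), the byte-based capacity limit, and the use of zlib are all assumptions, and the real zswap code paths are not represented.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


class SwapWriteError(Exception):
    """Raised by the fake device for every write request."""


class FailingSwapDevice:
    """Hypothetical backing store that rejects all I/O."""

    def write(self, offset: int, page: bytes) -> None:
        raise SwapWriteError(f"fake swap device rejects write at offset {offset}")


class CompressedSwapFront:
    """Toy compressed store in front of a backing swap device."""

    def __init__(self, backing, capacity_bytes: int):
        self.backing = backing
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = {}  # swap offset -> compressed data

    def swap_out(self, offset: int, page: bytes) -> bool:
        """Return True if the page was stored, False if it must stay in RAM."""
        # Any older copy at this offset is stale once the page is rewritten.
        old = self.entries.pop(offset, None)
        if old is not None:
            self.used_bytes -= len(old)
        data = zlib.compress(page)
        if self.used_bytes + len(data) <= self.capacity_bytes:
            self.entries[offset] = data
            self.used_bytes += len(data)
            return True
        # Pool is full: fall through to the backing device, which may fail.
        try:
            self.backing.write(offset, page)
        except SwapWriteError:
            return False
        return True

    def swap_in(self, offset: int) -> bytes:
        return zlib.decompress(self.entries[offset])


front = CompressedSwapFront(FailingSwapDevice(), capacity_bytes=256)
stored_zero = front.swap_out(0, bytes(PAGE_SIZE))         # compresses well, fits
stored_random = front.swap_out(1, os.urandom(PAGE_SIZE))  # too big, device fails
print(stored_zero, stored_random)
```

In this model a page that does not fit in the compressed pool simply fails to swap out and remains in memory, which is the behavior a swapless system would want.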

In the end, it appears that there is a consensus for merging zswap as the next step for in-kernel compression. Mel noted that he would block the merging, though, if it didn't have a modular allocation layer. Leaving modularization for later would not work, he said; that work would never be done and he'd have to deal with the bug reports a couple of years down the line. So the allocation layer in zswap will need some work; after that, we will likely see a submission for mainline merging.

Index entries for this article
Kernel	Transcendent memory
Kernel	zswap
Conference	Storage, Filesystem, and Memory-Management Summit/2013