LWN: Comments on "The PuzzleFS container filesystem" https://lwn.net/Articles/945320/ This is a special feed containing comments posted to the individual LWN article titled "The PuzzleFS container filesystem". en-us Sat, 27 Sep 2025 01:58:10 +0000 Sat, 27 Sep 2025 01:58:10 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net The PuzzleFS container filesystem https://lwn.net/Articles/945731/ https://lwn.net/Articles/945731/ calumapplepie <div class="FormattedComment"> <span class="QuotedText">&gt;&gt; You also can't share page cache use between different puzzlefs mounts even if they use shared base files. </span><br> <span class="QuotedText">&gt; This is a great point. I guess it could be worked around with some core mm+fs fu, but it would definitely not be simple.</span><br> <p> Could KSM be extended to work on file-backed pages?<br> </div> Wed, 27 Sep 2023 19:01:57 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945648/ https://lwn.net/Articles/945648/ rcampos <div class="FormattedComment"> The minute FUSE is involved, you lost any kind of performance game.<br> <p> There were some patches to FUSE to allow a passthrough in some cases, that is what I wish is used to do this in a reasonable way.<br> </div> Tue, 26 Sep 2023 20:28:33 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945636/ https://lwn.net/Articles/945636/ bluca <div class="FormattedComment"> Ah, that's very nice, I didn't know that, thanks!<br> </div> Tue, 26 Sep 2023 16:57:13 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945629/ https://lwn.net/Articles/945629/ walters <div class="FormattedComment"> In theory too, one could have a hybrid system that operates I think a lot like git does with pack files (deduplicated files w/deltas) that can be dynamically materialized the first time they're referenced. Or for files that you *know* will be shared or are "hot", materialize them automatically. This would require FUSE userspace today though. 
It's an interesting topic because for the in-kernel puzzlefs implementation you'd still probably (?) want a userspace system in control of optimizing things.<br> <p> <p> <p> </div> Tue, 26 Sep 2023 16:11:07 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945633/ https://lwn.net/Articles/945633/ hsiangkao <div class="FormattedComment"> <span class="QuotedText">&gt; I have an erofs image with a rootfs that contains /usr/foo/a, and other two extension erofs images that contain usr/foo/b and usr/foo/c respectively. I create two overlays, each with the base, and one of the extension, so that one has usr/foo/a+usr/foo/b and the other usr/foo/a+usr/foo/c. Is memory being deduplicated, given usr/foo/a is the same?</span><br> <p> In that case, memory of /usr/foo/a is deduplicated according to how overlayfs works since /usr/foo/a is on the same EROFS instance.<br> That already works without any extra built-in feature.<br> </div> Tue, 26 Sep 2023 16:08:42 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945631/ https://lwn.net/Articles/945631/ bluca <div class="FormattedComment"> How would that work in practice? Say I have an erofs image with a rootfs that contains /usr/foo/a, and other two extension erofs images that contain usr/foo/b and usr/foo/c respectively. I create two overlays, each with the base, and one of the extension, so that one has usr/foo/a+usr/foo/b and the other usr/foo/a+usr/foo/c. 
Is memory being deduplicated, given usr/foo/a is the same?<br> </div> Tue, 26 Sep 2023 15:57:03 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945627/ https://lwn.net/Articles/945627/ hsiangkao <div class="FormattedComment"> Yes, I know that is not composefs's intended use case.<br> I'm not sure if some people really rely on the layering concept (such as systems using raw partitions/devices without real filesystem storage), anyway.<br> </div> Tue, 26 Sep 2023 15:09:06 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945625/ https://lwn.net/Articles/945625/ alexl <div class="FormattedComment"> Yes, this is doable, but it's kinda a weird way to store the blobs. The main goal of a system like this is to share the blobs between different images, and having the blob store be per-image is contrary to this goal.<br> </div> Tue, 26 Sep 2023 15:02:05 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945620/ https://lwn.net/Articles/945620/ hsiangkao <div class="FormattedComment"> <span class="QuotedText">&gt; you would have to use fs-verity to verify the blob storage layer</span><br> <p> I think if some blob storage layer is just an EROFS image, you could directly apply dm-verity on these layers.<br> And check dm-verity root digests of these layers before mounting. I think that would have the same effect, anyway.<br> <p> We can also use fs-verity to verify the blob storage layers if these layers are on a RW fs (as the current composefs does).<br> </div> Tue, 26 Sep 2023 14:52:45 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945619/ https://lwn.net/Articles/945619/ tych0 <div class="FormattedComment"> <span class="QuotedText">&gt; You also can't share page cache use between different puzzlefs mounts even if they use shared base files. </span><br> <p> This is a great point. 
I guess it could be worked around with some core mm+fs fu, but it would definitely not be simple.<br> <p> <span class="QuotedText">&gt; I guess an optimal system would use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.</span><br> <p> It was an explicit design goal of PuzzleFS not to have any translation step between the image that is pushed to the container registry and the one that's mounted+run on the host, so any translation step here would be a non-starter. It seems you need to pick "one or the other", unfortunately.<br> <p> <p> </div> Tue, 26 Sep 2023 14:47:13 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945617/ https://lwn.net/Articles/945617/ hsiangkao <div class="FormattedComment"> Not quite sure if I got the point. With the current upstream overlayfs, you could actually make <br> - EROFS + dm-verity block device blobs as data-only layers;<br> - a small overlayfs meta layer (with EROFS + dm-verity) to merge these data storage blobs into a merged rootfs.<br> Thus all layers are under dm-verity protection, so the whole image can't be tampered with.<br> <p> Alternatively, as an EROFS self-contained approach, EROFS could share page cache for files with the same data across images without relying on overlayfs, anyway.<br> </div> Tue, 26 Sep 2023 14:35:50 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945618/ https://lwn.net/Articles/945618/ alexl <div class="FormattedComment"> You can use dm-verity to protect a composefs-style erofs image, but you would have to use fs-verity to verify the blob storage layer.<br> </div> Tue, 26 Sep 2023 14:30:07 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945613/ https://lwn.net/Articles/945613/ bluca <div class="FormattedComment"> Congrats on the release!<br> </div> Tue, 26 Sep 2023 14:17:57 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945612/ https://lwn.net/Articles/945612/ bluca <div 
class="FormattedComment"> Yes I mean directly in EROFS, that would be great to have and we'd use it immediately - unless I'm missing something, you can't really do this together with dm-verity in the overlayfs model, as the data storage is not actually in dm-verity, but it's in the blob storage layer?<br> </div> Tue, 26 Sep 2023 14:17:00 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945606/ https://lwn.net/Articles/945606/ alexl <div class="FormattedComment"> True, if you don't need the kernel-side to support it you can use arbitrarily complex delta approaches to optimize the download.<br> <p> Also, you can use filesystem level compression (in e.g. btrfs) to compress the files in the backing dir. Or even use reflink copies to create backing files that share some (but not all) blocks.<br> <p> The sky is the limit, and all these approaches are compatible with using composefs to mount them.<br> </div> Tue, 26 Sep 2023 13:52:29 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945603/ https://lwn.net/Articles/945603/ hsiangkao <div class="FormattedComment"> <span class="QuotedText">&gt; Are there any plans to bring file-based runtime dedup to EROFS? Being able to combine that with dm-verity would be awesome</span><br> <p> Hi! Using EROFS + dm-verity + blockdevice + overlayfs (composefs-like model), you could already implement a clean file-based runtime dedup by using overlayfs.<br> <p> I understand that you mean EROFS self-contained file-based runtime dedupe across images as you mentioned in some previous article. Actually we already have an internal version for internal products by Jingbo, but it's still unclean for now. 
We will try to clean up and post it to fsdevel mailing list later, but not quite sure it could land smoothly, anyway.<br> </div> Tue, 26 Sep 2023 13:40:58 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945604/ https://lwn.net/Articles/945604/ alexl <div class="FormattedComment"> In fact, we just released 1.0 as all the required pieces are merged into the upstream kernel:<br> <a href="https://blogs.gnome.org/alexl/2023/09/26/announcing-composefs-1-0/">https://blogs.gnome.org/alexl/2023/09/26/announcing-compo...</a><br> </div> Tue, 26 Sep 2023 13:38:19 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945602/ https://lwn.net/Articles/945602/ walters <div class="FormattedComment"> <span class="QuotedText">&gt; Composefs has run into difficulties getting into the mainline kernel</span><br> <p> Not at all - the decision was that the *ingredients* for composefs (mainly overlayfs extensions) would be in the kernel, and "composefs" would be a userspace concept wiring those ingredients together - and this has all happened!<br> <p> Please see the README.md for <a href="https://github.com/containers/composefs">https://github.com/containers/composefs</a><br> </div> Tue, 26 Sep 2023 13:04:24 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945601/ https://lwn.net/Articles/945601/ bluca <div class="FormattedComment"> Are there any plans to bring file-based runtime dedup to EROFS? 
Being able to combine that with dm-verity would be awesome<br> </div> Tue, 26 Sep 2023 12:45:10 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945600/ https://lwn.net/Articles/945600/ smcv <div class="FormattedComment"> There is if you download 10 slightly different versions of it - they'll all have the exact same binary for anything that didn't get a security update during the time window covered by those versions (for instance if glibc didn't get updated for a while, they'll all have identical glibc).<br> </div> Tue, 26 Sep 2023 12:45:05 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945599/ https://lwn.net/Articles/945599/ jezuch <div class="FormattedComment"> My goodness, is there so much duplication in the Ubuntu image?<br> </div> Tue, 26 Sep 2023 12:28:44 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945591/ https://lwn.net/Articles/945591/ bluca <div class="FormattedComment"> No dm-verity, no party!<br> </div> Tue, 26 Sep 2023 10:02:12 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945535/ https://lwn.net/Articles/945535/ roc <div class="FormattedComment"> Capnproto would be an interesting thing to have in the kernel. We use it in rr to specify the trace format and manage compatibility as we add things to the format over time. It's worked really well for us.<br> </div> Tue, 26 Sep 2023 08:39:21 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945534/ https://lwn.net/Articles/945534/ hsiangkao <div class="FormattedComment"> <span class="QuotedText">&gt; use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.</span><br> <p> Yes, yet in that way, there are more effective backup/restore technologies such as delta compression (frequently discussed in several academic conferences such as ATC) to reduce I/Os than just do content-defined chunking. 
And it is not necessary to keep such a delta-compression archival format in real filesystem form.<br> </div> Tue, 26 Sep 2023 08:18:19 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945532/ https://lwn.net/Articles/945532/ alexl <div class="FormattedComment"> In addition to all the competing implementations of this I feel like this article misses one of the main points of composefs. <br> <p> PuzzleFS (as well as e.g. nydus, etc) does block-chunking de-duplication, whereas composefs does entire-file de-duplication. Yes, the block-chunking approach is more efficient during download (as you may reuse blocks when files are only partially the same), and will use less disk-space. <br> <p> However the entire-file dedup approach is more efficient at runtime.<br> <p> This is because the page cache, and things like vm address spaces, are tied to a single inode. You can't get a single inode out of the multiple chunk files. This means the chunked approach doubles the page cache use: once to cache the chunk, and once to cache the merged file. You also can't share page cache use between different puzzlefs mounts even if they use shared base files. With composefs, the page caching *only* happens on the base file, and is shared between any composefs mounts using the same backing file. This allows better container density, as e.g. all the glibc mmaps shared between any running containers are the same in the page cache.<br> <p> I guess an optimal system would use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.<br> </div> Tue, 26 Sep 2023 08:05:13 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945530/ https://lwn.net/Articles/945530/ hsiangkao <div class="FormattedComment"> CDC isn't a new idea and is typically used for backup archival programs (such as casync[1]). For kernel filesystems, I guess they are designed more for performance (otherwise FUSE approaches are enough). If data isn't encoded (e.g. 
compression) and the metadata overhead of CDC approaches matters to them (as mentioned in the FastCDC paper [2]), I think they will just use fixed-size block deduplication because it is I/O friendly and needs no extra copy (out of raw block I/Os).<br> <p> A side note: EROFS has supported a variant of CDC with compression since Linux v6.1. It would be nice to make a comparison too, to some extent, so I could know what people care about more for our further enhancements. I know people enjoy new stuff all the time but it would be nice to get more hints about why state-of-the-art projects don't work well, apart from Rust adoption.<br> <p> Similar to what Christian said before [3], AFAIK, there were already four container image approaches proposed for the Linux kernel in addition to Composefs:<br> 1. original Nydus [4], also called Dragonfly Image service in OCIv2 brainstorm [5] --- Actually they already had an in-kernel implementation of their RAFS v5 before I joined Alibaba Cloud; if needed, they could post the original kernel implementation from that time to the mailing list;<br> 2. TarFS [6];<br> 3. Overlaybd [7] --- Another QCOW2-like block driver approach from another team in Alibaba Cloud;<br> 4. 
Puzzlefs.<br> I'm not sure if Linux could accept all of them for the same use case, anyway.<br> <p> [1] <a href="https://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html">https://0pointer.net/blog/casync-a-tool-for-distributing-...</a><br> [2] <a href="https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf">https://www.usenix.org/system/files/conference/atc16/atc1...</a><br> [3] <a href="https://lore.kernel.org/r/20230609-nachrangig-handwagen-375405d3b9f1@brauner">https://lore.kernel.org/r/20230609-nachrangig-handwagen-3...</a><br> [4] <a href="https://github.com/opencontainers/.github/blob/master/meeting-notes/oci-weekly-notes-2020-apr-2021-mar.md">https://github.com/opencontainers/.github/blob/master/mee...</a><br> <a href="https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md">https://github.com/dragonflyoss/image-service/blob/master...</a><br> [5] <a href="https://hackmd.io/@cyphar/ociv2-brainstorm">https://hackmd.io/@cyphar/ociv2-brainstorm</a><br> [6] <a href="https://github.com/kata-containers/kata-containers/pull/7106">https://github.com/kata-containers/kata-containers/pull/7106</a><br> <a href="https://kangrejos.com/2023/Read-only%20FS%20abstractions%20&amp;%20tarfs.pdf">https://kangrejos.com/2023/Read-only%20FS%20abstractions%...</a><br> [7] <a href="https://lore.kernel.org/r/9505927dabc3b6695d62dfe1be371b12f5bdebf7.1684491648.git.durui@linux.alibaba.com/">https://lore.kernel.org/r/9505927dabc3b6695d62dfe1be371b1...</a><br> <p> </div> Tue, 26 Sep 2023 07:31:16 +0000 The PuzzleFS container filesystem https://lwn.net/Articles/945527/ https://lwn.net/Articles/945527/ josh <div class="FormattedComment"> PuzzleFS seems really useful!<br> <p> I'm wondering how easily PuzzleFS could build atop a remote object store rather than a local one?<br> </div> Tue, 26 Sep 2023 05:49:01 +0000