
The PuzzleFS container filesystem

Posted Sep 26, 2023 8:05 UTC (Tue) by alexl (subscriber, #19068)
In reply to: The PuzzleFS container filesystem by hsiangkao
Parent article: The PuzzleFS container filesystem

In addition to all the competing implementations of this, I feel like this article misses one of the main points of composefs.

PuzzleFS (as well as, e.g., nydus) does block-chunking de-duplication, whereas composefs does entire-file de-duplication. Yes, the block-chunking approach is more efficient during download (as you may reuse blocks when files are only partially the same), and will use less disk space.

However, the entire-file dedup approach is more efficient at runtime.

This is because the page cache, and things like vm address spaces, are tied to a single inode. You can't get a single inode out of the multiple chunk files. This means the chunked approach doubles the page cache use: once to cache the chunk, and once to cache the merged file. You also can't share page cache use between different puzzlefs mounts even if they use shared base files. With composefs, the page caching *only* happens on the base file, and is shared between any composefs mounts using the same backing file. This allows better container density, as e.g. all the glibc mmaps shared between any running containers are the same in the page cache.
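
As a rough illustration of the whole-file approach described above (not composefs's actual code or on-disk format), here is a minimal Python sketch of a content-addressed backing store. The names `store_file` and `build_image` and the flat one-directory layout are assumptions made for this example. The point is only that identical file contents in two images resolve to the same backing path, and therefore to the same inode, which is what the page cache keys on.

```python
import hashlib
import os
import tempfile


def store_file(store_dir, data):
    """Store data once under its SHA-256 digest; return the backing path.

    Hypothetical helper: a flat directory keyed by digest stands in for
    a real backing store.
    """
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return path


def build_image(store_dir, files):
    """Map each file name of an image to a backing path in the shared store."""
    return {name: store_file(store_dir, data) for name, data in files.items()}


store = tempfile.mkdtemp()
libc = b"pretend this is libc.so.6" * 1000

# Two "images" that ship the same library but different config files.
image_a = build_image(store, {"lib/libc.so.6": libc, "etc/hostname": b"a\n"})
image_b = build_image(store, {"lib/libc.so.6": libc, "etc/hostname": b"b\n"})

# Both images point at one backing file, hence one inode.
same_inode = (
    os.stat(image_a["lib/libc.so.6"]).st_ino
    == os.stat(image_b["lib/libc.so.6"]).st_ino
)
print("shared inode for libc:", same_inode)
```

With chunk-level storage there is no such single file per logical file, so there is no single inode to hang the shared cache on.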

I guess an optimal system would use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.
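
A hedged sketch of that hybrid idea, under loud assumptions: fixed-size chunks stand in for real content-defined chunking, an in-memory dict stands in for the registry, and `split_chunks`, `fetch_missing` and `materialize` are hypothetical names, not any real tool's API. Chunks are used only to minimize transfer; each file is then expanded into a single whole file in a content-addressed local store.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 4  # tiny fixed size for illustration only


def sha(data):
    return hashlib.sha256(data).hexdigest()


def split_chunks(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks (a stand-in for real chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fetch_missing(manifest, remote, local_chunks):
    """Copy only the chunks we do not hold yet; return how many were fetched."""
    fetched = 0
    for digest in manifest:
        if digest not in local_chunks:
            local_chunks[digest] = remote[digest]
            fetched += 1
    return fetched


def materialize(manifest, local_chunks, store_dir):
    """Reassemble a file from its chunks and store it whole, keyed by digest."""
    data = b"".join(local_chunks[d] for d in manifest)
    path = os.path.join(store_dir, sha(data))
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return path


# Two versions of a file that differ only in their last chunk.
v1 = b"aaaabbbbcccc"
v2 = b"aaaabbbbdddd"
remote = {sha(c): c for c in split_chunks(v1) + split_chunks(v2)}
manifest_v1 = [sha(c) for c in split_chunks(v1)]
manifest_v2 = [sha(c) for c in split_chunks(v2)]

local_chunks = {}
store = tempfile.mkdtemp()

fetched_v1 = fetch_missing(manifest_v1, remote, local_chunks)
path_v1 = materialize(manifest_v1, local_chunks, store)

fetched_v2 = fetch_missing(manifest_v2, remote, local_chunks)
path_v2 = materialize(manifest_v2, local_chunks, store)

print("chunks fetched:", fetched_v1, fetched_v2)
```

The second download reuses the two chunks it shares with the first, while the local store ends up with one ordinary file per distinct content.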


The PuzzleFS container filesystem

Posted Sep 26, 2023 8:18 UTC (Tue) by hsiangkao (guest, #123981) [Link] (1 responses)

> use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.

Yes, but in that case there are more effective backup/restore techniques, such as delta compression (frequently discussed at academic conferences such as ATC), that reduce I/O further than content-defined chunking alone. And such a delta-compressed archival format does not need to take the form of a real filesystem.

The PuzzleFS container filesystem

Posted Sep 26, 2023 13:52 UTC (Tue) by alexl (subscriber, #19068) [Link]

True, if you don't need the kernel side to support it, you can use arbitrarily complex delta approaches to optimize the download.

Also, you can use filesystem-level compression (in e.g. btrfs) to compress the files in the backing dir. Or even use reflink copies to create backing files that share some (but not all) blocks.

The sky is the limit, and all these approaches are compatible with using composefs to mount them.

The PuzzleFS container filesystem

Posted Sep 26, 2023 14:47 UTC (Tue) by tych0 (subscriber, #105844) [Link] (1 responses)

> You also can't share page cache use between different puzzlefs mounts even if they use shared base files.

This is a great point. I guess it could be worked around with some core mm+fs fu, but it would definitely not be simple.

> I guess an optimal system would use a block-chunked approach for downloads, but expand into full-file dedup in the local storage.

It was an explicit design goal of PuzzleFS not to have any translation step between the image that is pushed to the container registry and the one that's mounted+run on the host, so any translation step here would be a non-starter. It seems like you need to pick "one or the other", unfortunately.

The PuzzleFS container filesystem

Posted Sep 27, 2023 19:01 UTC (Wed) by calumapplepie (guest, #143655) [Link]

>> You also can't share page cache use between different puzzlefs mounts even if they use shared base files.
> This is a great point. I guess it could be worked around with some core mm+fs fu, but it would definitely not be simple.

Could KSM be extended to work on file-backed pages?

The PuzzleFS container filesystem

Posted Sep 26, 2023 16:11 UTC (Tue) by walters (subscriber, #7396) [Link] (1 responses)

In theory, one could also have a hybrid system that operates, I think, a lot like git does with pack files (deduplicated files w/deltas), where files are dynamically materialized the first time they're referenced. Or, for files that you *know* will be shared or are "hot", materialize them automatically. This would require FUSE userspace today though. It's an interesting topic because for the in-kernel puzzlefs implementation you'd still probably (?) want a userspace system in control of optimizing things.

The PuzzleFS container filesystem

Posted Sep 26, 2023 20:28 UTC (Tue) by rcampos (subscriber, #59737) [Link]

The minute FUSE is involved, you have lost any kind of performance game.

There were some patches to FUSE to allow a passthrough in some cases; that is what I wish were used to do this in a reasonable way.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds