LWN: Comments on "The future of Flatpak"

Is a web browser _less_ secure when run within a Flatpak?

swilmet — Mon, 02 Jun 2025 23:27:17 +0000

So in short, what the article says is that there is a workaround for the lack of nested sandboxing, but it's a fragile implementation.

My understanding is that "fragile" means it'll break when the surrounding code changes a bit too much, or when doing some heavy refactorings. Not great security-wise.

And in fact, "There have been issues with this approach for quite a while", the article says.

As for the about:support page, I'm not sure. Firefox may report the same values for some fields while the internal details differ (the fragile side sandbox for Flatpak, versus the full-blown sandboxing solutions for distro packages). To be confirmed; this is just supposition.

Flatpak is not primarily for corporate environments

bluca — Thu, 29 May 2025 19:02:36 +0000

> What you are describing makes sense for corporate machines where IT determines what users are and are not allowed to do.

No, that's one use case, but certainly not the only one. The owner of the machine is in control of what runs. The owner might be IT, or an individual. The keys are the same as the ones used for secureboot/mok.

Unprivileged users need to be able to create and run flatpaks

bluca — Thu, 29 May 2025 18:57:14 +0000

Sorry, but that's really not how any of that works, you might want to look at it again

Unprivileged users need to be able to create and run flatpaks

DemiMarie — Thu, 29 May 2025 01:21:44 +0000

mountfsd is only secure if you configure it to only mount volumes created by a key that only root-equivalent users have access to. I strongly suspect that any solution that requires root-equivalent privileges to create and run a flatpak is not going to be accepted upstream. Only allowing signed flatpaks to run might be acceptable as an option, but not as the default, at least not unless users can enroll their own signing keys without needing any special privileges to do it.

Flatpak is not primarily for corporate environments

DemiMarie — Thu, 29 May 2025 01:17:29 +0000

What you are describing makes sense for corporate machines where IT determines what users are and are not allowed to do. However, those machines are not Flatpak’s main use-case. Flatpak is primarily for end-user machines, and anything that prevents end users from controlling what can run on their own hardware is not something the desktop Linux ecosystem will consider.

I used to work on Qubes OS and now work on Spectrum. Both are aimed at security far beyond what Windows and macOS can achieve. They do this via hypervisor-enforced isolation, not by kernel-enforced code-signing.

If your employer wants to lock down Flatpak to the degree you are describing, they should contribute the missing features themselves, rather than complaining that upstream maintainers who don’t care about your use-case (which they may well consider user-hostile) are not supporting it. For what it is worth, I believe that Snap could support what you are talking about fairly easily, and might be a better fit for your needs.

Flatpak needs an unprivileged solution

bluca — Thu, 29 May 2025 01:12:47 +0000

No, that's not necessary at all, via mountfsd or similar solutions

Flatpak needs an unprivileged solution

DemiMarie — Thu, 29 May 2025 00:46:23 +0000

dm-verity requires root privileges to use. Flatpak doesn’t need any special privileges at all, so dm-verity isn’t even an option.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 29 May 2025 00:34:21 +0000

I was under the impression that one of the core aspects of os-tree was the ability to locally compose a snapshot from completely arbitrary content? Maybe I'm mixing up things

What attacks does IPE stop?

bluca — Thu, 29 May 2025 00:32:35 +0000

> The ones I can think of all fall into the “you’ve already lost” case.

There is no such case. This is the kind of mindset that needs to be left behind if Linux ever hopes to catch up with the competition on these aspects. The most important question to ask after a security boundary has been put in place is: "what happens _when_ it gets breached?"

A process being able to compromise a less privileged one is not a vulnerability

bluca — Thu, 29 May 2025 00:20:13 +0000

> Okay, so Flatpak can compromise any of the apps it spawns.

Uhm, that's not really the point at all; I'd suggest reading the comment again.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 29 May 2025 00:11:40 +0000

> Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

Which is all nice and well, until you _need_ to have a component that is allowed to do such actions (eg: it needs to capture live dumps in order to keep a fleet maintainable), and it gets compromised

> Yes, although the Linux kernel is all one privilege level; implementing components in userspace we can actually e.g. have the thing parsing signatures and doing crypto actually dropping a lot of other ambient privileges.

Which is why virtualization-based security levels are being worked on, to split the kernel into multiple privilege levels too.

> I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

Sure, it's an example; the point was to show that having hard security boundaries is widely accepted as good and necessary, and that delegating certain tasks to userspace and hoping for the best is not acceptable anymore for certain things, e.g. one wouldn't do that with the handling of a plain-text private key for a production system. The same principle applies to other security policies, in different contexts.

Which Windows security feature?

bluca — Thu, 29 May 2025 00:00:29 +0000

One can deploy Windows to a machine with policies set up in such a way that there is complete code integrity enforcement, including scripts. I'm not sure whether it's published or for internal use only, but there are even forks of common interpreters that are locked down to only allow signed scripts.

Is a web browser _less_ secure when run within a Flatpak?

daenzer — Wed, 28 May 2025 07:22:53 +0000

I was wondering the same thing while reading the article. Comparing the Sandbox section of about:support between the Flatpak and native Fedora versions, the only difference is that "User Namespaces" is false with Flatpak. Everything else is the same, including the "Sandbox Level" values.

I'm not sure about the implications of the lack of user namespaces, offhand it doesn't seem like a big difference though.

OCI is an antiquated format, not fit for modern security requirements

sramkrishna — Tue, 27 May 2025 17:00:33 +0000

100% this. OCI has all the mature tooling and the mindshare.

We've done this a few times within the life of GNOME and other projects: we built our own solution because no mature one existed at the time this was all engineered. Now the tooling and industry mindshare have caught up, and we're left trying to maintain something that has no mindshare and no tools, as we don't have the resources.

During the very first Linux App Summit, which we held in Portland back in 2016 while Flatpak was still in active development, alexl and some others met with Valve, who also attended. Valve was hoping for a set of tooling for what I suspect was the Steam Deck or at least SteamOS. They were frustrated with Flatpak at the time. It would have been nice if they had stayed involved and helped guide it, given their background in gaming. But alas.

Is a web browser _less_ secure when run within a Flatpak?

swilmet — Tue, 27 May 2025 13:29:21 +0000

From the article:

> One thing that has been a bit of a pain point, Wick said, is that nested sandboxing does not work in Flatpak. For instance, an application cannot use Bubblewrap inside Flatpak. Many applications, such as web browsers, make heavy use of sandboxing.
>
> > They really like to put their tabs into their own sandboxes because it turns out that if one of those tabs is running some code that manages to exploit and break out of the process there, at least it's contained and doesn't spread to the rest of the browser.
>
> What Flatpak does instead, currently, is to have a kind of side sandbox that applications can call to and spawn another Flatpak instance that can be restricted even further. "So, in that sense, that is a solution to the problem, but it is also kind of fragile." There have been issues with this approach for quite a while, he said, but no one knows quite how to solve them.

So, it's not really clear to me whether Firefox for example is more or less secure when run as a Flatpak compared to a traditional Linux distribution package.

Which Windows security feature?

mathstuf — Sat, 24 May 2025 17:55:14 +0000

Before anyone goes and implements TCC on Linux, *please* consider allowing some mechanism to grant permissions other than via a system-level prompt when the permission is *attempted*. It is quite frustrating to have to schedule "dummy" CI jobs to "poke" the permissions so that the CI runner executable has "permission" to perform the actions done inside of its jobs and then babysit the machine to grant access via the dialog manually when the job is running.

I was able to use `sqlite3` on the TCC database, but it is SIP-protected, so not actually actionable.

What attacks does IPE stop?

DemiMarie — Sat, 24 May 2025 00:17:20 +0000

What kinds of attacks does IPE really stop? The ones I can think of all fall into the “you’ve already lost” case. If an attacker has arbitrary filesystem read/write, they’ve won. The problem is that they were able to get such access in the first place.

If you are that concerned about security, you would be vastly better off running each container as an entire virtual machine. That protects against kernel vulnerabilities, which are far, far, far more important and devastating. The security of this approach is far better than any solution based in a shared kernel, because VM escapes are so much less common than kernel exploits. Qubes OS, Spectrum, Edera, and OpenXT all use this approach.

A process being able to compromise a less privileged one is not a vulnerability

DemiMarie — Fri, 23 May 2025 23:13:52 +0000

Okay, so Flatpak can compromise any of the apps it spawns. That’s completely useless from an attacker’s perspective. It just means they have gone from a more privileged process to a less privileged one, which does not help them at all.

For all practical purposes, Flatpak is part of the trusted computing base of a desktop system. It can access any and all resources that the user can, and that’s enough to do pretty much anything the attacker wants. Advanced iOS malware just needs to escape the sandbox. It doesn’t need root or kernel privileges to do its job.

What is your actual goal here, and what is your threat model? Instead of trying to prevent Flatpak from compromising the processes it runs, I think your efforts would be far better spent ensuring that Flatpak itself is not compromised. Flatpak can be signed and then tell the kernel what signatures and/or hashes to expect for the binaries it runs.

Which Windows security feature?

DemiMarie — Fri, 23 May 2025 22:58:58 +0000

macOS Transparency, Consent, and Control is indeed far more advanced than anything on desktop Linux, but I’m not aware of anything on Windows. There’s just code signing and even that is only checked during launch, not runtime.

OCI is an antiquated format, not fit for modern security requirements

walters — Fri, 23 May 2025 18:41:56 +0000

> and then this boi shows up and sends it all tumbling down the drain: PTRACE

Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

> Security decisions need to be made by more privileged components than the ones being checked.

Yes, although the Linux kernel is all one privilege level; implementing components in userspace we can actually e.g. have the thing parsing signatures and doing crypto actually dropping a lot of other ambient privileges.

> Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design.

That's a huge strawman. I know the point you're trying to make, but TPMs are really quite different than what's being discussed here.

I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

> So deduplication happens at the runtime level. OCI doesn't have anything like that,

Yeah, I have thought about this more than once. It would make a lot of sense for sure, but would also have ecosystem-splitting effects, though I do think that something like this would actually be doable as a standards change.

That said, it's important to point out that flatpak already supports OCI as a transport and absolutely nothing prevents one from implementing such a thing for docker/podman as a kind of opt-in today either.

OCI is an antiquated format, not fit for modern security requirements

bluca — Fri, 23 May 2025 13:01:31 +0000

Well, in 2019 and earlier hardly anybody had heard of UKIs, so I very much doubt they were actually in use anywhere :-) It's not that there was a viable alternative, it's simply that there was less security and several threat models were left completely unaddressed. 2019 is when we started working on Az Boost which is where these threat models had to be addressed, and that's why that feature was added.
There is just no version of any userspace solution that fixes those threat models. One can try and imagine any creative setups with LSMs or what not, to try and create some 'super special trustmebro' userspace daemon that is supposed to be unhackable, and then this boi shows up and sends it all tumbling down the drain: PTRACE

Security decisions need to be made by more privileged components than the ones being checked. This is not a matter of implementations or workarounds or solutions, it's a design pattern. If you have the same privilege level as the thing checking if you have privileges, you _will_ find ways to subvert it.

For example, on Windows these days security policies are implemented by a completely different kernel, running at a higher privilege level than your OS's kernel, with hard security boundaries enforced by HyperV. Our org is working to bring this to Linux: https://www.youtube.com/watch?v=vmt4wlf3a1A
The direction of travel is the opposite of "just check it in userspace", and one day (TM) the dm-verity signature will be checked by a higher-privileged kernel, instead of the host kernel, so that an entire new class of threat models can be closed off too.

Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design. Nobody in their right mind would suggest that just running swtpm is a viable alternative for production usage on a secure host, or they'd be laughed out of the room.

That's why I keep saying that Linux is hopelessly behind Windows/OSX. Because it is. And crufty old stuff like OCI, which has cemented in the ecosystem an absolutely terrible image format (tarballs! What is this, 1982?), is a very large part of why this is the case, as projects like yours (through no fault of your own or your colleagues'! You have to work with what is there, and I don't envy you one bit :-) ) are forced to do somersaults through flaming hoops to try and somewhat patch the leaky bucket, because god forbid docker switches to a sensible image format that's fit for purpose.

> Because (depending to a degree on how it's being implemented, I'd be curious to see a link to the code) it would have the ecosystem splitting problem, and composefs is inherently going to be more efficient by sharing page and disk automatically across images.

But that's again a shortcoming of OCI, being the terrible format that it is. And it doesn't affect Flatpak, because the Flatpak devs made a very clever and sensible decision to separate the runtimes from the apps, and the app developer doesn't supply the runtime, it chooses one. So deduplication happens at the runtime level. OCI doesn't have anything like that, because it's a binfire of an ecosystem. For our use case in Boost we copied this design, and the runtime is shared and developers don't bring their own, and we get the best of both worlds: strong integrity protection that's not currently possible otherwise, and file/page level sharing of DSOs. Once again this is not a problem that composefs created or can solve, it's just inherited from OCI, and has to find ways to work around it.

OCI is an antiquated format, not fit for modern security requirements

walters — Fri, 23 May 2025 12:03:15 +0000

> A label with a digest is not enough to match dm-verity properties.

I think this is the root of the problem; we are talking about different levels of this. As you know, before https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/... dm-verity signatures were verified in userspace - and that worked fine for a use case where the root hash is covered by being embedded in a UKI signed for secure boot or equivalent. That's the case with our current work in composefs-rs.

You are for sure correct (again as we are discussing in that composefs-rs issue) that usage for applications and wiring up with IPE or equivalent does become easier with an in-kernel key verification. That said I could imagine here also doing it in userspace where the userspace process doing the verification is running in a targeted SELinux domain e.g. with an extra capability to mark a mount as verified for the purpose of the LSM.

IOW you are arguing:

> A label with a digest is not enough to match dm-verity properties.

I would change that to:

> A label with a digest is not enough to match dm-verity+IPE/LSM properties.

And then we agree.

> And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity?

Because (depending to a degree on how it's being implemented, I'd be curious to see a link to the code) it would have the ecosystem splitting problem, and composefs is inherently going to be more efficient by sharing page and disk automatically across images.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 22 May 2025 22:41:52 +0000

A label with a digest is not enough to match dm-verity properties. It needs to be a cryptographic signature, verified in the kernel, by a keyring that cannot be modified from userspace, that covers the entire content plus metadata. Otherwise, again, there's no way to distinguish a valid volume with content that you want from a valid volume with content that you don't want.

This necessarily rules out ostree or similar, where content is added dynamically and switched at runtime.

And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity? That's what Azure Linux is going to do in its OCI runtime.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 22 May 2025 22:35:00 +0000

Yes, as it's PKCS7, but there's no reason to, as it's the kernel that verifies them

OCI is an antiquated format, not fit for modern security requirements

Cyberax — Thu, 22 May 2025 22:24:30 +0000

Can I verify dm-verity signatures in Java or Go?

OCI is an antiquated format, not fit for modern security requirements

alexl — Thu, 22 May 2025 14:47:08 +0000

>> 1) It is quite possible to implement the same thing for composefs. I.e. add some IPE rules that let you specify a policy about files originating from an overlay filesystem where all non-data-only layers are signed EROFS images. This would require upstream work, but conceptually it is not hard.

>I don't believe that's the case: one of the main reasons for composefs to exist is to de-duplicate content. It can be updated on-the-fly when new tarballs or ostree content appears. Updated content means updated signature, but you can't sign locally or it would completely defeat its purpose.

I'm not sure what you mean here. When you do an update, we go from one signed EROFS image to another; the deduplication happens because both EROFS images may refer to the same backing file for file content. However, both EROFS images contain the digest of that backing file and validate it on use.

We do *create* the EROFS image locally (from the tarball), that is true, but the tooling is designed to be 100% reproducible. So you can build and sign it on the server, and then ship the signature as part of the OCI image metadata, and recombine the signature with the locally built EROFS image on the host.
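
The "build on the server, rebuild on the host, compare" flow described above depends only on the image builder being deterministic. Here is a minimal Python sketch of that property. It is a toy model under stated assumptions: `build_image` is a hypothetical stand-in for the real EROFS tooling (canonical JSON instead of an on-disk image), and the signature step is elided, so the "shipped" value is just the digest that a real setup would sign.

```python
import hashlib
import json

def build_image(entries):
    """Hypothetical stand-in for a reproducible image builder.

    entries: iterable of (path, mode, content_digest) tuples.
    The output depends only on the set of entries, not on their order.
    """
    canonical = sorted(entries)
    return json.dumps(canonical, separators=(",", ":")).encode()

def image_digest(entries):
    return hashlib.sha256(build_image(entries)).hexdigest()

# Server side: build the image and publish its digest (a real setup would
# publish a signature over it, e.g. in the image metadata).
server_entries = [
    ("/usr/bin/ls", 0o755, "aaaa"),
    ("/usr/lib/libc.so", 0o644, "bbbb"),
]
shipped_digest = image_digest(server_entries)

# Host side: the same content arrives in a different order (tar order),
# but the locally built image is bit-for-bit identical.
host_entries = list(reversed(server_entries))
host_matches = image_digest(host_entries) == shipped_digest

# Any change in content or metadata yields a different image.
tampered_entries = [("/usr/bin/ls", 0o755, "evil"), server_entries[1]]
tampered_matches = image_digest(tampered_entries) == shipped_digest
```

Because the host rebuilds exactly the bytes the server signed, a signature made on the server remains valid for the locally built image.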

OCI is an antiquated format, not fit for modern security requirements

walters — Thu, 22 May 2025 12:40:27 +0000

> No, it cannot, because only userspace knows which EROFS image is the right one in composefs.

> There's simply no way to do that, and I have shared already the very much real-world and used-in-production policy for dm-verity earlier.

If dm-verity applies to one's use case and is already working, then it makes sense to continue to use it for sure.

> That's only because, again, OCI is a terrible, antiquated and legacy format. Shipping applications as tarballs is a really, really bad idea. If it shipped signed dm-verity images, it could work just fine.

The core claim we're making here with composefs (and especially integration with OCI) is that we can add the core dm-verity integrity properties by simply adding a label with a digest on the existing OCI format, without such a huge ecosystem break. There's also the sub-threads that management of many images is more efficient with page cache and disk sharing.

Now as per other sub-threads, indeed in https://github.com/composefs/composefs/issues/360 it is harder to wire up LSMs to composefs today. However, that's not *just* dm-verity, I would phrase it more as "dm-verity ecosystem" if that makes sense.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 22 May 2025 09:50:56 +0000

> We've debated this before (see https://lwn.net/Articles/1011529/)

Repetita iuvant, a teacher of mine used to say :-)

> 1) It is quite possible to implement the same thing for composefs. I.e. add some IPE rules that let you specify a policy about files originating from an overlay filesystem where all non-data-only layers are signed EROFS images. This would require upstream work, but conceptually it is not hard.

I don't believe that's the case: one of the main reasons for composefs to exist is to de-duplicate content. It can be updated on-the-fly when new tarballs or ostree content appears. Updated content means updated signature, but you can't sign locally or it would completely defeat its purpose.

If you change it so that content can never change and all images are fixed then... you just reinvented dm-verity with extra steps.

> 2) I don't think such super-locked down IPE setups are useful to most people, so it is not currently a priority for the composefs project.

For the latter, you are of course in charge of what constitutes a priority for the composefs project. However for the former, generally speaking people never care about security, until it's too late. The fact that the OCI ecosystem is so hopelessly far behind, and forces insecure-by-default setups on users and thus results in Linux severely lagging behind the competition, is a sad indictment and not something to be proud of or that can be employed as an excuse.

In fact, I can talk about it now as it's public since it was announced at MSFT Build just a couple of days ago, Azure Linux will be shipping a feature that adds dm-verity and IPE based security to its OCI runtime. It's not perfect as it still needs to employ a metric ton of ugly workarounds due to how terrible and antiquated OCI is, but still it's miles ahead of bare OCI tarballs/composefs in terms of security. This wouldn't happen if there wasn't demand for it.

OCI is an antiquated format, not fit for modern security requirements

alexl — Thu, 22 May 2025 09:14:59 +0000

We've debated this before (see https://lwn.net/Articles/1011529/)

Currently only dm-verity allows you to specify an IPE policy such that the policy is based on the origin of the file. I.e. you can have a setup where only files that originate on a signed dm-verity image are allowed to execute. In such a setup, if an evil root user manages to over-mount the image (with a tmpfs or whatever) the kernel will disallow executing files from the over-mount.
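
As a rough illustration of an origin-based policy, here is a toy Python model. It is not IPE itself and uses none of its real syntax. The `Origin` type, the `resolve` mount lookup, and the single allow rule are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Origin:
    """Where a file's bytes come from, as the policy engine sees it."""
    kind: str                      # e.g. "dm-verity" or "tmpfs"
    signature_valid: bool = False  # only meaningful for dm-verity volumes

def may_execute(origin: Origin) -> bool:
    """Toy rule: only files backed by a signed dm-verity volume may run."""
    return origin.kind == "dm-verity" and origin.signature_valid

def resolve(mounts, path):
    """mounts: list of (mountpoint, origin); later entries shadow earlier ones."""
    for mountpoint, origin in reversed(mounts):
        if path == mountpoint or path.startswith(mountpoint.rstrip("/") + "/"):
            return origin
    raise FileNotFoundError(path)

mounts = [("/usr", Origin("dm-verity", signature_valid=True))]
allowed_before = may_execute(resolve(mounts, "/usr/bin/ls"))

# An attacker with root over-mounts /usr/bin with a tmpfs.
mounts.append(("/usr/bin", Origin("tmpfs")))
allowed_after = may_execute(resolve(mounts, "/usr/bin/ls"))
untouched_still_allowed = may_execute(resolve(mounts, "/usr/lib/libc.so"))
```

The decision follows the origin of the file that actually gets resolved, so the over-mounted path is denied while the rest of the signed volume keeps working.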

To this I still have the same answer:

1) It is quite possible to implement the same thing for composefs. I.e. add some IPE rules that let you specify a policy about files originating from an overlay filesystem where all non-data-only layers are signed EROFS images. This would require upstream work, but conceptually it is not hard.

2) I don't think such super-locked down IPE setups are useful to most people, so it is not currently a priority for the composefs project.

OCI is an antiquated format, not fit for modern security requirements

bluca — Thu, 22 May 2025 08:37:21 +0000

> The same policies you've in place for the dm-verity volume can be applied to the EROFS mount. I don't see in principle why we couldn't use dm-verity as well, but that wouldn't be different than using fs-verity+IMA on the image file itself. This configuration is not different than what you are proposing.

No, it cannot, because only userspace knows which EROFS image is the right one in composefs. There's simply no way to do that, and I have shared already the very much real-world and used-in-production policy for dm-verity earlier. There's no equivalent for composefs.

> Whether you want to restrict the system to mount only signed images is a separate discussion (only in part it is technical) and no doubt that dealing only with signed images is better. That might work in a controlled environment or for high privileged system services coming from a few trusted vendors, but it wouldn't fit with the way OCI containers are used today, either locally or in a cluster, which is pulling random images from a registry.

That's only because, again, OCI is a terrible, antiquated and legacy format. Shipping applications as tarballs is a really, really bad idea. If it shipped signed dm-verity images, it could work just fine. It already signs the metadata anyway, so mechanisms to sign artifacts exist, it's just the format that is not fit for purpose in 2025.

OCI is an antiquated format, not fit for modern security requirements

gscrivano — Thu, 22 May 2025 07:27:19 +0000

What composefs does is to decouple the data from the metadata, so that it can be deduplicated among multiple images and make sure you can trust the referenced data file is what you really expect it to be, so the point you are making is about how we can trust the EROFS image itself.
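
A minimal Python sketch of that decoupling, under stated assumptions: `ObjectStore` and `make_image` are hypothetical names, a plain dict stands in for the metadata image, and SHA-256 stands in for the fs-verity digest.

```python
import hashlib

class ObjectStore:
    """Content-addressed backing store shared by all images."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(key, content)  # identical content stored once
        return key

    def get_verified(self, key: str) -> bytes:
        content = self.objects[key]
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("backing file does not match recorded digest")
        return content

def make_image(store: ObjectStore, files: dict) -> dict:
    """Metadata only: path -> digest of the backing object."""
    return {path: store.put(content) for path, content in files.items()}

store = ObjectStore()
image_a = make_image(store, {"/usr/lib/libc.so": b"libc", "/usr/bin/app-a": b"A"})
image_b = make_image(store, {"/usr/lib/libc.so": b"libc", "/usr/bin/app-b": b"B"})

shared = image_a["/usr/lib/libc.so"] == image_b["/usr/lib/libc.so"]
stored_objects = len(store.objects)  # four file entries, three objects

# Tampering with a backing object is caught on read.
store.objects[image_a["/usr/bin/app-a"]] = b"tampered"
try:
    store.get_verified(image_a["/usr/bin/app-a"])
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

This covers only the data side. Whether the metadata mapping itself can be trusted is the separate question discussed in the rest of this thread.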

The same policies you've in place for the dm-verity volume can be applied to the EROFS mount. I don't see in principle why we couldn't use dm-verity as well, but that wouldn't be different than using fs-verity+IMA on the image file itself. This configuration is not different than what you are proposing.

Whether you want to restrict the system to mount only signed images is a separate discussion (only in part it is technical) and no doubt that dealing only with signed images is better. That might work in a controlled environment or for high privileged system services coming from a few trusted vendors, but it wouldn't fit with the way OCI containers are used today, either locally or in a cluster, which is pulling random images from a registry.

OCI is an antiquated format, not fit for modern security requirements

bluca — Wed, 21 May 2025 20:50:48 +0000

> So to achieve the chain of trust we only need to validate the EROFS mount, which contains both the overlay redirect attribute and the fs-verity digest for each file.

Once again, that only proves that the digests match the contents. It doesn't prove the content is the one that was meant to be running. I can provide my own composefs, with perfectly valid metadata, but with my own content, and overmount yours, and it's game over. There's nothing you can do about it, it's just not possible to solve this with composefs, by construction.

Signed dm-verity does not have this problem, because the root of trust is the kernel keyring verifying the signature of the merkle tree.
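
The distinction can be sketched in a few lines of Python. This is a toy model under loud assumptions: a flat path-to-digest manifest stands in for the metadata image, one hash over that manifest stands in for the Merkle root, and HMAC with a hypothetical `VENDOR_KEY` stands in for a real asymmetric signature checked against a kernel keyring.

```python
import hashlib
import hmac

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(files: dict) -> dict:
    """Metadata stand-in: path -> content digest."""
    return {path: digest(content) for path, content in files.items()}

def self_consistent(files: dict, manifest: dict) -> bool:
    """Digest-only check: every file matches the digest listed for it."""
    return files.keys() == manifest.keys() and all(
        digest(files[p]) == manifest[p] for p in manifest
    )

def root_hash(manifest: dict) -> bytes:
    """One hash covering all metadata (stand-in for a Merkle root)."""
    h = hashlib.sha256()
    for path in sorted(manifest):
        h.update(path.encode() + b"\0" + manifest[path].encode() + b"\n")
    return h.digest()

# ASSUMPTION: HMAC replaces a real public-key signature; the attacker
# is assumed not to hold VENDOR_KEY.
VENDOR_KEY = b"hypothetical-vendor-key"

def sign(root: bytes) -> bytes:
    return hmac.new(VENDOR_KEY, root, hashlib.sha256).digest()

def signature_ok(root: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(root), sig)

vendor_files = {"/usr/bin/ls": b"genuine ls"}
vendor_manifest = build_manifest(vendor_files)
vendor_sig = sign(root_hash(vendor_manifest))

# The attacker builds a perfectly valid volume with their own content.
attacker_files = {"/usr/bin/ls": b"trojaned ls"}
attacker_manifest = build_manifest(attacker_files)

digest_only_accepts_attacker = self_consistent(attacker_files, attacker_manifest)
signed_accepts_vendor = signature_ok(root_hash(vendor_manifest), vendor_sig)
signed_accepts_attacker = signature_ok(root_hash(attacker_manifest), vendor_sig)
```

The attacker's volume passes the digest-only check because it is internally consistent. Only the check anchored in a key the attacker does not control tells the two volumes apart.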

OCI is an antiquated format, not fit for modern security requirements

hsiangkao — Wed, 21 May 2025 17:40:34 +0000

> Yes, and you can compare it to OCI. Which _is_ significantly simpler, as it doesn't _have_ inodes.
> I'm not arguing that EROFS or Squashfs are bad, they are just more complex, and I want something as simple as possible with the widest amount of tooling available.

How simple? tar consists of `tar header` and `data`. It was designed for tape devices and it doesn't even support random access to metadata (because you can never know what the rootdir looks like until you have read the last `tar header`, in case that last header belongs to the rootdir).
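
A small Python sketch with the standard `tarfile` module shows the layout. The file names and payloads are made up for illustration, and `header_offsets` is a hypothetical helper.

```python
import io
import tarfile

def build_tar(names):
    """Build an uncompressed tar archive in memory with tiny payloads."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            payload = f"contents of {name}\n".encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def header_offsets(archive: bytes):
    """Return (name, byte offset of that member's header) for every member.

    getmembers() has to walk the whole archive: there is no central index,
    each header sits directly in front of its own data.
    """
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(member.name, member.offset) for member in tar.getmembers()]

archive = build_tar(["usr/bin/ls", "usr/lib/libc.so", "README"])
offsets = header_offsets(archive)
```

The headers are interleaved with file data across the whole archive, so listing the root directory means scanning to the very end: an entry for it may be the last member.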

EROFS core on-disk format can be implemented in ~500 lines (for example, https://github.com/dmcgowan/go-erofs/blob/main/erofs.go) if you don't implement optimized binary search and xattrs.

It's basically just a combination of three basic on-disk parts: superblock + inodes + dirents, if you take a look at https://erofs.docs.kernel.org/en/latest/core_ondisk.html. Except for the on-disk superblock, inodes and dirents can be arranged freely. dirents are designed for random access, but you could just implement lookup the naive way. I wonder how much simpler a format could be than this without giving up extensibility?

EROFS does implement many optional advanced features like ACL, FSDAX, Direct I/O, file-backed mounts, a very optimized decompression subsystem with in-place I/O, etc. But that doesn't mean the on-disk format is complex.

OCI is an antiquated format, not fit for modern security requirements

Cyberax — Wed, 21 May 2025 17:29:22 +0000

Yes, and you can compare it to OCI. Which _is_ significantly simpler, as it doesn't _have_ inodes.

I'm not arguing that EROFS or Squashfs are bad, they are just more complex, and I want something as simple as possible with the widest amount of tooling available.

OCI is an antiquated format, not fit for modern security requirements

hsiangkao — Wed, 21 May 2025 16:36:16 +0000

> Squashfs is not too hard to support, as it's just barely more complex than tar. But then it also has a lot of tar's problems. EROFS is better, but it's also more complicated. And this means more space for potential issues.
> And file formats for something like container images should be as simple as possible.

I'm tired of writing comments on LWN.net, simply because I don't get where those biased points come from.
The EROFS core on-disk format (e.g. used for ComposeFS) is quite simple, flexible and efficient:

- It doesn't have an old-style centralized on-disk inode table like SquashFS does (similar to extX and minix). In fact, EROFS on-disk inodes can be placed anywhere on disk if needed, as in modern filesystems like XFS, Btrfs, etc., so it's quite easy to do incremental builds (e.g. add new inodes and data) without expanding and rewriting an entire new inode table;

- It doesn't need extra on-disk directory indices (https://dr-emann.github.io/squashfs/squashfs.html#_directory_index) to speed up inode lookup in large directories. Without those indices, a SquashFS directory can only be searched linearly because of its on-disk dirent design. Unlike SquashFS, EROFS dirents are kept simple and strictly sorted in alphabetical order, so they can be binary searched natively. I've tested some AI datasets where each directory contains millions of files, and EROFS random access performance is even better than SOTA EXT4.

- The core on-disk format has just three parts: the superblock, 32- or 64-byte inodes (instead of one layout per inode type to save a little space) and dirents: https://erofs.docs.kernel.org/en/latest/core_ondisk.html. I have no idea where the "more space for potential issues" is, because it just behaves as an fsblock-aligned archive format;

- EROFS uncompressed data is strictly fsblock-based, which means data can be fetched directly via DMA into the page cache without extra post-processing. SquashFS data is unaligned: even though it supports an uncompressed mode, it still needs a memcpy to handle the misalignment. Thus EROFS also natively supports advanced runtime features like FSDAX (XIP), direct I/O, etc.
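
To make the directory-lookup point from the second bullet concrete, here is a toy sketch in Python. It is not the EROFS or SquashFS on-disk layout, just the difference between scanning unsorted-style entries one by one and binary-searching name-sorted ones. The function names and the 100,000-entry directory are made up for illustration.

```python
from bisect import bisect_left


def linear_lookup(dirents, name):
    """Scan entries one by one; returns (index, entries examined)."""
    steps = 0
    for index, entry in enumerate(dirents):
        steps += 1
        if entry == name:
            return index, steps
    return None, steps


def sorted_lookup(sorted_dirents, name):
    """Binary search over name-sorted entries; returns the index or None."""
    index = bisect_left(sorted_dirents, name)
    if index < len(sorted_dirents) and sorted_dirents[index] == name:
        return index
    return None


# Hypothetical directory with 100,000 entries, sorted by name.
names = sorted(f"file{i:06d}" for i in range(100_000))
```

Looking up the last entry examines all 100,000 names in the linear case, while the sorted lookup needs roughly 17 comparisons (log2 of 100,000).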

OCI is an antiquated format, not fit for modern security requirements

Cyberax — Sat, 17 May 2025 18:17:33 +0000

> For read-only filesystems such as erofs and squashfs, diffing works well enough, running diffoscope produces good and readable output. It's not any different from tarballs.

Squashfs is not too hard to support, as it's just barely more complex than tar. But then it also has a lot of tar's problems. EROFS is better, but it's also more complicated. And this means more space for potential issues.

And file formats for something like container images should be as simple as possible.

> Other OSes I really don't care about, and I am pretty sure they are irrelevant for Flatpak too, which is the subject of the article.

Sure, but then it's back to the status quo: Flatpak will remain a unique snowflake with slowly decaying tooling.

OCI is an antiquated format, not fit for modern security requirements

gscrivano — Sat, 17 May 2025 16:35:54 +0000

> No, it cannot, because that "metadata volume" is just a collection of digests that is only known to userspace. The kernel has no idea what is good content and what is bad content, the only thing that matters is that the digest matches the file being read. If I build my own volume that compromises your /usr/bin/ls and overmount it, there's nothing you can do about it.

That is not true. The kernel knows about these digests and uses them at runtime to validate each data file when it is accessed. Please take a look at how overlayfs uses these digests: https://docs.kernel.org/filesystems/overlayfs.html#fs-ver...

```
Verity can be used as a general robustness check to detect accidental changes in the overlayfs directories in use. But, with additional care it can also give more powerful guarantees. For example, if the upper layer is fully trusted (by using dm-verity or something similar), then an untrusted lower layer can be used to supply validated file content for all metacopy files. If additionally the untrusted lower directories are specified as “Data-only”, then they can only supply such file content, and the entire mount can be trusted to match the upper layer.
```

So to achieve the chain of trust we only need to validate the EROFS mount, which contains both the overlay redirect attribute and the fs-verity digest for each file.
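
As a rough illustration of that chain of trust, here is a toy model in Python. It is only a sketch under stated assumptions: a plain SHA-256 stands in for the fs-verity digest (which is really a Merkle tree root hash), a dict stands in for the untrusted data-only lower layer, and `MetadataImage` is a hypothetical stand-in for the trusted read-only EROFS image. None of these names come from composefs or the kernel.

```python
import hashlib


def digest(data: bytes) -> str:
    # Stand-in for an fs-verity digest; the real one is a Merkle tree root.
    return hashlib.sha256(data).hexdigest()


class DataStore:
    """Untrusted content-addressed object store (the data-only lower layer)."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = digest(data)
        self.objects[key] = data
        return key


class MetadataImage:
    """Trusted, read-only map of path -> expected digest (the metadata image)."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def read(self, path: str, store: DataStore) -> bytes:
        expected = self.entries[path]
        data = store.objects[expected]  # the digest doubles as the redirect key
        if digest(data) != expected:
            # Validation happens on access, not only when the image was built.
            raise PermissionError(f"digest mismatch for {path}")
        return data


store = DataStore()
image = MetadataImage({"/usr/bin/ls": store.put(b"good ls")})
```

In this model, replacing the stored object behind `/usr/bin/ls` makes the next `image.read()` fail, as long as `image` itself cannot be swapped out. Whether the metadata image can be trusted is exactly the part that has to be established separately.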

OCI is an antiquated format, not fit for modern security requirements

bluca — Sat, 17 May 2025 14:43:55 +0000

> Why are you assuming that EROFS is mounted without any prior validation?

Once again, this is not about "prior validation". Of course you can validate images when downloading them. With composefs however you cannot validate them when they are _used_, i.e. when a binary is loaded and executed from it. You can only do pre-validation, and cross your fingers that nothing gains the same privileges as your userspace component that mounted it, otherwise it's game over. That's a massive difference for any system where security is important (which should be, er, all of them!). One can deploy this kind of security policy on Windows and I believe also on OSX, so it's nothing new.

> in a security-sensitive configuration the same types of policies enforced on the dm-verity volume can also be applied to the EROFS metadata-only volume used in the composefs mount.

No, it cannot, because that "metadata volume" is just a collection of digests that is only known to userspace. The kernel has no idea what is good content and what is bad content, the only thing that matters is that the digest matches the file being read. If I build my own volume that compromises your /usr/bin/ls and overmount it, there's nothing you can do about it.

On the other hand I can show exactly the IPE policy that will block someone from executing a compromised /usr/bin/ls from an unverified filesystem that is overmounted on top of a verified dm-verity:

policy_name=ipe-policy policy_version=0.0.1

DEFAULT action=ALLOW
DEFAULT op=EXECUTE action=DENY
op=EXECUTE boot_verified=TRUE action=ALLOW
op=EXECUTE dmverity_signature=TRUE action=ALLOW
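
To spell out how a policy like this is read, here is a toy evaluator in Python. It is a sketch, not the kernel's IPE implementation: it assumes rules are checked top to bottom with the first match winning, and that the per-operation default and then the global default apply when nothing matches. The `evaluate` function and the property dict are hypothetical names for illustration.

```python
# The two explicit rules from the policy, in order:
# (operation, property that must be true, action).
RULES = [
    ("EXECUTE", "boot_verified", "ALLOW"),
    ("EXECUTE", "dmverity_signature", "ALLOW"),
]
OP_DEFAULTS = {"EXECUTE": "DENY"}  # DEFAULT op=EXECUTE action=DENY
GLOBAL_DEFAULT = "ALLOW"           # DEFAULT action=ALLOW


def evaluate(op, properties):
    """Return the action for `op` given the properties of the file's origin."""
    for rule_op, prop, action in RULES:
        if rule_op == op and properties.get(prop, False):
            return action  # first matching rule wins
    # No rule matched: per-operation default first, then the global default.
    return OP_DEFAULTS.get(op, GLOBAL_DEFAULT)


# A binary from a signed dm-verity volume versus one from an overmounted,
# unverified filesystem (no properties set).
signed = evaluate("EXECUTE", {"dmverity_signature": True})
unverified = evaluate("EXECUTE", {})
```

Under these assumptions `signed` is `"ALLOW"` and `unverified` is `"DENY"`, while operations other than EXECUTE fall through to the global ALLOW.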

I am pretty sure there's no equivalent for composefs, ostree or any other workflows, because, again, tarballs are a terrible format for shipping executables in 2025. Piling workarounds after workarounds after workarounds just to maintain compatibility with tarballs and work around their severe limitations can only result in suboptimal solutions that make a lot of compromises. Starting from scratch with security as a first-class citizen is the only solution that doesn't result in getting painted into a corner.

OCI is an antiquated format, not fit for modern security requirements

gscrivano — Sat, 17 May 2025 13:50:01 +0000

composefs is simply an overlay on top of an EROFS mount. Why are you assuming that EROFS is mounted without any prior validation? While there may be cases where this makes sense (e.g. the user cares only about the deduplication aspect), in a security-sensitive configuration the same types of policies enforced on the dm-verity volume can also be applied to the EROFS metadata-only volume used in the composefs mount. Once the EROFS mount is trusted, the underlying data can also be trusted, since the fs-verity digest is sealed in the EROFS read-only image and that is validated at runtime. Therefore, I disagree that ComposeFS is worse from a security standpoint than having everything in a single image file.

OCI is an antiquated format, not fit for modern security requirements

bluca — Sat, 17 May 2025 11:26:57 +0000

One of the many really nice design decisions of Flatpak is that the runtimes are not provided by the application, they are provided by the ecosystem. So deduplication already happens for the elements that are actually in common, at the logical layer, rather than the block layer.