OCI is an antiquated format, not fit for modern security requirements

Posted May 17, 2025 13:50 UTC (Sat) by gscrivano (subscriber, #74830)
In reply to: OCI is an antiquated format, not fit for modern security requirements by bluca
Parent article: The future of Flatpak

composefs is simply an overlay on top of an EROFS mount. Why are you assuming that EROFS is mounted without any prior validation? While there may be cases where this makes sense (e.g. the user cares only about the deduplication aspect), in a security-sensitive configuration the same types of policies enforced on the dm-verity volume can also be applied to the EROFS metadata-only volume used in the composefs mount. Once the EROFS mount is trusted, the underlying data can also be trusted, since the fs-verity digest is sealed in the EROFS read-only image and that is validated at runtime. Therefore, I disagree that composefs is worse from a security standpoint than having everything in a single image file.

OCI is an antiquated format, not fit for modern security requirements

Posted May 17, 2025 14:43 UTC (Sat) by bluca (subscriber, #118303)

> Why are you assuming that EROFS is mounted without any prior validation?

Once again, this is not about "prior validation". Of course you can validate images when downloading them. With composefs, however, you cannot validate them when they are _used_, i.e. when a binary is loaded and executed from it. You can only do pre-validation, and cross your fingers that nothing gains the same privileges as your userspace component that mounted it, otherwise it's game over. That's a massive difference for any system where security is important (which should be, er, all of them!). One can deploy these kinds of security policies on Windows and I believe also on OSX, so it's nothing new.

> in a security-sensitive configuration the same types of policies enforced on the dm-verity volume can also be applied to the EROFS metadata-only volume used in the composefs mount.

No, it cannot, because that "metadata volume" is just a collection of digests that is only known to userspace. The kernel has no idea what is good content and what is bad content; the only thing that matters is that the digests match the file being read. If I build my own volume that compromises your /usr/bin/ls and overmount it, there's nothing you can do about it.

On the other hand I can show exactly the IPE policy that will block someone from executing a compromised /usr/bin/ls from an unverified filesystem that is overmounted on top of a verified dm-verity:

policy_name=ipe-policy policy_version=0.0.1

DEFAULT action=ALLOW
DEFAULT op=EXECUTE action=DENY
op=EXECUTE boot_verified=TRUE action=ALLOW
op=EXECUTE dmverity_signature=TRUE action=ALLOW
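
As an illustration only, the way a policy like this resolves a request can be modelled in a few lines of Python. This is a toy model, not the kernel's implementation: `Rule`, `evaluate` and the property dictionaries are hypothetical names, and it assumes IPE's documented behaviour that rules are checked top to bottom, the first match wins, and a per-operation DEFAULT is consulted before the global one.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    op: str                                     # operation the rule applies to
    action: str                                 # "ALLOW" or "DENY"
    props: dict = field(default_factory=dict)   # property name -> required value


# Toy transcription of the policy above (hypothetical representation).
RULES = [
    Rule("EXECUTE", "ALLOW", {"boot_verified": "TRUE"}),
    Rule("EXECUTE", "ALLOW", {"dmverity_signature": "TRUE"}),
]
OP_DEFAULTS = {"EXECUTE": "DENY"}   # DEFAULT op=EXECUTE action=DENY
GLOBAL_DEFAULT = "ALLOW"            # DEFAULT action=ALLOW


def evaluate(op: str, file_props: dict) -> str:
    """Return the action for `op` on a file described by `file_props`."""
    for rule in RULES:
        matches = all(file_props.get(k) == v for k, v in rule.props.items())
        if rule.op == op and matches:
            return rule.action          # first matching rule decides
    # No rule matched: per-operation default first, then the global default.
    return OP_DEFAULTS.get(op, GLOBAL_DEFAULT)


# A binary on a signed dm-verity volume vs. one on an unverified overmount.
signed_ls = {"boot_verified": "FALSE", "dmverity_signature": "TRUE"}
overmounted_ls = {"boot_verified": "FALSE", "dmverity_signature": "FALSE"}
```

In this model, executing `overmounted_ls` falls through both rules and hits the EXECUTE default, while any other operation on it is still allowed by the global default.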

I am pretty sure there's no equivalent for composefs, ostree or any other workflows, because, again, tarballs are a terrible format for shipping executables in 2025, so piling workarounds after workarounds after workarounds just to maintain compatibility with tarballs and work around their severe limitations can only result in suboptimal solutions that make a lot of compromises. Starting from scratch with security as a first-class citizen is the only solution that doesn't result in getting painted into a corner.

OCI is an antiquated format, not fit for modern security requirements

Posted May 17, 2025 16:35 UTC (Sat) by gscrivano (subscriber, #74830)

> No, it cannot, because that "metadata volume" is just a collection of digests that is only known to userspace. The kernel has no idea what is good content and what is bad content; the only thing that matters is that the digests match the file being read. If I build my own volume that compromises your /usr/bin/ls and overmount it, there's nothing you can do about it.

That is not true. The kernel knows about these digests and uses them at runtime to validate each data file when it is accessed. Please take a look at how overlay uses these digests: https://docs.kernel.org/filesystems/overlayfs.html#fs-ver...

```
Verity can be used as a general robustness check to detect accidental changes in the overlayfs directories in use. But, with additional care it can also give more powerful guarantees. For example, if the upper layer is fully trusted (by using dm-verity or something similar), then an untrusted lower layer can be used to supply validated file content for all metacopy files. If additionally the untrusted lower directories are specified as “Data-only”, then they can only supply such file content, and the entire mount can be trusted to match the upper layer.
```

So to achieve the chain of trust we only need to validate the EROFS mount, which contains both the overlay redirect attribute and the fs-verity digest for each file.
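
As a rough illustration of that chain, here is a toy Python model of the read path. It is only a sketch under stated assumptions: `build_image` and `read_file` are hypothetical names, plain SHA-256 stands in for an fs-verity digest (which is really a Merkle tree root), and two dictionaries stand in for the trusted EROFS metadata image and the untrusted data-only object store. In the real system the check is done by the kernel (overlayfs plus fs-verity), not by userspace code like this.

```python
import hashlib


def digest(data: bytes) -> str:
    # Stand-in for an fs-verity digest; the real one is a Merkle tree root.
    return hashlib.sha256(data).hexdigest()


def build_image(files: dict):
    """Split path -> content into (metadata, object_store).

    metadata models the trusted EROFS image: path -> expected digest,
    which also serves as the redirect target in this toy model.
    object_store models the untrusted, content-addressed data layer.
    """
    metadata, store = {}, {}
    for path, content in files.items():
        d = digest(content)
        metadata[path] = d
        store[d] = content
    return metadata, store


def read_file(metadata: dict, store: dict, path: str) -> bytes:
    """Follow the redirect and refuse content whose digest does not match."""
    expected = metadata[path]
    content = store[expected]
    if digest(content) != expected:
        raise OSError(f"verity mismatch for {path}")
    return content
```

If the metadata dictionary is trusted, tampering with an entry in the object store makes `read_file` fail instead of returning the modified bytes. The model says nothing about how the metadata image itself comes to be trusted.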

OCI is an antiquated format, not fit for modern security requirements

Posted May 21, 2025 20:50 UTC (Wed) by bluca (subscriber, #118303)

> So to achieve the chain of trust we only need to validate the EROFS mount, which contains both the overlay redirect attribute and the fs-verity digest for each file.

Once again, that only proves that the digests match the contents. It doesn't prove the content is the one that was meant to be running. I can provide my own composefs, with perfectly valid metadata, but with my own content, and overmount yours, and it's game over. There's nothing you can do about it, it's just not possible to solve this with composefs, by construction.

Signed dm-verity does not have this problem, because the root of trust is the kernel keyring verifying the signature of the merkle tree.
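
To make the distinction being drawn here concrete, the following Python sketch separates "the image is self-consistent" from "the image was signed by a trusted key". Everything in it is an illustrative assumption: the function names and keys are hypothetical, the image digest is a simple hash over paths and contents rather than a dm-verity Merkle tree, and HMAC is used purely as a stand-in for a real asymmetric signature checked against the kernel keyring.

```python
import hashlib
import hmac

# Hypothetical key; stands in for a key trusted by the kernel keyring.
VENDOR_KEY = b"vendor-signing-key"


def image_digest(files: dict) -> bytes:
    """One digest covering every path and its content (toy root hash)."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.digest()


def sign(root: bytes, key: bytes) -> bytes:
    # HMAC as a stand-in for a PKCS7 signature over the root hash.
    return hmac.new(key, root, hashlib.sha256).digest()


def is_self_consistent(files: dict, claimed_root: bytes) -> bool:
    """Integrity: the content matches the digest shipped with it."""
    return hmac.compare_digest(image_digest(files), claimed_root)


def is_authentic(claimed_root: bytes, signature: bytes,
                 trusted_key: bytes = VENDOR_KEY) -> bool:
    """Authenticity: the digest was signed with the trusted key."""
    return hmac.compare_digest(sign(claimed_root, trusted_key), signature)


vendor_files = {"/usr/bin/ls": b"real ls"}
vendor_root = image_digest(vendor_files)
vendor_sig = sign(vendor_root, VENDOR_KEY)

attacker_files = {"/usr/bin/ls": b"evil ls"}
attacker_root = image_digest(attacker_files)
attacker_sig = sign(attacker_root, b"attacker-key")
```

Both images pass `is_self_consistent`, but only the vendor's passes `is_authentic`. Which component gets to make the second check, and with which key store, is what the rest of this thread is arguing about.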

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 7:27 UTC (Thu) by gscrivano (subscriber, #74830)

What composefs does is decouple the data from the metadata, so that the data can be deduplicated among multiple images while you can still trust that each referenced data file is what you expect it to be. So the point you are making is about how we can trust the EROFS image itself.
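
The deduplication side of that decoupling can be sketched with a content-addressed store. This is a minimal Python illustration, not composefs code: `ObjectStore` and `import_image` are hypothetical names, and SHA-256 of the whole file stands in for the fs-verity digest used to name objects.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store shared by all images."""

    def __init__(self):
        self.objects = {}   # digest -> content

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored once.
        self.objects.setdefault(key, content)
        return key


def import_image(store: ObjectStore, files: dict) -> dict:
    """Return the per-image metadata: path -> digest of the shared object."""
    return {path: store.add(content) for path, content in files.items()}


store = ObjectStore()
libc = b"shared libc bytes"
image_a = import_image(store, {"/usr/lib/libc.so.6": libc, "/usr/bin/a": b"app a"})
image_b = import_image(store, {"/usr/lib/libc.so.6": libc, "/usr/bin/b": b"app b"})
```

Here the two images carry four paths between them but the store holds only three objects, because both metadata maps point at the same libc object.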

The same policies you've in place for the dm-verity volume can be applied to the EROFS mount. I don't see in principle why we couldn't use dm-verity as well, but that wouldn't be different than using fs-verity+IMA on the image file itself. This configuration is not different than what you are proposing.

Whether you want to restrict the system to mount only signed images is a separate discussion (only in part it is technical) and no doubt that dealing only with signed images is better. That might work in a controlled environment or for high privileged system services coming from a few trusted vendors, but it wouldn't fit with the way OCI containers are used today, either locally or in a cluster, which is pulling random images from a registry.

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 8:37 UTC (Thu) by bluca (subscriber, #118303)

> The same policies you've in place for the dm-verity volume can be applied to the EROFS mount. I don't see in principle why we couldn't use dm-verity as well, but that wouldn't be different than using fs-verity+IMA on the image file itself. This configuration is not different than what you are proposing.

No, it cannot, because only userspace knows which EROFS image is the right one in composefs. There's simply no way to do that, and I have shared already the very much real-world and used-in-production policy for dm-verity earlier. There's no equivalent for composefs.

> Whether you want to restrict the system to mount only signed images is a separate discussion (only in part it is technical) and no doubt that dealing only with signed images is better. That might work in a controlled environment or for high privileged system services coming from a few trusted vendors, but it wouldn't fit with the way OCI containers are used today, either locally or in a cluster, which is pulling random images from a registry.

That's only because, again, OCI is a terrible, antiquated and legacy format. Shipping applications as tarballs is a really, really bad idea. If it shipped signed dm-verity images, it could work just fine. It already signs the metadata anyway, so mechanisms to sign artifacts exist, it's just the format that is not fit for purpose in 2025.

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 12:40 UTC (Thu) by walters (subscriber, #7396)

> No, it cannot, because only userspace knows which EROFS image is the right one in composefs.

> There's simply no way to do that, and I have shared already the very much real-world and used-in-production policy for dm-verity earlier.

If dm-verity applies to one's use case and is already working, then it makes sense to continue to use it for sure.

> That's only because, again, OCI is a terrible, antiquated and legacy format. Shipping applications as tarballs is a really, really bad idea. If it shipped signed dm-verity images, it could work just fine.

The core claim we're making here with composefs (and especially integration with OCI) is that we can add the core dm-verity integrity properties by simply adding a label with a digest on the existing OCI format, without such a huge ecosystem break. There are also the sub-threads pointing out that managing many images is more efficient thanks to page cache and disk sharing.

Now as per other sub-threads, indeed in https://github.com/composefs/composefs/issues/360 it is harder to wire up LSMs to composefs today. However, that's not *just* dm-verity, I would phrase it more as "dm-verity ecosystem" if that makes sense.

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 22:41 UTC (Thu) by bluca (subscriber, #118303)

A label with a digest is not enough to match dm-verity properties. It needs to be a cryptographic signature, verified in the kernel, by a keyring that cannot be modified from userspace, that covers the entire content plus metadata. Otherwise, again, there's no way to distinguish a valid volume with content that you want from a valid volume with content that you don't want.

This necessarily rules out ostree or similar, where content is added dynamically and switched at runtime.

And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity? That's what Azure Linux is going to do in its OCI runtime.

OCI is an antiquated format, not fit for modern security requirements

Posted May 23, 2025 12:03 UTC (Fri) by walters (subscriber, #7396)

> A label with a digest is not enough to match dm-verity properties.

I think this is the root of the problem; we are talking about different levels of this. As you know, before https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/... dm-verity signatures were verified in userspace - and that worked fine for a use case where the root hash is covered by being embedded in a UKI signed for secure boot or equivalent. That's the case with our current work in composefs-rs.
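
For readers less familiar with why covering a single root hash is enough, here is a simplified Merkle tree sketch in Python. It is an assumption-laden illustration, not dm-verity's actual on-disk format: `merkle_root` is a hypothetical helper, there is no salt, the tree is binary rather than packed into hash blocks, and odd levels are padded with a zero digest.

```python
import hashlib

BLOCK_SIZE = 4096
ZERO_DIGEST = b"\0" * 32   # padding for levels with an odd number of nodes


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Hash every block, then hash pairs of digests until one root remains."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(ZERO_DIGEST)
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing any byte of the volume changes the root, so whatever vouches for the root (a signature checked in the kernel, or a signed UKI that embeds it) vouches for every block underneath it.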

You are for sure correct (again as we are discussing in that composefs-rs issue) that usage for applications and wiring up with IPE or equivalent does become easier with an in-kernel key verification. That said I could imagine here also doing it in userspace where the userspace process doing the verification is running in a targeted SELinux domain e.g. with an extra capability to mark a mount as verified for the purpose of the LSM.

IOW you are arguing:

> A label with a digest is not enough to match dm-verity properties.

I would change that to:

> A label with a digest is not enough to match dm-verity+IPE/LSM properties.

And then we agree.

> And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity?

Because (depending on how it's being implemented to a degree, I'd be curious to see a link to the code) it would have the ecosystem splitting problem, and composefs is inherently going to be more efficient by sharing page cache and disk automatically across images.

OCI is an antiquated format, not fit for modern security requirements

Posted May 23, 2025 13:01 UTC (Fri) by bluca (subscriber, #118303)

Well, in 2019 and earlier hardly anybody had heard of UKIs, so I very much doubt they were actually in use anywhere :-) It's not that there was a viable alternative, it's simply that there was less security and several threat models were left completely unaddressed. 2019 is when we started working on Az Boost which is where these threat models had to be addressed, and that's why that feature was added.
There is just no version of any userspace solution that fixes those threat models. One can try and imagine any creative setups with LSMs or what not, to try and create some 'super special trustmebro' userspace daemon that is supposed to be unhackable, and then this boi shows up and sends it all tumbling down the drain: PTRACE

Security decisions need to be made by more privileged components than the ones being checked. This is not a matter of implementations or workarounds or solutions, it's a design pattern. If you have the same privilege level as the thing checking if you have privileges, you _will_ find ways to subvert it.

For example, on Windows these days security policies are implemented by a completely different kernel, running at a higher privilege level than your OS's kernel, with hard security boundaries enforced by HyperV. Our org is working to bring this to Linux: https://www.youtube.com/watch?v=vmt4wlf3a1A
The direction of travel is the opposite of "just check it in userspace", and one day (TM) the dm-verity signature will be checked by a higher-privileged kernel, instead of the host kernel, so that an entire new class of threat models can be closed off too.

Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design. Nobody in their right mind would suggest that just running swtpm is a viable alternative for production usage on a secure host, or they'd be laughed out of the room.

That's why I keep saying that Linux is hopelessly behind Windows/OSX. Because it is. And crufty old stuff like OCI, which has cemented in the ecosystem an absolutely terrible image format (tarballs! What is this, 1982?), is a very large part of why this is the case, as projects like yours (through no fault of your own or your colleagues'! You have to work with what is there, and I don't envy you one bit :-) ) are forced to do somersaults through flaming hoops to try and somewhat patch the leaky bucket, because god forbid docker switches to a sensible image format that's fit for purpose.

> Because (depending on how it's being implemented to a degree, I'd be curious to see a link to the code) it would have the ecosystem splitting problem, and composefs is inherently going to be more efficient by sharing page cache and disk automatically across images.

But that's again a shortcoming of OCI, being the terrible format that it is. And it doesn't affect Flatpak, because the Flatpak devs made a very clever and sensible decision to separate the runtimes from the apps, and the app developer doesn't supply the runtime, it chooses one. So deduplication happens at the runtime level. OCI doesn't have anything like that, because it's a binfire of an ecosystem. For our use case in Boost we copied this design, and the runtime is shared and developers don't bring their own, and we get the best of both worlds: strong integrity protection that's not currently possible otherwise, and file/page level sharing of DSOs. Once again this is not a problem that composefs created or can solve, it's just inherited from OCI, and has to find ways to work around it.

OCI is an antiquated format, not fit for modern security requirements

Posted May 23, 2025 18:41 UTC (Fri) by walters (subscriber, #7396)

> and then this boi shows up and sends it all tumbling down the drain: PTRACE

Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

> Security decisions need to be made by more privileged components than the ones being checked.

Yes, although the Linux kernel is all one privilege level. By implementing components in userspace we can, for example, have the thing that parses signatures and does the crypto drop a lot of other ambient privileges.

> Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design.

That's a huge strawman. I know the point you're trying to make, but TPMs are really quite different than what's being discussed here.

I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

> So deduplication happens at the runtime level. OCI doesn't have anything like that,

Yeah, I have thought about this more than once. It would make a lot of sense for sure, but would also have ecosystem-splitting effects, though I do think that something like this would actually be doable as a standards change.

That said, it's important to point out that flatpak already supports OCI as a transport and absolutely nothing prevents one from implementing such a thing for docker/podman as a kind of opt-in today either.

OCI is an antiquated format, not fit for modern security requirements

Posted May 29, 2025 0:11 UTC (Thu) by bluca (subscriber, #118303)

> Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

Which is all nice and well, until you _need_ to have a component that is allowed to do such actions (e.g. it needs to capture live dumps in order to keep a fleet maintainable), and it gets compromised.

> Yes, although the Linux kernel is all one privilege level; implementing components in userspace we can actually e.g. have the thing parsing signatures and doing crypto actually dropping a lot of other ambient privileges.

Which is why virtualization-based security levels are being worked on, to split the kernel into multiple privilege levels too.

> I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

Sure, it's an example. The point was to show that hard security boundaries are widely accepted as good and necessary, and that delegating certain tasks to userspace and hoping for the best is not acceptable anymore for certain things, e.g. one wouldn't do that with the handling of a plain-text private key for a production system. The same principle applies to other security policies, in different contexts.

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 22:24 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

Can I verify dm-verity signatures in Java or Go?

OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 22:35 UTC (Thu) by bluca (subscriber, #118303)

Yes, as it's PKCS7, but there's no reason to, as it's the kernel that verifies them.

A process being able to compromise a less privileged one is not a vulnerability

Posted May 23, 2025 23:13 UTC (Fri) by DemiMarie (subscriber, #164188)

Okay, so Flatpak can compromise any of the apps it spawns. That’s completely useless from an attacker’s perspective. It just means they have gone from a more privileged process to a less privileged one, which does not help them at all.

For all practical purposes, Flatpak is part of the trusted computing base of a desktop system. It can access any and all resources that the user can, and that’s enough to do pretty much anything the attacker wants. Advanced iOS malware just needs to escape the sandbox. It doesn’t need root or kernel privileges to do its job.

What is your actual goal here, and what is your threat model? Instead of trying to prevent Flatpak from compromising the processes it runs, I think your efforts would be far better spent ensuring that Flatpak itself is not compromised. Flatpak can be signed and then tell the kernel what signatures and/or hashes to expect for the binaries it runs.

A process being able to compromise a less privileged one is not a vulnerability

Posted May 29, 2025 0:20 UTC (Thu) by bluca (subscriber, #118303)

> Okay, so Flatpak can compromise any of the apps it spawns.

Uhm, that's not really the point at all. I'd suggest reading the comment again.