OCI is an antiquated format, not fit for modern security requirements

Posted May 22, 2025 22:41 UTC (Thu) by bluca (subscriber, #118303)
In reply to: OCI is an antiquated format, not fit for modern security requirements by walters
Parent article: The future of Flatpak

A label with a digest is not enough to match dm-verity properties. It needs to be a cryptographic signature, verified in the kernel against a keyring that cannot be modified from userspace, that covers the entire content plus metadata. Otherwise, again, there's no way to distinguish a valid volume with content that you want from a valid volume with content that you don't want.
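
To make the distinction concrete, here is a minimal sketch, not the real dm-verity on-disk format: it uses a simplified binary hash tree with no salt, and an allowlist of root hashes (`trusted_roots`, a hypothetical name) stands in for the in-kernel signature check against a keyring. All function names are made up for illustration. It shows that any volume is "valid" against its own root hash, so only an external trust decision tells wanted content from unwanted content.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption for this sketch


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Hash every block, then hash pairs of digests until one root remains."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last digest on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def is_self_consistent(data: bytes, claimed_root: str) -> bool:
    """Integrity only: does the content match the root hash shipped with it?"""
    return merkle_root(data) == claimed_root


def is_authorized(data: bytes, claimed_root: str, trusted_roots: set) -> bool:
    """Integrity plus trust: the root must also be one the verifier accepts.

    The allowlist is a stand-in for verifying a signature over the root hash.
    """
    return is_self_consistent(data, claimed_root) and claimed_root in trusted_roots


wanted = b"A" * 10000
unwanted = b"B" * 10000
wanted_root = merkle_root(wanted)
unwanted_root = merkle_root(unwanted)
trusted_roots = {wanted_root}

# Both volumes are internally valid...
both_valid = is_self_consistent(wanted, wanted_root) and is_self_consistent(unwanted, unwanted_root)
# ...but only one of them is content the verifier actually wants.
wanted_ok = is_authorized(wanted, wanted_root, trusted_roots)
unwanted_ok = is_authorized(unwanted, unwanted_root, trusted_roots)
# Tampered content presented with the trusted root fails the integrity check.
tampered_ok = is_authorized(b"C" + wanted[1:], wanted_root, trusted_roots)
```

The argument in this thread is about where the equivalent of `is_authorized` runs and who can modify the equivalent of `trusted_roots`; the sketch says nothing about that, it only shows why a bare digest is not sufficient on its own.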

This necessarily rules out ostree or similar, where content is added dynamically and switched at runtime.

And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity? That's what Azure Linux is going to do in its OCI runtime.

Posted May 23, 2025 12:03 UTC (Fri) by walters (subscriber, #7396)

> A label with a digest is not enough to match dm-verity properties.

I think this is the root of the problem; we are talking about different levels of this. As you know, before https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/... dm-verity signatures were verified in userspace - and that worked fine for a use case where the root hash is covered by being embedded in a UKI signed for secure boot or equivalent. That's the case with our current work in composefs-rs.

You are for sure correct (again as we are discussing in that composefs-rs issue) that usage for applications and wiring up with IPE or equivalent does become easier with an in-kernel key verification. That said I could imagine here also doing it in userspace where the userspace process doing the verification is running in a targeted SELinux domain e.g. with an extra capability to mark a mount as verified for the purpose of the LSM.

IOW you are arguing:

> A label with a digest is not enough to match dm-verity properties.

I would change that to:

> A label with a digest is not enough to match dm-verity+IPE/LSM properties.

And then we agree.

> And if you have only pre-built, signed, immutable, monolithic images, just use erofs+dm-verity?

Because (depending to a degree on how it's being implemented; I'd be curious to see a link to the code) it would have the ecosystem-splitting problem, and composefs is inherently going to be more efficient by sharing page cache and disk automatically across images.
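
As a rough illustration of the sharing argument, the following sketch stores file contents once per digest, so identical files in different images occupy a single object. It is an assumption-laden toy, not composefs's actual implementation: `ObjectStore`, `add_image`, and the file paths are hypothetical names invented for this example.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: one stored object per unique file content."""

    def __init__(self):
        self.objects = {}  # maps sha256 hex digest -> file content

    def add_image(self, files: dict) -> dict:
        """Store an image's files and return a manifest mapping path -> digest."""
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # setdefault keeps the existing object if this content was seen before
            self.objects.setdefault(digest, content)
            manifest[path] = digest
        return manifest


store = ObjectStore()
shared_lib = b"pretend this is a shared library"

image_a = store.add_image({"/usr/lib/libfoo.so": shared_lib, "/usr/bin/app-a": b"app a"})
image_b = store.add_image({"/usr/lib/libfoo.so": shared_lib, "/usr/bin/app-b": b"app b"})

# Four paths across two images, but only three stored objects.
stored_count = len(store.objects)
same_object = image_a["/usr/lib/libfoo.so"] == image_b["/usr/lib/libfoo.so"]
```

Because both manifests point at the same object for the library, an implementation built this way could back both images with one file on disk, which is the kind of automatic cross-image sharing being claimed here.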

Posted May 23, 2025 13:01 UTC (Fri) by bluca (subscriber, #118303)

Well, in 2019 and earlier hardly anybody had heard of UKIs, so I very much doubt they were actually in use anywhere :-) It's not that there was a viable alternative, it's simply that there was less security and several threat models were left completely unaddressed. 2019 is when we started working on Az Boost which is where these threat models had to be addressed, and that's why that feature was added.
There is just no version of any userspace solution that fixes those threat models. One can try and imagine any creative setups with LSMs or what not, to try and create some 'super special trustmebro' userspace daemon that is supposed to be unhackable, and then this boi shows up and sends it all tumbling down the drain: PTRACE

Security decisions need to be made by more privileged components than the ones being checked. This is not a matter of implementations or workarounds or solutions, it's a design pattern. If you have the same privilege level as the thing checking if you have privileges, you _will_ find ways to subvert it.

For example, on Windows these days security policies are implemented by a completely different kernel, running at a higher privilege level than your OS's kernel, with hard security boundaries enforced by Hyper-V. Our org is working to bring this to Linux: https://www.youtube.com/watch?v=vmt4wlf3a1A
The direction of travel is the opposite of "just check it in userspace", and one day (TM) the dm-verity signature will be checked by a higher-privileged kernel, instead of the host kernel, so that an entire new class of threat models can be closed off too.

Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design. Nobody in their right mind would suggest that just running swtpm is a viable alternative for production usage on a secure host, or they'd be laughed out of the room.

That's why I keep saying that Linux is hopelessly behind Windows/OSX. Because it is. And crufty old stuff like OCI, which has cemented in the ecosystem an absolutely terrible image format (tarballs! What is this, 1982?), is a very large part of why this is the case, as projects like yours (through no fault of your own or your colleagues'! You have to work with what is there, and I don't envy you one bit :-) ) are forced to do somersaults through flaming hoops to try and somewhat patch the leaky bucket, because god forbid docker switches to a sensible image format that's fit for purpose.

> Because (depending to a degree on how it's being implemented; I'd be curious to see a link to the code) it would have the ecosystem-splitting problem, and composefs is inherently going to be more efficient by sharing page cache and disk automatically across images.

But that's again a shortcoming of OCI, being the terrible format that it is. And it doesn't affect Flatpak, because the Flatpak devs made a very clever and sensible decision to separate the runtimes from the apps: the app developer doesn't supply the runtime, they choose one. So deduplication happens at the runtime level. OCI doesn't have anything like that, because it's a binfire of an ecosystem. For our use case in Boost we copied this design: the runtime is shared and developers don't bring their own, and we get the best of both worlds: strong integrity protection that's not currently possible otherwise, and file/page level sharing of DSOs. Once again this is not a problem that composefs created or can solve; it's just inherited from OCI, and composefs has to find ways to work around it.

Posted May 23, 2025 18:41 UTC (Fri) by walters (subscriber, #7396)

> and then this boi shows up and sends it all tumbling down the drain: PTRACE

Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

> Security decisions need to be made by more privileged components than the ones being checked.

Yes, although the Linux kernel is all one privilege level; by implementing components in userspace we can e.g. have the thing parsing signatures and doing crypto actually drop a lot of other ambient privileges.

> Another example: it's the entire reason TPMs are separate enclaves, with hardware-enforced boundaries. You don't just have a TPM userspace process that pinky swears never to leak your key, because that would not be a sensible design.

That's a huge strawman. I know the point you're trying to make, but TPMs are really quite different than what's being discussed here.

I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

> So deduplication happens at the runtime level. OCI doesn't have anything like that,

Yeah, I have thought about this more than once. It would make a lot of sense for sure, but would also have ecosystem-splitting effects, though I do think that something like this would actually be doable as a standards change.

That said, it's important to point out that flatpak already supports OCI as a transport and absolutely nothing prevents one from implementing such a thing for docker/podman as a kind of opt-in today either.

Posted May 29, 2025 0:11 UTC (Thu) by bluca (subscriber, #118303)

> Denying that is a key target of LSMs (plus of course commonly seccomp, running as non-root uids and (user) namespacing).

Which is all nice and well, until you _need_ to have a component that is allowed to do such actions (e.g. it needs to capture live dumps in order to keep a fleet maintainable), and it gets compromised.

> Yes, although the Linux kernel is all one privilege level; implementing components in userspace we can actually e.g. have the thing parsing signatures and doing crypto actually dropping a lot of other ambient privileges.

Which is why virtualization-based security levels are being worked on, to split the kernel into multiple privilege levels too.

> I hope you'd agree that basically what we're talking about is having one bit of the kernel wire up some state to another bit of the kernel; there's no relationship to hardware.

Sure, it's an example; the point was to show that hard security boundaries are widely accepted as good and necessary, and that delegating certain tasks to userspace and hoping for the best is no longer acceptable for certain things, e.g. one wouldn't do that with the handling of a plain-text private key for a production system. The same principle applies to other security policies, in different contexts.