Adding package information to ELF objects

Posted Nov 5, 2021 7:35 UTC (Fri) by pabs (subscriber, #43278)
In reply to: Adding package information to ELF objects by zuki
Parent article: Adding package information to ELF objects

These all sound like workarounds to me.

Adding package information to ELF objects

Posted Nov 5, 2021 8:11 UTC (Fri) by pabs (subscriber, #43278)

To clarify, this proposal is working around the fact that containers (and other situations where binaries are combined and distributed together) have no mechanism to provide provenance of the files within them. The solution should be to add that provenance feature alongside the container images, not to work around the missing feature by embedding provenance information within the binaries.

As an example, the Debian project produces live image ISOs. Those images combine binaries from many packages in one file. Each of those ISO images has next to it a file containing the list of binary packages and package versions used to build it. This is the right way to go about solving this problem.

https://cdimage.debian.org/debian-cd/current-live/amd64/i...
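For illustration, consuming such a detached manifest can be as simple as the sketch below. It assumes one "package<whitespace>version" pair per line; the actual format of the Debian file may differ, and the package versions in the usage lines are made-up sample values.

```python
# Minimal sketch: parse a detached package manifest that sits next to an image.
# ASSUMPTION: one "package<whitespace>version" pair per line; the real Debian
# manifest format may differ.

def parse_manifest(text: str) -> dict[str, str]:
    """Return a {package: version} mapping from manifest text."""
    packages: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, version = line.split(None, 1)  # raises ValueError on a malformed line
        packages[name] = version.strip()
    return packages


if __name__ == "__main__":
    # Hypothetical sample contents, not taken from a real image.
    sample = "bash\t5.1-2+b3\ncoreutils\t8.32-4+b1\n"
    manifest = parse_manifest(sample)
    print(manifest["bash"])  # 5.1-2+b3
```

The catch is that the mapping only helps if the manifest travels with the image.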

Adding package information to ELF objects

Posted Nov 5, 2021 9:04 UTC (Fri) by zuki (subscriber, #41808)

One person's workaround is another person's solution ;)
Having a text file with a list of package versions _somewhere_ is one workaround-slash-solution. Attaching this information directly to the ELF file is another workaround-slash-solution. Both approaches have their advantages and can coexist peacefully.

As discussed in the proposal, attaching the information to the ELF files makes it visible in the place where it is very useful: crash dumps. I'd say that having a flat text file somewhere is not as useful for this purpose.
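To make the "visible in the binary" point concrete, here is a rough sketch of pulling package metadata out of the raw bytes of an ELF note section (for example, bytes dumped with `objcopy --dump-section`). It uses the standard little-endian note layout (namesz, descsz, type, then name and desc, each padded to 4 bytes). The owner name "FDO", the type value 0xcafe1a7e and the JSON payload are my understanding of the proposal and should be treated as assumptions; `build_note` and `find_package_info` are hypothetical helpers.

```python
import json
import struct

# ASSUMPTION: owner name and note type as I understand the proposal; check
# against the published spec before relying on them.
PKG_NOTE_OWNER = b"FDO"
PKG_NOTE_TYPE = 0xCAFE1A7E


def _pad4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note field alignment)."""
    return (n + 3) & ~3


def build_note(owner: bytes, note_type: int, desc: bytes) -> bytes:
    """Serialize one little-endian ELF note entry."""
    name = owner + b"\0"  # namesz includes the NUL terminator
    header = struct.pack("<III", len(name), len(desc), note_type)
    return (header
            + name.ljust(_pad4(len(name)), b"\0")
            + desc.ljust(_pad4(len(desc)), b"\0"))


def iter_notes(buf: bytes):
    """Yield (owner, type, desc) for each note in a raw note-section buffer."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, note_type = struct.unpack_from("<III", buf, off)
        off += 12
        owner = buf[off:off + namesz].rstrip(b"\0")
        off += _pad4(namesz)
        desc = buf[off:off + descsz]
        off += _pad4(descsz)
        yield owner, note_type, desc


def find_package_info(buf: bytes):
    """Return the decoded JSON package metadata, or None if absent."""
    for owner, note_type, desc in iter_notes(buf):
        if owner == PKG_NOTE_OWNER and note_type == PKG_NOTE_TYPE:
            return json.loads(desc.rstrip(b"\0"))
    return None


# Usage: a section holding an unrelated note followed by a package note.
section = (build_note(b"GNU", 3, b"\x01\x02\x03\x04")
           + build_note(PKG_NOTE_OWNER, PKG_NOTE_TYPE,
                        b'{"name":"example","version":"1.0"}\0'))
print(find_package_info(section)["name"])  # example
```

Because the note lives in the loaded file, a crash handler that already has the binary (or its core dump) can read it without consulting anything external.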

Adding package information to ELF objects

Posted Nov 5, 2021 9:11 UTC (Fri) by pabs (subscriber, #43278)

The ELF solution strikes me as a logically incorrect design, but I guess this is one of those "worse is better" situations that I'll just have to learn to ignore. Thanks for the discussion.

Adding package information to ELF objects

Posted Nov 5, 2021 9:22 UTC (Fri) by mjg59 (subscriber, #23239)

Detached metadata opens up a bunch of additional failure modes (e.g., what if an image gets rebuilt and uploaded, but somehow the metadata doesn't get uploaded as well?) that are much harder to trigger if the data is in the binaries themselves.
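The stale-metadata case can at least be detected if the detached manifest records a digest of the image it describes. The sketch below assumes a hypothetical JSON manifest format with an `image_sha256` field; note that it only detects the mismatch and cannot recover the missing metadata.

```python
import hashlib
import json


def make_manifest(image: bytes, packages: dict) -> str:
    """Record the image digest alongside the package list (hypothetical format)."""
    return json.dumps({"image_sha256": hashlib.sha256(image).hexdigest(),
                       "packages": packages})


def manifest_matches(image: bytes, manifest_json: str) -> bool:
    """True only if the manifest was generated for exactly these image bytes."""
    recorded = json.loads(manifest_json)["image_sha256"]
    return recorded == hashlib.sha256(image).hexdigest()


old_image = b"image build 1"
manifest = make_manifest(old_image, {"bash": "5.1"})  # hypothetical versions
new_image = b"image build 2"  # rebuilt and uploaded, manifest left behind
print(manifest_matches(old_image, manifest))  # True
print(manifest_matches(new_image, manifest))  # False
```

With the data embedded in the binaries there is nothing to get out of sync, so this check is unnecessary.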

I can think of one real-world (if corner-case) scenario that this probably does trip up, though:
1) Have a shim-unsigned package that produces a binary
2) Upload that shim binary to Microsoft and obtain a signed copy
3) Strip that signature from the binary and add it to a shim-signed package
4) Build shim-signed in an identical environment to shim-unsigned, with the last step being to add the signature

If the fact that these are two separate packages were to result in different embedded data, the signature obviously wouldn't apply.
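A toy model of why this matters, whatever the binary format: a signature covers exact bytes, so two otherwise identical builds that embed different package names no longer share a valid signature. HMAC-SHA256 with a made-up key stands in for Microsoft's real signing process here, and `build` is a hypothetical helper; real Authenticode works differently in detail.

```python
import hashlib
import hmac

# STAND-IN: HMAC-SHA256 with a made-up key replaces the real signing process,
# purely to show that any byte change invalidates a detached signature.
SIGNING_KEY = b"hypothetical-signing-key"


def sign(binary: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, binary, hashlib.sha256).digest()


def verify(binary: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(binary), signature)


def build(code: bytes, package_note: bytes) -> bytes:
    """Hypothetical build: identical code plus an embedded package note."""
    return code + package_note


code = b"\x90" * 16  # identical compiler output in both builds
unsigned = build(code, b'{"name":"shim-unsigned"}')
signature = sign(unsigned)  # step 2: this exact binary gets signed

# Step 4: rebuild in the shim-signed package and attach the stripped signature.
rebuilt = build(code, b'{"name":"shim-signed"}')
print(verify(rebuilt, signature))  # False: the embedded note differs
print(verify(build(code, b'{"name":"shim-unsigned"}'), signature))  # True
```

Making the note identical in both packages is what restores the second, passing case.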

(This isn't a problem at the moment because the Debian shim-signed source package just contains the signed binaries, but it would be nice to have a world where the builds were reproducible enough to avoid that.)

Adding package information to ELF objects

Posted Nov 5, 2021 9:38 UTC (Fri) by zuki (subscriber, #41808)

That is a good point. We'll probably have to exclude shim from this, or maybe customize the note to be identical in the signed and unsigned versions. But shim is already very, very special; we'll just have to make another exception for it.

Adding package information to ELF objects

Posted Nov 5, 2021 12:07 UTC (Fri) by BenHutchings (subscriber, #37955)

shim is in PE format, so it wouldn't be directly affected by this proposal.

Adding package information to ELF objects

Posted Nov 6, 2021 19:54 UTC (Sat) by NYKevin (subscriber, #129325)

From the perspective of the orchestration system, provenance information is (usually) available. Every container was built from some well-known image and can be rebuilt if necessary, and any reasonable orchestration system should be tracking that information in some sort of database or other system. If you're smart, you've also mounted most or all of the filesystem read-only, so that the container cannot easily become broken and require rebuilding in the first place. From this perspective, there is no missing feature to add, because you're tracking all of the information which is required for normal operation of the system. Sure, that information may not be directly *accessible* from inside the container, but the container normally does not need to know its own provenance (and probably should not care, in most cases).

In principle, you could use that provenance information for crash dumps. That's how we do it at Google, in fact - we can tell the exact version that was checked into source control, and display the exact line where the faulting instruction happened, because we built the container in the first place, and so we know where everything in it came from. This is one of the benefits* of having a monorepo without (much) branching, as we can just point to one CL number instead of, say, fifty, and it's also one of the reasons** that Bazel makes such a big fuss about exhaustively tracking and declaring your entire dependency hierarchy.

The problems only really arise when a crash dump gets separated from the orchestration system's provenance data, or when the orchestration system's provenance data is inadequate (or when you don't have an orchestration system and are just manually building Docker images from random crap, of course, which is an unfortunately common practice in some shops). You might also have the "my tools suck" problem, where you theoretically have all of the information (provenance data) you need, but converting it into a useful form (a Git hash or version number that upstream can recognize and deal with) is too hard.

* There are also drawbacks, which are irrelevant here, but somebody will bring them up if I don't acknowledge that they exist.
** The main reason is "cache invalidation is hard, and rebuilding the entire universe from scratch is slow." But good provenance data is definitely important too.
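A rough sketch of the orchestration-side lookup described above: the crash dump carries only an image digest, and a provenance store kept by the build system maps it back to a source revision. The database contents, the `image_digest` field and the `cl/12345` revision are all hypothetical. The `None` path is exactly the "separated from the provenance data" case, which is where metadata embedded in the binary itself would still be available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source_revision: str  # e.g. a CL number or Git hash (hypothetical values)
    base_image: str


# HYPOTHETICAL provenance store kept by the orchestration system,
# keyed by the digest of the container image it built.
PROVENANCE_DB: dict[str, Provenance] = {
    "sha256:aaaa": Provenance(source_revision="cl/12345", base_image="base:2021-11"),
}


def resolve_crash(crash: dict) -> Optional[Provenance]:
    """Map a crash dump back to the source it was built from.

    Returns None when the dump has become separated from the provenance
    data (no digest recorded, or the digest is unknown to this database).
    """
    digest = crash.get("image_digest")
    if digest is None:
        return None
    return PROVENANCE_DB.get(digest)


print(resolve_crash({"image_digest": "sha256:aaaa"}).source_revision)  # cl/12345
print(resolve_crash({"pc": 0x401000}))  # None
```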