
Fedora change aims for 99% package reproducibility

By Joe Brockmeier
March 31, 2025

The effort to ensure that open-source software is reproducible has been gathering steam over the years and gaining traction with major Linux distributions. Debian, for example, has been working toward reproducible builds for more than a decade; it can now produce official live CDs of the current stable release that are reproducible. Fedora started on the path much later, but it has progressed far enough that the project is now considering a change proposal for Fedora 43, which is expected to be released in October, with a goal of making 99% of Fedora's package builds reproducible. So far, reaction to the proposal seems favorable and focused primarily on how to achieve the goal (with minimal pain for packagers) rather than on whether to attempt it.

Defining reproducible builds

The Reproducible Builds project defines a build as reproducible if "given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts". In a 2023 hackfest report, Zbigniew Jędrzejewski-Szmek said that Fedora has not prioritized reproducible builds in the past because Fedora has more control over its build process than Debian and other distributions. Because Debian allows maintainers to generate source packages on their local system and to upload some locally built packages for distribution to users, he said that "trust in the contents of both source and binary packages is low." (Debian's build daemons build most binary packages from source for distribution to users, but there are exceptions.) Fedora, on the other hand, exercises much more control over packages.

In Fedora, all packages that are distributed to users are built in the centralized, strongly controlled infrastructure. All source rpms are built from "dist-git": a git repository which contains the build "recipe" and a cryptographic hash of package sources, so it is relatively easy to verify what changed between package versions, what "inputs" went into a particular source package, and in what environment the binary packages were built.

However, even though Fedora has tighter control over its packages, Jędrzejewski-Szmek said that one benefit of reproducible builds is that they help detect and mitigate any kind of supply-chain attack on Fedora's builders, and allow others to independently verify that the binaries delivered by Fedora match the package sources. It's interesting to note that Fedora had embarked on this work before the XZ backdoor drew even more attention to supply-chain attacks.

He acknowledges that Debian is more advanced in its reproducible builds processes, and notes that Fedora is setting a different definition for reproducible builds. This definition excludes signatures and some metadata and focuses solely on the payload of packaged files in a given RPM:

A build is reproducible if given the same source code, build environment and build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and parts of metadata.

The reason Fedora is pursuing a different definition of reproducible build is that it cannot achieve "bit-by-bit" reproducibility by the original definition. This is because of differences in the package format and the way that Fedora builds its packages. RPMs embed the package signature in the RPM when they are built, but Debian uses detached signatures. RPMs also include information in the RPM's header, such as the build time (BUILDTIME) and build host (BUILDHOST), that can affect reproducibility. There was a discussion about allowing these variables to be overridden. However, the prevailing opinion was that the information provided by BUILDHOST is useful, and overriding its inclusion is not desirable. The contents, though, should still be "bit-by-bit" identical, even though that phrase does not turn up in Fedora's definition.
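To make the looser definition concrete, here is a minimal Python sketch that compares two builds of a package while ignoring the signature and the build-specific header fields. The Package record, its field names, and the set of ignored tags are hypothetical simplifications chosen for illustration; they are not RPM's real header layout, and this is not the tooling that Fedora uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

# Assumption: only the two tags named in the article are treated as
# build-specific. Fedora's real list of excluded metadata may differ.
IGNORED_TAGS = {"BUILDTIME", "BUILDHOST"}


@dataclass
class Package:
    """Hypothetical, simplified stand-in for a built RPM."""
    header: Dict[str, str]
    payload: bytes          # the packaged files
    signature: bytes = b""  # embedded signature, never compared


def comparable_view(pkg: Package):
    """Return the parts of a package that must match between builds."""
    stable_header = {k: v for k, v in pkg.header.items() if k not in IGNORED_TAGS}
    return stable_header, hashlib.sha256(pkg.payload).hexdigest()


def reproduces(original: Package, rebuild: Package) -> bool:
    """True if the rebuild matches, ignoring signature and ignored tags."""
    return comparable_view(original) == comparable_view(rebuild)


official = Package(
    {"NAME": "demo", "VERSION": "1.0", "BUILDHOST": "builder-01", "BUILDTIME": "1741305600"},
    payload=b"packaged files",
    signature=b"embedded signature",
)
local = Package(
    {"NAME": "demo", "VERSION": "1.0", "BUILDHOST": "laptop", "BUILDTIME": "1741400000"},
    payload=b"packaged files",
)
print(reproduces(official, local))
```

Under a strict bit-by-bit comparison these two builds would differ, since the build host, build time, and signature all vary; under the relaxed comparison they match as long as the payload and the remaining header fields are identical.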

The openSUSE project, which also distributes software using the RPM format, sets BUILDHOST to "reproducible", according to Jan Zerebecki. The actual build host is printed in the build logs, and interested users can search openSUSE's build logs to find the host.

Path to reproducibility

For BUILDTIME, openSUSE sets the build time to the date of the latest changelog entry. That value is provided to builds by the SOURCE_DATE_EPOCH environment variable. This is where Fedora's reproducible builds work began, with a change that was made during the Fedora 38 development cycle to "clamp" the modification time (mtime) of packaged files to SOURCE_DATE_EPOCH. This ensured that the mtimes were independent of the time of an actual build. Packagers were given the ability to opt out of this if, for some reason, their package would be broken by the new behavior.
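Clamping is simple to illustrate. The following Python sketch is not RPM's implementation; it only shows the idea of lowering any mtime that is newer than SOURCE_DATE_EPOCH while leaving older files alone. The function names are made up for this example, and the default epoch in the demo is an arbitrary value used when the environment variable is unset.

```python
import os


def clamp_mtime(path: str, source_date_epoch: int) -> bool:
    """Lower the file's mtime to source_date_epoch if it is newer.

    Returns True if the timestamp was changed. Symbolic links are
    followed; this sketch only deals with regular files.
    """
    if os.stat(path).st_mtime <= source_date_epoch:
        return False  # older files keep their original timestamp
    os.utime(path, (source_date_epoch, source_date_epoch))
    return True


def clamp_tree(root: str, source_date_epoch: int) -> int:
    """Clamp every file under root and return how many were changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if clamp_mtime(os.path.join(dirpath, filename), source_date_epoch):
                changed += 1
    return changed


if __name__ == "__main__":
    import tempfile

    # Arbitrary fallback value, used only when the variable is not set.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1741305600"))
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "generated.txt"), "w") as f:
            f.write("built just now\n")
        print(clamp_tree(root, epoch), "file(s) clamped")
```

Note that clamping only ever moves timestamps backward: a file that is older than SOURCE_DATE_EPOCH keeps its mtime, which is why the term is "clamp" rather than "set".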

During the Fedora 41 development cycle, the project implemented another change in the RPM build process to remove common sources of irreproducibility. That change made use of a Rust program, add-determinism, that attempts to standardize metadata in binary or source files to ensure consistency. It is similar to Debian's strip-nondeterminism, which is a Perl library that is part of the debhelper tool for building Debian packages. Using strip-nondeterminism, the debhelper tool removes non-deterministic information such as timestamps and filesystem ordering from various file and archive formats. The Fedora project chose to write its own tool because it was undesirable to pull Perl into the build root for every package.
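The kind of normalization these tools perform can be illustrated with a small Python sketch. It is not the code of add-determinism or strip-nondeterminism; it only shows the general idea of removing timestamps and filesystem ordering from an archive, here by writing a tar file with sorted member names, a fixed mtime, and fixed ownership. The function name deterministic_tar and the example inputs are invented for illustration.

```python
import hashlib
import io
import tarfile
from typing import Dict


def deterministic_tar(files: Dict[str, bytes], source_date_epoch: int) -> bytes:
    """Build an uncompressed tar archive whose bytes depend only on the inputs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sorting removes any dependence on directory or insertion order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = source_date_epoch  # fixed timestamp, not "now"
            info.mode = 0o644
            info.uid = info.gid = 0         # do not leak the builder's user
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


first = deterministic_tar({"b.txt": b"two", "a.txt": b"one"}, 1741305600)
second = deterministic_tar({"a.txt": b"one", "b.txt": b"two"}, 1741305600)
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

The two calls pass the same files in a different order, yet the resulting archives hash identically, which is the property a post-processing tool tries to restore in artifacts that a build has already produced.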

According to the new change proposal, the modifications to Fedora's build infrastructure to date have allowed it to make 90% of package builds reproducible. The goal now is to reach 99% of package builds. It appears that Fedora has gotten as much mileage as it can out of infrastructure changes that do not require individual packagers to deal with reproducibility problems. To get to 99%, the project is going to have to ask packagers to treat reproducibility problems in their packages as bugs.

The change owners (Jędrzejewski-Szmek, Davide Cavalca, and Jelle van der Waa) would package the fedora-repro-build utility to allow developers to make local rebuilds of packages built in Koji (Fedora's build system) to test their reproducibility. The change will also require standing up a public instance of rebuilderd, which is a system for providing independent verification that binary packages can be reproduced from source code. It can scan a package repository's metadata for new or updated packages and then queue them for rebuilding, and it provides an API to query for the reproducibility status of packages. Rebuilderd can also, optionally, use the diffoscope tool to generate a report of differences. The Arch Linux reproducible status page provides a good example of rebuilderd in use.
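The scan, queue, and verify workflow described above can be sketched in a few lines of Python. This is a toy model with hypothetical names and data structures; it does not use rebuilderd's API or reflect how rebuilderd is implemented. Repository metadata is modeled as a mapping from package name to version, and a rebuild is any callable that returns the rebuilt artifact's bytes.

```python
import hashlib
from collections import deque
from typing import Callable, Deque, Dict


def find_work(seen: Dict[str, str], repo_metadata: Dict[str, str]) -> Deque[str]:
    """Queue packages that are new or whose version changed since the last scan."""
    queue: Deque[str] = deque()
    for name in sorted(repo_metadata):
        if seen.get(name) != repo_metadata[name]:
            queue.append(name)
    return queue


def verify(queue: Deque[str],
           official: Dict[str, bytes],
           rebuild: Callable[[str], bytes]) -> Dict[str, str]:
    """Rebuild each queued package and compare digests with the official artifact."""
    status: Dict[str, str] = {}
    while queue:
        name = queue.popleft()
        rebuilt_digest = hashlib.sha256(rebuild(name)).hexdigest()
        official_digest = hashlib.sha256(official[name]).hexdigest()
        status[name] = ("reproducible" if rebuilt_digest == official_digest
                        else "not reproducible")
    return status


# Hypothetical data: "foo" was updated, "baz" is new, "bar" is unchanged.
seen = {"foo": "1.0-1", "bar": "2.0-1"}
metadata = {"foo": "1.0-2", "bar": "2.0-1", "baz": "0.1-1"}
official = {"foo": b"foo artifact", "baz": b"baz artifact"}
rebuilt = {"foo": b"foo artifact", "baz": b"baz artifact, but different"}

work = find_work(seen, metadata)
print(verify(work, official, rebuilt.__getitem__))
```

In a real deployment, the interesting part is everything this sketch leaves out: recreating the original build environment, and explaining a mismatch once one is found, which is where a tool like diffoscope comes in.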

If accepted, the proposal would also require an update to Fedora's packaging guidelines that would say packages should (not, at least currently, "must") build reproducibly and allow bugs to be filed against packages when they are not reproducible.

Aside from the security benefits of reproducibility, the proposal also makes the case that it will lead to packages of higher quality. Irreproducible bits in packages are quite often "caused by an error or sloppiness in the code". For example, dependence on hardware architecture in architecture-independent (noarch) packages is "almost always unwanted and/or a bug", and reproducibility tests can uncover those bugs.

The proposal acknowledges that some packages will have problems with reproducibility that cannot be fixed easily. For example, Haskell packages are not currently reproducible when compiled by more than one thread, though a fix is being worked on. Packages produced with Go have debug data that is not reproducible because the GNU Debugger index file (.gdb_index) can be of varying size even given the same input. No fix is yet in the works for that. Another known problem is that the Linux kernel uses an ephemeral key for module signatures. LWN covered a patch set from Thomas Weißschuh that may solve that problem.

Feedback

In the discussion thread on Fedora's Discourse forum, Fedora's infrastructure lead Kevin Fenzi asked, "where will this [rebuilderd] instance live and who will maintain it? 🙂" He also noted it would be good to have documentation on setting up a rebuilderd instance. "Otherwise I like the idea!" Cavalca said that the reproducibility work was currently using an Amazon Web Services (AWS) account sponsored by Meta, but "we can look at moving into Fedora infra if there's a preference for that". Fenzi replied that it might be good to keep running the work outside Fedora infrastructure to make it more independent. "Although of course we could run one and then others could run others and compare".

Daniel P. Berrangé asked if rebuilderd could be integrated with Koji so that maintainers did not have to learn another build tool. "I'm pretty unenthusiastic about dealing with yet another standalone web service providing post-build testing." Jędrzejewski-Szmek said that using Koji to perform the build was an interesting idea, but "we also want our rebuilds to be as independent as possible", so it would still be desirable to do them in a system other than Koji. Rebuilding a package the second time in the same build environment means "we are not testing much".

Miroslav Suchý, a member of Fedora's infrastructure team, wondered if rebuilderd could submit builds to Fedora's Copr build system instead of standing up yet another build system in Fedora. This led to a discussion about Copr's capabilities and whether it would integrate well with rebuilderd. Jędrzejewski-Szmek noted that rebuilderd is a "complete project that does things in its own way" and it may be complicated to try to teach it to talk to an external service asynchronously.

Integrating rebuilderd tooling and reports into Fedora's existing infrastructure has been a recurring theme in the discussion. Simon de Vlieger said he was not set on having builds performed in Koji, but wanted the project "to integrate well with Fedora's pre-existing tools and things so it has the highest chance of people actually using it" and performing as people expect.

Next

The next step for the proposal is to file a ticket with the Fedora Engineering Steering Committee (FESCo), at least one week after the proposal was announced. In this case, that would be no sooner than March 26. If FESCo approves, the owners can begin work on the proposal with an eye to completion by October, when Fedora 43 is planned for release.

Most of Fedora's users have probably not noticed the reproducibility work in Fedora thus far and won't appreciate any difference when they install Fedora 43 (or 44, 45, and so on). However, given the continual efforts of bad actors to find and exploit supply-chain weaknesses in open-source projects, it is a valuable effort nonetheless.




Debian's exceptions to packages built on buildds are the non-free packages

Posted Apr 1, 2025 11:46 UTC (Tue) by sthibaul (✭ supporter ✭, #54477)

> Debian's build daemons build most binary packages from source for distribution to users, but there are exceptions.

To be noted: the only exceptions are the obvious contrib, non-free and non-free-firmware packages. And even for contrib, it's only the packages that really cannot be built without non-free packages, that are usually built by maintainers.

I.e. the released real Debian (the main archive where all the free packages are) is always built on buildds.

Debian does not sign packages

Posted Apr 1, 2025 11:58 UTC (Tue) by bluca (subscriber, #118303)

> RPMs embed the package signature in the RPM when they are built, but Debian uses detached signatures.

Debian packages are not signed at all. Only the repository metadata is signed. Yes, this means that downloading and installing a package manually (without going through a repository) provides no integrity protections whatsoever.

things......

Posted Apr 1, 2025 13:26 UTC (Tue) by r1w1s1 (guest, #169987)

Things get complicated over time...

Testing

Posted Apr 2, 2025 5:52 UTC (Wed) by marcH (subscriber, #57642)

> Rebuilding a package the second time in the same build environment means "we are not testing much".

As often, the discussion seems to fall a bit short on the testing side. This is not specific to reproducibility, it affects all software in general. Disclaimer: I've only read the LWN article, nothing it points to.

If some feature / parameter is not tested then it does not work. The test suite and efforts are the real specification. But testing (and claiming) build reproducibility is surprisingly hard, probably as hard as testing race conditions (which, funnily enough, can expose build reproducibility issues). You can build in 4 different environments and claim victory until someone tries a slightly different version of some very minor build-time dependency that everyone forgot about. While this forgotten dependency may have no security consequence, it's always enough to break the checksums.

In my case the "breakthrough" happened when the same toolchain was available on both Linux and Windows (+ some MinGW magic). That exposed the very "last" round of reproducibility issues. All other issues after that were recent regressions. Two random systems are incredibly unlikely to have some unexpected environment / dependency differences that a Linux and a Windows build system don't also have. Now you're entering the realm of "theoretical" bugs that are possible in theory but never happen in practice.

It seems possible to build (RPM) packages on "foreign" Linux distributions, maybe that would provide good test coverage?

Testing

Posted Apr 2, 2025 8:21 UTC (Wed) by bmwiedemann (subscriber, #71319)

You need to build with the same gcc, glibc, headers etc... so normally you don't get identical results on different distributions.

The exception is when you build in a container/chroot. Then only embedding uname -r output would break reproducibility (several packages do that).

And the usual rules apply:
https://reproducible-builds.org/docs/commandments/

Having worked on an OS that has all its 3000 packages bit-reproducible, I found that race conditions were indeed among the hardest. You are not guaranteed to trigger them anywhere and you cannot be certain they are gone after a fix.
One race only occurred on a build machine with an HDD. Another race only showed up on the fastest machine with NVMe (8GB/s).

OTOH, when a race is sufficiently rare, it is still possible to verify official binaries with few tries.

Determinism on installed packages

Posted Apr 3, 2025 14:24 UTC (Thu) by jcpunk (subscriber, #95796)

In addition to the build issues, it would be nice if the installed package had some determinism on its own.

Initially, you'd think this is automatic given the binary consistency. However, if your package contains any symlinks, the mtime on the symlink is set to the timestamp when the archive is unpacked, whereas the binaries have the mtime of their compilation.

So, even if the binaries in the package are reproducible, the files produced by the package have differences on your filesystem.

This is primarily of interest to me for container workflows: if the links in /usr/bin had the mtime of what they linked to, then /usr/bin/ would be deterministic and potentially the layers would be easily duplicated.

Determinism on installed packages

Posted Apr 11, 2025 23:18 UTC (Fri) by zuki (subscriber, #41808)

> However, if your package contains any symlinks, the mtime on the symlink is set to the timestamp when the archive is unpacked, whereas the binaries have the mtime of their compilation.

This does not seem to match what I see on Fedora. Maybe rpm gets this right?

$ ls -l /usr/bin/udevadm /usr/lib/systemd/systemd-udevd
-rwxr-xr-x 1 root root 644040 Mar 7 01:00 /usr/bin/udevadm
lrwxrwxrwx 1 root root 17 Mar 7 01:00 /usr/lib/systemd/systemd-udevd -> ../../bin/udevadm
$ rpm -q --changelog systemd|head -n1
* Fri Mar 07 2025 Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> - 257.4-3

As you can see, the package was built Mar 7, which means a timestamp of 00:00:00 UTC; that is 01:00:00 CET, which is my timezone. Fedora sets SOURCE_DATE_EPOCH from that changelog timestamp, and the mtimes of files in the package are clamped to that.
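The arithmetic in that reply can be checked with a short Python sketch. It assumes, as the comment states, that the changelog date maps to midnight UTC, and it uses a fixed UTC+1 offset for CET rather than a timezone database; neither detail is taken from RPM's code.

```python
from datetime import datetime, timedelta, timezone

# Assumption from the comment: the changelog date means 00:00:00 UTC.
changelog_date = datetime(2025, 3, 7, tzinfo=timezone.utc)
source_date_epoch = int(changelog_date.timestamp())

# CET is UTC+1; a fixed offset is enough for a date in early March.
cet = timezone(timedelta(hours=1), "CET")
local_time = changelog_date.astimezone(cet)

print(source_date_epoch)                    # 1741305600
print(local_time.strftime("%b %d %H:%M"))   # Mar 07 01:00
```

The second line of output matches the "Mar 7 01:00" shown by ls for both the binary and the symlink in the listing above.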


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds