Fedora change aims for 99% package reproducibility
The effort to ensure that open-source software is reproducible has been gathering steam over the years, and gaining traction with major Linux distributions. Debian, for example, has been working toward reproducible builds for more than a decade; it can now produce official live CDs of the current stable release that are reproducible. Fedora started on the path much later, but it has progressed far enough that the project is now considering a change proposal, targeting Fedora 43 (expected to be released in October), with a goal of making 99% of Fedora's package builds reproducible. So far, reaction to the proposal seems favorable and focused primarily on how to achieve the goal, with minimal pain for packagers, rather than on whether to attempt it.
Defining reproducible builds
The Reproducible Builds project defines a build as reproducible if "given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts". In a 2023 hackfest report, Zbigniew Jędrzejewski-Szmek said that Fedora has not prioritized reproducible builds in the past because Fedora has more control over its build process than Debian and other distributions. Because Debian allows maintainers to generate source packages on their local system and to upload some locally built packages for distribution to users, he said that "trust in the contents of both source and binary packages is low." (Debian's build daemons build most binary packages from source for distribution to users, but there are exceptions.)
Fedora, on the other hand, exercises much more control over packages.
In Fedora, all packages that are distributed to users are built in the centralized, strongly controlled infrastructure. All source rpms are built from "dist-git": a git repository which contains the build "recipe" and a cryptographic hash of package sources, so it is relatively easy to verify what changed between package versions, what "inputs" went into a particular source package, and in what environment the binary packages were built.
However, even though Fedora has tighter control over its packages, Jędrzejewski-Szmek said that reproducible builds would help detect and mitigate any kind of supply-chain attack on Fedora's builders, and would allow others to independently verify that the package sources match the binaries that Fedora delivers. It's interesting to note that Fedora had embarked on this work before the XZ backdoor drew even more attention to supply-chain attacks.
He acknowledged that Debian is further along in its reproducible-builds work, and noted that Fedora uses a different definition of reproducible builds. This definition excludes signatures and some metadata and focuses solely on the payload of packaged files in a given RPM:
A build is reproducible if given the same source code, build environment and build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and parts of metadata.
The reason Fedora is pursuing a different definition of reproducible build is that it cannot achieve "bit-by-bit" reproducibility by the original definition. This is because of differences in the package format and the way that Fedora builds its packages. RPMs embed the package signature in the RPM file itself, whereas Debian does not embed signatures in its packages; the integrity of Debian's archive is instead protected by signed repository metadata. RPMs also record information in the package header, such as the build time (BUILDTIME) and build host (BUILDHOST), that can affect reproducibility. There was a discussion about allowing these variables to be overridden. However, the prevailing opinion was that the information provided by BUILDHOST is useful, and overriding its inclusion is not desirable. The contents, however, should still be "bit-by-bit" identical, even though that phrase does not turn up in Fedora's definition.
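The following minimal Python sketch makes the payload-only idea concrete. It assumes that both builds of an RPM have already been unpacked into directories, for example with rpm2cpio and cpio. This is not Fedora's tooling, and payload_manifest and diff_payloads are hypothetical names. The sketch compares only file types, permissions, symlink targets and contents, so signatures and header fields are excluded by construction. Leaving out mtimes and ownership is an additional simplification made here, not part of Fedora's definition.

```python
import hashlib
import os
import stat
from pathlib import Path


def payload_manifest(root: str) -> dict[str, tuple[str, str]]:
    """Describe every entry under `root` by type, permissions, and content.

    Only the unpacked payload is examined, so signatures and RPM header
    fields such as BUILDTIME and BUILDHOST never enter the comparison.
    Modification times and ownership are deliberately left out as well.
    """
    base = Path(root)
    manifest: dict[str, tuple[str, str]] = {}
    for dirpath, dirnames, filenames in os.walk(base):
        for name in dirnames + filenames:
            path = Path(dirpath, name)
            rel = path.relative_to(base).as_posix()
            st = path.lstat()
            perms = oct(stat.S_IMODE(st.st_mode))
            if stat.S_ISLNK(st.st_mode):
                # A symlink is identified by where it points, not by its target's bytes.
                manifest[rel] = ("symlink", os.readlink(path))
            elif stat.S_ISDIR(st.st_mode):
                manifest[rel] = ("dir", perms)
            else:
                # Assumes regular files only (no device nodes or FIFOs).
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                manifest[rel] = ("file", f"{perms}:{digest}")
    return manifest


def diff_payloads(a: str, b: str) -> list[str]:
    """Return the sorted relative paths that differ between two payload trees."""
    ma, mb = payload_manifest(a), payload_manifest(b)
    return sorted(p for p in ma.keys() | mb.keys() if ma.get(p) != mb.get(p))
```

Calling diff_payloads on the two unpacked trees returns an empty list when the payloads match. Otherwise it names the entries that differ, which a tool such as diffoscope could then examine in detail.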
The openSUSE project, which also distributes software using the RPM format, sets BUILDHOST to "reproducible", according to Jan Zerebecki. The actual build host is printed in the build logs, and interested users can search openSUSE's build logs to find the host.
Path to reproducibility
For BUILDTIME, openSUSE sets the build time to the date of the latest changelog entry. That date is provided to builds in the SOURCE_DATE_EPOCH environment variable. This is where Fedora's reproducible-builds work began, with a change that was made during the Fedora 38 development cycle to "clamp" the modification time (mtime) of packaged files to SOURCE_DATE_EPOCH. This ensured that the mtimes were independent of the time of an actual build. Packagers were given the ability to opt out of this if, for some reason, the new behavior would break their package.
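The two mechanisms just described can be sketched in a few lines of Python: deriving a timestamp from the newest changelog entry, and clamping file mtimes to it. In practice rpm does this work itself. The function names below are hypothetical, and the changelog parsing assumes the conventional "* Day Mon DD YYYY Name <email> - version-release" header and midnight UTC.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def epoch_from_changelog(header: str) -> int:
    """Convert an RPM changelog header line into a Unix timestamp.

    Expects the usual '* Day Mon DD YYYY Name <email> - version-release'
    shape. The entry records only a date, so midnight UTC is assumed.
    """
    _star, _weekday, month, day, year, *_rest = header.split()
    # %b matches English month abbreviations under the default C locale.
    date = datetime.strptime(f"{month} {day} {year}", "%b %d %Y")
    return int(date.replace(tzinfo=timezone.utc).timestamp())


def clamp_mtimes(root: str, epoch: int) -> int:
    """Lower any mtime under `root` that is newer than `epoch` to `epoch`.

    Older mtimes are left alone, which is what "clamping" means here.
    Symlinks are adjusted themselves (follow_symlinks=False), not their
    targets. Returns the number of entries changed.
    """
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath, name)
            st = path.lstat()
            if st.st_mtime > epoch:
                os.utime(path, (st.st_atime, epoch), follow_symlinks=False)
                changed += 1
    return changed
```

For the systemd changelog entry dated Fri Mar 07 2025 that is quoted in the comments below, epoch_from_changelog returns 1741305600. That value is 2025-03-07 00:00:00 UTC, which a system in CET displays as 01:00. Clamping, rather than overwriting every mtime, leaves files that are already older than SOURCE_DATE_EPOCH untouched.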
During the Fedora 41 development cycle, the project implemented another change in the RPM build process to remove common sources of irreproducibility. That change made use of a Rust program, add-determinism, that attempts to standardize metadata in binary or source files to ensure consistency. It is similar to Debian's strip-nondeterminism, a Perl library and tool that debhelper uses when building Debian packages. Using strip-nondeterminism, debhelper removes non-deterministic information such as timestamps and filesystem ordering from various file and archive formats. The Fedora project chose to write its own tool because it was undesirable to pull Perl into the build root for every package.
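Both add-determinism and strip-nondeterminism handle a range of formats. The Python sketch below illustrates just one normalization of this kind. It rewrites a ZIP archive (the container format also used by JAR files and Python wheels) so that member order, timestamps and permissions no longer depend on the build. It is not taken from either tool. normalize_zip is a hypothetical name, and the sketch assumes an archive of regular files only.

```python
import io
import time
import zipfile

# 1980-01-01T00:00:00Z: ZIP's DOS-style timestamps cannot go earlier.
ZIP_MIN_EPOCH = 315532800


def normalize_zip(data: bytes, epoch: int) -> bytes:
    """Rewrite a ZIP archive so its bytes depend only on the members' contents.

    Entries are written in sorted name order with one fixed timestamp
    (derived from `epoch`, e.g. SOURCE_DATE_EPOCH) and fixed permissions.
    Assumes the archive holds only regular files, no directory entries.
    """
    stamp = time.gmtime(max(epoch, ZIP_MIN_EPOCH))[:6]
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=stamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # rw-r--r-- in the Unix mode bits
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Suppose two archives have identical member contents but different member order and timestamps. Normalizing both with the same epoch produces byte-identical output. The 1980 floor is needed because ZIP stores DOS-style timestamps, which cannot represent earlier dates.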
According to the new change proposal, the modifications to Fedora's build infrastructure to date have allowed it to make 90% of package builds reproducible. The goal now is to reach 99% of package builds. It appears that Fedora has gotten as much mileage as it can out of infrastructure changes that do not require individual packagers to deal with reproducibility problems. To get to 99%, the project is going to have to ask packagers to treat reproducibility problems in their packages as bugs.
The change owners (Jędrzejewski-Szmek, Davide Cavalca, and Jelle van der Waa) would package the fedora-repro-build utility to allow developers to make local rebuilds of packages built in Koji (Fedora's build system) to test their reproducibility. The change would also require standing up a public instance of rebuilderd, which is a system for providing independent verification that binary packages can be reproduced from source code. Rebuilderd can scan a package repository's metadata for new or updated packages and then queue them for rebuilding, and it provides an API to query the reproducibility status of packages. It can also, optionally, use the diffoscope tool to generate a report of differences. The Arch Linux reproducible status page provides a good example of rebuilderd in use.
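The workflow attributed to rebuilderd above can be sketched as follows: scan repository metadata, queue new or updated packages, rebuild them, and record whether the result matches. This is a hypothetical Python illustration, not rebuilderd's code or API. The Package type, rebuild_queue and verdict are invented for the example, and the repository metadata is reduced to a simple name-to-version mapping.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    evr: str  # version string as recorded in the (simplified) repository metadata


def rebuild_queue(previous: dict[str, str], current: dict[str, str]) -> list[Package]:
    """Queue packages that are new, or whose version changed, since the last scan."""
    return [
        Package(name, evr)
        for name, evr in sorted(current.items())
        if previous.get(name) != evr
    ]


def verdict(official: bytes, rebuilt: bytes) -> str:
    """Compare a published artifact with an independent rebuild by SHA-256."""
    same = hashlib.sha256(official).digest() == hashlib.sha256(rebuilt).digest()
    return "reproducible" if same else "unreproducible"
```

A whole-file hash comparison like verdict only fits package formats without embedded signatures. Under Fedora's definition, the comparison would have to skip the signature and the excluded header fields, for example by comparing unpacked payloads instead.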
If accepted, the proposal would also require an update to Fedora's packaging guidelines that would say packages should (not, at least currently, "must") build reproducibly and allow bugs to be filed against packages when they are not reproducible.
Aside from the security benefits of reproducibility, the proposal also makes the case that it will lead to packages of higher quality. Irreproducible bits in packages are quite often "caused by an error or sloppiness in the code". For example, dependence on hardware architecture in architecture-independent (noarch) packages is "almost always unwanted and/or a bug", and reproducibility tests can uncover those bugs.
The proposal acknowledges that some packages will have problems with reproducibility that cannot be fixed easily. For example, Haskell packages are not currently reproducible when compiled by more than one thread, though a fix is being worked on. Packages produced with Go have debug data that is not reproducible because the GNU Debugger index file (.gdb_index) can be of varying size even given the same input. No fix is yet in the works for that. Another known problem is that the Linux kernel uses an ephemeral key for module signatures. LWN covered a patch set from Thomas Weißschuh that may solve that problem.
Feedback
In the discussion thread on Fedora's Discourse forum, Fedora's infrastructure lead Kevin Fenzi asked, "where will this [rebuilderd] instance live and who will maintain it? 🙂" He also noted it would be good to have documentation on setting up a rebuilderd instance. "Otherwise I like the idea!" Cavalca said that the reproducibility work was currently using an Amazon Web Services (AWS) account sponsored by Meta, but "we can look at moving into Fedora infra if there's a preference for that". Fenzi replied that it might be good to keep running the work outside Fedora infrastructure to make it more independent. "Although of course we could run one and then others could run others and compare".
Daniel P. Berrangé asked if rebuilderd could be integrated with Koji so that maintainers did not have to learn another build tool. "I'm pretty unenthusiastic about dealing with yet another standalone web service providing post-build testing." Jędrzejewski-Szmek said that using Koji to perform the build was an interesting idea, but "we also want our rebuilds to be as independent as possible", so it would still be desirable to do them in a system other than Koji. Rebuilding a package the second time in the same build environment means "we are not testing much".
Miroslav Suchý, a member of Fedora's infrastructure team, wondered if rebuilderd could submit builds to Fedora's Copr build system instead of standing up yet another build system in Fedora. This led to a discussion about Copr's capabilities and whether it would integrate well with rebuilderd. Jędrzejewski-Szmek noted that rebuilderd is a "complete project that does things in its own way" and it may be complicated to try to teach it to talk to an external service asynchronously.
Integrating rebuilderd tooling and reports into Fedora's existing infrastructure has been a recurring theme in the discussion. Simon de Vlieger said he was not set on having builds performed in Koji, but wanted the project "to integrate well with Fedora's pre-existing tools and things so it has the highest chance of people actually using it" and performing as people expect.
Next
The next step for the proposal is to file a ticket with the Fedora Engineering Steering Committee (FESCo), at least one week after the proposal was announced. In this case, that would be no sooner than March 26. If FESCo approves, the owners can begin work on the proposal with an eye to completion by October, when Fedora 43 is planned for release.
Most of Fedora's users have probably not noticed the reproducibility work in Fedora thus far and won't appreciate any difference when they install Fedora 43 (or 44, 45, and so on). However, given the continual efforts of bad actors to find and exploit supply-chain weaknesses in open-source projects, it is a valuable effort nonetheless.
Debian's exceptions to packages built on buildds are the non-free packages
Posted Apr 1, 2025 11:46 UTC (Tue) by sthibaul (✭ supporter ✭, #54477)
To be noted: the only exceptions are the obvious contrib, non-free and non-free-firmware packages. And even for contrib, it's only the packages that really cannot be built without non-free packages, that are usually built by maintainers.
I.e. the released real Debian (the main archive where all the free packages are) is always built on buildds.
Debian does not sign packages
Posted Apr 1, 2025 11:58 UTC (Tue) by bluca (subscriber, #118303)
Debian packages are not signed at all. Only the repository metadata is signed. Yes, this means that downloading and installing a package manually (without going through a repository) provides no integrity protections whatsoever.
things......
Posted Apr 1, 2025 13:26 UTC (Tue) by r1w1s1 (guest, #169987)
Testing
Posted Apr 2, 2025 5:52 UTC (Wed) by marcH (subscriber, #57642)
As often, the discussion seems to fall a bit short on the testing side. This is not specific to reproducibility, it affects all software in general. Disclaimer: I've only read the LWN article, nothing it points to.
If some feature / parameter is not tested then it does not work. The test suite and efforts are the real specification. But testing (and claiming) build reproducibility is surprisingly hard, probably as hard as testing race conditions (which funny enough can expose build reproducibility issues). You can build in 4 different environments and claim victory until someone tries a slightly different version of some very minor build time dependency that everyone forgot about. While this forgotten dependency may have no security consequence, it's always enough to break the checksums.
In my case the "breakthrough" happened when the same toolchain was available on both Linux and Windows (+ some MinGW magic). That exposed the very "last" round of reproducibility issues. All other issues after that were recent regressions. Two random systems are incredibly unlikely to have some unexpected environment / dependency differences that a Linux and a Windows build system don't also have. Now you're entering the realm of "theoretical" bugs that are possible in theory but never happen in practice.
It seems possible to build (RPM) packages on "foreign" Linux distributions, maybe that would provide good test coverage?
Testing
Posted Apr 2, 2025 8:21 UTC (Wed) by bmwiedemann (subscriber, #71319)
The exception is when you build in a container/chroot. Then only embedding uname -r output would break reproducibility (several packages do that).
And the usual rules apply:
https://reproducible-builds.org/docs/commandments/
Having worked on an OS that has all its 3000 packages bit-reproducible, race-conditions were indeed among the hardest. You are not guaranteed to trigger them anywhere and you cannot be certain they are gone after a fix.
Some race only occurred on a build-machine with HDD. Another race only showed up on the fastest machine with NVMe (8GB/s)
OTOH, when a race is sufficiently rare, it is still possible to verify official binaries with a few tries.
Determinism on installed packages
Posted Apr 3, 2025 14:24 UTC (Thu) by jcpunk (subscriber, #95796)
Initially, you'd think this is automatic given the binary consistency. However, if your package contains any symlinks, the mtime on the symlink is set to the timestamp when the archive is unpacked, whereas the binaries have the mtime of their compilation.
So, even if the binaries in the package are reproducible, the files produced by the package have differences on your filesystem.
This is primarily of interest to me for container workflows, if the links in /usr/bin had an mtime of what they linked to, then /usr/bin/ would be deterministic and potentially the layers would be easily duplicated.
Determinism on installed packages
Posted Apr 11, 2025 23:18 UTC (Fri) by zuki (subscriber, #41808)
This does not seem to match what I see on Fedora. Maybe rpm gets this right?
$ ls -l /usr/bin/udevadm /usr/lib/systemd/systemd-udevd
-rwxr-xr-x 1 root root 644040 Mar 7 01:00 /usr/bin/udevadm
lrwxrwxrwx 1 root root 17 Mar 7 01:00 /usr/lib/systemd/systemd-udevd -> ../../bin/udevadm
$ rpm -q --changelog systemd|head -n1
* Fri Mar 07 2025 Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> - 257.4-3
As you can see, the package was built Mar 7, and this means the timestamp of 00:00:00 UTC, which is 01:00:00 CET, which is my timezone. Fedora sets SOURCE_DATE_EPOCH from that changelog timestamp, and the mtimes of files in the package are clamped to that.