Development

Reproducible builds

By Jake Edge
April 12, 2017

LibrePlanet

In his LibrePlanet 2017 talk, Vagrant Cascadian gave an overview of the reproducible builds project, which seeks to let all software projects be built reliably in such a way that users can verify that a binary was built from the source code provided. His talk was partly aimed at getting attendees ready for a two-slot hands-on workshop on how to actually turn a software project into one that can be reproducibly built. LibrePlanet was held March 25-26 in Cambridge, Massachusetts at the Stata Center on the campus of MIT.

[Vagrant Cascadian]

Cascadian has been involved in free software for a long time. He remembers getting a whole bunch of Linux distribution CDs in the mail and finding one in particular, Debian, that stood out, in part because of its social contract. But he soon realized that even though the source code is available, there is no way to be sure that the binaries that get installed actually come from that source. Obviously, if there was no connection between the two, it would be noticeable, so the kinds of changes that could slip through are the "small, insidious changes".

In addition, reproducibility is a key component of the scientific method. If you are building software and it is not reproducible, "how is that science?" There are some simple checks that could be done using checksums or hashes of the output of a test suite, for example, but that only tests areas that we already know are problematic. The project wants to find things that we don't know about, so it is focused on creating binaries that are bit-for-bit identical.
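As a rough illustration of what "bit-for-bit identical" means in practice, the following minimal sketch hashes the outputs of several independent builds and checks that they all agree. The function names and the choice of SHA-256 are assumptions made for the example; they are not taken from the talk or from the project's tooling.

```python
import hashlib
from typing import Iterable


def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(paths: Iterable[str]) -> bool:
    """True when every build output has the same digest.

    Hypothetical helper: expects at least one path, and treats any
    single differing byte as a failure.
    """
    digests = {file_sha256(path) for path in paths}
    return len(digests) == 1
```

A check like this only says whether two outputs differ, not why; tracking down the cause is the harder part.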

Software is built from more than just the source code, and the binary that results is affected by various other things: the build instructions, toolchain (compiler, linker, libraries, and so on), and the environment (time of build, running kernel version string, and others). The environment is what generally makes reproducible builds difficult; by and large those pieces aren't really needed. If that gets removed, and the same versions of the toolchain pieces are used, it should result in identical binaries that can then be verified by anyone.

Cascadian noted the famous "Reflections on Trusting Trust [PDF]" lecture given by Ken Thompson in 1984 and pointed out that little has been done to fix the problem in the intervening years. David A. Wheeler's Diverse Double-Compiling technique could be used to combat attacks of the kind that Thompson described. However, the double-compiling technique itself depends on reproducible builds.

[Stata Center]

Reproducibility is important for other reasons too, Cascadian said. He pointed to an off-by-one error in OpenSSH (CVE-2002-0083) that led to privilege escalation. It could be fixed using a hex editor, or it could be reintroduced that way. In addition, we had never seen a "trusting trust" attack until 2015, when the XcodeGhost malware used a compiler backdoor to add malicious code to some 4000 apps in Apple's App Store.

Furthermore, if you are not running the software you think you are, it undermines all of the promises that free software brings. You can still run the code, "I guess", but studying the code is severely hampered if other code is included behind the scenes. You can try to fix the code, but it is moot if other code can be injected. And you certainly don't want to share the code if you don't know what's actually in it. So it undermines the four freedoms.

Reproducible builds have been discussed on the Debian mailing lists since 2007. In late 2014, Debian started automatically rebuilding the 25,000 source packages in its archive. Currently, it is building 1600-2200 packages per day for each of four different architectures (amd64, i386, arm64, and armhf). The reproducible builds project has gotten to the point where all but about 5% of the packages in Debian testing can be built reproducibly; that remaining 5% amounts to roughly 1300 packages.

The biggest problem area for making a package that can build reproducibly is timestamps embedded in the binary. That is how he got involved in the project. He is a maintainer of the U-Boot boot loader project and noticed that it was listed as a reproducible build, but knew that was impossible due to the inclusion of build timestamps in the binary. The best way forward is for projects to remove the timestamps entirely and use a commit ID or commit timestamp. But for those projects that really need the build timestamp, adding support for the SOURCE_DATE_EPOCH environment variable will allow building reproducibly.
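For a project that does embed a build date, honoring SOURCE_DATE_EPOCH can be quite small. The sketch below assumes a Python build script; the variable holds a count of seconds since the Unix epoch, and the helper names `build_timestamp` and `version_banner` are hypothetical.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_timestamp(environ: Optional[Mapping[str, str]] = None) -> datetime:
    """Return the timestamp to embed in the build output.

    Uses SOURCE_DATE_EPOCH when it is set, and falls back to the
    current time otherwise (which makes the build non-reproducible).
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw is not None else int(time.time())
    # Always format in UTC so the builder's time zone cannot leak in.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def version_banner(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical example of a string that ends up in the binary."""
    stamp = build_timestamp(environ)
    return f"{name} built {stamp:%Y-%m-%d %H:%M:%S} UTC"
```

A packager would typically export the variable from the date of the last commit or changelog entry, so that every rebuild of the same source embeds the same string.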

There are other common problems that make bit-for-bit identical binaries difficult. That includes things like time zones, file sort order, build paths, and locales. At this point, the project is working on the "last mile" problems; work is progressing on handling build path differences, for example.
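File sort order and stray metadata are easy to demonstrate with an archive. The following sketch, which assumes the files are already in memory and uses a hypothetical `deterministic_tar` helper, writes members in sorted order with fixed ownership and modification time, so that the same inputs always produce the same bytes.

```python
import io
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes], mtime: int = 0) -> bytes:
    """Build an uncompressed tar archive whose bytes depend only on
    the file names, the file contents, and the given mtime."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # sorted() compares code points, so the order does not depend
        # on the locale or on the order the filesystem returns entries.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime           # fixed, not the time of the build
            info.mode = 0o644
            info.uid = info.gid = 0      # do not record the builder's identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

The archive is left uncompressed here to keep the example focused; the `mtime` argument could be taken from SOURCE_DATE_EPOCH.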

He noted that he had mostly talked about Debian, but there are a "huge number of other projects" that are also working on the problem. Several Linux distributions (Fedora, openSUSE, Tails, Arch) are part of the effort, as are applications such as Bitcoin and Tor Browser. NixOS and GNU Guix are particularly interesting because they already incorporate the idea of reproducibility to some extent.

Moving forward, he said, there is of course more work to do. Since Debian can reproducibly build 95% of its 25,000 packages, though, it is clearly edging out of the proof-of-concept stage. He would like to see a way for users to be able to only install reproducible packages and to be able to specify a threshold of other users who have built the code identically before a package will be installed. Eventually distributions with support for that will come out; Debian will be one of them, but not in the next release that is due soon. He would also like to see reproducible builds as a standard development practice in the free-software world.
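The threshold idea could look something like the sketch below. It is purely hypothetical: no such interface was described in the talk, and the names `is_installable`, `official_hash`, and `rebuilder_hashes` are invented for the example.

```python
from typing import Iterable


def is_installable(official_hash: str,
                   rebuilder_hashes: Iterable[str],
                   threshold: int) -> bool:
    """Allow installation only if enough independent rebuilders
    produced exactly the hash of the package being offered."""
    matching = sum(1 for reported in rebuilder_hashes if reported == official_hash)
    return matching >= threshold
```

With a threshold of two, a package whose hash was confirmed by two of three rebuilders would be accepted, while one confirmed by a single rebuilder would not.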

He concluded by thanking several organizations that have supported the developers working on the project: the Core Infrastructure Initiative, ProfitBricks, and Codethink. He also thanked the developers and others who are working hard on reproducible builds. He reminded attendees of the upcoming workshop and suggested that they bring their favorite project along to work on making it reproducibly buildable.

[I would like to thank the Linux Foundation for travel assistance to Cambridge, MA for LibrePlanet.]

Brief items

Development quotes of the week

...if you are just somebody that would like to start contributing with anything:
  • Choose a project that you like
  • Download the code
  • Compile the application
  • Choose anything easy to be your first fix
  • Create a Fix for that
  • Let it sink for a few hours and feel the inner peace
  • Go outside, see a movie, go on a date.
  • When you are back, take a *deep breath*
  • Send the patch
  • it’s ok if you faint later
Tomaz Canabrava

Let me disabuse you of any myths. I have worked in software for 20 years. I have worked in large enterprises, and scrappy startups. This software is by FAR the largest, most complex codebase I have ever interacted with. Submission of any new code was seriously considered and reviewed before it entered production (sometimes to a pedantic degree), after which JD put all new code through 10s of thousands of hours of testing on production equipment. Production and release cycles take on the order of months to ensure that we don't kill people. These are not riding lawnmowers. They are 30-ton combines, and 20 ton tractors tilling fields, with massive horsepower behind them. They have a real potential to end peoples lives in the event of failure, and these tractors do (in testing) fail in spectacular ways. If a team of hundred of engineers struggle with their codebase internally, Joe Farmer isn't going to have a fucking clue how to repair their software correctly.

Now should you, in theory, have the right to modify equipment you own? Sure. Absolutely. Hell, John Deere tractors run on open source software. But trust me on this, locking this down is a very good idea.

If you have the drive to make open source tractor software AND can make absolutely certain no-one ever dies from code you write, then go do it. Just keep in mind that the engineers that work on this shit really care about keeping people safe.

throwaway_jddev (Thanks to Paul Wise)

The DRM community really has come a long, long, way. Great to see it so thriving and healthy that people are actively dusting off ancient drivers which never got merged, deleting most of them in the process, and getting them in just because the process works so well.
Daniel Stone

Cap'n Proto's capability system does not allow one to send a promise to a third party. It's possible in theory but in practice it'll lead to pain, suffering and CORBA.
Cyberax (Thanks to Jeroen Nijhof)

The new contribution workflow for GNOME

The GNOME Project has announced a streamlined contribution system built around a Flatpak-based build system. "No specific distribution required. No specific version required. No dependencies hell. Reproducible, if it builds for me it will build for you. All with an UI and integrated, no terminal required. Less than five minutes of downloading plus building and you are contributing."

Haas: New Features Coming in PostgreSQL 10

Here's an extensive summary of new features in the upcoming PostgreSQL 10 release from Robert Haas. "PostgreSQL has had physical replication -- often called streaming replication -- since version 9.0, but this requires replicating the entire database, cannot tolerate writes in any form on the standby server, and is useless for replicating across versions or database systems. PostgreSQL has had logical decoding -- basically change capture -- since version 9.4, which has been embraced with enthusiasm, but it could not be used for replication without an add-on of some sort. PostgreSQL 10 adds logical replication which is very easy to configure and which works at table granularity, clearly a huge step forward. It will copy the initial data for you and then keep it up to date after that."

Nginx 1.12 Released

The Nginx web server version 1.12 has been released, "incorporating new features and bug fixes from the 1.11.x mainline branch - including variables support and other improvements in the stream module, HTTP/2 fixes, support for multiple SSL certificates of different types, improved dynamic modules support, and more." The changelog has more details.

Portable Computing Language (pocl) v0.14 released

Pocl aims to become a performance portable open source (MIT-licensed) implementation of the OpenCL standard. Version 0.14 adds support for LLVM/Clang 4.0 and 3.9 and a new binary format that enables running OpenCL programs on hosts without online compiler support. There is also initial support for out-of-order command queue task scheduling and plenty of bug fixes.

Stone: Ubuntu rejoins the GNOME fold

Daniel Stone considers the future of the Linux desktop in the light of Ubuntu's return to GNOME. "The world in 2017, however, is a very different place. KMS provides us truly device-independent display control, Vulkan and EGL provide us GPU acceleration independent of window system, xkbcommon provides shared keyboard mechanics, and logind lets us do all these things without ever being root. GBM allocates our buffers, and the universal allocator, borne out of discussions with the whole community including NVIDIA, will soon join the family. Mir leans heavily on all these technologies, so the change is a bit less seismic than you might think."

Page editor: Rebecca Sobol
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds