
Lots of progress for Debian's reproducible builds

By Jake Edge
January 21, 2015

Over the last year or two, there has been a lot of talk about "reproducible builds"; that is, ensuring that two builds of a given source package produce byte-for-byte identical binaries. Projects like Bitcoin and Tor have a strong interest in allowing their users to verify that the binaries they distribute correspond exactly to the published source code. For Linux distributions, doing the same for their repositories is a much bigger job: hundreds or thousands of source code packages would need to be built in a reproducible way.

As it turns out, at least one distribution is taking that job on. The Debian Reproducible Builds project has recently gotten more than 80% of packages to build reproducibly, as Jérémy Bobbio (aka Lunar) reported. It requires an experimental toolchain to do so, but now covers some 17,000+ packages. Given that Debian's package repository is generally a superset of other distributions' repositories (or close), the work the project is doing should, at minimum, provide other interested distributions with pointers toward ... well ... reproducing this work for themselves.

There are a number of issues that stand in the way of reproducible (or deterministic) builds. First off, the contents of the binaries built for each package are dependent on the build environment, which includes things like tool versions, system time, build paths, host names, and so on. There are also a few more subtle factors; for example, both the ordering of file names in the filesystem and the locale affect how tar creates an archive file, so two seemingly identical filesystem trees can produce different tar files on different systems. Once you have handled all of those factors, though, it is also necessary to record that information with the package so that others can duplicate the results.

The solution to the latter problem for Debian is the .buildinfo file, which is based on the format of the .changes file (which describes an upload to the Debian archive software). .buildinfo records all of the packages required to build the package, along with the version numbers of each. It also has some basic information about the package, its version, hashes of the .deb files, the build path used, and so on. For a build to count as reproduced, the .deb files of the same package and version that are built on separate machines must all match the hashes in .buildinfo.
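As a rough illustration of that hash check, the sketch below compares rebuilt files against a set of recorded digests. It is only a minimal sketch: the `recorded` dictionary is a hypothetical stand-in for the hashes a .buildinfo file carries, the function names are invented for the example, and no attempt is made to parse the real file format.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_artifacts(recorded, build_dir):
    """Return the names whose rebuilt file is missing or hashes differently.

    `recorded` maps file name -> expected hex digest. This structure is an
    assumption made for the example, not the actual .buildinfo layout.
    """
    bad = []
    for name, expected in sorted(recorded.items()):
        candidate = Path(build_dir) / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(name)
    return bad
```

An empty result means every rebuilt file matched its recorded digest; any name in the list points at a file that was not reproduced.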

The .buildinfo files can then be signed by Debian developers (DDs). The signature asserts that each signing DD was able to reproduce the package exactly using the information found in the file. Those signatures will be kept in separate files that are referenced from a "Build-Signed-Off-By" entry in the "Packages" files. The presence of those signatures will allow users to have confidence in the packages without actually rebuilding them (using the reproducible mechanism, of course) themselves.

For package maintainers who want to make their package reproducible, the project has a How-to page. It contains a recommendation that packagers use the debhelper packaging style, but has tips for those using other styles (including "roll your own"). The experimental toolchain contains modified versions of debhelper and cdbs to incorporate the changes needed for deterministic builds.

There is also a list of the kinds of problems a maintainer may encounter when trying to make their package build reproducibly. This includes issues like the data.tar file (which is the core of a .deb package) being created in the wrong order. The solution to that is to set the locale appropriately and to sort directory listings before handing them off to tar. There are also examples for dealing with timestamps in a whole raft of different kinds of generated files, as well as handling a number of other build problems that lead to non-deterministic packages.
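The ordering and timestamp fixes can be sketched in a few lines. This is not the project's actual tooling, just an illustration using Python's standard tarfile module; the function name and the choice of a fixed timestamp of zero are assumptions made for the example. Python's string sort compares code points, so it does not depend on the locale, which plays the role that setting the locale does for command-line tools.

```python
import io
import os
import tarfile


def deterministic_tar(root, fixed_mtime=0):
    """Return tar bytes for `root` that do not depend on directory listing
    order, file timestamps, or the owner of the files."""

    def normalize(info):
        # Clamp the fields that vary between build machines.
        info.mtime = fixed_mtime
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        return info

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # in-place sort fixes the order os.walk descends in
            for name in sorted(dirnames + filenames):
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, root)
                # recursive=False: each entry is added once, in sorted order
                archive.add(full, arcname=arcname, recursive=False, filter=normalize)
    return buffer.getvalue()
```

File permissions are left untouched here, so two builds still need the same umask to get identical archives; a fuller version would normalize those as well.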

Beyond the changes to debhelper and cdbs, the project has also changed a variety of other pieces of the Debian build infrastructure, including dpkg, build tools for various languages (e.g. Java, Python, R, Haskell), and certain library bindings (e.g. Qt for Python). Most of that work was to handle either timestamps or file-name-ordering problems. All of the changes are making their way upstream so that the normal toolchain can hopefully be used down the road.

While Debian is currently focused on the jessie (8.0) release, Bobbio would like to see reproducible builds become a focus for the following release:

Reproducible builds are not going to change anything for most of our users. They simply don't care how they get software on their computer. But they care to get the right software without having to worry about it. That's our responsibility, as developers. Enabling users to trust their software is important and a major contribution, we as Debian, can make to the wider free software movement. Once Jessie is released, we should make a collective effort to make reproducible builds [a] highlight of our next release.

It is clear that a lot of work has gone into the project over the last few months, with eye-opening results. A look at the project history shows that the whole effort has really only been going for a year and a half or so. There is undoubtedly a long tail of packages that will strongly resist reproducibility, so there is still lots of work to do. Given the progress so far, though, having Debian 9.0 be entirely reproducible doesn't seem out of reach.





Lots of progress for Debian's reproducible builds

Posted Jan 22, 2015 13:37 UTC (Thu) by pabs (subscriber, #43278) [Link]

FYI the .changes file is a set of instructions for how the archive software (dak) should alter the Debian archive.

Lots of progress for Debian's reproducible builds

Posted Jan 22, 2015 13:46 UTC (Thu) by dgm (subscriber, #49227) [Link] (1 responses)

> The Debian Reproducible Builds project has recently gotten more than 80% of packages to build reproducibly

That number.

Well, if it has taken 1.5 years to this point, the remaining 20% of the work will be done in 6 years, more or less.

Lots of progress for Debian's reproducible builds

Posted Jan 22, 2015 16:53 UTC (Thu) by Lunar^ (guest, #47323) [Link]

We are also tracking the status of specific package sets so we can focus our efforts on what matters to most users. Another aspect is that until now, it has been a priority of very few Debian contributors. If every maintainer starts to pay attention to build reproducibility, it will also help quite a bit for the remaining packages.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 22, 2015 22:07 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link] (18 responses)

This is really awesome. I've been following the reproducible (deterministic) build work with interest, and I'm really impressed with the speed of progress. Let's face it, 80% of the huge Debian repository, in such a short time, is quite an accomplishment. I'd like to add a few notes.

First, Debian's not the only one. Fedora is also working on reproducible builds, and I believe there are others.

Second, reproducible builds are also a step towards countering the trusting trust attack (attacks on toolchains). My approach for countering the trusting trust attack, called diverse double-compiling (DDC), first requires that the toolchain portions you care about have a reproducible (deterministic) build. If the whole toolchain is reproducible, it's suddenly much easier to use DDC to counter the trusting trust attack.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 22, 2015 22:33 UTC (Thu) by PaXTeam (guest, #24616) [Link] (16 responses)

DDC doesn't counter the trusting trust attack. it's only a method to produce a trusted toolchain by another, but it does *not* help producing the initial trusted toolchain - you have to do it the hard way, which was Thompson's point.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 22, 2015 22:41 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

You can use multiple compilers (PCC to compile Clang to compile GCC). It's theoretically possible for a sufficiently advanced superintelligent agent to write software that can recognize the source of these compilers, but it's exceedingly unlikely.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 22, 2015 22:56 UTC (Thu) by PaXTeam (guest, #24616) [Link] (8 responses)

it doesn't matter how many compilers you chain together, you have to start with a trusted one or Thompson's attack will apply. also pattern matching one compiler's AST is about the same work as matching another (read: scales linearly with the number of compilers), it's nothing to do with likeliness.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 23, 2015 12:44 UTC (Fri) by gnb (subscriber, #5132) [Link] (1 responses)

The work of backdooring the compiler scales linearly with the number of compilers, but I wouldn't expect the difficulty of making sure each of the relevant compilers ships with the backdoor included to be linear: you have to patch each of N compilers without any of these attempts being noticed.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 23, 2015 18:40 UTC (Fri) by paulj (subscriber, #341) [Link]

If only binaries came in standardised formats, and had general ways to inject code, and lots of well-known hooks to allow that code to execute (well-known hooks supplied by the format, by the runtime, and by the specific compiler code). Oh wait, they do.

In other news, "viruses" are a *lot* more sophisticated since Thompson's POC.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 26, 2015 10:11 UTC (Mon) by epa (subscriber, #39769) [Link] (4 responses)

I think the point is that while in principle Thompson's attack will apply, in practice if you have a big tower of compilers (pcc -> clang -> gcc -> gcc v1.0 -> clang -> etc) it will not be possible for the evil code to be propagated all the way through that chain because it's simply too complicated a problem.

As the other poster said you would need a "superintelligent agent" to write a program that's somehow capable of recognizing and Trojaning code from all these different compilers, plus goodness knows how many others that might be thrown into the mix.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 26, 2015 11:43 UTC (Mon) by tao (subscriber, #17563) [Link]

You could always hack binutils instead... Or make...

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 26, 2015 11:43 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

Even just restricting Thompson's attack to compilers (which is artificially restricting his point), it's not complicated at all, because in modern systems all those compilers come in standardised binary formats with some well-known places you can hook your own code into.

Though, I think some people here have unreasonable assumptions about how isolated different compiler authors are from each other. E.g. another article this very week is discussing GCC AST exports for Emacs, and contains a link to a GCC thread where Sun compiler people are posting patches to GCC. Personally, I think it's highly unreasonable to assume that people who develop very in-depth expertise in compilers will only ever work on one implementation. Not my anecdotal experience at all!

If you don't artificially restrict Thompson's attack, then you've still got the rest of the system to verify: all the software, firmware, microcode and hardware.

Good luck with that.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 26, 2015 22:23 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

So simply add a couple of cross-compilations to the mix. That also applies to make and other low-level utilities.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 28, 2015 18:27 UTC (Wed) by paulj (subscriber, #341) [Link]

When Debian finish making their build reproducible, they can then try to work on reproducible cross-compiles.

Though, I don't see why it would be impossible for an ELF virus to contain payloads for a variety of target architectures (especially those known to be used by the reproducible, cross-compile, build checker), and select the appropriate one to infect newly created files. So, I'm not sure what that would achieve.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted May 29, 2015 1:45 UTC (Fri) by indolering (guest, #102865) [Link]

Perfect security is a myth, the name of the game is increasing the attack cost. Deterministic builds allow us to dramatically increase the cost of certain attacks, which is a net gain in terms of security.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 23, 2015 19:03 UTC (Fri) by smoogen (subscriber, #97) [Link] (5 responses)

I would expect that people looking to attack repeatable builds would figure on where people are most likely to put blind trust in something and exploit that.

Easiest? If the .changes file has to do certain things to make sure the archive is altered to "meet matching criteria", stick something in there, which will get run every time, that inserts the exploit.

Hard? Everyone looks at the compiler to "deal with Thompson attack" but no one looks at m4, yacc, lex, Makefiles etc where it is easier to stick in some line noise that no one understands and then explain it away as an accident if they do.

Harder? If the system clock or other parts of the build system have to be derandomized to make sure that parts of various code are built exactly correct.. figure out which settings in such a build system are beneficial for your in-code exploit.

In any case, I could actually see where this sort of build system actually makes trust attacks easier because people will blindly say "well it built the same as the first one... must be clean." and not look at what actually was done to make it build exactly like the other part. [I also wonder what kinds of compiler optimizations have to be turned off to make sure that the code compiles the same.]

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 23, 2015 19:06 UTC (Fri) by smoogen (subscriber, #97) [Link]

I didn't put this in originally and should have. Nothing I am saying is meant to suggest that verifiable build systems aren't needed or don't have good uses. I just think that the amount of "This solves ...." is being overstated and quickly leads to -funroll-loops level of blindness.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 29, 2015 15:55 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

> I also wonder what kinds of compiler optimizations have to be turned off to make sure that the code compiles the same

Thanks to -frandom-seed, none to speak of. You just need to provide the same seed. (Well, none to speak of in the deterministic set of default optimizations. I presume that profile-guided optimizations are also out of the question, unless you distribute the .gcno files, etc, which is unlikely to happen!)

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Feb 3, 2015 22:29 UTC (Tue) by nix (subscriber, #2304) [Link] (2 responses)

Of course this is wrong. Profile-guided optimizations *that depend on nondeterministic test runs* are out of the question. If the test run is deterministic (thus the branches taken are unchanging across multiple runs), then of course profile-guided optimizations are not incompatible with reproducible builds.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Feb 4, 2015 8:55 UTC (Wed) by cesarb (subscriber, #6266) [Link] (1 responses)

Does profile-guided optimization add profiling code to all branches, or does it use a timer (and a signal) to sample the basic blocks? If it uses a timer, it won't be deterministic.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Feb 17, 2015 19:42 UTC (Tue) by nix (subscriber, #2304) [Link]

It instruments arcs between basic blocks, so it's deterministic.

Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues

Posted Jan 23, 2015 18:37 UTC (Fri) by paulj (subscriber, #341) [Link]

Your DDC work is cool, and people working on doing reproducible builds of entire distros is cool, but, really, it's *reinforcing* Thompson's point about trust-roots, not countering. Anyway....

Watch this one next Saturday

Posted Jan 27, 2015 0:15 UTC (Tue) by MarkVandenBorre (subscriber, #26071) [Link]

Reproducible builds: Moving Beyond Single Points of Failure for Software Distribution

Posted Jan 27, 2015 0:40 UTC (Tue) by jburgess777 (guest, #96085) [Link]

Lots of progress for Debian's reproducible builds

Posted Jan 29, 2015 7:33 UTC (Thu) by robbe (guest, #16131) [Link] (2 responses)

A bit of background for those that don't know it: Jérémy is a very active Tails¹ developer. Tails is one of the major anonymity environments² utilizing (and complementing) Tor, and is a Debian derivative.

It's great that Tails is not sitting on its changes, but contributing them upstream to Debian and beyond. Thanks to Jérémy and the others!

¹ https://tails.boum.org/
² Whonix is the other big one

Lots of progress for Debian's reproducible builds

Posted Jan 29, 2015 17:58 UTC (Thu) by Lunar^ (guest, #47323) [Link] (1 responses)

Sorry to disappoint you, but I have been mostly active in the Tor Project. My contributions to Tails have been sporadic at best. But you are right, contributing back to Debian is one of Tails' focuses.

Lots of progress for Debian's reproducible builds

Posted Jan 29, 2015 20:02 UTC (Thu) by robbe (guest, #16131) [Link]

I stand corrected (and am not disappointed)! Both projects are important work.


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds