Lots of progress for Debian's reproducible builds
Over the last year or two, there has been a lot of talk about "reproducible builds"; that is, for two builds of a given source package to produce byte-for-byte identical binaries. Projects like Bitcoin and Tor have a strong interest in allowing their users to verify that the binaries they distribute correspond exactly to the published source code. For Linux distributions, doing the same for their repositories is a much bigger job: many thousands of source packages would need to be built in a reproducible way.
As it turns out, at least one distribution is taking that job on. The Debian Reproducible Builds project has recently gotten more than 80% of packages to build reproducibly, as Jérémy Bobbio (aka Lunar) reported. It requires an experimental toolchain to do so, but now covers some 17,000+ packages. Given that Debian's package repository is generally a superset of other distributions' repositories (or close), the work the project is doing should, at minimum, provide other interested distributions with pointers toward ... well ... reproducing this work for themselves.
There are a number of issues that stand in the way of reproducible (or deterministic) builds. First off, the contents of the binaries built for each package depend on the build environment, which includes things like tool versions, system time, build paths, host names, and so on. There are also some more subtle factors. For example, both the order in which the filesystem returns file names and the locale affect how tar creates an archive file, so two seemingly identical filesystem trees can produce different tar files on different systems. Once all of those factors have been handled, though, that information must also be recorded with the package so that others can duplicate the results.
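To make the tar problem concrete, here is a minimal Python sketch that builds an archive whose bytes do not depend on directory order, locale, file timestamps, or ownership. It is only an illustration, not how Debian's own packaging tools do it. The fixed timestamp `FIXED_MTIME` and both function names are hypothetical choices. File permission bits are still taken from the tree, so they must already match between builds.

```python
import io
import os
import tarfile

# Hypothetical fixed timestamp (the Unix epoch); a real build would choose one deliberately.
FIXED_MTIME = 0


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Replace per-build metadata (modification time, owner) with fixed values."""
    info.mtime = FIXED_MTIME
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


def deterministic_tar(root: str) -> bytes:
    """Archive the tree under `root` with a stable member order and normalized metadata.

    Permission bits are copied from the tree as-is.
    """
    rel_paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    # sorted() compares code points, so the order depends neither on the
    # filesystem's directory order nor on the current locale.
    rel_paths.sort()

    buf = io.BytesIO()
    # Request the GNU format explicitly instead of relying on the library default.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in rel_paths:
            # recursive=False: every entry is added exactly once, in sorted order.
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False, filter=_normalize)
    return buf.getvalue()
```

Two copies of the same tree, created in a different order and at different times, should then yield identical archive bytes.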
For Debian, the solution to the latter problem is the .buildinfo file, which is based on the format of the .changes file (which indicates what has changed in a new version of a package). A .buildinfo file records all of the packages required to build the package, along with the version number of each. It also has some basic information about the package, its version, hashes of the .deb files, the build path used, and so on. A build counts as duplicated only if the .deb files for the same package and version, built on separate machines, all match the hashes recorded in the .buildinfo file.
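The verification step can be sketched in a few lines of Python. This sketch assumes the expected SHA-256 digests have already been read out of a .buildinfo file into a dictionary keyed by file name. Parsing the real .buildinfo format is not shown, and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large packages need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatches(build_dir: Path, recorded: dict[str, str]) -> list[str]:
    """Return names of recorded .deb files that are missing or whose hash differs.

    `recorded` maps file name -> expected hex SHA-256 (assumed already extracted
    from the .buildinfo file). An empty result means the rebuild matched.
    """
    bad = []
    for name, expected in sorted(recorded.items()):
        candidate = build_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            bad.append(name)
    return bad
```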
The .buildinfo files can then be signed by Debian developers (DDs). Each signature asserts that the signing DD was able to reproduce the package exactly using the information found in the file. Those signatures will be kept in separate files that are referenced from a "Build-Signed-Off-By" entry in the "Packages" files. The presence of those signatures will allow users to have confidence in the packages without having to rebuild them (using the reproducible mechanism, of course) themselves.
For package maintainers who want to make their package reproducible, the project has a How-to page. It contains a recommendation that packagers use the debhelper packaging style, but has tips for those using other styles (including "roll your own"). The experimental toolchain contains modified versions of debhelper and cdbs to incorporate the changes needed for deterministic builds.
There is also a list of the kinds of problems a maintainer may encounter when trying to make their package build reproducibly. These include issues like the data.tar file (which is the core of a .deb package) being created with its contents in a nondeterministic order. The solution to that is to set the locale appropriately and to sort directory listings before handing them off to tar. There are also examples of dealing with timestamps in a whole raft of different kinds of generated files, as well as of handling a number of other build problems that lead to non-deterministic packages.
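Compressed files are one common example of generated files that embed a timestamp. The gzip header has a 4-byte modification-time field, which is normally filled in with either the input file's modification time or the current time. The `gzip` command-line tool's `-n` option suppresses it. Below is a minimal Python sketch of pinning that field to a fixed value. The helper names and the choice of 0 as the fixed time are illustrative assumptions, and the `mtime` argument to `gzip.compress()` requires Python 3.8 or later.

```python
import gzip
import struct


def reproducible_gzip(data: bytes, mtime: int = 0) -> bytes:
    """Gzip `data` with a fixed header timestamp instead of the current time."""
    return gzip.compress(data, compresslevel=9, mtime=mtime)


def header_mtime(blob: bytes) -> int:
    """Read the little-endian MTIME field, stored in bytes 4-7 of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]
```

With the timestamp fixed, compressing the same input twice produces byte-identical output, however far apart the two runs are.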
Beyond the changes to debhelper and cdbs, the project has also changed a variety of other pieces of the Debian build infrastructure, including dpkg, build tools for various languages (e.g. Java, Python, R, Haskell), and certain library bindings (e.g. Qt for Python). Most of that work was to handle either timestamps or file-name-ordering problems. All of the changes are making their way upstream so that the normal toolchain can hopefully be used down the road.
While Debian is currently focused on the jessie (8.0) release, Bobbio would like to see reproducible builds become a focus for the following release:
It is clear that a lot of work has gone into the project over the last few months, with eye-opening results. A look at the project history shows that the whole effort has really only been going for a year and a half or so. There is undoubtedly a long tail of packages that will strongly resist reproducibility, so there is still lots of work to do. Given the progress so far, though, having Debian 9.0 be entirely reproducible doesn't seem out of reach.
Index entries for this article | |
---|---|
Security | Deterministic builds |
Posted Jan 22, 2015 13:37 UTC (Thu)
by pabs (subscriber, #43278)
Posted Jan 22, 2015 13:46 UTC (Thu)
by dgm (subscriber, #49227)
That number.
Well, if it has taken 1.5 years to this point, the remaining 20% of the work will be done in 6 years, more or less.
Posted Jan 22, 2015 16:53 UTC (Thu)
by Lunar^ (guest, #47323)
Posted Jan 22, 2015 22:07 UTC (Thu)
by david.a.wheeler (subscriber, #72896)
This is really awesome. I've been following the reproducible (deterministic) build work with interest, and I'm really impressed with the speed of progress. Let's face it, 80% of the huge Debian repository, in such a short time, is quite an accomplishment. I'd like to add a few notes.
First, Debian's not the only one. Fedora is also working on reproducible builds, and I believe there are others.
Second, reproducible builds are also a step towards countering the trusting trust attack (attacks on toolchains).
My approach for countering the trusting trust attack, called diverse double-compiling (DDC), first requires that the toolchain portions you care about have a reproducible (deterministic) build. If the whole toolchain is reproducible, it's suddenly much easier to use DDC to counter the trusting trust attack.
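For readers unfamiliar with DDC, its core check can be modeled in a few lines of Python. This is a simplified sketch, not Wheeler's tooling. Compiler binaries are treated as opaque bytes, and `execute` is a hypothetical callback standing in for "run this compiler binary on this source". The final byte-for-byte comparison is only meaningful if the compiler build is reproducible.

```python
from typing import Callable

# Hypothetical stand-in: execute(compiler_binary, source) -> output binary.
Execute = Callable[[bytes, bytes], bytes]


def diverse_double_compile(
    compiler_source: bytes,
    compiler_binary: bytes,
    trusted_compiler: bytes,
    execute: Execute,
) -> bool:
    """Return True if regenerating the compiler via a trusted compiler
    reproduces `compiler_binary` bit for bit."""
    # Stage 1: build the compiler's source with an independent, trusted compiler.
    stage1 = execute(trusted_compiler, compiler_source)
    # Stage 2: use that freshly built compiler to compile its own source again.
    stage2 = execute(stage1, compiler_source)
    # With a deterministic build, an honest compiler_binary must equal stage2.
    return stage2 == compiler_binary
```

A failed comparison has two possible causes. Either the binary does not correspond to its source, or the build is simply not deterministic. The check alone cannot tell those apart, which is why reproducibility has to come first.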
Posted Jan 22, 2015 22:33 UTC (Thu)
by PaXTeam (guest, #24616)
Posted Jan 22, 2015 22:41 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 22, 2015 22:56 UTC (Thu)
by PaXTeam (guest, #24616)
Posted Jan 23, 2015 12:44 UTC (Fri)
by gnb (subscriber, #5132)
Posted Jan 23, 2015 18:40 UTC (Fri)
by paulj (subscriber, #341)
In other news, "viruses" have become a *lot* more sophisticated since Thompson's POC.
Posted Jan 26, 2015 10:11 UTC (Mon)
by epa (subscriber, #39769)
As the other poster said, you would need a "superintelligent agent" to write a program that's somehow capable of recognizing and Trojaning code from all these different compilers, plus goodness knows how many others that might be thrown into the mix.
Posted Jan 26, 2015 11:43 UTC (Mon)
by tao (subscriber, #17563)
Posted Jan 26, 2015 11:43 UTC (Mon)
by paulj (subscriber, #341)
Though, I think some people here have unreasonable assumptions about how isolated different compiler authors are from each other. E.g. another article this very week is discussing GCC AST exports for Emacs, and contains a link to a GCC thread where Sun compiler people are posting patches to GCC. Personally, I think it's highly unreasonable to assume that people who develop very in-depth expertise in compilers will only ever work on one implementation. Not my anecdotal experience at all!
If you don't artificially restrict Thompson's attack, then you've still got the rest of the system to verify: all the software, firmware, microcode and hardware.
Good luck with that.
Posted Jan 26, 2015 22:23 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 28, 2015 18:27 UTC (Wed)
by paulj (subscriber, #341)
Though, I don't see why it would be impossible for an ELF virus to contain payloads for a variety of target architectures (especially those known to be used by the reproducible, cross-compile, build checker), and select the appropriate one to infect newly created files. So, I'm not sure what that would achieve.
Posted May 29, 2015 1:45 UTC (Fri)
by indolering (guest, #102865)
Posted Jan 23, 2015 19:03 UTC (Fri)
by smoogen (subscriber, #97)
Easiest? If the .changes file has to do certain things to make sure the archive is altered to "meet matching criteria", stick something in there that gets run every time and inserts the exploit.
Hard? Everyone looks at the compiler to "deal with the Thompson attack", but no one looks at m4, yacc, lex, Makefiles, etc., where it is easier to slip in some line noise that no one understands and then explain it away as an accident if anyone notices.
Harder? If the system clock or other parts of the build system have to be derandomized to make sure that various parts of the code are built exactly the same, figure out which settings in such a build system are beneficial to your in-code exploit.
In any case, I could see this sort of build system actually making trust attacks easier, because people will blindly say "well, it built the same as the first one... must be clean" and not look at what was actually done to make it build exactly like the other one. [I also wonder what kinds of compiler optimizations have to be turned off to make sure that the code compiles the same.]
Posted Jan 23, 2015 19:06 UTC (Fri)
by smoogen (subscriber, #97)
Posted Jan 29, 2015 15:55 UTC (Thu)
by nix (subscriber, #2304)
Posted Feb 3, 2015 22:29 UTC (Tue)
by nix (subscriber, #2304)
Posted Feb 4, 2015 8:55 UTC (Wed)
by cesarb (subscriber, #6266)
Posted Feb 17, 2015 19:42 UTC (Tue)
by nix (subscriber, #2304)
Posted Jan 23, 2015 18:37 UTC (Fri)
by paulj (subscriber, #341)
Posted Jan 27, 2015 0:15 UTC (Tue)
by MarkVandenBorre (subscriber, #26071)
Posted Jan 27, 2015 0:40 UTC (Tue)
by jburgess777 (guest, #96085)
Description: http://events.ccc.de/congress/2014/Fahrplan/events/6240.html
Posted Jan 29, 2015 7:33 UTC (Thu)
by robbe (guest, #16131)
It's great that Tails is not sitting on its changes, but contributing them upstream to Debian and beyond. Thanks to Jérémy and the others!
¹ https://tails.boum.org/
Posted Jan 29, 2015 17:58 UTC (Thu)
by Lunar^ (guest, #47323)
Posted Jan 29, 2015 20:02 UTC (Thu)
by robbe (guest, #16131)
We are also tracking the status of specific package sets so we can focus our efforts on what matters to most users. Another aspect is that, until now, it has been a priority for very few Debian contributors. If every maintainer starts paying attention to build reproducibility, that will also help quite a bit with the remaining packages.
Awesome! Fedora also, and this is a step towards countering "Trusting Trust" toolchain issues
Perfect security is a myth, the name of the game is increasing the attack cost. Deterministic builds allow us to dramatically increase the cost of certain attacks, which is a net gain in terms of security.
I also wonder what kinds of compiler optimizations have to be turned off to make sure that the code compiles the same
Thanks to -frandom-seed, none to speak of. You just need to provide the same seed. (Well, none to speak of in the deterministic set of default optimizations. I presume that profile-guided optimizations are also out of the question, unless you distribute the .gcno files, etc, which is unlikely to happen!)
Watch this one next Saturday
Reproducible builds: Moving Beyond Single Points of Failure for Software Distribution
Slides: http://events.ccc.de/congress/2014/Fahrplan/system/attach...
Video: http://media.ccc.de/browse/congress/2014/31c3_-_6240_-_en...
² Whonix is the other big one
Sorry to disappoint you, but I have been mostly active in the Tor Project. My contributions to Tails have been sporadic at best.
But you are right, contributing back to Debian is one of Tails' priorities.