By Jake Edge
June 26, 2013
Ensuring that the binary installed on a system corresponds to the source code for a project can be a tricky task. For those who build their own
packages (e.g. Gentoo users), it is in principle a lot easier, but most
Linux users probably delegate that job to their distribution's build
system. That does turn the distribution into a single point of failure,
however—any
compromise of its infrastructure could lead to the release of malicious
packages. Recognizing when that kind of compromise has happened, so that
alarms can be sounded, is not particularly easy to do, though it may become
fairly important over the coming years.
So how do security-conscious users determine that the published source code
(from the project or the distribution) is being reliably built into the
binaries that get installed on their systems? And how do regular users
feel confident that they are not getting binaries compromised by an
attacker (or government)? It is a difficult problem to solve, but it is also important to do so.
Last week, we quoted Mike Perry in our "quotes of the week", but he had
more to say in that liberationtech
mailing list post: "I don't believe that software development
models based on single party trust can actually be secure against
serious adversaries anymore, given the current trends in computer
security and 'cyberwar'". His argument is that the playing field
has changed; we are no longer just defending against attackers with limited
budgets who are mostly interested in monetary rewards from their efforts.
He continued:
This means that software development has to evolve beyond the simple
models of "Trust my gpg-signed apt archive from my trusted build
machine", or even projects like Debian going to end up distributing
state-sponsored malware in short order.
There are plenty of barriers in the way.
Even building from source will not necessarily result in a bit-for-bit
copy of the binary in question; things like the link time stamp or
build-path strings embedded within the binary may be different.
Differences in the toolchains used, in the contents of the system's or
dependencies' include files, or other variations that don't affect the
operation of the program may also lead to differences in the resulting
binary.
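The effect of an embedded time stamp can be modeled in a few lines of Python. The build() function here is purely illustrative, standing in for a real compiler that bakes __DATE__/__TIME__-style strings into its output:

```python
import hashlib

def build(source: bytes, build_time: str) -> bytes:
    # Toy "build": the output embeds the source plus a time stamp,
    # mimicking how real binaries pick up compile-time strings.
    return source + b"\nBUILT-AT: " + build_time.encode()

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

source = b"int main(void) { return 0; }"

# Two builders compiling at different moments: the hashes differ even
# though the source is identical.
a = build(source, "2013-06-20 10:00:00")
b = build(source, "2013-06-21 11:30:00")
print(sha256(a) == sha256(b))  # False

# Pinning the whole environment, time stamp included, restores
# bit-for-bit agreement.
c = build(source, "2013-06-01 00:00:00")
d = build(source, "2013-06-01 00:00:00")
print(sha256(c) == sha256(d))  # True
```

The same principle applies to any other embedded environment detail: only by fixing every input does the output become reproducible.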
The only reliable way to reproduce a binary is to build it in the
exact same environment (including time stamps) that it was
originally built in. For binaries
shipped by distributions, that may be difficult to do, as all of the
needed build-environment information may not (yet?) be available. But two
people can pull
the code from a repository, build it in the same environment with the same
build scripts, and each bit in the resulting binaries should be the same.
That is the idea behind Gitian, which
uses virtualization or containers to create identical build
environments in
multiple locations. Using Gitian, two (or more) entities can build from
source and compare the hashes of the binaries, which should be the same.
In addition, those people, projects, or organizations can sign the hash
with their GPG key, providing a "web of trust" for a specific binary.
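The verification step can be sketched in a few lines of Python. The attest()/verify() helpers and the quorum rule are illustrative assumptions, not Gitian's actual interface; in real Gitian the published hashes are also GPG-signed:

```python
import hashlib

def attest(builder: str, binary: bytes) -> dict:
    # Each builder independently builds the binary and publishes a
    # (builder, hash) record; real Gitian attestations are GPG-signed.
    return {"builder": builder, "sha256": hashlib.sha256(binary).hexdigest()}

def verify(attestations: list, binary: bytes, quorum: int = 2) -> bool:
    # Accept the binary only if at least `quorum` independent builders
    # published the same hash that we compute locally.
    local = hashlib.sha256(binary).hexdigest()
    matching = [a for a in attestations if a["sha256"] == local]
    return len(matching) >= quorum

binary = b"\x7fELF...the shipped build..."
records = [attest("alice", binary), attest("bob", binary)]
print(verify(records, binary))            # both builders agree: accept
print(verify(records, b"tampered bits"))  # hash mismatch: reject
```

The more independent builders who sign the same hash, the harder it becomes for an attacker to compromise the binary without detection.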
Gitian is being used by projects like Bitcoin and the Tor browser (work on the latter was done by
Perry), both of which are particularly
security-sensitive. In both cases, Gitian is used to set up an
Ubuntu container or virtual machine (VM) to build the binaries (for Linux,
anyway), so support for other distributions (or even different versions of
Ubuntu)
would require a different setup.
That points to a potential problem with Gitian: scalability. For a few
different sensitive projects, creating the scripts and information needed
to build the containers or VMs of interest may not be a huge hurdle. But
for a distribution to set things up so that all of its packages can
be independently verified may well be. In addition, there is a question of
finding people to actually build all the packages so that the hashes can be
compared. Each time a distribution updates its toolchain or the package
dependencies,
those changes would need to be reflected in the Gitian (or some similar
system) configuration, and packages would need to be built by both the
distribution and at least one other "trusted" entity (more is better, of
course) before consumers (users) could fully trust the binaries. Given the
number of packages, Linux distributions, and toolchain versions, the result
would be
a combinatorial explosion of required builds.
Beyond that, though, there is still an element of trust inherent in that
verification method. The compiler and other parts of the toolchain are
being trusted to produce correct code for the source in question. The
kernel's KVM and container implementations are also being trusted.
Subverting a binary via the compiler would require a "Trusting Trust" type
of attack, while some kind of nefarious (and undetected) code in the kernel
could potentially accomplish the same thing.
The diversity of the underlying
kernels (i.e. the host kernel for the container or VM) may help alleviate
most of the concern with that problem—though it can never really eliminate
it. Going deeper, the hardware itself could be malicious. That may sound
overly paranoid (and may in fact be), but when lives are possibly on the
line, as with the Tor browser, it's important to at least think about the
possibility. Money, too, causes paranoia levels to rise; hence Bitcoin's
interest in verification.
In addition to those worries, there is yet another: source code auditing.
Even if the compiler is reproducibly creating "correct" binaries from the
input source code, vigilance is needed to ensure that nothing untoward
slips into the source. In something as complex as a browser, even
run-of-the-mill bugs could be dangerous to the user, but some kind of
targeted malicious code injection would be far worse. In the free software
world, we tend to be sanguine about the dangers of malicious source code
because it is all "out in the open". But if people aren't actually
scrutinizing the source code, malicious code can sneak in, and we have seen
some instances of that in the past. Source availability is no panacea for
avoiding either intentional or accidental security holes.
By and large, the distributions do an excellent job of safeguarding their
repositories and build systems, but there have been lapses there as well
along the way. For now, trusting the distributions or building from source
and trusting the compiler—or some intermediate compiler—is all that's
available. For certain packages, that may change somewhat using Gitian or
similar schemes.
The compiler issue may be alleviated with David A. Wheeler's Diverse Double-Compiling
technique, provided there is an already-trusted compiler at
hand.
As Wheeler has said, one can write a new
compiler to use in the diverse double-compilation, though that compiler
needs to be compiled itself, of course. It may be hard to imagine a
"Trusting Trust" style attack succeeding against a completely unknown
compiler, but it isn't impossible.
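The logic of diverse double-compiling can be modeled in a deliberately simplified sketch. This is not Wheeler's actual procedure, just a toy in which a compiler binary is reduced to a dict and compilation to one function; a backdoored binary re-inserts its backdoor when compiling the (honest) compiler source, as in Thompson's "Trusting Trust" attack:

```python
# Toy model of Diverse Double-Compiling (DDC).

def compile_compiler(source_is_honest: bool, using: dict) -> dict:
    # The output binary carries a backdoor if the source asks for one,
    # or if the compiling binary propagates its own.
    return {"backdoor": (not source_is_honest) or using["backdoor"]}

trusted = {"backdoor": False}   # independent, trusted compiler
suspect = {"backdoor": True}    # distributed binary under test

# Stage 1: build the (honest) compiler source with the trusted compiler.
stage1 = compile_compiler(True, using=trusted)
# Stage 2: rebuild the same source with the stage-1 output.
stage2 = compile_compiler(True, using=stage1)
# Self-build: the suspect binary compiling its own honest source.
self_build = compile_compiler(True, using=suspect)

# A bit-for-bit mismatch between the two results exposes the trojan.
print("suspect matches trusted rebuild:", stage2 == self_build)
```

A clean suspect binary would produce a matching stage-2 result; the comparison only works, of course, to the extent that the "diverse" compiler really is trustworthy.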
As mentioned above, source-binary verification is a hard problem.
Something has to be trusted: the
hardware, the kernel, the compiler, Gitian, or, perhaps, the distribution.
It's a
little hard to see how Gitian could be applied to entire distribution
repositories, so looking into other techniques may prove fruitful. Simply
recording the entire build environment, including versions of all the tools
and dependencies, would make it easier to verify the correspondence of
source and binaries, but even simple tools (e.g. tar, as reported
by Jos van den Oever) may produce output with unexplained differences. For
now, focusing the effort
on the most security-critical projects may be the best we can do.
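Such a record might amount to a simple manifest of tool versions, serialized deterministically so that builders can compare a single digest. The tool names and version numbers below are made-up examples, not a proposal for an actual format:

```python
import hashlib
import json

def environment_manifest(tools: dict) -> str:
    # Serialize the recorded environment with sorted keys so that two
    # builders who recorded the same versions get the same digest,
    # regardless of the order in which the tools were recorded.
    blob = json.dumps(tools, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

builder_a = {"gcc": "4.7.3", "binutils": "2.23.2", "tar": "1.26"}
builder_b = {"tar": "1.26", "binutils": "2.23.2", "gcc": "4.7.3"}
builder_c = dict(builder_a, gcc="4.8.1")  # toolchain has drifted

print(environment_manifest(builder_a) == environment_manifest(builder_b))  # True
print(environment_manifest(builder_a) == environment_manifest(builder_c))  # False
```

A digest mismatch would not prove tampering, but it would at least flag that the two builds ran in different environments and cannot be expected to agree bit for bit.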