
Bootstrappable builds

Posted Jan 7, 2021 0:53 UTC (Thu) by dvdeug (subscriber, #10998)
Parent article: Bootstrappable builds

It's interesting, but it seems to be wandering into purely academic technicalities. Once you have bootstrapped a modern GCC from non-GCC source, you're done for GCC's part; you have a trusted compiler, in the sense of Ken Thompson's article, assuming you can trust the source code. The problem is not making the core smaller; the problem is that the GCC tarball, uncompressed, is 740 MB, and even after stripping away various documentation and non-C/C++ directories, it's still over 200 MB. How do you trust that?

In 2008, there was a problem with Debian SSH keys due to an actual patch to OpenSSH in Debian. That one was accidental, and it was a plain patch to OpenSSH. It would have been harder, but possible, to do it intentionally via a patch to GCC, so that it would recognize OpenSSH and miscompile it as needed. It could all be written out in source code, and nobody would be the wiser unless they knew what they were doing GCC-wise and were poking at that section of code.

It's not a bad concern, but at this point it seems to be more something fun and interesting than something that provides any more trust in practice.



Bootstrappable builds

Posted Jan 7, 2021 3:13 UTC (Thu) by pabs (subscriber, #43278) [Link]

For trusting source code, there will always be too much code for one developer or one organisation to review. So we need distributed code review, which is being worked on in the Rust community.

https://github.com/crev-dev/crev
https://github.com/crev-dev/cargo-crev

Bootstrappable builds

Posted Jan 8, 2021 3:23 UTC (Fri) by goraxe (guest, #42374) [Link] (8 responses)

There has been malware in the wild that attacks toolchains, and software has shipped with backdoors inserted at software houses affected by this type of malware. There is no guarantee that a non-GCC C compiler can be trusted.

So bootstrapping from tiny, understandable first principles is pretty interesting, especially if the results are bit-for-bit comparable, as this gives cryptographic verification options.

I could see this having utility in build farms like Travis CI, PaaS systems like AWS Lambda, Google App Engine, etc. If you need truly trusted binaries, this seems like a very viable way of getting them.

Bootstrappable builds

Posted Jan 8, 2021 6:48 UTC (Fri) by dvdeug (subscriber, #10998) [Link] (7 responses)

I'm not sure I understand what type of malware you are referring to.

GCC bootstraps itself, which means the final GCC binaries for a given architecture and GCC version should not depend on which compiler you started with. If you start with two different compilers, you don't need to trust either of them absolutely: provided they came from different sources, any attack they carried would differ, so you can simply compare the final builds. If those binaries are the same, the starting compiler was truly irrelevant and the "trusting trust" attack is moot.
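The comparison step at the end of such a bootstrap is nothing more exotic than a byte-for-byte check of the final binaries (in practice `cmp -s` does the job). A minimal sketch, with hypothetical file names, just to make concrete what "the binaries are the same" means:

```c
#include <stdio.h>

/* Compare two files byte for byte; returns 1 if identical, 0 otherwise.
 * This is essentially what cmp(1) does; a mismatch anywhere, including
 * one file being shorter, counts as a difference. */
static int files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int ca, cb, same = 0;

    if (!fa || !fb)
        goto out;
    do {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca != cb)
            goto out;      /* bytes differ, or one file ended early */
    } while (ca != EOF);
    same = 1;              /* both hit EOF together with all bytes equal */
out:
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}
```

Run over the outputs of two independently seeded bootstraps, a result of 1 is what makes the starting compiler irrelevant.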

I don't see how this has utility in build farms, either. The place where bootstrapping matters is compilers, where attacks can be hidden in the binaries. You're not going to build GCC fresh on every system, and there's a serious question whether downloading a trusted source and building it on a million systems is any safer than downloading a trusted binary and running it on a million systems. If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

What about the linker?

Posted Jan 8, 2021 7:00 UTC (Fri) by eru (subscriber, #2753) [Link] (1 responses)

Should we not also take into account "ld", "ar", and the other binutils? I think the Thompson attack would also work with "ld" or any other tool that is used in the process of generating the final program.

So the bootstrap process should either start with a compiler that directly produces an executable, or also bootstrap the linker without depending on any existing linker.

What about the linker?

Posted Jan 10, 2021 1:11 UTC (Sun) by JoeBuck (guest, #2330) [Link]

The traditional GCC bootstrapping process can work with all of the binutils tools, built together in the same tree (this was pioneered by the Cygnus folks maybe 30 years ago); everything is built again with the new compiler, linker, and assembler to eliminate dependencies. We can demonstrate that the classic attacks in the Thompson paper either don't exist or have affected every C compiler since the dawn of the language: start with unrelated compilers, do the bootstraps, perhaps go through a number of compiler versions and even throw in cross-compilers, involving a mix of free and proprietary compilers, and verify that in the end the binaries are the same. (For some systems, timestamps have to be filtered out of object files when doing the comparison, but for most, we wind up with every byte identical.)

However, we still inherit dependencies from system libraries, and this can include macros and inline functions. Someone could perhaps sneak an attack into a system library function that needs to be coded in assembler for optimal performance, and have this wind up in the compiler. So efforts like these that start with a tiny compiler and a tiny library can eliminate that threat as well.

But I think efforts like this, while fascinating, are a lot less important than they used to be because the real threat these days is in the microcode, the system under the system.

Bootstrappable builds

Posted Jan 8, 2021 20:46 UTC (Fri) by josh (subscriber, #17465) [Link] (4 responses)

> GCC bootstraps itself, which means the final copy of GCC binaries for a certain architecture and GCC version should not depend on what compiler you started with.

As long as the GCC binary didn't have something added that subverts subsequent GCC binaries.

> If you start with two different compilers, you don't need to absolutely trust them; if they came from different sources and any attack they'd be using would be different, you can simply compare the final versions and if the binaries are the same, which starting compiler you used was truly irrelevant, and the "trusting trust" attack is moot.

Only if you start with two different independent compilers, though. The top-level comment of this thread just said "bootstrapped a modern GCC from non-GCC source", which doesn't say anything about diverse double-compilation (using two different non-GCC compilers to compile GCC).

> If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

Source is harder, though, for multiple reasons.

First, "trusting trust"-style attacks would be difficult to obfuscate; it's one thing to hide a security hole, and quite another to hide code in a compiler that detects a pattern in its input and modifies the output so that it affects subsequently compiled code.
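A toy sketch of the pattern-detection step being described, heavily hedged: this is not real compiler code, just an illustration of why such logic is hard to disguise as anything innocent. The `check_password` pattern and the substituted text are made up; a real attack would additionally have to recognize the compiler's own source to re-insert itself.

```c
#include <string.h>

/* Toy "trusting trust" pattern matcher: a compromised "compiler"
 * scans the source text it is given and, when it recognizes a
 * target, silently substitutes backdoored output.  Everything
 * here is hypothetical and vastly simplified. */
static const char *maliciously_compile(const char *source)
{
    /* "Recognize" the victim: anything containing a password check. */
    if (strstr(source, "check_password"))
        return "return 1; /* backdoor: accept any password */";
    /* All other input is passed through untouched. */
    return source;
}
```

Even this caricature contains an explicit string match and an explicit substitution; there is no innocent reading of such code, which is the point being made about source-level review.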

The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

And finally, malicious source code is more difficult to deny intent about. With a malicious binary, you could try to claim some internal process was subverted, or blame a random employee, or contractor, or other similar diversions. With malicious source code, you'll have a harder time blaming anything other than malice.

Bootstrappable builds

Posted Jan 12, 2021 23:49 UTC (Tue) by dvdeug (subscriber, #10998) [Link] (3 responses)

> Only if you start with two different independent compilers, though.

I was assuming that you compared against an existing GCC binary. You don't actually have to start from two different non-GCC compilers: if you start from one non-GCC compiler and compare the result with the product of an existing GCC binary, and the binaries are the same, then the attack isn't present. If you do want to start from two independent compilers, there are enough of them around.

Also, a "trusting trust" attack for GCC 2.7.2 released in 1995 that targets a chain of compilers eventually building GCC 10 for AMD64, an architecture released in 2000, is inconceivable. (Toss in a pass through Itanium if you think AMD64 is even mildly plausible.) It would be challenging enough to make the attack survive cross-compiling from GCC 10 for AMD64 to GCC 10 for MIPS/ARM/HPPA/PowerPC and back to GCC 10 for AMD64.

> Source is harder, though, for multiple reasons.

The OpenSSL bug was added through a patch. I'm not implying in any way that it wasn't an accident, but it was a serious security hole added through a source change. For our purposes, note that the patch "fixed" a latent bug: OpenSSL relied on reading uninitialized variables, and there's enough rules-lawyering about that on Stack Overflow that, whatever the actual standard says, a change that detected such a problem and "accidentally" opened up a similar hole, even if limited to certain circumstances, could be plausibly denied to be malicious.
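For context, the practical effect of that 2008 change was that the process ID became essentially the only entropy left in the pool, so with the default Linux PID range there were at most 32768 possible random streams per architecture. A toy model of that collapse (names and structure are illustrative, not OpenSSL's API):

```c
#include <stdlib.h>

/* Toy model of the 2008 Debian OpenSSL weakness: the entire
 * "entropy" feeding key generation is the process ID, which on
 * 32-bit Linux defaults to the range 1..32768.  An attacker can
 * simply enumerate every PID and precompute every possible key. */
static unsigned int toy_keygen(unsigned int pid)
{
    srand(pid);                   /* the PID is the whole seed */
    return (unsigned int)rand();  /* first "key" drawn from it */
}
```

With the keyspace that small, precomputing all possible keys is trivial, which is how the blacklists of bad keys could be generated.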

> The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

A bad actor wouldn't post it for review upstream; you toss it into Red Hat's or Debian's or FreeBSD's patches, or stick it into some insecure mirror's copy of the source. Or you use direct access to the git repository.

> And finally, malicious source code is more difficult to deny intent about.

If GCC is bootstrapping itself and producing a different binary from another GCC bootstrap started from a different compiler, there's almost certainly malicious action. (There have been cases where stage 2 and stage 3 don't match because the starting compiler miscompiled GCC, though not in an unsurvivable way; you can run a stage 4 and it will match stage 3 and the final stage from other builds.) Once you've discovered the "trusting trust" attack, you can disassemble the binary, and it will be obvious that malice was involved, because that couldn't happen by accident.

With source code, it'd be relatively easy to introduce a bug into a target like OpenSSL to open a security hole in a plausibly deniable way. Once we've established malice, if Debian or Red Hat were shipping a compromised source or binary, it would trace back to the same paths, and much the same group of people could have slipped it into the supply chain.

Again, the base issue is real, but I think when you start toggling in bootloaders, you've left real-world concerns behind.

Bootstrappable builds

Posted Jan 13, 2021 0:56 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)

> With source code, it'd be relatively easy to miscompile a bug into a target like OpenSSL to open a security hole in a plausibly deniable way.

Hasn't this already happened? Didn't somebody slip a "if (userid = 0) then" into some program a while back?
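For reference, the 2003 Linux kernel incident being half-remembered here combined the `=`-for-`==` trick with short-circuit evaluation, so the branch body never runs; the assignment's side effect is the entire payload. A simplified stand-in (the struct, the field, and the magic options value are not the kernel's real identifiers):

```c
/* Miniature of the 2003 backdoor attempt: "=" where "==" belongs.
 * The condition looks like an innocuous options check, but it
 * *assigns* uid 0 (root) as a side effect; since the assigned
 * value is 0, the && is false and the branch body never runs. */
struct task { int uid; };

static void buggy_check(struct task *current, int options)
{
    if (options == 0x101 && (current->uid = 0)) {
        /* never reached: (current->uid = 0) evaluates to 0 */
    }
}
```

A reviewer skimming past sees what looks like an error-path check; only the single missing `=` character gives it away, which is why it was plausible-looking enough to be slipped into a CVS mirror.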

And a lot of people are wondering if the NSA or whoever it was deliberately chose a bunch of Elliptic Curve Cryptography constants that were flawed to slip into a standard...

Cheers,
Wol

Bootstrappable builds

Posted Jan 13, 2021 3:20 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

I remember hearing that too, but wasn't it caught in a code review?

Bootstrappable builds

Posted Jan 13, 2021 4:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

That was the Linux kernel. An attacker hacked the public CVS mirror to include this code, but it was caught when Larry McVoy noticed that the BitKeeper history didn't match.

Here's the fine article from LWN: https://lwn.net/Articles/57135/

Debian 2008 keys bug

Posted Jan 8, 2021 23:55 UTC (Fri) by aaronmdjones (subscriber, #119973) [Link] (2 responses)

The patch was to OpenSSL, specifically its random number generator, not OpenSSH. OpenSSH just happens to use OpenSSL for its RSA key generation, and RSA key generation requires a good source of random numbers.

Debian 2008 keys bug

Posted Jan 9, 2021 2:11 UTC (Sat) by plugwash (subscriber, #29694) [Link] (1 responses)

Any key generation requires random numbers, and AIUI OpenSSH relied on OpenSSL for all its random number needs.

The key generation issue was awful, but at least you could recognise bad keys (Debian shipped an "openssh-blacklist" package for a long time because of this). Even worse was that traditional implementations of DSA use random numbers during the signature process and can leak bits of the key if that randomness is not sufficiently random.

This meant that any DSA key that had merely been used with the bad OpenSSL had to be considered compromised. Since there was no way of detecting such keys, this led to a ban on the use of DSA keys on Debian's infrastructure (no idea if other organisations followed suit).
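The algebra behind that DSA weakness is simple enough to show: an attacker who learns (or can predict) the per-signature nonce k recovers the private key x from a single signature, because s = k⁻¹(h + x·r) mod q rearranges to x = (s·k − h)·r⁻¹ mod q. A toy demonstration with a tiny prime q (real q is at least 160 bits; all parameters below are made up):

```c
/* Recover a DSA private key from one signature (r, s) on hash h
 * when the nonce k is known.  Toy modulus q = 23 in the test;
 * the arithmetic is identical at real sizes. */

static long modinv(long a, long q)   /* a^-1 mod q via Fermat; q prime */
{
    long r = 1, b = a % q, e = q - 2;
    while (e) {
        if (e & 1)
            r = r * b % q;
        b = b * b % q;
        e >>= 1;
    }
    return r;
}

static long recover_key(long r, long s, long h, long k, long q)
{
    long t = ((s * k - h) % q + q) % q;  /* s*k - h mod q, kept non-negative */
    return t * modinv(r, q) % q;         /* x = (s*k - h) * r^-1 mod q */
}
```

This is why merely *using* a DSA key with the broken random number generator was fatal: the key leaks even though the key itself was generated on a healthy system.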

Debian was very fortunate that, while it is theoretically possible to transfer keys between the GnuPG world and the openssl/openssh/X.509 world, it was enough of a PITA that people very rarely did. So GnuPG (which is the root of identity/trust in the Debian project) could still be considered safe.

Debian 2008 keys bug

Posted Jan 10, 2021 13:34 UTC (Sun) by aaronmdjones (subscriber, #119973) [Link]

> Any key generation requires random numbers, and AIUI OpenSSH relied on OpenSSL for all its random number needs.

Back then, it did, yes. OpenSSH 6.5 (adding support for Ed25519 keys) didn't arrive for another 6 years, and OpenSSH 6.8 (allowing it to be built without OpenSSL) didn't arrive for another year after that. These days you can build it without, and then it will use urandom(4) [Linux, among others] or arc4random(3) [OpenBSD].
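A minimal sketch of the /dev/urandom path on Linux (error handling abbreviated; modern code would prefer getrandom(2), or arc4random(3) where available, to avoid needing an open file descriptor at all):

```c
#include <stdio.h>
#include <stddef.h>

/* Fill buf with len bytes from the kernel CSPRNG via /dev/urandom.
 * Returns 0 on success, -1 on failure.  Sketch only: real code
 * would also guard against fd exhaustion and chroot environments,
 * which is part of why getrandom(2) was added. */
static int fill_random(unsigned char *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (!f)
        return -1;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```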


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds