
Bootstrappable builds

Posted Jan 8, 2021 3:23 UTC (Fri) by goraxe (guest, #42374)
In reply to: Bootstrappable builds by dvdeug
Parent article: Bootstrappable builds

There has been malware in the wild that attacks toolchains, and software has shipped with backdoors inserted because the software house that built it was affected by this type of malware. There is no guarantee that a non-GCC C compiler can be trusted.

So bootstrapping from tiny, understandable principles is pretty interesting, especially if the results are bit-for-bit comparable, as this gives cryptographic verification options.

I could see this having utility in build farms like Travis CI and in PaaS systems like AWS Lambda, Google App Engine, etc. If you need truly trusted binaries, this seems like a very viable way of getting them.



Bootstrappable builds

Posted Jan 8, 2021 6:48 UTC (Fri) by dvdeug (subscriber, #10998) [Link] (7 responses)

I'm not sure I understand what type of malware you are referring to.

GCC bootstraps itself, which means the final copy of GCC binaries for a certain architecture and GCC version should not depend on what compiler you started with. If you start with two different compilers, you don't need to absolutely trust them; if they came from different sources and any attack they'd be using would be different, you can simply compare the final versions and if the binaries are the same, which starting compiler you used was truly irrelevant, and the "trusting trust" attack is moot.
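The comparison described above can be sketched as a digest check on the two final compiler binaries. This is only an illustration: the function names `sha256_of` and `bootstraps_agree` are made up for this sketch, and it assumes each bootstrap has already been run to completion and left a single file to compare.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bootstraps_agree(binary_a: Path, binary_b: Path) -> bool:
    """True if two final-stage binaries are bit-for-bit identical.

    binary_a and binary_b are assumed to be the same GCC version for the
    same target, each bootstrapped from a different starting compiler.
    """
    return sha256_of(binary_a) == sha256_of(binary_b)
```

A match only says that the starting compiler did not influence the result; it says nothing about whether the source that both bootstraps were built from is trustworthy.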

I don't see how this has utility in build farms, either. The issue where bootstrapping matters is in compilers where attacks can be hidden in the binaries. You're not going to build GCC fresh on every system, and there's a serious question whether downloading a trusted source and building it on a million systems is any safer than downloading a trusted binary and running it on a million systems. If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

What about the linker?

Posted Jan 8, 2021 7:00 UTC (Fri) by eru (subscriber, #2753) [Link] (1 responses)

Should we not also take into account the "ld", "ar" and other bintools? I think the Thompson attack would also work with "ld" and any other tool that is used in the process of generating the final program.

So the bootstrap process should either start with a compiler that directly produces an executable, or also bootstrap the linker without depending on any existing linker.

What about the linker?

Posted Jan 10, 2021 1:11 UTC (Sun) by JoeBuck (guest, #2330) [Link]

The traditional gcc bootstrapping process can work with all of the binutils tools, built together in the same tree (this was pioneered by the Cygnus folks maybe 30 years ago); everything is built again with the new compiler, linker, and assembler to eliminate dependencies. We can demonstrate that the classic attacks in the Thompson paper either don't exist, or have affected every C compiler since the dawn of the language, by starting with unrelated compilers, doing the bootstraps, maybe going through a number of compiler versions and even throwing in cross-compilers, involving a mix of free and proprietary compilers, and verifying that in the end the binaries are the same (for some systems, timestamps have to be filtered out of object files when doing the comparison, but for most, we wind up with every byte identical).
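For the systems where timestamps have to be filtered out, the comparison amounts to masking known byte ranges before checking equality. The sketch below is generic and hypothetical: `equal_ignoring` is not part of any real tool, and the caller has to supply the byte ranges, which depend on the object-file format in use.

```python
from typing import Iterable, Tuple


def equal_ignoring(a: bytes, b: bytes,
                   ignored: Iterable[Tuple[int, int]]) -> bool:
    """Compare two object files, skipping the given [start, end) ranges.

    The ranges are assumed to cover fields such as embedded timestamps;
    where they sit is format-specific and is not worked out here.
    """
    if len(a) != len(b):
        return False
    masked_a, masked_b = bytearray(a), bytearray(b)
    for start, end in ignored:
        # Clamp to the buffer so a bad range cannot change the length.
        end = min(end, len(a))
        start = max(0, min(start, end))
        zeros = bytes(end - start)
        masked_a[start:end] = zeros
        masked_b[start:end] = zeros
    return masked_a == masked_b
```

Anything outside the masked ranges still has to match byte for byte, so the mask should be kept as narrow as the format allows.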

However, we still inherit dependencies from system libraries, and this can include macros and inline functions. Someone could perhaps sneak an attack into a system library function that needs to be coded in assembler for optimal performance, and have this wind up in the compiler. So efforts like these that start with a tiny compiler and a tiny library can eliminate that threat as well.

But I think efforts like this, while fascinating, are a lot less important than they used to be because the real threat these days is in the microcode, the system under the system.

Bootstrappable builds

Posted Jan 8, 2021 20:46 UTC (Fri) by josh (subscriber, #17465) [Link] (4 responses)

> GCC bootstraps itself, which means the final copy of GCC binaries for a certain architecture and GCC version should not depend on what compiler you started with.

As long as the GCC binary didn't have something added that subverts subsequent GCC binaries.

> If you start with two different compilers, you don't need to absolutely trust them; if they came from different sources and any attack they'd be using would be different, you can simply compare the final versions and if the binaries are the same, which starting compiler you used was truly irrelevant, and the "trusting trust" attack is moot.

Only if you start with two different independent compilers, though. The top-level comment of this thread just said "bootstrapped a modern GCC from non-GCC source", which doesn't say anything about diverse double-compilation (using two different non-GCC compilers to compile GCC).

> If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

Source is harder, though, for multiple reasons.

First, "trusting trust"-style attacks would be difficult to obfuscate; it's one thing to hide a security hole, and quite another to hide code that detects a code pattern from a compiler and modifies it such that it affects subsequently compiled code.

The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

And finally, malicious source code is more difficult to deny intent about. With a malicious binary, you could try to claim some internal process was subverted, or blame a random employee, or contractor, or other similar diversions. With malicious source code, you'll have a harder time blaming anything other than malice.

Bootstrappable builds

Posted Jan 12, 2021 23:49 UTC (Tue) by dvdeug (subscriber, #10998) [Link] (3 responses)

> Only if you start with two different independent compilers, though.

I was assuming that you compared to an existing GCC binary. You don't actually have to start from two different non-GCC compilers; if you start from one non-GCC compiler and compare it to the product of an existing GCC binary, if the binaries are the same, then the attack isn't present. If you want to start from two different independent compilers, there's enough of them around.

Also, a "trusting trust" attack for GCC 2.7.2 released in 1995 that targets a chain of compilers eventually building GCC 10 for AMD64, an architecture released in 2000, is inconceivable. (Toss in a pass through Itanium if you think AMD64 is even mildly plausible.) It would be challenging enough to make the attack survive cross-compiling from GCC 10 for AMD64 to GCC 10 for MIPS/ARM/HPPA/PowerPC and back to GCC 10 for AMD64.

> Source is harder, though, for multiple reasons.

The OpenSSL bug was added through a patch. I'm not implying in any way it wasn't an accident, but it was a serious security hole added through a source change. For our purposes, the patch fixed a latent bug; OpenSSL relied on reading uninitialized variables, and there's a large bit of rules lawyering on StackOverflow, enough that, whatever the actual standard says, a change that detected such a problem and "accidentally" opened up a similar bug, even if limited to certain circumstances, could be plausibly denied to be malicious.

> The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

A bad actor wouldn't post it for review upstream; you toss it into Red Hat's or Debian's or FreeBSD's patches, or stick it into some insecure mirror's copy of the source. Or you use direct access to the git repository.

> And finally, malicious source code is more difficult to deny intent about.

If GCC is bootstrapping itself and producing a different binary from another GCC bootstrap started from a different compiler, there's almost certainly malicious action. (There have been cases where stage 2 and stage 3 don't match because the starting compiler miscompiled GCC, though not fatally; in that case you can run a stage 4, and it will match stage 3 and the final stage from other builds.) Once you've discovered the "trusting trust" attack, you can disassemble the binary and it will be obvious that malice was involved, because that couldn't happen by accident.
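A stage-against-stage check like the one mentioned in the parenthetical can be sketched as a walk over two build trees. This is an illustration under assumptions, not GCC's own comparison step: `differing_files` is a hypothetical helper, and the two directories are assumed to hold the object files of two bootstrap stages in the same relative layout.

```python
import filecmp
from pathlib import Path
from typing import List


def differing_files(stage_a: Path, stage_b: Path,
                    pattern: str = "*.o") -> List[str]:
    """List files under stage_a that are missing or different in stage_b.

    Only files present in stage_a are checked; contents are compared
    byte for byte (shallow=False), not by size and mtime.
    """
    mismatches = []
    for file_a in sorted(stage_a.rglob(pattern)):
        rel = file_a.relative_to(stage_a)
        file_b = stage_b / rel
        if not file_b.is_file() or not filecmp.cmp(file_a, file_b, shallow=False):
            mismatches.append(str(rel))
    return mismatches
```

An empty list for stage 3 against stage 4 after a mismatch between stage 2 and stage 3 would correspond to the benign miscompilation case described above.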

With source code, it'd be relatively easy to miscompile a bug into a target like OpenSSL to open a security hole in a plausibly deniable way. Once we've established malice, if Debian or Red Hat were shipping a compromised source or binary, it would trace back to the same paths, and much the same group of people could have slipped it into the supply chain.

Again, the base issue is real, but I think when you start toggling in bootloaders, you've left real-world concerns behind.

Bootstrappable builds

Posted Jan 13, 2021 0:56 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)

> With source code, it'd be relatively easy to miscompile a bug into a target like OpenSSL to open a security hole in a plausibly deniable way.

Hasn't this already happened? Didn't somebody slip an "if (userid = 0) then" into some program a while back?

And a lot of people are wondering if the NSA or whoever it was deliberately chose a bunch of Elliptic Curve Cryptography constants that were flawed to slip into a standard...

Cheers,
Wol

Bootstrappable builds

Posted Jan 13, 2021 3:20 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

I remember hearing that too, but wasn't it caught in a code review?

Bootstrappable builds

Posted Jan 13, 2021 4:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

That was the Linux kernel. An attacker hacked the public CVS mirror to include this code, but it was caught when Larry McVoy noticed that the BitKeeper history didn't match.

Here's the fine article from LWN: https://lwn.net/Articles/57135/


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds