LWN: Comments on "Bootstrappable builds"

Bootstrappable builds

geert — Wed, 20 Jan 2021 10:15:30 +0000

Why not? It doesn't make a difference if the logic gates are implemented by relays or semiconductors.

Bootstrappable builds

immibis — Tue, 19 Jan 2021 17:57:59 +0000

Build the computer out of relays, surely.

Of course, such a computer will be at least the size of a refrigerator, and execute perhaps 10 instructions per second.

But you cannot possibly introduce fabrication defects into a relay-based design that passes its tests.

Now you can use this to bootstrap your software for other computers that can actually run at practical speeds.

Bootstrappable builds

gdt — Mon, 18 Jan 2021 03:54:13 +0000

The fabrication is shown to be correct using traceability. That is, every part of the proof is expressed in matching parts in hardware, and there is no additional hardware. This leads to a very different hardware design, one which will not perform well (e.g., it's desirable to have a very long instruction word, as that makes traceability easier, but there's a high cost to fetching such instructions from memory, especially since instruction caches and pipelines are very difficult to model, and so are usually not present).

That's the design issue for responding to Spectre. We want mathematical proof that processor designs don't leak state between processes. But we don't want to pay the price for the extreme proof and traceability of cryptographic processors.

Choosing hardware for bootstrapping and diverse double-compiling

GNUtoo — Sun, 17 Jan 2021 08:55:12 +0000

There are many issues that need to be fixed to get a robust free software and open source infrastructure.

Not all the issues affect everybody in the same way, so it's still good to fix them.

For instance for the Management Engine or equivalent, even if it's present in most recent computers, in some cases it's possible to avoid it completely.

For storage device (SSD, HDD, etc.) firmware, it's still possible to work around the problem by booting off an SPI flash or raw NAND and using LUKS on the mass storage device, in a way that makes it very difficult for the firmware to attack the host system through modification of the data, as everything on it is encrypted. For instance Coreboot/Libreboot+GRUB or u-boot/barebox + Linux + an initramfs can achieve that pretty easily.

So being able to take the compiler, and what was needed to produce it (both the software and hardware), out of the equation also makes trusting the software much easier.

In the case of Mes, as I understand it, it still depends on the system used to do the compilation, which includes both the software and the hardware. In addition, getting the same binary out of a diverse double-compilation only ensures that either the backdoor is the same in both or that neither has a backdoor.

The issue is also that while we have some information on real-world attacks (XcodeGhost) and we basically know what it takes to do very simple compiler modifications that propagate themselves inside subsequent compilers being built, it's hard to really understand the threat, as some of the companies and government agencies that work on offensive security have large budgets and try to keep a big part of their work secret. In addition, not everything they do is published (for instance Edward Snowden probably didn't retrieve and give everything that he had access to to the journalists, who probably didn't publish everything either).

That said, if we want to bootstrap a C compiler, we still need hardware and software.

If I understood correctly, with something like the stage0 implementation we won't need a kernel or an operating system, and given enough work it could be used to somehow bootstrap a compiler, kernel and operating system.

So I wonder what type of hardware would make sense to run a stage0 implementation:
- If you use Coreboot / Libreboot on a desktop (to avoid the issue of the embedded controller) with an I945 chipset (as there is free code to initialize the GPU / display controller), and find peripherals that you can somehow trust, you still end up having to build Coreboot / Libreboot, so it's probably not the best option here. And you probably cannot review the assembly of the Coreboot / Libreboot image as they are way too big. As for writing a smaller version of them, they'd probably still end up being quite complex if we need RAM or access to a display controller. We can use the CPU cache as RAM quite easily though, but I'm unsure if that would be sufficient for the display controller part of the GPU and a very basic stage0. Installing that code would also be quite challenging as you'd need to trust some SPI flash programmer as well.
- Another approach would be to find an ARM SoC that has a bootrom that has been dumped and reviewed, where users can easily input code, and that has a display controller that doesn't need complex software to be used. This still brings in the hardware as something users have to trust blindly.
- Yet another approach would be to use FPGAs like an ECP5 with something like LiteX and the free toolchain for it. Here you could review the HDL code but this brings in way more software dependencies as you need to actually produce the FPGA image.
- Another option would be to use very old and well known hardware (like an Altair 8800) and somehow manage to use that to bootstrap the stage0. Though they are probably not always easy to find.

There are also procedures in place that could increase trust through randomness: the key signing ceremony, which is a procedure to set up an HSM, can also be adapted for installing software, though not necessarily for producing it.

For instance, if you want to install Coreboot / Libreboot, you trust your SPI flash programmer, and you can build an image in a reproducible way, you can get random computers, remove the radios, install the software to talk to the SPI flash programmer, and make all of them read/write the image to the SPI flash in a random order, without giving any of them the ability to know if there will be another computer that will do the same thing right after. This way any computer can detect if what's on the SPI flash is what is supposed to be there or if another computer has modified it. Nor will any of the computers be able to predict whether it will be the last one to check that flash chip.
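The randomized read/write procedure described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions that are not part of the original description: exactly one computer is compromised, it always plants a tampered image (since it cannot tell whether it is last), and every honest computer checks a SHA-256 digest before rewriting the good image. The names `run_ceremony` and `escape_rate` are hypothetical.

```python
import hashlib
import random

# Stand-in for the reproducibly built firmware image (assumption).
GOOD_IMAGE = b"reproducibly built firmware image"
EXPECTED = hashlib.sha256(GOOD_IMAGE).hexdigest()


def run_ceremony(n_computers: int, malicious: int, rng: random.Random) -> bool:
    """Run one ceremony; return True if the tampering escaped detection.

    `malicious` is the index of the single compromised computer.
    """
    order = list(range(n_computers))
    rng.shuffle(order)  # nobody knows their position in advance
    flash = GOOD_IMAGE
    for computer in order:
        # Honest computers first verify what is currently on the flash chip.
        if computer != malicious and hashlib.sha256(flash).hexdigest() != EXPECTED:
            return False  # an honest computer noticed the modification
        # Then each computer writes: honest ones restore the good image,
        # the compromised one plants its own.
        flash = b"tampered image" if computer == malicious else GOOD_IMAGE
    # Tampering survives only if the compromised computer went last.
    return hashlib.sha256(flash).hexdigest() != EXPECTED


def escape_rate(n_computers: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated ceremonies in which tampering went unnoticed."""
    rng = random.Random(seed)
    escapes = sum(run_ceremony(n_computers, 0, rng) for _ in range(trials))
    return escapes / trials


print(escape_rate(5, 10_000))
```

Under this model the tampering goes unnoticed only when the compromised computer happens to be drawn last, so with n computers the rate should come out near 1/n.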

Denis.

Bootstrappable builds

Cyberax — Wed, 13 Jan 2021 04:02:08 +0000

That was the Linux kernel. An attacker hacked the public CVS mirror to include this code, but this was caught by Larry McVoy noticing that the BitKeeper history didn't match.

Here's the fine article from LWN: https://lwn.net/Articles/57135/

Bootstrappable builds

mathstuf — Wed, 13 Jan 2021 03:20:26 +0000

I remember hearing that too, but wasn't it caught in a code review?

Bootstrappable builds

Wol — Wed, 13 Jan 2021 00:56:41 +0000

> With source code, it'd be relatively easy to miscompile a bug into a target like OpenSSL to open a security hole in a plausibly deniable way.

Hasn't this already happened? Didn't somebody slip an "if (userid = 0) then" into some program a while back?

And a lot of people are wondering if the NSA or whoever it was deliberately chose a bunch of Elliptic Curve Cryptography constants that were flawed to slip into a standard...

Cheers,
Wol

Bootstrappable builds

dvdeug — Tue, 12 Jan 2021 23:49:26 +0000

> Only if you start with two different independent compilers, though.

I was assuming that you compared to an existing GCC binary. You don't actually have to start from two different non-GCC compilers; if you start from one non-GCC compiler and compare it to the product of an existing GCC binary, if the binaries are the same, then the attack isn't present. If you want to start from two different independent compilers, there's enough of them around.

Also, a "trusting trust" attack for GCC 2.7.2 released in 1995 that targets a chain of compilers eventually building GCC 10 for AMD64, an architecture released in 2000, is inconceivable. (Toss in a pass through Itanium if you think AMD64 is even mildly plausible.) It would be challenging enough to make the attack survive cross-compiling from GCC 10 for AMD64 to GCC 10 for MIPS/ARM/HPPA/PowerPC and back to GCC 10 for AMD64.

> Source is harder, though, for multiple reasons.

The OpenSSL bug was added through a patch. I'm not implying in any way it wasn't an accident, but it was a serious security hole added through source change. For our purposes, the patch fixed a latent bug; OpenSSL relied on reading uninitialized variables, and there's a large bit of rules lawyering on StackOverflow, enough that whatever the actual standard says, a change that detected such a problem and "accidentally" opened up a similar bug, even if limited to certain circumstances, could be plausibly denied to be malicious.

> The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

A bad actor wouldn't post it for review upstream; you toss it into Red Hat's or Debian's or FreeBSD's patches, or stick it into some insecure mirror's copy of the source. Or you use direct access to the git repository.

> And finally, malicious source code is more difficult to deny intent about.

If GCC is bootstrapping itself and producing a different binary from another GCC bootstrap started from a different compiler, there's almost certainly malicious action. (There have been cases where stage 2 and stage 3 won't match, because the starting compiler miscompiled GCC, but not in an unsurvivable way, but you can run a stage 4 and it will match stage 3 and the final stage from other builds.) Once you've discovered the "trusting trust" attack, you can disassemble the binary and it will be obvious that malice was involved, because that couldn't happen by accident.

With source code, it'd be relatively easy to miscompile a bug into a target like OpenSSL to open a security hole in a plausibly deniable way. Once we've established malice, if Debian or Red Hat were shipping a compromised source or binary, it would trace back to the same paths, and much the same group of people could have slipped it into the supply chain.

Again, the base issue is real, but I think when you start toggling in bootloaders, you've left real-world concerns behind.

Bootstrappable builds

eru — Sun, 10 Jan 2021 17:37:54 +0000

You are thinking of the VIPER. I recall reading a story about it in some magazine, possibly BYTE. Quick googling turned up the following 1987 paper from the Royal Signals and Radar Establishment, with a proper old military document vibe (marked UNCLASSIFIED, looks like an old photocopy):

https://apps.dtic.mil/dtic/tr/fulltext/u2/a194561.pdf

Debian 2008 keys bug

aaronmdjones — Sun, 10 Jan 2021 13:34:45 +0000

> Any key generation requires random numbers and AIUI openssh relied on openssl for all its random number needs.

Back then, it did, yes. OpenSSH 6.5 (adding support for Ed25519 keys) didn't arrive for another 6 years, and OpenSSH 6.8 (allowing it to be built without OpenSSL) didn't arrive for another year after that. These days you can build it without, and then it will use urandom(4) [Linux, among others] or arc4random(3) [OpenBSD].

What about the linker?

JoeBuck — Sun, 10 Jan 2021 01:11:32 +0000

The traditional gcc bootstrapping process can work with all of the binutils tools, built together in the same tree (this was pioneered by the Cygnus folks maybe 30 years ago); everything is built again with the new compiler, linker, and assembler to eliminate dependencies. We can demonstrate that the classic attacks in the Thompson paper either don't exist, or have affected every C compiler since the dawn of the language, by starting with unrelated compilers, doing the bootstraps, maybe going through a number of compiler versions and even throwing in cross-compilers, involving a mix of free and proprietary compilers, and verifying that in the end the binaries are the same (for some systems, timestamps have to be filtered out of object files when doing the comparison, but for most, we wind up with every byte identical).
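The comparison step with timestamps filtered out can be sketched roughly as follows. This is a minimal illustration, not how any particular build system does it: the helper name `equal_ignoring` and the byte offsets in the usage example are hypothetical, and a real comparison would take the timestamp field location from the object file format in question.

```python
def equal_ignoring(a: bytes, b: bytes, ignored: list[tuple[int, int]]) -> bool:
    """Compare two byte strings, skipping the half-open ranges in `ignored`.

    Each (start, end) pair marks bytes, such as an embedded timestamp,
    that are allowed to differ between the two builds.
    """
    if len(a) != len(b):
        return False
    masked_a, masked_b = bytearray(a), bytearray(b)
    for start, end in ignored:
        end = min(end, len(a))  # clamp ranges that run past the end
        if start < end:
            zeros = bytes(end - start)
            masked_a[start:end] = zeros
            masked_b[start:end] = zeros
    return masked_a == masked_b


# Two made-up "object files" that differ only in a 4-byte field at offset 4.
build_1 = b"HEAD" + b"\x01\x02\x03\x04" + b"code"
build_2 = b"HEAD" + b"\x09\x09\x09\x09" + b"code"
print(equal_ignoring(build_1, build_2, [(4, 8)]))
print(equal_ignoring(build_1, build_2, []))
```

With the timestamp range masked the two builds compare equal; with no ranges masked they do not.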

However, we still inherit dependencies from system libraries, and this can include macros and inline functions. Someone could perhaps sneak an attack into a system library function that needs to be coded in assembler for optimal performance, and have this wind up in the compiler. So efforts like these that start with a tiny compiler and a tiny library can eliminate that threat as well.

But I think efforts like this, while fascinating, are a lot less important than they used to be because the real threat these days is in the microcode, the system under the system.

Debian 2008 keys bug

plugwash — Sat, 09 Jan 2021 02:11:00 +0000

Any key generation requires random numbers and AIUI openssh relied on openssl for all its random number needs.

The key generation issue was awful, but at least you could recognise bad keys (Debian shipped a package "openssh-blacklist" for a long time because of this), but even worse was that traditional implementations of DSA use random numbers during the signature process and can leak bits of the key if that randomness is not sufficiently random.

This meant that any DSA key that had been merely used with the bad openssl had to be considered compromised. Since there was no way of detecting such keys, this led to a ban on the use of DSA keys on Debian's infrastructure (no idea if other organisations followed suit).
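To make the DSA point concrete, here is a worked sketch of the most extreme form of bad signing randomness: the same per-signature random number (nonce) used twice. The parameters are toy values chosen for this illustration (far too small for real use), and the function names are hypothetical. Real attacks on weaker-than-total failures need more signatures and more mathematics, but the principle is the same.

```python
# Toy DSA parameters (assumption, for illustration only):
# g = 4 has order q = 11 modulo p = 23.
P, Q, G = 23, 11, 4


def sign(x: int, k: int, h: int) -> tuple[int, int]:
    """Textbook DSA signature of hash value h with private key x and nonce k."""
    r = pow(G, k, P) % Q
    s = (pow(k, -1, Q) * (h + x * r)) % Q
    return r, s


def recover_private_key(h1: int, sig1: tuple[int, int],
                        h2: int, sig2: tuple[int, int]) -> int:
    """Recover x from two signatures that reused the same nonce k.

    From s1 - s2 = k^-1 * (h1 - h2) we get k, and then
    x = (s1 * k - h1) / r, all modulo q.
    """
    r, s1 = sig1
    _, s2 = sig2
    k = ((h1 - h2) * pow(s1 - s2, -1, Q)) % Q
    return ((s1 * k - h1) * pow(r, -1, Q)) % Q


# Private key 7, nonce 3 reused for two different message hashes.
sig1 = sign(x=7, k=3, h=5)
sig2 = sign(x=7, k=3, h=9)
print(recover_private_key(5, sig1, 9, sig2))  # prints 7
```

Anyone who sees both signatures learns the private key, which is why a key that was merely used on an affected system could not be trusted afterwards.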

Debian was very fortunate that while it is theoretically possible to transfer keys between the gnupg world and the openssl/openssh/x509 world it was enough of a PITA that people very rarely did. So gnupg (which is the root of identity/trust in the Debian project) could still be considered safe.

Debian 2008 keys bug

aaronmdjones — Fri, 08 Jan 2021 23:55:23 +0000

The patch was to OpenSSL, specifically its random number generator, not OpenSSH. OpenSSH just happens to use OpenSSL for its RSA key generation, and RSA key generation requires a good source of random numbers.

Bootstrappable builds

Wol — Fri, 08 Jan 2021 21:09:35 +0000

This to me is the perfect description of why Science IS NOT Mathematics.

Mathematics is a provably correct logical model of what we think the world should be.

Science is a description of what the world is. (Or rather, Science is the work involved in making sure reality and theory agree - most practitioners unfortunately try to make reality agree with theory, rather than the other way round :-)

Cheers,
Wol

Bootstrappable builds

josh — Fri, 08 Jan 2021 20:46:13 +0000

> GCC bootstraps itself, which means the final copy of GCC binaries for a certain architecture and GCC version should not depend on what compiler you started with.

As long as the GCC binary didn't have something added that subverts subsequent GCC binaries.

> If you start with two different compilers, you don't need to absolutely trust them; if they came from different sources and any attack they'd be using would be different, you can simply compare the final versions and if the binaries are the same, which starting compiler you used was truly irrelevant, and the "trusting trust" attack is moot.

Only if you start with two different independent compilers, though. The top-level comment of this thread just said "bootstrapped a modern GCC from non-GCC source", which doesn't say anything about diverse double-compilation (using two different non-GCC compilers to compile GCC).

> If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

Source is harder, though, for multiple reasons.

First, "trusting trust"-style attacks would be difficult to obfuscate; it's one thing to hide a security hole, and quite another to hide code that detects a code pattern from a compiler and modifies it such that it affects subsequently compiled code.

The source of GCC or Clang might be huge, but any *one* change is much smaller and more reviewable.

And finally, malicious source code is more difficult to deny intent about. With a malicious binary, you could try to claim some internal process was subverted, or blame a random employee, or contractor, or other similar diversions. With malicious source code, you'll have a harder time blaming anything other than malice.

Bootstrappable builds

jhhaller — Fri, 08 Jan 2021 17:58:04 +0000

It's not just the ME; there's firmware everywhere: in storage (both controller and drives), in the NIC, and in the BIOS or other bootstrap code.

If one is trying to defend against state actors, there is no end to the potential attacks, especially if they are only attacking one entity. Once they know the defense, it's easier to discover other places to attack.

I remember a British effort to build a mathematically verified computer, so that the results could be provably correct. The problem, as I remember, is that the computer was a physical device which could have existing and new defects, even if the design was proved correct, yielding the provably correct program potentially giving an incorrect answer. There is no way to prove that the fabrication of the verified design was correct. I can't find the original source, I believe this was done in the 80's.

What about the linker?

eru — Fri, 08 Jan 2021 07:00:12 +0000

Should we not also take into account "ld", "ar" and the other binutils? I think the Thompson attack would also work with "ld" and any other tool that is used in the process of generating the final program.

So the bootstrap process should either start with a compiler that directly produces an executable, or also bootstrap the linker without depending on any existing linker.

Bootstrappable builds

dvdeug — Fri, 08 Jan 2021 06:48:50 +0000

I'm not sure I understand what type of malware you are referring to.

GCC bootstraps itself, which means the final copy of GCC binaries for a certain architecture and GCC version should not depend on what compiler you started with. If you start with two different compilers, you don't need to absolutely trust them; if they came from different sources and any attack they'd be using would be different, you can simply compare the final versions and if the binaries are the same, which starting compiler you used was truly irrelevant, and the "trusting trust" attack is moot.
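The final comparison step can be sketched as below. This is a minimal illustration that only checks byte-for-byte equality via SHA-256 digests; the function names and the file paths in the usage comment are hypothetical, and producing the binaries (the actual bootstrap chains) is outside its scope.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bootstraps_agree(final_binaries: list[Path]) -> bool:
    """True if every bootstrap chain ended in a byte-identical binary."""
    digests = {sha256_of(p) for p in final_binaries}
    return len(digests) == 1


# Hypothetical usage, one final GCC binary per starting compiler:
# bootstraps_agree([Path("from-tcc/cc1"), Path("from-clang/cc1")])
```

If the function returns False, at least one of the chains produced a different result, and the starting compilers (or the build environments) were not irrelevant after all.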

I don't see how this has utility in build farms, either. The issue where bootstrapping matters is in compilers where attacks can be hidden in the binaries. You're not going to build GCC fresh on every system, and there's a serious question whether downloading a trusted source and building it on a million systems is any safer than downloading a trusted binary and running it on a million systems. If you can get a hacked binary into the pathway, you can get hacked source code into the pathway.

Bootstrappable builds

marcH — Fri, 08 Jan 2021 06:23:49 +0000

https://en.wikipedia.org/wiki/Attack_surface#Surface_redu...

Bootstrappable builds

goraxe — Fri, 08 Jan 2021 03:23:16 +0000

There has been malware in the wild that attacks tool chains, and software has been put out with backdoors inserted by software houses affected by this type of malware. There is no guarantee that a non-GCC C compiler is trusted.

So the bootstrapping from tiny understandable principles is pretty interesting especially if the results are bit for bit comparable as this gives cryptographic verification options.

I could see this having utility in build farms like Travis CI, PaaS systems like AWS Lambda, Google App Engine, etc. If you need truly trusted binaries, this seems like a very viable way of getting them.

Bootstrappable builds

dgm — Thu, 07 Jan 2021 12:36:27 +0000

You can, to some extent. At this level of complexity trust is basically statistical, meaning that you trust because it would be rather difficult to tamper with all the pieces and go undetected for long. But you cannot be certain.

The only absolutely trustable computer is one that you create yourself from discrete logic and that only runs software written by yourself.

Bootstrappable builds

tsr2 — Thu, 07 Jan 2021 12:05:33 +0000

The hardware/firmware backdoor that means we can't trust any of this ever, on Intel hardware at least, is Intel ME.

https://en.wikipedia.org/wiki/Intel_Management_Engine

Bootstrappable builds

andrewsh — Thu, 07 Jan 2021 08:50:52 +0000

E.g. Kotlin.

Bootstrappable builds

pabs — Thu, 07 Jan 2021 03:13:08 +0000

For trusting source code, there will always be too much code for one developer or one organisation to review. So we need distributed code review, which is being worked on in the Rust community.

https://github.com/crev-dev/crev
https://github.com/crev-dev/cargo-crev

Bootstrappable builds

pabs — Thu, 07 Jan 2021 01:04:11 +0000

It's sad to see that lots of programming languages and build tools aren't bootstrappable, or are only tortuously bootstrappable from a minimal Linux system.

Bootstrappable builds

dvdeug — Thu, 07 Jan 2021 00:53:00 +0000

It's interesting, but it seems to be wandering into purely academic technicalities. Once you have bootstrapped a modern GCC from non-GCC source, you're done for GCC's part; you have a trusted compiler assuming you can trust the source code, in the sense of Ken Thompson's article. The problem is not making the core smaller; the problem is the GCC tarball, uncompressed, is 740 MB, and even after stripping away various documentation and non-C/C++ directories, it's still over 200 MB. How do you trust that?

In 2008, there was a problem with Debian SSH keys due to an actual patch to OpenSSH in Debian. This was accidental and a patch to OpenSSH. It would have been harder but possible to do it intentionally and via a patch to GCC, so that it would recognize OpenSSH and miscompile it as needed. It could be all written out in code, and nobody would be the wiser unless they knew what they were doing GCC-wise and were poking at that section of code.

It's not a bad concern, but it seems at this point to be more about something fun and interesting instead of something that provides any more trust in practice.