Verifying the source code for binaries

By Jake Edge
June 26, 2013

Ensuring that the binary installed on a system corresponds to the source code for a project can be a tricky task. For those who build their own packages (e.g. Gentoo users), it is in principle a lot easier, but most Linux users probably delegate that job to their distribution's build system. That does turn the distribution into a single point of failure, however—any compromise of its infrastructure could lead to the release of malicious packages. Recognizing when that kind of compromise has happened, so that alarms can be sounded, is not particularly easy to do, though it may become fairly important over the coming years.

So how do security-conscious users determine that the published source code (from the project or the distribution) is being reliably built into the binaries that get installed on their systems? And how do regular users feel confident that they are not getting binaries compromised by an attacker (or government)? It is a difficult problem to solve, but it is also important to do so.

Last week, we quoted Mike Perry in our "quotes of the week", but he had more to say in that liberationtech mailing list post: "I don't believe that software development models based on single party trust can actually be secure against serious adversaries anymore, given the current trends in computer security and 'cyberwar'". His argument is that the playing field has changed; we are no longer just defending against attackers with limited budgets who are mostly interested in monetary rewards from their efforts. He continued:

This means that software development has to evolve beyond the simple models of "Trust my gpg-signed apt archive from my trusted build machine", or even projects like Debian are going to end up distributing state-sponsored malware in short order.

There are plenty of barriers in the way. Even building from source will not result in a bit-for-bit copy of the binary in question—things like the link time stamp or build path strings embedded within the binary may differ. Differences in the toolchain used, in the contents of the system's or dependencies' include files, or other variations that don't affect the operation of the program can also lead to differences in the resulting binary.

The only reliable way to reproduce a binary is to build it in the exact same environment (including time stamps) that it was originally built in. For binaries shipped by distributions, that may be difficult to do as all of the build environment information needed may not (yet?) be available. But, two people can pull the code from a repository, build it in the same environment with the same build scripts, and each bit in the resulting binaries should be the same.

That is the idea behind Gitian, which uses virtualization or containers to create identical build environments in multiple locations. Using Gitian, two (or more) entities can build from source and compare the hashes of the binaries, which should be the same. In addition, those people, projects, or organizations can sign the hash with their GPG key, providing a "web of trust" for a specific binary.
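
In outline, the verification step is just a comparison of cryptographic hashes of independently built artifacts; a minimal sketch of that step (with a hypothetical artifact name and placeholder hashes, and omitting the GPG signature checking that a real deployment would rely on) might look like this:

    # Sketch of the hash-comparison step behind Gitian-style verification.
    # Not Gitian's own code; the artifact name and published hashes are
    # hypothetical placeholders.
    import hashlib
    import sys

    def sha256sum(path):
        """Hash a build artifact in chunks so large binaries fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def main():
        artifact = sys.argv[1]          # e.g. a tarball produced by the build
        my_hash = sha256sum(artifact)

        # Hashes published (and GPG-signed) by other independent builders;
        # a real setup would fetch these and verify the signatures first.
        published = {
            "builder-a": "<sha256 published by builder A>",
            "builder-b": "<sha256 published by builder B>",
        }

        for builder, their_hash in published.items():
            status = "MATCH" if their_hash == my_hash else "MISMATCH"
            print(f"{builder}: {status}")

    if __name__ == "__main__":
        main()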

Gitian is being used by projects like Bitcoin and the Tor browser (work on the latter was done by Perry), both of which are particularly security-sensitive. In both cases, Gitian is used to set up an Ubuntu container or virtual machine (VM) to build the binaries (for Linux, anyway), so support for other distributions (or even different versions of Ubuntu) would require a different setup.

That points to a potential problem with Gitian: scalability. For a few different sensitive projects, creating the scripts and information needed to build the containers or VMs of interest may not be a huge hurdle. But for a distribution to set things up so that all of its packages can be independently verified may well be. In addition, there is a question of finding people to actually build all the packages so that the hashes can be compared. Each time a distribution updates its toolchain or the package dependencies, those changes would need to be reflected in the Gitian (or some similar system) configuration, and packages would need to be built by both the distribution and at least one (more is better, of course) other "trusted" entity before consumers (users) could fully trust the binaries. Given the number of packages, Linux distributions, and toolchain versions, it would result in a combinatorial explosion of builds required.

Beyond that, though, there is still an element of trust inherent in that verification method. The compiler and other parts of the toolchain are being trusted to produce correct code for the source in question. The kernel's KVM and container implementation is also being trusted. To subvert a binary using the compiler would require a "Trusting Trust" type of attack, while some kind of nefarious (and undetected) code in the kernel could potentially subvert the binary.

The diversity of the underlying kernels (i.e. the host kernel for the container or VM) may help alleviate most of the concern with that problem—though it can never really eliminate it. Going deeper, the hardware itself could be malicious. That may sound overly paranoid (and may in fact be), but when lives are possibly on the line, as with the Tor browser, it's important to at least think about the possibility. Money, too, causes paranoia levels to rise, thus Bitcoin's interest in verification.

In addition to those worries, there is yet another: source code auditing. Even if the compiler is reproducibly creating "correct" binaries from the input source code, vigilance is needed to ensure that nothing untoward slips into the source. In something as complex as a browser, even run-of-the-mill bugs could be dangerous to the user, but some kind of targeted malicious code injection would be far worse. In the free software world, we tend to be sanguine about the dangers of malicious source code because it is all "out in the open". But if people aren't actually scrutinizing the source code, malicious code can sneak in, and we have seen some instances of that in the past. Source availability is no panacea for avoiding either intentional or accidental security holes.

By and large, the distributions do an excellent job of safeguarding their repositories and build systems, but there have been lapses there as well along the way. For now, trusting the distributions or building from source and trusting the compiler—or some intermediate compiler—is all that's available. For certain packages, that may change somewhat using Gitian or similar schemes.

The compiler issue may be alleviated with David A. Wheeler's Diverse Double-Compiling technique, provided there is an already-trusted compiler at hand. As Wheeler has said, one can write a new compiler to use in the diverse double-compilation, though that compiler needs to be compiled itself, of course. It may be hard to imagine a "Trusting Trust" style attack on a completely unknown compiler, but it isn't impossible to imagine.
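
In outline, DDC compiles the compiler-under-test's source with the trusted compiler, uses the result to compile that same source again, and compares the output bit-for-bit with the compiler's own self-regeneration. A minimal sketch of that process, assuming a hypothetical single-file compiler source and hypothetical compiler paths (an illustration of the idea, not Wheeler's actual tooling):

    # Minimal sketch of diverse double-compiling (DDC).  The compiler paths,
    # flags, and the assumption of a deterministic build are hypothetical;
    # this only illustrates the shape of the check.
    import filecmp
    import subprocess

    TRUSTED_CC = "/opt/trusted-cc/bin/cc"   # independent, trusted compiler (cT)
    COMPILER_SRC = "compiler.c"             # source (sA) of the compiler under test
    UNDER_TEST = "./cc-under-test"          # distributed executable (cA)

    def compile_with(compiler, source, output):
        """Compile 'source' into 'output' using the given compiler binary."""
        subprocess.run([compiler, source, "-o", output], check=True)

    # Stage 1: build the compiler's source with the trusted compiler.
    compile_with(TRUSTED_CC, COMPILER_SRC, "stage1")

    # Stage 2: let the stage-1 result rebuild that same source.
    compile_with("./stage1", COMPILER_SRC, "stage2")

    # Regenerate the compiler under test with itself, under the same conditions.
    compile_with(UNDER_TEST, COMPILER_SRC, "regenerated")

    # With a deterministic build and no "trusting trust" payload, stage2 and
    # the self-regenerated binary should be bit-for-bit identical.
    if filecmp.cmp("stage2", "regenerated", shallow=False):
        print("DDC check passed: outputs are identical")
    else:
        print("DDC check FAILED: outputs differ")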

As mentioned above, source-binary verification is a hard problem. Something has to be trusted: the hardware, the kernel, the compiler, Gitian, or, perhaps, the distribution. It's a little hard to see how Gitian could be applied to entire distribution repositories, so looking into other techniques may prove fruitful. Simply recording the entire build environment, including versions of all the tools and dependencies, would make it easier to verify the correspondence of source and binaries, but even simple tools (e.g. tar as reported by Jos van den Oever) may have unexplained differences. For now, focusing the effort on the most security-critical projects may be the best we can do.


Verifying the source code for binaries

Posted Jun 27, 2013 3:08 UTC (Thu) by jmorris42 (subscriber, #2203) [Link]

I have pondered this one before. How about this as a first cut? All terminology will be RPM based but the ideas of course work anywhere.

When the distributor builds the package, the rpmbuild process has a new option for the .spec file. When invoked, it adds /usr/share/doc/{packagename}/validation_info to the package. It contains everything needed to validate a future rebuild of that binary rpm. First it includes a complete list of every package installed, essentially an rpm -qa dump. Then you can include SHA sums of the libraries/compilers, etc. just to be extra paranoid, although any rebuilding environment will probably fail those in the first phases.
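
A rough sketch of generating such a validation_info file, written as a standalone script rather than an actual rpmbuild feature (the list of toolchain files to hash is just an example):

    # Sketch of generating a "validation_info" manifest as described above.
    # This is a standalone illustration, not an existing rpmbuild option;
    # the toolchain file list is a hypothetical example.
    import hashlib
    import subprocess

    def sha256sum(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Everything installed in the build environment, essentially an `rpm -qa` dump.
    installed = subprocess.run(["rpm", "-qa"], capture_output=True,
                               text=True, check=True).stdout.splitlines()

    # Extra-paranoid hashes of the toolchain binaries used for the build.
    toolchain = ["/usr/bin/gcc", "/usr/bin/ld", "/usr/lib64/libc.so.6"]

    with open("validation_info", "w") as out:
        out.write("# installed packages\n")
        for pkg in sorted(installed):
            out.write(pkg + "\n")
        out.write("# toolchain hashes\n")
        for path in toolchain:
            out.write(f"{sha256sum(path)}  {path}\n")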

When rebuilding, you begin by rebuilding the base set and toolchain of the distro's source packages with a different compiler. This won't be very close, but once you get to a point where you can install the rebuilt packages, you rebuild everything using itself. This should get you pretty close, and you won't have ever touched a binary from the vendor. Now the fun part. You rebuild everything twice and compare where your differences are. Instead of comparing pure binary blobs, you dump the ELF sections. Timestamps, etc. should always vary in the same places. Now you can compare the binary packages from the distribution against this baseline and ensure they only differ in those same places.

Now you have a process to pull in updates and validate those before installing them into your test environment. Now when an update of a critical package drops you can install the exact same set of toolchain packages, rebuild it twice and compare the set of variations to the new suspect. During this phase you could even install the build environment from the previously validated distributor binaries and make use of the detailed sha sums of each library and compiler binary.

An automated tool to do these three-way compares of ELF binaries would make this notion practical enough for one or two really dedicated projects to run independent verifications of popular distributions. As for making it automated enough for everyone to run, that would be a harder problem.
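
Such a tool could, for instance, hash each ELF section and report which sections differ; a rough sketch using the third-party pyelftools library (the three binary paths are hypothetical examples):

    # Rough sketch of the three-way ELF-section comparison described above,
    # using the third-party pyelftools library; file paths are hypothetical.
    import hashlib
    from elftools.elf.elffile import ELFFile

    def section_hashes(path):
        """Map each section name to a SHA-256 of its contents."""
        hashes = {}
        with open(path, "rb") as f:
            elf = ELFFile(f)
            for sec in elf.iter_sections():
                hashes[sec.name] = hashlib.sha256(sec.data()).hexdigest()
        return hashes

    # Two local rebuilds establish which sections "always vary" (timestamps,
    # build paths, etc.); the distributor's binary should then differ from the
    # rebuilds only in those same sections.
    rebuild_a = section_hashes("rebuild-1/usr/bin/example")
    rebuild_b = section_hashes("rebuild-2/usr/bin/example")
    vendor = section_hashes("vendor/usr/bin/example")

    expected_to_vary = {name for name in rebuild_a
                        if rebuild_a[name] != rebuild_b.get(name)}

    suspicious = [name for name in vendor
                  if vendor[name] != rebuild_a.get(name)
                  and name not in expected_to_vary]

    print("sections varying between local rebuilds:", sorted(expected_to_vary))
    print("unexpected differences in the vendor binary:", sorted(suspicious))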

Verifying the source code for binaries

Posted Jun 27, 2013 6:21 UTC (Thu) by paulj (subscriber, #341) [Link]

A completely unknown compiler can easily be attacked. You do not need to know the internals of the compiler, you just need a well-known 'hook' to run your malign code. Such hooks can include well-known function calls, various features of ELF, etc.

Verifying the source code for binaries

Posted Jun 27, 2013 14:14 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

"A completely unknown compiler can easily be attacked. You do not need to know the internals of the compiler, you just need a well-known 'hook' to run your malign code. Such hooks can include well-known function calls, various features of ELF, etc." - I mostly disagree.

You're right that there are ways to attack completely unknown compilers. A very few techniques are easier (though I think few would agree that they're really easy). But most of these kinds of attacks are, well, really hard. Attacks are generally quite sensitive to what's being attacked; it's hard to write a program when you don't know exactly what it's supposed to do. For example, why do you assume that the generated code is ELF?!? Many trivial compilers generate their own bytecode, and you then run the bytecode interpreter... making many kinds of embedding (like ELF embedding) quite useless. Even if it generates ELF, that doesn't mean you know where to hook it; you'd be surprised what varies at the low levels. As I discuss in the DDC paper, you want the checking compiler to be as diverse as you can make it; the more different it is, the harder it is to attack.

DDC

Posted Jun 27, 2013 6:46 UTC (Thu) by paulj (subscriber, #341) [Link]

Note that if you already have a trusted compiler (i.e. you wrote it yourself¹), that is at least capable of compiling the compiler(s) you need to compile the rest of the code, then you have no need for DDC. You can just use the trusted compiler directly...

1. Though to be truly sure you can trust it, you must control the creation or be able to fully verify the inability to introduce subversion of *every* part of the *entire* compile time system yourself. I.e. not just the software, but everything from the hardware on up, and all tools involved (including any fabrication of chips).

DDC

Posted Jun 27, 2013 14:05 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

"if you already have a trusted compiler (i.e. you wrote it yourself¹), that is at least capable of compiling the compiler(s) you need to compile the rest of the code, then you have no need for DDC. You can just use the trusted compiler directly..." - Writing a compiler that generates good code for arbitrary programs is hard. Writing a compiler that may run slowly and may generate very slow code, to compile a single program, is really easy. Besides, that just moves the attack - now the attacker just has to attack one other compiler executable, the one you wrote. With DDC, the attacker has to subvert all the compiler executables used, which is far more difficult.

"Though to be truly sure you can trust it, you must control the creation or be able to fully verify the inability to introduce subversion of *every* part of the *entire* compile time system yourself. I.e. not just the software, but everything from the hardware on up, and all tools involved (including any fabrication of chips)." - Well, feel free to do that, sounds great! Most of the rest of us lack the money and time for all that. Most of us are just looking for justified confidence. In that case, we can use multiple mechanisms to operate as checks on each other. That means we can make the attack very costly, at a low cost to the defender, by using differing approaches as checks on each other.

This is how we handle issues in the rest of life, too. After all, I use (trust) a bank with my money, I don't run my own. Instead of trying to create my own bank, I depend on independent checks (e.g., government regulators to oversee banks, and insurance organizations to reimburse me in case of fraud). The problem isn't trust; we trust stuff all the time. I see the "trusting trust" problem as the challenge that it's been historically impractical to have any kind of independent verification of compiler executables. DDC provides a process to enable independent verification.

DDC

Posted Jun 27, 2013 14:54 UTC (Thu) by paulj (subscriber, #341) [Link]

Well, the compiler you write doesn't have to be good, it just has to be able to compile the source of a more decent compiler. Then you use that to compile itself and everything. This is the same argument as in your PhD about C_t and the untrusted compiler. Why would you ever need to bother verifying the untrusted compiler binary builds its own source reproducibly, if you already have a trusted compiler binary that can compile that source?

Save the effort, throw away the untrusted compiler binary, use the trusted one to bootstrap!

And you're still completely missing Thompson's point, which I don't believe was a terribly complicated one. I'm *NOT* saying it is practical to build the entire system, to go make your own Fab, etc. Nor is Thompson. Again, his point is you may either:

* Re-use the work of others and consequently have to invest at least some degree of trust in them.

OR

* Essentially build your system (inc any tools) from the ground-up, other than those that are simple enough to fully inspect as free from being capable of subversion.

If one alternative is not practical, then we are left with the other. Thompson is saying it is nigh impossible to avoid having to trust. He even gives societal level examples.

You claim DDC (using *2* or more untrusted compiler binaries, and checking they build the same) may help reduce the degree of trust required, and I *AGREE*. The only claim I disagree with is that it *fully* eliminates the need for trust. It does not.

Indeed, you agree it does not 100%, but that nothing is 100%. On that we can agree. However, that means we need to quantify just how close to 100% we get. I would suggest to get a handle on this you need to examine the specific compilers concerned and actually check those involved in handling and releasing binaries for them are diverse (if that's even reliably possible). E.g. imagine I decide to use, say, gcc and llvm, to DDC my Fedora system. I download the Fedora sources and I download the gcc and llvm binary RPMs. Are they diverse in provenance? No! They're both provided by Fedora and there are surely at least a number of people who could influence *both* binaries! Ok, so instead I get my LLVM binary from Debian. Though, wait, if you're involved in Fedora packaging, does that mean you can't be a Debian compiler packager? Hmm, that's not an entirely safe assumption either, is it?

DDC could well be useful, but there's a whole lot of human intel, of checking who is involved in your supply chain, making sure there are no obvious overlaps, that it rests on. Even if there are no obvious overlaps, that still is no guarantee that the Debian DD for llvm will not collude with, say, the Fedora one. If that sounds far-fetched, well everyone has a price and certain agencies can waive both big carrots and sticks, if they really wanted. Unlikely, but far from impossible - and that's *assuming* when you do DDC you *have* done your supply-chain-diversity homework.

I'm sorry, but your confidence that the assumption that binaries for different compilers are essentially immune to the same or coördinated subversion is far from a safe one. It is important in security to recognise the limitations. Where it really matters, it can be life-*critical* that people do not over-estimate the degree to which they can trust their system.

For that reason I would say "DDC may somewhat counter" - not "fully".

Note: "Write your own compiler" in my previous comment implies actually writing the binary yourself. E.g. perhaps bootstrapping using a very simple assembler writing directly in binary, manipulating memory in a verifiable way (perhaps using an analyser, or making your own ROM), then on up to creating your simple compiler. A lot of effort, though not at all beyond the realms of possibility if you truly require a compiler binary you can trust (ignoring the question of verifying the remaining system, inc hardware).

DDC

Posted Jun 27, 2013 15:03 UTC (Thu) by paulj (subscriber, #341) [Link]

bah "certain agencies can *wave*" - not waive. ;) I'll leave the other, less-meaning-changing typos and missing words alone. :)

Speaking of which, either David's PhD or paper mentions that these agencies may well have enough resources and/or determination to get around the diversity assumption, and DDC wouldn't protect against those. However, such agencies are surely *the* major concern for a good number of people. ;)

Those large organizations

Posted Jun 27, 2013 16:34 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

"Speaking of which, either David's PhD or paper mentions that these agencies may well have enough resources and/or determination to get around the diversity assumption, and DDC wouldn't protect against those. However, such agencies are surely *the* major concern for a good number of people. ;)

I do mention that some agencies may have enough resources to perform the trusting trust attack. Section 2.6 does note the potential for problems, and section 3.1 does note that "a highly resourced organization (such as a government) might decide to undertake" a trusting trust attack.

But in spite of what people see in James Bond movies, these organizations do not have infinite amounts of resources. Yes, these organizations can perform a trusting trust attack on a popular compiler. Maybe even a few. But it's far more difficult to successfully attack all compilers, especially ones in the far past or future, including ones not even written yet for CPUs that do not exist yet. Including compilers you might write, specifically to be different. Most defense involves changing the costs so that the attack isn't worth it, and that's the opportunity DDC provides. DDC provides a relatively inexpensive independent verification process on compiler executables, one that can be performed after-the-fact.

Also, various large organizations can independently watch the compiler executables, which means that in some sense they can watch each other. They can do this by independently funding DDC processes within themselves. It's not entirely clear that they would do so, or that they'd report problems. When something can be used for defense or offense there's an "equities" trade-off that such organizations have to consider. But the possibility is intriguing.

DDC

Posted Jun 27, 2013 16:19 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

"Why would you ever need to bother verifying the untrusted compiler binary builds its own source reproducibly, if you already have a trusted compiler binary than can compile that source? Save the effort, throw away the untrusted compiler binary, use the trusted one to bootstrap!" - Section 4.6, "Why not always use the trusted compiler?", answers that question.

"And you're still completely missing Thompson's point... his point is you may either: * Re-use the work of others and consequently have to invest at least some degree of trust in them. OR * Essentially build your system (inc any tools) from the ground-up, other than those that are simple enough to fully inspect as free from being capable of subversion." - Your list is closer to my viewpoint. But that is not what "Reflections on Trusting Trust" says. It says, "You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code." Oh but wait, we now have a mechanism for verifying code that you did not totally create yourself. Yes, you may have created the DDC trusted compiler, but now we can use it to check the compiler under test, which is not the DDC trusted compiler. Therefore, the trusting trust attack is busted, because we now do have a verification process for code you didn't create yourself, and the process can be applied at any time in the future.

"You claim DDC (using *2* or more untrusted compiler binaries, and checking they build the same) may help reduce the degree of trust required, and I *AGREE*. The only claim I disagree with is that it *fully* eliminates the need for trust. It does not.... "

We're back to the semantics of what "fully" means. My point is that the trusting trust attack was all about the impossibility of independent verification that the executable corresponds to the source code. With DDC, we now have the possibility of independent verification that the source and executable correspond. You may not like my use of "fully" in that way, but since my goal is to allow independent verification, it does indeed fully provide a mechanism to perform independent verification.

Now I completely agree that, if applying DDC, you must ask questions about how independent the verifications are. But that is much easier to address; once you know that diversity is a goal, you can come up with all sorts of ways to provide it. We can even perform DDC multiple times to make it really hard for an attacker to counter.

"if you're involved in Fedora packaging, does that mean you can't be a Debian compiler packager? Hmm, that's not an entirely safe assumption either, is it?" - Fedora and Debian are far more alike than different. Remember, the goal is to maximize diversity. For example, I have some 20+-year-old machines, with their original executables, that I hope to some day use as checking systems. They have radically different operating systems, different compilers, different CPUs, different executable formats. Other people can do the same, with a different set of systems, in case people don't trust me. Good luck, Mr. Attacker.

"I'm sorry, but your confidence that the assumption that binaries for different compilers are essentially immune to the same or coördinated subversion is far from a safe one." - Huh? I never said immune, I said in section 6 that, "Diversity can greatly reduce the likelihood that trusted compiler cT and the DDC environments have relevant triggers and payloads, often at far less cost than other approaches." It's still fully countered because we now have a cost-effective way to do independent verification, and as I stated earlier, the goal was to create a process to enable independent verification. You now get to decide what level of independent verification you're comfortable with.

"It is important in security to recognise the limitations.". Of course. See section 8.14, "How can an attacker counter DDC?", which discusses those limitations. Since we have a formal proof that the goal is met when the assumptions are met, countering DDC involves making one of the assumptions false.

"For that reason I would say "DDC may somewhat counter" - not "fully"." So you agree that DDC is helpful, you just don't like the word "fully". I'd say that our difference is because my goal differs from yours, probably because we disagree on what the fundamental problem is. I believe that the fundamental problem was a lack of a verification process for program-handling systems like compilers. My goal was, therefore, to provide and prove an independent verification process. DDC fully does this.

DDC

Posted Jun 27, 2013 17:04 UTC (Thu) by paulj (subscriber, #341) [Link]

That section doesn't really answer the question for the case where C_t really is trusted. If C_t can compile the performant compiler source S_p to obtain C_1 = c(S_p,C_t), then just use performant compiler binary C_p to compile the rest of the system. There is no need to DDC (i.e. c(S_p,C_1) )!

The answer /does/ make sense if C_t is *not* actually completely trusted. Which is the justification in the PhD. However, in that case, the result of the process can not be 100% trusted either - as you agreed before. And you're right, we disagree on the semantics of "fully" - for me it really does mean "fully", for you it seems to mean something less than fully.

It says "You can't trust code that you did not totally create yourself.…"

How is that inconsistent with what I stated his point was?! Given that Thompson then further generalises this point from code to systems.

Regarding the 20-year old machines, I actually mention that scenario in my critique, including the ability to trust old cryptographic checksums. You should offer a DDC service using these old, attacker-proof machines, and tell the rest of us whether our binaries are OK. We'll trust you! :). However, clearly, this approach does not generalise.

Re the assumptions: At no stage do you show those assumptions are universally true, or alternatively show ways to evaluate whether the assumptions hold or not when DDC is run. Indeed, it's tautologous: DDC reliably detects subversions if we assume the system hasn't been sufficiently subverted - if the attacker subverts the system, then DDC can not.

Your machine proof depends on having a known trusted compiler C_t - which is not particularly useful. Your diversity arguments rest on the critical assumption that the distribution paths of at least 1 of your compiler binaries is wholly independent from your other compiler binaries, and no attacker could ever compromise them. That doesn't seem like an assumption that is clearly universally true, to me. I've given a number of examples here and in my critique how that assumption could break down, and those are not exhaustive. The PhD itself states this assumption may not hold in the face of well-resourced attackers such as governments!

I do think it's important that "fully" really means "fully" when we claim to have countered some attack. Indeed, Thompson's "attack" simply fundamentally can not be fully countered without, as he points out, building everything yourself. You're free to disagree. Others will make up their own mind.

(NB: Again, this is about the justification of "Fully" - I am not trying to say DDC could never be useful :). ).

DDC

Posted Jun 27, 2013 18:08 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

"That section doesn't really answer the question for the case where C_t really is trusted. If C_t can compile the performant compiler source S_p to obtain C_1 = c(S_p,C_t), then just use performant compiler binary C_p to compile the rest of the system. There is no need to DDC (i.e. c(S_p,C_1) )!"

As the section 4.6 explains, "First, there are many reasons compiler cT might not be suitable for general use", and "Second, using a different trusted compiler cT greatly increases the confidence that the compiler executable cA corresponds with source code sA. When a second compiler cT is used as part of DDC, an attacker must subvert multiple executables and executable-generation processes to perform the 'trusting trust' attack without detection."

"And you're right, we disagree on the semantics of "fully" - for me it really does mean "fully", for you it seems to mean something less than fully." - No, I mean fully. DDC fully counters the inability to independently verify an executable.

"You should offer a DDC service using these old, attacker-proof machines, and tell the rest of us whether our binaries are OK. We'll trust you! :). However, clearly, this approach does not generalise." - It generalizes just fine. It's how we handle banks; we let some organizations run banks, and other organizations do independent checks to reduce the probability of fraud. Most people expect to have a mechanism to enable independent evaluation. The "trusting trust" attack is surprising because it appears that it is impossible to have any meaningful independent evaluation of executables. DDC enables the independent evaluation, and thus, fully counters the problem of having no possibility of independent evaluation.

"Re the assumptions: At no stage do you show those assumptions are universally true, or alternatively show ways to evaluate whether the assumptions hold or not when DDC is run. Indeed, it's tautologous: DDC reliably detects subversions if we assume the system hasn't been sufficiently subverted - if the attacker subverts the system, then DDC can not." - No tautology here. It's not that you blindly assume it, it's that the defender specifically selects compilers most likely to have this property. In fact, not only does an attacker have to subvert the system used for DDC, but he has to subvert it the same way. You can even use subverted compilers in DDC, as long as they don't have the same triggers and payloads. As far as independence goes, section 6 goes into that in detail.

"Your machine proof depends on having a known trusted compiler C_t - which is not particularly useful. Your diversity arguments rest on the critical assumption that the distribution paths of at least 1 of your compiler binaries is wholly independent from your other compiler binaries, and no attacker could ever compromise them. That doesn't seem like an assumption that is clearly universally true, to me. I've given a number of examples here and in my critique how that assumption could break down, and those are not exhaustive. The PhD itself states this assumption may not hold in the face of well-resourced attackers such as governments!" - You're missing a key aspect: The defender gets to determine which compilers to use as the independent test. And the defender gets to use all evidence at his disposal to determine this. If you have a concern about one system, use a different one for DDC. And as noted above, you can even use a subverted compiler for DDC, as long as it's not subverted the same way.

"I do think it's important that "fully" really means "fully" when we claim to have countered some attack. Indeed, Thompson's "attack" simply fundamentally can not be fully countered without, as he points out, building everything yourself. You're free to disagree. Others will make up their own mind." I agree with you that the word "fully" must mean "fully". That is not the point of disagreement at all.

We disagree on what the problem is, therefore, we disagree on whether or not I've solved it :-). I believe the problem was that there was no process for independently verifying the executables produced by compilers and similar program-handling programs. DDC provides such a process. Now that we have a process, we can decide on how strong that independence must be for some given circumstance. This is the same issue for banks, railroads, and just about anything else we depend on. At this point I can't rename the paper anyway, but I'm not at all convinced by your argument.

Nobody needed Thompson, or anyone else, to tell them that a malicious CPU could change computational results. That's not the point. We already knew that, long before the 1970s. Thompson focused on "program-handling" systems like compilers that can regenerate themselves. It appeared that, because they can process themselves as well as other programs, they can inhibit any meaningful independent verification of executables. DDC enables independent evaluation... and that changes everything.

"(NB: Again, this is about the justification of "Fully" - I am not trying to say DDC could never be useful :). )." - Well, that's not a bad start :-).

DDC

Posted Jun 27, 2013 20:45 UTC (Thu) by paulj (subscriber, #341) [Link]

No, we don't disagree on the problem. We disagree on whether DDC fully solves it.

The problem: Single party trust

Posted Jun 27, 2013 18:20 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

Let's look at Mike Perry's statement again: "I don't believe that software development models based on single party trust can actually be secure against serious adversaries anymore, given the current trends in computer security and 'cyberwar'".

The problem, as he sees it, is single party trust. Sure, we have to trust something. What we need are mechanisms to independently verify the software (and eventually other components) we depend on. Being able to reproduce the same executable from source, to verify it, helps a lot... but that begs the question about compilers. DDC then addresses the problem of compilers, because it provides an independent party test.

The problem, as both Mike Perry and I see it, is single party trust. DDC provides independent verification for compilers, and thus, helps break single party trust.

Put simply, we need a way to "trust but verify".

DDC

Posted Jul 7, 2013 20:55 UTC (Sun) by dmag (subscriber, #17775) [Link]

> You claim DDC [..] may help reduce the degree of trust required, and I *AGREE*. The only claim I disagree with is that it *fully* eliminates the need for trust. It does not.

You are conflating two completely different topics.

The original paper said "I can make one tiny change to subvert every program, and there is nothing you can do to detect it". David's paper said "I can write one small compiler and detect your subversion!". Therefore, David *fully* countered the 'Trusting Trust' problem from the original paper.

David did not (and cannot) fix the underlying "we have to trust something" problems that you point out. But trust problems are "Turtles all the way down":

If David does a bunch of work and says "GCC is not subverted", that doesn't help anyone else -- unless they trust David. But even if they trust David, maybe an evil maid will subvert his keyboard/monitor/CPU/OS so he cannot detect the subversion. Or maybe someone will drug/brainwash/blackmail David into telling everyone "GCC is not subverted, mmmkay?". Or maybe your phone/computer/email have been hacked so you see a different message than he sent. Or maybe David's copy really isn't subverted, but yours still is.

But who cares? Attackers aren't fixated on tactics, they are focused on results. David has just made it *harder* to pull off this one particular attack. And that has repercussions for all future attackers: They may decide that the complexity/cost of this particular attack is too high, and choose some simpler/cheaper way to get their results. This XKCD comes to mind: http://xkcd.com/538/

DDC

Posted Jul 8, 2013 8:38 UTC (Mon) by paulj (subscriber, #341) [Link]

"I can write one small compiler" - but Thompson *made that exact point*: you need to write the programme handling programmes *yourself* OR trust! From his paper:

"You can't trust code that you did not totally create yourself."

And note that that "totally" can extend a long way, as in a previous comment. If DDC requires you to write your own compiler in order to boot-strap the trust-chain, then that is clearly exactly Thompson's point.

DDC then goes a step further and claims that diversity of untrusted, potentially hostile compilers gives you the *same* trust boot-strap, on the assumption that, even if all the compilers used were subverted, they can not co-operate. This is a qualitative risk judgement. While it may be that the risks are low enough in many situations that this technique does indeed give sufficient assurance, it should be pretty clear these are not absolutely inviolable. In some situations (e.g. well-resourced nation state actors pitted against each other), DDC may not provide sufficient assurance. That's clearly not a "full" detection method, to my thinking.

It's funny how your comment contains both "[DDC] fully countered" and "[DDC] has just made it *harder* to pull off this ... attack".

DDC

Posted Jul 8, 2013 14:10 UTC (Mon) by dmag (subscriber, #17775) [Link]

> It's funny how your comment contains both "[DDC] fully countered" and "[DDC] has just made it *harder* to pull off this ... attack".

That's not a contradiction. You continue to conflate two different things.

1) It "fully" countered the original statement "there is nothing you can do about it." Now we have something we can do.

2) It does not "fully" fix computer security and trust. Because that is not going to happen. Ever.

It doesn't matter if you melt your own sand into silicon, design your own processor, write your own compiler, write your own wireless drivers, etc. Someone could sneak in late at night and alter a line of code or replace your silicon chip with a subverted clone or brainwash/blackmail you into never revealing what you found. And even if one person manages to build a "fully trusted" computer, who will trust him or her? It's an existential problem. Turtles all the way down.

You can continue to say "maybe every compiler in the world is subverted", and nobody can ever disprove your statement.

So in theory, your statement is true. But in practice, it would cost a lot of money/resources to "subvert every compiler". For any particular goal, there are probably a dozen cheaper options: grab some off-the-shelf zero-day exploits that exist for every OS. Pay off Skype/Flash/Windows to put in a secret NSA key. Write a virus that targets just your computer and give it to your friends.

DDC is pushing the "subverted compiler" threat into "movie plot" territory, where it's no longer reasonable to try to defend against it specifically.

https://en.wikipedia.org/wiki/Movie_plot_threat

Verifying the source code for binaries

Posted Jun 27, 2013 8:59 UTC (Thu) by abogani (subscriber, #57602) [Link]

" Simply recording the entire build environment, including versions of all the tools and dependencies, would make it easier to verify the correspondence of source and binaries "

This is exactly what NixOS does.

Verifying the source code for binaries

Posted Jun 27, 2013 18:03 UTC (Thu) by cov (subscriber, #84351) [Link]

If timestamps and other metadata are what's causing trouble, it seems a lot more efficient to add a flag to GCC to clone some existing binary's values than fire up a virtual machine.

Verifying the source code for binaries

Posted Jun 28, 2013 15:31 UTC (Fri) by ssam (subscriber, #46587) [Link]

How about an elf-diff that can ignore trivial changes (like --ignore-space-change in regular diff)?

OBS

Posted Jun 30, 2013 21:29 UTC (Sun) by garloff (subscriber, #319) [Link]

Open Build Service uses VMs with a well-defined configuration and set of packages to build packages. Many packages put some effort into avoiding compile-time stamps, etc. Build-compare does the job of comparing the binaries... and is used to determine whether dependent packages need to be rebuilt.
