Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 21, 2013 19:02 UTC (Fri) by david.a.wheeler (subscriber, #72896)
In reply to: Diverse Double-Compiling (DDC) counters "trusting trust" by paulj
Parent article: Van den Oever: Is that really the source code for this software?

A few comments...

"assuming that different compilers have independent authors, or independent sets of people involved in their release, seems a weak assumption to me." - Actually, I believe it's pretty strong. With some exceptions, different compilers are written by different people. And if they ARE the same people, well, use a different one, or write your own for use in testing. But you do not need to write a "production-quality compiler" to verify a compiler using DDC, you just need one that functions correctly for one program... and that makes it a much easier problem.

"Further, as Thompson pointed out, the compiler is just one vector, and he explicitly noted his attack could be implemented in _any_ programme-handling programme. Thus, it is not just the compiler, but the whole environment the compiler and any other programme-handling programme run in that must be verified or trusted." - Sure, but DDC can used for them, too. You don't have to limit it to just the "compiler" as traditionally defined. I specifically talk about using DDC to test the compiler+operating system as a system.

"Finally, assume that increasing numbers of additional compilers might buy more re-assurance. That does not mean it is practical though. Even with just *1* compiler, tcc, you ran into issues (bugs) that made reliable builds to the same binary difficult. The more compilers you add to the mix, the more of these problems you run into, and the less practical it becomes." - Ah, but the problems only normally happen the first time you're applying DDC, and in any case, it's usually hard to do something when no one has ever done it before.

"My critique gives more examples of assumptions that equate to having to write/verify everything yourself OR trust, as was Thompson's point." - I think we fundamentally differ on what the problem is, leading us to differences on whether or not I've solved it :-). I believe the problem is having to totally trust any one thing, without being able to verify it. If you can independently verify something, then it's no longer the total "trusting trust" problem that Thompson pointed out. Sure, you have to trust something, but if you can independently verify that thing you trust in multiple ways, it's nowhere near as risky.



Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 21, 2013 20:40 UTC (Fri) by paulj (subscriber, #341) [Link]

Actually, I believe it's pretty strong. With some exceptions, different compilers are written by different people. And if they ARE the same people, well, use a different one, or write your own for use in testing. But you do not need to write a "production-quality compiler" to verify a compiler using DDC, you just need one that functions correctly for one program... and that makes it a much easier problem.

I'll be honest, I don't have direct experience of the compiler world. However, in my area of networking, the more specialised you get, the fewer people there are in that speciality. Further, those people move around. They go from one major player to another, to start-ups, and back to major players; they move between open-source work and proprietary software and back again. I don't know specifically about compiler writers, but I bet there's a similar flow of people between different organisations and projects there. To me, the notion that different compilers will inherently have independent sets of authors AND other involved engineers (maintenance, release engineering) is highly inconsistent with the freedom to move around which I, and I suspect many others, know engineers/authors avail themselves of in their careers.

Further, even if the people were independent, that *still* doesn't mean the software will be untainted by the others. I'm sure some of those authors will at times try out the competing software on their own systems. I'm just rehashing from my critique, so I get the feeling you mustn't have read it. ;)

Ah, but the problems only normally happen the first time you're applying DDC, and in any case, it's usually hard to do something when no one has ever done it before.

Agreed, it's possible that you could get agreement among compiler people that exact reproducibility was important. A sufficient number of them might make their compilers support such a thing. Other parts of the build environment might have to be adapted too. It's *possible*, but that world doesn't yet exist, it seems. Exactly how easy it would be to get that world AND maintain it, I don't know.

I think we fundamentally differ on what the problem is, leading us to differences on whether or not I've solved it :-).

Agreed. You've taken an extremely narrow, restricted view of Thompson's attack: that it was specifically about subverting a compiler binary. Even with that restricted view, I believe your technique still requires one to *write* one's own compiler from scratch AND/OR trust an existing binary. Further, that restricted view of his attack/point is not right - his paper clearly intended the compiler attack to be an illustration of a much more general class of attacks, in order to make fundamental points about trust.

The technique in your thesis is very useful, and seems to me like it could indeed help increase the level of assurance we might have that our software hasn't been subverted. No argument there. It's just that "Fully" qualifier in the claim that's jarring - it's not supported, not even with the deliberately restricted view taken of Thompson's attack. ;)

Thompson's point still stands, indeed it is reinforced by how your technique requires a trusted or verified bootstrap compiler: We must build the whole system ourselves, OR we must trust in those who supplied the parts we did not build.

Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 21, 2013 22:50 UTC (Fri) by david.a.wheeler (subscriber, #72896) [Link]

"I don't know specifically about compiler writers, but I bet there's a similar flow of people between different organisations and projects there.". - Certainly it happens, but it's less common than you might think. Learning how to write a compiler is a standard Computer Science class, and in many cases students actually write a compiler as part of the class. Thus, there are a huge number of people who know how to do it. It's not like there are only 10 people in the world who know how to write compilers. Especially since, for DDC, it only needs to work at all for one program... slow is just fine.

"To me, the notion that different compilers will inherently have independent sets of authors AND other involved engineers (maintenance, release engineering), is highly inconsistent with freedoms which I, and I suspect many others, know engineers/authors avail of in their careers.". - Even if there's overlap, the issue is that someone would have to be able to subvert the binary... which means they'd have to have a fair amount of trust, or break in somehow in the build chain, to do it. Typically the "new guy" isn't trusted to the same degree, making it harder to do.

"I'm sure some of those authors will at times try out the competing software on their systems at times.". - So? Trying out software doesn't perform the attack. Even if you create a modified malicious compiler on your local system, that means nothing. For an attack to work against DDC, an attacker has to subvert the binaries of multiple compilers, as used by the organization applying DDC.

"You've taken an extremely narrow, restricted view of Thompson's attack. That it was specifically about subverting a compiler binary. Even with that restricted view, I believe your technique still requires one to *write* their own compiler from scratch AND/OR trust an existing binary. Further, that restricted view of his attack/point is not right - his paper clearly intended the compiler attack to be an illustration of a much more general class of attacks, in order to make fundamental points about trust."

It's not narrow at all. I clearly state in my paper that DDC isn't limited to "compilers" as they are traditionally defined. Indeed, I specifically note that the DDC "compiler" could be an assembler, could include an entire operating system, and could apply to hardware (not just microcode) as well.

But as far as what Thompson meant, well, let's look at what Thompson says in "Reflections on Trusting Trust". He says that "You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.)" He says his attack is specifically about program-handling code: "I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode". And the reason you can't trust others' code is that you're unable to do source-level verification: "No amount of source-level verification or scrutiny will protect you from using untrusted code."

The fundamental problem is that with program-handling programs, the creator of the executable can create self-perpetuating attacks. So... let's employ multiple different program-handling programs, make it hard for them to collude, and use them to check each other. Now we're no longer dependent on any single supplier to decide if we get malicious code or not.
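
Here is a minimal sketch of that cross-checking, reusing the build() and digest() helpers (and the s_A source and regenerated binary) from the sketch upthread; the trusted-compiler paths are, again, purely illustrative:

    # Each trusted compiler gives an independent verification chain.
    # To escape detection, an attacker must subvert every one of these
    # binaries AND make them all produce identical malicious output.
    trusted = ["./tcc", "./pcc", "./my-toy-cc"]
    stage2_digests = []
    for c_T in trusted:
        build(c_T, s_A, "stage1")
        build("./stage1", s_A, "stage2")
        stage2_digests.append(digest("stage2"))

    # Every chain must reproduce the regenerated compiler bit-for-bit.
    assert all(d == digest("regen") for d in stage2_digests)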

"The technique in your thesis is very useful, and seems to me like it could indeed help increase the level of assurance we might have that our software hasn't been subverted. No argument there. It's just that "Fully" qualifier in the claim that's jarring - it's not supported, not even with the deliberately restricted view taken of Thompson's attack. ;) Thompson's point still stands, indeed it is re-inforced by how your technique requires a trusted or verified boot-strap compiler: We must build the whole system ourselves, OR we must trust in those who supplied the parts we did not build."

You keep ignoring a third way - use multiple parties to check each other. One of those parties could even be you! Outside of the computing field this use of multiple checks is the normal way of gaining trust. Instead of blindly trusting business B, we might get an independent auditor to check out the books. If we're really concerned, we'll get two auditors and make sure that the external organizations that manage B's accounts actually hold the money claimed. Historically, compilers and other program-handling programs were like a business without auditing; we had to trust whatever it produced. DDC creates an auditing step; sure, it's possible to fool it through collusion, but there are many well-known ways to make collusion hard. Moving from blind trust to an auditable process is a big step forward.

Put simply: The problem, as I see it, is the previous impracticality of independent verification of program-handling programs. DDC fully eliminates that problem, because it does provide a mechanism for independent verification. Now the problem changes to "how much verification is enough" instead of "I guess we just hope it's okay." How and when do you apply that mechanism? Cue Neo's voice: "Where we go from there is a choice I leave to you."

Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 21, 2013 23:18 UTC (Fri) by paulj (subscriber, #341) [Link]

Some quick replies:

- I would not argue that few people are _capable_ of writing compilers. Rather, the reality is that, relative to the number of developers in software, few people will end up working on widely used compilers. This is just an artifact of the world, the breadth of complexity of our systems, and the need for specialisation. Many people _could_ work on compilers; _few_ do.

- If a compiler developer tests out the binary compiler of another team on his system, that compiler binary could "infect" binaries of his own compiler. As per my critique, to fully counter Thompson's attack (unless you limit his attack in a way Thompson did not intend), you must _at least_ fully solve the problem of virus detection. DDC does not do this, AFAICT.

- "So... let's employ multiple different program-handling programs, make it hard for them to collude, and use them to check each other. Now we're no longer dependent on any single supplier to decide if we get malicious code or not."

You've gone from trust in 1 compiler, to trust in multiple compilers and assuming any subversion of them is independent. You are clearly _still_ having to use trust (Thompson's point). Further, there are at least several assumptions you're relying on that are unsafe (independence of compiler writers; collusion of such writers as the _only_ way compilers could be subverted).

- "the problem changes to 'how much verification is enough'" - if you have fully countered something, why is there a grey area of how much verification is needed?! So, the attack is fully countered, but I now have to place my trust instead in having used enough compilers? :)

Anyway, really useful technique that could improve levels of assurance, yes! "Fully" countered? Small, but important point: Not at all. Your work _reinforces_ Thompson's point! :)

Also, note that your experience with tcc suggests that Thompson attackers could deploy counter-measures against the DDC technique which would be indistinguishable from ordinary bugs!

Fully = no longer a single point of trust

Posted Jun 22, 2013 20:10 UTC (Sat) by david.a.wheeler (subscriber, #72896) [Link]

"You've gone from trust in 1 compiler, to trust in multiple compilers and assuming any subversion of them is independent." - No. The point is that subverting a SET of compilers is much harder than subverting one. It's not that an attacker could subvert one, it's that the attacker HAS to subvert them ALL. If ANY of the compilers (original + all DDC compilers) is not subverted the same way, then the trusting trust attack is revealed. That changes everything. Instead of a single point of failure, or multiple single points of failure, the attacker must now simultaneously succeed in subverting all of the compilers used in DDC (as well as the original compiler under test). What's worse, the attacker will often not know what to compilers attack, and a defender can even create a special compiler just for DDC (which is much easier to create compared to a general-purpose compiler).

"You are clearly _still_ having to use trust (Thompson's point)." - No, I don't think that's the point. Thompson's paper isn't titled "Reflections on trust"! The problem isn't trust, it's "trusting trust" - that is, the need to totally trust some executable since there is no way to independently verify it. Which is why Thompson says you have to write everything yourself.

Now we have a way to independently verify program-handling programs (like compilers); we no longer need to continue "trusting trust". DDC fully counters the trusting trust problem, because it provides a mechanism to independently verify the compiler under test.

DDC allows you to decide what to use as your independent tests, and of course, their quality will determine DDC's effectiveness. But that's okay, because the defender gets to choose. You could even write your own compiler (call it "Z") for use in DDC, but notice that this is radically different from Thompson's assumptions. You could use Z to check the real compiler, using DDC, and then use the "real" compiler for everything else. Thompson assumes that, in order to trust it, you must write everything yourself. DDC squelches that; you might choose to write the verification tools yourself, and then use someone else's tools once you've verified them. That is quite different, in a very fundamental way.
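
As a hypothetical workflow, again reusing the names from the sketches upthread: Z only ever has to compile one program (s_A), so it can be slow and feature-poor:

    # One-off verification with your own minimal compiler Z...
    build("./Z", s_A, "stage1")
    build("./stage1", s_A, "stage2")
    assert digest("stage2") == digest("regen")

    # ...then use the verified production compiler for everything else.
    subprocess.run([c_A, "myprog.c", "-o", "myprog"], check=True)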

Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 22, 2013 20:20 UTC (Sat) by david.a.wheeler (subscriber, #72896) [Link]

"Many people _could_ work on compilers, _few_ do." - Even fewer work on all of the compilers for a given language, once there is more than one in simultaneous development.

"If a compiler developer tests out the binary compiler of another team on his system, that compiler binary could "infect" binaries of his own compiler." - Actually, that's quite unlikely. First of all, the delivered executables for widely-used compilers are typically generated in a compilation farm, not on some individual's desktop. And even on a desktop, it's way easier for a compiler-writer to write code to subvert themselves than to subvert other compilers, especially if the other compiler isn't widely used.

"As per my critique, to fully counter Thompson's attack (unless you limit his attack in a way Thompson did not intend), you must _at least_ fully solve the problem of virus detection. DDC does not do this, AFAICT." - Typically viruses won't have access to the build farms used by serious compilers, and it's much more difficult to write a virus to infect a program you have never seen.

"Anyway, really useful technique that could improve levels of assurance, yes! "Fully" countered? Small, but important point: Not at all. Your work _re-inforces_ Thompson's point! :)" - I disagree. We no longer need to keep "trusting trust", that is, trust some executable because we have no independent verification technique.

"Also, note that your experience with tcc suggests there Thompson attackers could deploy counter-measures against the DCC technique which would be indistinguishable from ordinary bugs." - No. Exactly backwards. The DDC process is so sensitive that it not only detects trusting trust attacks... it also detects some ordinary bugs. Thus, compiler authors should use DDC, because it counters trusting trust attacks and can detect some other kinds of bugs too.

Diverse Double-Compiling (DDC) counters "trusting trust"

Posted Jun 24, 2013 19:13 UTC (Mon) by david.a.wheeler (subscriber, #72896) [Link]

Quick correction: I said earlier, "it's much more difficult to write a virus to infect a program you have never seen." What I meant was that it's more difficult to write a virus to subtly change the inner semantics of a program you've never seen. Sure, a virus can include a payload and run it, e.g., to copy itself into new executables, or copy all the files to some external site. But compilers typically parse inputs into some internal data model, transform the data model, and then translate that out to assembler or machine code. If you don't know anything about the internal data model, it's hard to write code to subtly change the internal model.
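
A toy sketch of that pipeline (purely illustrative; a real compiler's internal data model is vastly larger, which is exactly what makes it hard for an outsider's virus to manipulate):

    from dataclasses import dataclass

    @dataclass
    class Node:
        # The compiler's internal data model (a tiny AST). A virus that
        # has never seen this code can't know this shape, so it can't
        # make a subtle, targeted semantic change to what's compiled;
        # at best it can attach a crude generic payload.
        op: str
        args: tuple

    def parse(src: str) -> Node:
        lhs, rhs = src.split("+")
        return Node("add", (int(lhs), int(rhs)))

    def transform(n: Node) -> Node:
        # A trivial "optimization" pass over the internal model.
        if n.op == "add" and n.args[1] == 0:
            return Node("const", (n.args[0],))
        return n

    def emit(n: Node) -> str:
        # Translate the internal model to (pretend) machine code.
        if n.op == "add":
            return f"LOAD {n.args[0]}; ADD {n.args[1]}"
        return f"LOAD {n.args[0]}"

    print(emit(transform(parse("2+3"))))   # prints: LOAD 2; ADD 3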
