LWN.net Logo

Binary "diversity"

By Jake Edge
August 28, 2013

There's been a lot of talk about reproducible (or deterministic) builds recently for the purposes of verifying that binaries come from the "right" source code. It's particularly topical right now, at least in part because of the NSA spying disclosures coupled with the concern that various governments are actively trying to backdoor applications (especially security applications). So, the Tor project and others (e.g. Bitcoin) have been working on ways to create reproducible builds.

But reproducible builds of necessity create predictable binaries. That gives an attacker information about the layout and organization of the code that can be used for return-oriented programming (ROP) attacks. An alternative is to introduce random changes into a binary as it is built to make these kinds of attacks more difficult. Stephen Crane recently suggested adding two kinds of code generation randomness into the LLVM compiler framework in a post to the LLVMdev mailing list.

As part of a team at the University of California, Irvine, Crane has been working on adding several kinds of randomness into binaries. He proposed that the team submit patches for two types of randomness for LLVM. The first is "NOP insertion", which adds NOPs (i.e. no ops) between machine instructions. The second is "scheduling randomization", which discards the existing instruction scheduling heuristics and randomly schedules any valid instruction at each point. The result is a binary that still runs correctly, is "slightly slower", but is far more resistant to ROP attacks. It is a "simplified subset" of the work described in a paper [PDF] by the team.

The technique is in some ways analogous to address-space layout randomization (ASLR). In both cases, the layout of the code is altered such that an attacker cannot predict where code of interest will live in memory. Either can be defeated by attackers that have access to certain kinds of information. For ASLR, determining the address of a library function in the running executable is generally enough to defeat it. For randomized binaries, the attacker would need to have read access to the binary itself to find the pieces needed for an exploit.

ROP attacks use pieces of existing code in a binary to perform their malicious task. By finding little snippets of code (typically ending in a return) and calling them in the right order, the attack can perform any operation that it needs to. ROP techniques came about after operating systems started marking data as non-executable to thwart buffer overflows and the like. Using ROP techniques, buffer overflows can still be used, but without executing any code on the stack.

Crane noted that there are other randomizations that the team has worked on, but that they planned to start small when proposing patches. Nadav Rotem asked about register allocation randomization, for example, which Crane said could be added to the patch submission.

The patched compiler passes the existing LLVM test suite on x86_64, Crane said. Implementing the changes for ARM is also underway.

Nick Kledzik asked how a software distributor might be able to deliver randomized binaries, given that they normally create a single binary that gets delivered to all of their users. Crane had some thoughts on that, including building multiple or individualized ("watermarked" for example) binaries. For open source, especially for security-sensitive binaries, users can just build their own to significantly raise the bar for attacks. Crane noted that ROP attacks can be used for jailbreaking. That might make the techniques of particular interest to LLVM sponsor Apple.

Security is always about trade-offs, and randomized binaries are just further confirmation of that. Diverse binaries would make verification of the correspondence between source and binary much more difficult but would also make ROP attacks harder. Given that most free software these days is built with GCC, it would be nice to see similar patches for that compiler suite. In any case, randomized binaries will soon be another tool available for the security-sensitive.


(Log in to post comments)

Keeping the binary content hidden

Posted Aug 29, 2013 6:29 UTC (Thu) by epa (subscriber, #39769) [Link]

If it is useful to randomize the binary so that an attacker can't know the exact addresses to jump to in an attack, that also suggests that binaries should have execute permission but not read permission. At least for those which run with higher privilege than their invoker.

For example, the attacker might get a local shell as an unprivileged user, and then proceed to use a known bug in /bin/mount to escalate to root. Perhaps /bin/mount has been compiled with randomization, which will make the stack-smashing attack more difficult. But since on a standard system /bin/mount is world-readable, the attacker can just analyse it afresh for suitable locations to jump to.

Keeping the binary content hidden

Posted Aug 29, 2013 11:18 UTC (Thu) by spender (subscriber, #23067) [Link]

Nevermind that the binary can be read locally even without read access to it... (yet more of that wonderful inconsistent security design through unfixed "user-visible API")

-Brad

Keeping the binary content hidden

Posted Aug 30, 2013 7:22 UTC (Fri) by geofft (subscriber, #59789) [Link]

While that's true of normal binaries, that's not true of setuid programs (like /bin/mount), is it?

Also, and possibly more importantly here, that's certainly not true of daemons or the kernel.

Keeping the binary content hidden

Posted Aug 30, 2013 12:30 UTC (Fri) by spender (subscriber, #23067) [Link]

It is. You don't keep setuid privilege in the process of reading it, though as the purpose is just reading the file, that doesn't matter.

-Brad

Keeping the binary content hidden

Posted Aug 30, 2013 14:20 UTC (Fri) by epa (subscriber, #39769) [Link]

Can you point to an example program which will read an executable despite lacking +r permission, and write the contents to stdout?

Keeping the binary content hidden

Posted Aug 30, 2013 14:34 UTC (Fri) by spender (subscriber, #23067) [Link]

Keeping the binary content hidden

Posted Aug 30, 2013 17:50 UTC (Fri) by nix (subscriber, #2304) [Link]

http://git.zx2c4.com/CVE-2012-0056/tree/ptrace-offset-fin...

is a clearer example. The ptrace()d child is not setuid, but of course its text section is the same, so randomization is trivially defeated and you can trivially suck its loadable sections out of /proc/$pid/mem (or PEEKTEXT it as the code there does). You can't get at non-loadable sections this way, but who cares?

Keeping the binary content hidden

Posted Sep 4, 2013 14:03 UTC (Wed) by ThinkRob (subscriber, #64513) [Link]

I'm showing my ignorance of how these attacks work here, but then why bother randomizing binaries at all? I get the point of ASLR, but if binary randomization is just a question of adding the hurdle of reading the thing from disk as part of some exploit, why bother? (Or am I missing something? [Likely!])

Binary "diversity"

Posted Aug 29, 2013 11:10 UTC (Thu) by SLi (subscriber, #53131) [Link]

LLVM already supports install-time optimization, i.e. it's possible to distribute the program as LLVM IR (typically bitcode) and run machine code generation on the target machine, generating code optimized for that particular variant of the processor/system. I believe Android phones also do something similar with Dalvik.

Would it not then be possible to deterministically compile the source code into LLVM IR and sign it, and then for users to verify the IR and after that run a randomized machine code generation step?

Binary "diversity"

Posted Aug 31, 2013 0:56 UTC (Sat) by robert_s (subscriber, #42402) [Link]

Just don't bother submitting a backtrace to the developers if your application crashes.

Binary "diversity"

Posted Sep 2, 2013 18:44 UTC (Mon) by nix (subscriber, #2304) [Link]

Just don't bother submitting a backtrace *without symbols* to the developers, anyway. A backtrace with symbols is always likely to be useful.

Binary "diversity"

Posted Sep 1, 2013 15:32 UTC (Sun) by deepfire (subscriber, #26138) [Link]

And a huge part of this steadily progressing madness is due to memory-unsafe languages..

Time for stuff like OCaml, or shall the punishment continue?

Binary "diversity"

Posted Sep 1, 2013 18:28 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

Personally, I'm more of a Haskell guy, but I think Rust might be the one to do this.

Binary "diversity"

Posted Sep 2, 2013 18:30 UTC (Mon) by deepfire (subscriber, #26138) [Link]

Haskell is not very suitable a replacement, due to its extremely opaque memory usage behaviours.

Moreover, the lazy evaluation and total immutability is too unrealistic a thing to ask everyone to adopt.

Binary "diversity"

Posted Sep 2, 2013 18:36 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I also think that Haskell is unlikely to take the world by storm. I think the biggest problem is the laziness (which tends to cause the opaque memory usage). The immutability (at least by default) is, IMNSHO, much preferred to the current way (e.g., const should not have been the type decorator keyword in C, but something like "mutable" (probably "mut")).

Binary "diversity"

Posted Sep 2, 2013 18:57 UTC (Mon) by nybble41 (subscriber, #55106) [Link]

> I also think that Haskell is unlikely to take the world by storm. I think the biggest problem is the laziness (which tends to cause the opaque memory usage).

I would argue that Haskell's non-strict evaluation model is essential to its design, and very much preferable in the majority of cases to strict evaluation. Certainly Haskell programs would look quite different if strictness was the default.

The memory use isn't really all that opaque, once you understand the evaluation model. You just have to understand that any expression which hasn't yet been reduced to WHNF retains references to all its inputs. That can be a problem in the comparatively rare cases where an expression's inputs are much larger than it's result, so in those cases you use `seq` or a strict type annotation to force the full result to be computed early, releasing the references in the process.

Immutable data is definitely the way to go for concurrent programs. If you really need mutability in the middle of a pure Haskell function there is always the ST monad, but most of the time it's not really necessary, and making data immutable by default allows a host of performance optimizations while eliminating entire classes of concurrency bugs.

Binary "diversity"

Posted Sep 3, 2013 12:10 UTC (Tue) by deepfire (subscriber, #26138) [Link]

While there definitely are valid counterpoints in both of the static/dynamic camps, and, furthermore, in the immutability/limited mutability camps, there is the overarching social issue to consider.

Let's _try_ to get the memory/type safety beyond all people _first_.

This means server and desktop. Mobile is mostly already there, weirdly.

And let's kindly agree that Java is not the solution, for a multitude of reasons.
Oh, sh*t, Android..

Binary "diversity"

Posted Sep 3, 2013 10:58 UTC (Tue) by deepfire (subscriber, #26138) [Link]

So, speaking of plausible alternatives, how does Rust compare to OCaml?

Binary "diversity"

Posted Sep 8, 2013 16:34 UTC (Sun) by heijo (guest, #88363) [Link]

Huh?

Why not just randomize the order of functions in the binary?

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds