There's been a lot of talk about reproducible (or deterministic) builds
recently for the purposes of verifying
that binaries come from the "right" source code. It's particularly
topical right now, at least in part because of the NSA spying disclosures
coupled with the concern that various
governments are actively trying to backdoor applications (especially
security applications). So, the Tor project and others (e.g. Bitcoin) have
been working on ways to create
But reproducible builds of necessity create predictable binaries. That
gives an attacker information about the layout and organization of the
code that can be used for return-oriented
programming (ROP) attacks. An alternative is to introduce
random changes into a binary as it is built to make these kinds of
attacks more difficult. Stephen Crane recently suggested
adding two kinds of code generation randomness into the LLVM compiler
framework in a post to the LLVMdev mailing list.
As part of a team at the University of California, Irvine,
Crane has been working on adding several kinds of randomness into
binaries. He proposed that the team submit patches for two types of
randomness for LLVM.
The first is "NOP insertion", which adds NOPs (i.e. no ops) between machine
instructions. The second is "scheduling randomization", which discards the
scheduling heuristics and randomly schedules any valid
instruction at each point. The result is a binary that still runs
correctly, is "slightly
slower", but is far more resistant to ROP attacks. It is a
"simplified subset" of the work described in a paper
[PDF] by the team.
The technique is in some ways analogous to address-space layout
randomization (ASLR). In both cases, the layout of the code is altered
such that an attacker cannot predict where code of interest will live in
memory. Either can be defeated by attackers that have access to certain kinds
of information. For ASLR, determining the address of a library function in the
running executable is generally enough to defeat it. For randomized binaries, the
attacker would need to have read access to the binary itself to find the pieces
needed for an exploit.
ROP attacks use pieces of existing code in a binary to perform their
malicious task. By finding little snippets of code (typically ending in a
return) and calling them in the right order, the attack can perform
any operation that it needs to. ROP techniques came about after operating
systems started marking data as non-executable to thwart buffer overflows
and the like. Using ROP techniques, buffer overflows can still be used,
but without executing any code on the stack.
Crane noted that there are other randomizations that the team has worked
on, but that they planned to start small when proposing patches. Nadav
Rotem asked about register allocation
randomization, for example, which Crane said could be added to the patch submission.
The patched compiler passes the existing LLVM test suite on x86_64, Crane
said. Implementing the changes for ARM is also underway.
Nick Kledzik asked how a software
distributor might be able to deliver randomized binaries, given that they
normally create a single binary that gets delivered to all of their
users. Crane had some thoughts on that,
including building multiple or individualized ("watermarked"
for example) binaries. For open source, especially for security-sensitive
binaries, users can just build their own to significantly raise the bar for
attacks. Crane noted that ROP attacks can be used for jailbreaking. That
might make the techniques of particular interest to LLVM sponsor Apple.
Security is always about trade-offs, and randomized binaries are just further
confirmation of that. Diverse binaries would make verification of the
source and binary much more difficult but would also make ROP
attacks harder. Given that most free software these days is built
with GCC, it would be nice to see similar patches for that compiler suite.
In any case, randomized binaries will soon be another tool available for
to post comments)