A return-oriented programming defense from OpenBSD
In a classic stack-smashing attack, the attack code would be written directly to the stack and executed there. Most modern systems do not allow execution of on-stack code, though, so this kind of attack will be ineffective. The stack does affect code execution, though, in that the call chain is stored there; when a function executes a "return" instruction, the address to return to is taken from the stack. An attacker who can overwrite the stack can, thus, force a function to "return" to an arbitrary location.
That alone can be enough to carry out some types of attacks, but ROP adds another level of sophistication. A search through a body of binary code will turn up a great many short sequences of instructions ending in a return instruction. These sequences are termed "gadgets"; a large program contains enough gadgets to carry out almost any desired task — if they can be strung together into a chain. ROP works by locating these gadgets, then building a series of stack frames so that each gadget "returns" to the next.
This technique allows the construction of arbitrary programs on the stack without the need for execute permission on the stack itself. It is worth noting that, on a complex-instruction-set architecture like x86, unexpected gadgets can be created by jumping into the middle of a multi-byte instruction, a phenomenon termed "polymorphism". Needless to say, there are tools out there that can be used by an attacker to locate gadgets and string them together into programs.
The RETGUARD mechanism, posted by Theo de Raadt on August 19, makes use of a simple return-address transformation to disrupt ROP chains and prevent them from executing as intended. It takes the form of a patch to the LLVM compiler adding a new -fret-protector flag. When code is compiled with that flag, two things happen:
- The prologue to each function (the code that runs before the
body of the function itself) exclusive-ORs the return address on the
stack with the value of the stack pointer itself.
- The epilogue, run just before the function returns, repeats the operation to restore the return address to its initial value.
The exclusive-OR operation changes the return address into something that is effectively random, especially when address-space layout randomization is used to place the stack at an unpredictable location. With this change, the first gadget used by a ROP sequence will, when it attempts the second step above, transform the return address into something unpredictable and, most likely, useless to an attacker. That will stop the chain and thwart the attack.
There is, of course, a significant limitation here: a ROP chain made up of
exclusively polymorphic gadgets will still work, since those gadgets were
not (intentionally) created by the compiler and do not contain the
return-address-mangling code. De Raadt acknowledged this limitation, but
said: "we believe once standard-RET is solved those concerns become
easier to address separately in the future. In any case a substantial
reduction of gadgets is powerful
".
Using the compiler to insert the hardening code greatly eases the task of
applying RETGUARD to both the OpenBSD kernel and its user-space code. At
least, that is true for code written in a high-level language. Any code
written in assembly must be changed by hand, though, which is a fair amount
of work. De Raadt and company have done that work; he reports that:
"We are at the point where userland and base are fully working
without regressions, and the remaining impacts are in a few larger ports
which directly access the return address (for a variety of
reasons)
". It can be expected that, once these final issues are
dealt with, OpenBSD will ship with this hardening enabled.
It makes sense to ask whether this relatively straightforward hardening technique could be applied to the Linux kernel as well. Using LLVM to build the kernel is not yet a viable option, but it should be possible to reimplement the RETGUARD transformations as a GCC plugin module. The tiresome task of fixing up the assembly code would also need to be done; the objtool utility could probably be pressed into service to help with this task. But the patch that emerged would not be small.
If any benchmarks have been run to determine the cost of using RETGUARD,
they have not been publicly posted. The extra code will make the kernel a
little bigger, and the extra overhead on every function is likely to add up
in the end. But if this technique can make the kernel that much harder to
exploit, it may well justify the extra execution overhead that it brings
with it. All that's needed is somebody to actually do the work and try it
out.
Index entries for this article | |
---|---|
Kernel | Security/Kernel hardening |
Security | Hardening |
Security | OpenBSD |
Posted Aug 30, 2017 2:25 UTC (Wed)
by pabs (subscriber, #43278)
[Link] (4 responses)
https://www.grsecurity.net/rap_announce.php
Posted Aug 30, 2017 3:26 UTC (Wed)
by smurf (subscriber, #17840)
[Link]
Posted Aug 30, 2017 6:54 UTC (Wed)
by PaXTeam (guest, #24616)
[Link]
Posted Aug 30, 2017 9:04 UTC (Wed)
by alonz (subscriber, #815)
[Link] (1 responses)
Posted Aug 30, 2017 11:21 UTC (Wed)
by roc (subscriber, #30627)
[Link]
Posted Aug 30, 2017 4:18 UTC (Wed)
by luto (guest, #39314)
[Link] (1 responses)
Posted Aug 30, 2017 5:06 UTC (Wed)
by josh (subscriber, #17465)
[Link]
Posted Aug 30, 2017 13:58 UTC (Wed)
by epa (subscriber, #39769)
[Link] (7 responses)
Here I am thinking of the program as a single lump of object code which is compiled and linked in its entirety before running. Obviously if you have loadable modules or you are writing a dynamically linked library you have to allow more flexibility in return addresses.
Posted Aug 31, 2017 0:12 UTC (Thu)
by droundy (subscriber, #4559)
[Link] (6 responses)
Posted Aug 31, 2017 8:34 UTC (Thu)
by epa (subscriber, #39769)
[Link] (3 responses)
If the program is not multithreaded, and static analysis shows that a function is not re-entrant, then its return address (or a number to look up the return address) does not need to be on the stack at all. It can be at a fixed location, making it harder for an attacker to overwrite, and saving some stack space too.
Indeed, there's a case for saying that all re-entrant functions should need to be explicitly tagged as such by the programmer, since they need more care in writing. If the function isn't tagged as re-entrant, and the compiler cannot statically prove that it can never be called by some chain starting from itself, then that's a compile-time warning or error. On the other hand, if a function is never re-entrant, all its local variables can be allocated statically away from the stack. Who knows, perhaps some optimizing compilers do this already.
Posted Aug 31, 2017 9:31 UTC (Thu)
by karkhaz (subscriber, #99844)
[Link] (1 responses)
1. You say
AFAIK there will be plenty of situations where a function cannot be called through a chain starting from itself, but a static analysis cannot prove this, so the compiler will emit false positives (i.e. tell you that you need to tag the function when you should not). This is due to function pointers; static analyses typically over-approximate what concrete values function pointers might have at runtime. Analyses that have a more precise idea about function pointer addresses are typically very slow.
Note also that the analysis would need to have the entire program at its disposal to determine the values of function pointers, while here we're talking about the compiler (which only has access to a single translation unit). There are tools like CBMC [1] that don't have much trouble with analyzing entire programs and serve as a drop-in replacement to GCC, and there has been some work [2] to the Clang Static Analyzer that would enable it to analyze the entire program at once (still in discussion), but these are both way beyond the capabilities of a regular compiler anyway. Alternatively, it could be done as a link-time optimization (well, it would be a link-time analysis, but the line between analysis and optimization is quite fine), as at link-time you have the whole program, though I don't know what the exact capabilities of LLVM and GCC's LTO are and whether this would be possible.
2. I'm not sure what reentrancy has to do with this, would you mind elaborating?
[1] http://www.cprover.org/cbmc/
Posted Aug 31, 2017 13:41 UTC (Thu)
by epa (subscriber, #39769)
[Link]
After my first paragraph I went off on a separate idea which was that the stack could be avoided altogether if a function can only be executing 'once' -- it cannot call itself so it cannot appear twice in a call stack. In that case you can set aside a static area of memory, which is read-write of course, but is physically separate from the stack and so perhaps less likely to be trampled in a typical stack smashing attack. This might not really be a win, if it just means that memory trampling attacks against this static area become easier.
Posted Aug 31, 2017 10:16 UTC (Thu)
by ibukanov (subscriber, #3942)
[Link]
Posted Aug 31, 2017 15:26 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Aug 31, 2017 19:15 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Aug 30, 2017 20:15 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (2 responses)
Posted Aug 31, 2017 12:00 UTC (Thu)
by sorokin (guest, #88478)
[Link] (1 responses)
Posted Aug 31, 2017 12:07 UTC (Thu)
by mjthayer (guest, #39183)
[Link]
Yes, I realise that. It is at least harder though as you have to avoid overwriting your own in the process.
Posted Aug 31, 2017 12:04 UTC (Thu)
by sorokin (guest, #88478)
[Link] (10 responses)
1. Xoring the return address stored in stack. (article)
Just for completeness I would like to say that another possible solution would be having two separate stacks. One for return addresses and for variables whose address is not taken. Another for variables whose address is taken.
Posted Aug 31, 2017 15:24 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Aug 31, 2017 19:17 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Aug 31, 2017 18:07 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (6 responses)
http://dslab.epfl.ch/proj/cpi/
Though IIUC there are still some unresolved integration and interoperability issues involving threading, dynamic memory, etc, that make SafeStack undesirable to use long-term until those issues are resolved.
Posted Aug 31, 2017 22:18 UTC (Thu)
by thestinger (guest, #91827)
[Link]
Posted Sep 11, 2017 13:29 UTC (Mon)
by itvirta (guest, #49997)
[Link] (3 responses)
So, if I got that right, they make another software controlled stack for variables/arrays that get their pointers taken,
I've sometimes wondered why CPUs don't implement a separate, protected, stack for the CALL/RETURN instructions.
Posted Sep 11, 2017 18:40 UTC (Mon)
by wahern (subscriber, #37304)
[Link]
From Table 1 of the Code-Pointer Integrity (2014) paper:
Safe Stack is the dual-stack mechanism. CPS(weak) and CPI (strong) are for dealing with function pointers in heap data.
Posted Sep 12, 2017 16:20 UTC (Tue)
by zlynx (guest, #2285)
[Link] (1 responses)
If only people would stop relying on the x86 / amd64 ISA.
Posted Sep 12, 2017 18:32 UTC (Tue)
by excors (subscriber, #95769)
[Link]
(ARMv7/AArch32 is similar but more confusing, because there are lots of mostly-deprecated ways of returning by using PC as a destination register, and you usually push/pop LR/PC in the same instruction as all the other registers you want to preserve (whereas AArch64 can only push/pop a pair of registers at once), and the Thumb instruction encoding has lots of limitations, and there are only half as many registers as AArch64, so a separate control stack might be significantly more expensive there.)
Posted Sep 16, 2017 22:16 UTC (Sat)
by areilly (subscriber, #87829)
[Link]
Posted Aug 31, 2017 23:04 UTC (Thu)
by shemminger (subscriber, #5739)
[Link]
Also, there is some evidence that these are reducing the use of ROP in exploits:
A return-oriented programming defense from OpenBSD
https://www.grsecurity.net/rap_faq.php
A return-oriented programming defense from OpenBSD
Using the stack pointer itself to encode the return address is an interesting idea that should help thwart such attacks with significantly lower overhead.
A return-oriented programming defense from OpenBSD
Note that the scheme used in RAP has a different trade-off - RAP needs two extra registers (which probably hurts performance) but it also doesn't alter the return address iterator. Modifying the return address are runtime, the way this technique does, will invalidate the processor's branch prediction and hurt performance.
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
> and the compiler cannot statically prove that it can never be called by some chain starting from itself
[2] https://reviews.llvm.org/D30691
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
2. Allowing a function to return only to a finite number of places using an array of possible return addresses. (epa)
3. Keeping return address in a global variable in the case the function is non-recursive. (epa)
4. Duplicate a return address at bottom of stack frame. (michaeljt)
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
http://clang.llvm.org/docs/SafeStack.html
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
and keep the rest in the usual hardware stack?
That should deal with all sorts of overwriting and adding return pointers, if the stack was made read-only or completely
inaccessible to other instructions. But I don't know if there would be some prohibitive cost to that.
A return-oriented programming defense from OpenBSD
Safe Stack CPS CPI
-------------------------------------------
Average (C/C++) | 0.0% 1.9% 8.4%
Median (C/C++) | 0.0% 0.4% 0.4%
Maximum (C/C++) | 4.1% 17.2% 44.2%
-------------------------------------------
Average (C only) | -0.4% 1.2% 2.9%
Median (C only) | -0.3% 0.5% 0.7%
Maximum (C only) | 4.1% 13.3% 16.3%
-------------------------------------------
Table 1: Summary of SPEC CPU2006
performance overheads.
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
A return-oriented programming defense from OpenBSD
https://msdn.microsoft.com/en-us/library/windows/desktop/...(v=vs.85).aspx
Though there are IP issues in using some of this in Linux.
https://www.endgame.com/blog/technical-blog/rop-dying-and...