Optimization-unstable code
Compilers can be tricky beasts, especially where optimizations are concerned. A recent paper [PDF] from MIT highlighted some of the problems that can be caused by perfectly legitimate—if surprising—optimizations, some of which can lead to security vulnerabilities. The problem stems from C language behavior that is left undefined by the standard, which allows compiler writers to optimize those statements away.
Andrew McGlashan raised the issue on the debian-security mailing list, expressing some surprise that the topic hadn't already come up. The paper specifically cites tests done on the Debian "Wheezy" (7.0) package repository, which found that 40% of 8500+ C/C++ packages have "optimization-unstable code" (or just "unstable code"). That does not mean that all of those are vulnerabilities, necessarily, but they are uses of undefined behavior—bugs, for the most part.
The unstable code was found using a static analysis tool called STACK that was written by the authors of the paper, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. It is based on the LLVM compiler framework and checks for ten separate undefined behaviors. Since C compilers can assume that undefined behavior is never invoked by a program, the compiler can optimize the undefined behavior away—which is what can lead to vulnerabilities.
So, what kind of undefined behavior are we talking about here? Two of the examples given early in the paper help to answer that. The first is that overflowing a pointer is undefined:
char *buf = ...;
unsigned int len = ...;

if (buf + len < buf) /* overflow check */
    ...

The compiler can (and often does, depending on the -O setting) optimize the test away. On some architectures, according to the paper, that's no great loss as the test doesn't work. But on other architectures, it does protect against a too large value of len. Getting rid of the test could lead to a buffer overflow ... and buffer overflows can often be exploited.
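One way to keep that protection without invoking undefined behavior is to compare lengths rather than forming an out-of-range pointer. Here is a minimal sketch (not from the paper; it assumes buf_end points one past the end of the buffer, and the helper name is made up):

#include <stddef.h>

/* Returns non-zero if len bytes fit between buf and buf_end.
 * buf_end - buf is well defined when both point into (or one past)
 * the same object, so no pointer is ever advanced out of range. */
static int fits(const char *buf, const char *buf_end, size_t len)
{
    return len <= (size_t)(buf_end - buf);
}

A caller would test fits(buf, buf_end, len) before writing buf[0..len-1].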
The second example is a null pointer dereference in the Linux kernel:
struct tun_struct *tun = ...;
struct sock *sk = tun->sk;

if (!tun)
    return POLLERR;
/* write to address based on tun */

Normally that code would cause a kernel oops if tun is null, but if page zero is mapped for some reason, the code is basically harmless—as long as the test remains. Because the compiler sees the dereference operation, it can conclude that the pointer is always non-null and remove the test entirely, which turns a fairly innocuous bug into a potential kernel exploit.
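The fix is simply to order the operations so that the dereference cannot precede the check. A self-contained sketch (simplified stand-in types and return values, not the actual kernel code):

#include <stdio.h>

struct tun_struct { int sk; };

/* -1 stands in for POLLERR here */
static int tun_poll(struct tun_struct *tun)
{
    if (!tun)           /* test first ... */
        return -1;
    return tun->sk;     /* ... and only then dereference */
}

int main(void)
{
    printf("%d\n", tun_poll(NULL));  /* prints -1; the NULL check cannot be removed */
    return 0;
}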
Other undefined behaviors are examined as well. Signed integer overflow, division by zero, and oversized shifts are flagged, for example. In addition, operations like an overlapping memcpy(), use after free()/realloc(), and exceeding array bounds are checked.
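For the arithmetic cases, checks have to be written so that the undefined operation itself is never evaluated. A sketch of an overflow test for signed addition (an illustration, not code from the paper):

#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow an int; the test itself stays within
 * the defined range, so the compiler cannot discard it. */
static bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}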
The Debian discussion turned toward how to find and fix these kinds of bugs but, of course, they mostly or completely live upstream, as Mark Haase pointed out.
But Paul Wise noted that there is some ongoing work by Debian and Fedora developers to package static checkers for the distributions. STACK is on the list, he said, but relies on a version of LLVM that is not yet available for Debian. He recommended that interested folks get involved in those efforts and offered a list of links to get started.
There were some who felt the optimizations removing the unstable code were actually compiler bugs. Miles Fidelman suggested the problem needed to be fixed "WAY upstream" in GCC itself: "if gcc's optimizer is opening a class of security holes - then it's gcc that has to be fixed". But Haase was quick to throw cold water on that idea, pointing to a GCC bug report and an LLVM blog post series that pretty clearly show that compiler writers do not see these kinds of optimizations as bugs.
The problem for programmers is a lack of warnings about these kinds of undefined constructs, Wise said: "Every use of undefined behaviour should at minimum result in a compiler warning." But even doing that is difficult (and noisy), Wade Richards said.
Joel Rees would like to see the standard rewritten "to encourage sane behavior in undefined situations". Defining "sane" might be somewhat difficult, of course.
Bernhard R. Link had a different suggestion.
Bugs in our code—many of which lead to security holes—are a never-ending problem, but over time we do at least seem to be getting some tools to assist in finding them. The fact that different compilers, optimization levels, and compiler versions will give different behavior for this particular class of bugs makes them even harder to find. STACK seems like a good solution there—thankfully it is open source, unlike some other static analysis tools.
Index entries for this article
Security | Static analysis
Posted Dec 5, 2013 7:00 UTC (Thu)
by Lionel_Debroux (subscriber, #30014)
[Link] (1 responses)
The Undefined Behaviour sanitizer (-fsanitize=undefined), one of a family of code-instrumenting sanitizers that has been available in LLVM/Clang for several releases and was ported to GCC (4.9+ for ubsan), is a step in the right direction for reducing involuntary reliance on undefined behaviour, through program aborts at runtime (whereas STACK is a static checker).
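As a minimal illustration of what ubsan catches (an example added here, not part of the comment; the flags are the ones named above):

#include <limits.h>
#include <stdio.h>

/* Build with: cc -fsanitize=undefined demo.c
 * At runtime, ubsan reports both the signed overflow and the oversized
 * shift below instead of silently running whatever the optimizer produced. */
int main(void)
{
    volatile int x = INT_MAX;  /* volatile keeps the operations from being folded away */
    volatile int s = 33;

    printf("%d\n", x + 1);     /* signed integer overflow: undefined */
    printf("%d\n", 1 << s);    /* shift count wider than int: undefined */
    return 0;
}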
Posted Dec 5, 2013 11:51 UTC (Thu)
by geissert (guest, #61258)
[Link]
Posted Dec 5, 2013 8:35 UTC (Thu)
by jezuch (subscriber, #52988)
[Link] (8 responses)
> On some architectures, according to the paper, that's no great loss as the test doesn't work. But on other architectures, it does protect against a too large value of len.
My take on this is: if you invent TRICKSY code like this, it's your damn fault if it breaks.
> I think the only answer to those lines is to advise you to not use any programs written in C.
Haha only serious ;)
Posted Dec 5, 2013 12:42 UTC (Thu)
by Felix.Braun (guest, #3032)
[Link] (7 responses)
if (buf + len >= &len) ...
Posted Dec 5, 2013 13:42 UTC (Thu)
by HelloWorld (guest, #56129)
[Link]
Posted Dec 5, 2013 14:01 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Maybe I'm missing something, but what are you supposed to do if the check fails? Complain that malloc is giving you bad pointers? If there's a bug serious enough that malloc is giving you a block that wraps around the end of the address space, I'd almost rather it crash to get away from that libc ASAP ;) . Or you're not asking for a big enough block, but that's an easy fix too.
Posted Dec 5, 2013 14:44 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
The compiler can also reuse memory for variables with overlapping lifetimes.
Posted Dec 8, 2013 14:20 UTC (Sun)
by nix (subscriber, #2304)
[Link]
Posted Dec 12, 2013 11:36 UTC (Thu)
by etrusco (guest, #4227)
[Link] (2 responses)
if (buf + len >= (char *) len) ...
?
Posted Dec 12, 2013 12:03 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Dec 12, 2013 14:07 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 5, 2013 10:06 UTC (Thu)
by mti (subscriber, #5390)
[Link] (6 responses)
What I would love would be a compiler mode that turns most undefined behavior into _platform dependent_ behavior. E.g. for 32-bit x86 some of the rules could be:
* shifts are done using the lower 5 bits of the second operand
* pointer arithmetic is always done using normal unsigned 32-bit arithmetic
* division by zero does generate a DIV instruction (that will cause an exception if executed)
These rules could be part of, or a companion to, the platform ABI.
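Those semantics can already be spelled out by hand in today's C, at some cost in verbosity. A rough sketch (the helper names are invented here; this is not an existing compiler mode):

#include <stdint.h>

/* Shift using only the low five bits of the count, as the x86 SHL
 * instruction does, instead of relying on undefined oversized shifts. */
static inline uint32_t shl32(uint32_t a, unsigned b)
{
    return a << (b & 31);
}

/* Pointer-overflow check done in plain unsigned arithmetic, where
 * wraparound is defined, rather than via out-of-range pointer values. */
static inline int ptr_add_wraps(const char *p, uint32_t len)
{
    return (uintptr_t)p + len < (uintptr_t)p;
}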
Posted Dec 5, 2013 14:26 UTC (Thu)
by sorokin (guest, #88478)
[Link] (1 responses)
You can do this already: a << (b & 0x1f). I believe (but I haven't checked) that on x86 the compiler will optimize the bitwise AND away.
> * pointer arithmetic is always done using normal unsigned 32-bit arithmetic
Perhaps this could be a separate library for low-level tricks. libtricks?
Posted Dec 5, 2013 15:24 UTC (Thu)
by mti (subscriber, #5390)
[Link]
Yes. But what I wanted to avoid was _undefined_ behavior. In the case of shifts I don't really care what the _platform defined_ behavior is. I could live with a rule that says "return some random bit pattern". But my rule, which also could be interpreted as "if unsure, use the SHL instruction", just seemed slightly more useful.
The problem with C's _undefined_ behavior is that it can be anything. A standard-compliant compiler could translate 'a << -1' as 'system("rm -rf /")'.
> Perhaps this could be a separate library for low-level tricks. libtricks?
Yes. And libtricks.c would have to be written in assembler _or_ compiled with my requested new mode ;-)
But you are right, I would only use that new mode in the few parts of my programs that really need them.
Posted Dec 6, 2013 1:40 UTC (Fri)
by dashesy (guest, #74652)
[Link] (3 responses)
Posted Dec 6, 2013 3:36 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
It's easy to look at optimizations based on undefined behavior as merely making the compiler authors' jobs easier, but that isn't really the case. The compiler is just doing the best it can under the assumption that the programmer wrote what he or she meant, meaning that any undefined cases are really "don't cares", i.e. they will never happen. A good example is NULL pointer dereferences; if you dereference a pointer without first checking that it isn't NULL, the fact that the NULL dereference case is undefined means that the compiler can take this as a hint that you know something it can't: that the pointer won't be NULL at that point in the program. Signed overflow is another such case, since the fact that it's undefined behavior allows the compiler to assume no overflow will take place and optimize loops which it would otherwise be unable to optimize due to the need to handle the overflow.
A compiler is expected to produce correct code first and foremost, not just code which is small and/or fast. However, it can only generate correct code when it's given correct programs. A correct program won't depend on undefined behavior.
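A standard illustration of the loop case (an example added here, not part of the comment):

/* Because overflow of the signed counter i is undefined, the compiler may
 * assume it never happens and treat the loop as always terminating after
 * at most n+1 iterations, which enables transformations (including
 * replacing the loop with a closed form). With an unsigned counter,
 * wraparound is defined, and the n == UINT_MAX case would have to loop
 * forever, which blocks that reasoning. */
int sum_to(int n)
{
    int sum = 0;

    for (int i = 0; i <= n; i++)
        sum += i;
    return sum;
}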
Posted Dec 13, 2013 1:29 UTC (Fri)
by pjm (guest, #2080)
[Link] (1 responses)
> the fact that the NULL dereference case is undefined means that the compiler can take this as a hint that you know something it can't: that the pointer won't be NULL at that point in the program

The danger of this “treat definedness as knowledge of data values” approach is of course that the resulting compiler behaviour (and behaviour of compiled binaries) makes C/C++ a more dangerous choice of language. Runtime efficiency is a significant reason why people choose to use C/C++, but it's worthwhile looking for other approaches to get that efficiency without the costs in security bugs and obstacles to debugging (nigh impossibility of reasoning about program behaviour when ‘if’ blocks can be optimized away).

A more explicit approach to conveying that information to the compiler would be something more along the lines of ‘__builtin_assume(tun != NULL)’. It could be wrapped in a macro that tests for compiler version and falls back to expr_ ? 0 : 1 << -1 (suitably parenthesized and void-cast), at least for the case that expr_ has no side effects. This could even be built into an assert-like macro in the NDEBUG case, so that such assertions provide both a testing benefit without NDEBUG, and a speed benefit with NDEBUG. Similarly, it mixes well with the design-by-contract approach of specifying all the preconditions of a function.

[Programmer-specified preconditions also give a path to overcoming those concerns about how to warn: if the program claims that the given preconditions are complete, and the compiler determines that the given preconditions don't imply tun != NULL, then the compiler can issue a warning or error as soon as it sees tun->sk. It requires work; but the payoff in debugging time and bug cost can be worthwhile.]
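A sketch of the kind of macro being described (the names are invented here; clang provides __builtin_assume, and on GCC the same effect can be had via __builtin_unreachable() rather than the undefined-expression fallback mentioned above):

/* ASSUME(e): promise the compiler that e is true, so it may optimize
 * accordingly; without NDEBUG this could instead expand to assert(e). */
#if defined(__clang__)
#  define ASSUME(e) __builtin_assume(e)
#elif defined(__GNUC__)
#  define ASSUME(e) do { if (!(e)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(e) ((void)0)
#endif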
Posted Dec 13, 2013 3:25 UTC (Fri)
by dashesy (guest, #74652)
[Link]
Posted Dec 5, 2013 12:18 UTC (Thu)
by epa (subscriber, #39769)
[Link]
if (would_overflow(buf + len)) { ... }
where would_overflow is a language builtin that returns whether its argument (which must be a simple arithmetic expression) would cause an integer overflow if evaluated.
Built on that primitive you could make macros which evaluate an expression but goto an error case on overflow, etc.
Even expert programmers, trying hard to write safe code, introduce bugs related to integer overflow. The current semantics are just too difficult to use safely.
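Newer GCC (5 and later) and Clang releases provide builtins in roughly this spirit; a sketch of the idea expressed with them (an example added here, not part of the comment):

#include <stdio.h>

int main(void)
{
    int a = 2000000000, b = 2000000000, sum;

    /* Reports overflow without ever evaluating the undefined addition. */
    if (__builtin_add_overflow(a, b, &sum))
        fprintf(stderr, "a + b would overflow\n");
    else
        printf("%d\n", sum);
    return 0;
}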
Posted Dec 6, 2013 8:56 UTC (Fri)
by error27 (subscriber, #8346)
[Link]
Posted Feb 17, 2015 8:33 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link] (1 responses)
• http://thread.gmane.org/gmane.linux.debian.devel.general/...
(OT: why is it that G**gle finds lists.d.o and MARC and osdir and spammy mailing list archives, but not GMane?)
Posted Feb 17, 2015 10:49 UTC (Tue)
by zdzichu (subscriber, #17118)
[Link]
Optimization-unstable code

> unsigned int len = ...;
>
> if (buf + len < buf) /* overflow check */
> ...

The way I see it, buf + len can only be < buf if len is sufficiently big to wrap around the address space. Wouldn't it be much more useful to check for a buffer overflow like this instead?

if (buf + len >= &len) ...

Optimization-unstable code

That makes no sense. Why would there be any relationship between the buffer's bounds and len's address?

You do have a point though: the check quoted in the LWN article is insufficient, and in fact, the check in the paper is different:

char *buf = ...;
char *buf_end = ...;
unsigned int len = ...;
if (buf + len >= buf_end)
    return; /* len too large */
if (buf + len < buf)
    return; /* overflow, buf+len wrapped around */
/* write to buf[0..len-1] */
Optimization-unstable code

> The compiler can also reuse memory for variables with overlapping lifetimes.

I hope you meant 'non-overlapping'!
Optimization-unstable code

> * division by zero does generate a DIV instruction (that will cause an exception if executed)
> These rules could be part of, or a companion to, the platform ABI.

A compiler flag to warn any time the compiler optimizes something away by assuming that the alternative is undefined would be helpful too. Right now compilers are actually more than happy if I accidentally trigger one of these undefined behaviors, because they can just simplify the whole thing to the equivalent of

return 0;

and produce code that is much faster although unexpected, and thus useless.

As far as compiler writers are concerned, their paycheck depends on how much faster (or smaller) the compiled binary is, while mine requires the binary to work first. The concept of correctness depends on the perspective, I guess.
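One existing flag points in this direction, at least for signed overflow: GCC's -Wstrict-overflow reports some of the cases where the optimizer simplifies an expression by assuming overflow cannot happen. A small illustration (an example added here; exact behavior varies by GCC version):

/* Compile with: gcc -O2 -Wstrict-overflow -c example.c
 * At -O2 the comparison may be folded to 1 on the assumption that signed
 * overflow never occurs; -Wstrict-overflow is meant to report that kind
 * of simplification. */
int always_true(int x)
{
    return x + 1 > x;
}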
A path towards avoiding costs of undefined behaviour optimizations

Undefined behaviors in C are like a minefield: there are many of them, and programmers learn them by heart only after hitting them hard. The compiler does not help in teaching about them, and reading is not enough because they are very subtle. If a warning were issued when such an optimization kicks in, one would learn them and could also opt in to the optimization when it is safe to do so.
Just ignore warnings generated inside macros