Optimization-unstable code
Compilers can be tricky beasts, especially where optimizations are concerned. A recent paper [PDF] from MIT highlighted some of the problems that can be caused by perfectly legitimate—if surprising—optimizations, some of which can lead to security vulnerabilities. The problem stems from C language behavior that is left undefined by the standard, which allows compiler writers to optimize those statements away.
Andrew McGlashan raised the issue on the debian-security mailing list, expressing some surprise that the topic hadn't already come up. The paper specifically cites tests done on the Debian "Wheezy" (7.0) package repository, which found that 40% of 8500+ C/C++ packages have "optimization-unstable code" (or just "unstable code"). That does not mean that all of those are vulnerabilities, necessarily, but they are uses of undefined behavior—bugs, for the most part.
The unstable code was found using a static analysis tool called STACK that was written by the authors of the paper, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. It is based on the LLVM compiler framework and checks for ten separate undefined behaviors. Since C compilers can assume that undefined behavior is never invoked by a program, the compiler can optimize the undefined behavior away—which is what can lead to vulnerabilities.
So, what kind of undefined behavior are we talking about here? Two of the examples given early in the paper help to answer that. The first is that overflowing a pointer is undefined:
char *buf = ...;
unsigned int len = ...;

if (buf + len < buf) /* overflow check */
    ...

The compiler can (and often does, depending on the -O setting) optimize the test away. On some architectures, according to the paper, that's no great loss as the test doesn't work. But on other architectures, it does protect against a too large value of len. Getting rid of the test could lead to a buffer overflow ... and buffer overflows can often be exploited.
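One way to keep that protection without invoking undefined behavior is to compare lengths rather than forming an out-of-range pointer. Here is a minimal sketch (not from the paper; it assumes buf_end points one past the end of the buffer, and the helper name is made up):

#include <stddef.h>

/* Returns non-zero if len bytes fit between buf and buf_end.
 * buf_end - buf is well defined when both point into (or one past)
 * the same object, so no pointer is ever advanced out of range. */
static int fits(const char *buf, const char *buf_end, size_t len)
{
    return len <= (size_t)(buf_end - buf);
}

A caller would test fits(buf, buf_end, len) before writing buf[0..len-1].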
The second example is a null pointer dereference in the Linux kernel:
struct tun_struct *tun = ...;
struct sock *sk = tun->sk;

if (!tun)
    return POLLERR;
/* write to address based on tun */

Normally that code would cause a kernel oops if tun is null, but if page zero is mapped for some reason, the code is basically harmless—as long as the test remains. Because the compiler sees the dereference operation, it can conclude that the pointer is always non-null and remove the test entirely, which turns a fairly innocuous bug into a potential kernel exploit.
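The fix is simply to order the operations so that the dereference cannot precede the check. A self-contained sketch (simplified stand-in types and return values, not the actual kernel code):

#include <stdio.h>

struct tun_struct { int sk; };

/* -1 stands in for POLLERR here */
static int tun_poll(struct tun_struct *tun)
{
    if (!tun)           /* test first ... */
        return -1;
    return tun->sk;     /* ... and only then dereference */
}

int main(void)
{
    printf("%d\n", tun_poll(NULL));  /* prints -1; the NULL check cannot be removed */
    return 0;
}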
Other undefined behaviors are examined as well. Signed integer overflow, division by zero, and oversized shifts are flagged, for example. In addition, operations like an overlapping memcpy(), use after free()/realloc(), and exceeding array bounds are checked.
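For the arithmetic cases, checks have to be written so that the undefined operation itself is never evaluated. A sketch of an overflow test for signed addition (an illustration, not code from the paper):

#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow an int; the test itself stays within
 * the defined range, so the compiler cannot discard it. */
static bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}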
The Debian discussion turned toward how to find and fix these kinds of bugs but, of course, they mostly or completely live upstream, as Mark Haase pointed out.
But Paul Wise noted that there is some ongoing work by Debian and Fedora developers to package static checkers for the distributions. STACK is on the list, he said, but relies on a version of LLVM that is not yet available for Debian. He recommended that interested folks get involved in those efforts and offered a list of links to get started.
There were some who felt the optimizations removing the unstable code were actually compiler bugs. Miles Fidelman suggested the problem needed to be fixed "WAY upstream" in GCC itself: "if gcc's optimizer is opening a class of security holes - then it's gcc that has to be fixed". But Haase was quick to throw cold water on that idea, pointing to a GCC bug report and an LLVM blog post series that pretty clearly show that compiler writers do not see these kinds of optimizations as bugs.
The problem for programmers is a lack of warnings about these kinds of undefined constructs, Wise said: "Every use of undefined behaviour should at minimum result in a compiler warning." But even doing that is difficult (and noisy), Wade Richards said.
Joel Rees would like to see the standard rewritten "to encourage sane behavior in undefined situations". Defining "sane" might be somewhat difficult, of course.
Bernhard R. Link had a different suggestion.
Bugs in our code—many of which lead to security holes—are a never-ending problem, but over time we do at least seem to be getting some tools to assist in finding them. The fact that different compilers, optimization levels, and compiler versions will give different behavior for this particular class of bugs makes them even harder to find. STACK seems like a good solution there—thankfully it is open source, unlike some other static analysis tools.
Index entries for this article
Security | Static analysis
Posted Dec 5, 2013 7:00 UTC (Thu)
by Lionel_Debroux (subscriber, #30014)
[Link] (1 responses)
The Undefined Behaviour sanitizer (-fsanitize=undefined), one of a family of code-instrumenting sanitizers that has been available in LLVM/Clang for several releases and was ported to GCC (4.9+ for ubsan), is a step in the right direction for reducing involuntary reliance on undefined behaviour, through program aborts at runtime (whereas STACK is a static checker).
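As a minimal illustration of what ubsan catches (an example added here, not part of the comment; the flags are the ones named above):

#include <limits.h>
#include <stdio.h>

/* Build with: cc -fsanitize=undefined demo.c
 * At runtime, ubsan reports both the signed overflow and the oversized
 * shift below instead of silently running whatever the optimizer produced. */
int main(void)
{
    volatile int x = INT_MAX;  /* volatile keeps the operations from being folded away */
    volatile int s = 33;

    printf("%d\n", x + 1);     /* signed integer overflow: undefined */
    printf("%d\n", 1 << s);    /* shift count wider than int: undefined */
    return 0;
}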
Posted Dec 5, 2013 11:51 UTC (Thu)
by geissert (guest, #61258)
[Link]
Posted Dec 5, 2013 8:35 UTC (Thu)
by jezuch (subscriber, #52988)
[Link] (8 responses)
> On some architectures, according to the paper, that's no great loss as the test doesn't work. But on other architectures, it does protect against a too large value of len.
My take on this is: if you invent TRICKSY code like this, it's your damn fault if it breaks.
> I think the only answer to those lines is to advise you to not use any programs written in C.
Haha only serious ;)
Posted Dec 5, 2013 12:42 UTC (Thu)
by Felix.Braun (guest, #3032)
[Link] (7 responses)
if (buf + len >= &len) ...
Posted Dec 5, 2013 13:42 UTC (Thu)
by HelloWorld (guest, #56129)
[Link]
Posted Dec 5, 2013 14:01 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Maybe I'm missing something, but what are you supposed to do if the check fails? Complain that malloc is giving you bad pointers? If there's a bug serious enough that malloc is giving you a block that wraps around the end of the address space, I'd almost rather it crash to get away from that libc ASAP ;) . Or you're not asking for a big enough block, but that's an easy fix too.
Posted Dec 5, 2013 14:44 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
The compiler can also reuse memory for variables with overlapping lifetimes.
Posted Dec 8, 2013 14:20 UTC (Sun)
by nix (subscriber, #2304)
[Link]
Posted Dec 12, 2013 11:36 UTC (Thu)
by etrusco (guest, #4227)
[Link] (2 responses)
if (buf + len >= (char *) len) ...
?
Posted Dec 12, 2013 12:03 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Dec 12, 2013 14:07 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 5, 2013 10:06 UTC (Thu)
by mti (subscriber, #5390)
[Link] (6 responses)
What I would love would be a compiler mode that turns most undefined behavior into _platform dependent_ behavior. E.g. for 32-bit x86 some of the rules could be:
* shifts are done using the lower 5 bits of the second operand
* pointer arithmetic is always done using normal unsigned 32-bit arithmetic
* division by zero does generate a DIV instruction (that will cause an exception if executed)
These rules could be part of, or a companion to, the platform ABI.
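Those semantics can already be spelled out by hand in today's C, at some cost in verbosity. A rough sketch (the helper names are invented here; this is not an existing compiler mode):

#include <stdint.h>

/* Shift using only the low five bits of the count, as the x86 SHL
 * instruction does, instead of relying on undefined oversized shifts. */
static inline uint32_t shl32(uint32_t a, unsigned b)
{
    return a << (b & 31);
}

/* Pointer-overflow check done in plain unsigned arithmetic, where
 * wraparound is defined, rather than via out-of-range pointer values. */
static inline int ptr_add_wraps(const char *p, uint32_t len)
{
    return (uintptr_t)p + len < (uintptr_t)p;
}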
Posted Dec 5, 2013 14:26 UTC (Thu)
by sorokin (guest, #88478)
[Link] (1 responses)
You can do this already: a << (b & 0x1f). I believe (but I haven't checked) that on x86 the compiler will optimize the bitwise AND away.
> * pointer arithmetic is always done using normal unsigned 32-bit arithmetic
Perhaps this could be a separate library for low-level tricks. libtricks?
Posted Dec 5, 2013 15:24 UTC (Thu)
by mti (subscriber, #5390)
[Link]
Yes. But what I wanted to avoid was _undefined_ behavior. In the case of shifts I don't really care what the _platform defined_ behavior is. I could live with a rule that says "return some random bit pattern". But my rule, which also could be interpreted as "if unsure, use the SHL instruction", just seemed slightly more useful.
The problem with C's _undefined_ behavior is that it can be anything. A standard-compliant compiler could translate 'a << -1' as 'system("rm -rf /")'.
> Perhaps this could be a separate library for low-level tricks. libtricks?
Yes. And libtricks.c would have to be written in assembler _or_ compiled with my requested new mode ;-)
But you are right, I would only use that new mode in the few parts of my programs that really need them.
Posted Dec 6, 2013 1:40 UTC (Fri)
by dashesy (guest, #74652)
[Link] (3 responses)
Posted Dec 6, 2013 3:36 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
It's easy to look at optimizations based on undefined behavior as merely making the compiler authors' jobs easier, but that isn't really the case. The compiler is just doing the best it can under the assumption that the programmer wrote what he or she meant, meaning that any undefined cases are really "don't cares", i.e. they will never happen. A good example is NULL pointer dereferences; if you dereference a pointer without first checking that it isn't NULL, the fact that the NULL dereference case is undefined means that the compiler can take this as a hint that you know something it can't: that the pointer won't be NULL at that point in the program. Signed overflow is another such case, since the fact that it's undefined behavior allows the compiler to assume no overflow will take place and optimize loops which it would otherwise be unable to optimize due to the need to handle the overflow.
A compiler is expected to produce correct code first and foremost, not just code which is small and/or fast. However, it can only generate correct code when it's given correct programs. A correct program won't depend on undefined behavior.
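A standard illustration of the loop case (an example added here, not part of the comment):

/* Because overflow of the signed counter i is undefined, the compiler may
 * assume it never happens and treat the loop as always terminating after
 * at most n+1 iterations, which enables transformations (including
 * replacing the loop with a closed form). With an unsigned counter,
 * wraparound is defined, and the n == UINT_MAX case would have to loop
 * forever, which blocks that reasoning. */
int sum_to(int n)
{
    int sum = 0;

    for (int i = 0; i <= n; i++)
        sum += i;
    return sum;
}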
Posted Dec 13, 2013 1:29 UTC (Fri)
by pjm (guest, #2080)
[Link] (1 responses)
> the fact that the NULL dereference case is undefined means that the compiler can take this as a hint that you know something it can't: that the pointer won't be NULL at that point in the program

The danger of this “treat definedness as knowledge of data values” approach is of course that the resulting compiler behaviour (and behaviour of compiled binaries) makes C/C++ a more dangerous choice of language. Runtime efficiency is a significant reason why people choose to use C/C++, but it's worthwhile looking for other approaches to get that efficiency without the costs in security bugs and obstacles to debugging (nigh impossibility of reasoning about program behaviour when ‘if’ blocks can be optimized away).

A more explicit approach to conveying that information to the compiler would be something more along the lines of ‘__builtin_assume(tun != NULL)’. It could be wrapped in a macro that tests for compiler version and falls back to expr_ ? 0 : 1 << -1 (suitably parenthesized and void-cast), at least for the case that expr_ has no side effects. This could even be built into an assert-like macro in the NDEBUG case, so that such assertions provide both a testing benefit without NDEBUG, and a speed benefit with NDEBUG. Similarly, it mixes well with the design-by-contract approach of specifying all the preconditions of a function.

[Programmer-specified preconditions also give a path to overcoming those concerns about how to warn: if the program claims that the given preconditions are complete, and the compiler determines that the given preconditions don't imply tun != NULL, then the compiler can issue a warning or error as soon as it sees tun->sk. It requires work; but the payoff in debugging time and bug cost can be worthwhile.]
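A sketch of the kind of macro being described (the names are invented here; clang provides __builtin_assume, and on GCC the same effect can be had via __builtin_unreachable() rather than the undefined-expression fallback mentioned above):

/* ASSUME(e): promise the compiler that e is true, so it may optimize
 * accordingly; without NDEBUG this could instead expand to assert(e). */
#if defined(__clang__)
#  define ASSUME(e) __builtin_assume(e)
#elif defined(__GNUC__)
#  define ASSUME(e) do { if (!(e)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(e) ((void)0)
#endif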
Posted Dec 13, 2013 3:25 UTC (Fri)
by dashesy (guest, #74652)
[Link]
Posted Dec 5, 2013 12:18 UTC (Thu)
by epa (subscriber, #39769)
[Link]
if (would_overflow(buf + len)) { ... }
where would_overflow is a language builtin that returns whether its argument (which must be a simple arithmetic expression) would cause an integer overflow if evaluated.
Built on that primitive you could make macros which evaluate an expression but goto an error case on overflow, etc.
Even expert programmers, trying hard to write safe code, introduce bugs related to integer overflow. The current semantics are just too difficult to use safely.
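Newer GCC (5 and later) and Clang releases provide builtins in roughly this spirit; a sketch of the idea expressed with them (an example added here, not part of the comment):

#include <stdio.h>

int main(void)
{
    int a = 2000000000, b = 2000000000, sum;

    /* Reports overflow without ever evaluating the undefined addition. */
    if (__builtin_add_overflow(a, b, &sum))
        fprintf(stderr, "a + b would overflow\n");
    else
        printf("%d\n", sum);
    return 0;
}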
Posted Dec 6, 2013 8:56 UTC (Fri)
by error27 (subscriber, #8346)
[Link]
Posted Feb 17, 2015 8:33 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link] (1 responses)
• http://thread.gmane.org/gmane.linux.debian.devel.general/...
(OT: why is it that G**gle finds lists.d.o and MARC and osdir and spammy mailing list archives, but not GMane?)
Posted Feb 17, 2015 10:49 UTC (Tue)
by zdzichu (subscriber, #17118)
[Link]
Optimization-unstable code

> unsigned int len = ...;
>
> if (buf + len < buf) /* overflow check */
> ...

The way I see it, buf + len can only be < buf if len is sufficiently big to wrap around the address space. Wouldn't it be much more useful to check for a buffer overflow like this instead?

if (buf + len >= &len) ...

Optimization-unstable code

That makes no sense. Why would there be any relationship between the buffer's bounds and len's address?

You do have a point though: the check quoted in the LWN article is insufficient, and in fact, the check in the paper is different:

char *buf = ...;
char *buf_end = ...;
unsigned int len = ...;
if (buf + len >= buf_end)
    return; /* len too large */
if (buf + len < buf)
    return; /* overflow, buf+len wrapped around */
/* write to buf[0..len-1] */
Optimization-unstable code

> The compiler can also reuse memory for variables with overlapping lifetimes.

I hope you meant 'non-overlapping'!
Optimization-unstable code

> * division by zero does generate a DIV instruction (that will cause an exception if executed)
> These rules could be part of, or a companion to, the platform ABI.

A compiler flag to warn any time the compiler optimizes something away by assuming that the alternative is undefined would be helpful too. Right now compilers are actually more than happy if I accidentally trigger one of these undefined behaviors, because they can just simplify the whole thing to the equivalent of

return 0;

and produce code that is much faster although unexpected, and thus useless.

As far as compiler writers are concerned, their paycheck depends on how much faster (or smaller) the compiled binary is, while mine requires the binary to work first. The concept of correctness depends on the perspective, I guess.
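One existing flag points in this direction, at least for signed overflow: GCC's -Wstrict-overflow reports some of the cases where the optimizer simplifies an expression by assuming overflow cannot happen. A small illustration (an example added here; exact behavior varies by GCC version):

/* Compile with: gcc -O2 -Wstrict-overflow -c example.c
 * At -O2 the comparison may be folded to 1 on the assumption that signed
 * overflow never occurs; -Wstrict-overflow is meant to report that kind
 * of simplification. */
int always_true(int x)
{
    return x + 1 > x;
}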
A path towards avoiding costs of undefined behaviour optimizations

Undefined behaviors in C are like a minefield: there are many of them, and programmers learn them by heart only after hitting them hard. The compiler does not help in teaching about them, and reading is not enough because they are very subtle. If a warning were issued when such an optimization kicks in, one would learn them and could also opt in to the optimization when it is safe to do so.
Just ignore warnings generated inside macros