Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 24, 2013 1:01 UTC (Sun) by iabervon (subscriber, #722)
In reply to: Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks by butlerm
Parent article: Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

The point is that GCC doesn't think you'll actually reach the undefined behavior part of the loop. Say you've got:

static inline int framelen(char s[]) {
  int i;
  for (i = 0; i < 256; i++) {
    if (!s[i])
      return i;
  }
  return i;
}

If you call this with a nul-terminated char array, or with any array of at least 256 chars, it is well-defined. If you call it with an array of fewer than 256 chars, the only well-defined way out of the loop is finding the nul, so once the function is inlined the generated code is faster if it drops the i < 256 test and uses an unconditional branch back to the top of the loop. GCC assumes that the programmer has some good reason to believe that short arrays are nul-terminated, even though the compiler can't necessarily work that reason out for itself.
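
To make the cases concrete, here is a minimal sketch of some call sites. The buffer names and the helper check_framelen_calls are invented for this illustration, and i is declared before the loop so that the final return compiles; the undefined call is shown only in a comment.

```c
#include <assert.h>
#include <string.h>

/* Same loop as in the comment above; i is declared outside the for
 * statement so that the final return can see it. */
static inline int framelen(char s[]) {
  int i;
  for (i = 0; i < 256; i++) {
    if (!s[i])
      return i;
  }
  return i;
}

/* Hypothetical call sites, named only for this illustration. */
static void check_framelen_calls(void) {
  /* Short but nul-terminated: the loop returns before it can run off
   * the end of the 8-byte array, so the call is well-defined. */
  char short_terminated[8] = "abc";
  assert(framelen(short_terminated) == 3);

  /* 256 chars with no nul: every s[i] for i < 256 is in bounds, so the
   * call is well-defined and the loop exits through the i < 256 test. */
  char full[256];
  memset(full, 'x', sizeof full);
  assert(framelen(full) == 256);

  /* Short and NOT nul-terminated would read s[4] and beyond, which is
   * undefined behavior. After inlining, the compiler may assume this
   * never happens and drop the i < 256 test for such an array.
   *
   *   char bad[4] = {'a', 'b', 'c', 'd'};
   *   framelen(bad);
   */
}
```

Only the first two calls are executed; what the third would do depends on the compiler and the optimization level, which is the point of the discussion.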


Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 3:28 UTC (Tue) by butlerm (subscriber, #13312)

The difference is that in the original example the compiler can prove that an out of bounds array access will occur under all input conditions. This should be considered an error. Dividing by a constant zero is a similar example. What possible good could come from the compiler just making something up in a situation like that?

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 4:12 UTC (Tue) by iabervon (subscriber, #722)

It's not actually able to prove that (or, really, it's not set up to consider proving that type of thing). It's actually just making a series of optimizations: first, it assumes that there won't be an out-of-bounds access, then it determines that this means that the loop can't exit normally (like in my example), then it finds that the return is unreachable, then it finds that the value being calculated is unused, then it finds that nothing is needed except for the infinite loop. Each of these optimizations improves the performance of some correct code, and it doesn't have the deeper analysis to notice that it can prove that the loop executes 16 times in violation of the assumption.
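
As a sketch of how that chain plays out, assume the loop under discussion has roughly the shape shown in the comment below. The names N, sum_abs and check_sum_abs and the exact form of the loop are assumptions made for illustration, not a quotation of the benchmark. Only the well-defined variant is compiled.

```c
#include <assert.h>

#define N 16

/* Assumed problematic shape, shown only as a comment:
 *
 *   for (dd = d[k = 0]; k < 16; dd = d[++k])
 *     satd += (dd < 0 ? -dd : dd);
 *
 * The increment expression reads d[++k] before k < 16 is tested, so the
 * sixteenth increment reads d[16]. If every read is assumed to be in
 * bounds, then k <= 15 after every increment, so k < 16 always holds
 * and the exit test looks redundant.
 */

/* Well-defined variant: d[k] is read only after k < N has been checked. */
static int sum_abs(const int d[N]) {
  int satd = 0;
  for (int k = 0; k < N; k++) {
    int dd = d[k];
    satd += (dd < 0 ? -dd : dd);
  }
  return satd;
}

/* Hypothetical check, named only for this illustration. */
static void check_sum_abs(void) {
  int d[N] = {-3, 4};  /* remaining elements are zero */
  assert(sum_abs(d) == 7);
}
```

In the variant that compiles, no step of the chain applies, because no iteration can touch d[16] and the exit test is the only way out of the loop.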

It doesn't really have an overall knowledge set that could find contradictions; it's got patterns that produce warnings and patterns that produce optimizations, and so it can't tell when optimizations are leading to total nonsense.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 5:10 UTC (Tue) by dlang (✭ supporter ✭, #313)

and the mistake in the chain of reasoning is the very first one where it assumes that the loop variable will never be out of range.

The rest of the optimizations make sense, but that first one is optimistic thinking on the part of the compiler writer.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 8:39 UTC (Tue) by mlopezibanez (guest, #66088)

No, the assumption is how C programs work. It is in general impossible to tell if there is going to be an out-of-bounds access without checking every access. If you want code that checks that, then wrap every array access in the equivalent of vector.at() and let the compiler try to remove redundant checks.
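
A rough sketch of what "the equivalent of vector.at()" might look like in C: the helper carries the length alongside the pointer and checks every index. The names checked_array, checked_get and check_checked_get are hypothetical, and returning an error code instead of throwing is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-carrying view of an int array. */
struct checked_array {
  const int *data;
  size_t len;
};

/* Stores a.data[i] in *out and returns 0 if i is in range; returns -1
 * and leaves *out untouched otherwise. */
static int checked_get(struct checked_array a, size_t i, int *out) {
  if (i >= a.len)
    return -1;
  *out = a.data[i];
  return 0;
}

/* Hypothetical check, named only for this illustration. */
static void check_checked_get(void) {
  int d[16] = {0};
  d[15] = 42;
  struct checked_array a = { d, 16 };
  int v = -1;
  assert(checked_get(a, 15, &v) == 0 && v == 42);
  assert(checked_get(a, 16, &v) == -1 && v == 42);  /* rejected, v unchanged */
}
```

Where the compiler can see that an index is always below len, it is free to drop the comparison, which is the "remove redundant checks" half of the suggestion.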

Of course, GCC could do better at static analysis and warning about such cases, but that is a different problem from optimization, and GCC needs new developers that are interested in such things.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 22:27 UTC (Wed) by HelloWorld (guest, #56129)

> and the mistake in the chain of reasoning is the very first one where it assumes that the loop variable will never be out of range.
That's not the chain of reasoning. The reasoning is that if the loop variable is out of range, the program's behaviour is undefined, thus not testing the variable is just as valid as testing it or doing something else entirely.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 19:23 UTC (Wed) by HelloWorld (guest, #56129)

Use the string.h, Luke!
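#include <string.h>  /* strnlen is specified by POSIX.1-2008 */

static inline int framelen(char s[]) {
  return (int)strnlen(s, 256);  /* size_t result, at most 256 */
}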
;)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.