Posted Jul 17, 2009 15:55 UTC (Fri) by forthy (guest, #1525)
The question is "is that reasonable"? And what is "undefined behavior"?
The GCC maintainers (the language talibans) argue that whatever the C99
standard does not define precisely (and that's pretty much all of it ;-)
is left open as "undefined". I suppose that if x+n >= x holds true, then
x+n really ought to be greater than x - which is not true when compiled
by GCC. Same for dereferencing the NULL pointer: this is not defined in C,
but "not defined" does not mean "it breaks". When it may not break,
optimizing the test away is simply wrong. If GCC's optimizer worked a
bit differently, it would reorder the access and the test - because the
test causes the function to return without using the accessed field (that
value is dead, and dead loads can be optimized away) - and then by the
same funny logic the test is both vital and can be optimized away ;-).
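For illustration, a minimal sketch of the kind of overflow check being discussed (the function and variable names are invented, not taken from any actual report):

    /* Intended as an overflow check. Because signed overflow is undefined
       behavior in C, the compiler may assume x + n cannot wrap and simplify
       the test "x + n < x" to "n < 0", so it no longer detects overflow. */
    int would_overflow(int x, int n)
    {
        if (x + n < x)      /* undefined if x + n overflows */
            return 1;
        return 0;
    }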
But I've basically given up on the GCC maintainers doing reasonable
things, anyway. Their interpretation of standards is just upside down: a
standard is a compromise between various implementers and users, so that
different qualities of implementations can all claim they are "standard". It's
like the POSIX discussion we had here some time ago (the ext4 problems), where
POSIX allowed a file system to lose data and leave the filesystem in a
limbo state. A filesystem is not a good implementation if it does so.
Standard writers can drive towards good implementations while still
allowing bad ones: Use the word "should" instead of "shall". Like "a file
system should preserve a consistent state in case of a crash" - this means
"best effort", and the amount of reasonable effort is left to the
implementer. A C compiler should wrap around numbers on a two's complement
system, not trap, not crash. The code should honor the execution model of
the underlying machine (which e.g. can dereference null pointers). And
making a language defined only in vague terms is not a good idea - because
writing programs in an underspecified language won't work. The "good"
practice of implementing such an underspecified language is to define the
terms properly and then stick to them - and hope the practice catches on,
so that the next round of standardization effort can encode them with
"shall" instead of "should".
Posted Jul 17, 2009 16:29 UTC (Fri) by bluebirch (guest, #58264)
"The reason some behavior has been left undefined is to allow compilers for a wide variety of instruction set architectures to generate more efficient executable code for well-defined behavior, which was deemed important for C's primary role as a systems implementation language;" (from Wikipedia)
Relying on undefined behaviour means the programmer needs to fully understand the (common) implementation. In any case the code is easier to understand if it doesn't rely on this magic. E.g.:
INT_MAX - x >= n // instead of x + n > x
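A minimal sketch of that idea as a complete function (the name is invented; it assumes x and n are non-negative, so the subtraction itself cannot overflow):

    #include <limits.h>

    /* Returns nonzero if x + n would exceed INT_MAX. Valid for x >= 0 and
       n >= 0: INT_MAX - x cannot overflow, so no undefined behavior occurs. */
    int addition_would_overflow(int x, int n)
    {
        return n > INT_MAX - x;
    }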
Posted Jul 17, 2009 19:18 UTC (Fri) by ehabkost (guest, #46058)
Posted Jul 17, 2009 19:38 UTC (Fri) by foom (subscriber, #14868)
Actually no, the idea is slightly different...
Posted Jul 18, 2009 0:57 UTC (Sat) by khim (subscriber, #9252)
GCC would be within its rights to automatically exec NetHack
whenever you dereference a null pointer. :) But instead it chooses to
assume that the program will crash, and optimizes the rest of the program accordingly.
No, no, no. Nothing of the sort. The idea is different, and it goes somewhat like this:
1. Behavior is undefined, so the program can do anything it wants. It can
destroy the world, for god's sake!
2. A program which can destroy the world is pretty useless, so obviously
people will not write such a program.
3. Ergo the program is within defined behavior. Somehow. The compiler does not
need to guarantee that - it's the programmer's responsibility.
4. This means some other part of the program checks the pointer for NULL (even
if the compiler has no idea which one).
5. And that means the next check is redundant and can be removed.
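A minimal C sketch of the pattern described in points 4 and 5 (the struct and names are invented for illustration):

    struct device { int flags; };

    int get_flags(struct device *dev)
    {
        int flags = dev->flags;   /* dereference: undefined if dev is NULL  */
        if (dev == NULL)          /* compiler may now treat dev as non-NULL */
            return -1;            /* ...and delete this "redundant" check   */
        return flags;
    }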
That's why it's so hard for an outsider to understand the discussion,
which goes in circles when GCC developers talk with normal users:
A. This result is totally ridiculous - fix it!
B. This is undefined behavior - fix your program. WONTFIX.
A. What do you mean "undefined behavior"? It introduces security bugs.
B. This is undefined behavior - fix your program. WONTFIX.
A. Argh. This is all just stupid: how can you even imagine such behavior?
B. This is UNDEFINED behavior - fix your program. WONTFIX.
GCC developers really don't care what happens to a program with
undefined behavior. Not one jot. Whatever happens, happens: ICE, crash,
whatever. The programmer must ensure his (or her) program does not contain
undefined constructs - then and only then it's time to complain.
Note: not all behaviors come from the C standard. Some come from other
standards, and some come from discussions on mailing lists (for example, if
you go strictly by the C standard it becomes impossible to write multithreaded
programs, so there are some additional guarantees invented by the GCC
developers). But if you agree that something is "undefined behavior", then
the resolution WONTFIX comes automatically.
Posted Jul 18, 2009 7:03 UTC (Sat) by ABCD (subscriber, #53650)
Posted Jul 18, 2009 10:58 UTC (Sat) by nix (subscriber, #2304)
(Undefined behaviour isn't a magic word. It means *exactly what it says*.)
Posted Jul 19, 2009 0:12 UTC (Sun) by xilun (subscriber, #50638)
Do you run all of your programs in production continuously under Valgrind? Because given the way you (incorrectly) interpret the spirit of the standard, I think you should, for your own security. Memory is cheap and CPUs are fast anyway.
Posted Jul 19, 2009 0:36 UTC (Sun) by dlang (✭ supporter ✭, #313)
Deciding to overwrite the hard drive, run nethack, etc. may technically qualify as undefined, but it is definitely not reasonable.
deciding to 'optimize away' a check immediately afterwards that checks if the pointer is null (prior to the resulting variable being used) is not reasonable.
Posted Jul 19, 2009 1:22 UTC (Sun) by xilun (subscriber, #50638)
"deciding to 'optimize away' a check immediately afterwards that checks if the pointer is null (prior to the resulting variable being used) is not reasonable"
In the name of what?
You could arbitrarily disallow a lot of optimisations with such an edict.
Everybody who has actually read the C standard knows that it perfectly allows such an optimisation; see the standard's definition of "undefined behavior" (§3.4.3 in C99) and the note attached to it.
Refraining from an optimisation just because it could make a buggy program buggier is not reasonable. Just fix the buggy program. Or, in some limited cases, explicitly use the flag that the GCC maintainers kindly provide, so that even if your program is not strictly conforming, it keeps behaving as intended with that particular compiler and compilation option.
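(The flags alluded to here are presumably options along the lines of GCC's -fwrapv, which makes signed overflow wrap as two's complement, -fno-strict-overflow, which tells the optimizer not to assume signed overflow cannot happen, and -fno-delete-null-pointer-checks, which keeps null checks that follow a dereference; for example:

    gcc -O2 -fwrapv -fno-delete-null-pointer-checks prog.c

Which of these, if any, applies depends on the particular undefined construct involved.)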
Posted Jul 17, 2009 19:43 UTC (Fri) by bluebirch (guest, #58264)
The correct behaviour is, well, undefined. So opening a security hole is no less correct (by definition) than crashing or causing "demons to fly out of your nose".
Posted Jul 17, 2009 20:58 UTC (Fri) by stevenb (guest, #11536)
When GCC makes that the default behavior, there will be others bashing GCC for not optimizing away an obviously unnecessary null-pointer check.
Whatever GCC does, there will always be folks around here and everywhere else who disagree with it. GCC bashing is just the favorite hobby of the entire FOSS developer community, it seems. Just sad...
Posted Jul 18, 2009 9:04 UTC (Sat) by Ross (subscriber, #4065)
So your other question is who defines what is defined. The C standard defines what things trigger undefined behavior and it is the obligation of the program to avoid them. It's like a contract between programs and the compiler. In a few places specific compilers may define behavior in some of those cases as extensions, but there is no such extension here. Any program which fails to avoid dereferencing of NULL pointers and cares about the subsequent execution of the program is buggy, and blaming the compiler for the problem is not reasonable.
Posted Jul 19, 2009 0:01 UTC (Sun) by xilun (subscriber, #50638)
If a programmer wants to write non-portable code by using such vendor-specific extensions, he can. Such constructs do indeed exist between Linux and GCC, but not in this case.
If no such extension exists and some code triggers undefined behavior, then it is just a bug in that code. It can be because of a false assumption by the programmer about what the standard says (so the programmer thinks a program is portable and correct in some respect while it is in fact not), or because of a plain bug (which seems to be the case here). In NO way is this a bug in the compiler. In NO way is this an indication that the compiler is bad. There are more people than you suspect who value compilers that produce fast binaries, thanks to optimisations completely allowed by the standard.
As long as the compiler does not pretend to support undefined behaviors it in fact does not support, this is reasonable. This is indeed the very way to use standardized technology: do not rely on non-standard features unless you are conscious that this is non-portable and sure that the way you use it is supported by the implementation.
And about NULL pointer dereferencing, the unreasonable behavior is indeed to think it can be dereferenced under special circumstances. There is no good reason to do that. The very purpose of the NULL pointer is to designate that something does NOT point to a valid object. ("If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.") The standard even says that NULL is an _invalid_ value for dereferencing a pointer.
As for "A C compiler should wrap around numbers on a two's complement system", this is not in the standard, and this is indeed NOT what a lot of people think it should do when the alternative is an optimisation. This clearly can't be portable as C is not limited to 2 complement computers, and this is not needed even on 2 complement computers as you can cast to (unsigned) and then back to (int) if you really do want 2-complement behavior on such computers, with the additionnal advantage that such construct triggers future readers of the code into thinking there can be an overflow there the original programmer has thought about.