
Since when does GCC *assume* the program to be correct?


Posted Apr 17, 2008 20:39 UTC (Thu) by brouhaha (subscriber, #1698)
In reply to: Since when does GCC *assume* the program to be correct? by nix
Parent article: GCC and pointer overflows

My point exactly. If the assumption is only that the program is written in C, but NOT that it is a correct C program, then it isn't reasonable to assume that the program's use of pointers meets the constraint defined by the standard, in which a pointer will never be manipulated to point outside its object.

That would be a fine optimization to use when you choose some high level of optimization that includes unsafe optimizations, but it shouldn't happen at -O1.
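The kind of check at issue can be sketched in code; the names below are illustrative, not from the article. The first form asks whether `buf + len` wrapped around, but since the Standard says a pointer may never be moved outside its object, a compiler that assumes a valid program may delete the test. The second form compares lengths only, involves no undefined behaviour, and cannot legally be removed at any optimization level:

```c
#include <stddef.h>

/* Hypothetical bounds check of the kind discussed in the article. */

int check_via_wraparound(const char *buf, size_t len)
{
    return buf + len < buf;        /* relies on undefined pointer wraparound */
}

int check_defined(size_t buf_size, size_t len)
{
    return len > buf_size;         /* fully defined: no pointer arithmetic */
}
```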



Since when does GCC *assume* the program to be correct?

Posted Apr 17, 2008 21:45 UTC (Thu) by nix (subscriber, #2304) [Link]

Something which invokes undefined behaviour according to the standard is, 
in a sense, not written in C. Of course any useful program will do things 
*beyond* the standard (e.g. relying on POSIX); but relying on things 
like pointer wraparound is a rather different matter. (Who the hell would 
ever intentionally rely on such things anyway? It pretty much *has* to be 
a bug to have such code.)

Since when does GCC *assume* the program to be correct?

Posted Apr 17, 2008 22:58 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

There are three possibilities:
  1. GCC does not make assumptions about the validity of the input
  2. GCC assumes that the input is a C program, but not necessarily a valid one (according to the C standard)
  3. GCC assumes that the input is a valid C program
The GCC maintainers claim that the optimization is legitimate because GCC assumes that the input is a valid C program (choice 3). If they are willing to make that assumption, then they shouldn't need any error checking whatsoever.

A compiler assuming that the input is a valid program is counter to my expectations as a user of the compiler. Unless I explicitly turn on unsafe optimizations, I don't expect it to optimize away any comparison I've written except when it can prove, from my actual program code rather than from what the C standard says a valid program may or may not do, that the condition will always have the same result.

I have no problem at all with having such optimizations enabled as part of the unsafe optimizations at higher -O option levels.

Since when does GCC *assume* the program to be correct?

Posted Apr 18, 2008 11:18 UTC (Fri) by nix (subscriber, #2304) [Link]

The problem is that the boundary between 'a C program, but not necessarily 
a valid one' and 'not a C program' is questionable.

if (a + foo < a) is testing something which, in any C program conforming 
to the Standard without really weird extensions, must be false. This is 
every bit as true as if (sizeof (a) < 1) would be. If it decided that, oh, 
that could be true after all, it's choosing an interpretation which the 
Standard forbids.

... and if the compiler starts accepting that, what other non-C programs 
should it start accepting? Perhaps we should spell 'while' differently 
when the locale is de_DE? Perhaps `mary had a little lamb' is now defined 
as a valid C program?

(Sure, compilers can *do* all this, and GCC does have some extensions 
chosen explicitly because their syntax is invalid Standard C --- the 
statement-expression extension springs to mind --- but the barrier to new 
extensions is very high these days, and in any case that doesn't mean that 
*anything* people do wrong should be defined as a language extension, 
especially not when it's as weird and devoid of practical utility as this. 
Portability to other compilers is important.)
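The `if (a + foo < a)` idiom above can be rewritten with defined semantics. For signed ints the test can only be true via overflow, which is undefined, so a conforming compiler may fold it to false; the defined way to ask "would a + foo overflow?" is to compare against INT_MAX before adding. A sketch (the function name is illustrative, and it assumes both operands are non-negative; negative values need the symmetric INT_MIN test):

```c
#include <limits.h>

/* Assumes a >= 0 and foo >= 0.  No overflowing addition is ever
 * performed, so the compiler cannot discard the test. */
int addition_would_overflow(int a, int foo)
{
    return foo > INT_MAX - a;
}
```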

Since when does GCC *assume* the program to be correct?

Posted Apr 18, 2008 19:15 UTC (Fri) by brouhaha (subscriber, #1698) [Link]

There's a big difference between the (a + foo < a) and (sizeof (a) < 1) cases: the former is something a programmer is likely to write deliberately, precisely because he or she knows the program might (unintentionally) be buggy and wants to catch the bug, while the latter is unlikely to occur at all, and certainly not as something a programmer would deliberately test for.

> If it decided that, oh, that could be true after all, it's choosing an interpretation which the Standard forbids.

Yet which can actually quite easily happen in real programs. NOWHERE does the standard say that a compiler has to optimize away tests that might always have a fixed value for valid programs, but might easily have differing values for buggy programs.

> Perhaps we should spell 'while' differently when the locale is de_DE?

You've lost me here. I don't see any way that a purely semantic error in a program could result in "while" being misspelled, even if the locale is de_DE.

Since when does GCC *assume* the program to be correct?

Posted Apr 18, 2008 16:19 UTC (Fri) by viro (subscriber, #7872) [Link]

The Standard is a contract between the authors of a program and the
authors of a C implementation; if the program invokes undefined
behaviour, all bets are off.  The compiler is allowed to compile the
check in question into "check whether the addition triggered an
overflow; if it did, before bothering with any comparisons, do unto
the luser what Simon would have done on a particularly bad day".

It can also turn that into "if addition overflows, take the value of
first argument".  And optimize according to that.

It's not a matter of optimizing your comparisons away; it's a matter
of addition having no prescribed semantics in case of overflows,
regardless of optimizations.
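The distinction viro draws can be shown directly: for unsigned types the Standard *does* prescribe overflow semantics (wraparound modulo 2^N), so the very same comparison written on unsigned values is legitimate and must be preserved. A sketch, with an illustrative name:

```c
/* Well-defined for unsigned operands: overflow wraps, so the test
 * genuinely detects it.  The identical expression on signed ints
 * has no prescribed semantics when the addition overflows. */
unsigned int addition_wrapped(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```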

Since when does GCC *assume* the program to be correct?

Posted Apr 18, 2008 21:00 UTC (Fri) by nix (subscriber, #2304) [Link]

Well said. Also, while undefined behaviour is sometimes a QoI issue 
for which high-quality implementations prescribe useful semantics, 
this isn't such a case. I can't think of any particularly useful 
semantics for pointer wraparound, especially given that distinct 
objects have no defined or stable relationship with each other anyway. 
Operating under the rules of modular arithmetic might have been nice, 
and perhaps a more recent language would define that...


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds