
Who's afraid of a big bad optimizing compiler?

Posted Jul 17, 2019 7:25 UTC (Wed) by dunlapg (guest, #57764)
In reply to: Who's afraid of a big bad optimizing compiler? by PaulMcKenney
Parent article: Who's afraid of a big bad optimizing compiler?

There was a lot more variety in systems back in the day, including some in which signed integer overflow did interesting things.

But saying, "We'll do an add instruction, but we can't really guarantee what the processor will do" is a completely different thing than saying "Since we don't make any promises about what add does in this circumstance, we'll do something surprising and dangerous which corrupts data and allows people to exploit your system."
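
To make the distinction concrete, here is a minimal sketch of the kind of surprise being objected to, assuming a compiler that optimizes on the no-overflow assumption (the function name is mine):

    #include <limits.h>
    #include <stdio.h>

    /* The programmer expects two's-complement wraparound here, but
     * signed overflow is undefined in C, so an optimizing compiler is
     * allowed to assume x + 1 never overflows and fold the test to
     * "return 0;". */
    static int will_overflow(int x)
    {
        return x + 1 < x;
    }

    int main(void)
    {
        /* Typically prints 1 without optimization and 0 with it. */
        printf("%d\n", will_overflow(INT_MAX));
        return 0;
    }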

The GPL literally says that it makes no warranties about fitness for a particular purpose; but that doesn't give software authors the right to make software that behaves however they want. Imagine if the response to all bug reports was, "Our license literally says that we make no guarantees. This behavior is more convenient for us, so just put up with it." It's a preposterous position to take.


Who's afraid of a big bad optimizing compiler?

Posted Jul 17, 2019 14:31 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624)

For a given computer system, yes, the results of an add instruction should be defined. However, even hardware has the equivalent of undefined behavior, especially in supervisor mode.

But the standard needs to say what will happen on all systems, including systems that have not yet been defined. Me, I join epa in preferring implementation-specified behavior to undefined behavior in this case, but either way, portable software would need to avoid signed-integer overflow. So one can argue that for a lot of software, there isn't much practical difference between implementation-specified behavior and undefined behavior.
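
As a sketch of what avoiding signed-integer overflow looks like in portable code (the helper name is mine), the test is done before the addition, so the function is well-defined whether a given system treats overflow as implementation-specified or undefined:

    #include <limits.h>
    #include <stdbool.h>

    /* Add two ints without ever executing a signed overflow. */
    static bool add_checked(int a, int b, int *sum)
    {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;    /* the addition would overflow */
        *sum = a + b;
        return true;
    }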

But there is a lot of software that isn't intended to be universally portable!

Who's afraid of a big bad optimizing compiler?

Posted Jul 17, 2019 14:46 UTC (Wed) by dunlapg (guest, #57764)

> For a given computer system, yes, the results of an add instruction should be defined.

Not sure what it is that you're replying to. I never said anything about the results of an add instruction being defined; I'm quibbling with the range of actions that compiler authors seem to think "undefined" allows them to do.

"Since we don't know what hardware will do on signed integer overflow, we can't promise signed integer overflow will work in any particular way" is reasonable. "Since we don't promise signed integer overflow works in any particular way, we are free to make signed integer overflow behave in completely pathological ways" is not.

Who's afraid of a big bad optimizing compiler?

Posted Jul 18, 2019 1:54 UTC (Thu) by brooksmoses (guest, #88422)

From the point of view of a compiler implementer, here's the fundamental problem with that: The underlying logic here is that the compiler can deduce that this signed integer operation will not overflow (because the C program would be ill-formed if it did) and can use that deduction to make optimization choices, but the compiler is not anywhere near smart enough to distinguish between the times that the result is benign and the times when it's completely pathological. (I would doubt that humans are that smart either, in the general case, if given the same information the compiler is given.)

You could say that you would prefer that the compiler _never_ make optimization choices based on a deduction of "undefined behavior means that this can never happen", but it turns out that this deduction has been useful in quite a lot of ways (only one of which is making small loops fast) and people in practice turn out to object strenuously to significant performance regressions from their compiler.
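
A sketch of the kind of loop usually cited for that benefit, assuming a 64-bit target (function and parameter names are mine):

    /* Because signed overflow is undefined, the compiler may assume i
     * never wraps.  That lets it widen i to a 64-bit induction
     * variable for the a[i] addressing and compute the trip count up
     * front, which in turn enables unrolling and vectorization.  With
     * wraparound semantics it would also have to consider the case
     * where i wraps past INT_MAX, which changes the trip count. */
    void scale(float *a, int lo, int hi, float s)
    {
        for (int i = lo; i < hi; i += 4)
            a[i] *= s;
    }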

Beyond this, a problem is that compiler optimization engines generally don't have an idea of "I know this information only for purposes of this optimization but not that one", and defining this would require a quite involved "tainted" model to know what information is allowed when. Without that knowledge, the compiler can't distinguish between logical statements that it learned from "this is required for the program to be well-formed" and "this has to be true for the program to be on this code path" or other entirely solid things.

The pathological cases are almost always the result of entirely-reasonable small pieces of logic combining to create something that wasn't expected by either the compiler author or the code author, not something that was intentionally designed in, and determining a small piece of logic to eliminate to prevent the pathological behavior without preventing useful behavior is quite difficult, if not impossible.
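
A frequently cited illustration of such a combination (not from this thread, and simplified here), in which each deduction is locally reasonable but the sum is pathological:

    int table[4];

    int in_table(int v)
    {
        /* The author meant "< 4" but wrote "<= 4".
         * Deduction 1: table[i] is only defined for i <= 3, so the
         *   body never legitimately runs with i == 4.
         * Deduction 2: the test is therefore never evaluated with
         *   i > 4, so "i <= 4" can be treated as always true and the
         *   loop can only exit by finding a match; some compilers have
         *   been reported to fold the whole function to "return 1".
         * Neither deduction was designed to break this code; combined,
         * they turn an off-by-one bug into "every value is present". */
        for (int i = 0; i <= 4; i++)
            if (table[i] == v)
                return 1;
        return 0;
    }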

Who's afraid of a big bad optimizing compiler?

Posted Jul 20, 2019 13:31 UTC (Sat) by anton (subscriber, #25547)

> You could say that you would prefer that the compiler _never_ make optimization choices based on a deduction of "undefined behavior means that this can never happen", but it turns out that this deduction has been useful in quite a lot of ways (only one of which is making small loops fast) and people in practice turn out to object strenuously to significant performance regressions from their compiler.

You are insinuating that not performing such "optimizations" produces significant performance regressions. Funnily, for all their talk about the performance advantages of such "optimizations", the advocates of "optimizations" hardly ever produce numbers (and the one time I have seen numbers, they were from an unnamed source for another compiler). One would expect that, if there were significant performance benefits from "optimizations", their advocates would parade them up and down.

If there is no significant performance benefit from "optimizations", there is also no need to keep any "knowledge" around that is derived from the assumption that undefined behaviour does not occur.

In any case, the better approach would be to give optimization hints to programmers. That requires changes in just a few places instead of "sanitizing" the whole program all the time, and the consequences of not doing it are far less severe. E.g., if the use of a wraparound int for a local variable would lead to sign-extension operations in hot code, the compiler could suggest changing that local variable to, say, (wraparound) long.
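
A rough sketch of the kind of hint being described, assuming a 64-bit target (the function is mine, and whether the sign extension really shows up depends on what the compiler can prove about i):

    /* If i is a wraparound int, forming &a[i] on a 64-bit target may
     * need a fresh sign extension of i inside the hot loop.  The hint
     * would point at this loop and suggest declaring i (and the
     * bounds) as long, a one-line change instead of a whole-program
     * compilation mode. */
    long sum_range(const long *a, int lo, int hi)
    {
        long s = 0;
        for (int i = lo; i < hi; i++)
            s += a[i];
        return s;
    }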

Who's afraid of a big bad optimizing compiler?

Posted Jul 18, 2019 9:01 UTC (Thu) by mathstuf (subscriber, #69389)

> Since we don't know what hardware will do on signed integer overflow

You're writing C for a C abstract machine. That machine has undefined behavior for signed integer overflow. Undefined behavior in C means "all bets are off". If this is not what you want, don't use C, convince compilers to provide a "sensible C" dialect, or change C via the standards committee.

Who's afraid of a big bad optimizing compiler?

Posted Jul 18, 2019 17:48 UTC (Thu) by mpr22 (subscriber, #60784)

In the particular case of signed integer overflow, of course, compilers already provide two "sensible C" dialects (gcc calls them -fwrapv and -ftrapv).
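
For concreteness, a small program that behaves differently in the three dialects (the outcomes described are the commonly observed ones, not guarantees):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        /* Default dialect: x + 1 is undefined behavior, so the
         *   optimizer may assume it never happens.
         * gcc -fwrapv:  signed overflow is defined to wrap, so this is
         *   expected to print INT_MIN.
         * gcc -ftrapv:  the addition is instrumented and the program
         *   aborts at run time instead of producing a wrapped value. */
        printf("%d\n", x + 1);
        return 0;
    }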

Who's afraid of a big bad optimizing compiler?

Posted Jul 22, 2019 12:06 UTC (Mon) by rweikusat2 (subscriber, #117920)

The C standard uses an abstract machine to define the semantics of the language. But real code is certainly not written to run on "abstract machines", and the C standard also contains the nice statement that "a conforming program is one acceptable to a conforming implementation". That's not the same as a strictly conforming program (another term from the C standard), which would restrict itself to the functionality defined in that document.
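
For illustration, here is a hedged sketch of that distinction: the program below is conforming (a conforming POSIX implementation accepts it) but not strictly conforming, because its output depends on implementation-defined behavior and it uses a header that ISO C does not define.

    #include <unistd.h>   /* POSIX, not ISO C */
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = 3000000000u;
        int i = (int)u;   /* out-of-range conversion: implementation-defined */
        printf("%d %ld\n", i, (long)getpid());
        return 0;
    }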

