Sure, but then you deserve what you get...
Posted May 24, 2011 11:20 UTC (Tue) by farnz
In reply to: Sure, but then you deserve what you get...
Parent article: What Every C Programmer Should Know About Undefined Behavior #3/3
But look at comment 10 to that bug; Eric's not a fool, and yet his mental model of the C89 virtual machine tells him that the following code has implementation-defined semantics, not undefined semantics. In particular, he assumes that i will overflow in a defined fashion, although the final value is not predictable:
int i = INT_MAX;
int j;
int *location = some_sane_value;
for( j = 0; j < 100; ++j )
    location[j] = i++;
So, the question is why does Eric think that way? I would suggest that one reason is that an assembly language equivalent does have well-defined semantics (using an abstract machine that's a bit like ARM):
    MOV R0, #INT_MAX
    ADR R1, some_sane_value
    MOV R2, #0
loop:
    MOV [R1 + R2 * 4], R0
    ADD R0, R0, #1
    ADD R2, R2, #1
    CMP R2, #100
    BLT loop
In this version of the code, which is roughly what an intuition of "C is a high level assembly language" would compile the source to, ADD R0, R0, #1 has defined overflow semantics; further, exiting the loop depends on the final value of R2, not on the value of R0. The surprise for Eric is twofold:
- The compiler has chosen to elide j, and to exit the loop when i reaches its final value instead.
- Because i is signed, overflowing it is undefined behaviour, so the exit value INT_MAX + 100 has no defined meaning and the compiler is free to emit a loop that never exits.
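The two points above can be sketched in C itself. This is a hand-written approximation of the transformation, not the output of any particular compiler; the names fill_indexed, fill_elided, same_output and COUNT are invented for illustration. The assert marks the assumption the optimiser makes silently: that start + COUNT does not overflow.

```c
#include <assert.h>
#include <limits.h>

#define COUNT 100

/* The loop as written: j controls the exit, i is only stored. */
void fill_indexed(int *location, int start)
{
    int i = start;
    for (int j = 0; j < COUNT; ++j)
        location[j] = i++;
}

/* Approximation of the optimised loop: j is gone and the exit test is
 * on i. The two are only equivalent when start + COUNT does not
 * overflow, which is what the compiler is allowed to assume. */
void fill_elided(int *location, int start)
{
    assert(start <= INT_MAX - COUNT);   /* keeps this sketch free of UB */
    int end = start + COUNT;
    for (int i = start; i != end; ++i)
        *location++ = i;
}

/* Returns 1 if both versions write the same COUNT values. */
int same_output(int start)
{
    int a[COUNT], b[COUNT];
    fill_indexed(a, start);
    fill_elided(b, start);
    for (int k = 0; k < COUNT; ++k)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

For any start that leaves room for COUNT increments, same_output returns 1. With start = INT_MAX the guard fires, which is exactly the case where the original loop has no defined meaning to preserve.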
If i were unsigned, and we changed INT_MAX to UINT_MAX, Eric would probably still have been surprised that his loop compiled to something like:
    MOV R0, #UINT_MAX
    ADR R1, some_sane_value
loop:
    MOV [R1], R0
    ADD R0, R0, #1
    ADD R1, R1, #4
    CMP R0, #UINT_MAX + 100
    BNE loop
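A rough C rendering of that unsigned loop may make the difference clearer. It is only a sketch; fill_unsigned and N_ELEMS are illustrative names, not anything from the bug report. Because unsigned arithmetic wraps modulo UINT_MAX + 1, the exit value UINT_MAX + 100 is simply 99, and the loop terminates after exactly 100 stores.

```c
#include <assert.h>
#include <limits.h>

#define N_ELEMS 100u

/* Exit on i rather than on a separate counter. Unsigned wraparound is
 * defined, so the end value is well defined even when start is
 * UINT_MAX. */
void fill_unsigned(unsigned *location, unsigned start)
{
    assert(location != 0);
    unsigned end = start + N_ELEMS;     /* wraps modulo UINT_MAX + 1 */
    for (unsigned i = start; i != end; ++i)
        *location++ = i;
}
```

Called with start = UINT_MAX, this stores UINT_MAX followed by 0 through 98, the same values the indexed version would produce.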
Assuming I'm right in thinking that it's the mental model caused by "C is a high level assembly language" that's breaking things, we have an open question: how do we change the way developers think about C such that perfectly correct compilers don't surprise them?