What Every C Programmer Should Know About Undefined Behavior #1/3

The LLVM project blog has the beginning of a three-part series on undefined behavior in C. "Undefined behavior exists in C-based languages because the designers of C wanted it to be an extremely efficient low-level programming language. In contrast, languages like Java have eschewed undefined behavior because they want safe and reproducible behavior across implementations, and willing to sacrifice performance to get it. While neither is 'the right goal to aim for,' if you're a C programmer you really should understand what undefined behavior is."

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 15:48 UTC (Thu) by xav (guest, #18536) [Link] (2 responses)

Looks like there's a little typo in the last example. The object which is supposed to be unchanged by the store is "i", not "P".
Too bad it's not possible to comment on the blog.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 16:27 UTC (Thu) by cbcbcb (subscriber, #10350) [Link] (1 responses)

The example is correct. i is known not to alias because it is a local variable and its address has not been taken. However, without C's type aliasing rules, a store to P[i] could overwrite P.
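
For context, the example under discussion looks roughly like the sketch below. It is reconstructed from the article's description, and the loop bound n is a parameter added here for illustration only. The store goes through a float lvalue while P itself is a float *, so the type-based aliasing rules let the compiler assume the store never changes P.

```c
float *P;                     /* global pointer, as in the article's example */

void zero_array(int n)        /* n is an assumption; the article used a constant */
{
    int i;                    /* local, address never taken: nothing can alias it */

    for (i = 0; i < n; ++i)
        P[i] = 0.0f;          /* a float store; assumed not to modify P itself */
}
```

Without that assumption the compiler would have to reload P from memory on every iteration, in case the store to P[i] had overwritten it.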

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 16:42 UTC (Thu) by xav (guest, #18536) [Link]

Oh, then not being able to comment was a good thing; I only embarrassed myself here ...

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 16:40 UTC (Thu) by iabervon (subscriber, #722) [Link]

A certain amount of the undefined behavior in C is of mostly historical utility to the compiler. For example, making the values of uninitialized local variables undefined was a big win at the beginning, because it avoided making the compiler initialize the 99% of local variables that get written before they are read. But these days, compilers pretty much always analyze whether a variable gets written before it is read, and warn when it might not be (the warnings are generally considered helpful, at least when the compiler is good at the analysis, and the analysis is no longer too computationally expensive to do while compiling), so they could simply initialize the 1% of local variables that they can't prove don't need it.

For that matter, programs generally end up initializing stack-allocated structures with bzero() because they don't know whether the compiler is inserting padding and they need to avoid leaking data in between the structure fields. In this case, you've actually got a reversal of which party (between the compiler and the programmer) has the information necessary to decide whether something needs to be initialized.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 19:24 UTC (Thu) by aliguori (guest, #30636) [Link] (25 responses)

The example around integer overflow scared me. It had:

 if (X > (X + 1)) //  compiler always optimizes to false

But this is a pretty common check for exploitable integer overflows. You may do something like:

 if (size > (size + 1)) {
    // integer overflow attack!
 } else {
    string = malloc(size + 1);
    read(fd, string, size);
    string[size] = 0;
 }

But the article seems to suggest that LLVM would think it's okay to eliminate the if clause while still presumably letting the hardware overflow the integer. That effectively bypasses the security check.

Am I missing something?

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 19:39 UTC (Thu) by josh (subscriber, #17465) [Link]

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 19:47 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (8 responses)

That's basically correct. The result of an integer overflow is undefined, but this check relies on specific overflow behavior. The compiler is free to ignore the possibility of overflow in generating the code. (I'm not sure whether LLVM or any other modern compiler actually would do so, however.)

A better check for overflow in (a + b) would be (a > (MAX_VALUE - b)), where MAX_VALUE is a type-specific constant, e.g. SIZE_MAX. This detects the possibility of an overflow without actually causing one.
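
A minimal sketch of that check for size_t, applied to the allocation case from the parent comment. The function names are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns true if a + b would not fit in a size_t. */
bool size_add_overflows(size_t a, size_t b)
{
    /* SIZE_MAX - b cannot wrap, because b <= SIZE_MAX always holds. */
    return a > SIZE_MAX - b;
}

/* Allocates size + 1 bytes, or returns NULL if size + 1 would wrap. */
char *alloc_string(size_t size)
{
    if (size_add_overflows(size, 1))
        return NULL;
    return malloc(size + 1);
}
```

The comparison is done before the addition, so nothing in the check itself depends on what happens when the sum is out of range.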

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 19:56 UTC (Thu) by jzbiciak (guest, #5246) [Link] (5 responses)

That's only safe if b is positive...

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 20:28 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (4 responses)

That's correct. I should have qualified the statement as "A better check for overflow *in unsigned arithmetic*...". The original example was using size_t, which is defined as an unsigned integer type.

This should work for signed addition, though there's probably an easier way:

(b >= 0) ? (a > (MAX_VALUE - b)) : (a < (MIN_VALUE - b))
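
Spelled out for int, a sketch of such a test might look like the following. It is written to return true when a + b would overflow; the function name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow the range of int. */
bool int_add_overflows(int a, int b)
{
    if (b >= 0)
        return a > INT_MAX - b;   /* INT_MAX - b stays in range for b >= 0 */
    return a < INT_MIN - b;       /* INT_MIN - b stays in range for b < 0 */
}
```

Neither subtraction can itself overflow, because each branch only subtracts a value whose sign keeps the result inside the range of int.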

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 23:58 UTC (Thu) by jreiser (subscriber, #11027) [Link] (3 responses)

"unsigned overflow" is a strange concept. Usually that's CarryOut [and no error, else multi-precision arithmetic is in big trouble.]

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 0:13 UTC (Fri) by jzbiciak (guest, #5246) [Link] (2 responses)

He's referring to checking for signed overflow using unsigned arithmetic. I'm pretty sure C guarantees that an "unsigned int" can hold all the same non-negative values as a "signed int", so this seems like a reasonable thing to do.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 0:38 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (1 responses)

Actually, I was referring to unsigned overflow, e.g. computing (SIZE_MAX + 1) with a size_t result, as in the original example. No signed values in sight. Of course, you can use the same formula to check for overflow in signed addition provided at least one of the values is guaranteed to be non-negative.

Overflow ("the condition that occurs when a calculation produces a result that is greater in magnitude than that which a given register or storage location can store or represent" <http://en.wikipedia.org/wiki/Arithmetic_overflow>) does not only apply to signed integers. Since C does not define any standard way to access the carry bit, any result which (before truncation) would be too large to fit in the result type must be considered an overflow, whether the result type is signed or unsigned.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 10:05 UTC (Fri) by ncm (guest, #165) [Link]

Unsigned overflow (wraparound modulo 2^N, where N is the width of the type) is well-defined. The compiler is not allowed to make unbounded-arithmetic assumptions about unsigned expressions. So, it's just signed expressions where you need to worry about this.
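
To illustrate, a post-hoc wraparound test like the sketch below is fine for an unsigned type such as size_t, because the sum is reduced modulo SIZE_MAX + 1. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if size + 1 wrapped around to 0. */
bool size_plus_one_wraps(size_t size)
{
    /* Well-defined: unsigned arithmetic wraps, so the sum is 0 only
       when size was the largest representable value. */
    return size + 1 < size;
}
```

The same expression written with a signed type is the one the compiler is entitled to fold to a constant.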

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 20:59 UTC (Thu) by ballombe (subscriber, #9523) [Link]

> (I'm not sure whether LLVM or any other modern compiler actually would do so, however.)
gcc -O3 does such optimisation.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 12:52 UTC (Fri) by ukleinek (subscriber, #56625) [Link]

> (I'm not sure whether LLVM or any other modern compiler actually would do so, however.)
gcc does starting with -O2:
$ gcc --version
gcc (Debian 4.4.5-8) 4.4.5
...

$ cat test.c 
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

int main(int argc, char **argv)
{
        int base;

        printf("INT_MAX = %d\n", INT_MAX);
        scanf("%d", &base);

        if (base < base + 1)
                printf("%d < %d\n", base, base + 1);

        return EXIT_SUCCESS;
}

$ gcc -O2 test.c
$ ./a.out 
INT_MAX = 2147483647
2147483647
2147483647 < -2147483648

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 19:48 UTC (Thu) by suokko (guest, #74887) [Link]

To me it sounds like the only standard way to write the check would be

if (X <= (typeof(X))((unsigned long long)X + 1))

if (X == MAX_<what ever type X is>)

But according to the C standard, optimizing the original check out is allowed if typeof(X) is signed. I would suspect that in many cases typeof(X) is unsigned, which makes the code correct.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 20:18 UTC (Thu) by scottwood (guest, #74349) [Link] (7 responses)

Only signed integer overflow is undefined. Use unsigned types for sizes and other situations where you need to do this sort of check.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 8:06 UTC (Fri) by epa (subscriber, #39769) [Link] (6 responses)

Use unsigned types for sizes and other situations where you need to do this sort of check.
Unfortunately, the type conversion rules between signed and unsigned are crazy and silent, so mixing signed and unsigned ints is asking for trouble.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 1:31 UTC (Sat) by wahern (subscriber, #37304) [Link] (5 responses)

So don't mix them. The vast majority of C code doesn't need signed arithmetic. Most C code (like most code in general, really) is merely transforming data, not crunching complex formulas. It's rare to need to index an array by -7, for example. This is why the language permits taking a pointer to array[countof(array)] (one past the last element), but not array[-1]. (You can't dereference the +1 element, but the pointer is valid. &array[-1] is undefined behavior, and some instrumenting compilers generate a fault, so iterating backwards through an array using pointers can be tricky.)

Also, the time when you need signed arithmetic is exactly the time when you should be paying extremely close attention to implicit type conversions and other issues. Thus, the sensible thing to do is use unsigned types by default, and only use signed when you need it. Overall it makes code safer.

Unfortunately, int is only 3 letters. unsigned is more than twice the length, and size_t has an ugly underscore. So we can make predictions about which is going to get used more without having to know anything about computer programming.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 8:05 UTC (Sat) by epa (subscriber, #39769) [Link]

Yes - existing code heavily uses 'int' so if you decide to be the one using 'unsigned int' you're probably going to trip up.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 12:15 UTC (Sat) by juliank (guest, #45896) [Link] (3 responses)

> Thus, the sensible thing to do is use unsigned types
> by default, and only used signed when you need it

My rule is the other way around, as unsigned only
leads to surprises such as

typeof(u1 - u2) = unsigned

which is problematic if you store 3u - 5u in an int and expect -2.
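
A small sketch of that surprise, using a hypothetical helper function. The subtraction is done in unsigned arithmetic, so the result is never negative.

```c
#include <limits.h>

/* The result type is unsigned int, so the difference is computed
   modulo UINT_MAX + 1 and is never negative. */
unsigned int unsigned_difference(unsigned int u1, unsigned int u2)
{
    return u1 - u2;
}
```

Here unsigned_difference(3u, 5u) yields UINT_MAX - 1 rather than -2. Converting that value to int is implementation-defined, since it does not fit in the signed type.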

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 21:50 UTC (Sat) by wahern (subscriber, #37304) [Link] (2 responses)

Why would you ever do that? Outside of a mathematical formula that's pretty messed up code. And inside of a mathematical formula, you're culpable for any "surprises" because unless you pay extremely close attention to not only type but also width, you're setting yourself up for trouble.

When you're dealing with data and object properties, negative numbers are near meaningless.

More importantly, converting a value with unsigned type to signed is undefined behavior. On the other hand, converting signed to unsigned is well defined. So it's easier to move from signed to unsigned than the reverse.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 15, 2011 9:45 UTC (Sun) by juliank (guest, #45896) [Link] (1 responses)

> Why would you ever do that?
Imagine a value which must never be less than zero. Now, I decrease the values multiple times in the program. Having it signed allows me to do assert(i >= 0) and thus prevents bugs. Furthermore, it makes working with other code using signed integers easier.

> When you're dealing with data and object properties,
> negative numbers are near meaningless.

Mostly, I use all those symbolic typedefs. I mostly use unsigned integers (size_t) for counting array elements (or indexing arrays, and for loops over arrays) or object sizes; sometimes ssize_t for string lengths so users can specify -1 and then the function calls strlen(). For file sizes, file offsets, and similar, I use the off_t type (or really GLib's goffset).

> converting a value with unsigned type to signed is undefined behavior
It's not undefined. It is well defined to be unchanged for any unsigned value fitting in the signed type [6.3.1.3 (1)]; and implementation-defined if it does not fit into the signed type [6.3.1.3 (3)].

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 15, 2011 22:15 UTC (Sun) by njs (subscriber, #40338) [Link]

> Imagine a value which must never be less than zero. Now, I decrease the values multiple times in the program. Having it signed allows me to do assert(i >= 0) and thus prevents bugs

You could equivalently say assert(i < (1u << (sizeof(i) * CHAR_BIT - 1))) with an unsigned int i, of course. Or use a different guard value if you don't want to reserve exactly half of your integer's range for detecting overflow/underflow.
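
One way to write such a guard for an unsigned int counter is sketched below. The function name is hypothetical, and the choice of the upper half of the range as the guard region is just the one suggested above.

```c
#include <limits.h>
#include <stdbool.h>

/* Treats the upper half of the unsigned range as "went below zero":
   decrementing 0 wraps to UINT_MAX, which lands in the upper half. */
bool in_lower_half(unsigned int i)
{
    return i <= UINT_MAX / 2;
}
```

A caller would then write assert(in_lower_half(i)) wherever the signed version had assert(i >= 0).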

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 21:21 UTC (Thu) by epa (subscriber, #39769) [Link] (5 responses)

Yes, it's such a total pain in the arse. The lack of safe, checked integers in C is a shame. (Not saying that all 'int' variables should sprout runtime checks, just that basic Ada-style bounded integers should be simple to use for those who want them.)

In C++ you can write a simple class to provide signed integers with defined overflow semantics, or throwing an exception on overflow, or even something fancy which does the checking for possible overflows at compile time and provides explicit 'unsafe' functions for ops that may overflow. But I don't think any such class has become established practice in the same way as, for example, std::string.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 5:33 UTC (Fri) by cpeterso (guest, #305) [Link] (4 responses)

> In C++ you can write a simple class to provide signed integers with defined overflow semantics

As an exercise, I wrote a C++ "Integer" class for a safe int. Unfortunately, the class grew to almost 100 LOC once I tried covering all the overloaded operators (with both left- and right-hand sided versions) and implicit type conversions of my class to and from other C types. I gave up. :(

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 10:08 UTC (Fri) by ncm (guest, #165) [Link]

One of the best features of a numeric class is that you can eliminate (almost?) all the automatic conversions. Start over and define only the helpful stuff.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 11:16 UTC (Fri) by MisterIO (guest, #36192) [Link]

Well, 100 LOC doesn't really seem that much to me, at least not for a useful class!

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 19:02 UTC (Mon) by PaXTeam (guest, #24616) [Link] (1 responses)

Even if one may not like the Ms-PL, it may still be worth looking at http://safeint.codeplex.com/ .

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 21:11 UTC (Mon) by zooko (guest, #2589) [Link]

Or, depending on how one likes Google, there is http://code.google.com/p/safe-iop/

And here is my own contribution: http://tahoe-lafs.org/trac/libzutil/browser/trunk/libzuti...

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 20:03 UTC (Thu) by jzbiciak (guest, #5246) [Link] (18 responses)

One of my "favorite" run-ins with undefined behavior occurred when porting Doom to one of our DSPs as a test case. For some reason, the built in demo sequence in the WAD would diverge, resulting in a broken demo. Everything else seemed to be running OK.

I eventually traced it to this idiom repeated multiple places in the code:

x = P_Random() - P_Random()

The P_Random() function returned a predefined sequence of unsigned uniformly distributed 'random' numbers, and the idiom was grabbing two to subtract them to get a signed random value with a triangular distribution.

Turns out nearly every other compiler that Doom compiles with generates code equivalent to this:

a = P_Random(); x = a - P_Random();

Our compiler, on the other hand, generated code equivalent to:

a = P_Random(); x = P_Random() - a;

Boy was that fun to track down...

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 12, 2011 22:20 UTC (Thu) by daglwn (guest, #65432) [Link] (17 responses)

That's not undefined behavior. I believe it is technically implementation-defined behavior because the standard does not specify the order of operations between sequence points. The compiler is not allowed to format your hard drive in this case. :)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 0:05 UTC (Fri) by jzbiciak (guest, #5246) [Link] (16 responses)

Actually, it's neither. It's "unspecified". Now, had the P_Random() calls been macros with side effects as opposed to function calls, then it'd be undefined. The sequence points "saved" us from being undefined.

What that means is that the implementation can choose what to do, and doesn't have to tell you its rule for choosing. But, you're correct, it can't go all system("nethack") on you.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 0:29 UTC (Fri) by ras (subscriber, #33059) [Link] (14 responses)

It's funny how the mind works, isn't it?

If P_random() had been pop_item_off_list() which obviously has side effects that alter the result of the next call, most experienced C programmers would recoil in terror. Actually not just C programmers. In the majority of languages the order of evaluation within an expression is undefined. But P_random() - it looks so innocent.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 2:51 UTC (Fri) by nlucas (subscriber, #33793) [Link] (10 responses)

It's a bit late at night now, but it took me about 5 minutes staring at the code until I understood the bug.

I think the talk about signed and unsigned distracted me. Took too long looking for any signed/unsigned conversions problems until I finally got it, after re-reading the first paragraph.

That's a bug to remember for some time...

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 5:24 UTC (Fri) by cpeterso (guest, #305) [Link] (5 responses)

Yes! I too spent 5 minutes hunting for signed/unsigned conversions or underflow problems until I re-read your comment about re-reading jzbiciak's first paragraph.

Hint: the problem arises from the different order of operations rearranging the predefined sequence of "random" numbers.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 5:45 UTC (Fri) by jzbiciak (guest, #5246) [Link] (4 responses)

Hint: the problem arises from the different order of operations rearranging the predefined sequence of "random" numbers.

That reminds me, the random number generator was pretty awesome in its own right. It's the kind of random number generator that both XKCD and Dilbert can weigh in on, even without meaning to!

unsigned char rndtable[256] = {
    0,   8, 109, 220, 222, 241, 149, 107,  75, 248, 254, 140,  16,  66 ,
    74,  21, 211,  47,  80, 242, 154,  27, 205, 128, 161,  89,  77,  36 ,
    95, 110,  85,  48, 212, 140, 211, 249,  22,  79, 200,  50,  28, 188 ,
    52, 140, 202, 120,  68, 145,  62,  70, 184, 190,  91, 197, 152, 224 ,
    149, 104,  25, 178, 252, 182, 202, 182, 141, 197,   4,  81, 181, 242 ,
    145,  42,  39, 227, 156, 198, 225, 193, 219,  93, 122, 175, 249,   0 ,
    175, 143,  70, 239,  46, 246, 163,  53, 163, 109, 168, 135,   2, 235 ,
    25,  92,  20, 145, 138,  77,  69, 166,  78, 176, 173, 212, 166, 113 ,
    94, 161,  41,  50, 239,  49, 111, 164,  70,  60,   2,  37, 171,  75 ,
    136, 156,  11,  56,  42, 146, 138, 229,  73, 146,  77,  61,  98, 196 ,
    135, 106,  63, 197, 195,  86,  96, 203, 113, 101, 170, 247, 181, 113 ,
    80, 250, 108,   7, 255, 237, 129, 226,  79, 107, 112, 166, 103, 241 ,
    24, 223, 239, 120, 198,  58,  60,  82, 128,   3, 184,  66, 143, 224 ,
    145, 224,  81, 206, 163,  45,  63,  90, 168, 114,  59,  33, 159,  95 ,
    28, 139, 123,  98, 125, 196,  15,  70, 194, 253,  54,  14, 109, 226 ,
    71,  17, 161,  93, 186,  87, 244, 138,  20,  52, 123, 251,  26,  36 ,
    17,  46,  52, 231, 232,  76,  31, 221,  84,  37, 216, 165, 212, 106 ,
    197, 242,  98,  43,  39, 175, 254, 145, 190,  84, 118, 222, 187, 136 ,
    120, 163, 236, 249
};

int rndindex = 0;
int prndindex = 0;

// Which one is deterministic?
int P_Random (void)
{
    prndindex = (prndindex+1)&0xff;
    return rndtable[prndindex];
}

int M_Random (void)
{
    rndindex = (rndindex+1)&0xff;
    return rndtable[rndindex];
}

void M_ClearRandom (void)
{
    rndindex = prndindex = 0;
}

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 12:21 UTC (Mon) by sdalley (subscriber, #18550) [Link] (3 responses)

Surely there is *no* external difference between M_Random() and P_Random() ?

The functions look deterministic as well, since at no point do the signed integers rndindex and prndindex have negative numbers in them. 0xff, since it will convert losslessly to an int, is regarded as a plain int of value 255, therefore positive. index+1 can never be more than 256, ergo there are no overflow issues.

Or am I missing something?

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 13:11 UTC (Mon) by jzbiciak (guest, #5246) [Link] (1 responses)

M_Random and P_Random exist to allow different random number streams in different parts of the code. IIRC, P_Random is used in the part of the code that needs to behave 100% deterministically across all builds so that the demo recorder works correctly as well as its network multiplayer support. You can add as many calls to M_Random as you like (such as in menu wipes, etc) without affecting the parts that demo recording and multiplayer rely on.

Doom's demo recorder relies on recording inputs as opposed to recording output. Likewise for multiplayer support--it just broadcasts everyone's inputs, expecting everyone's engines to respond identically.

So, the "P_Random() - P_Random()" thing caused both the demo recorder and network multiplayer support to break. (Never got to test the network multiplayer support, but I'm certain it would've been quite broken if I played our DSP vs. a PC.)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 13:36 UTC (Mon) by jzbiciak (guest, #5246) [Link]

(Never got to test the network multiplayer support, but I'm certain it would've been quite broken if I played our DSP vs. a PC.)

...that is until I fixed it with a function that did the two calls to P_Random in the expected order.
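
A sketch of what such a helper could look like. The name P_SubRandom and the counting stand-in for P_Random() are assumptions made for illustration, not the code from the port.

```c
/* Stand-in generator for illustration: returns 1, 2, 3, ...
   (the real P_Random() walks a 256-entry table). */
static int next_value = 0;

int P_Random(void)
{
    return ++next_value;
}

/* Hypothetical helper: each call is a separate full expression, so the
   first call is sequenced before the second on every compiler. */
int P_SubRandom(void)
{
    int first = P_Random();
    int second = P_Random();

    return first - second;
}
```

Replacing every "P_Random() - P_Random()" with a call like this removes the dependence on the compiler's choice of operand evaluation order.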

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 13:14 UTC (Mon) by jzbiciak (guest, #5246) [Link]

If you're referring to the comment "// Which one is deterministic?"... they both are. :-)

The main difference is the rules applied to how they're used elsewhere in the code, hence the tongue-in-cheek comment.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 9:30 UTC (Fri) by ghane (guest, #1805) [Link] (3 responses)

The last time I programmed was in Fortran 77 (I have stuck to bash since then).

Will someone please take pity on me and explain what the bug is? nlucas, I _did_ stare at it for 5 mins :-)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 10:05 UTC (Fri) by nlucas (subscriber, #33793) [Link] (2 responses)

Putting it simply, imagine P_random() returns a sequence of numbers in order, like 1 on first invocation, 2 on second invocation, etc.

The bug, as pointed below, is that the C standard doesn't specify the order the calls are made to retrieve the temporary values, so it can result in either:

x = 1 - 2 = -1

or

x = 2 - 1 = 1

The game was depending on a specific order for the demo to work as expected.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 10:40 UTC (Fri) by ghane (guest, #1805) [Link] (1 responses)

I am still confused.

Since the function is specifically "random", and the whole point seems to get a triangular random distribution around the Y axis, what difference would the order make?

Even assuming that the PRNG produced predictable values in order of invocation, we would get a mirror image of the "triangle", which would still be a pseudo-random triangle.

And one could hardly rely on each random device to always generate the same sequence anyway.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 10:46 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

Doom's demo recorder doesn't record video; it records the player's actions. The playback system uses the game engine but with the player's actions provided by the demo recording instead of user input. If the game engine is taking pairs of pseudorandom numbers to generate a third pseudorandom number by subtraction, and the order in which each pair of numbers emerges from the generator is reversed, then the game engine's behaviour diverges even though the distribution is identical.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 10:37 UTC (Sat) by zorro (subscriber, #45643) [Link] (2 responses)

In the majority of languages the order of evaluation within an expression is undefined.

In Java and C# the evaluation order is always left to right. It is very unfortunate that C++ inherited the undefined evaluation order from C.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 16, 2011 21:19 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

It is not "unfortunate": it is more room for optimization as explained in the main article.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 17, 2011 3:12 UTC (Tue) by jzbiciak (guest, #5246) [Link]

Indeed. Java's stack-based interpreter model might not benefit from such optimizations, but real register-based machines can benefit from evaluating expressions inside-out, for example, since it minimizes the number of temporary values you need to keep around.

In the worst case, left-to-right evaluation requires generating a temporary for every subexpression. Consider this pathological case:

 foo = (a() - (b() - (c() - (d() - (e() - f())))));

You need to compute a(), b(), c(), d(), e() and f() before you can do the first subtract in a language that guarantees left-to-right evaluation. In C, the compiler is free to evaluate e() or f() ahead of the rest.

Before someone points out the commutativity and associativity of addition and subtraction, let me remind you that even C can't re-order the order of floating point operations. So if 'a' through 'f' return floats, you must do e() - f() first.

So, the C rules give the compiler some needed (and obviously useful in this contrived example) flexibility in this case.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 7:11 UTC (Fri) by paulj (subscriber, #341) [Link]

I had to go reread the C99 standard to be sure of whether this was a code or compiler bug. ;) Nice. The standard has an even more convoluted example:

(*pf[f1()]) (f2(), f3() + f4())

"the functions f1, f2, f3, and f4 may be called in any order. All side effects have to be completed before the function pointed to by pf[f1()] is called."

I think the conditions are actually slightly stricter than just the one given on the function pointer: The side effects from any of the function calls must be complete before the beginning of the next, as each function call has a sequence point before it. I.e. the functions could not be called in parallel, as it stands.

Nice example anyway, evil-interview-question kind of stuff. ;)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 7:51 UTC (Fri) by Darkmere (subscriber, #53695) [Link] (3 responses)

The page cannot be found, and appears to be gone from the llvm blog, did anyone store a copy?

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 9:10 UTC (Fri) by mgedmin (guest, #34497) [Link]

I'm seeing links to recent posts on various blogs hosted by Blogger end up missing today.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 12:52 UTC (Fri) by ekonijn (subscriber, #6395) [Link] (1 responses)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 14:52 UTC (Fri) by juliank (guest, #45896) [Link]

> See http://blog.regehr.org/archives/213
Different article

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 9:35 UTC (Fri) by jezuch (subscriber, #52988) [Link]

> Use of an uninitialized variable: (...) This improves performance by not requiring that all variables be zero initialized when they come into scope (as Java does).

In fact, in Java it's true only for class fields. Use of uninitialized local variables is fully defined to be a compile-time error.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 22:07 UTC (Fri) by ejr (subscriber, #51652) [Link] (5 responses)

And in case anyone's wondering, ANSI grew a clue and is selling the C standard in electronic form for $19. If you're serious, buy a copy and read it.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 22:27 UTC (Fri) by mpr22 (subscriber, #60784) [Link] (3 responses)

Link? I just looked on their online store and they're still listing it for over $300.

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 13, 2011 22:35 UTC (Fri) by ejr (subscriber, #51652) [Link] (2 responses)

Well, it looks like they increased the price to $30 for no good reason. See http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2fI...(R2005)

(I've been in a standards group. Yes, this is a rip-off considering that nearly all the standard-specific work is donated. For some standards, the development cost was covered by the original society, e.g. IEEE, letting ANSI and ISO jump directly to step 3, profit!)

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 11:45 UTC (Sat) by MisterIO (guest, #36192) [Link] (1 responses)

If you have to spend money anyway, wouldn't it be better to just buy a good book like "C: A Reference Manual" by Harbison and Steele? Is there anything that book lacks compared to the standard?

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 12:07 UTC (Sat) by juliank (guest, #45896) [Link]

> Is there anything that book lacks compared to the standard?
It's not the standard. As such, it likely lacks important definitions of things like undefined behavior, unspecified behavior, or implementation-defined behavior.

As an example of non-standard speech, see the Linux manpage for memcpy:

The memcpy() function copies n bytes from memory
area src to memory area dest. The memory areas should
not overlap. Use memmove(3) if the memory areas do overlap.

Totally imprecise description. Especially the part where it says "should not overlap" is not correct, as this is not a recommendation; violating it causes the behavior of the program to be undefined.
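
As a concrete case, shifting a buffer's contents by one byte within the same buffer has overlapping source and destination, so it needs memmove(). The helper below is a hypothetical sketch.

```c
#include <string.h>

/* Shifts the first n bytes of buf one position to the right.
   buf must have room for at least n + 1 bytes. */
void shift_right_by_one(char *buf, size_t n)
{
    /* Source [buf, buf+n) and destination [buf+1, buf+1+n) overlap,
       so memcpy() here would be undefined behavior. */
    memmove(buf + 1, buf, n);
}
```

With memcpy() in place of memmove() this may appear to work on one C library and corrupt the data on another, which is exactly what "undefined" allows.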

What Every C Programmer Should Know About Undefined Behavior #1/3

Posted May 14, 2011 2:08 UTC (Sat) by wahern (subscriber, #37304) [Link]

The ISO drafts are free. I bought a hard copy once, so if I absolutely need to look at the "official" standard, I can, though I'd need to find it first.

Just go to the working group page and download the latest C99 draft.

http://www.open-std.org/jtc1/sc22/wg14/

What does GCC do?

Posted May 14, 2011 0:53 UTC (Sat) by Richard_J_Neill (subscriber, #23093) [Link] (2 responses)

Seems to me that the real question (for Linux users) is actually what GCC/LLVM does. Although the standard may leave something undefined, the compiler must actually choose to do _something_ (the computer is deterministic, one hopes).

In this case, is there a good reference on what the compilers actually do, (and whether it will do so deliberately and consistently on all platforms with/without optimisations, vs whether it is merely coincidental)?

What does GCC do?

Posted May 14, 2011 1:39 UTC (Sat) by neilbrown (subscriber, #359) [Link]

While a given compiler may well be deterministic:

1/ The behaviour may be determined by some subtle context issues that we mere humans may not be able to assess effectively.
2/ The behaviour may change from one release to the next.

So really, what GCC does isn't nearly so relevant as what GCC promises to do. And when it doesn't make a promise, anything is possible.

What does GCC do?

Posted May 14, 2011 13:52 UTC (Sat) by corbet (editor, #1) [Link]

"What the compiler does" can be a dangerous guideline. The GCC and glibc folks are happy to change that behavior if it's not set in stone somewhere, leading to weird breakage down the line.

What Every C Programmer Should Know About Undefined Behavior #2/3

Posted May 15, 2011 11:23 UTC (Sun) by juliank (guest, #45896) [Link]

Fast inverse square root

Posted May 23, 2011 15:52 UTC (Mon) by nye (guest, #51576) [Link]

It's just occurred to me that the 'violating type rules' section points out that tricks of the sort used in the fast inverse square root algorithm rely on undefined behaviour. Am I reading that correctly?

Would replacing

    long i;
    float y;
    y = number;
    i = *(long *)&y;
with
    long i;
    union
    {
        float as_float;
        long as_long;
    } y;
    y.as_float = number;
    i = y.as_long;
be correct (where 'correct' means 'valid C, even if the outcome is implementation defined and not necessarily what you expect')?
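
For comparison, another common way to reinterpret the bits without casting the pointer is to copy the bytes with memcpy(). The sketch below uses uint32_t instead of long and a hypothetical function name. It assumes float is 32 bits wide, and the bit pattern you get assumes IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Copying bytes is allowed for any object type, so no aliasing
       rule is violated here. */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn a fixed-size memcpy() like this into a plain register move, so it need not cost anything over the cast.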


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds