access off end of array

Posted Mar 24, 2013 0:46 UTC (Sun) by pflugstad (subscriber, #224)
Parent article: Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Couldn't a read off the end of an array potentially cause a segfault, depending on how the array was allocated? If so, it seems to me that this is just buggy code, plain and simple.

I also think this code is confusing and poor coding style. Why even have dd? I think this code:

    int d[16];
    int SATD (void)
    {
      int satd = 0, k;
      for ( k=0; k<16; k++ ) {
        satd += (d[k] < 0 ? -d[k] : d[k]);
      }
      return satd;
    }

is cleaner, easier to understand and maintain, and eliminates the bug. With any optimization at all, GCC should eliminate the repeated d[k] accesses, so I would expect almost identical code and performance. Maybe I'm just missing something? Even if you leave dd in there, doing the assignment inside the loop body is cleaner and easier to understand than doing it inside the for statement itself.
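As a rough sketch of that last point (keeping dd but assigning it inside the loop body), something like the following should do. The name satd_with_dd and the array parameter are assumptions made for illustration; the code above uses a global d instead.

```c
/* Sketch only: satd_with_dd is a hypothetical name, and the array is
   passed as a parameter so the function can be exercised in isolation. */
int satd_with_dd(const int d[16])
{
  int satd = 0, k;
  for (k = 0; k < 16; k++) {
    int dd = d[k];                 /* read happens only while k < 16 */
    satd += (dd < 0 ? -dd : dd);   /* accumulate the absolute value */
  }
  return satd;
}
```

Because dd is loaded after the k<16 test rather than in the for statement's increment clause, d[16] is never read.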

I also think that GCC using an undefined data access to essentially short-circuit a for loop's control variable is just busted - and it looks like they fixed this before the final GCC was actually released.


access off end of array

Posted Mar 24, 2013 1:55 UTC (Sun) by pflugstad (subscriber, #224)

DOH - I completely missed butlerm's comment... so yeah, it can SEGFAULT.

And I still think it's buggy and unclear code.

access off end of array

Posted Mar 24, 2013 6:53 UTC (Sun) by alankila (subscriber, #47141)

You don't understand. C developers pride themselves on having come up with these ugly and confusing functions, which perhaps once in history, a long time before any of us were born, produced better code from some compiler someone's grandfather used in his youth. Or maybe the grandpa just thought so, or was terribly drunk while writing the code, or experimented with just how much he could get away with using that particular compiler and optimization level. Grandpa was known for being a bit unruly. The point is, this specific written form is now hallowed by history and tradition, and therefore can't be changed, and any compiler that dares to break it is immediately at fault.

access off end of array

Posted Mar 25, 2013 19:27 UTC (Mon) by tjc (subscriber, #137)

If you trace this back to the root problem you may come to the conclusion that changing the state of a variable in a non-assignment expression can be more trouble than it's worth, especially if you're dealing with concurrency. A language that allows

i++;

as a stand-alone statement would be a useful compromise, since it could still be used as the increment statement in a for loop, which is by far the most common idiom for this construct.

access off end of array

Posted Mar 25, 2013 23:13 UTC (Mon) by HelloWorld (guest, #56129)

The only reason i++ and --i were invented is that they made it possible to generate more efficient code for the PDP-11 with simple-minded compilers. There's no reason for them nowadays, as i += 1 is almost as short and most languages today also feature a proper for loop.

access off end of array

Posted Mar 26, 2013 0:34 UTC (Tue) by tjc (subscriber, #137)

Actually, the ++/PDP-11 connection is urban legend -- see "More History", paragraph 2 at this link:

The Development of the C Language

I think i++ is fine from a syntax point of view, so long as it's a stand-alone statement, where it produces the same code as i += 1. But I try to avoid embedding increment operators within expressions, where they produce easily overlooked side effects.
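To make the distinction concrete, here is a small sketch of the same loop written both ways. The function names are made up for this example, and both versions are meant to behave identically.

```c
/* Hypothetical example: copy the non-negative values of in[] to out[]
   and return how many were copied. */

/* Increment embedded in the subscript expression. */
int keep_nonnegative_embedded(const int *in, int n, int *out)
{
  int count = 0, k;
  for (k = 0; k < n; k++)
    if (in[k] >= 0)
      out[count++] = in[k];   /* side effect hidden inside the subscript */
  return count;
}

/* Increment written as a stand-alone statement. */
int keep_nonnegative_standalone(const int *in, int n, int *out)
{
  int count = 0, k;
  for (k = 0; k < n; k++) {
    if (in[k] >= 0) {
      out[count] = in[k];
      count++;                /* same effect as count += 1 */
    }
  }
  return count;
}
```

The k++ in the for statement is the stand-alone use in both versions; only the handling of count differs.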

access off end of array

Posted Mar 26, 2013 9:00 UTC (Tue) by khim (subscriber, #9252)

Actually, the ++/PDP-11 connection is urban legend -- see "More History", paragraph 2 at this link:

The Development of the C Language

Well, your own link shows that it's not an "urban legend" but more like an oversimplification: "This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few 'auto-increment' memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own."

While the claim is factually incorrect (the operators date back to B, which predates the PDP-11), both "++" in C and "(RX)+" in the PDP-11's assembler come from the same source.

access off end of array

Posted Mar 26, 2013 16:04 UTC (Tue) by hummassa (subscriber, #307)

I know for a fact the 6809 microprocessors had some instructions "load/store from pointer with post/pre-auto-increment/decrement" so that one of:

a = *b++
a = *++b
a = *b--
a = *--b
*b++ = a
*++b = a
*b-- = a
*--b = a

was a single instruction; they made it easy to implement really fast stacks and queues, and zero-terminated strings (because "a = *b++" etc. set the Z flag if the char was zero).
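For illustration, here is roughly what those idioms look like in C. This is only a sketch: copy_string, push and pop are names invented for this example, and nothing here says which instructions a given compiler or CPU would actually use for them.

```c
/* Copy a zero-terminated string. The loop condition tests the character
   that was just copied, so the loop ends once the terminating 0 has been
   stored. Returns the start of the destination buffer. */
char *copy_string(char *dst, const char *src)
{
  char *start = dst;
  while ((*dst++ = *src++) != '\0')
    ;                               /* all the work is in the condition */
  return start;
}

/* Upward-growing stack addressed through a caller-owned stack pointer:
   push is "*sp++ = a", pop is "a = *--sp". No bounds checking. */
void push(int **sp, int value)
{
  *(*sp)++ = value;                 /* store, then advance the pointer */
}

int pop(int **sp)
{
  return *--(*sp);                  /* step back, then load */
}
```

The caller is assumed to provide enough room in the destination buffer and in the stack array.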

access off end of array

Posted Mar 26, 2013 16:52 UTC (Tue) by brouhaha (subscriber, #1698)

Yes, but the 6809 came along much later than the PDP-11, so it's not relevant to discussion of where the C pre/post-increment/decrement operators came from.

access off end of array

Posted Mar 26, 2013 8:09 UTC (Tue) by alankila (subscriber, #47141)

I guess there are many ways to improve C, most of which amount to breaking expressions that used to work but are ugly, confusing, and sometimes semantically broken. Perhaps GCC can slowly over time nudge people away from doing multiple things in a single statement -- that definitely sounds like an improvement.

access off end of array

Posted Mar 26, 2013 16:26 UTC (Tue) by tjc (subscriber, #137)

I think a warning flag would be a step in the right direction, and maybe as far as things should go. -Wall doesn't warn against this sort of thing, but since the "all" in -Wall is not really all, there might already be a flag for this.

access off end of array

Posted Mar 24, 2013 19:02 UTC (Sun) by iabervon (subscriber, #722)

Probably the real reason to have this in a benchmark is that it's stupid. Unless your compiler does particularly good control-flow analysis, it'll generate a read of d[16], which is a likely cache miss (if the compiler aligns the array, there's a good chance that d[16] will be in a different cache line from d[15] and anything else that's hot). If the compiler can figure out that dd isn't used outside the loop, and that it can therefore be set after the test instead of before, you'll get code that runs faster than if the compiler is less clever. Of course, if you wanted to get a fast result, you'd just write it the obvious way and get the optimal result on any compiler, but they want to have some compilers do better than other compilers.

It's like writing an exam question: it would be easy to write a question that everybody would get right, but you want to write a question that people who know the material will get right more often than people who don't. Obviously, in ordinary life, you want to ask questions which will be more likely to get correct answers, and you want to write code that all compilers will make as fast as possible, but that's not the situation here.

access off end of array

Posted Mar 25, 2013 5:09 UTC (Mon) by cesarb (subscriber, #6266)

Except that it is not an "exam question". It is supposed to be real code, in this case from a reference implementation of the H.264 codec.

It was not written to stress test compilers. It is just not very optimized (and since it is only a reference implementation, it does not have to be).

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.