LWN.net Logo

What If I Don't Actually Like My Users?

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 1:32 UTC (Sat) by pr1268 (subscriber, #24648)
In reply to: What If I Don't Actually Like My Users? by jzbiciak
Parent article: What If I Don't Actually Like My Users?

Oy. Thankfully I can say that was almost half a lifetime ago for me.

Darn... That happened just yesterday for me (it was with the blasted parameters to a function call).

In that same program, I experienced the below bundle of joy (and what is it with these stupid size_t types?!) Code:

size_t moonWalk(const vector<MyThingy>& stuff)
{
    size_t i = h_list.size();
    while(i >= 0) {
        /* do something with */ stuff[--i];
    }
    return i;
}

Oh, so subtle! It was a long afternoon of troubleshooting before I figured out why Mr. Fault (first name: Segmentation) kept interrupting my otherwise blissful coding session. (Note to self: Be sure to add brown paper bags to shopping list.)


(Log in to post comments)

Problem and solution

Posted Apr 5, 2008 11:01 UTC (Sat) by man_ls (subscriber, #15091) [Link]

Hmmm... before we all ignorants spend a long afternoon figuring this out, could you show us the problem and the solution? The code looks fine to me...

Problem and solution

Posted Apr 5, 2008 12:08 UTC (Sat) by emk (subscriber, #1128) [Link]

It took me a second, too, because it's using a strange (incorrect) looping idiom. Here, size_t is an unsigned type, so it can never be negative. If i == 0, and you write --i, it wraps around to a huge positive value. To fix it, try:

while (i > 0) {

Thanks to the unsigned nature of size_t, it's surprisingly hard to loop backwards over STL vectors without tripping over this bug. You could choose to use reverse iterators, which are clunky but safer:

vector<MyThingy>::const_reverse_iterator iter = stuff.rbegin();
for (; iter != stuff.rend(); ++iter) {
    /* do something with */ *iter;
}

Also, I have no idea why the original function returns i. Was there a break somewhere in the original loop? Without one, i would always equal 0 (assuming the loop terminated successfully).

Problem and solution

Posted Apr 6, 2008 12:55 UTC (Sun) by olecom (guest, #42886) [Link]

> while (i > 0) { stuff[--i];
when using predecrement, use

i=max+1; do { --i; } while(i);

Problem and solution

Posted Apr 6, 2008 13:20 UTC (Sun) by olecom (guest, #42886) [Link]

> > while (i > 0) { stuff[--i];
> when using predecrement, use
>
> i=max+1; do { --i; } while(i);

Also beware on input!

max -- is maximum linear address
i   -- geek counter, counts downto zero including;
       thus, number of loops is (max + 1)

better is to have *all* counts downto zero,
i.e. zero address access and

i = max /* number of loops is (max + 1) */
do {
  /* use */ stuff[i];
} while (--i);

Problem and solution

Posted Apr 6, 2008 15:20 UTC (Sun) by man_ls (subscriber, #15091) [Link]

What if the array is empty? You would be doing one iteration, while the above idiom (while (i>0) {--i;}) skips the loop and works fine.

Problem and solution

Posted Apr 7, 2008 5:11 UTC (Mon) by olecom (guest, #42886) [Link]

> What if the array is empty?

Same thing -- check your input. `if' for check, `while' for loop/iterations.

Mixing the two isn't a good thing, worst optimization ever.
_____

Problem and solution

Posted Apr 7, 2008 6:17 UTC (Mon) by man_ls (subscriber, #15091) [Link]

It depends. Quite often it doesn't matter much, and the compact version can be better (at least it takes less time to write). You know what they say about premature optimization, don't you?

By the way, with Java your best optimization is to write while (i != 0) instead of while (i > 0), because otherwise the JVM will perform arithmetic comparisons instead of logical. (Yes, it is pitiful.) I believe they have fixed it now, but it would be nice to know for sure.

just correct loops (Problem and solution)

Posted Apr 7, 2008 8:28 UTC (Mon) by olecom (guest, #42886) [Link]

> It depends. Quite often it doesn't matter much, and the compact version
> can be better (at least it takes less time to write).

All was about decrement-after-check access in the loop.

> You know what they say about premature optimization, don't you?

Those loops are quite fine for me, YMMV.
BTW, i'm in to system programming, so i don't know what Java is, really .:)
-- 
sed 'sed && sh + olecom = love' << ''
-o--=O`C
 #oo'L O
<___=E M

Problem and solution

Posted Apr 10, 2008 15:13 UTC (Thu) by DennisJ (subscriber, #14700) [Link]

Supposing a signed type had been used for the counter, and nothing changed to the loop
condition, the last pass through the loop would have done something to stuff[-1] which is also
unlikely to be what the author wanted.

So isn't 'while(i>0)' the correct solution following an incorrect analysis?

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 15:51 UTC (Sat) by nix (subscriber, #2304) [Link]

-Wall gives you a nice helpful warning about this case.

(I use size_t (and where appropriate ssize_t) religiously for things like 
array indexes, so I hardly ever see this problem. It only turns up whe you 
have to interface with code that was written by people who don't 
understand that the index to arrays in C is *not* a fricking int or a 
long, it's a size_t... this is starting to matter now that int and long 
are different sizes again on some platforms.)

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 16:36 UTC (Sat) by jzbiciak (✭ supporter ✭, #5246) [Link]

Erm... how often do you have 2 billion elements in a single array?  It's only when you get
larger than that in a *single array* that int vs. size_t matters for an *array index*.

Using more than 2 gigs of memory isn't unheard of, and is in fact quite common.  But, using
that much memory in a *single array* seems rather suspect.  Actually, no, it seems rather
outrageous, unless your screen display is a 1200 dpi bitmap on a 24" screen or something.

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 17:43 UTC (Sat) by nix (subscriber, #2304) [Link]

It's not common. I'm just a correctness fiend. :)

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 22:13 UTC (Sat) by jzbiciak (✭ supporter ✭, #5246) [Link]

Of course, if your size_t results in bugs like the one above (downcounters that just don't
quit), I'd argue it hurts correctness.

I don't recall anything in the C standard that suggests size_t is actually an appropriate type
for array indexes.  It *is* an appropriate type to pass to malloc, but that's what pops out of
sizeof(type).  It says nothing about the type of the index that you'll use on the resulting
array.

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 0:29 UTC (Sun) by nix (subscriber, #2304) [Link]

The Standard implies it, and the implication is fairly obvious as these 
things go. Let's follow through the logic.

size_t is the upper limit on the size of any object in C, and arrays (like 
other derived types) are themselves objects (they are not functions nor 
incomplete types, the other classes of type).

The smallest addressable object type in C is 'char', which by definition 
occupies one byte; thus, the largest possible array is an array of char of 
size (size_t)-1.

Thus, the largest possible array index is by definition always the same as 
the largest possible allocated object, i.e., contained exactly within 
size_t.

Use another type and it will eventually hurt you. (If your algorithms rely 
on decrementing index counters below zero, I'd say they are themselves 
risky and should be rethought, because if you use that index, you'll be 
indexing an array before its start, which if it goes off the start of an 
allocated object invokes undefined behaviour.)

(As further evidence, the Standard contains a --- non-normative --- 
example of using sizeof to determine the length of an array, which
implies that the length of an array is a size_t, so its index probably is 
too...)

This concludes today's ludicrous pedantry. Don't make the mistake of 
thinking that any of this stuff is actually important. :)

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 1:25 UTC (Sun) by jzbiciak (✭ supporter ✭, #5246) [Link]

I guess you could be even more pedantic and put 'U' suffixes on your array bounds too: int array[3U][5U]; ;-)

As for down-counting loops: The counter going negative is a red herring in terms of correct array accesses. What do the following two loops have in common?

for (i = 0; i < N; i++)
    do_something(array[i]);

for (i = N-1; i >= 0; i--)
    do_something(array[i]);

Answer? Both leave 'i' pointing one element past the end of the array. The only difference is which end.

I personally find negative array subscripting useful. The following is legitimate C code:

    /* Take a histogram of signed 8-bit values */
    int histogram[256];
    int *hist_mid = histogram + 128;
    signed char *data;

    /* ... */

    for (i = 0; i < N; i++)
        hist_mid[data[i]]++;

And as far as the standard goes, at least this example from the C0x standard uses int to define array bounds (in the context of the new "Variable Length Array" feature being added to C).

*shrug*

You're right, though, it doesn't matter a whole lot. Just don't take my signed integer indices away, and I'll let you keep your unsigned ones. :-)

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 14:28 UTC (Sun) by jbh (subscriber, #494) [Link]

The largest single array I've worked with lately was about 500 million 64-bit doubles (the
values array of a CRS matrix).

So I don't think 2 billion elements seems outrageous.

But I wouldn't run that on a 32-bit cpu, for obvious reasons.

What If I Don't Actually Like My Users?

Posted Apr 7, 2008 6:50 UTC (Mon) by gdt (subscriber, #6284) [Link]

Q: Erm... how often do you have 2 billion elements in a single array?

A: When it's being exploited.

brought to you by the letters R, G and B...

Posted Apr 16, 2008 21:49 UTC (Wed) by roelofs (subscriber, #2599) [Link]

Erm... how often do you have 2 billion elements in a single array?

    unsigned char *pix = (unsigned char *)malloc(32768*32768*3*sizeof(unsigned char));

Not even a tiny bit far-fetched.

Greg

brought to you by the letters R, G and B...

Posted Apr 16, 2008 21:55 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246) [Link]

Hmm... GIMP at least tiles that and uses a higher order tile structure.  Are there really
programs that actually allocate that in a single 3GB malloc()?

brought to you by the letters R, G and B...

Posted Apr 16, 2008 22:19 UTC (Wed) by roelofs (subscriber, #2599) [Link]

XV used to try. It included multiple image-format decoders that checked the height and/or width separately but either failed to check the product of the two or else did so but subsequently failed to check the product of the result with the byte-depth. Or they did so incorrectly--i.e., failing only for negative products, not realizing that h*w*d can easily wrap back into the positive range. You really want to use this pattern or its equivalent:

  // int, long, or other signed type:  w, h, npixels, bufsize
  npixels = w * h;
  bufsize = 3 * npixels;
  if (w <= 0 || h <= 0 || npixels/w != h || bufsize/3 != npixels) {
    FAIL();
  }
  buf = malloc(bufsize*sizeof(whatever_buf_is_made_of));

But realistically, you're right--you absolutely don't want to use a program that tries to allocate the whole thing simultaneously, regardless of whether it's in one piece or many. And you may want to avoid certain image formats for the same reason--tiled TIFF, for example, is well suited to very large images, but many (most?) other image formats are not. PNG, for all its simplicity (well, relative to TIFF, anyway), basically requires you to decode at least two full rows simultaneously, and rows can be up to 16 GB each (2^31 - 1 pixels wide * 8 bytes deep for 64-bit RGBA). Of course, 2G x 2G x 64-bit images are still in the realm of fantasy, AFAIK...

Greg

Correction and comment replies

Posted Apr 5, 2008 20:42 UTC (Sat) by pr1268 (subscriber, #24648) [Link]

s/h_list/stuff/ on line 3 of my code.

In reply to man_ls, as others have pointed out, a size_t is an unsigned long integer type. Integer underflow occurred here with the pre-decrement --i array index.

In reply to emk, I did indeed changing over to using a reverse iterator for this code, but I originally had some motivation not to do so, the reason which now escapes me. And to think this was just two days ago - goes to show just how fast thoughts and ideas flee through my mind when writing software!

Thanks for the replies - it's not just anywhere where I can share such embarrassing C++ code and still stimulate a nice discussion.

What If I Don't Actually Like My Users?

Posted Apr 8, 2008 14:16 UTC (Tue) by aya (guest, #19767) [Link]

Ah, but you forget you're using an STL class.

size_t moonWalk(const vector<MyThingy>& stuff)
{
    for (vector<MyThingy>::const_reverse_iterator i= stuff.rbegin();
         i != stuff.rend();
         ++i)
    {
        /* do something with *i */
    }

    /* I don't get what you're returning, though. */
}

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds