LWN.net Logo

What If I Don't Actually Like My Users?

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 16:36 UTC (Sat) by im14u2c (subscriber, #5246)
In reply to: What If I Don't Actually Like My Users? by nix
Parent article: What If I Don't Actually Like My Users?

Erm... how often do you have 2 billion elements in a single array?  It's only when you get
larger than that in a *single array* that int vs. size_t matters for an *array index*.

Using more than 2 gigs of memory isn't unheard of, and is in fact quite common.  But, using
that much memory in a *single array* seems rather suspect.  Actually, no, it seems rather
outrageous, unless your screen display is a 1200 dpi bitmap on a 24" screen or something.


(Log in to post comments)

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 17:43 UTC (Sat) by nix (subscriber, #2304) [Link]

It's not common. I'm just a correctness fiend. :)

What If I Don't Actually Like My Users?

Posted Apr 5, 2008 22:13 UTC (Sat) by im14u2c (subscriber, #5246) [Link]

Of course, if your size_t results in bugs like the one above (downcounters that just don't
quit), I'd argue it hurts correctness.

I don't recall anything in the C standard that suggests size_t is actually an appropriate type
for array indexes.  It *is* an appropriate type to pass to malloc, but that's what pops out of
sizeof(type).  It says nothing about the type of the index that you'll use on the resulting
array.

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 0:29 UTC (Sun) by nix (subscriber, #2304) [Link]

The Standard implies it, and the implication is fairly obvious as these 
things go. Let's follow through the logic.

size_t is the upper limit on the size of any object in C, and arrays (like 
other derived types) are themselves objects (they are not functions nor 
incomplete types, the other classes of type).

The smallest addressable object type in C is 'char', which by definition 
occupies one byte; thus, the largest possible array is an array of char of 
size (size_t)-1.

Thus, the largest possible array index is by definition always the same as 
the largest possible allocated object, i.e., contained exactly within 
size_t.

Use another type and it will eventually hurt you. (If your algorithms rely 
on decrementing index counters below zero, I'd say they are themselves 
risky and should be rethought, because if you use that index, you'll be 
indexing an array before its start, which if it goes off the start of an 
allocated object invokes undefined behaviour.)

(As further evidence, the Standard contains a --- non-normative --- 
example of using sizeof to determine the length of an array, which
implies that the length of an array is a size_t, so its index probably is 
too...)

This concludes today's ludicrous pedantry. Don't make the mistake of 
thinking that any of this stuff is actually important. :)

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 1:25 UTC (Sun) by im14u2c (subscriber, #5246) [Link]

I guess you could be even more pedantic and put 'U' suffixes on your array bounds too: int array[3U][5U]; ;-)

As for down-counting loops: The counter going negative is a red herring in terms of correct array accesses. What do the following two loops have in common?

for (i = 0; i < N; i++)
    do_something(array[i]);

for (i = N-1; i >= 0; i--)
    do_something(array[i]);

Answer? Both leave 'i' pointing one element past the end of the array. The only difference is which end.

I personally find negative array subscripting useful. The following is legitimate C code:

    /* Take a histogram of signed 8-bit values */
    int histogram[256];
    int *hist_mid = histogram + 128;
    signed char *data;

    /* ... */

    for (i = 0; i < N; i++)
        hist_mid[data[i]]++;

And as far as the standard goes, at least this example from the C0x standard uses int to define array bounds (in the context of the new "Variable Length Array" feature being added to C).

*shrug*

You're right, though, it doesn't matter a whole lot. Just don't take my signed integer indices away, and I'll let you keep your unsigned ones. :-)

What If I Don't Actually Like My Users?

Posted Apr 6, 2008 14:28 UTC (Sun) by jbh (subscriber, #494) [Link]

The largest single array I've worked with lately was about 500 million 64-bit doubles (the
values array of a CRS matrix).

So I don't think 2 billion elements seems outrageous.

But I wouldn't run that on a 32-bit cpu, for obvious reasons.

What If I Don't Actually Like My Users?

Posted Apr 7, 2008 6:50 UTC (Mon) by gdt (subscriber, #6284) [Link]

Q: Erm... how often do you have 2 billion elements in a single array?

A: When it's being exploited.

brought to you by the letters R, G and B...

Posted Apr 16, 2008 21:49 UTC (Wed) by roelofs (subscriber, #2599) [Link]

Erm... how often do you have 2 billion elements in a single array?

    unsigned char *pix = (unsigned char *)malloc(32768*32768*3*sizeof(unsigned char));

Not even a tiny bit far-fetched.

Greg

brought to you by the letters R, G and B...

Posted Apr 16, 2008 21:55 UTC (Wed) by im14u2c (subscriber, #5246) [Link]

Hmm... GIMP at least tiles that and uses a higher order tile structure.  Are there really
programs that actually allocate that in a single 3GB malloc()?

brought to you by the letters R, G and B...

Posted Apr 16, 2008 22:19 UTC (Wed) by roelofs (subscriber, #2599) [Link]

XV used to try. It included multiple image-format decoders that checked the height and/or width separately but either failed to check the product of the two or else did so but subsequently failed to check the product of the result with the byte-depth. Or they did so incorrectly--i.e., failing only for negative products, not realizing that h*w*d can easily wrap back into the positive range. You really want to use this pattern or its equivalent:

  // int, long, or other signed type:  w, h, npixels, bufsize
  npixels = w * h;
  bufsize = 3 * npixels;
  if (w <= 0 || h <= 0 || npixels/w != h || bufsize/3 != npixels) {
    FAIL();
  }
  buf = malloc(bufsize*sizeof(whatever_buf_is_made_of));

But realistically, you're right--you absolutely don't want to use a program that tries to allocate the whole thing simultaneously, regardless of whether it's in one piece or many. And you may want to avoid certain image formats for the same reason--tiled TIFF, for example, is well suited to very large images, but many (most?) other image formats are not. PNG, for all its simplicity (well, relative to TIFF, anyway), basically requires you to decode at least two full rows simultaneously, and rows can be up to 16 GB each (2^31 - 1 pixels wide * 8 bytes deep for 64-bit RGBA). Of course, 2G x 2G x 64-bit images are still in the realm of fantasy, AFAIK...

Greg

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds