
Portability and Pitfalls of C-Types (developerWorks)

IBM developerWorks looks at using types in C. "Effectively use the C type system, with help from Peter Seebach, as he covers Hungarian notation (the good kind and the bad kind), using typedef, portability issues, and major pitfalls."

a few notable omissions...

Posted May 1, 2006 23:12 UTC (Mon) by stevenj (guest, #421) [Link]

First, no discussion of writing portable code for large projects on Unix is complete without a mention of GNU Autoconf. (In my experience, those who dismiss autoconf usually end up reinventing it badly.) For one thing, you can't always stick with universally supported standards in this world—the author himself recommends using several C99 constructs that aren't universally implemented. Even if you do, the system to build your program is hard to make completely portable, even among Unices, without requiring users to manually edit Makefiles.

He also writes: "Floating point numbers are particularly non-portable, unless all the systems involved use a common format. They might not, although a surprising number of modern systems have adopted the IEEE 754 specification. If unsure, write things out as text." First, even if two systems use IEEE 754, they may still have endian differences. Second, binary floating-point formats are essential for storing large scientific datasets, but one doesn't have to abandon portability: there are portable binary formats supported by free/libre libraries, notably NCSA's HDF format.
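To make the endianness point concrete, here is a minimal sketch of writing a double in a fixed byte order. The helper names put_double_be and get_double_be are made up for illustration, and the code assumes that double is IEEE 754 binary64 and that it shares its byte order with uint64_t on the host, which holds on most current systems but is not promised by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a double as 8 bytes, most significant byte
   first. Assumes IEEE 754 binary64 and that double and uint64_t use the
   same byte order on this machine. */
void put_double_be(unsigned char out[8], double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern, no aliasing tricks */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild the double from 8 big-endian bytes. */
double get_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the shifts operate on the integer value rather than on memory, the bytes come out in the same order on big-endian and little-endian hosts. For real datasets a library format such as HDF is still the better choice.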

And then: "If you are concerned about data storage requirements, give careful consideration to the possible range of values, and use the smallest type that can represent the range you care about." This advice can be a disaster for floating-point calculations (see Kahan's rules of thumb for why): unless you really know what you are doing, use double precision!

Also, "Do not treat pointers as integers. While many systems tolerate some amount of abuse along these lines, some do not." Umm, intptr_t?
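For anyone who hasn't met it, a small sketch of what intptr_t and uintptr_t from C99's <stdint.h> allow. The function names are invented for this example. Note that both types are optional in the standard, and the numeric value of a converted pointer is implementation-defined, so the alignment test relies on convention rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-trip a pointer through uintptr_t. Where the type exists, C99
   says the result compares equal to the original pointer. */
void *round_trip(void *p)
{
    uintptr_t as_int = (uintptr_t)p;
    return (void *)as_int;
}

/* A common use: check whether an address is a multiple of `alignment`.
   This works on mainstream systems, but the pointer-to-integer mapping
   itself is implementation-defined. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

So treating pointers as plain int or long is still a bug waiting to happen, but there is a sanctioned integer type for the cases that need one.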

Careful with the omissions...

Posted May 1, 2006 23:59 UTC (Mon) by Stavros (guest, #36829) [Link]

[...] unless you really know what you are doing, use double precision!

I disagree. In particular for storage, whether in memory or on disk, the careful choice of storage type for floats is a valuable memory jog telling you to pay attention to the precision of your calculations. An apparent loss of precision is rarely due to insufficient size of the storage or processing type but rather due to a poor choice of calculational method.

The danger in simply taking the double-precision-short-cut is that you can fool yourself into thinking you have much more precision than you really do ("Look at all those digits!"). I have been working in scientific computing for many, many years and even there, where everyone should know better, it is surprising how often people calculate a small quantity by subtracting a large quantity from another large quantity and using double (or even quadruple) precision in the mistaken belief that the problem is solved. Then they are shocked when the input quantization wreaks havoc with the results.
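A toy illustration of that last point, assuming IEEE 754 single and double formats. The function name is made up. The subtraction is done in double, but the inputs were already rounded to float when they were stored, and no amount of precision afterwards brings the lost digits back.

```c
/* Hypothetical example: the arithmetic is double precision, but the
   inputs were quantized to float when they were stored. */
double diff_from_float_inputs(double a, double b)
{
    float a_stored = (float)a;   /* rounded to roughly 7 significant digits */
    float b_stored = (float)b;
    return (double)a_stored - (double)b_stored;
}
```

With a = 100000001.0 and b = 100000000.0 the true difference is 1, but near 1e8 adjacent floats are 8 apart, so both inputs are stored as the same value and the function returns 0, printed with as many digits as you care to ask for.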

For those who want to know more, read "What Every Computer Scientist Should Know About Floating Point Arithmetic" (David Goldberg, 1991). The issues are also discussed in many books on numerical analysis, though often in less detail.

Numerical calculation is rarely simple, and rules of thumb like using doubles by default can be very dangerous. Kudos if you know this and are careful in your math, but I worry about leading less aware people astray.

-- Stavros

exponent range!

Posted May 2, 2006 2:18 UTC (Tue) by jreiser (subscriber, #11027) [Link]

The increased exponent range of double precision (about 10**308) in contrast to single precision (about 10**38) is at least as important as the increased precision (about 15 decimal digits vs. about 6).
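If you want to see what your own compiler reports rather than take these figures on trust, the standard <float.h> macros give both the guaranteed decimal digits and the largest decimal exponent. A short sketch (the function name is arbitrary):

```c
#include <float.h>
#include <stdio.h>

/* Print the decimal precision and maximum decimal exponent that this
   implementation reports for float and double. */
void print_float_limits(void)
{
    printf("float : %d digits, max about 1e%d\n", FLT_DIG, FLT_MAX_10_EXP);
    printf("double: %d digits, max about 1e%d\n", DBL_DIG, DBL_MAX_10_EXP);
}
```

FLT_DIG and DBL_DIG are the number of decimal digits that survive a round trip through the type, so they are conservative figures.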

floating-point precision

Posted May 2, 2006 7:36 UTC (Tue) by stevenj (guest, #421) [Link]

You're absolutely right that these issues are tricky, and that double precision is not a panacea. However, Kahan's points are twofold:

First, error analysis can be very nonobvious, and because of this it's dangerous to ask programmers to decide for themselves whether single precision is sufficient. As Kahan puts it, "Except in extremely uncommon situations, extra-precise arithmetic generally attenuates risks due to roundoff at far less cost than the price of a competent error-analyst."

Second, another misconception that Kahan warns about, which is quite common in my experience, is the myth that "Arithmetic should be barely more precise than the data and the desired result." Here, the developerWorks author suggested that the precision be determined by the "range you care about", which in my opinion is dangerously misleading at best—it could easily be read as saying that the precision is given by the desired accuracy of the result (or perhaps by the accuracy of the input, which is usually just as wrong), when in fact you have to worry about the intermediate calculations as well.

For huge datasets, there is a strong incentive to use single precision, I agree. Even then it is often a good idea to perform in-cache intermediate calculations in double precision.
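As a sketch of what I mean, here are two ways to sum a float array; only the accumulator type differs. The function names are invented for the example, and the behavior described assumes IEEE 754 arithmetic with the default rounding mode.

```c
#include <assert.h>
#include <stddef.h>

/* Data and accumulator both single precision. */
float sum_float_acc(const float *data, size_t n)
{
    float acc = 0.0f;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Data stays single precision (half the storage), but the running
   total is kept in double. */
double sum_double_acc(const float *data, size_t n)
{
    double acc = 0.0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}
```

Feed both the array {16777216, 1, 1, 1, 1}: the float accumulator never moves off 16777216 (2**24), because adding 1 to it rounds straight back, while the double accumulator returns 16777220. The dataset costs the same to store either way.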

I've read the Goldberg article, but I'm inclined to find Kahan's 80-page presentation (despite being a series of slides and not an article) to be more thought-provoking because it goes directly after common misconceptions and explodes them.

C99

Posted May 3, 2006 8:16 UTC (Wed) by ringerc (subscriber, #3071) [Link]

At least this article, unlike the other recent developerWorks article, discussed C99 types like 'int32_t' in passing. There really should be NO need for those awful-and-often-conflicting integer size typedefs in modern code.
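For reference, a minimal sketch of the C99 way, using only <stdint.h> and <inttypes.h>. The two function names are made up for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Add two 32-bit values, clamping instead of overflowing. The wider
   int64_t intermediate makes the range check straightforward. */
int32_t saturating_add_i32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

/* Format an int32_t portably: PRId32 expands to the right printf
   conversion whatever the underlying type is. */
int format_i32(char *buf, size_t size, int32_t value)
{
    return snprintf(buf, size, "%" PRId32, value);
}
```

No project-private typedefs, no guessing whether long is 32 or 64 bits, and the printf format stays correct across platforms.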

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds