Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 0:25 UTC (Mon) by stevenj (guest, #421)
Parent article: What's coming in glibc 2.10

Certain special interest groups subverted the standardization process (again) and pressed through changes to introduce in the C programming language extensions to support decimal floating point computations. 99.99% of all the people will never use this stuff and still we have to live with it.

My understanding was that decimal floating point is actually extraordinarily useful in circumstances like banking and accounting where human inputs (which are invariably in decimal) have to be preserved exactly, but the range of fixed-point representations is too narrow. (And good arguments have been made by various people with long experience in floating-point arithmetic, such as Kahan, that decimal floating point's elimination of rounding in binary-decimal conversion will eliminate a lot of silly errors in other fields too, and hence should become the norm as long as the speed is adequate.) And the latest revision of IEEE-754 describes an efficient way to encode decimal floating point into binary, with the possibility of future hardware support as well, so C is being revised to support this capability.
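
(To make that concrete, here is a minimal sketch of my own, using ordinary binary doubles: the human input "0.10" is already rounded the moment it is stored.)

#include <stdio.h>

int main(void)
{
    /* "0.10" from human input cannot be represented exactly as a binary
       double; what gets stored is the nearest representable value. */
    double d = 0.10;
    printf("%.20f\n", d);    /* prints 0.10000000000000000555 */
    return 0;
}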

So why is Drepper digging in his heels?



Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 0:43 UTC (Mon) by tbrownaw (guest, #45457) [Link] (7 responses)

My understanding was that decimal floating point is actually extraordinarily useful in circumstances like banking and accounting where human inputs (which are invariably in decimal) have to be preserved exactly, but the range of fixed-point representations is too narrow.

I thought that with money you always wanted high-precision fixed-point with hard errors on numeric overflow, such as "15 digits with two of them after the decimal point". Floating point anything would mean that when you got enough dollars your cents would start getting rounded off, so what I understand is typically done is to use rationals with a fixed denominator of 100 (or whatever your money system uses).
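
(Something like the sketch below, I imagine: money as a signed 64-bit count of cents, with the "hard error on overflow" done here via the GCC/Clang __builtin_add_overflow builtin.)

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Money as a signed 64-bit count of cents; abort on overflow. */
typedef int64_t cents_t;

static cents_t add_cents(cents_t a, cents_t b)
{
    cents_t r;
    if (__builtin_add_overflow(a, b, &r)) {   /* GCC/Clang builtin */
        fprintf(stderr, "money overflow\n");
        abort();
    }
    return r;
}

int main(void)
{
    cents_t balance = add_cents(1999, 1);     /* $19.99 + $0.01 */
    printf("%lld cents\n", (long long)balance);
    return 0;
}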

Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 1:47 UTC (Mon) by ringerc (subscriber, #3071) [Link] (6 responses)

Generic rationals can be a screaming nightmare to work with. Rationals are OK if you use a fixed denominator for the type and are really careful about the sorts of calculations you perform and the ordering of those calculations. They're still a pain, just not as bad as variable-denominator rationals.

It seems to generally be quite sufficient to use double precision floating-point for this sort of thing. You just have to ensure you're running in strict IEEE floating point mode, trap floating point exceptions, and allow enough precision that you have at least a couple of significant figures of breathing room.
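
(For the record, on glibc that setup looks roughly like the sketch below; feenableexcept() is a glibc extension rather than standard C, and you link with -lm.)

#define _GNU_SOURCE            /* feenableexcept() is a glibc extension */
#include <fenv.h>
#include <stdio.h>

int main(void)
{
    /* Turn invalid operations, division by zero and overflow into SIGFPE
       instead of letting NaNs and infinities propagate silently. */
    feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);

    volatile double a = 1.0, b = 0.0;
    double c = a / b;          /* traps here rather than producing inf */
    printf("%g\n", c);         /* not reached */
    return 0;
}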

I've been fairly happy with PostgreSQL's `DECIMAL' type. It's an arbitrary-precision base-10 type that's REALLY nice to work with:

test=# SELECT (DECIMAL '100000000000000000000.0' + DECIMAL '0.0000000000100000000000000001') * DECIMAL '4.0' AS result; 
                       result                        
-----------------------------------------------------
 400000000000000000000.00000000004000000000000000040

test=# SELECT DECIMAL '400000000000000000000.0000000000000000000000000000000000000000000000000000040' / DECIMAL '1.1' AS result2;
                                    result2                                    
-------------------------------------------------------------------------------
 363636363636363636363.6363636363636363636363636363636363636363636363636363673
(1 row)

Having something like this in the core C language specification would be quite delightful.
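
(That is roughly what the TR 24732 decimal extension gives you. A minimal sketch, assuming a GCC build with _Decimal64 support; as far as I know glibc's printf can't format decimal types, so this only prints comparison results.)

#include <stdio.h>

int main(void)
{
    /* The dd suffix gives _Decimal64 literals (TR 24732). */
    _Decimal64 a = 0.10dd, b = 0.20dd;

    /* 0.10, 0.20 and 0.30 are all exact in decimal64, so this prints 1. */
    printf("decimal: 0.10 + 0.20 == 0.30 ? %d\n", a + b == 0.30dd);

    /* The same comparison with binary doubles prints 0. */
    printf("binary:  0.10 + 0.20 == 0.30 ? %d\n", 0.10 + 0.20 == 0.30);
    return 0;
}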

(Of course, on a random and unrelated rant: a proper Unicode string type would be a HUGE improvement to C/C++. The poorly specified, painful-to-work-with wchar_t, with its variation across platforms and implementations and its total lack of associated encoding-conversion functions, doesn't really do the job. Let's not even talk about std::wstring.)

Anyway ... I'd actually be really interested in some good references on numeric type choice and proper calculation methods in financial applications.

wchar_t

Posted Apr 20, 2009 8:37 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (5 responses)

You don't need a Unicode string type in C, and it's probably a mistake to ask for one to be built into C++ (but I can't tell, maybe C++ standardisation is about trying to /collect/ mistakes at this point).

wchar_t is a legacy of the mistaken belief that Unicode was (as some documents from a decade or more ago declared) the encoding of all world symbols into a 16-bit value. Once UCS-2 became obsolete, wchar_t became obsolete too; don't use it. Use UTF-8 on the wire, on disk, and even in memory, except when you're doing heavyweight character processing; then use UTF-32, i.e. uint32_t or, at a pinch (since the top bits are unused anyway), int.
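
(For the heavyweight-processing case, converting UTF-8 to uint32_t code points really is just bit-fiddling. A rough sketch of my own, which skips the overlong-form and surrogate-range checks a production decoder needs:)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one UTF-8 sequence starting at s, store the code point in *cp,
   and return the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
              | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
              | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

int main(void)
{
    const unsigned char text[] = "h\xC3\xA9llo";   /* "héllo" in UTF-8 */
    for (size_t i = 0; text[i]; ) {
        uint32_t cp;
        size_t n = utf8_decode(text + i, &cp);
        if (n == 0) break;                         /* malformed input */
        printf("U+%04" PRIX32 "\n", cp);
        i += n;
    }
    return 0;
}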

The only real non-legacy argument for UTF-16 was that it takes fewer bytes than UTF-8 for texts in some writing systems, notably Chinese. But the evidence of the last couple of decades is that the alphabetic and syllabic writing systems will eat the others alive; the majority of the world's population may yet be speaking Chinese in our lifetimes, but if so they'll write it mostly in Roman script, destroying UTF-16's size advantage.

wchar_t

Posted Apr 20, 2009 13:33 UTC (Mon) by mrshiny (guest, #4266) [Link] (3 responses)

Chinese has only about 410 unique syllables (2050 if you include tones). There are thousands of words and many sentences which, even if tones are properly conveyed, are ambiguous. I would be surprised if the current romanizations replaced Chinese characters. I would be less surprised to see an alphabet arise instead, much like what the Japanese have.

wchar_t

Posted Apr 20, 2009 18:55 UTC (Mon) by proski (subscriber, #104) [Link] (2 responses)

It's already happening in Taiwan: http://en.wikipedia.org/wiki/Bopomofo

wchar_t

Posted Apr 21, 2009 1:19 UTC (Tue) by xoddam (guest, #2322) [Link] (1 responses)

On the other hand, pinyin is pretty much universal on the mainland, often as a way of specifying regional or personal pronunciation.

Vietnam has successfully switched (almost) entirely from han tu to the Latin alphabet, albeit with a forest of diacritics. Chinese might one day do the same, but it's unlikely since there is such a variety of Chinese speech and usage. Unlike Vietnamese, Chinese national identity has never been a matter of shared pronunciation.

Both pinyin and bopomofo have some chance of evolving to make it possible to write both pronunciation and semantics reproducibly in the same representation, but neither is likely to become a universal replacement for hanzi, since they lose the advantage (not meaningfully damaged by the Simplified/Traditional split) that the several very different Chinese languages become mutually intelligible when written down.

Universal alphabetisation of Chinese won't be possible until the regional differences become better acknowledged, so people learn literacy both in their first dialect and in the "standard" language(s).

As for the relatively low count of "unique" symbols -- the whole idea of unifying hanzi and the Japanese and Korean versions using the common semantics to reduce the required code space and "assist" translations and text searches has met great resistance, especially in Japan, and despite it there are now nearly 100,000 distinct characters defined in Unicode. 16 bits was always a pipe dream.

It is ultimately necessary (i.e. required by users) to represent distinct glyphs uniquely; Unicode still doesn't satisfy many users precisely because it tries not to have too many distinct code points; probably it never will.

I expect one day the idea of choosing a font based on national context will be abandoned, and the code point count will finally explode, defining one Unicode character per glyph.

wchar_t

Posted Apr 30, 2009 17:27 UTC (Thu) by pixelpapst (guest, #55301) [Link]

> I expect one day the idea of choosing a font based on national context will be abandoned, and the code point count will finally explode, defining one Unicode character per glyph.

I agree. And I think when this happens, we just *might* see a revival of UTF-16 in Asia - in a modal form. So you wouldn't repeat the high-order surrogate when it is the same as that of the previous non-BMP character.

This would pack such texts a bit tighter than UTF-8 or UCS-4 (each low-order surrogate carries 10 bits of payload), while being a bit easier to parse than the escape-sequence-based modal encodings.
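
(The 10 bits comes straight from how UTF-16 already splits a supplementary code point across a surrogate pair; a tiny sketch of my own, with the modal trick itself left as the hypothetical part:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t cp = 0x20BB7;                 /* a supplementary (non-BMP) character */
    uint32_t v  = cp - 0x10000;            /* 20 bits to distribute */
    uint16_t hi = 0xD800 | (v >> 10);      /* high surrogate: top 10 bits */
    uint16_t lo = 0xDC00 | (v & 0x3FF);    /* low surrogate: bottom 10 bits */
    printf("U+%05X -> %04X %04X\n", (unsigned)cp, (unsigned)hi, (unsigned)lo);
    /* A run of characters sharing the same high surrogate differs only in
       the low surrogate, which is what the proposed modal form exploits. */
    return 0;
}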

IMHO, let's see.

wchar_t

Posted Apr 21, 2009 0:25 UTC (Tue) by xoddam (guest, #2322) [Link]

wchar_t is 32-bit by default in g++ and in the stddef.h usually used by gcc.

There is a g++ compiler option, -fshort-wchar, to change the intrinsic type in C++, and you can use alternative headers or pre-define "-D__WCHAR_TYPE__=uint16_t" for C, but this is pretty unusual on Linux except when cross-compiling for another platform (or building WINE).
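
(A quick way to check what a particular toolchain is actually giving you:)

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* Typically 4 and 0x7fffffff on Linux/glibc; 2 and 0xffff with -fshort-wchar. */
    printf("sizeof(wchar_t) = %zu, WCHAR_MAX = %#lx\n",
           sizeof(wchar_t), (unsigned long)WCHAR_MAX);
    return 0;
}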

Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 1:01 UTC (Mon) by mgb (guest, #3226) [Link] (5 responses)

64-bit integers are adequate for most banking and financial calculations today. Currencies are generally denominated in cents or mills, although the Zimbabwean dollar might more appropriately be denominated in billions.
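
(The back-of-envelope version: a signed 64-bit count of cents tops out around $92 quadrillion.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Largest balance a signed 64-bit cent counter can hold:
       $92,233,720,368,547,758.07, i.e. about 92 quadrillion dollars. */
    printf("$%lld.%02lld\n",
           (long long)(INT64_MAX / 100), (long long)(INT64_MAX % 100));
    return 0;
}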

128-bit floats can be used to store very large integers today and pure 128-bit integers are waiting in the wings.

Decimal floats are symptomatic of poor design.

Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 8:25 UTC (Mon) by epa (subscriber, #39769) [Link] (4 responses)

64-bit integers are adequate to deal with most banking and financial calculations today.
Not all numbers used in finance are integers. Consider exchange rates and interest rates, for a start. If you were particularly perverse you could decide to use 64-bit ints for everything, with some way of encoding the number of decimal places (or binary places), but in that case you have effectively reinvented a floating point math library.
Decimal floats are symptomatic of poor design.
Not at all. They are often the best match to what the user and the rest of the world require. It is accepted that 1/3 gives a recurring decimal .333..., but no accountant wants their computer system to introduce rounding errors, no matter how minute, when calculating 1/5 (which is the recurring fraction .00110011... in binary). Or do you mean that *floating* point decimal is a bad idea, and it's better to use fixed point with a certain fixed number of digits of precision? There is certainly a case for that.

Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 16:31 UTC (Mon) by stevenj (guest, #421) [Link] (3 responses)

A lot of people here are proposing that decimal fixed point is just as good as, or better than, decimal floats.

I'm a little skeptical of this, based on my experience with scientific computation: there are many, many circumstances when both the input and output of the computation appear to be in a range suitable for fixed-point representation, but the intermediate calculations will have vastly greater rounding errors in fixed point than in floating point. And fixed-point error analysis in the presence of rounding and overflow is a nightmare compared to floating point.

Decimal floating point gives you the best of both worlds. If the result of each calculation is exactly representable, it will give you the exact result. (Please don't raise the old myth that floating-point calculations add some mysterious random noise to each calculation!) There is no rounding when decimal inputs are entered, so human input is preserved exactly. And if the result is not exactly representable, its rounding characteristics will be much, much better than fixed point. (And don't try to claim that financial calculations never have to round.)

Note that the IEEE double-precision (64-bit) decimal-float format has a 16 decimal-digit significand (and there is also a quad-precision decimal float with a 34 decimal-digit significand). I would take this over 64-bit fixed point any day: only nine bits of this are sacrificed in IEEE to give you a floating decimal point and fixed relative precision over a wide dynamic range.

Why the vehement objections to decimal floating-point?

Posted Apr 20, 2009 16:34 UTC (Mon) by stevenj (guest, #421) [Link] (2 responses)

(By "result of each calculation is exactly representable" I am of course including intermediate calculations. Note that this is equally true in fixed-point and integer arithmetic.)

Why the vehement objections to decimal floating-point?

Posted Apr 25, 2009 12:29 UTC (Sat) by dmag (guest, #17775) [Link] (1 responses)

Fixed point won't lose information on simple calculations, but there is a possibility that some intermediate results will overflow your representation. For example, if you square a number, add 1, and take the square root: for large numbers, the square isn't likely to be representable.
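
(A small sketch of my own of that failure mode: squaring a 64-bit cent-scale value overflows the intermediate, while a plain double sails through. __builtin_mul_overflow is a GCC/Clang builtin; link with -lm for sqrt().)

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int64_t x = INT64_C(4000000000);        /* fits comfortably in 64 bits */
    int64_t sq;
    if (__builtin_mul_overflow(x, x, &sq))  /* GCC/Clang builtin */
        printf("fixed point: x*x overflows 64 bits\n");

    double xd = 4e9;
    printf("floating point: sqrt(x*x + 1) = %.17g\n", sqrt(xd * xd + 1.0));
    return 0;
}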

Floating point has the opposite problem. The intermediate calculations won't blow up, but you can lose precision even in simple cases.

Most people don't have a correct mental model of floating point. Floating point has a reputation for being 'lossy' because it can lose information in non-obvious ways:

$ irb
>> 0.1 * 0.1 - 0.01
=> 1.73472347597681e-18
Sometimes the answer is to store in fixed point, but calculate in floating point (and do appropriate rounding during conversion back to fixed).
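
(Roughly like the sketch below, for a made-up 8.25% tax example; llround() is standard C99, link with -lm.)

#include <math.h>
#include <stdio.h>

int main(void)
{
    long long cents = 1999;                       /* stored as fixed point: $19.99 */
    double amount = cents / 100.0;                /* compute in floating point     */
    double with_tax = amount * 1.0825;            /* e.g. 8.25% sales tax          */
    long long out = llround(with_tax * 100.0);    /* round back to cents           */
    printf("$%lld.%02lld\n", out / 100, out % 100);
    return 0;
}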

Why the vehement objections to decimal floating-point?

Posted Apr 18, 2011 22:37 UTC (Mon) by stevenj (guest, #421) [Link]

Your example makes no sense; the result would be computed exactly in decimal floating point.

More generally, in essentially any case where decimal fixed point with N digits would produce exact results, decimal floating point with an N-digit significand would also produce exact results. The only sacrifice in going from fixed to (decimal) floating point is that you lose a few bits of precision to store the exponent, and in exchange you get vastly bigger dynamic range and much more sensible roundoff characteristics.

You're certainly right that many people don't have a correct mental model of floating point, however.

