What's coming in glibc 2.10
"The new malloc_info function therefore does not export a structure. Instead it exports the information in a self-describing data structure. Nowadays the preferred way to do this is via XML. The format can change over time (it's versioned), some fields will stay the same, others will change. No breakage. The reader just cannot assume that all the information will forever be available in the same form. There is no reader in glibc. This isn't necessary, it's easy enough to write outside glibc using one of the many XML libraries."
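(For illustration only, not something from the article: a minimal sketch of how a program might dump the report, assuming the glibc prototype int malloc_info(int options, FILE *fp), with options required to be 0.)
/* Sketch: dump the malloc state as XML to stdout. */
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(1024);          /* make the arenas mildly interesting */
    if (malloc_info(0, stdout) != 0) /* writes the self-describing XML report */
        perror("malloc_info");
    free(p);
    return 0;
}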
Posted Apr 19, 2009 15:40 UTC (Sun)
by jkohen (subscriber, #47486)
[Link] (19 responses)
Although I don't care much about this particular function, it would be great if simpler and safer alternatives to XML were considered instead, one example being Google's Protocol Buffers.
Posted Apr 19, 2009 16:19 UTC (Sun)
by bvdm (guest, #42755)
[Link] (5 responses)
Besides, if you want to export information in a standard and extensible manner, XML is pretty much the way to go. Now Protocol Buffers, that would be over-engineering!
Posted Apr 19, 2009 21:51 UTC (Sun)
by elanthis (guest, #6227)
[Link] (2 responses)
Posted Apr 20, 2009 10:43 UTC (Mon)
by alankila (guest, #47141)
[Link] (1 responses)
I think it's quite safe to assume that Ulrich means "text with angle brackets" rather than the entirety of XML specification. In other words, even a crude parser is likely to work...
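(A rough illustration of that "crude parser" idea -- the element and attribute names below are invented for the example and are not necessarily the real malloc_info schema:)
/* Pull one numeric attribute out of an XML-ish report with nothing but
   string scanning.  Good enough for "text with angle brackets". */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static long grab_attr(const char *xml, const char *attr)
{
    char needle[64];
    snprintf(needle, sizeof needle, "%s=\"", attr);
    const char *p = strstr(xml, needle);
    return p ? strtol(p + strlen(needle), NULL, 10) : -1;
}

int main(void)
{
    const char *report =
        "<malloc version=\"1\"><system type=\"current\" size=\"135168\"/></malloc>";
    printf("size = %ld\n", grab_attr(report, "size"));
    return 0;
}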
Posted Apr 20, 2009 15:07 UTC (Mon)
by drag (guest, #31333)
[Link]
I don't know why people are getting excited about this. mallinfo seems to me to be a very poorly designed part of the POSIX spec and, like the blog entry said, is dependent on the specifics of a particular malloc implementation.
The particular mallinfo in the spec is so wildly useless for any sort of modern system that there is just no 'right' way to do it.
So since it's returning statistical data that is dependent on a particular internal implementation, the sort of statistics that are relevant to anybody are going to change based on changes to malloc.
So instead of requiring application designers to use some sort of custom parser, or to try to read data whose relevance and meaning can and will change at any point in the future, he has decided to go with a very commonly used, standard, self-describing format... which is XML.
Makes sense to me.
Posted Apr 20, 2009 0:27 UTC (Mon)
by xoddam (guest, #2322)
[Link] (1 responses)
Who is going to read this information anyway?
Posted Apr 20, 2009 19:41 UTC (Mon)
by jkohen (subscriber, #47486)
[Link]
If you want structured data ready for human consumption, you probably need to write a user-space tool, in which case using XML is no better than protocol buffers, JSON and so on; it just has a more complicated spec.
Posted Apr 19, 2009 20:22 UTC (Sun)
by nix (subscriber, #2304)
[Link] (11 responses)
I don't like XML much, but this seems like a reasonable application for it. (An XML parser *is* a trivial dependency: what is it, 4K?)
Posted Apr 19, 2009 20:37 UTC (Sun)
by jkohen (subscriber, #47486)
[Link] (10 responses)
XML is way more complex than required for this problem, and there are simpler encoding schemas. Got something against binary formats? JSON could work. Not that I particularly like JSON, but it's simpler than XML and doesn't try to solve more than required by this particular problem.
Posted Apr 20, 2009 0:33 UTC (Mon)
by bvdm (guest, #42755)
[Link] (9 responses)
The XML world is a huge alphabet soup, but the XML spec itself (which does *not* include namespaces) is pretty small. Why require a JSON parser if most applications already have an XML dependency?
Posted Apr 20, 2009 19:26 UTC (Mon)
by jordanb (guest, #45668)
[Link] (8 responses)
You must live in a different universe than me if you think this is a small spec. Do you work in aerospace perhaps? Compare that to the JSON spec, weighing in at all of eight pages. I doubt it'd take more than 100 lines of code for a fully compliant recursive descent parser. And writing an encoder would be just about as easy. Which makes me wonder if they're *really* including a fully compliant XML encoder into the libc just to spit out a few lines of structured data, or if glibc is going to be joining the long line of shitty systems that produce broken XML by concatenating strings.
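(Not glibc's code -- just an illustration of the failure mode being worried about here, and of what a minimal escaper for text content looks like:)
#include <stdio.h>

static void xml_escape(FILE *out, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '&':  fputs("&amp;", out);  break;
        case '"':  fputs("&quot;", out); break;
        case '\'': fputs("&apos;", out); break;
        default:   fputc(*s, out);
        }
    }
}

int main(void)
{
    const char *value = "a < b && c";
    printf("<field>%s</field>\n", value); /* naive concatenation: ill-formed XML */
    fputs("<field>", stdout);
    xml_escape(stdout, value);            /* escaped: well-formed */
    fputs("</field>\n", stdout);
    return 0;
}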
Posted Apr 20, 2009 22:42 UTC (Mon)
by nix (subscriber, #2304)
[Link]
There's no reason why it has to be 'broken'. It's not as if it's going to be doing fully general-purpose data->XML transformation: it's emitting *one structure* with contents fully controlled by the libc.
Posted Apr 21, 2009 4:24 UTC (Tue)
by bvdm (guest, #42755)
[Link] (6 responses)
Posted Apr 21, 2009 21:56 UTC (Tue)
by jordanb (guest, #45668)
[Link] (5 responses)
The JSON spec is terse but it also rigorously defines the syntax of a very simple data serialization format. There are no omissions that make it incomplete. This is the advantage of JSON for simple data serialization tasks.
Posted Apr 22, 2009 4:15 UTC (Wed)
by dlang (guest, #313)
[Link] (4 responses)
Posted Apr 22, 2009 16:09 UTC (Wed)
by jordanb (guest, #45668)
[Link] (3 responses)
[...] market it for web development as a way to send data out to your client-side JS by simply serializing JSON objects into <script> tags in the webpage. That's the only reason for the focus on Javascript -- not to rely on the ECMAScript standard. In fact the only reference to ECMAScript is informational:
"JavaScript Object Notation (JSON) is a text format for the serialization of structured data. It is derived from the object literals of JavaScript, as defined in the ECMAScript Programming Language Standard, Third Edition [ECMA]."
The standard is so small because it only has six data types ('object' (essentially associative array), list, number, boolean, string, and 'null'). It has no ability to do node attributes and is only tree-based to the extent that objects and arrays can contain more of them. It is certainly not a rival to XML for dealing with large scale or very complicated or nuanced data. But it is an excellent alternative when you have small amounts of structured data that you wish to serialize.
Another advantage of JSON is that if you have simple mostly tabular datasets they can be serialized (with judicious use of whitespace) in a manner that's both machine and human-readable. For fun and as an example I decided to 'encode' my /proc/cpuinfo in JSON:
[{
processor : 0,
vendor_id : "GenuineIntel",
cpu_family : "6",
model : "8",
model_name : "Pentium III (Coppermine)",
stepping : 1,
cpu_MHz : 498.283,
cache_size : [ 262144, "256 KB" ],
fdiv_bug : false,
hlt_bug : false,
f00f_bug : false,
coma_bug : false,
fpu : true,
fpu_exception : true,
cpuid_level : 2,
wp : true,
flags : [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8",
"sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "mmx", "fxsr", "sse", "up" ],
bogomips : 997.69,
clflush_size : 32
}]
Posted Apr 22, 2009 16:11 UTC (Wed)
by jordanb (guest, #45668)
[Link]
Posted Apr 22, 2009 18:51 UTC (Wed)
by bronson (subscriber, #4806)
[Link] (1 responses)
[...]
"hlt_bug" : false,
"f00f_bug" : false,
"coma_bug" : false,
Even so, I'd still much rather work with JSON than XML!
Posted Apr 22, 2009 18:56 UTC (Wed)
by jordanb (guest, #45668)
[Link]
Posted Apr 20, 2009 15:33 UTC (Mon)
by iabervon (subscriber, #722)
[Link]
This would have the same extensibility benefits as XML, but avoid most of the parsing overhead. (Of course, the main practical extensibility benefit comes from using ASCII decimal strings of arbitrary length for numbers, which then requires a bit of parsing.)
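(A sketch of the "bit of parsing" that an ASCII decimal field implies; the field value here is made up for the example:)
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *field = "18446744073709551615";  /* decimal string of arbitrary length */
    char *end;
    errno = 0;
    unsigned long long v = strtoull(field, &end, 10);
    if (errno == ERANGE)
        puts("too large for 64 bits -- a newer reader could fall back to a bignum");
    else if (*end != '\0')
        puts("trailing junk");
    else
        printf("value = %llu\n", v);
    return 0;
}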
Posted Apr 19, 2009 16:54 UTC (Sun)
by welinder (guest, #4699)
[Link] (3 responses)
[...] Having to go through XML for that sounds like serious overkill. That said, there are probably other pieces of information from malloc that it actually makes sense to provide in xml form.
Posted Apr 19, 2009 17:40 UTC (Sun)
by eklitzke (subscriber, #36426)
[Link] (2 responses)
Posted Apr 19, 2009 22:30 UTC (Sun)
by oak (guest, #2786)
[Link] (1 responses)
[...] heap. Checking that may be a performance issue, so needing to parse XML sounds strange. And if Glibc does allocs for the XML output, it would be even stranger...
Posted Apr 19, 2009 23:50 UTC (Sun)
by foom (subscriber, #14868)
[Link]
Posted Apr 19, 2009 17:53 UTC (Sun)
by ikm (guest, #493)
[Link] (1 responses)
Posted Apr 21, 2009 0:01 UTC (Tue)
by xoddam (guest, #2322)
[Link]
corbet trolling
But when is Drepper going to come down amongst us in person and tell us IT ISN'T SO?
Posted Apr 19, 2009 20:23 UTC (Sun)
by nix (subscriber, #2304)
[Link] (3 responses)
[...] like having a private per-symbol hook into the dynamic linker :)
Posted Apr 19, 2009 21:50 UTC (Sun)
by elanthis (guest, #6227)
[Link] (1 responses)
Posted Apr 19, 2009 22:15 UTC (Sun)
by nix (subscriber, #2304)
[Link]
[...] implicit first call into the dynamic linker that happens the first time any given symbol is referenced[1]. The security context is unchanged: all the dynamic linker normally does is relocate that PLT entry. It's just that now you can have it pick *what* to relocate first. It's no more done in a different context than is calling a function through a function pointer: that's all it really is.
(hm, and it looks like it might become STT_IFUNC in time and not be GNU-specific after all. Even better.)
[1] an oversimplification: actually that implicit call happens the first time a PLT entry is jumped to, but that's nearly the same thing...
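(A sketch of the mechanism being described, using the ifunc function attribute that later GCC releases grew -- at the time you had to spell it with assembler directives -- and made-up function names. The resolver runs once, from the dynamic linker, and returns the pointer the PLT entry gets bound to.)
#include <stdio.h>
#include <string.h>

static size_t my_strlen_generic(const char *s) { return strlen(s); }
static size_t my_strlen_fancy(const char *s) { return strlen(s); } /* pretend: SSE version */

/* "a function pointer to a function that takes no argument and returns
   the real function pointer to use" */
static void *resolve_my_strlen(void)
{
    int cpu_has_fancy_bits = 0;   /* a real resolver would look at cpuid/hwcap */
    return cpu_has_fancy_bits ? (void *)my_strlen_fancy
                              : (void *)my_strlen_generic;
}

size_t my_strlen(const char *) __attribute__((ifunc("resolve_my_strlen")));

int main(void)
{
    printf("%zu\n", my_strlen("hello"));  /* first use triggers the resolver */
    return 0;
}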
Posted Apr 19, 2009 22:55 UTC (Sun)
by jreiser (subscriber, #11027)
[Link]
... a function pointer to a function that takes no argument and returns the real function pointer to use ...
It would be useful to have some parameters: a pointer to the symbol table entry for the symbol in question (and the string table, ...), and a pointer to the dynamic linker context (search path: chain of struct link_map *) that applies. Then the same intercepting function could handle multiple names (there are many str* and mem* functions, for instance) and might be able to handle better some special cases such as compatibility matching, etc.
Posted Apr 20, 2009 0:25 UTC (Mon)
by stevenj (guest, #421)
[Link] (14 responses)
"Certain special interest groups subverted the standardization process (again) and pressed through changes to introduce in the C programming language extensions to support decimal floating point computations. 99.99% of all the people will never use this stuff and still we have to live with it."
My understanding was that decimal floating point is actually extraordinarily useful in circumstances like banking and accounting where human inputs (which are invariably in decimal) have to be preserved exactly, but the range of fixed-point representations is too narrow. (And good arguments have been made by various people with long experience in floating-point arithmetic, such as Kahan, that decimal floating point's elimination of rounding in binary-decimal conversion will eliminate a lot of silly errors in other fields too, and hence should become the norm as long as the speed is adequate.) And the latest revision of IEEE-754 describes an efficient way to encode decimal floating point into binary, with the possibility of future hardware support as well, so C is being revised to support this capability.
So why is Drepper digging in his heels?
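(A small illustration of the difference being argued over, using GCC's _Decimal64 extension from TR 24732; this needs a compiler and target with decimal-float support, and since printf cannot format these types directly the results are shown as 0/1 comparisons.)
#include <stdio.h>

int main(void)
{
    double     b = 0.10;    /* binary: 0.1 has no exact representation */
    _Decimal64 d = 0.10DD;  /* decimal: 0.10 is stored exactly */

    printf("binary : 0.1 * 3 == 0.3 ? %d\n", b * 3 == 0.30);    /* prints 0 */
    printf("decimal: 0.1 * 3 == 0.3 ? %d\n", d * 3 == 0.30DD);  /* prints 1 */
    return 0;
}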
Posted Apr 20, 2009 0:43 UTC (Mon)
by tbrownaw (guest, #45457)
[Link] (7 responses)
My understanding was that decimal floating point is actually extraordinarily useful in circumstances like banking and accounting where human inputs (which are invariably in decimal) have to be preserved exactly, but the range of fixed-point representations is too narrow.
I thought that with money you always wanted high-precision fixed-point with hard errors on numeric overflow, such as "15 digits with two of them after the decimal point". Floating point anything would mean that when you got enough dollars your cents would start getting rounded off, so what I understand is typically done is to use rationals with a fixed denominator of 100 (or whatever your money system uses).
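(A sketch of that fixed-denominator approach -- amounts kept as signed 64-bit counts of cents, with a hard error on overflow instead of silent rounding; the cents_t/cents_add names are made up for the example:)
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef int64_t cents_t;   /* one unit == $0.01 */

static cents_t cents_add(cents_t a, cents_t b)
{
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        fprintf(stderr, "monetary overflow\n");  /* hard error, never rounding */
        abort();
    }
    return a + b;
}

int main(void)
{
    cents_t price = 1999;                  /* $19.99 */
    cents_t total = cents_add(price, price);
    printf("$%" PRId64 ".%02" PRId64 "\n", total / 100, total % 100);
    return 0;
}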
Posted Apr 20, 2009 1:47 UTC (Mon)
by ringerc (subscriber, #3071)
[Link] (6 responses)
Generic rationals can be a screaming nightmare to work with. Rationals are OK if you use a fixed denominator for the type and are really careful about the sorts of calculations you perform and the ordering of those calculations. They're still a pain, just not as bad as variable-denominator rationals.
It seems to generally be quite sufficient to use double precision floating-point for this sort of thing. You just have to ensure you're running in strict IEEE floating point mode, trap floating point exceptions, and allow enough precision that you have at least a couple of significant figures of breathing room.
I've been fairly happy with PostgreSQL's `DECIMAL' type. It's a fixed-precision base-10 decimal that's REALLY nice to work with:
test=# SELECT (DECIMAL '100000000000000000000.0' + DECIMAL '0.0000000000100000000000000001') * DECIMAL '4.0' AS result;
result
-----------------------------------------------------
400000000000000000000.00000000004000000000000000040
test=# SELECT DECIMAL '400000000000000000000.0000000000000000000000000000000000000000000000000000040' / DECIMAL '1.1' AS result2;
result2
-------------------------------------------------------------------------------
363636363636363636363.6363636363636363636363636363636363636363636363636363673
(1 row)
Having something like this in the core C language specification would be quite delightful.
(Of course, on a random and unrelated rant: a proper unicode string type would be a HUGE improvement to C/C++. The poorly-specified, painful to work with wchar_t with its variation across platforms and implementations, and its total lack of associated encoding conversion functions doesn't really do the job. Let's not even talk about std::wstring.)
Anyway ... I'd actually be really interested in some good references on numeric type choice and proper calculation methods in financial applications.
Posted Apr 20, 2009 8:37 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
wchar_t is a legacy of the mistaken belief that Unicode was (as some documents from a decade or more ago declared) the encoding of all world symbols into a 16-bit value. Once UCS-2 was obsolete, wchar_t was obsolete too; don't use it. Use UTF-8 on the wire, on disk and even in memory except when you're doing heavyweight character processing, and then use UTF-32, i.e. uint32_t or at a pinch (since the top bits are unused anyway) int.
The only real non-legacy argument for UTF-16 was that it's fewer bytes than UTF-8 for texts in some writing systems, notably Chinese. But the evidence of the last couple of decades is that the alphabetic and syllabic writing systems will eat the others alive; the majority of the world's population may yet be speaking Chinese in our lifetimes, but if so they'll write it mostly in Roman script, destroying UTF-16's size advantage.
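(A rough sketch of the "UTF-8 everywhere, UTF-32 for heavy lifting" split: decoding one UTF-8 sequence into a uint32_t code point. The helper name is hypothetical and the error handling is minimal -- overlong forms and surrogates are not rejected.)
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static int utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) { *cp = (uint32_t)(s[0] & 0x1F) << 6 | (s[1] & 0x3F); return 2; }
    if ((s[0] & 0xF0) == 0xE0) { *cp = (uint32_t)(s[0] & 0x0F) << 12 | (uint32_t)(s[1] & 0x3F) << 6 | (s[2] & 0x3F); return 3; }
    if ((s[0] & 0xF8) == 0xF0) { *cp = (uint32_t)(s[0] & 0x07) << 18 | (uint32_t)(s[1] & 0x3F) << 12 | (uint32_t)(s[2] & 0x3F) << 6 | (s[3] & 0x3F); return 4; }
    return -1;  /* invalid lead byte */
}

int main(void)
{
    const unsigned char text[] = "\xE4\xB8\xAD";  /* U+4E2D, a CJK ideograph */
    uint32_t cp;
    int n = utf8_decode(text, &cp);
    printf("%d bytes -> U+%04" PRIX32 "\n", n, cp);
    return 0;
}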
Posted Apr 20, 2009 13:33 UTC (Mon)
by mrshiny (guest, #4266)
[Link] (3 responses)
It's already happening in Taiwan: http://en.wikipedia.org/wiki/Bopomofo
Posted Apr 20, 2009 18:55 UTC (Mon)
by proski (subscriber, #104)
[Link] (2 responses)
Posted Apr 21, 2009 1:19 UTC (Tue)
by xoddam (guest, #2322)
[Link] (1 responses)
Vietnam has successfully switched (almost) entirely from han tu to the Latin alphabet, albeit with a forest of diacritics. Chinese might one day do the same, but it's unlikely since there is such a variety of Chinese speech and usage. Unlike Vietnamese, Chinese national identity has never been a matter of shared pronunciation.
Both pinyin and bopomofo have some chance of evolving to make it possible to write both pronunciation and semantics reproducibly in the same representation, but neither is likely to become a universal replacement for hanzi, since they lose the advantage (not meaningfully damaged by the Simplified/Traditional split) that the several very different Chinese languages become mutually intelligible when written down.
Universal alphabetisation of Chinese won't be possible until the regional differences become better acknowledged, so people learn literacy both in their first dialect and in the "standard" language(s).
As for the relatively low count of "unique" symbols -- the whole idea of unifying hanzi and the Japanese and Korean versions using the common semantics to reduce the required code space and "assist" translations and text searches has met great resistance, especially in Japan, and despite it there are now nearly 100,000 distinct characters defined in Unicode. 16 bits was always a pipe dream.
It is ultimately necessary (ie. required by users) to represent distinct glyphs uniquely; Unicode still doesn't satisfy many users precisely because it tries not to have too many distinct code points; probably it never will.
I expect one day the idea of choosing a font based on national context will be abandoned, and the code point count will finally explode, defining one Unicode character per glyph.
Posted Apr 30, 2009 17:27 UTC (Thu)
by pixelpapst (guest, #55301)
[Link]
I agree. And I think when this happens, we just *might* see a revival of UTF-16 in Asia - in a modal form. So you wouldn't repeat the high-order surrogate when it is the same as that of the previous non-BMP character.
This would pack these texts a bit tighter than UTF-8 or UCS-4 (can encode 10 bits per low-order surrogate), while being a bit easier to parse than all the Escape-Sequence modal encodings.
IMHO, let's see.
Posted Apr 21, 2009 0:25 UTC (Tue)
by xoddam (guest, #2322)
[Link]
There is a g++ compiler option -fshort-wchar to change the intrinsic type in C++, and you can use alternative headers or pre-define "-D__WCHAR_T__=uint16_t" for C, but this is pretty unusual on Linux except when cross-compiling for another platform (or building WINE).
Posted Apr 20, 2009 1:01 UTC (Mon)
by mgb (guest, #3226)
[Link] (5 responses)
128-bit floats can be used to store very large integers today and pure 128-bit integers are waiting in the wings.
Decimal floats are symptomatic of poor design.
Posted Apr 20, 2009 8:25 UTC (Mon)
by epa (subscriber, #39769)
[Link] (4 responses)
"64-bit integers are adequate to deal with most banking and financial calculations today."
Not all numbers used in finance are integers. Consider exchange rates and interest rates, for a start. If you were particularly perverse you could decide to use 64-bit ints for everything, with some way of encoding the number of decimal places (or binary places), but in that case you have effectively reinvented a floating point math library.
"Decimal floats are symptomatic of poor design."
Not at all. They are often the best match to what the user and the rest of the world requires. It is accepted that 1/3 gives a recurring decimal .333... but no accountant wants their computer system to introduce rounding errors, no matter how minute, when calculating 1/5 (which is .0011... in binary). Or do you mean that *floating* point decimal is a bad idea, and it's better to use fixed point with a certain fixed number of digits precision? There is certainly a case for that.
A lot of people here are proposing that decimal fixed point is just as good or better than decimal floats.
Posted Apr 20, 2009 16:31 UTC (Mon)
by stevenj (guest, #421)
[Link] (3 responses)
I'm a little skeptical of this, based on my experience with scientific computation: there are many, many circumstances when both the input and output of the computation appear to be in a range suitable for fixed-point representation, but the intermediate calculations will have vastly greater rounding errors in fixed point than in floating point. And fixed-point error analysis in the presence of rounding and overflow is a nightmare compared to floating point.
Decimal floating point gives you the best of both worlds. If the result of each calculation is exactly representable, it will give you the exact result. (Please don't raise the old myth that floating-point calculations add some mysterious random noise to each calculation!) There is no rounding when decimal inputs are entered, so human input is preserved exactly. And if the result is not exactly representable, its rounding characteristics will be much, much better than fixed point. (And don't try to claim that financial calculations never have to round.)
Note that the IEEE double-precision (64-bit) decimal-float format has a 16 decimal-digit significand (and there is also a quad-precision decimal float with a 34 decimal-digit significand). I would take this over 64-bit fixed point any day: only nine bits of this are sacrificed in IEEE to give you a floating decimal point and fixed relative precision over a wide dynamic range.
Posted Apr 20, 2009 16:34 UTC (Mon)
by stevenj (guest, #421)
[Link] (2 responses)
Posted Apr 25, 2009 12:29 UTC (Sat)
by dmag (guest, #17775)
[Link] (1 responses)
Fixed-point won't lose information on simple calculations, but there is a possibility some intermediate results will saturate your representation. For example, if you square a number, add 1 and take the square root: for large numbers, the square isn't likely to be representable.
Floating point has the opposite problem. The intermediate calculations won't blow up, but you can lose precision even in simple cases:
$ irb
>> 0.1 * 0.1 - 0.01
=> 1.73472347597681e-18
Most people don't have a correct mental model of floating point. Floating point has a reputation for being 'lossy' because it can lose information in non-obvious ways.
Sometimes the answer is to store in fixed point, but calculate in floating point (and do appropriate rounding during conversion back to fixed).
Posted Apr 18, 2011 22:37 UTC (Mon)
by stevenj (guest, #421)
[Link]
More generally, in essentially any case where decimal fixed point with N digits would produce exact results, decimal floating point with an N-digit significand would also produce exact results. The only sacrifice in going from fixed to (decimal) floating point is that you lose a few bits of precision to store the exponent, and in exchange you get vastly bigger dynamic range and much more sensible roundoff characteristics.
You're certainly right that many people don't have a correct mental model of floating point, however.
Posted Apr 20, 2009 4:31 UTC (Mon)
by xorbe (guest, #3165)
[Link] (1 responses)
Posted Apr 20, 2009 7:37 UTC (Mon)
by bvdm (guest, #42755)
[Link]
LOL
Posted Apr 20, 2009 12:13 UTC (Mon)
by oseemann (guest, #6687)
[Link] (2 responses)
Posted Apr 20, 2009 19:36 UTC (Mon)
by jkohen (subscriber, #47486)
[Link]
Posted Apr 20, 2009 22:25 UTC (Mon)
by nix (subscriber, #2304)
[Link]
[...] guarantees? (Other than X, which is having similar problems: there's no way *it* can use XML over the wire, but it's recently had to introduce 'generic events' because it's run out of spare events...)