What's coming in glibc 2.10: XML [LWN.net]

What's coming in glibc 2.10: XML

Posted Apr 19, 2009 16:19 UTC (Sun) by bvdm (guest, #42755)

Using XML for this purpose is really not that unreasonable, given that very few applications will call this function and they likely already have an XML parser as a dependency.

Besides, if you want to export information in a standard and extensible manner, XML is pretty much the way to go. Now Protocol Buffers, that would be over-engineering!

What's coming in glibc 2.10: XML

Posted Apr 19, 2009 21:51 UTC (Sun) by elanthis (guest, #6227)

It's very unreasonable, because it's a HUGE freaking specification and there's no truly compelling reason not to use a simple list of structs with a corresponding free_mallocinfo() function, just like any number of similar APIs in the libc already use.
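For comparison, here is a minimal sketch of the struct-list shape described above, in the spirit of getaddrinfo()/freeaddrinfo(). Everything in it (struct mallocinfo_entry, mallocinfo_push(), the linked-list layout) is hypothetical and is not a glibc API. free_mallocinfo() is just the name suggested in the comment. How the allocator would fill the list is left out, so the sketch only shows producer-side construction and the matching release function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one named heap statistic in a singly linked list. */
struct mallocinfo_entry {
    char                    *name;   /* owned copy of the statistic's name */
    size_t                   value;  /* e.g. a byte or chunk count */
    struct mallocinfo_entry *next;
};

/* Hypothetical producer-side helper: prepend one entry to *list.
   Returns 0 on success, -1 if memory could not be allocated. */
int mallocinfo_push(struct mallocinfo_entry **list, const char *name, size_t value)
{
    struct mallocinfo_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);
    e->value = value;
    e->next = *list;
    *list = e;
    return 0;
}

/* Release every node and its name; safe to call with NULL. */
void free_mallocinfo(struct mallocinfo_entry *list)
{
    while (list != NULL) {
        struct mallocinfo_entry *next = list->next;
        free(list->name);
        free(list);
        list = next;
    }
}
```

A caller walks the list with `for (e = list; e != NULL; e = e->next)` and hands the head back to free_mallocinfo() when done. Adding a new statistic means adding a new node, not changing a struct layout.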

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 10:43 UTC (Mon) by alankila (guest, #47141)

Don't overlook the possibility that this decision is a harbinger of fewer binary structs in glibc's API, and more XML. Could happen.

I think it's quite safe to assume that Ulrich means "text with angle brackets" rather than the entirety of XML specification. In other words, even a crude parser is likely to work...
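Here is what such a crude parser might look like. This is a sketch that assumes double-quoted attributes with no entities, comments, CDATA, namespaces or spaces around `=`. crude_attr() is a hypothetical helper, not part of any library, and it breaks as soon as those assumptions stop holding, which is exactly the risk raised further down the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical crude extractor: find the first "<elem " start tag in xml and
   copy the value of attr="..." from that tag into out (size outsz).
   Returns 0 on success, -1 if the element or attribute is missing or the
   value does not fit. Deliberately naive: assumes double-quoted attributes,
   no entities, comments, CDATA or namespaces, and no spaces around '='. */
int crude_attr(const char *xml, const char *elem, const char *attr,
               char *out, size_t outsz)
{
    char start[64], key[64];
    if (snprintf(start, sizeof start, "<%s ", elem) >= (int)sizeof start ||
        snprintf(key, sizeof key, " %s=\"", attr) >= (int)sizeof key)
        return -1;                               /* names too long */

    const char *tag = strstr(xml, start);
    if (tag == NULL)
        return -1;
    const char *end = strchr(tag, '>');          /* end of that start tag */
    if (end == NULL)
        return -1;

    const char *val = strstr(tag, key);
    if (val == NULL || val > end)                /* attribute must be inside the tag */
        return -1;
    val += strlen(key);

    const char *quote = strchr(val, '"');
    if (quote == NULL || quote > end)
        return -1;

    size_t n = (size_t)(quote - val);
    if (n >= outsz)
        return -1;
    memcpy(out, val, n);
    out[n] = '\0';
    return 0;
}
```

For example, given the string `<system type="current" size="135168"/>`, asking for element `system` and attribute `size` yields `135168`. If the same data were written with single quotes, the lookup would fail.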

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 15:07 UTC (Mon) by drag (guest, #31333)

Well, yeah. XML is simple: angle-y brackets and a few other small things, and then you're done.

I don't know why people are getting excited about this. Mallocinfo seems to me to be a very poorly designed part of the POSIX spec and, as the blog entry said, it depends on the specifics of a particular malloc implementation.

The particular Mallocinfo in the spec is so wildly useless for any sort of modern system that there is just no 'right' way to do it.

Since it returns statistical data that depend on a particular internal implementation, the statistics that are relevant to anybody are going to change as malloc changes.

So instead of requiring application designers to use some sort of custom parser, or to read data whose relevance and meaning can and will change at any point in the future, he has decided to go with a very commonly used, standard, self-describing format: XML.

Makes sense to me.

Who is going to read this information anyway?

Posted Apr 20, 2009 0:27 UTC (Mon) by xoddam (guest, #2322)

Heap implementation and performance details are not the sort of thing an application can hope to comprehend and adapt to on the fly. I would expect this information to be examined only by developers (often after the fact, e.g. in logs depicting memory pressure in a production system experiencing performance problems or horrible OOM events), who might prefer a more human-readable representation but shouldn't be afraid of XML in the case of a complex hierarchical heap structure.
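Here is a minimal sketch of capturing the report for such a log. It uses glibc's malloc_info() (declared in `<malloc.h>`, available since 2.10; its options argument must be 0) together with POSIX open_memstream(). It is glibc-specific, and the XML layout is whatever the installed glibc emits. capture_malloc_info() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <malloc.h>   /* malloc_info(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>

/* Capture the allocator's XML report into a heap string so it can be
   attached to a log entry. Returns NULL on failure; the caller frees it. */
char *capture_malloc_info(void)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *mem = open_memstream(&buf, &len);
    if (mem == NULL)
        return NULL;

    int rc = malloc_info(0, mem);   /* options is currently unused and must be 0 */
    if (fclose(mem) != 0 || rc != 0) {
        free(buf);
        return NULL;
    }
    return buf;                     /* NUL-terminated, len bytes of XML */
}
```

A program could call this when it notices memory pressure and write the returned string next to its other diagnostics, leaving interpretation to whoever reads the log later.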

Who is going to read this information anyway?

Posted Apr 20, 2009 19:41 UTC (Mon) by jkohen (subscriber, #47486)

In this case there is really no excuse to go with XML. Even something like /proc/meminfo is more readable to a human than structured markup.

If you want structured data ready for human consumption, you probably need to write a user-space tool, in which case using XML is no better than protocol buffers, JSON and so on; it just has a more complicated spec.

What's coming in glibc 2.10: XML

Posted Apr 19, 2009 20:22 UTC (Sun) by nix (subscriber, #2304)

Writing XML is easy. It's not as if it needs to parse it. (And tinyxml
*is* a trivial dependency: what is it, 4K?)

I don't like XML much, but this seems like a reasonable application for
it.

What's coming in glibc 2.10: XML

Posted Apr 19, 2009 20:37 UTC (Sun) by jkohen (subscriber, #47486)

If you plan to use only a subset of XML, then you should define precisely what that subset is in advance. Otherwise applications don't know which features they need to support. TinyXML doesn't seem to support namespaces, for instance, so if a developer uses that library and glibc starts using namespaces later, the application might break.

XML is way more complex than this problem requires, and there are simpler encoding schemes. Got something against binary formats? JSON could work. Not that I particularly like JSON, but it's simpler than XML and doesn't try to solve more than this particular problem requires.

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 0:33 UTC (Mon) by bvdm (guest, #42755)

Whoa! Hold on already...

The XML world is a huge alphabet soup, but the XML spec itself (which does *not* include namespaces) is pretty small. Why require a JSON parser if most applications already have an XML dependency?

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 19:26 UTC (Mon) by jordanb (guest, #45668)

You must live in a different universe than me if you think this is a small spec.

Do you work in aerospace perhaps?

Compare that to the JSON spec, weighing in at all of eight pages. I doubt it'd take more than 100 lines of code for a fully compliant recursive descent parser. And writing an encoder would be just about as easy.
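The encoder half of that claim is easy to sketch. The code below writes a flat JSON object of string keys and unsigned integer values, escaping strings as the JSON grammar requires. json_write_string() and json_write_object() are hypothetical helpers, and nested values are deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/* JSON's two-character escapes: each byte in esc_from is written as a
   backslash followed by the byte at the same index in esc_to. */
static const char esc_from[] = "\"\\\b\f\n\r\t";
static const char esc_to[]   = "\"\\bfnrt";

/* Write s as a quoted JSON string. Other control characters become \u00XX;
   all remaining bytes (including UTF-8 sequences) are copied unchanged. */
void json_write_string(FILE *fp, const char *s)
{
    fputc('"', fp);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        const char *hit = strchr(esc_from, *p);
        if (hit != NULL) {
            fputc('\\', fp);
            fputc(esc_to[hit - esc_from], fp);
        } else if (*p < 0x20) {
            fprintf(fp, "\\u%04x", (unsigned)*p);
        } else {
            fputc(*p, fp);
        }
    }
    fputc('"', fp);
}

/* Emit {"key": value, ...} for n pairs. Keys are always quoted, as the
   JSON grammar requires. */
void json_write_object(FILE *fp, const char *const keys[],
                       const unsigned long long values[], size_t n)
{
    fputc('{', fp);
    for (size_t i = 0; i < n; ++i) {
        if (i > 0)
            fputs(", ", fp);
        json_write_string(fp, keys[i]);
        fprintf(fp, ": %llu", values[i]);
    }
    fputc('}', fp);
}
```

Called with keys `arena` and `say "hi"` and values 4096 and 7, json_write_object() writes `{"arena": 4096, "say \"hi\"": 7}`.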

Which makes me wonder if they're *really* including a fully compliant XML encoder into the libc just to spit out a few lines of structured data, or if glibc is going to be joining the long line of shitty systems that produce broken XML by concatenating strings.

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 22:42 UTC (Mon) by nix (subscriber, #2304)

Of *course* it's going to produce XML by concatenating strings, but I see
no reason why it has to be 'broken'. It's not as if it's going to be doing
fully general-purpose data->XML transformation: it's emitting *one
structure* with contents fully controlled by the libc.
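Here is a sketch of string concatenation that stays well-formed. Numbers formatted with `%zu` consist only of digits, so only string content needs escaping. The sketch assumes the strings are valid UTF-8 without control characters, which XML 1.0 forbids even when escaped. xml_escape(), write_stat() and the `<stat>` element with its name/value attributes are hypothetical and are not the format malloc_info() actually emits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Predefined XML entity for c, or NULL if c can be written as-is. */
static const char *xml_entity(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&apos;";
    default:   return NULL;
    }
}

/* Return a newly allocated, escaped copy of s (caller frees), or NULL. */
char *xml_escape(const char *s)
{
    size_t len = 1;                               /* terminating NUL */
    for (const char *p = s; *p != '\0'; ++p) {
        const char *rep = xml_entity(*p);
        len += rep != NULL ? strlen(rep) : 1;
    }

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    char *q = out;
    for (const char *p = s; *p != '\0'; ++p) {
        const char *rep = xml_entity(*p);
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}

/* Emit one hypothetical <stat/> element. Only the name needs escaping. */
int write_stat(FILE *fp, const char *name, size_t value)
{
    char *esc = xml_escape(name);
    if (esc == NULL)
        return -1;
    int rc = fprintf(fp, "<stat name=\"%s\" value=\"%zu\"/>\n", esc, value);
    free(esc);
    return rc < 0 ? -1 : 0;
}
```

For instance, xml_escape() turns `a<b & "c"` into `a&lt;b &amp; &quot;c&quot;`. When, as here, the emitter controls every string, the escaping step costs little and removes the "broken XML" failure mode entirely.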

What's coming in glibc 2.10: XML

Posted Apr 21, 2009 4:24 UTC (Tue) by bvdm (guest, #42755)

Have you read it? It is wonderfully complete and quite educational. If the JSON spec was as well written it would probably be about 3/4 the length.

What's coming in glibc 2.10: XML

Posted Apr 21, 2009 21:56 UTC (Tue) by jordanb (guest, #45668)

I haven't read it in its entirety. It does seem to be well written, I'll agree to that.

The JSON spec is terse but it also rigorously defines the syntax of a very simple data serialization format. There are no omissions that make it incomplete. This is the advantage of JSON for simple data serialization tasks.

What's coming in glibc 2.10: XML

Posted Apr 22, 2009 4:15 UTC (Wed) by dlang (guest, #313)

The JSON spec defines a subset of JavaScript. Does it really fully define everything in it, or does it refer you to the main JavaScript definitions for the gory details and corner cases?

What's coming in glibc 2.10: XML

Posted Apr 22, 2009 16:09 UTC (Wed) by jordanb (guest, #45668)

JSON was designed to be a pure subset of Javascript so they could
market it for web development as a way to send data out to your
client-side JS by simply serializing JSON objects into <script>
tags in the webpage. That's the only reason for the focus on
Javascript -- not to rely on the ECMAScript standard. In fact the
only reference to ECMAScript is informational:

"JavaScript Object Notation (JSON) is a text format for the
serialization of structured data. It is derived from the object
literals of JavaScript, as defined in the ECMAScript Programming
Language Standard, Third Edition [ECMA]."

The standard is so small because it only has six data
types ('object' (essentially associative array), list, number,
boolean, string, and 'null'). It has no ability to do node
attributes and is only tree-based to the extent that objects and
arrays can contain more of them. It is certainly not a rival to
XML for dealing with large scale or very complicated or nuanced
data. But it is an excellent alternative when you have small
amounts of structured data that you wish to serialize.
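The six types map naturally onto a tagged union in C. This sketch uses hypothetical names. Strings are printed without escaping, so it assumes they contain no quotes, backslashes or control characters. Non-finite doubles, which JSON cannot represent, are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One tag per JSON value type. */
enum json_type {
    JSON_NULL, JSON_BOOL, JSON_NUMBER, JSON_STRING, JSON_ARRAY, JSON_OBJECT
};

struct json_value {
    enum json_type type;
    union {
        bool        boolean;  /* JSON_BOOL */
        double      number;   /* JSON_NUMBER */
        const char *string;   /* JSON_STRING: assumed to need no escaping */
        struct {              /* JSON_ARRAY */
            size_t                   count;
            const struct json_value *items;
        } array;
        struct {              /* JSON_OBJECT: parallel key/value arrays */
            size_t                   count;
            const char *const       *keys;
            const struct json_value *values;
        } object;
    } u;
};

/* Recursively print v as compact JSON. */
void json_print(FILE *fp, const struct json_value *v)
{
    switch (v->type) {
    case JSON_NULL:   fputs("null", fp); break;
    case JSON_BOOL:   fputs(v->u.boolean ? "true" : "false", fp); break;
    case JSON_NUMBER: fprintf(fp, "%.15g", v->u.number); break;
    case JSON_STRING: fprintf(fp, "\"%s\"", v->u.string); break;
    case JSON_ARRAY:
        fputc('[', fp);
        for (size_t i = 0; i < v->u.array.count; ++i) {
            if (i > 0)
                fputc(',', fp);
            json_print(fp, &v->u.array.items[i]);
        }
        fputc(']', fp);
        break;
    case JSON_OBJECT:
        fputc('{', fp);
        for (size_t i = 0; i < v->u.object.count; ++i) {
            if (i > 0)
                fputc(',', fp);
            fprintf(fp, "\"%s\":", v->u.object.keys[i]);
            json_print(fp, &v->u.object.values[i]);
        }
        fputc('}', fp);
        break;
    }
}
```

An object with keys fpu (true), cache_size ([262144, "256 KB"]) and x (null) prints as `{"fpu":true,"cache_size":[262144,"256 KB"],"x":null}`. There is no slot for node attributes: anything attribute-like has to become another key.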

Another advantage of JSON is that if you have simple mostly
tabular datasets they can be serialized (with judicious use of
whitespace) in a manner that's both machine and
human-readable. For fun and as an example I decided to 'encode'
my /proc/cpuinfo in JSON:

[{
processor : 0,
vendor_id : "GenuineIntel",
cpu_family : "6",
model : "8",
model_name : "Pentium III (Coppermine)",
stepping : 1,
cpu_MHz : 498.283,
cache_size : [ 262144, "256 KB" ],
fdiv_bug : false,
hlt_bug : false,
f00f_bug : false,
coma_bug : false,
fpu : true,
fpu_exception : true,
cpuid_level : 2,
wp : true,
flags : [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8",
"sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "mmx", "fxsr", "sse", "up" ],
bogomips : 997.69,
clflush_size : 32
}]

What's coming in glibc 2.10: XML

Posted Apr 22, 2009 16:11 UTC (Wed) by jordanb (guest, #45668)

Of course that looked nicer before LWN nuked my whitespace, but you get the idea.

What's coming in glibc 2.10: XML

Posted Apr 22, 2009 18:51 UTC (Wed) by bronson (subscriber, #4806)

This is actually invalid JSON. You need to quote the key names:

"hlt_bug" : false,
"f00f_bug" : false,
"coma_bug" : false,

Even so, I'd still much rather work with JSON than XML!

What's coming in glibc 2.10: XML

Posted Apr 22, 2009 18:56 UTC (Wed) by jordanb (guest, #45668)

Heh. I was cringing when I submitted that because I was sure I'd make a mistake. I was thinking too hard about making sure the commas were all in order. :P

What's coming in glibc 2.10: XML

Posted Apr 20, 2009 15:33 UTC (Mon) by iabervon (subscriber, #722)

I think it's weirder that glibc will be outputting a big string than that the string will be XML. What I think would be more sensible is if it output a data structure that looked like something you might get by parsing XML (ignoring comments, irrelevant whitespace, details of how characters were written, etc). That is, it would be a struct with a name, an array of attribute keys and values, an array of children, and a string for character data.

This would have the same extensibility benefits as XML, but avoid most of the parsing overhead. (Of course, the main practical extensibility benefit comes from using ASCII decimal strings of arbitrary length for numbers, which then requires a bit of parsing.)
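Here is a sketch of the node type described above, with a lookup that parses a decimal attribute through strtoull(). All names (struct info_node, info_find(), info_attr(), info_attr_ull()) are hypothetical, and nothing like this exists in glibc. Memory ownership is left to whoever builds the tree, since the sketch only reads it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: roughly what parsing XML would yield, minus comments,
   insignificant whitespace and character-encoding details. */
struct info_node {
    const char              *name;        /* element name, e.g. "system" */
    size_t                   nattrs;
    const char *const       *attr_keys;   /* nattrs entries */
    const char *const       *attr_values; /* parallel to attr_keys */
    size_t                   nchildren;
    const struct info_node  *children;    /* nchildren entries */
    const char              *text;        /* character data, or NULL */
};

/* Depth-first search for the first element named name, or NULL. */
const struct info_node *info_find(const struct info_node *n, const char *name)
{
    if (strcmp(n->name, name) == 0)
        return n;
    for (size_t i = 0; i < n->nchildren; ++i) {
        const struct info_node *hit = info_find(&n->children[i], name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Value of attribute key on node n, or NULL if absent. */
const char *info_attr(const struct info_node *n, const char *key)
{
    for (size_t i = 0; i < n->nattrs; ++i)
        if (strcmp(n->attr_keys[i], key) == 0)
            return n->attr_values[i];
    return NULL;
}

/* Parse an attribute holding an ASCII decimal number of any width.
   Returns 0 on success, -1 if missing, malformed, or too large for
   unsigned long long (rejected rather than silently truncated). */
int info_attr_ull(const struct info_node *n, const char *key,
                  unsigned long long *out)
{
    const char *s = info_attr(n, key);
    if (s == NULL || *s < '0' || *s > '9')   /* rejects "", "-1", " 1", "+1" */
        return -1;
    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```

The struct keeps XML's extensibility, because new attributes and children don't change the layout. As the comment notes, the numbers still need a parsing step. strtoull() caps them at ULLONG_MAX, so "arbitrary length" in practice means overflow is detected and reported.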