
The Journal - a proposed syslog replacement


Posted Nov 20, 2011 7:11 UTC (Sun) by dlang (✭ supporter ✭, #313)
In reply to: The Journal - a proposed syslog replacement by skissane
Parent article: The Journal - a proposed syslog replacement

What makes you think that binary formats would be any more standardized than the existing text formats?

There are ways to do self-describing text formats, but developers don't do it.

With text formats it's a lot easier to examine the file and reverse engineer the format than it is from a binary format.

Damaged/lost files are also a place where text files are easier to recover than binary files.

In theory, none of this should ever be needed and binary files are just fine. But this is where the quote "in theory, theory and practice are the same, but in practice they are not" applies.
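
A rough sketch of the recovery point, assuming a line-oriented log where every record is one line. The sample lines, the pattern, and the `salvage_lines` helper are made up for illustration and do not come from any real tool. Damage to a few bytes costs only the line that contains them:

```python
import re

# Hypothetical syslog-like line shape, assumed for illustration only.
LINE_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ \S+: .*$")

def salvage_lines(raw: bytes) -> list[str]:
    """Return the lines of a damaged log that still look like valid records."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode("utf-8", errors="replace")
    return [line for line in text.split("\n") if LINE_RE.match(line)]

good = (
    b"Nov 20 07:11:02 host sshd[812]: session opened\n"
    b"Nov 20 07:11:05 host cron[90]: job started\n"
    b"Nov 20 07:11:09 host sshd[812]: session closed\n"
)

# Simulate corruption: overwrite part of the middle record with junk bytes.
damaged = good[:50] + b"\xff\x00\xfe\x00" * 4 + good[66:]

# Only the record that was hit is lost; its neighbours are still readable.
recovered = salvage_lines(damaged)
```

A binary format can offer the same property if it has per-record framing and sync markers, but that has to be designed in; with one record per line it comes for free.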



The Journal - a proposed syslog replacement

Posted Nov 20, 2011 7:55 UTC (Sun) by skissane (subscriber, #38675)

I think, if you want to stick to text, it would be much better if tools output in some standardised text format such as XML, JSON, or YAML.

But then, once you have a standardised text format, why not save some space and processing time with an efficient binary serialization of XML/JSON/YAML/what-have-you?

And then you can have a tool, e.g. bin2text, which reads the binary format on standard input and writes the text format on standard output, and vice versa. With such a tool, reverse-engineering/examination should be no harder than with a plain text format.
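
To make the bin2text idea concrete, here is a minimal sketch of the two directions. Everything in it is an assumption for illustration: the text side is one flat JSON object per line with string values only, the binary side is a made-up length-prefixed layout, and the names `text_to_bin` and `bin_to_text` are hypothetical. It makes no attempt at being space-efficient; it only shows that the conversion can be lossless both ways.

```python
import json
import struct

def text_to_bin(text: str) -> bytes:
    """Encode JSON-lines text (flat objects, string values) as length-prefixed binary."""
    out = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        out += struct.pack(">H", len(record))      # number of fields in this record
        for key, value in record.items():
            if not isinstance(value, str):
                raise TypeError("this sketch only handles string values")
            k = key.encode("utf-8")
            v = value.encode("utf-8")
            out += struct.pack(">H", len(k)) + k   # key length, then key bytes
            out += struct.pack(">I", len(v)) + v   # value length, then value bytes
    return bytes(out)

def bin_to_text(data: bytes) -> str:
    """Decode the binary layout above back into JSON-lines text."""
    lines = []
    pos = 0
    while pos < len(data):
        (field_count,) = struct.unpack_from(">H", data, pos)
        pos += 2
        record = {}
        for _ in range(field_count):
            (key_len,) = struct.unpack_from(">H", data, pos)
            pos += 2
            key = data[pos:pos + key_len].decode("utf-8")
            pos += key_len
            (value_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            record[key] = data[pos:pos + value_len].decode("utf-8")
            pos += value_len
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)

sample = (
    json.dumps({"time": "2011-11-20T07:55:00Z", "host": "web1", "msg": "disk full"}) + "\n"
    + json.dumps({"time": "2011-11-20T07:55:03Z", "host": "web2", "msg": "link up"}) + "\n"
)
binary = text_to_bin(sample)
round_tripped = bin_to_text(binary)
```

A command-line version would read from `sys.stdin.buffer` and write to `sys.stdout`, so that the output can be piped into grep and friends like any other text.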

I think this would be better on two counts: (1) it replaces the rather poorly-defined text formats used at present by many tools, and (2) binary is more efficient than text.

Your point that corrupted files are easier to recover when they are text is true, but how often do you have to deal with that? If good-quality libraries were provided (say, in C with bindings to other common languages such as C++, Java, Perl, and Python), the odds of a corrupt file due to programmer error should be low, outside of some mid-transaction failure scenario. And if we had transaction support in the library or the underlying filesystem, we could avoid that problem too.
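
One common way a library can avoid leaving a half-written file behind is to write a temporary file and rename it over the target. The sketch below assumes a POSIX-like filesystem where a rename within one directory is atomic; `atomic_write` is a hypothetical helper, not part of any existing logging library, and whole-file replacement is only one of several possible transaction schemes.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path with data so readers see old or new, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the bytes to disk before renaming
        os.replace(tmp_path, path)   # the "commit" step
    except BaseException:
        # On any failure, drop the temporary file and leave the target alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "records.bin")
    atomic_write(target, b"version 1")
    atomic_write(target, b"version 2")
    with open(target, "rb") as f:
        final = f.read()
    leftovers = [name for name in os.listdir(workdir) if name.startswith(".tmp-")]
```

This suits small files that are rewritten whole. An append-heavy log would need something different, such as per-record checksums, so that a torn final record can be detected and dropped.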

The Journal - a proposed syslog replacement

Posted Nov 23, 2011 22:12 UTC (Wed) by cas (subscriber, #52554)

But then, once you have a standardised text format, why not save some space and processing time with an efficient binary serialization of XML/JSON/YAML/what-have-you?

  • space is irrelevant these days. multi-terabyte disks are cheap, readily available consumer products
  • in my experience, XML etc *greatly* complicates most jobs, increasing processing time, difficulty of programming, difficulty of understanding WTF is going on. it turns what should be a quick and simple one liner to extract information into a multi-hour programming effort reading API docs, parsing the data in whatever obscured format it's in (and possibly parsing other things like the DTD).
  • it's completely missing the point of XML, JSON, YAML etc - they're data *transfer* protocols, not data *storage* methods. their purpose is to unambiguously transfer data from one system to another, not to store data in yet another obscure special purpose file format
  • it violates the KISS principle. but, then, everything Lennart is involved in does that.
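
A small side-by-side of the "one liner" point above. Both log samples and the XML layout are invented for this sketch. In a toy case like this the XML version is only slightly longer; the extra cost is that it needs a parser and knowledge of the schema before anything can be extracted.

```python
import xml.etree.ElementTree as ET

text_log = """\
Nov 23 22:12:01 web1 sshd[411]: Failed password for root from 10.0.0.7
Nov 23 22:12:04 web1 cron[90]: job started
Nov 23 22:12:09 web1 sshd[411]: Failed password for admin from 10.0.0.9
"""

# Plain text: filter the lines, take the last whitespace-separated field.
text_hits = [
    line.split()[-1]
    for line in text_log.splitlines()
    if "Failed password" in line
]

# The same records in a hypothetical XML layout.
xml_log = """\
<log>
  <entry time="2011-11-23T22:12:01" host="web1"><prog>sshd</prog><msg>Failed password for root from 10.0.0.7</msg></entry>
  <entry time="2011-11-23T22:12:04" host="web1"><prog>cron</prog><msg>job started</msg></entry>
  <entry time="2011-11-23T22:12:09" host="web1"><prog>sshd</prog><msg>Failed password for admin from 10.0.0.9</msg></entry>
</log>
"""

# XML: parse the document first, then walk it using the element names.
root = ET.fromstring(xml_log)
xml_hits = [
    entry.findtext("msg").split()[-1]
    for entry in root.iter("entry")
    if "Failed password" in entry.findtext("msg", default="")
]
```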

The Journal - a proposed syslog replacement

Posted Nov 23, 2011 23:38 UTC (Wed) by dlang (✭ supporter ✭, #313)

even multi-terabyte disks are expensive if you need a lot of them.

I store my logs at 10:1 compression (or better) and I still have tens of TB of logs to deal with.
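
For anyone who wants to measure a compression ratio on their own data, a sketch along these lines would do. The log lines here are synthetic and far more repetitive than real traffic, so the number it produces says nothing about the 10:1 figure above.

```python
import gzip

# Synthetic, highly repetitive log lines (made up for this sketch).
lines = [
    f"Nov 23 23:{i // 60 % 60:02d}:{i % 60:02d} host{i % 4} sshd[{1000 + i % 50}]: "
    f"Accepted publickey for user{i % 10} from 10.0.{i % 8}.{i % 200}\n"
    for i in range(5000)
]
raw = "".join(lines).encode("utf-8")

packed = gzip.compress(raw)

# Ratio of uncompressed size to compressed size; 10.0 would mean "10:1".
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}:1")
```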

The Journal - a proposed syslog replacement

Posted Nov 20, 2011 8:12 UTC (Sun) by drag (subscriber, #31333)

Well, they tried. It ended up being XML. :(

The Journal - a proposed syslog replacement

Posted Nov 20, 2011 19:27 UTC (Sun) by skissane (subscriber, #38675)

The problem with XML is:
1) a syntax originally designed for marking up documents got reused for data, with the result that XML provides distinctions which are unnecessary for data purposes (e.g. element vs. attribute distinction)
2) historical baggage, e.g. DTDs

Certainly you can define new syntaxes which avoid those two problems that XML has. On the other hand, whatever its warts, XML is an industry standard, and practical considerations often imply choosing the imperfect industry standard over some technically superior but rarely used alternative.

But, JSON is quite common now, and addresses some of the issues above. (But I think it has its own deficiencies too.)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds