They should be paying attention to the lumberjack project

Posted Apr 15, 2012 1:27 UTC (Sun) by jzbiciak (subscriber, #5246)
In reply to: They should be paying attention to the lumberjack project by dlang
Parent article: Toward more reliable logging

Nothing in particular. It was the only other format that seemed to fit all your selection criteria. I wasn't too surprised to find out that YAML 1.2 is a strict superset of JSON. JSON probably wins on simplicity.

Some additional things in YAML that might be helpful, and that I don't think JSON has (though I could be mistaken): explicit typecasting and internal cross-references.

I'm not certain internal cross-references would be useful, although they might serve to refer back to a component of an earlier log message. (Flip side: there's value in redundancy in logs, especially when records go missing.) Explicit typecasting might be useful if there's ever a case where a value looks like a number but really ought to be treated as a string.
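JSON has no explicit type tags: whether a value is a number or a string is decided entirely by whether the writer quoted it. A minimal Python sketch using the standard `json` module, with a hypothetical `version` field, shows what is lost when a number-like string goes out unquoted:

```python
import json

# Written without quotes, the value is parsed as a number and the
# trailing zero of "1.10" is lost.
as_number = json.loads('{"version": 1.10}')["version"]

# Written with quotes, the same characters survive as a string.
as_string = json.loads('{"version": "1.10"}')["version"]

print(repr(as_number))  # 1.1
print(repr(as_string))  # '1.10'
```

YAML can state the intent inside the document itself with an explicit tag such as `!!str`; in JSON the only safeguard is that the producer remembers to quote.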

All that said, those are dubious benefits, and JSON probably wins on simplicity. I only mentioned YAML because it was the only other format I could think of offhand that survives the selection criteria fairly well.


They should be paying attention to the lumberjack project

Posted Apr 15, 2012 1:46 UTC (Sun) by dlang (subscriber, #313)

Given that filtering criteria may mean that prior log entries are not available, references to them can't be counted on to work, and I don't see much likelihood of them being useful within a single log message (within a full document, yes; within a single log message, no).

My query about different serialization protocols was serious. I don't pretend that I know all of them and the advantages of each, so it is very possible that there is something out there that's better.

They should be paying attention to the lumberjack project

Posted Apr 15, 2012 2:41 UTC (Sun) by jzbiciak (subscriber, #5246)

The fact that very few standard formats survive the selection criteria illustrates the challenge, too. Good luck!

I mentioned YAML because I've found it very lightweight for the things I've used it for, and it is very human-friendly. I didn't realize that JSON is a proper subset of YAML until I looked up some comparisons. So, JSON wins similarly in the human-friendly department, and its simpler spec makes it easier to adopt.

Simple Declarative Language looks interesting. It appears to be a modest step up from JSON, adding explicit types to containers and the ability to attach attributes to a type. Sure, you can capture that in a JSON serialization by adding explicit fields, but making it a first-class part of the syntax has a certain economy to it. I hadn't heard of SDL before today. Unfortunately, the list of languages that have SDL APIs doesn't line up with my usual requirements of C and Perl.
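Capturing SDL's typed, attributed containers in JSON "by adding explicit fields" might look like the following minimal Python sketch. The `Tag` class and the `type`/`attributes`/`values`/`children` field names are hypothetical, chosen for illustration; neither SDL nor JSON prescribes them.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Tag:
    """Hypothetical SDL-like node: a type name plus attributes, values, and children."""
    name: str
    attributes: dict = field(default_factory=dict)
    values: list = field(default_factory=list)
    children: list = field(default_factory=list)

def to_json_obj(tag: Tag) -> dict:
    """Turn SDL's first-class type and attributes into explicit JSON fields."""
    obj = {"type": tag.name}
    if tag.attributes:
        obj["attributes"] = dict(tag.attributes)
    if tag.values:
        obj["values"] = list(tag.values)
    if tag.children:
        obj["children"] = [to_json_obj(child) for child in tag.children]
    return obj

entry = Tag("entry", attributes={"host": "example", "level": "info"}, values=["hello"])
print(json.dumps(to_json_obj(entry), sort_keys=True))
```

In SDL the type and attributes are simply part of the syntax; in JSON every reader has to agree on the wrapper field names, which is the economy being traded away.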

They should be paying attention to the lumberjack project

Posted Apr 15, 2012 3:00 UTC (Sun) by jzbiciak (subscriber, #5246)

Expanding on my SDL comment... You could easily imagine capturing many repeated aspects of a log entry in the entry type and attributes, rather than in fields within the entry record itself, e.g.:

Example record from my /var/log/messages:

Apr  8 14:23:44 elysium kernel: [9234662.980516] r8169 0000:03:00.0: eth0: link up

One possible way to split between attributes and keys within the container:

entry date=1333913564 host=elysium source=kernel level=info timestamp=9234662.980516 \
     { message="r8169 0000:03:00.0: eth0: link up" }

Or something...

Honestly, I go back and forth between the value of attributed types vs. just embedding the information as fields within the structure. What color do I want my bikeshed today?
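Going the other way, embedding everything as fields, first means pulling the fields out of today's text line. A minimal Python sketch, assuming the traditional `Mmm dd hh:mm:ss host source: message` layout of the example record above; the field names mirror the attributes in the SDL example and are otherwise hypothetical:

```python
import json
import re

# Traditional syslog text line: "Mmm dd hh:mm:ss host source: message".
# The kernel's "[seconds.micros]" uptime stamp, when present, starts the message.
LINE_RE = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<source>[^:\s]+): "
    r"(?:\[\s*(?P<timestamp>\d+\.\d+)\] )?"
    r"(?P<message>.*)$"
)

def parse_line(line: str) -> dict:
    """Split one /var/log/messages line into named fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # Drop the optional uptime group when the line has none.
    record = {k: v for k, v in m.groupdict().items() if v is not None}
    if "timestamp" in record:
        record["timestamp"] = float(record["timestamp"])
    return record

line = "Apr  8 14:23:44 elysium kernel: [9234662.980516] r8169 0000:03:00.0: eth0: link up"
print(json.dumps(parse_line(line)))
```

The `date` stays a string here: the traditional line carries neither a year nor a time zone, so producing an epoch value like the one in the SDL example needs information from outside the line, and `level=info` likewise cannot be recovered from this text alone.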

They should be paying attention to the lumberjack project

Posted Apr 15, 2012 6:33 UTC (Sun) by lindi (subscriber, #53135)

You'd also want to have a way to extract that "eth0" in a programmatic way.

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 21:18 UTC (Fri) by giraffedata (subscriber, #1954)

You'd also want to extract the "0000:03:00.0", "r8169" (device driver name), and possibly "up".

And the date, host, and source values aren't from the kernel, so they wouldn't be in there.
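Extracting the driver name, PCI address, interface, and link state after the fact means reverse-engineering the message text. A minimal Python sketch, assuming the `driver address: interface: text` layout of this one example line; the field names are hypothetical, and other drivers and subsystems format their messages differently:

```python
import re

# Assumed layout, inferred from the single example message in this thread:
# "<driver> <pci-address>: <interface>: <text>"
MSG_RE = re.compile(
    r"^(?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<iface>[^:\s]+): "
    r"(?P<text>.*)$"
)

def parse_netdev_message(message: str) -> dict:
    """Pull driver, PCI address, interface, and link state out of the free text."""
    m = MSG_RE.match(message)
    if m is None:
        return {"text": message}  # not in the assumed layout: keep it unstructured
    fields = m.groupdict()
    state = re.fullmatch(r"link (up|down)", fields["text"])
    if state:
        fields["link"] = state.group(1)
    return fields

print(parse_netdev_message("r8169 0000:03:00.0: eth0: link up"))
```

Every such regex is a guess about one message format, which is the argument for having the producer emit these values as separate fields in the first place.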

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 21:29 UTC (Fri) by dlang (subscriber, #313)

The information may not come from the kernel, but by the time anything other than the log transport sees the data, it will need to be there (and arguably the kernel should put the timestamp there).

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 22:28 UTC (Fri) by giraffedata (subscriber, #1954)

Aren't we talking about in what form the kernel should produce log messages?

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 22:30 UTC (Fri) by dlang (subscriber, #313)

I had wandered a bit from that, but yes, that's where we started.

And the kernel should put the timestamp on the messages it generates, because you don't know how long it will be before some other process picks them up and adds a timestamp of its own.
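The distinction can be made explicit by carrying two timestamps: one attached where the event happens and one added by whatever picks the message up, so the gap between them stays visible instead of being folded into a single late timestamp. A minimal Python sketch; the `LogEntry`, `emit`, and `collect` names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEntry:
    message: str
    origin_time: float                     # set by the producer (e.g. the kernel) when the event happens
    received_time: Optional[float] = None  # set later by whichever process collects the entry

def emit(message: str, now: float) -> LogEntry:
    """Producer side: stamp the entry at generation time."""
    return LogEntry(message, origin_time=now)

def collect(entry: LogEntry, now: float) -> LogEntry:
    """Collector side: record when the entry arrived; never overwrite origin_time."""
    entry.received_time = now
    return entry

# Times are passed in explicitly (instead of read from a clock) to keep this deterministic.
entry = collect(emit("eth0: link up", now=100.0), now=102.5)
print(entry.received_time - entry.origin_time)  # 2.5 seconds spent in transit
```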

They should be paying attention to the lumberjack project

Posted Apr 15, 2012 3:03 UTC (Sun) by dlang (subscriber, #313)

I agree about SDL, but whatever is used needs to be supported in every language (or be simple enough to create by hand).

Keep in mind that JSON is just the least common denominator, the 'everything must support this' option. Most logging libraries, and the log transports (i.e. syslog daemons), are expected to support additional options. At the moment, the other options expected later are:

BSON (more efficient transport with type information)

XML (because someone will want it, it's hard to do structured stuff and ignore XML ;-)

but others can be added as/if needed.
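To make "transport with type information" concrete: BSON puts a one-byte type code in front of every value and stores numbers in fixed-width little-endian binary instead of decimal text. A minimal Python encoder for a small subset of the BSON specification (double, string, embedded document, boolean, int32, int64), written for illustration rather than as a substitute for a real BSON library:

```python
import struct

def _cstring(s: str) -> bytes:
    """BSON keys are NUL-terminated UTF-8 strings."""
    return s.encode("utf-8") + b"\x00"

def _element(name: str, value) -> bytes:
    """One element: a type byte, the key, then the value in that type's binary form."""
    key = _cstring(name)
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        if -2**63 <= value < 2**63:
            return b"\x12" + key + struct.pack("<q", value)  # int64
        raise OverflowError("integer does not fit in 64 bits")
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)  # IEEE 754 double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data  # length includes the NUL
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)  # embedded document
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")

def encode_document(doc: dict) -> bytes:
    """A document: int32 total length (counting itself), the elements, a trailing NUL."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

print(encode_document({"hello": "world"}).hex())
```

The explicit type byte is what JSON lacks, and the binary encoding is also why a BSON stream is no longer something grep or a regex-based log tool can read directly.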

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 7:54 UTC (Fri) by man_ls (guest, #15091)

What about protocol buffers? Have they fallen out of grace already?

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 15:11 UTC (Fri) by dlang (subscriber, #313)

Protocol buffers are good for some things, but they serialize into a binary format, which is not compatible with existing logging tools.

Also (as I understand them) protocol buffers require absolute agreement between the sender and the receiver on the data structures to be passed. This is hard to do for logging libraries that will be written in many different languages, multiple log transport tools, and the vast array of log analysis/storage tools.

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 17:23 UTC (Fri) by smurf (subscriber, #17840)

No, they're backwards compatible.
From the documentation:

>> You can add new fields to your message formats without
>> breaking backwards-compatibility; old binaries simply
>> ignore the new field when parsing

https://developers.google.com/protocol-buffers/docs/overview
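The mechanism behind that compatibility is in the wire format: each field is prefixed by a key combining its field number and a wire type, and the wire type alone tells a reader how many bytes to skip. A minimal hand-rolled Python sketch of just that part (varint and length-delimited wire types only; this is not the protobuf library's API):

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_field(number: int, value) -> bytes:
    """int -> wire type 0 (varint); str -> wire type 2 (length-delimited)."""
    if isinstance(value, int):
        return encode_varint((number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data

def decode_known(buf: bytes, known: set[int]) -> dict:
    """Keep fields whose numbers are in `known`; skip the rest using only the wire type."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length  # raw bytes
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value
    return fields

# A newer writer adds field 2; an older reader that only knows field 1 skips it.
message = encode_field(1, 150) + encode_field(2, "eth0")
print(decode_known(message, {1}))  # {1: 150}
```

The reader still needs a shared schema to know what field 1 means; skipping unknown fields only keeps old readers from breaking when new fields appear.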

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 17:31 UTC (Fri) by dlang (subscriber, #313)

Ok, but in any case, they won't work with the existing (text based) logging tools.

Yes, any change to the message being logged 'breaks' existing tools that depend on exact matches of known log messages, but as long as the new log format is still text-based, all the existing tools can be tweaked (with new regex rules) to handle the log messages.

If you switch to something other than text streams for your messages, you will require all logging tools to be rewritten to handle your new format. Since this is unlikely to happen, there is a very large emphasis on being compatible with the existing tools.
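A minimal Python sketch of the "tweak the regex" path, using a hypothetical new-style line whose message text is JSON while the syslog prefix is unchanged:

```python
import json
import re

old_line = "Apr 20 17:31:00 elysium kernel: r8169 0000:03:00.0: eth0: link up"
# Hypothetical new-style line: same syslog prefix, JSON payload as the message text.
new_line = ('Apr 20 17:31:00 elysium kernel: '
            '{"driver": "r8169", "pci": "0000:03:00.0", "iface": "eth0", "link": "up"}')

# An existing rule written against the old free-text message.
old_rule = re.compile(r"(\S+): link up$")
# A tweaked rule for the new format: still a plain regex over a text line.
new_rule = re.compile(r'"iface": "([^"]+)", "link": "up"')

print(old_rule.search(old_line).group(1))  # eth0
print(old_rule.search(new_line))           # None: the old rule needs updating
print(new_rule.search(new_line).group(1))  # eth0

# A JSON-aware tool can instead parse the payload after the "kernel: " prefix.
payload = json.loads(new_line.split("kernel: ", 1)[1])
print(payload["iface"])                    # eth0
```

Matching JSON with a regex depends on key order and spacing, which JSON does not guarantee, so such a rule is a stopgap; a tool that parses the payload, as in the last step, does not have that problem. Either way, the line stays text that existing tools can still carry and search.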

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 21:10 UTC (Fri) by man_ls (guest, #15091)

Protocol buffers is a binary protocol, like BSON. If binary formats are being considered (as I deduced from your message) then protocol buffers should be considered. (I myself think that BSON has a much brighter future, but I was just wondering.)

They should be paying attention to the lumberjack project

Posted Apr 20, 2012 21:37 UTC (Fri) by dlang (subscriber, #313)

nxlog already has a binary transport, but it can only be used from nxlog to nxlog. There has been some thought of having a binary transport, but that's still a ways off, as the discussion is still focused on the right way to generate the data and which tags are going to be used.

CEE is supposed to be releasing a 1.0beta spec, and the initial fields planned are documented at https://fedorahosted.org/lumberjack/wiki/FieldList#Unifie...

For the API, the initial focus is on getting a good C API that can replace the syslog() call. Red Hat has a largish project that they've been calling ELAPI (Enhanced Logging API, https://fedorahosted.org/ELAPI/wiki/WikiPage/Architecture). They are now realizing that it largely overlaps with the capabilities of the modern syslog daemons, so they are going through the code they wrote for it and ripping out much of it to keep only what's needed. There is some question of whether the result is still in the 'sledgehammer to swat a fly' category, and so you have lumberlog working from the other direction.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds