This may come off as a bit abusive and is probably full of fail, but what I'd like to see in a log format is null-delimited strings.
And it would look something like this:
log_version\0 machine_ident\0 machine_fqdn\0 timestamp\0 service_ident\0 service_string\0 process_id\0 severity\0 data\0 checksum\0\0\n
Something simple like that. The *ident fields are UUIDs and are completely arbitrary.
The 'machine_ident' would be generated when the syslog-like daemon first starts up, the way SSH host keys are. When the logging daemon connects to a service or starts a new log file, it just pukes out a log entry with various useful system identification strings that any log-parsing software can easily pick up, much as browsers identify themselves when they connect to a web server. That makes it easy to identify the machine by UUID: as long as you can read the first log entry in any file, or the first one sent whenever it connects to a network logging daemon, you can figure out what machine it is pretty easily.
Timestamps are just x.xxxx seconds since the Unix epoch, GMT. The timestamp can be as fine-grained as the application warrants and the system can deliver.
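A minimal sketch of the "generate once, like ssh keys" idea, assuming the daemon keeps the UUID in a small state file. The function name and the file location are hypothetical; nothing above says where the ident would actually be stored.

```python
import uuid
from pathlib import Path


def load_or_create_machine_ident(path: Path) -> str:
    """Return the machine UUID stored at `path`, creating it on first run."""
    if path.exists():
        # Later startups reuse the ident generated the first time.
        return path.read_text().strip()
    ident = str(uuid.uuid4())  # random, completely arbitrary UUID
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(ident + "\n")
    return ident
```

The daemon would call this once at startup and put the result in the machine_ident field of every entry, starting with the identification entry at the top of each log file or connection.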
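As a rough illustration of "as fine-grained as the system can deliver", here is one way to render such a timestamp from an integer nanosecond clock. The helper name and the `digits` parameter are made up for this sketch; integer arithmetic is used so that no precision is lost to floating point.

```python
import time


def format_timestamp(ns_since_epoch: int, digits: int = 6) -> str:
    """Format nanoseconds since the Unix epoch as 'seconds.fraction'."""
    if not 0 <= digits <= 9:
        raise ValueError("digits must be between 0 and 9")
    seconds, nanos = divmod(ns_since_epoch, 1_000_000_000)
    if digits == 0:
        return str(seconds)
    # Truncate (not round) the fraction to the requested precision.
    frac = nanos // 10 ** (9 - digits)
    return f"{seconds}.{frac:0{digits}d}"


# The epoch clock is timezone-independent, so no GMT conversion is needed.
now = format_timestamp(time.time_ns())
```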
Severity level is similar to how Debian does apt-pinning: just a number, like 0-1000, that maps to different severity levels:
0-250 - debug
250-500 - info
500-750 - warning
750-1000 - error
That way application developers have a way of saying "well, this error is more of an error than that error", which seems important.
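The mapping above could be sketched like this. The ranges as listed share their boundary values (250, 500, 750), so this sketch assumes each boundary belongs to the higher band; that choice is an assumption, not something the list settles.

```python
def severity_name(level: int) -> str:
    """Map a 0-1000 severity number onto the four named bands."""
    if not 0 <= level <= 1000:
        raise ValueError("severity must be between 0 and 1000")
    # Assumption: boundary values fall into the higher band.
    if level < 250:
        return "debug"
    if level < 500:
        return "info"
    if level < 750:
        return "warning"
    return "error"
```

Two entries can then both be errors, say 760 and 990, while still sorting by how bad they are.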
The actual data field can be whatever you want: any data, as long as it contains no nulls. More structuring can probably be layered on later, but this makes it easy to incorporate legacy logging data into this format. Just take the string as delivered by the application/server, stuff the entire thing into <data>, and fill in the other fields as well as can be done. <data> being JSON would be fine by me, and the fact that it's JSON or whatever would be recorded as part of the version string.
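To make the record layout concrete, here is a sketch of an encoder and decoder for one entry. Several things are assumptions: the spaces after each \0 in the format line are treated as visual separators only, the checksum is taken to be a CRC32 in hex over the preceding fields (no algorithm is named above), and text is encoded as UTF-8.

```python
import zlib

FIELDS = ("log_version", "machine_ident", "machine_fqdn", "timestamp",
          "service_ident", "service_string", "process_id", "severity", "data")
TERMINATOR = b"\0\0\n"  # checksum's own \0, then the closing \0 and newline


def encode_record(values: dict) -> bytes:
    """Build one null-delimited record; missing fields are left empty."""
    parts = []
    for name in FIELDS:
        raw = str(values.get(name, "")).encode("utf-8")
        if b"\0" in raw:
            raise ValueError(f"{name} contains a null byte")
        parts.append(raw + b"\0")
    body = b"".join(parts)
    # Assumed checksum: CRC32 of everything before the checksum field.
    checksum = f"{zlib.crc32(body):08x}".encode("ascii")
    return body + checksum + TERMINATOR


def decode_record(record: bytes) -> dict:
    """Split one record back into named fields and verify the checksum."""
    if not record.endswith(TERMINATOR):
        raise ValueError("missing record terminator")
    parts = record[:-len(TERMINATOR)].split(b"\0")
    if len(parts) != len(FIELDS) + 1:
        raise ValueError("wrong number of fields")
    body = b"".join(p + b"\0" for p in parts[:-1])
    if f"{zlib.crc32(body):08x}".encode("ascii") != parts[-1]:
        raise ValueError("checksum mismatch")
    return {name: p.decode("utf-8") for name, p in zip(FIELDS, parts)}
```

Because fields end at a null rather than a newline, a multi-line message survives a round trip through the data field unchanged.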
I know something like that would make my job a lot easier. :)
They should be paying attention to the lumberjack project
Posted Apr 14, 2012 23:23 UTC (Sat) by dlang (✭ supporter ✭, #313)
The biggest problem with your approach is that it requires throwing away all existing logging and log processing tools. You aren't going to get everyone to buy into the new scheme at once and modify every program in the world to use it, so the probable result is that nothing will happen instead.
Posted Apr 15, 2012 3:38 UTC (Sun) by drag (subscriber, #31333)
I guess so.
I figured it would be the logging daemon's job to fill in all the fields as well as it can and shovel the log from the application into the 'data' section. If it leaves the 'severity' section empty or whatever, then that would be legal. It's a 'best effort' type thing rather than requiring strict compliance.
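A sketch of that best-effort wrapping, assuming the daemon passes along whatever fields it happens to know and leaves the rest empty. The function name is hypothetical, and dropping stray null bytes from the legacy line is one possible policy, not something specified above.

```python
FIELDS = ("log_version", "machine_ident", "machine_fqdn", "timestamp",
          "service_ident", "service_string", "process_id", "severity", "data")


def wrap_legacy_line(line: str, **known) -> dict:
    """Wrap an unmodified application log line in the record fields."""
    record = {name: "" for name in FIELDS}  # empty means "unknown", which is legal
    # Fill in only the fields the daemon actually knows; ignore anything else.
    record.update({k: str(v) for k, v in known.items() if k in record})
    # Assumption: strip the trailing newline and drop any stray nulls.
    record["data"] = line.rstrip("\n").replace("\0", "")
    return record
```

For example, `wrap_legacy_line("sshd[42]: Accepted publickey\n", process_id=42)` keeps the original message intact in 'data' and leaves 'severity' empty.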
Posted Apr 15, 2012 4:03 UTC (Sun) by dlang (✭ supporter ✭, #313)
The idea here (lumberjack and CEE) is to support and encourage the applications (including the kernel) to create structured logs so that the data that you are referring to as the 'data' section is easier (and thus more reliable) to deal with.
The first step is to have the normal message just stuck in the 'data' section, and the libumberlog library ( http://algernon.github.com/libumberlog/umberlog.html ) is designed to do just that. It can be LD_PRELOADed for any application, and it modifies the syslog() call to log a structured log (JSON structure with added metadata). It then allows the application programmer to change syslog() calls to ul_syslog() calls and add additional name-value pairs.
The next step is to create a more complete logging API that allows the application programmer to more easily create structured logs. Debate over how that could/should work is ongoing.
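To illustrate the general shape of "plain message plus metadata plus extra name-value pairs", here is a small Python sketch. It is not libumberlog's API or its actual output format; the function name and the "msg" and "pid" keys are made up for this example.

```python
import json
import os


def structured_message(msg: str, **pairs) -> str:
    """Wrap a plain log message in a JSON object with some metadata."""
    payload = {"msg": msg, "pid": os.getpid()}  # hypothetical metadata keys
    payload.update(pairs)  # extra name-value pairs supplied by the application
    # json.dumps escapes control characters, so the result has no raw nulls.
    return json.dumps(payload, sort_keys=True)
```

An unmodified application would only ever produce the "msg" part, while one that has been updated could call something like `structured_message("login failed", user="bob", attempts=3)`.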