LWN: Comments on "KS2011: Structured error logging"

KS2011: Structured error logging

renox — Thu, 03 Nov 2011 13:52:35 +0000

> But then, I think that nowadays every sysadmin has some basic understanding of English.

Totally wrong, just think about the use of Linux on desktops where each user is also a "sysadmin".

KS2011: Structured error logging

nix — Tue, 01 Nov 2011 14:32:03 +0000

But standardizing things before widespread implementation experience is available is a recipe for awful standards. You *must* implement first, or you end up with the export keyword all over again.

KS2011: Structured error logging

jcm — Tue, 01 Nov 2011 05:04:53 +0000

I'm fond of my 1980s technology, but I see the point here. Still, I would hope any such effort is properly standardized between distributions. If there's an opportunity for larger technology standardization between Linux and Unix-like systems out there, that should be done too. This kind of thing should not be brought to the attention of industry standards bodies after it's been done.

KS2011: Structured error logging

Felix.Braun — Thu, 27 Oct 2011 11:10:40 +0000

The thought of translating error messages was also floated. That would obviously change the hash value. But then, I think that nowadays every sysadmin has some basic understanding of English. So I'm not so sure translation would even be necessary.

KS2011: Structured error logging

epierre — Thu, 27 Oct 2011 07:18:08 +0000

If you change the format string to either correct a typo or add a little more information, the hash value changes, that makes the identifier very unstable.

KS2011: Structured error logging

dlang — Thu, 27 Oct 2011 01:24:19 +0000

Ok, that makes sense.

KS2011: Structured error logging

dlang — Thu, 27 Oct 2011 01:19:01 +0000

Ahh, nothing deliberate, just pure laziness. I do tend to be lazy about capitalisation except when writing formally.

KS2011: Structured error logging

nix — Wed, 26 Oct 2011 14:01:58 +0000

That's why I said syslog() needed to be tweaked. Not syslogd, the syslog() function in libc.

KS2011: Structured error logging

sorpigal — Wed, 26 Oct 2011 13:56:33 +0000

e. e. cummings is famous for using only lower-case characters

KS2011: Structured error logging

sdalley — Wed, 26 Oct 2011 06:56:02 +0000

Hmm. He does seem to have a serene disregard for the rule that sentences should start with a capital letter. On the other hand, he doesn't seem an up so floating many bells down kind of guy, at all ...

KS2011: Structured error logging

raven667 — Tue, 25 Oct 2011 19:21:26 +0000

Here is a crazy thought; is it possible at compile time to get a hash of the contents of the printk() call (before format string replacements) to generate a unique ID? That should give every unique version of a message a unique identifier that could be a key for searching and parsing. Other kinds of conventions or standards for the contents of a message, especially for multi-line messages, would be useful as well to simplify parsing but anything that requires massive changes all over the kernel is probably no good.
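
A rough sketch of the idea, in Python rather than as a kernel build step. The helper name message_id, the choice of SHA-1, and the 8-digit truncation are all assumptions made for illustration; nothing here is an existing kernel facility.

```python
import hashlib


def message_id(fmt: str, digits: int = 8) -> str:
    """Return a short hex ID derived from the unexpanded format string.

    Hypothetical helper: SHA-1 and the truncation length are arbitrary choices.
    """
    return hashlib.sha1(fmt.encode("utf-8")).hexdigest()[:digits]


# Only the format string is hashed, so every instance of this message
# gets the same ID no matter which arguments are substituted at run time.
fmt = "eth%d: link down after %u ms\n"

# A one-character change to the format string produces an unrelated ID.
edited = "eth%d: link went down after %u ms\n"

print(message_id(fmt), message_id(edited))
```

The ID is stable across arguments but not across edits to the message text, so a lookup table keyed on it would need regenerating whenever a format string is touched.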

KS2011: Structured error logging

dlang — Tue, 25 Oct 2011 17:04:49 +0000

no, and I'm not sure what you are referring to about my writing style.

KS2011: Structured error logging

erwbgy — Tue, 25 Oct 2011 16:53:39 +0000

I have often wondered. Do you intentionally write like e.e.cummings ? :-)

KS2011: Structured error logging

dlang — Tue, 25 Oct 2011 16:34:42 +0000

have you looked at the latest syslog RFC? I think it supports exactly what you are looking for.

however you need to note that the problematic formatting isn't done by syslog, but by the applications that are generating the log.

even in the old syslog RFC, the portions under the control of the syslog daemon are well defined and specified (although there is a LOT of stuff out there that violates these specs)

KS2011: Structured error logging

nix — Tue, 25 Oct 2011 10:02:08 +0000

This I think is proof that you need a proper parser rather than just regex matching. Regexps are not a parser, they are (the core of) a tokenizer.

KS2011: Structured error logging

nix — Tue, 25 Oct 2011 10:00:57 +0000

There is something else that should be done. The syslog network protocol should be changed (or, rather, a new one defined) that contains *both* the formatted message *and* its facility, priority, format string and args, cleanly separated, and syslog() tweaked to generate that. Then syslog consumers can do proper classification without needing to bother with all this UUID nonsense, nor with (as now) some horrific scheme involving analyzing large numbers of messages (or by-hand work) to figure out which bit of them is the format string and which bit is not.
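
A minimal sketch of what such a record could look like, assuming JSON as the wire encoding purely for illustration. The LogRecord class, its field names, and to_wire() are hypothetical and are not part of any syslog implementation or RFC.

```python
import json
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical record carrying the format string and args separately."""
    facility: str
    priority: str
    fmt: str
    args: tuple

    @property
    def message(self) -> str:
        # The human-readable text, produced by ordinary %-formatting.
        return self.fmt % self.args

    def to_wire(self) -> str:
        # Send both the formatted message and its raw ingredients.
        return json.dumps({
            "facility": self.facility,
            "priority": self.priority,
            "format": self.fmt,
            "args": list(self.args),
            "message": self.message,
        }, sort_keys=True)


rec = LogRecord("daemon", "err", "disk %s is %d%% full", ("sda1", 93))
decoded = json.loads(rec.to_wire())

# A consumer can classify on the format string without guessing which
# parts of the text were variable.
print(decoded["format"], decoded["args"])
```

With the format string available as its own field, grouping messages by type is a dictionary lookup rather than an inference over many samples.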

KS2011: Structured error logging

nix — Tue, 25 Oct 2011 09:55:31 +0000

Parsing log files is a hard problem - tools that do so generally turn into [Kay Sievers and Lennart Poettering] "regex horrors."

This is why syslog-ng has had a proper parser, driven by a database of log message formats, for many years now. No regex horrors needed. It's really quite nice (and vastly underused).

KS2011: Structured error logging

liljencrantz — Tue, 25 Oct 2011 09:35:40 +0000

Or "(\\.|[^\\"])*"
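
A quick check of this pattern using Python's re module, which is a backtracking engine and takes the first alternative that works rather than the longest overall match. The names FIXED, NAIVE, s1 and s2 are made up for this sketch, and the groups are written as non-capturing (?:...) for convenience.

```python
import re

# The pattern above: an escape pair, or any character that is neither a
# backslash nor a quote.
FIXED = re.compile(r'"(?:\\.|[^\\"])*"')

# The earlier pattern, where [^"] is also allowed to swallow a backslash.
NAIVE = re.compile(r'"(?:[^"]|\\.)*"')

s1 = r'"foo \"bar\" baz"'    # escaped quotes inside the string
s2 = r'"foo \"bar\\" baz"'   # escaped backslash, then the closing quote

# The fixed pattern consumes backslashes only in pairs with the next
# character, so it finds the real closing quote in both cases.
print(FIXED.match(s1).group(0))
print(FIXED.match(s2).group(0))

# In a first-match engine the earlier pattern stops at the first escaped
# quote, because [^"] has already consumed the backslash before it.
print(NAIVE.match(s1).group(0))
```

A POSIX leftmost-longest engine such as grep -E would behave differently on the earlier pattern, which is why the order of the alternatives and the exclusion of the backslash both matter here.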

KS2011: Structured error logging

iq-0 — Tue, 25 Oct 2011 08:53:08 +0000

Try this one: "(\\\\|\\[^\\]|[^\\"])*"

KS2011: Structured error logging

l0b0 — Tue, 25 Oct 2011 08:21:44 +0000

You also need to account for the fact that you might have an even or odd number of backslashes before the quote:

echo '"foo \"bar\" baz"' | grep -E '"([^"]|\\.)*"' # Succeeds
echo '"foo \"bar\\" baz"' | grep -E '"([^"]|\\.)*"' # Ouch, that's a literal backslash, not an escaped quote!

To fix it, we would need to check that any quotes are preceded by an *odd* number of backslashes: "([^"]|(?<=\\(\\\\)*)")*" Unfortunately this doesn't work with grep -P ("lookbehind assertion is not fixed length"). I don't know if any other regex engines support this.

KS2011: Structured error logging

liljencrantz — Tue, 25 Oct 2011 07:56:39 +0000

Minor nit: You can easily parse quoted strings using something like:

"([^"]|\\.)*"

This will work since regexps choose the longest matching string. Or am I missing something?

KS2011: Structured error logging

dlang — Tue, 25 Oct 2011 07:50:45 +0000

you can also botch rules for any structured format that you make as well

the fact that your program outputs bogus logs now is a problem with your program, not with the logging protocol.

KS2011: Structured error logging

Cyberax — Tue, 25 Oct 2011 07:08:15 +0000

So, let's see - a record from our code:
============
[Sat 26 Oct 2011] "Joe User" requested lmpp://myserver.dc=some.dc=com/service , "SUCCESS" has been returned.
============

How should we parse it? Well, let's start with the date. It can be parsed by regexps, but it's already some amount of code.

Then there's the user name. It can't be parsed by regexps at all (because of quoting, for example "Joe \"the mad\" User"). Then there's the URL, which also cannot be reliably parsed by regexps. And then finally the exit code, which luckily is just a pre-defined string.

It's _really_ _really_ easy to make a log unparseable accidentally. And given that quite a lot of log messages are printed only during exceptional/error conditions, you might not discover it until it's too late.

So something which just CAN NOT be misused is sorely needed.

KS2011: Structured error logging

dlang — Tue, 25 Oct 2011 03:19:59 +0000

since I don't think I was clear enough in my prior message

don't invent a new logging mechanism, just clean up the formatting of your logs

if the kernel could just adopt a rule along the lines of:

All new log messages would start at the beginning of a line, all continuations of a log message would start with whitespace

it would be a huge win.

this would be enough to let log daemons figure out what a complete log is, and from there log parsers can take it reasonably.

log parsers frequently devolve into regex hell, but they don't have to, and it doesn't require creating a new logging protocol to solve the problem.
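
a minimal sketch of how a log reader could reassemble records under that rule. the function name group_records and the choice to join continuations with a single space are assumptions for illustration, not how any particular log daemon works.

```python
from typing import Iterable, Iterator


def group_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one complete record per message.

    A line starting at column 0 opens a new record; a line starting with
    whitespace continues the current one. Blank lines are skipped.
    """
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line[0].isspace() and current:
            # Continuation: attach to the record that is being built.
            current.append(line.strip())
        else:
            # New record: flush the previous one first.
            if current:
                yield " ".join(current)
            current = [line.strip()]
    if current:
        yield " ".join(current)


sample = [
    "Oops: general protection fault\n",
    "  Call Trace:\n",
    "\tdo_something+0x10\n",
    "eth0: link up\n",
]
for record in group_records(sample):
    print(record)
```

the only state needed is the record currently being built, which is why a rule this small would already be enough for a daemon to find message boundaries.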

KS2011: Structured error logging

dlang — Tue, 25 Oct 2011 02:49:20 +0000

to paraphrase

those who do not understand syslog are doomed to reinvent it, poorly

there are some real issues with syslog (the lack of any ability to know where the log message _really_ came from on a system for example), but most of the problems that are attributed to syslog are really failures in sane log formatting on the part of the application generating the log message, and what makes anyone think that if you change to some new mechanism the application programmers will be consistent any more than they have in the past?

I am dealing with this at work as well where the application programmers declare syslog 'obsolete' and go on to create their own logging mechanism.

the end result after much effort?? someone finally writes a tool to get this custom log into syslog, and then things settle down.

KS2011: Structured error logging

Cyberax — Tue, 25 Oct 2011 02:34:38 +0000

I think that's one area where Windows did TheRightThing(tm). Messages are structured as XML with each message having a unique UUID (wonderful for event correlation).

XML might be replaced by something better, but the principle stands.