
KS2011: Structured error logging


Posted Oct 25, 2011 3:19 UTC (Tue) by dlang (guest, #313)
In reply to: KS2011: Structured error logging by dlang
Parent article: KS2011: Structured error logging

since I don't think I was clear enough in my prior message

don't invent a new logging mechanism, just clean up the formatting of your logs

if the kernel could just adopt a rule along the lines of:

All new log messages would start at the beginning of a line, all continuations of a log message would start with whitespace

it would be a huge win.

this would be enough to let log daemons figure out what a complete log message is, and from there log parsers can reasonably take over.

log parsers frequently devolve into regex hell, but they don't have to, and it doesn't require creating a new logging protocol to solve the problem.
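As a rough illustration of the rule above, here is a minimal Python sketch of how a log daemon could reassemble complete messages. It assumes every line that begins with whitespace continues the previous message. The function name `group_records` and the sample lines are made up for illustration.

```python
def group_records(lines):
    """Group raw log lines into complete messages.

    Assumes the convention proposed above: a new message starts in
    column 0, and any line starting with whitespace continues the
    previous message.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line[0].isspace() and records:
            records[-1] += "\n" + line  # continuation: append to the current message
        else:
            records.append(line)  # column 0 (or orphan continuation): new message
    return records


sample = [
    "driver foo: probe failed\n",
    "  reason: timeout\n",
    "  retrying in 5s\n",
    "driver bar: ready\n",
]
for record in group_records(sample):
    print(repr(record))
```

The daemon needs no knowledge of the message contents. It only looks at the first character of each line, which is what makes the rule cheap to adopt.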



KS2011: Structured error logging

Posted Oct 25, 2011 7:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

So, let's see - a record from our code:
============
[Sat 26 Oct 2011] "Joe User" requested lmpp://myserver.dc=some.dc=com/service , "SUCCESS" has been returned.
============

How should we parse it? Well, let's start with the date. It can be parsed with regexps, but that already takes a fair amount of code.

Then there's the user name. It can't be parsed by regexps at all (because of quoting, for example "Joe \"the mad\" User"). Then there's the URL, which also cannot be reliably parsed by regexps. And finally there's the exit code, which luckily is just a predefined string.

It's _really_ _really_ easy to make a log unparseable by accident. And given that quite a lot of log messages are printed only during exceptional or error conditions, you might not discover the problem until it's too late.

So something which just CAN NOT be misused is sorely needed.
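One way to get output that awkward field values cannot break is to let a serialization library do all the quoting. Below is a minimal sketch in Python. It assumes JSON is an acceptable record format, and `format_record` and its field names are hypothetical, not part of any existing logging API.

```python
import json


def format_record(date, user, url, status):
    """Serialize one log record as a single JSON line.

    json.dumps escapes quotes, backslashes and control characters
    (including newlines), so a field value cannot break the record
    structure or spill onto a second line.
    """
    return json.dumps(
        {"date": date, "user": user, "url": url, "status": status},
        sort_keys=True,
    )


line = format_record(
    "Sat 26 Oct 2011",
    'Joe "the mad" User',
    "lmpp://myserver.dc=some.dc=com/service",
    "SUCCESS",
)
print(line)
# Reading it back is one library call, with no regexps involved.
assert json.loads(line)["user"] == 'Joe "the mad" User'
```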

KS2011: Structured error logging

Posted Oct 25, 2011 7:50 UTC (Tue) by dlang (guest, #313)

you can botch the rules for any structured format you invent, too

the fact that your program outputs bogus logs now is a problem with your program, not with the logging protocol.

KS2011: Structured error logging

Posted Oct 25, 2011 7:56 UTC (Tue) by liljencrantz (guest, #28458)

Minor nit: You can easily parse quoted strings using something like:
"([^"]|\\.)*"
This will work since regexps choose the longest matching string. Or am I missing something?

KS2011: Structured error logging

Posted Oct 25, 2011 8:21 UTC (Tue) by l0b0 (guest, #80670)

You also need to account for the fact that you might have an even or odd number of backslashes before the quote:
printf '%s\n' '"foo \"bar\" baz"' | grep -oE '"([^"]|\\.)*"' # Matches the whole quoted string
printf '%s\n' '"foo \"bar\\" baz"' | grep -oE '"([^"]|\\.)*"' # Ouch, also matches the whole line, but that \\ is a literal backslash, not an escaped quote!
To fix it, we would need to check that any quote inside the string is preceded by an *odd* number of backslashes:
"([^"]|(?<=\\(\\\\)*)")*"
Unfortunately this doesn't work with grep -P ("lookbehind assertion is not fixed length"). I don't know if any other regex engines support this.

KS2011: Structured error logging

Posted Oct 25, 2011 8:53 UTC (Tue) by iq-0 (subscriber, #36655)

Try this one: "(\\\\|\\[^\\]|[^\\"])*"

KS2011: Structured error logging

Posted Oct 25, 2011 9:35 UTC (Tue) by liljencrantz (guest, #28458)

Or "(\\.|[^\\"])*"
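The difference between the pattern from the first reply and the shorter one just above can be checked quickly in Python. Note that Python's `re` is a backtracking engine that returns the first match it finds rather than the longest one, so the first pattern goes wrong here even on the simple case. The sample strings are made up.

```python
import re

# First attempt: [^"] also matches a backslash, so escapes are not paired up reliably.
naive = re.compile(r'"([^"]|\\.)*"')
# Later version: a backslash is only ever consumed together with the character after it.
fixed = re.compile(r'"(\\.|[^\\"])*"')

escaped_quotes = r'"foo \"bar\" baz"'      # two escaped quotes inside the string
literal_backslash = r'"foo \"bar\\" baz"'  # \\ is a literal backslash; the next quote ends the string

print(naive.match(escaped_quotes).group(0))     # stops at the first escaped quote
print(fixed.match(escaped_quotes).group(0))     # the whole quoted string
print(fixed.match(literal_backslash).group(0))  # ends right after the literal backslash
```

The fix works because `[^\\"]` excludes the backslash, so every backslash must be consumed by `\\.` together with the character after it, and the two can never get out of step.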

KS2011: Structured error logging

Posted Oct 25, 2011 10:02 UTC (Tue) by nix (subscriber, #2304)

This, I think, is proof that you need a proper parser rather than just regex matching. Regexps are not a parser; they are (the core of) a tokenizer.
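To make the tokenizer-plus-parser idea concrete, here is a minimal Python sketch for the record format from earlier in the thread. It rests on three assumptions: a backslash escapes any character inside double quotes, the URL contains no whitespace, and the record layout is fixed. `tokenize`, `read_quoted` and `parse_record` are hypothetical helpers, not an existing library.

```python
def read_quoted(text, i):
    """Read a double-quoted string starting at text[i] == '"'.

    Returns (unescaped value, index just past the closing quote).
    A backslash escapes whatever character follows it.
    """
    out = []
    i += 1  # skip the opening quote
    while i < len(text):
        c = text[i]
        if c == "\\":
            if i + 1 == len(text):
                raise ValueError("dangling backslash")
            out.append(text[i + 1])
            i += 2
        elif c == '"':
            return "".join(out), i + 1
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated quoted string")


def tokenize(text):
    """Split a record into ('bracket' | 'string' | 'word', value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c == "[":
            end = text.find("]", i)
            if end < 0:
                raise ValueError("unterminated [")
            tokens.append(("bracket", text[i + 1:end]))
            i = end + 1
        elif c == '"':
            value, i = read_quoted(text, i)
            tokens.append(("string", value))
        else:
            start = i
            while i < len(text) and not text[i].isspace():
                i += 1
            tokens.append(("word", text[start:i]))
    return tokens


def parse_record(text):
    """Parse: [DATE] "USER" requested URL , "STATUS" has been returned."""
    t = tokenize(text)
    if (len(t) < 6 or t[0][0] != "bracket" or t[1][0] != "string"
            or t[2] != ("word", "requested") or t[3][0] != "word"
            or t[4] != ("word", ",") or t[5][0] != "string"):
        raise ValueError("record does not match the expected layout")
    return {"date": t[0][1], "user": t[1][1], "url": t[3][1], "status": t[5][1]}


record = r'[Sat 26 Oct 2011] "Joe \"the mad\" User" requested lmpp://myserver.dc=some.dc=com/service , "SUCCESS" has been returned.'
print(parse_record(record))
```

The tokenizer handles the lexical details (quoting, escapes), while the parser only checks the order of tokens. That split is exactly the one a single regexp tries to do all at once.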

KS2011: Structured error logging

Posted Oct 25, 2011 10:00 UTC (Tue) by nix (subscriber, #2304)

There is something else that should be done. The syslog network protocol should be changed (or, rather, a new one defined) so that it carries *both* the formatted message *and* its facility, priority, format string and args, cleanly separated, and syslog() should be tweaked to generate that. Then syslog consumers can do proper classification without needing to bother with all this UUID nonsense, or with (as now) some horrific scheme of analyzing large numbers of messages (or doing it by hand) to figure out which part of each message is the format string and which part is not.
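Here is a rough Python sketch of the kind of record nix describes. The format string and its arguments travel alongside the rendered text, so a consumer can group messages by format string without UUIDs or guesswork. The `LogRecord` class is hypothetical, the numeric facility and severity codes follow RFC 5424, and Python's `%` operator stands in for C's printf.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    facility: int   # RFC 5424 facility code, e.g. 4 = security/authorization
    severity: int   # RFC 5424 severity code, e.g. 4 = warning, 6 = informational
    fmt: str        # the format string, kept verbatim
    args: tuple     # the arguments, kept separately from the format string

    @property
    def message(self) -> str:
        # The human-readable text is still available, rendered on demand.
        return self.fmt % self.args


records = [
    LogRecord(4, 4, "login failed for user %s from %s", ("joe", "10.0.0.1")),
    LogRecord(4, 4, "login failed for user %s from %s", ("ann", "10.0.0.2")),
    LogRecord(3, 6, "service %s started", ("httpd",)),
]

# Classification is an exact match on the format string; no need to guess
# which part of the rendered text is constant and which part is variable.
by_format = Counter(r.fmt for r in records)
print(by_format.most_common(1))
print(records[0].message)
```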

KS2011: Structured error logging

Posted Oct 25, 2011 16:34 UTC (Tue) by dlang (guest, #313)

have you looked at the latest syslog RFC (RFC 5424)? I think it supports exactly what you are looking for.

however, note that the problematic formatting isn't done by syslog, but by the applications that generate the log.

even in the old syslog RFC (RFC 3164), the portions under the control of the syslog daemon are well defined and specified (although there is a LOT of stuff out there that violates these specs)
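RFC 5424 lets this kind of data ride along in a STRUCTURED-DATA element next to the free-text MSG. Below is a minimal sketch of building one such line in Python. The SD-ID `fmt@32473` and its parameter names are hypothetical (32473 is the example enterprise number used in the RFC's own examples). PROCID and MSGID are sent as the nil value `-`, and the optional UTF-8 BOM is left out.

```python
def sd_escape(value: str) -> str:
    """Escape a PARAM-VALUE: RFC 5424 requires '"', '\\' and ']' to be backslash-escaped."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def rfc5424_line(facility, severity, timestamp, hostname, app_name,
                 fmt, args, sd_id="fmt@32473"):
    """Build one RFC 5424 syslog line carrying the format string and args as structured data.

    PROCID and MSGID are sent as the nil value "-".
    """
    pri = facility * 8 + severity  # PRI = facility * 8 + severity
    params = [f'fmt="{sd_escape(fmt)}"']
    params += [f'arg{n}="{sd_escape(str(a))}"' for n, a in enumerate(args, 1)]
    structured_data = "[" + sd_id + " " + " ".join(params) + "]"
    msg = fmt % tuple(args)  # the usual human-readable text
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - {structured_data} {msg}"


print(rfc5424_line(4, 5, "2011-10-25T16:34:00Z", "host1", "sshd",
                   "login failed for user %s", ("joe",)))
```

A receiver that understands the structured data can classify on `fmt` directly, while one that does not still gets the ordinary rendered message at the end of the line.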

KS2011: Structured error logging

Posted Oct 26, 2011 14:01 UTC (Wed) by nix (subscriber, #2304)

That's why I said syslog() needed to be tweaked. Not syslogd, the syslog() function in libc.

KS2011: Structured error logging

Posted Oct 27, 2011 1:24 UTC (Thu) by dlang (guest, #313)

Ok, that makes sense.

KS2011: Structured error logging

Posted Oct 25, 2011 16:53 UTC (Tue) by erwbgy (subscriber, #4104)

I have often wondered: do you intentionally write like e. e. cummings? :-)

KS2011: Structured error logging

Posted Oct 25, 2011 17:04 UTC (Tue) by dlang (guest, #313)

no, and I'm not sure what you are referring to about my writing style.

KS2011: Structured error logging

Posted Oct 26, 2011 13:56 UTC (Wed) by sorpigal (subscriber, #36106)

e. e. cummings is famous for using only lower-case characters

KS2011: Structured error logging

Posted Oct 27, 2011 1:19 UTC (Thu) by dlang (guest, #313)

Ahh, nothing deliberate, just pure laziness. I do tend to be lazy about capitalisation except when writing formally.

KS2011: Structured error logging

Posted Oct 26, 2011 6:56 UTC (Wed) by sdalley (subscriber, #18550)

Hmm. He does seem to have a serene disregard for the rule that sentences should start with a capital letter. On the other hand, he doesn't seem an up so floating many bells down kind of guy, at all ...


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds