LWN: Comments on "KS2011: Structured error logging" https://lwn.net/Articles/464276/ This is a special feed containing comments posted to the individual LWN article titled "KS2011: Structured error logging". en-us Thu, 11 Sep 2025 23:59:25 +0000 Thu, 11 Sep 2025 23:59:25 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net KS2011: Structured error logging https://lwn.net/Articles/465530/ https://lwn.net/Articles/465530/ renox <div class="FormattedComment"> <font class="QuotedText">&gt; But then, I think that nowadays every sysadmin has some basic understanding of English.</font><br> <p> Totally wrong, just think about the use of Linux on desktops where each user is also a "sysadmin".<br> <p> </div> Thu, 03 Nov 2011 13:52:35 +0000 KS2011: Structured error logging https://lwn.net/Articles/465143/ https://lwn.net/Articles/465143/ nix <div class="FormattedComment"> But standardizing things before widespread implementation experience is available is a recipe for awful standards. You *must* implement first, or you end up with the export keyword all over again.<br> </div> Tue, 01 Nov 2011 14:32:03 +0000 KS2011: Structured error logging https://lwn.net/Articles/465107/ https://lwn.net/Articles/465107/ jcm <div class="FormattedComment"> I'm fond of my 1980s technology, but I see the point here. Still, I would hope any such effort is properly standardized between distributions. If there's an opportunity for larger technology standardization between Linux and Unix-like systems out there, that should be done too. This kind of thing should not be brought to the attention of industry standards bodies after it's been done.<br> </div> Tue, 01 Nov 2011 05:04:53 +0000 KS2011: Structured error logging https://lwn.net/Articles/464681/ https://lwn.net/Articles/464681/ Felix.Braun <div class="FormattedComment"> The thought of translating error messages was also floated. That would obviously change the hash value. 
But then, I think that nowadays every sysadmin has some basic understanding of English. So I'm not so sure translation would even be necessary. <br> </div> Thu, 27 Oct 2011 11:10:40 +0000 KS2011: Structured error logging https://lwn.net/Articles/464654/ https://lwn.net/Articles/464654/ epierre <div class="FormattedComment"> If you change the format string to either correct a typo or add a little more information, the hash value changes, which makes the identifier very unstable.<br> </div> Thu, 27 Oct 2011 07:18:08 +0000 KS2011: Structured error logging https://lwn.net/Articles/464638/ https://lwn.net/Articles/464638/ dlang <div class="FormattedComment"> Ok, that makes sense.<br> </div> Thu, 27 Oct 2011 01:24:19 +0000 KS2011: Structured error logging https://lwn.net/Articles/464636/ https://lwn.net/Articles/464636/ dlang <div class="FormattedComment"> Ahh, Nothing deliberate, just pure laziness. I do tend to be lazy about capitalisation except when writing formally.<br> </div> Thu, 27 Oct 2011 01:19:01 +0000 KS2011: Structured error logging https://lwn.net/Articles/464534/ https://lwn.net/Articles/464534/ nix <div class="FormattedComment"> That's why I said syslog() needed to be tweaked. Not syslogd, the syslog() function in libc.<br> </div> Wed, 26 Oct 2011 14:01:58 +0000 KS2011: Structured error logging https://lwn.net/Articles/464532/ https://lwn.net/Articles/464532/ sorpigal <div class="FormattedComment"> e. e. cummings is famous for using only lower-case characters<br> </div> Wed, 26 Oct 2011 13:56:33 +0000 KS2011: Structured error logging https://lwn.net/Articles/464496/ https://lwn.net/Articles/464496/ sdalley Hmm. He does seem to have a serene disregard for the rule that sentences should start with a capital letter. On the other hand, he doesn't seem an <a href="http://www.poets.org/viewmedia.php/prmMID/15403"> up so floating many bells down</a> kind of guy, at all ... 
Wed, 26 Oct 2011 06:56:02 +0000 KS2011: Structured error logging https://lwn.net/Articles/464452/ https://lwn.net/Articles/464452/ raven667 <div class="FormattedComment"> Here is a crazy thought; is it possible at compile time to get a hash of the contents of the printk() call (before format string replacements) to generate a unique ID? That should give every unique version of a message a unique identifier that could be a key for searching and parsing. Other kinds of conventions or standards for the contents of a message, especially for multi-line messages, would be useful as well to simplify parsing but anything that requires massive changes all over the kernel is probably no good.<br> </div> Tue, 25 Oct 2011 19:21:26 +0000 KS2011: Structured error logging https://lwn.net/Articles/464445/ https://lwn.net/Articles/464445/ dlang <div class="FormattedComment"> no, and I'm not sure what you are referring to about my writing style.<br> </div> Tue, 25 Oct 2011 17:04:49 +0000 KS2011: Structured error logging https://lwn.net/Articles/464442/ https://lwn.net/Articles/464442/ erwbgy <div class="FormattedComment"> I have often wondered. Do you intentionally write like e.e.cummings ? :-)<br> </div> Tue, 25 Oct 2011 16:53:39 +0000 KS2011: Structured error logging https://lwn.net/Articles/464436/ https://lwn.net/Articles/464436/ dlang <div class="FormattedComment"> have you looked at the latest syslog RFC? 
I think it supports exactly what you are looking for.<br> <p> however you need to note that the problematic formatting isn't done by syslog, but by the applications that are generating the log.<br> <p> even in the old syslog RFC, the portions under the control of the syslog daemon are well defined and specified (although there is a LOT of stuff out there that violates these specs)<br> </div> Tue, 25 Oct 2011 16:34:42 +0000 KS2011: Structured error logging https://lwn.net/Articles/464384/ https://lwn.net/Articles/464384/ nix <div class="FormattedComment"> This I think is proof that you need a proper parser rather than just regex matching. Regexps are not a parser, they are (the core of) a tokenizer.<br> <p> </div> Tue, 25 Oct 2011 10:02:08 +0000 KS2011: Structured error logging https://lwn.net/Articles/464383/ https://lwn.net/Articles/464383/ nix <div class="FormattedComment"> There is something else that should be done. The syslog network protocol should be changed (or, rather, a new one defined) that contains *both* the formatted message *and* its facility, priority, format string and args, cleanly separated, and syslog() tweaked to generate that. Then syslog consumers can do proper classification without needing to bother with all this UUID nonsense, nor with (as now) some horrific scheme involving analyzing large numbers of messages (or by-hand work) to figure out which bit of them is the format string and which bit is not.<br> <p> </div> Tue, 25 Oct 2011 10:00:57 +0000 KS2011: Structured error logging https://lwn.net/Articles/464382/ https://lwn.net/Articles/464382/ nix <blockquote> Parsing log files is a hard problem - tools that do so generally turn into [Kay Sievers and Lennart Poettering] "regex horrors." </blockquote> This is why syslog-ng has had a proper parser, driven by a database of log message formats, for many years now. No regex horrors needed. It's really quite nice (and vastly underused). 
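A minimal sketch of the idea raised in this thread: carry the format string and its arguments alongside the formatted message, so that a consumer can classify records without regex matching. The `LogRecord` type, its field names, and `count_by_format` are hypothetical illustrations; they are not part of any syslog RFC or of syslog-ng.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical structured record; field names are illustrative only."""
    facility: str
    priority: int
    fmt: str     # printf-style format string, kept unexpanded
    args: tuple  # arguments that would be substituted into fmt

    @property
    def message(self) -> str:
        # The human-readable text is derived from fmt and args,
        # so it never has to be parsed back apart.
        return self.fmt % self.args


def count_by_format(records):
    """Group records by format string instead of pattern-matching the text."""
    return Counter(r.fmt for r in records)


records = [
    LogRecord("auth", 6, "user %s logged in from %s", ("joe", "10.0.0.1")),
    LogRecord("auth", 6, "user %s logged in from %s", ("ann", "10.0.0.2")),
    LogRecord("kern", 3, "disk %s: I/O error", ("sda",)),
]
```

Under these assumptions, two logins by different users share one classification key (the format string) even though their formatted text differs.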
Tue, 25 Oct 2011 09:55:31 +0000 KS2011: Structured error logging https://lwn.net/Articles/464378/ https://lwn.net/Articles/464378/ liljencrantz <div class="FormattedComment"> Or "(\\.|[^\\"])*"<br> </div> Tue, 25 Oct 2011 09:35:40 +0000 KS2011: Structured error logging https://lwn.net/Articles/464371/ https://lwn.net/Articles/464371/ iq-0 <div class="FormattedComment"> Try this one: "(\\\\|\\[^\\]|[^\\"])*"<br> </div> Tue, 25 Oct 2011 08:53:08 +0000 KS2011: Structured error logging https://lwn.net/Articles/464361/ https://lwn.net/Articles/464361/ l0b0 You also need to account for the fact that you might have an even or odd number of backslashes before the quote: <code><pre>echo '"foo \"bar\" baz"' | grep -E '"([^"]|\\.)*"' # Succeeds
echo '"foo \"bar\\" baz"' | grep -E '"([^"]|\\.)*"' # Ouch, that's a literal backslash, not an escaped quote!</pre></code> To fix it, we would need to check that any quotes are preceded by an *odd* number of backslashes: <code><pre>"([^"]|(?<=\\(\\\\)*)")*"</pre></code> Unfortunately this doesn't work with <code>grep -P</code> ("lookbehind assertion is not fixed length"). I don't know if any other regex engines support this. Tue, 25 Oct 2011 08:21:44 +0000 KS2011: Structured error logging https://lwn.net/Articles/464357/ https://lwn.net/Articles/464357/ liljencrantz Minor nit: You can easily parse quoted strings using something like: <pre> "([^"]|\\.)*" </pre> This will work since regexps choose the <em>longest</em> matching string. Or am I missing something? 
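The quoted-string patterns traded in this thread can be compared with a short sketch. It uses Python's `re` module, which is an assumption of this example (the thread itself uses `grep`); `re` is a backtracking engine that returns the first match it finds, not the POSIX leftmost-longest match, so the "longest match" reasoning does not carry over. The names `STRICT`, `LOOSE`, and `first_quoted` are hypothetical.

```python
import re

# Pattern from the thread that tries the escape alternative first and excludes
# backslashes from the ordinary-character class, so every character can be
# consumed in exactly one way.
STRICT = re.compile(r'"(\\.|[^\\"])*"')

# The earlier pattern, in which a backslash can also match as an ordinary
# character.
LOOSE = re.compile(r'"([^"]|\\.)*"')


def first_quoted(pattern, text):
    """Return the first quoted string found by `pattern`, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


# The two test inputs from the thread, written as raw strings.
escaped_quotes = r'"foo \"bar\" baz"'
escaped_backslash = r'"foo \"bar\\" baz"'

print(first_quoted(STRICT, escaped_quotes))     # whole input
print(first_quoted(STRICT, escaped_backslash))  # stops after the escaped backslash
print(first_quoted(LOOSE, escaped_quotes))      # stops at the first escaped quote
```

Because `STRICT` leaves no ambiguity about how a backslash is consumed, its result does not depend on whether the engine prefers the first or the longest match.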
Tue, 25 Oct 2011 07:56:39 +0000 KS2011: Structured error logging https://lwn.net/Articles/464355/ https://lwn.net/Articles/464355/ dlang <div class="FormattedComment"> you can also botch rules for any structured format that you make as well<br> <p> the fact that your program outputs bogus logs now is a problem with your program, not with the logging protocol.<br> </div> Tue, 25 Oct 2011 07:50:45 +0000 KS2011: Structured error logging https://lwn.net/Articles/464348/ https://lwn.net/Articles/464348/ Cyberax <div class="FormattedComment"> So, let's see - a record from our code:<br> ============<br> [Sat 26 Oct 2011] "Joe User" requested lmpp://myserver.dc=some.dc=com/service , "SUCCESS" has been returned.<br> ============<br> <p> How should we parse it? Well, let's start with the date. It can be parsed by regexps, but it's already some amount of code.<br> <p> Then there's user name. It can't be parsed by regexps at all (because of quoting, for example "Joe \"the mad\" User"). Then there's URL, which also can not be reliably parsed by regexps. And then finally the exit code which luckily is just a pre-defined string.<br> <p> It's _really_ _really_ easy to make log unparseable accidentally. 
And given that quite a lot of log messages are printed only during exceptional/error conditions you might not discover it until it's too late.<br> <p> So something which just CAN NOT be misused is sorely needed.<br> </div> Tue, 25 Oct 2011 07:08:15 +0000 KS2011: Structured error logging https://lwn.net/Articles/464339/ https://lwn.net/Articles/464339/ dlang <div class="FormattedComment"> since I don't think I was clear enough in my prior message<br> <p> don't invent a new logging mechanism, just clean up the formatting of your logs<br> <p> if the kernel could just adopt a rule along the lines of:<br> <p> All new log messages would start at the beginning of a line, all continuations of a log message would start with whitespace<br> <p> it would be a huge win.<br> <p> this would be enough to let log daemons figure out what a complete log is, and from there log parsers can take it reasonably.<br> <p> log parsers frequently devolve into regex hell, but they don't have to, and it doesn't require creating a new logging protocol to solve the problem.<br> </div> Tue, 25 Oct 2011 03:19:59 +0000 KS2011: Structured error logging https://lwn.net/Articles/464338/ https://lwn.net/Articles/464338/ dlang <div class="FormattedComment"> to paraphrase<br> <p> those who do not understand syslog are doomed to reinvent it, poorly<br> <p> there are some real issues with syslog (the lack of any ability to know where the log message _really_ came from on a system for example), but most of the problems that are attributed to syslog are really failures in sane log formatting on the part of the application generating the log message, and what makes anyone think that if you change to some new mechanism the application programmers will be consistent any more than they have in the past?<br> <p> I am dealing with this at work as well where the application programmers declare syslog 'obsolete' and go on to create their own logging mechanism.<br> <p> the end result after much effort?? 
someone finally writes a tool to get this custom log into syslog, and then things settle down.<br> </div> Tue, 25 Oct 2011 02:49:20 +0000 KS2011: Structured error logging https://lwn.net/Articles/464337/ https://lwn.net/Articles/464337/ Cyberax <div class="FormattedComment"> I think that's one area where Windows did TheRightThing(tm). Messages are structured as XML with each message having a unique UUID (wonderful for event correlation).<br> <p> XML might be replaced by something better, but the principle stands.<br> </div> Tue, 25 Oct 2011 02:34:38 +0000
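As an alternative to hand-assigned UUIDs, the thread also floats deriving a message identifier from a hash of the unexpanded format string. A minimal sketch of that idea follows, assuming SHA-1 truncated to 8 hex digits; the hash choice, the length, the sample messages, and the helper name `message_id` are all illustrative and not taken from any kernel patch.

```python
import hashlib


def message_id(fmt: str) -> str:
    """Hypothetical helper: derive an ID from the unexpanded format string."""
    return hashlib.sha1(fmt.encode("utf-8")).hexdigest()[:8]


original = "eth%d: link is down"
typo_fixed = "eth%d: link is down."  # same message after a cosmetic edit

# The ID ignores the runtime arguments, so every instance of a message
# shares one identifier ...
same_message = message_id(original) == message_id("eth%d: link is down")

# ... but any edit to the format string yields a different identifier,
# which is the instability objection raised in the thread.
survives_edit = message_id(original) == message_id(typo_fixed)
```

The trade-off is visible directly: hashing needs no registry of identifiers, but the identifier is only as stable as the wording of the message.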