LWN.net Logo

Various tweaks to printk()

By Jonathan Corbet
May 16, 2012
For the most part, the logging reliability patches covered here in April have been quietly stabilizing and appear to be set for merging for 3.5. But printk() is a heavily-used function, so there are a lot of people with strong opinions on how it should work. Thus the discussion on how printk() can be improved has stretched out for some time. The result, so far, is a better understanding of how continuation lines should be handled and, possibly, a new format for timestamps.

Messages are sent to the system log with printk(), but that function has an interesting bit of historical behavior: like printf() in user space, printk() can be used to send partial lines to the log. Multiple printk() calls can be used to produce a single line in the log stream, piece by piece. The patches for 3.5 make printk() much more record-oriented internally, but the API does not change. So there is a bit of an impedance mismatch between a record-oriented logging system and its stream-oriented API. That mismatch has been there since the beginning, but it has become more clear over time.

The mixed nature of kernel logging leads to a bit of an ambiguity, because any message can be either of two things: (1) a new message to be logged or (2) a continuation of a previous log message. The kernel decides which of the two situations holds by remembering whether the previous log message ended with a newline or not. If there was no trailing newline, a new message will be appended to the previous line.

This approach works much of the time, but it is not without its hazards. In particular, there is nothing that guarantees that two successive printk() calls will be executed one right after the other. Even on a uniprocessor system, interrupt handlers can emit messages between two printk() calls that are supposed to produce a single line of output. Adding more processors to the system clearly makes the situation worse; there is only one log buffer containing messages from all processors, so it is easy for one processor to jump into the middle of a sequence of printk() calls being executed on another. What happens then is not especially pretty: messages get mashed together and corrupted. The result is a log that is harder for humans to read, and which can totally confuse automated log-processing tools.

This patch set was supposed to be about increasing logging reliability, so that sort of message corruption is not welcome. The original plan devised by developer Kay Sievers was to require an explicit KERN_CONT "log level" marker for continuations. In this scheme, every printk() call will generate a new log line unless merging has been explicitly requested with the KERN_CONT "log level." There is a little problem in that most continuation lines are not so-marked in current kernels, leading to lines being split up; Kay's plan was to audit the kernel and fix all of those calls to work properly in the new scheme.

Linus didn't like that idea, saying that things work well as they are now; to him, adding all those KERN_CONT markers just represented unnecessary noise. After some back-and-forth, Kay came around to Linus's point of view, but he still wanted to avoid the corruption of messages whenever possible. The result was a new patch that tries to explicitly remember partial printk() calls and associate them with a specific process. Lines passed to printk() will be merged only if they both come from the same process and only if the second line is clearly not the start of a new log message. The end result is not perfect: if two processors try to output partial lines at the same time, at least one of them will be split. But there will be no more joining of unrelated messages, and that seems like a good thing.

A different branch of the same discussion got into the formatting of timestamps, which will always be present in the new scheme. In current kernels, that timestamp comes in the form of seconds and microseconds since the system booted. But what developers often really want to see is some combination of the absolute time of an event and the relative time from previous events. After some discussion with Sasha Levin, Linus requested a format that looks like this:

    [May12 11:27] foo
    [May12 11:28] bar
    [  +5.077527] zoot
    [ +10.235225] foo
    [  +0.002971] bar
    [May12 11:29] zoot
    [  +0.003081] foo

In other words, events that are relatively far apart in time would be marked with the absolute time with one-minute precision. When things happen more closely in time, the elapsed time between successive events would be printed instead. For any driver developer trying to figure out the relative timing of device-related events, this kind of output format would help to save a lot of mental arithmetic.

The patches to produce this format have not yet been posted, so it is looking likely that we will not see it in the 3.5 kernel. The rest of the logging work should be there for 3.5, though, taking Linux one small step closer to the sort of structured and reliable logging that many users and developers would like to see.


(Log in to post comments)

More precision, pretty please?

Posted May 17, 2012 11:58 UTC (Thu) by renox (subscriber, #23785) [Link]

While the format is nice for the relative events, I disagree about the precision of the 'absolute' events: they should also go to millisecond precision: when you want to correlate something which happened in your application with the kernel logs, you need this.

More precision, pretty please?

Posted May 17, 2012 15:30 UTC (Thu) by vrfy (subscriber, #13362) [Link]

Things that need the raw data should use the /dev/kmsg interface, not the human readable deltas in the text output of /proc/kmsgm, syslog(), or the console.

The nice thing with the new infrastructure is that all this is only calculated during the output, and not during the recording of the data. So the real data is all available and not affected by any of the timestamp output options which might get added.

More precision, pretty please?

Posted May 17, 2012 18:56 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

this requires that you know ahead of time that you are going to have a problem that you need to analyze.

the vast majority of the time you only find out afterwords, and as a result, you are stuck with whatever data was written to log files (i.e. syslog)

More precision, pretty please?

Posted May 17, 2012 23:16 UTC (Thu) by vrfy (subscriber, #13362) [Link]

Sure, if you are stuck in the 80's with plain old syslog, you get plain old syslog. :)

More precision, pretty please?

Posted May 18, 2012 0:03 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

what form or logging do you have on your system that is storing the raw data somewhere?

by the way, syslog today is about as different from 80's syslog as linux today is from linux in the early 90's. it's backwards compatible so most people don't have to deal with the differences, but there are vast new capabilities

More precision, pretty please?

Posted May 18, 2012 2:01 UTC (Fri) by vrfy (subscriber, #13362) [Link]

$ journalctl -o export

__CURSOR=s=b9d82f2af1474b40a9071646c5fb4e92;i=3de;b=17c0ad3537824db790049ffa3960b690;m=656e3d;t=4c03f10a04953;x=16f15d45d2a7d49e;p=system.journal
__REALTIME_TIMESTAMP=1337277001255251
__MONOTONIC_TIMESTAMP=6647357
_BOOT_ID=17c0ad3537824db790049ffa3960b690
_SOURCE_MONOTONIC_TIMESTAMP=6766042
_TRANSPORT=kernel
PRIORITY=6
SYSLOG_IDENTIFIER=kernel
MESSAGE=[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
_MACHINE_ID=ab0fa8cb1d428ef153d49b2a8c7c18cc
_HOSTNAME=mop

More precision, pretty please?

Posted May 17, 2012 22:42 UTC (Thu) by mlawren (subscriber, #10136) [Link]

It is also a shame about the named month abbreviations, in US order. How about a day-of-year: [139 10:58] foo or iso-style year-month-day: [20120518 10:58] foo

More precision, pretty please?

Posted May 18, 2012 3:45 UTC (Fri) by xanni (subscriber, #361) [Link]

Yeah, culturally-specific dates are not a great idea. ISO8601 would be much better.

More precision, pretty please?

Posted May 24, 2012 15:56 UTC (Thu) by ThomasBellman (subscriber, #67902) [Link]

Just spare us the "T" between date and time, and give us a space instead.

More precision, pretty please?

Posted May 26, 2012 19:23 UTC (Sat) by job (guest, #670) [Link]

Yes! Most significant date first and the rest in descending order, please.

All other date formats are for retarded monkeys (replace with favorite name calling) only.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds