Reimplementing printk()

By Jake Edge
February 26, 2019

The venerable printk() function has been part of Linux since the very beginning, though it has undergone a fair number of changes along the way. Now, John Ogness is proposing to fundamentally rework printk() in order to get rid of handful of issues that currently plague it. The proposed code does this by adding yet another ring-buffer implementation to the kernel; this one is aimed at making printk() work better from hard-to-handle contexts. For a task that seems conceptually simple—printing messages to the console—printk() is actually a rather complex beast; that won't change if these patches are merged, though many of the problems with the current implementation will be removed.

In the cover letter of his RFC patch set, Ogness lays out seven problems that he sees with the current printk() implementation. The buffer used by printk() is protected by a raw spinlock, which restricts the contexts from which the buffer can be accessed. Calling printk() from a non-maskable interrupt (NMI) or a recursive context, where something called from printk() also calls printk(), currently means that the logging of the message is deferred, which could cause the message to be lost. Printing to slow consoles can result in large latencies, a printk() call may end up taking unbounded time while other deferred messages are printed ahead of the one the caller actually wanted to print.

Two other problems are identified by Ogness. Timestamps on the messages are added when a message is added to the buffer but, due to deferrals, that can happen well after the printk() call was made. While that behavior has the side effect of nicely sorting the messages in terms of time, it is "neither accurate nor reliable". In addition, because printk() tries to satisfy all of its users, it ends up compromising too much:

Loglevel INFO is handled the same as ERR. There seems to be an endless effort to get printk to show _all_ messages as quickly as possible in case of a panic (i.e. printing from any context), but at the same time try not to have printk be too intrusive for the callers. These are conflicting requirements that lead to a printk implementation that does a sub-optimal job of satisfying both sides.

In order to fix those problems, he is proposing the addition of a kernel-internal printk() ring buffer that allows for multiple lockless readers; writers use a per-CPU synchronization mechanism that works in any context. This ring-buffer implementation was inspired by a suggestion from Peter Zijlstra, Ogness said. The actual writing of the messages is moved to a dedicated kernel thread, which is preemptible, so the latencies can be bounded. The new ring buffer will be allocated in the initialized data segment of the kernel, so it will be available early in the boot process, even before the memory-management subsystem is available. Timestamps will be generated early in the printk() function.

Beyond that, a new kind of "emergency message" is defined. Those messages will appear on certain consoles immediately; no waiting for any other printk() messages that are queued up for the kernel thread. In order to participate, consoles will need to implement the new write_atomic() operation. The patch set includes an implementation of write_atomic() for the 8250 UART driver. There is a new kernel configuration parameter, CONFIG_LOGLEVEL_EMERGENCY, which sets the lowest log-level value for emergency messages; it defaults to LOG_WARNING. That value can also be set by the "emergency_loglevel=" kernel command-line parameter at boot time.

So instead of trying to ensure that all messages go out as quickly as possible, the proposal would effectively partition printk() messages into two buckets—at least for consoles that implement write_atomic(). Regular messages will be written out by the kernel thread, which gets scheduled and preempted normally, so it may not be flushing all of the messages right away. The messages at the emergency level and above will go out immediately to an emergency console. As he noted in the thread, there are options if losing regular messages in a crash becomes a problem:

As long as all critical messages are print directly and immediately to an emergency console, why is it is problem if the informational messages to consoles are sometimes delayed or lost? And if those informational messages _are_ so important, there are things the user can do. For example, create a realtime userspace task to read /dev/kmsg.

There are some downsides and open issues in the proposal, which Ogness also lists. The output from printk() has changed somewhat, which may have unforeseen consequences. A CPU ID will be printed as part of emergency messages to help disambiguate multiple simultaneous messages; those messages are separated from regular printk() messages using a newline character, though there may be an option added to make them stand out more. In addition:

Be aware that printk output is no longer time-sorted. Actually, it never was, but now you see the real timestamps. This seems strange at first.

More details on the ring buffer and its API, along with some early performance numbers, can be seen in a patch adding a file to the Documentation directory.

The reaction to the patch set has been positive overall; there have been the usual questions and comments, of course. printk() has been the subject of two relatively recent Kernel Summit sessions (in 2016 and 2017); two of those proposing printk() changes at the summits, Sergey Senozhatsky and Petr Mladek, have also commented on the patches. Mladek had suggestions for improvements to the ring-buffer code, as did Linus Torvalds. There have been no major complaints about the overall goal and plan, however, so it would seem that we could see these changes go into the mainline in a development cycle or two.

Index entries for this article
Kernel	Kernel messages
Kernel	printk()

Reimplementing printk()

Posted Feb 27, 2019 0:43 UTC (Wed) by gdt (subscriber, #6284) [Link] (4 responses)

Options to make messages stand out more tend to be painful when automatically processing log messages. Kernel programmers are humans and want log output tuned for humans, but many more log readers are scripts using some sort of regular expression.

Reimplementing printk()

Posted Feb 27, 2019 8:49 UTC (Wed) by jogness (subscriber, #49775) [Link] (3 responses)

The point of making emergency messages stand out is exactly for easier automatic processing. From the cover letter:

"Right now the emergency messages are set apart from the non-emergency messages using '\n'. There have been requests that some special markers could be specifiable to make it easier for parsers."

Reimplementing printk()

Posted Feb 27, 2019 9:24 UTC (Wed) by NAR (subscriber, #1313) [Link] (2 responses)

If logs are parsed by software, what about using a more structured log format like logmft? I guess this idea could be as controversial as systemd :-)

Reimplementing printk()

Posted Feb 27, 2019 9:42 UTC (Wed) by jani (subscriber, #74547) [Link]

> I guess this idea could be as controversial as systemd :-)

That part would be https://lwn.net/Articles/464276/

Reimplementing printk()

Posted Feb 27, 2019 20:54 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

Diagnostic output isn't supposed to be "parsed by software", that's usually just a crude attempt at getting around lack of an interfacen supposed to provided what said software really wants to have. Hence, optimizing the output format to make the crude workaround easier to implement at the expense of making the intended purpose (debugging) more complicated are a bad idea.

There's nothing "controversial" here, just lack of understanding.

Reimplementing printk()

Posted Feb 27, 2019 1:23 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (12 responses)

I have a tool that depends on increasing message timestamps (it uses them as a position marker). Would it be possible to add an option to fall back to the previous behavior for timestamps?

Reimplementing printk()

Posted Feb 27, 2019 3:40 UTC (Wed) by smurf (subscriber, #17840) [Link] (11 responses)

What does your tool do with boot messages? Today, the first 20 lines or so always end up with a timestamp of zero.

In any case, if I were you I'd add my own timestamp (time when the message was read from the kernel).

Reimplementing printk()

Posted Feb 27, 2019 7:06 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

The application itself is designed to be restartable and I'm using the timestamp to find out unsynchronized logs - I'm just checking the last uploaded log timestamp and then proceed to sync everything that is at or after that timestamp.

Multiple entries with the same timestamp are treated just fine (de-duped), the problem is timestamps going backwards.

I guess I'll have to add more complicated logic to detect unsynchronized messages.

Reimplementing printk()

Posted Feb 27, 2019 8:45 UTC (Wed) by jogness (subscriber, #49775) [Link] (4 responses)

You really should be using the sequence number. That is a reliable, monotonically increasing value. (For my proposal as well.) Also, it allows you to identify if messages are missing.

Reimplementing printk()

Posted Feb 27, 2019 9:45 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

I can't read /dev/kmsg, it might not be accessible. dmesg is all I can reliably get :(

Reimplementing printk()

Posted Feb 27, 2019 11:10 UTC (Wed) by jogness (subscriber, #49775) [Link] (2 responses)

Isn't dmesg just using /dev/kmsg? Or are you using "dmesg -S"?

Reimplementing printk()

Posted Feb 28, 2019 20:34 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I get piped dmesg output. I can play with its options but I don't see how I can make it print sequence numbers.

Reimplementing printk()

Posted Mar 6, 2019 10:46 UTC (Wed) by jogness (subscriber, #49775) [Link]

If you can call dmesg, I expect you can read /dev/kmsg as well. Take a look at the output of:

cat /dev/kmsg

The second field is the sequence number. You can read about the format here:

https://www.kernel.org/doc/Documentation/ABI/testing/dev-...

Reimplementing printk()

Posted Feb 27, 2019 16:44 UTC (Wed) by vadim (subscriber, #35271) [Link] (4 responses)

Why not take advantage of that journald already has the metadata needed to uniquely identify a line?

Reimplementing printk()

Posted Feb 27, 2019 21:00 UTC (Wed) by claytonc (guest, #98599) [Link]

Maybe they are on a distro without systemd?

Reimplementing printk()

Posted Feb 27, 2019 22:15 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I'm dreaming about the moment I can do this :( For now there's still RHEL6 and Amazon Linux 1 that I need to support.

Reimplementing printk()

Posted Mar 6, 2019 7:39 UTC (Wed) by gravious (guest, #7662) [Link]

The older distros you care about won't have this new printk() stuff backported, surely?

Reimplementing printk()

Posted Mar 6, 2019 13:10 UTC (Wed) by jdel (guest, #130810) [Link]

printk goes straight out serial console ports. journald just taps into a copy of the console data to log it locally.

Print less

Posted Feb 27, 2019 6:06 UTC (Wed) by shemminger (subscriber, #5739) [Link] (1 responses)

What about silence?
99% of what is in the kernel log is leftovers that some developer was too timid or vain to remove.

Print less

Posted Feb 27, 2019 10:39 UTC (Wed) by LtWorf (subscriber, #124958) [Link]

Ah, had this discussion with a coworker, who doesn't want to remove the "verbose" from tar in build logs… Which makes build logs 99% the same list of files over and over.