Reimplementing printk()
The venerable printk() function has been part of Linux since the very beginning, though it has undergone a fair number of changes along the way. Now, John Ogness is proposing to fundamentally rework printk() in order to get rid of handful of issues that currently plague it. The proposed code does this by adding yet another ring-buffer implementation to the kernel; this one is aimed at making printk() work better from hard-to-handle contexts. For a task that seems conceptually simple—printing messages to the console—printk() is actually a rather complex beast; that won't change if these patches are merged, though many of the problems with the current implementation will be removed.
In the cover letter of his RFC patch set, Ogness lays out seven problems that he sees with the current printk() implementation. The buffer used by printk() is protected by a raw spinlock, which restricts the contexts from which the buffer can be accessed. Calling printk() from a non-maskable interrupt (NMI) or a recursive context, where something called from printk() also calls printk(), currently means that the logging of the message is deferred, which could cause the message to be lost. Printing to slow consoles can result in large latencies, a printk() call may end up taking unbounded time while other deferred messages are printed ahead of the one the caller actually wanted to print.
Two other problems are identified by Ogness. Timestamps on the messages
are added when a message is added to the buffer but, due to deferrals, that
can happen well after the printk() call was made. While that
behavior has the side
effect of nicely sorting the messages in terms of time, it is
"neither accurate nor reliable
". In addition, because printk()
tries to satisfy all of its users, it ends up compromising too much:
In order to fix those problems, he is proposing the addition of a kernel-internal printk() ring buffer that allows for multiple lockless readers; writers use a per-CPU synchronization mechanism that works in any context. This ring-buffer implementation was inspired by a suggestion from Peter Zijlstra, Ogness said. The actual writing of the messages is moved to a dedicated kernel thread, which is preemptible, so the latencies can be bounded. The new ring buffer will be allocated in the initialized data segment of the kernel, so it will be available early in the boot process, even before the memory-management subsystem is available. Timestamps will be generated early in the printk() function.
Beyond that, a new kind of "emergency message" is defined. Those messages will appear on certain consoles immediately; no waiting for any other printk() messages that are queued up for the kernel thread. In order to participate, consoles will need to implement the new write_atomic() operation. The patch set includes an implementation of write_atomic() for the 8250 UART driver. There is a new kernel configuration parameter, CONFIG_LOGLEVEL_EMERGENCY, which sets the lowest log-level value for emergency messages; it defaults to LOG_WARNING. That value can also be set by the "emergency_loglevel=" kernel command-line parameter at boot time.
So instead of trying to ensure that all messages go out as quickly as possible, the proposal would effectively partition printk() messages into two buckets—at least for consoles that implement write_atomic(). Regular messages will be written out by the kernel thread, which gets scheduled and preempted normally, so it may not be flushing all of the messages right away. The messages at the emergency level and above will go out immediately to an emergency console. As he noted in the thread, there are options if losing regular messages in a crash becomes a problem:
There are some downsides and open issues in the proposal, which Ogness also lists. The output from printk() has changed somewhat, which may have unforeseen consequences. A CPU ID will be printed as part of emergency messages to help disambiguate multiple simultaneous messages; those messages are separated from regular printk() messages using a newline character, though there may be an option added to make them stand out more. In addition:
More details on the ring buffer and its API, along with some early performance numbers, can be seen in a patch adding a file to the Documentation directory.
The reaction to the patch set has been positive overall; there have been the usual questions and comments, of course. printk() has been the subject of two relatively recent Kernel Summit sessions (in 2016 and 2017); two of those proposing printk() changes at the summits, Sergey Senozhatsky and Petr Mladek, have also commented on the patches. Mladek had suggestions for improvements to the ring-buffer code, as did Linus Torvalds. There have been no major complaints about the overall goal and plan, however, so it would seem that we could see these changes go into the mainline in a development cycle or two.
Index entries for this article | |
---|---|
Kernel | Kernel messages |
Kernel | printk() |
Posted Feb 27, 2019 0:43 UTC (Wed)
by gdt (subscriber, #6284)
[Link] (4 responses)
Posted Feb 27, 2019 8:49 UTC (Wed)
by jogness (subscriber, #49775)
[Link] (3 responses)
"Right now the emergency messages are set apart from the non-emergency messages using '\n'. There have been requests that some special markers could be specifiable to make it easier for parsers."
Posted Feb 27, 2019 9:24 UTC (Wed)
by NAR (subscriber, #1313)
[Link] (2 responses)
Posted Feb 27, 2019 9:42 UTC (Wed)
by jani (subscriber, #74547)
[Link]
That part would be https://lwn.net/Articles/464276/
Posted Feb 27, 2019 20:54 UTC (Wed)
by rweikusat2 (subscriber, #117920)
[Link]
There's nothing "controversial" here, just lack of understanding.
Posted Feb 27, 2019 1:23 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (12 responses)
Posted Feb 27, 2019 3:40 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (11 responses)
In any case, if I were you I'd add my own timestamp (time when the message was read from the kernel).
Posted Feb 27, 2019 7:06 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (10 responses)
Multiple entries with the same timestamp are treated just fine (de-duped), the problem is timestamps going backwards.
I guess I'll have to add more complicated logic to detect unsynchronized messages.
Posted Feb 27, 2019 8:45 UTC (Wed)
by jogness (subscriber, #49775)
[Link] (4 responses)
Posted Feb 27, 2019 9:45 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Posted Feb 27, 2019 11:10 UTC (Wed)
by jogness (subscriber, #49775)
[Link] (2 responses)
Posted Feb 28, 2019 20:34 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Mar 6, 2019 10:46 UTC (Wed)
by jogness (subscriber, #49775)
[Link]
cat /dev/kmsg
The second field is the sequence number. You can read about the format here:
https://www.kernel.org/doc/Documentation/ABI/testing/dev-...
Posted Feb 27, 2019 16:44 UTC (Wed)
by vadim (subscriber, #35271)
[Link] (4 responses)
Posted Feb 27, 2019 21:00 UTC (Wed)
by claytonc (guest, #98599)
[Link]
Posted Feb 27, 2019 22:15 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Mar 6, 2019 7:39 UTC (Wed)
by gravious (guest, #7662)
[Link]
Posted Mar 6, 2019 13:10 UTC (Wed)
by jdel (guest, #130810)
[Link]
Posted Feb 27, 2019 6:06 UTC (Wed)
by shemminger (subscriber, #5739)
[Link] (1 responses)
Posted Feb 27, 2019 10:39 UTC (Wed)
by LtWorf (subscriber, #124958)
[Link]
Reimplementing printk()
Reimplementing printk()
If logs are parsed by software, what about using a more structured log format like logmft? I guess this idea could be as controversial as systemd :-)
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Reimplementing printk()
Print less
99% of what is in the kernel log is leftovers that some developer was too timid or vain to remove.
Print less