Organizing kernel messages
Andrew Morton recently described the kernel's approach to messaging this way:
At LinuxCon Japan, Hisashi Hashimoto described an effort which, he hopes, will get over the line. To that end, he and others have examined previous attempts to bring order to kernel messaging. Undeterred, they have pushed forward with a new project; he then introduced Kazuo Ito who discussed the actual work.
Attempts to regularize kernel messaging usually involve either attaching an identifier to kernel messages or standardizing the message format in some way. One thing that Ito-san noted at the outset is that any scheme requiring wholesale changes to printk() lines is probably not going to get very far. There are over 75,000 such lines in the kernel, many of them buried within macros; there is no practical way to change them all. Other wrapper functions, such as dev_printk(), complicate the situation further. So any change will have to be effected in a way which works with the existing mass of printk() calls.
A few approaches were considered. One would be to create a set of wrapper
macros which would format message identifiers and pass them to
printk(); the disadvantage of this method, of course, is that it
still requires changing all of the printk() call sites. It's also
possible to turn printk() into a macro which would assemble a
message identifier from the available file name and line number
information; those identifiers, though, would be too volatile for the
intended use. So the approach which the developers favored was hooking
into printk() itself to add message identifiers to messages as
they find their way to the console and the logs.
These message identifiers (also called "message-locating helper tokens") must be assigned in some sort of automatic manner; asking the development community to maintain a list of identifiers and attach them to messages seems like a sure road to disappointment. So one must immediately think of how those identifiers will be generated; the two main concerns are uniqueness and stability. It turns out that Ito-san is not concerned with absolute uniqueness; if, on occasion, two or three kernel messages end up with the same identifier, the administrator should still be able to sort things out without a great deal of pain.
Stability is important, though; if message identifiers change frequently between releases - not to mention between boots - their value will be reduced. For that reason, generating messages at compile time using preprocessor variables like __FILE__ and __LINE__ to generate the identifiers, while easy, is not sufficient. One could also use the virtual address of the printk() call site, which is guaranteed to be unique, but that could even change from one system boot to the next, depending on things like the order in which modules are loaded. So a different approach needs to be found.
What this group has settled on is generating a CRC32 hash of the message format string at run time. There is a certain runtime cost to that which would have been nice to avoid, but it's not that high and, if printk() calls are a bottleneck to system performance, there are other problems. If the system has been configured to output message identifiers, this hash value will be prepended (with a "(%08x):" format) to the message before it is printed. A CRC32 hash is not guaranteed to produce a unique identifier for each message (though it is better than CRC16, which is guaranteed to have collisions with 75,000 messages), but it will be close enough.
Discussion of the current implementation during the talk revealed that there are some remaining problems. Messages printed with dev_printk() will all end up with the same identifier, which is an undesirable result. The newly-added "%pV" format directive (which indicates the passing of a structure containing a new format string and argument list) also complicates things significantly by adding recursive format string processing. So the implementation will require some work, but there was not a lot of disagreement over the basic approach.
It was only toward the end of the talk that there was some discussion of what the use cases for this feature are. The initial goal is simply to make it easier to find where a message is coming from in the kernel code. The use of macros, helper functions, etc. can make it hard to track down a message with a simple grep operation. But, with a message ID and a supporting database (to be maintained with a user-space tool), developers should be able to go directly to the correct printk() call. Vinod Kutty noted that, in large installations, automatic monitoring systems could use the identifiers to recognize situations requiring some sort of response. There are also long-term goals around creating databases of messages translated to other languages and help information for specific messages.
So there are real motivations for this sort of work. But, as was noted
back at the beginning, getting any kind of message identifier patch through
the process has always been a losing proposition so far. It is hoped that,
this time around, the solution will be sufficiently useful (even to kernel
developers) and sufficiently nonintrusive that it might just get over the
line. We should find out soon; once the patch has been fixed, it will be
posted to the mailing list for comments.
| Index entries for this article | |
|---|---|
| Kernel | Messages |
| Conference | LinuxCon Japan/2010 |
