Keeping printk() under control
As a way of improving the situation, Anton Blanchard has put together a new rate limiting scheme which has found its way into the -mm patch tree. This code, which is derived from a rate limiting mechanism used in the networking subsystem, does not automatically solve the problem, since it requires explicit changes to code which could generate message floods. Such code is often easy to identify, however, and easy to fix.
The patch adds a new function:
int printk_ratelimit(void);
Code which could generate lots of messages should call printk_ratelimit() and only call printk() if the return value is nonzero. Thus, printk_ratelimit() returns a failure status if rate limiting is currently in effect and printk() output should be avoided.
By default, the code limits messages to one every five seconds. It will, however, allow ten messages through in a short period before the rate limiting clamps down on the rest. These values are, of course, tuneable via sysctl parameters.
A mechanism like this is only useful if it is used throughout the code.
Core kernel code can be fixed up relatively easily; the patch includes a
fix for the page allocator, for example. The source of message floods,
however, is often a driver which want to be sure that its "my device has
joined the Dark Side" messages are heard. Fixing all of those is a
daunting task, but even a partial solution leaves the kernel less
susceptible to this particular problem than before.
Index entries for this article | |
---|---|
Kernel | printk_ratelimit() |
Posted Jan 15, 2004 3:09 UTC (Thu)
by brugolsky (guest, #28)
[Link] (1 responses)
Posted Jan 15, 2004 3:14 UTC (Thu)
by corbet (editor, #1)
[Link]
Posted Jan 15, 2004 17:55 UTC (Thu)
by rwmj (subscriber, #5474)
[Link]
Our testing code usually started off with an outer loop which ran all the tests over M * 1000 passes, and inside that a loop which did tests 1 - N, and inside each of those tests, several layers of inner loops. We maintained a *stack* of messages, such that the outer loop messages were at level 1, then the next inner messages at level 2 and the next ones at level 3 and so on. The key here is that you *don't* print out any messages at all until you get a failure - at which point you print out the stack of messages. Thus a typical failure message would look like: 1: In pass 2345 The verbosity is reduced to nothing, except in the case where you actually have some sort of error, in which case you know exactly where it happened. Also, you could set the verbosity. For example, you could ask it to print all messages at level 1. It seems to me the Linux kernel could implement something similar by using: push_printk (); Rich.
Posted Jan 15, 2004 19:47 UTC (Thu)
by sjmadsen (guest, #4035)
[Link] (1 responses)
If printk_ratelimit() tells the caller not to print anything out, why can't the squelch be done in Yes, critical messages might be missed, but that is solved by always allowing messages at
Posted Jan 16, 2004 16:25 UTC (Fri)
by dw (subscriber, #12017)
[Link]
Here's another example - given a forced rate limiting system, an evil user could do something to cause many thousands of printks, then switch to another terminal and run a kernel exploit as many times as required to guess the correct offsets, etc. Assuming the failed exploit didn't crash the machine, any messages that it may have caused would have been hidden.
Posted Jan 16, 2004 18:53 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
I might modify my position on one severity level - informational. Those probably are truly optional. (But the lower severity level - debug - is not). Solving this problem requires more sophisticated message control within the issuers of the messages. E.g. a rule that a certain error will be reported only 5 times in every minute. Or something that saves up events and reports them in a single message. And it should probably issue a message telling you that you're missing messages. The specific problem of messages getting lost in the blizzard is best solved in klogd/syslogd, not the issuers are the messages.
Should that be Keeping printk() under control
int printk_ratelimit(void)?
Obviously it returns int. In my defense, I plead too much Australian beer.
Corrected.
A long long time ago (13 years?) I used to write testing software for scientific X-ray detectors and we used to have a similar problem with the log messages. We devised a technique to reduce the number of messages printed and make them relevant, which might be of interest.Keeping printk() under control
2: .. In test 3
3: .. .. Testing memory line 4567
4: .. .. .. Write test failed
for (i = 0; i < 10; ++i) {
printk ("very verbose message about where we are now - don't worry \
it won't get printed unless there's a failure");
}
pop_printk ();
My guess is that the answer to this question is somewhere in the kernel archives, but I'm too Dumb question
lazy to go find it.
printk() itself and thus avoid touching code everywhere else?
KERN_ERR (or whatever) through, no matter what. Sure, it can be abused, but
printk_ratelimit() can be ignored, too.
I can envisage a few situations in which the kernel should really try as hard as it can to get message to the administrator, for instance, a kernel-mode NULL pointer dereference which is just about to cause the system to hang should definately get printed to the console, regardless of the current ratelimiting in effect.Dumb question
Code which issues messages only when message traffic is low would be really flaky to use. If the message is worth issuing, it's worth issuing all the time.Keeping printk() under control