Kernel development [LWN.net]

Kernel release status

The current 2.6 kernel is 2.6.1, which was released on January 8. The contents of this kernel are pretty much as described last week: a whole lot of fixes along with a few new features (MSI support, EFI support, a couple of internal API changes, etc.). See the long-format changelog for the details.

The latest patch from Andrew Morton, as of this writing, is 2.6.1-mm3. Recent additions to the -mm tree include some anticipatory I/O scheduler work ("This is the 114th patch against the anticipatory scheduler and we're nearly finished, honest"), improved CPU scheduler support for hyperthreaded processors, working modular IDE drivers, a number of big architecture updates, some SELinux updates, several NFS fixes, an ALSA update, the kthread abstraction (discussed here last week), and many other fixes and updates.

The current 2.4 kernel is 2.4.24; Marcelo has released no 2.4.25 prepatches since 2.4.25-pre4 on January 6.

Kernel page editor Down Under

This week's Kernel Page is a little thin as a result of its normal editor being in Australia to attend Linux.Conf.AU. There are limits to the sort of kernel content that can be written over a conference wireless link while simultaneously making a show of listening to whoever is speaking. This page will be back to its normal form next week.

Read-copy-update and interrupt latency

The read-copy-update (RCU) algorithm has found many applications since it was added to the 2.5 kernel. By eliminating lock contention in many situations, RCU can greatly improve performance and scalability on multiprocessor systems. For more information on how RCU works, see this description or this Driver Porting Series article. Or talk to the SCO Group, which claims to own any code which ever even dreamed of using RCU.

It turns out, however, that there is one little problem with RCU: its effect on interrupt response times. RCU works by setting aside cleanup work until a later time, when it is known that the data structures of interest have no further references in the kernel. That cleanup work is done with a software interrupt, meaning it can happen after a hardware interrupt or at rescheduling time. But the list of RCU-protected data to be cleaned up can get quite long; RCU is used, for example, to protect high-turnover data structures like the dentry cache. So that software interrupt can, potentially, take a long time to run. The RCU cleanup code, in other words, can monopolize a processor for a relatively long period at just the times when a high-priority process might be trying to run.

Dipankar Sarma has taken a look at the situation and found that processing RCU callbacks can, in some situations, take as much as 400 microseconds or so. That may not seem like a lot of time, but it can be enough to significantly increase response latencies. So he has sent out a set of patches which address the problem.

In modern-day kernel programming, it sometimes seems like there is a standard answer to every problem: create a new kernel thread. Dipankar's patch does exactly that; it adds a new per-CPU "krcud" thread which handles RCU cleanup whenever the list of callbacks gets to be too long. Short callback lists are still dealt with at software interrupt time, since that is a faster way of doing things. But, if the list is too long (256 entries, by default) and, in particular, if there is a real-time process waiting to run, the tail end of the list is delegated over to krcud and control is returned to the scheduler.
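The split between software-interrupt processing and the krcud thread can be pictured with a small userspace sketch. This is not Dipankar's code: the structure, the function names (`run_callbacks()`, `count_callback()`) and the `rt_waiting` flag are hypothetical stand-ins, and the sketch assumes one reading of the description above, namely that processing stops only once the limit has been reached and a real-time process is waiting.

```c
#include <stddef.h>

#define RCU_SOFTIRQ_LIMIT 256   /* default threshold mentioned in the article */

/* Hypothetical callback record; the kernel's own structure differs. */
struct rcu_cb {
    struct rcu_cb *next;
    void (*func)(void *arg);
    void *arg;
};

/* Stand-in for real cleanup work: just counts invocations. */
static void count_callback(void *arg)
{
    (*(int *)arg)++;
}

/*
 * Run callbacks from the head of the list. Once `limit` callbacks have
 * run and a real-time task is waiting, stop and return the unprocessed
 * tail so that a per-CPU thread can finish it later. Returns NULL when
 * the whole list was processed. *ran receives the number of callbacks run.
 */
static struct rcu_cb *run_callbacks(struct rcu_cb *list, int limit,
                                    int rt_waiting, int *ran)
{
    *ran = 0;
    while (list) {
        struct rcu_cb *cb;

        if (*ran >= limit && rt_waiting)
            break;              /* leave the tail for the thread */
        cb = list;
        list = cb->next;        /* unlink first: the callback may free cb */
        cb->func(cb->arg);
        (*ran)++;
    }
    return list;
}
```

In this sketch the caller would pass any non-NULL return value to the thread, which can then call `run_callbacks()` again with `rt_waiting` set to zero to drain the remainder.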

Dipankar reports good results in his tests, with overall system latencies of less than 400 microseconds. He's not pushing this patch for inclusion yet; it needs more testing first. But, if things pan out, a faster-responding 2.6 kernel may result in the near future.

Keeping printk() under control

Log messages from the kernel can often be an indispensable aid in tracking down problems or generally figuring out what is going on inside the system. As most system administrators find out sooner or later, however, kernel logging can also become a problem in its own right. If a situation develops which causes the kernel to continually spew out logging information, disks can fill up and log messages can be lost. What can be worse, however, is when log messages sent to the console cause the kernel to spend all of its time just scrolling the console frame buffer. In this case, the system can become completely unresponsive. The logging code already tries to mitigate this problem by detecting and suppressing streams of identical messages. That simple mechanism breaks down, however, when the messages being logged differ from each other.

As a way of improving the situation, Anton Blanchard has put together a new rate limiting scheme which has found its way into the -mm patch tree. This code, which is derived from a rate limiting mechanism used in the networking subsystem, does not automatically solve the problem, since it requires explicit changes to code which could generate message floods. Such code is often easy to identify, however, and easy to fix.

The patch adds a new function:

    int printk_ratelimit(void);

Code which could generate lots of messages should call printk_ratelimit() and only call printk() if the return value is nonzero. In other words, printk_ratelimit() returns zero when rate limiting is currently in effect and printk() output should be suppressed.
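The calling pattern might look like the following sketch. So that it compiles outside the kernel, `printk()` and `printk_ratelimit()` are replaced here by userspace stand-ins; the `allow_messages` toggle, the `messages_printed` counter and the `report_device_error()` function are all hypothetical and exist only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical toggle standing in for the limiter's internal state. */
static int allow_messages = 1;
/* Counts messages that actually reached the (fake) log. */
static int messages_printed;

/* Stand-in: nonzero means it is OK to print, zero means stay quiet. */
static int printk_ratelimit(void)
{
    return allow_messages;
}

/* Stand-in for the kernel's printk(); writes to stdout instead. */
static int printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);
    messages_printed++;
    return n;
}

/* Hypothetical driver error path that could fire very often. */
static void report_device_error(int code)
{
    if (printk_ratelimit())
        printk("mydev: device error %d\n", code);
}
```

The only change required at each call site is the `if (printk_ratelimit())` guard in front of the existing printk() call.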

By default, the code limits messages to one every five seconds. It will, however, allow ten messages through in a short period before the rate limiting clamps down on the rest. These values are, of course, tuneable via sysctl parameters.
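A behavior like this (a sustained rate plus an initial burst) is commonly implemented as a token bucket. The sketch below is a generic token bucket using the default values quoted above; it is not taken from the patch, and the structure and function names are hypothetical. Time is passed in explicitly, in seconds, to keep the example self-contained.

```c
/* Hypothetical token-bucket state; all times are in seconds. */
struct ratelimit {
    long interval;  /* seconds of credit that buy one message */
    long burst;     /* messages allowed back to back */
    long tokens;    /* accumulated credit, capped at interval * burst */
    long last;      /* time of the previous call */
};

static void ratelimit_init(struct ratelimit *rl, long interval,
                           long burst, long now)
{
    rl->interval = interval;
    rl->burst = burst;
    rl->tokens = interval * burst;  /* start with a full bucket */
    rl->last = now;
}

/* Returns nonzero if a message may be printed at time `now`. */
static int ratelimit_allow(struct ratelimit *rl, long now)
{
    long max = rl->interval * rl->burst;

    /* Earn one second of credit per elapsed second, up to the cap. */
    rl->tokens += now - rl->last;
    rl->last = now;
    if (rl->tokens > max)
        rl->tokens = max;

    /* Each message costs one interval's worth of credit. */
    if (rl->tokens >= rl->interval) {
        rl->tokens -= rl->interval;
        return 1;
    }
    return 0;
}
```

With `ratelimit_init(&rl, 5, 10, 0)`, the first ten calls succeed immediately; after that, one more message is allowed for every five seconds that pass.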

A mechanism like this is only useful if it is used throughout the code. Core kernel code can be fixed up relatively easily; the patch includes a fix for the page allocator, for example. The source of message floods, however, is often a driver which wants to be sure that its "my device has joined the Dark Side" messages are heard. Fixing all of those is a daunting task, but even a partial solution leaves the kernel less susceptible to this particular problem than before.

Patches and updates

Linus Torvalds: Linux-2.6.1

Andrew Morton: 2.6.1-mm1

Andrew Morton: 2.6.1-mm2

Andrew Morton: 2.6.1-mm3

Martin J. Bligh: 2.6.1-mjb1

Matt Mackall: 2.6.1-tiny1 tree for small systems

Randy.Dunlap: 2.6.1-kj1 patchset

Linus Torvalds: 2.6.1-rc3

Martin J. Bligh: 2.6.1-rc3-mjb1 (newish: kexec, ALSA fixes, v4l2, ivtv)

Randy.Dunlap: 2.6.1-rc3-kj1 patchset

Andrew Morton: 2.6.1-rc2-mm1

Adrian Bunk: better i386 CPU selection

Adrian Bunk: move "struct movsl_mask movsl_mask" to usercopy.c

Adrian Bunk: proof of concept: make arch/i386/kernel/cpu/Makefile CPU specific

Adrian Bunk: proof of concept: make arch/i386/kernel/cpu/mtrr/Makefile CPU specific

Stephen D. Williams: User Mode Linux (UML) Host Skas3 patch for Linux Kernel 2.6.1 2004-01-09

Jeff Dike: uml-patch-2.6.0

Dipankar Sarma: RCU for low latency [0/1]

Dipankar Sarma: RCU for low latency [1/2]

Dipankar Sarma: RCU for low latency [2/2]

Nigel Cunningham: Is this too ugly to merge?

Bart Samwel: Laptop-mode v7 for linux 2.6.1

IWAMOTO Toshihiro: a new version of memory hotremove patch

Inaky Perez-Gonzalez: FUSYN Realtime & Robust mutexes for Linux try 2.1

Robert Williamson: Linux Test Project January Release Announcement

Amit S. Kale: kgdb 2.0.1 for kernel 2.6.1

Mikael Pettersson: perfctr-2.6.4 released with PPC32 support

Greg KH: sysfs input class patch - [1/1]

Jaroslav Kysela: ALSA 1.0.1

Bernhard Kuhn: real-time interrupts for the Linux kernel

James Simmons: New FBDev patch

Mukker, Atul: ANNOUNCE: megaraid driver version 2.10.1

Jeff Garzik: experimental net driver queue updated

Scott Long: Proposed enhancements to MD

Jesper Juhl: stronger ELF sanity checks v2

Hirokazu Takahashi: dynamic allocation of huge continuous pages

Jeff Dike: /dev/anon

Greg KH: udev 013 release
