Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.20, released by Linus on February 4, otherwise known as Super Kernel Sunday. There's a bunch of new stuff in 2.6.20, including paravirt_ops and KVM, lots of new drivers (including your editor's OLPC camera controller driver), the UDP-Lite protocol, PlayStation 3 support, and more. See the short-form changelog for details, the long-format changelog for more details, the LWN 2.6 API changes page for a summary of internal API differences, or the KernelNewbies Linux Changes page for lots more information.

The patches for the 2.6.21 merge have just begun to find their way into the mainline git repository as of this writing. A number of architecture updates have been merged, along with a GFS2 patch set.

There have been no -mm tree releases over the last week.

For older kernels: was released on February 5. It contains quite a long list of fixes. The -stable team had originally intended not to release any more 2.6.18 updates. It seems that there are some fixes for that kernel which are worth distributing, however, so one more 2.6.18.x release can be expected in the near future.

Adrian Bunk has released with a relatively small number of fixes.

For 2.4 users, Willy Tarreau has released with only three patches.

Kernel development news

Quote of the week

Pretty simple: you read the largely-useless changelog then call the bravely uncommented blk_plug_current() when you're about to submit some IO and you call the audaciously uncommented blk_unplug_current() when you've finished and you're ready to let it rip.

-- Andrew Morton

Kernel fibrillation

Last week's article on fibrils caught the discussion in a relatively early state. That discussion is still in an early state, but some interesting ground has been covered. Here, we'll catch up on a few themes from that conversation.

Alan Cox has requested that the "fibril" name be dumped:

The constructs Zach is using appear to be identical to co-routines, and they've been called that in computer science literature for fifty years. They are one of the great and somehow forgotten ideas.

Alan also points out that a number of hazards lie between the current state of the fibril patch and anything robust enough for the mainline kernel - but everybody involved already knew that. Linus acknowledges the similarities with coroutines, but also maintains that they are sufficiently different to merit their own name. A full coroutine implementation in the kernel, he says, would be impractical.

Linus has also responded to Ingo Molnar's criticisms of the fibril concept. He maintains that the real benefits to fibrils are (1) the elimination of the separate code paths currently associated with asynchronous I/O, and (2) reductions in setup and teardown costs. The latter is significant, he says, because the bulk of asynchronous operations can actually be satisfied from cache; being able to run those operations without going through the full AIO setup would be a big win.

Ingo has clarified his comments somewhat. The stumbling point seems to be the addition of a new scheduling concept which, he thinks, is not necessary. He has proposed alternatives which take the form of a pool of kernel threads; rather than create a fibril, a blocking system call could simply switch to another kernel thread which is there waiting for just that occasion. Ingo believes that kernel threads perform well enough to handle this task, and they could be made lighter; in addition, the use of kernel threads would allow asynchronous calls to spread across a multi-CPU system. Fibrils, instead, are currently limited to a single processor. Zach Brown, the creator of the fibril patchset, seems to think that the idea is at least worth a try. Linus, instead, has said that any adaptation of kernel threads to this task would end up looking a lot like fibrils anyway. Rather than bear the expense of keeping a (potentially large) pool of kernel threads around, one might as well just create a truly lightweight object - a fibril.

Some discussion of the eventual user-space API has occurred. Linus has suggested that the asynchronous submission call look something like this:

    long async_submit(unsigned long flags, long *result_pointer,
                      long syscall_number, unsigned long *args);

The role of the flags argument has not really been discussed; one just assumes such an argument will be necessary, sooner or later. The result_pointer argument tells the kernel where to put the result of the operation. Interestingly, the result code would follow the in-kernel conventions: zero for success or a negative error code for failure. While the operation is outstanding, the kernel would store a positive "cookie" value which could be used by the application to wait for (or cancel) the call.
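
The result convention described above can be sketched in user-space C. This is only an illustration of how an application might interpret the word at result_pointer, assuming the semantics given in the text (positive cookie while outstanding, zero on success, negative error code on failure). The names async_state, async_poll(), and async_errno() are hypothetical and not part of any proposed interface; no async_submit() call exists to be invoked here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states for the word the kernel would write at result_pointer. */
enum async_state {
    ASYNC_PENDING,  /* positive value: a cookie naming the outstanding call */
    ASYNC_DONE,     /* zero: the call completed successfully */
    ASYNC_FAILED    /* negative value: an in-kernel style error code */
};

/*
 * Classify the current contents of the result word.  The pointer is
 * volatile because, in the proposed scheme, the kernel would update the
 * word behind the application's back.
 */
static enum async_state async_poll(const volatile long *result_pointer)
{
    long value;

    assert(result_pointer != NULL);
    value = *result_pointer;

    if (value > 0)
        return ASYNC_PENDING;
    if (value == 0)
        return ASYNC_DONE;
    return ASYNC_FAILED;
}

/* Convert a negative result into a positive errno-style code, else 0. */
static int async_errno(long result)
{
    return result < 0 ? (int)-result : 0;
}
```

Under these assumptions, an application holding a positive value would treat it as the cookie to hand to a wait or cancel operation, and would look at the error code only once the value has gone negative.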

The wait_for_async() system call remains for applications wanting to get the completion status of their asynchronous operations. There have been a couple of requests, however, for a mechanism by which applications could obtain completion status without having to go back into the kernel. That inspired David Miller to complain about a big part of the conversation which is not happening: the integration with the kevent patches. Much of the kevent work has been aimed at solving just this problem, but Evgeniy Polyakov continues to have trouble getting people to look at it. To a great extent, wait_for_async() is another event interface. It seems unlikely that the kernel needs two of them.

What does all this work bode for the existing asynchronous I/O interface, and, in particular, the buffered filesystem AIO patches which have not yet been merged? Seeking to fend off doubt about the future of that interface, Suparna Bhattacharya has argued that the buffered AIO patches should still be merged:

Since this is going to be a new interface, not the existing linux AIO interface, I do not see any conflict between the two. Samba4 already uses fsaio, and we now have the ability to do POSIX AIO over kernel AIO (which depends on fsaio). The more we delay real world usage the longer we take to learn about the application patterns that matter. And it is those patterns that are key.

Decision time will be soon, since the buffered AIO patches seem to be ready for merging into 2.6.21. Over the next couple of weeks, somebody will have to decide whether to merge those patches - and maintain them indefinitely - or hold off with the idea that fibrils will evolve into the preferred solution.

Finally, Bert Hubert noted that DragonFly BSD had an asynchronous system call interface - until last July, when the developers pulled it out. DragonFly had created two system calls - sendsys2() and waitsys2() - which split up the tasks of initiating a system call and waiting for its completion. A followup suggests that DragonFly BSD had taken a different approach, requiring that every system call have asynchronous support built into it. In that sense, their asynchronous interface looked like a more general version of Linux AIO.

Pushing asynchronous support down into system calls, filesystems, and device drivers brings a lot of complexity; the slow progress of Linux AIO illustrates just how hard it can be. One of the major advantages of the fibril idea is that (with few exceptions) the system calls do not have to be changed; they do not need to be aware of asynchronous operation at all. The ability to pull asynchronous support into a relatively small chunk of core kernel code may be the key idea that sells the entire fibril concept.

Review: Linux Kernel in a Nutshell

Once upon a time, the ability to download, compile, and install a new kernel was a vital skill for any Linux system administrator. That skill is less in demand now; the kernels shipped with most distributions tend to be adequate for most needs. Still, there comes a time, even for those who do not hack on the kernel itself, when a system needs a custom kernel. Many system administration books devote a bit of space to this task, but they tend to pass over it fairly quickly. Configuring, building, and installing a kernel remains a relatively dark art for many.

Kernel hacker Greg Kroah-Hartman decided to do something about it; the result is Linux Kernel in a Nutshell, published by O'Reilly. By the standards of other kernel books from that publisher, this is a thin volume indeed: just over 180 pages, including the index. But it is packed with information that should be useful to just about anybody who has to deal with the kernels on their systems.

The early chapters cover some of the basics: what tools are required, where to get the kernel source, etc. There is a chapter on the various ways of configuring a kernel. Your editor remembers the days of configuring kernels by stepping through the entire "make config" process; it's nice to see Greg recommending against that approach now. The build process is discussed, as are the necessary steps for installing the kernel once it's built.

The second major part of the book discusses customizations - in particular, enabling support for a device. The process for determining which driver should be enabled for a specific device is distressingly hairy; it involves listing out the PCI bus configuration, digging through sysfs, then trying to find a match in the kernel source. It's not for nothing that Greg says:

The easiest way to figure out which driver controls a new device is to build all of the different drivers of that type in the kernel source tree as modules, and let the udev startup process match the driver to the device.

As they say, there really should be a better way. But one can't fault Greg for telling it like it is.

Next there is a set of "kernel configuration recipes" for enabling specific behavior. The advice here is terse, sometimes to a fault. The discussion on enabling kernel preemption, for example, could have benefited from a mention of the reliability concerns which have kept most distributors from turning preemption on. Similarly, it talks about how to enable SELinux with no mention of the need for an accompanying policy loaded from user space. The audience for this book seems likely to include quite a few people from the "know just enough to hurt themselves" population; a few more hints might have proved most helpful to those readers.

The final section, making up almost half of the book, is devoted to reference material. There is an extensive list of kernel command line parameters and what they do - though the treatment is, once again, terse. There is a useful chapter on the various make targets and options for the kernel; somehow your editor had managed to avoid learning about make randconfig until now. There is also a reference chapter for configuration options. This chapter is incomplete, however, and the options do not appear to be listed in any particular order.

Minor grumbles aside, there is value in this book's conciseness. When faced with a question about kernel configuring, building, or booting, this book is likely to yield an answer without forcing the reader to search for a needle in an 800-page haystack. It covers an area which was very much in need of some improved documentation; it is also reasonably up to date, having been written for the 2.6.18 kernel. Happily, Greg has made the book available online. Overall, Linux Kernel in a Nutshell is a more than welcome addition to your editor's bookshelf.

Priority-Boosting RCU Read-Side Critical Sections

Read-copy update (RCU) is a synchronization API that is sometimes used in place of reader-writer locks. RCU's read-side primitives offer extremely low overhead and deterministic execution time. These properties imply that RCU updaters cannot block RCU readers, which means that RCU updaters can be expensive, as they must leave old versions of the data structure in place to accommodate pre-existing readers. Furthermore, these old versions must be reclaimed after all pre-existing readers complete. The Linux kernel offers a number of RCU implementations, the first such implementation being called "Classic RCU".
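
The copy-then-publish pattern behind this description can be sketched in user-space C11. This is not the kernel's RCU API: struct config, config_read(), config_update_timeout(), and config_reclaim() are hypothetical names, an acquire load stands in for the kernel's cheaper dependency ordering, a single updater is assumed, and the grace period is left entirely to the caller. The sketch only shows why an old version must outlive the update that replaces it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical read-mostly data protected by the pattern. */
struct config {
    int timeout;
    int retries;
};

/* Pointer to the current version; static storage starts out NULL. */
static _Atomic(struct config *) current_config;

/* Reader: one atomic load, no locks, never blocked by an updater. */
static const struct config *config_read(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}

/*
 * Updater (single updater assumed): copy the current version, modify the
 * copy, then publish it.  The old version is returned rather than freed,
 * because readers that fetched the pointer earlier may still be using it.
 */
static struct config *config_update_timeout(int timeout)
{
    struct config *old = atomic_load_explicit(&current_config,
                                              memory_order_relaxed);
    struct config *new_cfg = malloc(sizeof(*new_cfg));

    assert(new_cfg != NULL);
    if (old != NULL)
        *new_cfg = *old;                 /* copy */
    else
        *new_cfg = (struct config){0};
    new_cfg->timeout = timeout;          /* update the private copy */

    /* Publish: later readers see the new version. */
    return atomic_exchange_explicit(&current_config, new_cfg,
                                    memory_order_acq_rel);
}

/*
 * Reclaim an old version.  The caller must guarantee that every reader
 * which could have seen it has finished (the "grace period"); this sketch
 * has no mechanism for detecting that.
 */
static void config_reclaim(struct config *old)
{
    free(old);
}
```

The cost lands on the update side: the updater allocates, copies, and must hold the old version until it is safe to call config_reclaim(). If readers are delayed indefinitely, that reclamation never happens and old versions pile up.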

The RCU implementation for the -rt patchset is unusual in that it permits read-side critical sections to be blocked waiting for locks and due to preemption. If these critical sections are blocked for too long, grace periods will be stalled, and the amount of memory awaiting the end of a grace period will continually increase, eventually resulting in an out-of-memory condition. This theoretical possibility was apparent from the start, but when Trevor Woerner actually made it happen, it was clear that something needed to be done. Because priority boosting is used in locking, it seemed natural to apply it to realtime RCU.

Unfortunately, the priority-boosting algorithm used for locking could not be applied straightforwardly to RCU because this algorithm uses locking, and the whole point of RCU is to avoid common-case use of such heavy-weight operations in read-side primitives. In fact, RCU's read-side primitives need to avoid common-case use of all heavyweight operations, including atomic instructions, memory barriers, and cache misses. Therefore, bringing priority boosting to RCU turned out to be rather challenging, not because the eventual solution is all that complicated, but rather due to the large number of seductive but subtly wrong almost-solutions.

This document describes a way of providing light-weight priority boosting to RCU, and also describes several of the many seductive but subtly wrong almost-solutions.


This paper describes three approaches to priority-boosting blocked RCU read-side critical sections. The first approach minimizes scheduler-path overhead and uses locking on non-fastpaths to decrease complexity. The second approach is similar to the first, and was in fact a higher-complexity intermediate point on the path to the first approach. The third approach uses a per-task lock solely for its priority-inheritance properties, which introduces the overhead of acquiring this lock into the scheduler path, but avoids adding an "RCU boost" component to the priority calculations. Unfortunately, this third approach also cannot be made to reliably boost tasks blocked in RCU read-side critical sections, so the first approach should be used to the exclusion of the other two. Each of these approaches is described in a following section, after which is a section enumerating other roads not taken.

[ Editor's note: this article is long - but worth the read. Please go to the full article text to learn more about this technique.]

Page editor: Jonathan Corbet

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds