Brief items
The current 2.6 kernel is 2.6.20,
released by Linus on
February 4, otherwise known as Super Kernel Sunday. There's a bunch
of new stuff in 2.6.20, including
paravirt_ops and KVM, lots of
new drivers (including your editor's OLPC camera controller driver), the
UDP-Lite
protocol, PlayStation 3 support, and more. See
the short-form changelog for details, the
long-format
changelog for more details, the
LWN
2.6 API changes page for a summary of internal API differences, or the
KernelNewbies Linux Changes
page for lots more information.
The patches for the 2.6.21 merge have just begun to find their way into the
mainline git repository as of this writing. A number of architecture
updates have been merged, along with a GFS2 patch set.
There have been no -mm tree releases over the last week.
For older kernels: 2.6.19.3 was released on
February 5. It contains quite a long list of fixes. The -stable team
had originally intended not to release any more 2.6.18 updates. It seems
that there are some fixes for that kernel which are worth distributing,
however, so one more 2.6.18.x release can be expected in the near future.
Adrian Bunk has released 2.6.16.40-rc1 with a relatively
small number of fixes.
For 2.4 users, Willy Tarreau has released 2.4.34.1 with only three
patches.
Kernel development news
Pretty simple: you read the largely-useless changelog then call the
bravely uncommented blk_plug_current() when you're about to submit
some IO and you call the audaciously uncommented
blk_unplug_current() when you've finished and you're ready to let
it rip.
-- Andrew Morton
Last week's article on
fibrils caught the discussion in a relatively early state. That
discussion is
still in an early state, but some interesting ground
has been covered. Here, we'll catch up on a few themes from that
conversation.
Alan Cox has requested that the "fibril" name
be dumped:
The constructs Zach is using appear to be identical to co-routines,
and they've been called that in computer science literature for
fifty years. They are one of the great and somehow forgotten ideas.
Alan also points out that a number of hazards lie between the current state
of the fibril patch and anything robust enough for the mainline kernel -
but everybody involved already knew that. Linus acknowledges the similarities with coroutines,
but also maintains that they are sufficiently different to merit their own
name. A full coroutine implementation in the kernel, he says, would be
impractical.
Linus has also responded to Ingo Molnar's
criticisms of the fibril concept. He maintains that the real benefits to
fibrils are (1) the elimination of the separate code paths currently
associated with asynchronous I/O, and (2) reductions in setup and
teardown costs. The latter is significant, he says, because the bulk of
asynchronous operations can actually be satisfied from cache; being able to
run those operations without going through the full AIO setup would be a
big win.
Ingo has clarified his comments somewhat. The stumbling point seems to be
the addition of a new scheduling concept which, he thinks, is not
necessary. He has proposed alternatives which take the form of a pool of
kernel threads; rather than create a fibril, a blocking system call could
simply switch to another kernel thread which is there waiting for just that
occasion. Ingo believes that kernel threads
perform well enough to handle this task, and they could be made lighter; in
addition, the use of kernel threads would allow asynchronous calls to
spread across a multi-CPU system. Fibrils, by contrast, are currently limited to a
single processor. Zach Brown, the creator of the fibril patchset, seems to
think that the idea is at least worth a try. Linus, however, has said that any adaptation of kernel threads to
this task would end up looking a lot like fibrils anyway. Rather than bear
the expense of keeping a (potentially large) pool of kernel threads around,
one might as well just create a truly lightweight object - a fibril.
Some discussion of the eventual user-space API has occurred. Linus has suggested that the asynchronous submission
call look something like this:
long async_submit(unsigned long flags, long *result_pointer,
long syscall_number, unsigned long *args);
The role of the flags argument has not really been discussed; one
just assumes such an argument will be necessary, sooner or later. The
result_pointer argument tells the kernel where to put the result
of the operation. Interestingly, the result code would follow the
in-kernel conventions: zero for success or a negative error code for
failure. While the operation is outstanding, the kernel would store a
positive "cookie" value which could be used by the application to wait for
(or cancel) the call.
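The result-slot convention described above can be illustrated with a small user-space mock. This is only a sketch: async_submit() was a proposal, not an existing system call, and every name below (mock_async_submit(), mock_async_complete(), async_classify()) is hypothetical. The mock simply writes a positive cookie into the slot at submission time and overwrites it with zero or a negative error code at completion, which is all an application would need to inspect.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space mock; nothing here is a real kernel interface. */

enum async_state { ASYNC_PENDING, ASYNC_OK, ASYNC_FAILED };

/*
 * Interpret a result slot using the convention from the text:
 * positive = cookie for an outstanding call, zero = success,
 * negative = error code.
 */
enum async_state async_classify(long slot)
{
    if (slot > 0)
        return ASYNC_PENDING;
    return slot == 0 ? ASYNC_OK : ASYNC_FAILED;
}

/* Next cookie to hand out; cookies are always positive. */
long next_cookie = 1;

/* Mock submission: mark the slot as outstanding by storing a cookie. */
long mock_async_submit(long *result_pointer)
{
    assert(result_pointer != NULL);
    *result_pointer = next_cookie++;
    return 0;
}

/* Mock completion: replace the cookie with the final status (<= 0). */
void mock_async_complete(long *result_pointer, long status)
{
    assert(result_pointer != NULL);
    assert(status <= 0);
    *result_pointer = status;
}
```

An application polling the slot would treat any positive value as "still in flight"; once async_classify() stops returning ASYNC_PENDING, the slot holds the final status.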
The wait_for_async() system call remains for applications wanting
to get the completion status of their asynchronous operations. There have
been a couple of requests, however, for a mechanism by which applications
could obtain completion status without having to go back into the kernel.
That inspired David Miller to complain
about a big part of the conversation which is not happening: the
integration with the kevent
patches. Much of the kevent work has been aimed at solving just this
problem, but Evgeniy Polyakov continues to have trouble getting people to
look at it. To a great extent, wait_for_async() is another event
interface. It seems unlikely that the kernel needs two of them.
What does all this work bode for the existing asynchronous I/O interface,
and, in particular, the buffered
filesystem AIO patches which have not yet been merged? Seeking to fend
off doubt about the future of that interface, Suparna Bhattacharya has argued that the buffered AIO patches should still
be merged:
Since this is going to be a new interface, not the existing linux
AIO interface, I do not see any conflict between the two. Samba4
already uses fsaio, and we now have the ability to do POSIX AIO
over kernel AIO (which depends on fsaio). The more we delay real
world usage the longer we take to learn about the application
patterns that matter. And it is those patterns that are key.
Decision time will be soon, since the buffered AIO patches seem to be ready
for merging into 2.6.21. Over the next couple of weeks, somebody will have
to decide whether to merge those patches - and maintain them indefinitely -
or hold off with the idea that fibrils will evolve into the preferred
solution.
Finally, Bert Hubert noted that DragonFly
BSD had an asynchronous system call interface - until last July, when the
developers pulled it out. DragonFly had created two system calls -
sendsys2() and waitsys2() - which split up the tasks of
initiating a system call and waiting for its completion. A followup suggests that DragonFly BSD had taken
a different approach, requiring that every system call have asynchronous
support built into it. In that sense, their asynchronous interface looked
like a more general version of Linux AIO.
Pushing asynchronous support down into system calls, filesystems, and
device drivers brings a lot of complexity; the slow progress of Linux AIO
illustrates just how hard it can be. One of the major advantages of the
fibril idea is that (with few exceptions) the system calls do not have to
be changed; they do not need to be aware of asynchronous operation at all.
The ability to pull asynchronous support into a relatively small chunk of
core kernel code may be the key idea that sells the entire fibril concept.
Once upon a time, the ability to download, compile, and install a new
kernel was a vital skill for any Linux system administrator. That skill is
less in demand now; the kernels shipped with most distributions tend to be
adequate for most needs. Still, there comes a time, even for those who do
not hack on the kernel itself, when a system needs a custom kernel. Many
system administration books devote a bit of space to this task, but they
tend to pass over it fairly quickly. Configuring, building, and installing
a kernel remains a relatively dark art for many.
Kernel hacker Greg Kroah-Hartman decided to do something about it; the
result is Linux Kernel in a Nutshell, published by O'Reilly. By the
standards of other kernel books from that publisher, this is a thin volume
indeed: just over 180 pages, including the index. But it is packed with
information that should be useful to just about anybody who has to deal
with the kernels on their systems.
The early chapters cover some of the basics: what tools are required, where
to get the kernel source, etc. There is a chapter on the various ways of
configuring a kernel. Your editor remembers the days of configuring
kernels by stepping through the entire "make config" process; it's nice to
see Greg recommending against that approach now. The build process is
discussed, as are the necessary steps for installing the kernel once it's
built.
The second major part of the book discusses customizations - in particular,
enabling support for a device. The process for determining which driver
should be enabled for a specific device is distressingly hairy; it involves
listing out the PCI bus configuration, digging through sysfs, then trying
to find a match in the kernel source. It's not for nothing that Greg says:
The easiest way to figure out which driver controls a new device is
to build all of the different drivers of that type in the kernel
source tree as modules, and let the udev startup process match the
driver to the device.
As they say, there really should be a better way. But one can't fault Greg
for telling it like it is.
Next there is a set of "kernel configuration recipes" for enabling specific
behavior. The advice here is terse, sometimes to a fault. The discussion
on enabling kernel preemption, for example, could have benefited from a
mention of the reliability concerns which have kept most distributors from
turning preemption on. Similarly, it talks about how to enable SELinux with
no mention of the need for an accompanying policy loaded from user space.
The audience for this book seems likely to include quite a few people from
the "know just enough to hurt themselves" population; a few more hints
might have proved most helpful to those readers.
The final section, making up almost half of the book, is devoted to
reference material. There is an extensive list of kernel command line
parameters and what they do - though the treatment is, once again, terse.
There is a useful chapter on the various make targets and options
for the kernel; somehow your editor had managed to avoid learning about
make randconfig until now. There is also a reference chapter
for configuration options. This chapter is incomplete, however, and the
options do not appear to be listed in any particular order.
Minor grumbles aside, there is value in this book's conciseness. When
faced with a question about configuring, building, or booting a kernel, this
book is likely to yield an answer without forcing the reader to search for
a needle in an 800-page haystack. It covers an area which was very much in
need of some improved documentation; it is also reasonably up to date,
having been written for the 2.6.18 kernel. Happily, Greg has
made the book available online.
Overall, Linux Kernel in a Nutshell is a more than welcome addition
to your editor's bookshelf.
February 5, 2007
This article was contributed by Paul McKenney
Read-copy update (RCU) is a synchronization API that is sometimes used
in place of reader-writer locks. RCU's read-side primitives offer
extremely low overhead and deterministic execution time.
These properties imply that RCU updaters cannot block RCU readers,
which means that RCU updates can be expensive, as they must leave
old versions of the data structure in place to accommodate pre-existing
readers.
Furthermore, these old versions must be reclaimed after all pre-existing
readers complete.
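The copy-then-publish pattern behind this description can be sketched in user-space C. This is a toy model, not the kernel's RCU API: all names are hypothetical, a single updater is assumed, and a global reader counter stands in for grace-period detection. Real RCU read-side primitives avoid exactly this kind of atomic operation; the counter is used here only to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical illustration only; not the kernel's RCU interface. */
struct config {
    int value;
    struct config *next_retired;   /* links old versions awaiting reclamation */
};

_Atomic(struct config *) current_config;  /* currently published version */
atomic_int active_readers;                /* crude stand-in for grace periods */
struct config *retired;                   /* touched only by the single updater */

/* Reader side: announce ourselves, then pick up the published version. */
const struct config *reader_enter(void)
{
    atomic_fetch_add(&active_readers, 1);
    return atomic_load(&current_config);
}

void reader_exit(void)
{
    atomic_fetch_sub(&active_readers, 1);
}

/*
 * Updater side: copy the current version, modify the copy, publish it.
 * The updater never waits for readers; the old version is left intact
 * and queued for later reclamation. Returns 0, or -1 if malloc() fails.
 */
int update_value(int new_value)
{
    struct config *fresh = malloc(sizeof(*fresh));
    struct config *old;

    if (!fresh)
        return -1;
    old = atomic_load(&current_config);
    if (old)
        *fresh = *old;
    fresh->value = new_value;
    fresh->next_retired = NULL;

    old = atomic_exchange(&current_config, fresh);
    if (old) {
        old->next_retired = retired;
        retired = old;
    }
    return 0;
}

/*
 * Free retired versions, but only if no reader is active. This is
 * conservative: it also waits for readers that started after the update.
 * Returns the number of versions freed.
 */
int try_reclaim(void)
{
    int freed = 0;

    if (atomic_load(&active_readers) != 0)
        return 0;
    while (retired) {
        struct config *next = retired->next_retired;
        free(retired);
        retired = next;
        freed++;
    }
    return freed;
}
```

A reader that entered before an update keeps seeing the old value until it exits, and try_reclaim() frees nothing while that reader is active. That deferred cleanup is the cost which the text attributes to the update side.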
The Linux kernel offers a number of RCU implementations, the first
such implementation being called "Classic RCU".
The RCU implementation for the -rt patchset is unusual in that
it permits read-side critical
sections to be blocked waiting for locks and due to preemption.
If these critical sections are blocked for too long,
grace periods will be stalled,
and the amount of memory awaiting the end of a grace
period will continually increase, eventually resulting
in an out-of-memory condition.
This theoretical possibility was apparent from the start,
but when Trevor Woerner actually made it happen, it was
clear that something needed to be done.
Because priority boosting is used in locking, it seemed natural to
apply it to realtime RCU.
Unfortunately, the priority-boosting algorithm used for locking
could not be applied straightforwardly to RCU because this
algorithm uses locking, and the whole point of RCU is to
avoid common-case use of such heavy-weight operations
in read-side primitives.
In fact, RCU's read-side primitives need to avoid common-case
use of all
heavyweight operations, including atomic instructions,
memory barriers, and cache misses.
Therefore, bringing priority boosting to RCU turned out to
be rather challenging, not because the eventual solution is
all that complicated, but rather due to the large number of
seductive but subtly wrong almost-solutions.
This document describes a way of providing light-weight
priority boosting to RCU, and also describes several of the
seductive but subtly wrong almost-solutions.
Approaches
This paper describes three approaches to priority-boosting blocked RCU
read-side critical sections.
The first approach minimizes scheduler-path overhead and uses locking
on non-fastpaths to decrease complexity.
The second approach is similar to the first, and was in fact a
higher-complexity intermediate point on the path to the first approach.
The third approach uses a per-task lock solely for its priority-inheritance
properties, which introduces the overhead of acquiring this lock into
the scheduler path, but avoids adding an "RCU boost" component to the
priority calculations.
Unfortunately, this third approach also cannot be made to reliably
boost tasks blocked in RCU read-side critical sections, so the first
approach should be used to the exclusion of the other two.
Each of these approaches is described in a following section,
after which is a section enumerating other roads not taken.
[ Editor's note: this article is long - but worth the read. Please
go to the full article text
to learn more about this technique.]
Page editor: Jonathan Corbet