Release status
Kernel release status
The current 2.6 prepatch remains 2.6.16-rc1. A handful of patches has
appeared in the mainline git repository since then, including fixes and a
few new features (see below).
The current -mm release is 2.6.16-rc1-mm3. Recent changes
to -mm include more semaphore-to-mutex conversions, two-column stack
backtraces on i386 (to make oops traces fit on one screen), various memory
management tweaks, the SMP
alternatives patch, and lots of fixes.
Kernel development news
Quotes of the week
The Linux kernel is under the GPL version 2. Not anything else. Some
individual files are licenceable under v3, but not the kernel in general.
And quite frankly, I don't see that changing. I think it's insane to
require people to make their private signing keys available, for example.
I wouldn't do it. So I don't think the GPL v3 conversion is going to
happen for the kernel, since I personally don't want to convert any of my
code.
-- Linus Torvalds
I am against personal attacks and this is the first time where it
tooks more than a day before LKML people started with personal
attacks against me. So in principle this is some sort of progress
compared to former times.
-- Joerg Schilling
The 2.6.16 straggler list
The release of 2.6.16-rc1 was supposed to signal the closing of the window
for new features. For the most part, things have happened that way. A few
additional features did find their way in after 2.6.16-rc1 came out,
though. Here is a quick list.
- The work of making the slab allocator smarter on NUMA machines
continues. In previous versions of the kernel, slab allocations
made during the bootstrap process would all end up on the boot node,
causing an imbalance across the NUMA system. It was also possible for
processes with non-default memory allocation policies to "contaminate"
allocations for other processes. The 2.6.16 slab allocator will make
more explicit decisions about just how allocations should be performed
to spread out boot-time allocations and to ensure that each process
gets the allocation policy it asked for.
- NUMA systems can also perform memory reclamation on individual memory
zones, on the theory that forcing out pages can be cheaper than
allocating non-local pages.
- A number of new system calls, including openat() and friends,
ppoll(), and pselect(), have been merged. These
calls were discussed here last December.
- Perhaps the biggest late addition is the EDAC ("error detection and
correction") subsystem. The purpose of the EDAC code is to watch for
errors in the operation of the system and to scream when they are
detected. EDAC, as merged, is oriented mainly toward memory errors.
It will poll the memory controllers (drivers for a few families of
controllers have been merged) on a regular basis for both correctable
and uncorrectable errors. Log messages can be generated for both
types of errors, and there is a sysfs interface as well. Optionally,
the EDAC code can be told to immediately panic the system on an
uncorrectable error; in this way, it is hoped, uncorrectable errors
will not lead to data corruption elsewhere in the system.
One assumes that uncorrectable errors will be rare, however. The real
intent is to allow administrators to see when significant numbers of
correctable errors are being detected. Since those errors will often
degrade, over time, into uncorrectable problems, the presence of
correctable errors is a strong indication that the affected memory
bank should be replaced.
The EDAC code can also watch for parity errors on the system's PCI
buses. Getting good information from the PCI subsystem can be harder,
however, since, apparently, some vendors do not follow the specs when
it comes to the generation of parity information.
For more information on EDAC, including details on the sysfs interface, see drivers/edac/edac.txt in the current
mainline documentation directory.
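As a rough illustration of how an administrator might watch those counters, here is a minimal Python sketch. It assumes that each memory controller appears as a directory under /sys/devices/system/edac/mc containing ce_count and ue_count files; that layout, the threshold of 100 correctable errors, and all of the function names are assumptions made for this example, so edac.txt should be checked for the interface a given kernel actually provides.

```python
import os
import tempfile

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (correctable, uncorrectable)} found under root.

    The default path and the ce_count/ue_count file names are assumptions
    about the sysfs layout; they are not taken from the article.
    """
    counts = {}
    if not os.path.isdir(root):
        return counts  # EDAC not loaded, or a different layout
    for name in sorted(os.listdir(root)):
        mc_dir = os.path.join(root, name)
        try:
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            continue  # not a controller directory, or an unreadable counter
        counts[name] = (ce, ue)
    return counts

def controllers_to_watch(counts, ce_threshold=100):
    """Controllers with any uncorrectable error, or many correctable ones.

    The threshold is a hypothetical policy choice, not an EDAC default.
    """
    return [mc for mc, (ce, ue) in counts.items()
            if ue > 0 or ce >= ce_threshold]

def make_fake_tree(values):
    """Build a throwaway directory tree mimicking the assumed layout."""
    root = tempfile.mkdtemp()
    for mc, (ce, ue) in values.items():
        os.makedirs(os.path.join(root, mc))
        with open(os.path.join(root, mc, "ce_count"), "w") as f:
            f.write(f"{ce}\n")
        with open(os.path.join(root, mc, "ue_count"), "w") as f:
            f.write(f"{ue}\n")
    return root
```

Run periodically, something like controllers_to_watch(read_edac_counts()) would point at the controllers whose correctable-error counts are climbing; make_fake_tree() exists only so that the sketch can be tried on a machine without EDAC hardware.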
At this point, the 2.6.16 merge window can truly be considered closed; the
feature set for this release is probably complete.
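For readers who have not yet met openat(), the core idea is that a path is looked up relative to an open directory file descriptor rather than the current working directory. The sketch below shows that behavior from Python, whose os.open() accepts a dir_fd argument on platforms providing openat(); the helper names and the temporary directory layout are invented for this example, and it is only meant to run on a Linux system with that support.

```python
import os
import tempfile

def read_relative(dir_fd, name):
    """Open `name` relative to an open directory descriptor and read it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)

def demo():
    """Show that a directory descriptor keeps working across a rename."""
    base = tempfile.mkdtemp()
    old = os.path.join(base, "old")
    os.mkdir(old)
    with open(os.path.join(old, "data.txt"), "wb") as f:
        f.write(b"hello")

    # Hold the directory open, then rename it out from under its old path.
    dir_fd = os.open(old, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.rename(old, os.path.join(base, "new"))
        # A lookup through the old path name would now fail; the lookup
        # relative to dir_fd still finds the file.
        return read_relative(dir_fd, "data.txt")
    finally:
        os.close(dir_fd)
```

Because the lookup goes through the descriptor, a concurrent rename of the directory does not redirect or break it, which is the kind of race that plain open() on a full path name cannot avoid.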
Review: Understanding Linux Network Internals
The
net/ directory tree in the Linux kernel source is an
intimidating place. We all use the kernel's networking features, but even
experienced kernel hackers often hesitate to wander into the code which
implements those features. To many, the networking stack is a black box,
maintained by a distinct set of developers who keep many of their secrets to
themselves. There is little documentation on how Linux networking is
implemented, adding to the challenge of understanding how it all works.
Your editor had been told that O'Reilly had a book on the networking stack
- a sort of companion to Understanding The Linux Kernel - in the
works. But it was still a nice surprise to see the end result - a book by Christian Benvenuti
entitled Understanding Linux Network Internals - show up on the
doorstep. A couple of weeks later, after having read much of the book,
your editor is ready to share some comments. The short version would be: this
book is a welcome addition to the (short) list of books about the kernel.
It is not as good a book as it could have been, however, and leaves some
significant gaps.
Let's get one pet peeve out of the way immediately: any kernel book
should disclose, on the cover, which version of the kernel is
covered. As LWN readers know well, things change quickly in the kernel. A
book which covers one version will likely be obsolete in many places a few
versions later. If a kernel book does not include version information,
there is no way to know which reality it matches or whether it will be even
remotely relevant to current kernels.
In the case of this book, there is no word anywhere regarding which version
is covered. It is clearly a 2.6 book, but that is all we know. Your
editor has come to the conclusion from his reading that the book was a long
time in the writing (not surprising: the subject matter is complex, and the
book is over 1,000 pages long), and that, if an effort was made to make it
consistently current for a specific kernel version, that effort was
incomplete. The section on interrupts, for example, presents the old
prototype for interrupt handlers last seen in the 2.5.68 kernel. Other
parts are much more current. The book is a bit of a patchwork in that
regard.
And in other regards as well. Some parts of the book seem to want to be a
programming manual - to the point that the slab cache functions
(kmem_cache_create() and friends) are presented on page 4.
Page 13 talks about the likely() and unlikely()
constructs. Yet, in other areas, detail is much more scarce, and there is
no complete discussion of how to write code for the kernel. And (another
pet peeve of your editor's) the issues of concurrency and race conditions
are passed over almost completely.
Similarly, the section on network device drivers offers a great deal of information on
device registration, queueing discipline bits, notifiers, power management,
ethtool, dealing with the PCI bus, module initialization, and more. There
is even a section on how bottom halves worked in the 2.2 kernel. But there
is almost no information on how to write transmit and receive functions.
At one point the author writes "This chapter does not strive to be a
guide on how to write NIC device drivers." No problem, there are
(ahem) other books which cover that ground. But then why bother with
things like PCI device registration?
This book does contain a great deal of information. It may pass over
driver transmit and receive functions, but it does cover packet
transmission and reception in the higher levels of the networking stack in
some detail - and that is just what one would want. There is a long
section on IPv4 and ICMP, and quite a bit of information on the complicated
"neighbor" code (the ARP protocol and such). The last major section is on
routing. Stuffed into the middle is a 110-page section on the bridging
subsystem.
Networking is a large area, and a large part of the kernel, so it is hard
to cover everything even in a 1,000-page book. So some important things
were left out of Understanding Linux Network Internals. These
include TCP, IPv6, IPsec, netfilter, traffic control, and several other
topics. And that leads to your editor's last, and perhaps biggest,
complaint. The inconsistent focus and somewhat irregular choice of topics
seen at the lower levels is also present in the large scale. Your editor
would have happily traded the four chapters on bridging for a solid
overview of how the TCP protocol works in Linux, and your editor suspects
that he is not alone. Netfilter and traffic control, perhaps, merit a book
of their own, but maybe some of the other chapters could have been
tightened up enough to make room for an introduction to IPv6 or IPsec.
So it is hard to recommend this book in an unreserved fashion. That said,
there is a great deal of useful information to be found in Understanding
Linux Network Internals, and your editor is glad to have it on his
bookshelf. It has already come in useful a couple of times while trying to
figure out how parts of networking-related patches work. So this book is a
welcome addition to the body of kernel-related documentation, even if it is
not everything one might wish it would be.
MD / DM
The Linux software RAID code (often called "MD" for "multi-device") is a
longstanding feature of the kernel. RAID users appreciate its robustness,
configurability, and the fact that it performs well; better performance
than that achieved with hardware RAID controllers is not unheard of. In
recent years, little has been heard about the MD code, however. Its feature set has
changed slowly, and developments with the device mapper code have taken a
higher profile. That, perhaps, is as it should be; a storage subsystem
which attracts attention is rarely a good thing.
That said, MD hacker Neil Brown has been busy. His latest patch set
implements RAID5 reshaping:
the ability to add devices to a RAID5 array without going through a backup
and restore cycle - or even shutting the array down. This is a nontrivial
task; adding a drive to a RAID5 array requires redistributing data and
parity blocks across the entire array. With this version of the patch,
Linux MD can not only perform this task, but it can do it while still
handling normal I/O to the array. The new patch also checkpoints the
process, so that it can be restarted if interrupted in the middle; this
corrects a minor defect in the previous version, wherein interrupting the
reshaping task would cause all data in the array to be lost.
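To see why adding a drive means touching the entire array, consider a toy model of RAID5 placement, sketched here in Python. The parity rotation used is a simplified, hypothetical layout chosen for readability; it is not claimed to match the layouts the MD code uses, and the function names are invented for this example.

```python
from functools import reduce

def locate(block, ndisks):
    """Map a logical data block to (stripe, disk) in a toy RAID5 layout.

    Each stripe holds ndisks - 1 data blocks plus one parity block. In this
    hypothetical rotation, parity for stripe s sits on disk
    ndisks - 1 - (s % ndisks) and data fills the other disks in order.
    """
    per_stripe = ndisks - 1
    stripe, index = divmod(block, per_stripe)
    parity_disk = ndisks - 1 - (stripe % ndisks)
    data_disks = [d for d in range(ndisks) if d != parity_disk]
    return stripe, data_disks[index]

def parity(chunks):
    """XOR equal-length chunks together to form the parity chunk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def moved_blocks(nblocks, old_disks, new_disks):
    """Logical blocks whose (stripe, disk) changes when the array grows."""
    return [b for b in range(nblocks)
            if locate(b, old_disks) != locate(b, new_disks)]
```

In this model, moved_blocks(6, 3, 4) returns [2, 3, 4, 5]: only the first two blocks keep their position when a three-drive array grows to four, and every stripe's parity must be recomputed over a different set of data blocks. A real reshape has to do that relocation in place, in an order that never overwrites data it has not yet copied, which is part of what makes the checkpointing described above valuable.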
Neil notes that things could still go wrong:
There is still a small window ( < 1 second) at the start of the
reshape during which a crash will cause unrecoverable corruption.
My plan is to resolve this in mdadm rather than md. The critical
data will be copied into the new drive(s) prior to commencing the
reshape. If there is a crash the kernel will refuse the reassemble
the array. mdadm will be able to re-assemble it by first restoring
the critical data and then letting the remainder of the reshape run
it's course.
Neil has various other enhancements in mind, including the ability to upgrade
a RAID5 array to RAID6 (which increases fault tolerance by adding another
set of parity blocks). Quite a bit, clearly, is happening in the MD world.
All this activity drew queries from a couple of observers who had, it
seems, assumed that the addition of the device mapper to the kernel meant
that the MD code would eventually wither away. The device mapper can
handle some of the lower RAID levels (mirroring and striping) now, and
there is work in progress to add RAID5 support. Since the device mapper is
a general framework for mixing and matching drives, it makes sense to some
that the RAID functionality should move there too.
Unsurprisingly, Neil disagrees. His
suggestion is that "anything with redundancy," including RAID5 and RAID6,
is best handled in the MD code. The device mapper, instead, is good for
fancier arrangements like multipath, encryption, volume management,
snapshots, etc. Certainly, those who are placing trust in RAID for
redundancy should be comforted by the rather longer track record built up
by the MD code. MD is also said to be faster than the device mapper at
this time.
As others have pointed out, however, there is a cost to carrying multiple
RAID implementations in the kernel. Each must be maintained, and each will
have its own unique bugs to contribute to the whole. So, as the device
mapper develops higher-level RAID capabilities, it would be nice if some of
the core code could be shared between MD and DM. Making that happen,
however, will require developer effort - and it's not clear that any
hackers are interested in doing that work at this time.
Patches and updates
Development tools
- Junio C Hamano: GIT 1.1.4.
(January 20, 2006)
Page editor: Jonathan Corbet