The current 2.6 kernel is 2.6.2-rc1, which was announced
by Linus on January 20.
A massive set of patches was merged into this release;
included therein is a new Qlogic SCSI driver, a bunch of USB work,
infrastructural work to better support hotplug block devices, several
architecture updates, some I/O scheduler work, a rework of the
PCMCIA drivers, sysfs support for several new types of devices, an XFS
update, and much more. See the long-format changelog
for the details.
The latest kernel from Andrew Morton, as of this writing, is 2.6.1-mm5. Recent additions to the -mm tree
include a working modular IDE implementation, improved x86 CPU type
selection options, a user-mode Linux update, and many other fixes.
The current 2.4 kernel is 2.4.24. Marcelo released 2.4.25-pre5 on January 15; a "deadly
mistake" there forced the release of 2.4.25-pre6 one day later. The 2.4.25
prepatches have been getting steadily smaller; there may be a release
candidate coming in the near future.
Kernel development news
Well, you don't get to be a kernel hacker simply by looking good in
-- Rusty Russell
Peter Chubb works with the Gelato
project, which aims to improve Linux performance on the IA-64
architecture. Among other things, Peter is responsible for the 64-bit
sector support which went into the 2.5 kernel. At Linux.Conf.Au, Peter
discussed device drivers. He pointed out that drivers, while making up roughly
50% of the code in the kernel, are responsible for 85% of all kernel bugs.
Drivers tend to be written by people who would not normally be considered
kernel hackers: hardware engineers, for example. These people tend to have
a hard time dealing with the special nature of kernel programming, where
interfaces are fluid, bugs are lethal, and many normal development tools
are not available.
Driver authors - and their users - might have a much easier time if
drivers could be written to run in user space. In addition to mitigating
the above-mentioned kernel programming issues, user-space driver
development would allow the creation of a stable ABI; it also, presumably,
would eliminate any licensing issues associated with closed-source
drivers. User-space driver writers could also use any language they
choose, "even Python."
Peter and company have set out to make user-space drivers possible. Some
of the necessary pieces are already in place. Standard Linux will allow a
suitably privileged process to access I/O ports, for example. Low-address
memory-mapped I/O registers can be accessed via a mmap() of
/dev/mem. There is also an interface which gives user-space
processes access to the PCI configuration space; this interface works via
ioctl() calls on /proc files, though, thus upsetting the
sensibilities of most kernel hackers. These facilities are enough to allow
some user-space drivers (particularly XFree86) to work, but they are not
sufficient to enable a wider range of drivers to move out of the kernel.
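As a rough illustration of the /dev/mem approach, a user-space driver might map a register window along these lines. This is a minimal sketch, not code from the Gelato patch: the helper names map_registers() and unmap_registers() are invented for the example, and the caller is assumed to know the page-aligned physical address and size of the device's registers.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path`, starting at the page-aligned offset `base`.
 * For real hardware, `path` would be "/dev/mem" and `base` the physical
 * address of the register window; both are supplied by the caller.
 * (Hypothetical helper, not part of any kernel or library API.)
 * Returns NULL on failure.
 */
volatile uint32_t *map_registers(const char *path, off_t base, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);                  /* the mapping outlives the descriptor */

    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Release a window obtained from map_registers(). */
void unmap_registers(volatile uint32_t *regs, size_t len)
{
    munmap((void *)regs, len);
}
```

The volatile qualifier keeps the compiler from caching or reordering away register accesses; opening /dev/mem itself requires suitable privileges.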
One of the big gaps is interrupts; there is no way, currently, for
user-space processes to register and respond to device interrupts. A patch
from the Gelato project addresses this gap by creating a set of files under
/proc. A process wanting to deal with interrupt 11, say, would
open /proc/irq/11/irq. Reading the resulting file descriptor
enables the interrupt and blocks the process until a device interrupt
happens; control then returns to user-space, which can figure out what to
do. A typical user-space driver will set up a separate thread to wait for
interrupts in this manner; the actual work can be handed off to a different
thread within the program.
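The wait-for-interrupt pattern described above might look something like the following. This is a sketch under stated assumptions: the function names are invented here, and since the format of the data returned by the /proc file is not described, any successful read() is simply treated as "one interrupt arrived."

```c
#include <errno.h>
#include <unistd.h>

/*
 * Block until the descriptor reports an interrupt.  With the Gelato patch,
 * `fd` would come from opening a file such as /proc/irq/11/irq.
 * Assumption: any successful read() signals exactly one interrupt.
 * Returns 0 on an interrupt, -1 on error or end of file.
 */
int irq_wait(int fd)
{
    char buf[16];
    ssize_t n;

    do {
        n = read(fd, buf, sizeof buf);
    } while (n < 0 && errno == EINTR);  /* restart if a signal woke us */

    return n > 0 ? 0 : -1;
}

/*
 * Service up to `max` interrupts, calling `handler(ctx)` for each one.
 * Returns the number of interrupts handled.
 */
int irq_loop(int fd, void (*handler)(void *), void *ctx, int max)
{
    int handled = 0;

    while (handled < max && irq_wait(fd) == 0) {
        handler(ctx);
        handled++;
    }
    return handled;
}

/* Example handler: count interrupts in the int that `ctx` points to. */
void count_interrupt(void *ctx)
{
    ++*(int *)ctx;
}
```

A driver would typically run irq_loop() in its own thread and have the handler do little more than queue work for another thread.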
Peter presented some graphs showing that interrupt response times suffer
very little when interrupt handlers run in user space. The main limitation
at the moment seems to be the fact that shared interrupts are not
Another thing that user-space processes cannot normally do is set up DMA
operations. To enable DMA, a new set of system calls has been added. The
interface appears to be in a bit of flux, but it will be something like the
following. The driver starts by opening a special file for device
int usr_pci_open(int bus, int slot, int function);
There is then a function for setting up DMA mappings:
int usr_pci_map(int fd, int cmd, struct mapping_info *info);
The cmd argument can be USR_ALLOC_CONSISTENT to set up a
long-lived consistent mapping, or USR_MAP to create a streaming,
scatter/gather mapping. In either case, the info argument is used
to pass in the relevant information, and to get the necessary address(es).
There is also, of course, a USR_UNMAP operation for when the DMA
Many user-space drivers will be able to obtain their requests directly from
user space; the X server works in this way. Many other drivers, however,
will need to hook into the kernel for this information. The current patch
includes a mechanism (Peter described it as ugly) for a user-space block
driver to register itself with the kernel and get I/O requests. It works
by opening another special file and using it to communicate requests and
responses back and forth. A similar interface apparently exists for
Getting a user-space driver patch into the kernel could be an interesting
challenge. Many kernel hackers, certainly, resist changes that appear to
push Linux toward a microkernel
architecture - or which might legitimize binary-only drivers. On the other
hand, some drivers bring a great deal of baggage into the kernel with them
which might be better kept in user space; think of some of the code
required by some sound drivers or the modulation software needed by "linmodem"
drivers. The ability to run these drivers in user space could be a nice
thing to have.
See the Gelato user-level drivers page for more information.
It will come as no surprise to most Linux users that the kernel has grown
over time. In general, the expansion in the kernel has been more than
offset by the increasing power of the systems that it runs on, but there is
still a price to be paid for kernel bloat. Extra memory has to be paid
for, and other overhead - such as cache misses - can hurt the overall
performance of the system.
Andi Kleen has been putting some effort into making the kernel smaller
through the use of some relatively new and obscure gcc options. He starts
with -Os, as do most kernel shrinkers; this one simply tells the
compiler to optimize for size rather than strictly for performance.
Anecdotal evidence suggests that -Os not only produces a smaller
kernel, but that the resulting code often runs faster as well.
The next step was to use
-funit-at-a-time. This option is new; it will be part of the
upcoming gcc 3.4 release. It causes the compiler to load the entire
source file into memory before it begins generating code, which allows
better inlining and the dropping of unused functions. The result was a
reduction of a little over 3% in kernel text size. The reasons for this shrinkage require
further investigation; it may be that there is a significant amount of dead
code in the kernel.
Finally, Andi has also enabled
-mregparm=3, which instructs the compiler to pass up to three
function arguments in registers, rather than on the stack. This option
helps even more than -funit-at-a-time. Using all three options,
Andi is able to reduce the text size by over 700KB.
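To see what these options have to work with, consider a small file like the one below. It is a made-up example (the file name and every function in it are invented for illustration), and the command lines in the comment are only a suggestion for experimenting; -mregparm=3 applies to 32-bit x86 builds.

```c
/*
 * shrink_demo.c - hypothetical file for trying out the options.
 *
 * Possible invocations, assuming a gcc that accepts these flags:
 *   gcc -Os -c shrink_demo.c
 *   gcc -Os -funit-at-a-time -c shrink_demo.c
 *   gcc -m32 -Os -funit-at-a-time -mregparm=3 -c shrink_demo.c
 * The resulting text sizes can be compared with: size shrink_demo.o
 */
#include <stddef.h>

/* Small static helper with a single caller: a natural inlining candidate
 * once the compiler has seen the whole file. */
static unsigned int fold(unsigned int acc, unsigned char byte)
{
    return (acc << 1) ^ byte;
}

/* Never called.  A compiler that looks at the whole unit can discard it.
 * The attribute only silences the unused-function warning. */
__attribute__((unused))
static unsigned int unused_helper(unsigned int x)
{
    return x + 1;
}

/* Three arguments: with -mregparm=3 on 32-bit x86, all of them can be
 * passed in registers instead of on the stack. */
unsigned int checksum(const unsigned char *buf, size_t len, unsigned int seed)
{
    unsigned int acc = seed;

    for (size_t i = 0; i < len; i++)
        acc = fold(acc, buf[i]);
    return acc;
}
```

A file this small will not show anything like the kernel-wide numbers, of course; the savings come from applying the same transformations across thousands of files.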
There is one potential problem with -mregparm=3, however: it
changes the calling conventions within the kernel, and thus breaks binary
modules. As one might imagine, some kernel developers are more worried
about this than others. Red Hat kernel packager Arjan van de Ven has stated that he is using this option, and
intends to build production kernels that way as well. As always, sympathy
for the difficulties encountered by distributors of binary-only modules is
low. If the kernel hackers decide that this option is worth using, they'll
not let some broken binary modules stop them.
The FUTEX subsystem, which is part of the 2.6 kernel, provides fast mutual
exclusion primitives for user space. The FUTEX functionality is similar to
that of the longstanding semaphores, but with a nicer interface and better
performance. A FUTEX lock can be acquired (in the non-contention case)
without going into the kernel at all. FUTEXes are a part of the
high-performance native POSIX threading implementation.
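The "no kernel in the non-contention case" property can be sketched as follows. This is not the NPTL implementation; it is a minimal three-state lock in a commonly described style, with invented names (demo_mutex and friends), using C11 atomics for the fast path and the raw futex system call only when there is contention.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = free, 1 = locked, 2 = locked with possible waiters */
typedef struct {
    atomic_int state;
} demo_mutex;

/* Thin wrapper around the futex system call (glibc provides none). */
static long demo_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
int demo_trylock(demo_mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->state, &expected, 1);
}

void demo_lock(demo_mutex *m)
{
    int c = 0;

    /* Fast path: an uncontended lock is one atomic operation in user space. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: mark the lock contended, then sleep in the kernel
     * for as long as the state word still reads 2. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        demo_futex(&m->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void demo_unlock(demo_mutex *m)
{
    /* If the old state was 1, nobody is waiting: no system call needed. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        demo_futex(&m->state, FUTEX_WAKE, 1);   /* wake one waiter */
    }
}
```

Note that FUTEX_WAKE here wakes whichever waiter the kernel picks; nothing in this sketch takes process priority into account.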
FUTEXes are an improvement on what came before, but they do not yet provide
the functionality that some users - particularly real-time system
implementers - would like to have. To help fill in the gap, Iñaky
Pérez-González has been working (with others) on a new set of "robust
mutexes" which go by the name of FUSYNs. The project has a simple web site
based at OSDL and a set of patches. Some information can be found in fusyn.txt, which is included with the patch.
FUSYNs enhance FUTEXes with:
- Priority-based locks. When a lock is released, it is not handed over
to a random process. Instead, the highest-priority process waiting
for the lock will be allowed to proceed. If a process changes
priority while waiting for a lock, the system will take the change
into account properly.
- Priority inheritance. Processes which take out FUSYN locks
("fulocks") can have their priority raised to a specified level while
they hold the lock. This mechanism is an attempt to avoid priority
inversion problems, where a low-priority process can obtain a lock,
lose the processor, and keep a high-priority process from running for
a long time.
- Robustness features. The kernel can take remedial action when a
process dies while holding a lock. There is also deadlock protection
code which looks at the chains of locks held by various processes and
reacts when a deadlock situation is detected.
Future plans include the addition of features like condition variables,
reader/writer locks, spinlocks, etc.
Inside the kernel, this functionality is implemented through the addition
of some new facilities which could be useful beyond the FUSYN code. The
"vlocator" structure allows the kernel to associate objects with user-space
processes via a hash table. In the longer term, vlocators could be used to
provide some relief for the ever-growing task structure. The
unfortunately-named "fuqueue" functions much like an ordinary kernel wait
queue, except that wakeups take process priority into account - only the
highest-priority process is awakened. To support this functionality, a new
"plist" type is added; it implements a general, priority-sorted,
doubly-linked list capability.
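The idea behind a priority-sorted list is simple enough to sketch in a few lines. What follows is not the plist code from the FUSYN patch; all names are invented, and it assumes that a larger number means a higher priority. The highest-priority entry is always at the front, so "wake the highest-priority waiter" becomes a constant-time pop.

```c
#include <stddef.h>

struct pnode {
    int prio;                   /* assumption: larger = higher priority */
    struct pnode *prev, *next;
};

struct plist_demo {
    struct pnode head;          /* sentinel; head.next is the top entry */
};

void plist_demo_init(struct plist_demo *l)
{
    l->head.prev = l->head.next = &l->head;
}

int plist_demo_empty(const struct plist_demo *l)
{
    return l->head.next == &l->head;
}

/* Insert in descending priority order; equal priorities stay FIFO. */
void plist_demo_add(struct plist_demo *l, struct pnode *n)
{
    struct pnode *pos = l->head.next;

    while (pos != &l->head && pos->prio >= n->prio)
        pos = pos->next;

    /* Link n immediately in front of pos. */
    n->next = pos;
    n->prev = pos->prev;
    pos->prev->next = n;
    pos->prev = n;
}

/* Unlink a node; it may then be re-added, e.g. after a priority change. */
void plist_demo_del(struct pnode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Remove and return the highest-priority node, or NULL if the list is empty. */
struct pnode *plist_demo_pop(struct plist_demo *l)
{
    struct pnode *n;

    if (plist_demo_empty(l))
        return NULL;
    n = l->head.next;
    plist_demo_del(n);
    return n;
}
```

Handling a waiter whose priority changes is a matter of removing it, updating prio, and adding it again; insertion is linear in the list length in this simple version.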
The reaction to posts of FUSYN patches on linux-kernel has tended to be
quiet. There does not appear to be any strong opposition to the addition
of this capability to the kernel. Whether FUSYNs go into 2.6, or have to
wait for 2.7, however, remains to be seen.
Page editor: Jonathan Corbet