
Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.2-rc1, which was announced by Linus on January 20. A massive set of patches was merged into this release; included therein is a new Qlogic SCSI driver, a bunch of USB work, infrastructural work to better support hotplug block devices, several architecture updates, some I/O scheduler work, a rework of the PCMCIA drivers, sysfs support for several new types of devices, an XFS update, and much more. See the long-format changelog for the details.

The latest kernel from Andrew Morton, as of this writing, is 2.6.1-mm5. Recent additions to the -mm tree include a working modular IDE implementation, improved x86 CPU type selection options, a user-mode Linux update, and many other fixes.

The current 2.4 kernel is 2.4.24. Marcelo released 2.4.25-pre5 on January 15; a "deadly mistake" there forced the release of 2.4.25-pre6 one day later. The 2.4.25 prepatches have been getting steadily smaller; there may be a release candidate coming in the near future.


Kernel development news

Quote of the week

Well, you don't get to be a kernel hacker simply by looking good in Speedos.

-- Rusty Russell


User-space device drivers

Peter Chubb works with the Gelato project, which aims to improve Linux performance on the IA-64 architecture. Among other things, Peter is responsible for the 64-bit sector support which went into the 2.5 kernel. At Linux.Conf.Au, Peter discussed device drivers. He pointed out that drivers, while making up roughly 50% of the code in the kernel, are responsible for 85% of all kernel bugs. Drivers tend to be written by people who would not normally be considered kernel hackers: hardware engineers, for example. These people tend to have a hard time dealing with the special nature of kernel programming, where interfaces are fluid, bugs are lethal, and many normal development tools are not available.

Driver authors - and their users - might have a much easier time if drivers could be written to run in user space. In addition to mitigating the above-mentioned kernel programming issues, user-space driver development would allow the creation of a stable ABI; it also, presumably, would eliminate any licensing issues associated with closed-source drivers. User-space driver writers could also use any language they choose, "even Python."

Peter and company have set out to make user-space drivers possible. Some of the necessary pieces are already in place. Standard Linux will allow a suitably privileged process to access I/O ports, for example. Low-address memory-mapped I/O registers can be accessed via a mmap() of /dev/mem. There is also an interface which gives user-space processes access to the PCI configuration space; this interface works via ioctl() calls on /proc files, though, thus upsetting the sensibilities of most kernel hackers. These facilities are enough to allow some user-space drivers (particularly XFree86) to work, but they are not sufficient to enable a wider range of drivers to move out of the kernel.
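As a rough illustration of the /dev/mem approach mentioned above, a user-space driver might map its device registers along the following lines. This is a minimal sketch, not code from the Gelato project: the helper names map_registers() and unmap_registers() are hypothetical, and the register address a real driver would pass depends entirely on the hardware.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of physical address space starting
 * at `phys` through the memory device named by `path` (normally
 * "/dev/mem"). `phys` must be page-aligned. Returns NULL on failure.
 */
static volatile uint32_t *map_registers(const char *path, off_t phys, size_t len)
{
    assert(path != NULL && len > 0);

    /* O_SYNC is commonly used with /dev/mem to request uncached access. */
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;    /* typically: insufficient privilege or no device */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd);          /* an established mapping survives close() */
    if (p == MAP_FAILED)
        return NULL;

    /* volatile: every register access must really reach the device */
    return (volatile uint32_t *)p;
}

/* Hypothetical helper: undo map_registers(); a NULL pointer is ignored. */
static void unmap_registers(volatile uint32_t *regs, size_t len)
{
    if (regs != NULL)
        munmap((void *)regs, len);
}
```

A suitably privileged process would call map_registers("/dev/mem", base, length) with the base address of its device's register window, then read and write registers through the returned pointer.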

One of the big gaps is interrupts; there is no way, currently, for user-space processes to register and respond to device interrupts. A patch from the Gelato project addresses this gap by creating a set of files under /proc. A process wanting to deal with interrupt 11, say, would open /proc/irq/11/irq. Reading the resulting file descriptor enables the interrupt and blocks the process until a device interrupt happens; control then returns to user-space, which can figure out what to do. A typical user-space driver will set up a separate thread to wait for interrupts in this manner; the actual work can be handed off to a different thread within the program.
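The interrupt-waiting thread described above might look something like the following sketch. It assumes a kernel carrying the Gelato patch; the function names irq_path() and irq_loop() are made up for illustration, and since the data returned by read() is not described here, the sketch only checks for failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build the per-interrupt file name, e.g. "/proc/irq/11/irq".
 * Returns 0 on success, -1 if the buffer is too small. */
static int irq_path(char *buf, size_t size, int irq)
{
    assert(buf != NULL && irq >= 0);
    int n = snprintf(buf, size, "/proc/irq/%d/irq", irq);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/*
 * Wait for interrupts on `irq`, calling `handler` once per interrupt.
 * Only returns on error (always -1): the file cannot be opened, or a
 * read fails.
 * ASSUMPTION: the contents delivered by read() are unspecified in this
 * sketch; only a negative return value is treated as meaningful.
 */
static int irq_loop(int irq, void (*handler)(int irq))
{
    char path[64];
    if (irq_path(path, sizeof path, irq) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;      /* no such file on kernels without the patch */

    for (;;) {
        char buf[16];
        /* read() enables the interrupt and blocks until it fires */
        if (read(fd, buf, sizeof buf) < 0)
            break;
        handler(irq);   /* hand the real work to the driver */
    }
    close(fd);
    return -1;
}
```

In a threaded driver, irq_loop() would run in its own thread, with the handler doing little more than queueing work for the rest of the program.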

Peter presented some graphs showing that interrupt response times suffer very little when interrupt handlers run in user space. The main limitation at the moment seems to be the fact that shared interrupts are not supported.

Another thing that user-space processes cannot normally do is set up DMA operations. To enable DMA, a new set of system calls has been added. The interface appears to be in a bit of flux, but it will be something like the following. The driver starts by opening a special file for device operations:

    int usr_pci_open(int bus, int slot, int function);

There is then a function for setting up DMA mappings:

    int usr_pci_map(int fd, int cmd, struct mapping_info *info);

The cmd argument can be USR_ALLOC_CONSISTENT to set up a long-lived consistent mapping, or USR_MAP to create a streaming, scatter/gather mapping. In either case, the info argument is used to pass in the relevant information, and to get the necessary address(es). There is also, of course, a USR_UNMAP operation for when the DMA is complete.

Many user-space drivers will be able to obtain their requests directly from user space; the X server works in this way. Many other drivers, however, will need to hook into the kernel for this information. The current patch includes a mechanism (Peter described it as ugly) for a user-space block driver to register itself with the kernel and get I/O requests. It works by opening another special file and using it to communicate requests and responses back and forth. A similar interface apparently exists for network drivers.

Getting a user-space driver patch into the kernel could be an interesting challenge. Many kernel hackers, certainly, resist changes that appear to push Linux toward a microkernel architecture - or which might legitimize binary-only drivers. On the other hand, some drivers bring a great deal of baggage into the kernel with them which might be better kept in user space; think of some of the code required by some sound drivers or the modulation software needed by "linmodem" drivers. The ability to run these drivers in user space could be a nice thing to have.

See the Gelato user-level drivers page for more information.


Shrinking the kernel with gcc

It will come as no surprise to most Linux users that the kernel has grown over time. In general, the expansion in the kernel has been more than offset by the increasing power of the systems that it runs on, but there is still a price to be paid for kernel bloat. Extra memory has to be paid for, and other overhead - such as cache misses - can hurt the overall performance of the system.

Andi Kleen has been putting some effort into making the kernel smaller through the use of some relatively new and obscure gcc options. He starts with -Os, as do most kernel shrinkers; this one simply tells the compiler to optimize for size rather than strictly for performance. Anecdotal evidence suggests that -Os not only produces a smaller kernel, but that the resulting code often runs faster as well.

The next step was to use -funit-at-a-time. This option is new; it will be part of the upcoming gcc 3.4 release. It causes the compiler to load the entire source file into memory before it begins generating code, which allows better inlining and the dropping of unused functions. With this option, kernel text size shrank by a little over 3%. The reasons for this shrinkage require further investigation; it may be that there is a significant amount of dead code in the kernel.

Finally, Andi has also enabled -mregparm=3, which instructs the compiler to pass up to three function arguments in registers, rather than on the stack. This option helps even more than -funit-at-a-time. Using all three options, Andi is able to reduce the text size by over 700KB.

There is one potential problem with -mregparm=3, however: it changes the calling conventions within the kernel, and thus breaks binary modules. As one might imagine, some kernel developers are more worried about this than others. Red Hat kernel packager Arjan van de Ven has stated that he is using this option, and intends to build production kernels that way as well. As always, sympathy for the difficulties encountered by distributors of binary-only modules is low. If the kernel hackers decide that this option is worth using, they'll not let some broken binary modules stop them.


FUSYNs - robust user-space synchronization primitives

The FUTEX subsystem, which is part of the 2.6 kernel, provides fast mutual exclusion primitives for user space. The FUTEX functionality is similar to that of the longstanding semaphores, but with a nicer interface and better performance. A FUTEX lock can be acquired (in the non-contention case) without going into the kernel at all. FUTEXes are a part of the high-performance native POSIX threading implementation.

FUTEXes are an improvement on what came before, but they do not yet provide the functionality that some users - particularly real-time system implementers - would like to have. To help fill in the gap, Iñaky Pérez-González has been working (with others) on a new set of "robust mutexes" which go by the name of FUSYNs. The project has a simple web site based at OSDL and a set of patches. Some information can be found in fusyn.txt, which is included with the patch.

FUSYNs enhance FUTEXes with:

  • Priority-based locks. When a lock is released, it is not handed over to a random process. Instead, the highest-priority process waiting for the lock will be allowed to proceed. If a process changes priority while waiting for a lock, the system will take the change into account properly.

  • Priority inheritance. Processes which take out FUSYN locks ("fulocks") can have their priority raised to a specified level while they hold the lock. This mechanism is an attempt to avoid priority inversion problems, where a low-priority process can obtain a lock, lose the processor, and keep a high-priority process from running for a long time.

  • Robustness features. The kernel can take remedial action when a process dies while holding a lock. There is also deadlock protection code which looks at the chains of locks held by various processes and reacts when a deadlock situation is detected.
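The priority-raising behavior described in the second item can be modeled in a few lines. This is a single-threaded toy model with invented names (struct task, struct boost_lock), not the fulock implementation; it assumes that a larger number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task: larger prio means higher priority (an assumption here). */
struct task {
    int prio;
};

/* Toy lock which raises its holder to boost_prio while held. */
struct boost_lock {
    int          boost_prio;  /* level the holder is raised to */
    struct task *owner;       /* NULL when the lock is free */
    int          saved_prio;  /* holder's priority before the boost */
};

/* Take the lock if it is free and boost the new owner.
 * Returns 1 on success, 0 if the lock is already held. */
static int boost_lock_acquire(struct boost_lock *l, struct task *t)
{
    assert(l != NULL && t != NULL);
    if (l->owner != NULL)
        return 0;
    l->owner = t;
    l->saved_prio = t->prio;
    if (t->prio < l->boost_prio)
        t->prio = l->boost_prio;  /* never lower an already-high task */
    return 1;
}

/* Drop the lock and restore the owner's original priority. */
static void boost_lock_release(struct boost_lock *l)
{
    assert(l != NULL && l->owner != NULL);
    l->owner->prio = l->saved_prio;
    l->owner = NULL;
}
```

While boosted, the low-priority holder cannot be preempted by tasks of intermediate priority, so it gets through its critical section and releases the lock sooner.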

Future plans include the addition of features like condition variables, reader/writer locks, spinlocks, etc.

Inside the kernel, this functionality is implemented through the addition of some new facilities which could be useful beyond the FUSYN code. The "vlocator" structure allows the kernel to associate objects with user-space processes via a hash table. In the longer term, vlocators could be used to provide some relief for the ever-growing task structure. The unfortunately-named "fuqueue" functions much like an ordinary kernel wait queue, except that wakeups take process priority into account - only the highest-priority process is awakened. To support this functionality, a new "plist" type is added; it implements a general, priority-sorted, doubly-linked list capability.
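To give a feel for what a priority-sorted, doubly-linked list involves, here is a small user-space sketch. It is not the plist code from the patch; the structure and function names are invented, and it assumes that a larger number means a higher priority. Waking "only the highest-priority process" then amounts to removing the head of the list.

```c
#include <assert.h>
#include <stddef.h>

/* One list entry; larger prio means higher priority (an assumption). */
struct pnode {
    int prio;
    struct pnode *prev, *next;
};

/* The list is kept in descending priority order; head is the highest. */
struct plist_sketch {
    struct pnode *head;
};

/* Insert `n` in priority order. Entries of equal priority keep their
 * arrival (FIFO) order. */
static void pl_add(struct plist_sketch *pl, struct pnode *n)
{
    assert(pl != NULL && n != NULL);
    struct pnode *prev = NULL, *cur = pl->head;

    /* walk past every entry that should stay ahead of the new one */
    while (cur != NULL && cur->prio >= n->prio) {
        prev = cur;
        cur = cur->next;
    }
    n->prev = prev;
    n->next = cur;
    if (cur != NULL)
        cur->prev = n;
    if (prev != NULL)
        prev->next = n;
    else
        pl->head = n;
}

/* Unlink and return the highest-priority entry, or NULL if empty. */
static struct pnode *pl_pop_highest(struct plist_sketch *pl)
{
    assert(pl != NULL);
    struct pnode *n = pl->head;
    if (n == NULL)
        return NULL;
    pl->head = n->next;
    if (pl->head != NULL)
        pl->head->prev = NULL;
    n->prev = n->next = NULL;
    return n;
}
```

Insertion is linear in the length of the list in this sketch, while picking the waiter to wake is constant-time.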

The reaction to posts of FUSYN patches on linux-kernel has tended to be quiet. There does not appear to be any strong opposition to the addition of this capability to the kernel. Whether FUSYNs go into 2.6, or have to wait for 2.7, however, remains to be seen.


Patches and updates

Kernel trees

  • Andrew Morton: 2.6.1-mm4. (January 16, 2004)
  • Andrew Morton: 2.6.1-mm5. (January 20, 2004)




Page editor: Jonathan Corbet

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds