
Kernel development

Brief items

Kernel release status

The current ultra-stable 2.6 kernel release came out on March 15; it contains two security fixes. The previous release came out on March 12 with a larger set of fixes. The form of the 2.6.11.x patches has changed slightly: they now apply directly to the 2.6.11 root, rather than to the previous .x release.

There still have been no 2.6.12 prepatches, though it looks like one should appear soon.

When that prepatch shows up, it will include over 2000 patches currently sitting in Linus's BitKeeper repository. These include:

  • a driver for the "trusted computing" TPM chip (see the Trusted Computing Group site for more information on TPM)
  • SuperHyway bus support
  • a new multi-level security implementation for SELinux
  • a user-mode Linux update
  • support for hot-pluggable parallel ports
  • the "cpuset" patch (see cpusets.txt for information on cpusets)
  • a new nVidia framebuffer driver
  • the device mapper multipath patches
  • a big set of input driver patches
  • an ALSA update
  • an IPv6 update, including a patch removing the "experimental" designation for IPv6
  • a rearrangement of the net_device structure, which will break binary-only drivers
  • a 21,000-line DVB whitespace cleanup patch
  • a rework of the page table access functions, which is still causing some trouble on ia-64
  • a patch allowing an administrator to enable only a subset of the "magic SysRq" functions
  • numerous driver updates
  • the address space randomization patches
  • a new packet classifier mechanism for the networking layer
  • a new workqueue API function
  • a Tiger digest algorithm implementation
  • the restoration of the Philips webcam driver
  • some software suspend improvements
  • some readahead improvements
  • a big block I/O barrier rewrite, which enables full barrier support on serial ATA drives
  • a set of patches to shrink the kernel for embedded use
  • a generic sort() function
  • high-resolution POSIX CPU clock support (not the full high-resolution timers patch)
  • a USB API change: usb_control_msg() and usb_bulk_msg() now take a timeout in milliseconds rather than in jiffies
  • lots of fixes.

Also to be found in BitKeeper is an (almost) direct merge of the first three 2.6.11.x releases.

The current -mm patch is 2.6.11-mm4. Recent changes to -mm include a big CFQ I/O scheduler update, a new and smaller relayfs patch, a set of sparse memory support patches, a performance counter API update, a reiser4 update, and various fixes.

The current 2.4 prepatch remains 2.4.30-pre3; there have been no 2.4 prepatches since March 9.


Kernel development news

Quote of the week

This patch causes a CONFIG_PREEMPT=y, CONFIG_PREEMPT_BKL=y, CONFIG_DEBUG_PREEMPT=y kernel on a ppc64 G5 to hang immediately after displaying the penguins, but apparently not before having set the hardware clock backwards 101 years.

After having carefully reviewed the above description and having decided that these effects were not a part of the patch's design intent I have temporarily set it aside, thanks.

-- Andrew Morton


Linux Device Drivers, Third Edition now online

LWN is happy to host an online version of Linux Device Drivers, Third Edition by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. As of this writing, only the PDF version of the book is available; it will eventually be released in HTML and DocBook form as well. The book has been released under the Creative Commons Attribution-ShareAlike license, but you're going to want to run out and buy a copy or three anyway.


HALs considered harmful

It is a nice thing when hardware vendors provide Linux drivers for their products. Since these drivers are written by the vendor, there is usually no trouble getting information on how the hardware is controlled. With luck, that hardware will "just work" for Linux users, and all will be as it should be. In the real world, however, things are not always that simple. Hardware companies often take interesting approaches to coding drivers, and the people involved are not always well tied into the Linux kernel development community. The result can be conflicts between the vendors, who simply want to get things done, and the kernel developers, who are increasingly unwilling to accept code which does not meet their standards.

For a current example, consider the proposed new Neterion/S2io 10GbE network driver. This driver has been rewritten from scratch; it supports many of the hardware's advanced features and provides high performance. It looks like just the thing for high-end Linux-based networking applications.

The problem is that the driver does not deal directly with the Linux kernel API. It is, instead, based on a "hardware abstraction layer" (HAL) which glues the driver to the kernel. So, for example, the driver builds lists with a structure like:

    typedef struct xge_list_t {
	struct xge_list_t* prev;
	struct xge_list_t* next;
    } xge_list_t;

Such lists are accessed with functions like xge_list_insert() and even xge_list_for_each(). Similarly, the driver uses xge_os_spin_lock() to acquire a lock, xge_os_malloc() to allocate memory, and xge_os_pio_mem_read8() to read a byte from I/O memory. This approach helps Neterion support a variety of systems with the same core driver code, but it does not sit well with the kernel hackers. Networking maintainer David Miller responded this way:

I totally reject this driver, HAL is unacceptable for in-tree drivers. We've been over this a thousand times.

One problem with the HAL approach is that there can be a performance cost. A 10G network adaptor may have to handle hundreds of thousands of packets per second, and millions when the packets are small; at that sort of load, even the minimal overhead of a simple wrapper function can make a significant difference. The extra memory taken by the glue code, the parallel linked-list implementation, and so on also hurts. A developer community which is dedicated to obtaining the best possible performance from the hardware will be unwilling to swallow even a small cost in the name of portability.

The bigger issue, however, is the maintainability of the driver. A driver written for a HAL has its own idioms and conventions; it works with a completely different API. It simply does not look like a Linux driver, so Linux developers will have a harder time understanding and modifying it. One might think that this is not a big issue, since Neterion has said that it plans to maintain the driver, but a couple of problems arise:

  • When a kernel developer changes an internal function, he or she will usually go through and fix all of the in-tree users of that function. So developers who are not employed by the hardware vendor will almost certainly have to work with the driver code at some point.

  • Hardware vendors have a short attention span. Product cycles tend to be short, and the vendor will, before too long, move on to new products requiring new and different drivers. Once a given driver no longer applies to the products which are currently in the vendor's catalog, the vendor will, most likely, see little reason to continue maintaining that driver. The Linux community, however, will have an interest in keeping that driver working for several more years.

Additionally, the vendor may resist patches which affect the HAL itself, making it harder for the community to work on the driver. Overall, the Linux kernel developers expect to maintain the kernel for many years to come, so they are wary of taking on code which will make that task harder.

So the kernel hackers have some solid reasons for resisting HAL-based drivers. The vendors also have good reasons for wanting to write such drivers. To them, the resistance to HALs looks like a "Linux is the only important system" attitude, and it forces them to incur extra costs when writing their code. In this case, Neterion has reluctantly said that it will produce a non-HAL driver if that is the only way to get into the tree; other vendors may not bother.


Handling interrupts in user space

Peter Chubb has long been working on a project to move device drivers into user space. Getting drivers out of the kernel, he points out, would have a number of benefits. Faults in drivers (the source of a large percentage of kernel bugs) would be less likely to destabilize the entire system. Drivers could be easily restarted and upgraded. And a user-space implementation would make it possible to provide a relatively stable driver API, which would appeal to many vendors.

Much of the support needed for user-space drivers is already in place. A process can communicate with hardware by mapping the relevant I/O memory directly into its address space, for example; that is how the X server works with video adaptors. One piece, however, is missing: user-space drivers cannot handle device interrupts. In many cases, a proper driver cannot be written without using interrupts, so a user-space implementation is not possible.

Peter has now posted his user-space interrupts patch for review and possible inclusion. The mechanism that he ended up with is simple and easy to work with, but it suffers from an important limitation.

The mechanism is this: a process wishing to respond to interrupts opens a new /proc file; for IRQ 10, the file would be /proc/irq/10/irq. A read on that file will yield the number of interrupts which have occurred since the last read. If no interrupts have occurred, the read() call will block until the next interrupt happens. The select() and poll() system calls are properly supported, so it is possible to include interrupt handling as just another thing to do in an event loop.

On the kernel side, the real interrupt handler looks like this:

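    static irqreturn_t irq_proc_irq_handler(int irq, void *vidp,
                                            struct pt_regs *regs)
    {
        struct irq_proc *idp = (struct irq_proc *)vidp;

        BUG_ON(idp->irq != irq);
        disable_irq_nosync(irq);    /* mask the line; the device is still asserting it */
        atomic_inc(&idp->count);    /* count the interrupt (field name assumed) */
        wake_up(&idp->q);           /* wake any waiting reader (field name assumed) */
        return IRQ_HANDLED;
    }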

In other words, all it does is count the interrupt and wake up any process that might be waiting to handle it.

The handler also disables the interrupt before returning. There is an important reason for this action: since the handler knows nothing of the device which is actually interrupting, it is unable to acknowledge or turn off the interrupt. So, when the handler returns, the device will still be signalling an interrupt. If the interrupt were not disabled in the processor (or the APIC), the processor would be interrupted (and the handler called) all over again, repeatedly - at least, when level-triggered interrupts are in use. Disabling the interrupt allows life to go on until the user-space process gets scheduled and is able to tend to the interrupting device.

There is a problem here, however: interrupt lines are often shared between devices. Disabling a shared interrupt shuts it off for all devices using that line, not just the one being handled by a user-space driver. It is entirely possible that masking that interrupt will block a device which is needed by the user-space handler - a disk controller, perhaps. In that case, the system may well deadlock. For this reason, the patch does not allow user-space drivers to work with shared interrupts. This restriction avoids problems, but it also reduces the utility of the whole thing.

One possible solution was posted by Alan Cox. He would require user-space processes to pass a small structure into the kernel describing the hardware's IRQ interface. It would be just enough for the kernel to tell if a particular device is interrupting, acknowledge that interrupt, and tell the device to shut up. With that in place, the kernel could let user space deal with what the device really needs while leaving the interrupt enabled. It has been pointed out that this simple scheme would not work with some of the more complicated hardware, but it would be a step in the right direction regardless.

Meanwhile, Michael Raymond described a different user-space interrupt implementation (called "User Level Interrupt" or ULI) done at SGI. This patch is significantly more complicated. In this scheme, a user-space driver would register an interrupt handler function directly with the kernel. When an interrupt happens, the ULI code performs some assembly-code black magic so that its "return from interrupt" instruction jumps directly into the user-space handler, in user mode. Once that handler returns, the ULI library writes a code to a magic device which causes the kernel stack and related data structures to be restored to their pre-interrupt state. The implementation is more complex, and it currently only works on the ia-64 architecture, but it could conceivably offer better performance than the /proc method.


Some more 2.6.12 API changes

A few more changes to the 2.6 internal kernel API have been merged since last week's summary.

The driver model API has seen a couple of small changes. kref_put() no longer returns void:

    int kref_put(struct kref *kref, void (*release)(struct kref *kref));

The (new) return value is normally zero, but will be nonzero if this call dropped the last reference and the release() function was called. Note that a zero return does not imply that the kref is still valid; somebody else may have made the final kref_put() call in the meantime.

The kset type now has its own internal spinlock. That means that a kset is no longer required to be part of a subsystem.

Greg Kroah-Hartman has proposed a rather wider set of changes to the driver model class code. Essentially, he is pushing all users over to a form of the "class_simple" interface, and getting away from the original class implementation, which was hard to use correctly. These changes have not yet been merged, however.

The kernel has long held a variety of special-purpose sorting functions. These have now been replaced by a generic heap sort utility written by Matt Mackall. Its interface is:

    void sort(void *base, size_t num, size_t size, 
              int (*compare)(const void *a, const void *b),
              void (*swap)(void *a, void *b, int size));

Here, base is the array of items to sort; it contains num items of size bytes. The compare() function returns a negative, zero, or positive value as a is less than, equal to, or greater than b (the sign of a-b); sort() will arrange the array in ascending order as dictated by compare(). The swap() function is optional (passing NULL selects a default routine); it can be provided if the caller knows a faster way to exchange two elements in the array.


Patches and updates

Filesystems and block I/O

  • Phillip Lougher: SquashFS. (March 14, 2005)


Page editor: Jonathan Corbet

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds