
Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.2. The most recent 2.6.3 prepatch is 2.6.3-rc2, which was released on February 9. This prepatch is large, with many changes merged; the big ones include more network driver cleanup work, a USB update (including the removal of the USB scanner code), the new DMA pool abstraction (covered in last week's LWN Kernel Page), an ACPI update, an NFS update, and more. See the long-format changelog for the details.

The removal of the USB scanner code has concerned some readers. It was removed because it is broken and unmaintained, and because the accepted way of driving USB scanners in 2.6 is via the user-space libusb library.

2.6.3-rc1 was released on February 6. This one contained a lot of network driver cleanups, a number of gcc-3.5 fixes, various architecture updates, a big ALSA update, and more; once again, the long-format changelog has the details.

Linus's BitKeeper tree contains some architecture updates, a filesystem scalability improvement, some CPU frequency control updates, and a few other fixes.

The current tree from Andrew Morton, as of this writing, is 2.6.3-rc1-mm1. Recent additions include many fixes and some performance improvements, but little in the way of new features.

The current 2.4 kernel is 2.4.24; the first 2.4.25 release candidate was announced on February 5.

The current stone-age kernel is 2.0.40, which was released by David Weinehall on February 8. It contains some security fixes, so if you have any systems still running 2.0 you may want to consider upgrading.


Kernel development news

Bringing kgdb into 2.6

The kernel development community has long been divided over the topic of interactive debuggers. Many hackers find debuggers to be an indispensable part of their development toolkits. Others claim that debuggers lead people to fix symptoms rather than problems; rather than use such a crutch, these people say, it is better to truly understand the code. Once you have "become one" with the code, finding bugs is not that hard.

The latter view is held by Linus Torvalds, who explained his approach in very clear terms back in 2000:

You can use a kernel debugger if you want to, and I won't give you the cold shoulder because you have "sullied" yourself. But I'm not going to help you use one, and I would frankly prefer people not to use kernel debuggers that much. So I don't make it part of the standard distribution, and if the existing debuggers aren't very well known I won't shed a tear over it.

The end result is that there has never been support for interactive debuggers in the mainline kernel - at least, for the more popular architectures.

The 2.6 kernel is now Andrew Morton's turf, however, and Andrew is more open to the value of debugging tools. In fact, he has carried a version of the kgdb patch in his -mm tree for a long time. Might Andrew merge kgdb into the 2.6 kernel at some point?

The answer from Andrew seems to be "maybe":

I wouldn't support inclusion of i386 kgdb until it has had a lot of cleanup, possible de-featuritisification and some thought has been applied to splitting it into arch and generic bits. It's quite a lot of work.

In other words, there is no disagreement with the idea of merging kgdb, but the code needs some work first. Problems include a large number of #ifdefs, and the fact that the patch is relatively intrusive, touching many files. There are also objections to how the debugger works with the virtual memory subsystem, especially for the i386 architecture. All of these problems are probably solvable, given enough development time. The interest in a mainline kgdb is probably high enough that the cleanup work will happen, and kgdb may well be merged; a kgdb CVS repository has been established for those interested in this effort. An eventual merge into 2.6 seems unlikely to carry forward into 2.7, however.


How likely should likely() be?

Newcomers to the kernel code base are often surprised by the appearance of (what seems to be) a bunch of calls to functions called likely() and unlikely(). These calls always appear in conditional tests, along these lines:

    if (likely(some_condition)) {
        /* Do something */
    }

In fact, likely() and unlikely() are not function calls at all; instead, they are hints to the compiler. If the compiler knows that one outcome is far more likely than the other, it can optimize the code it generates accordingly. On some architectures, this information can also be encoded into the object code, where it will override the branch prediction normally done by the processor.
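As a rough illustration of how such hints can be expressed, here is a minimal userspace sketch that builds the two macros on the __builtin_expect() facility offered by GCC-compatible compilers. The sum_bytes() helper is hypothetical and exists only to show a hint applied to a rare error path; other compilers simply fall back to the bare condition.

```c
#include <stddef.h>

/* Branch hints for GCC-compatible compilers; elsewhere, just the condition. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (!!(x))
#define unlikely(x) (!!(x))
#endif

/* Hypothetical helper: sum a buffer, treating a NULL pointer as the rare case. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
    long total = 0;
    size_t i;

    if (unlikely(buf == NULL))
        return -1;              /* rare error path */

    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

The double negation normalizes any true value to 1, so the hint compares against the value the compiler was told to expect. The result of the test itself is unchanged; only the layout of the generated code may differ.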

David Woodhouse noted that the differing interpretation of these directives by different architectures makes it hard to know when likely() and unlikely() should be used. If the result of one of those directives is just a bit of code optimization, they should be used liberally whenever the programmer knows that one outcome will happen more often than the other. On some architectures, however, the cost of guessing wrong is fairly high, and these directives should only be used where the odds are overwhelmingly in favor of one outcome.

David's proposal is to replace likely() and unlikely() with a new probable() macro:

    probable(condition, percent)

Here, "percent" is the programmer's estimate of how often the condition will evaluate true. Each architecture could then decide what to tell the compiler based on the given percentage.
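One way an architecture might interpret such a macro is sketched below. Everything here is an assumption made for illustration: the PROBABLE_MIN_PERCENT threshold, its value of 90, and the clamp_to_byte() helper are all hypothetical, and nothing like this exists in the kernel. The idea is simply that a hint is passed to the compiler only when the estimate is lopsided enough for that architecture.

```c
/*
 * Hypothetical per-architecture threshold: pass a hint to the compiler
 * only when the estimate is at least this lopsided in either direction.
 */
#define PROBABLE_MIN_PERCENT 90

#if defined(__GNUC__)
#define probable(cond, percent)                                 \
    ((percent) >= PROBABLE_MIN_PERCENT                          \
         ? __builtin_expect(!!(cond), 1)                        \
         : (percent) <= 100 - PROBABLE_MIN_PERCENT              \
               ? __builtin_expect(!!(cond), 0)                  \
               : !!(cond))
#else
#define probable(cond, percent) (!!(cond))
#endif

/* Hypothetical user: the value is expected to be in range 95% of the time. */
static int clamp_to_byte(int value)
{
    if (probable(value >= 0 && value <= 255, 95))
        return value;
    return value < 0 ? 0 : 255;
}
```

Because the percentage is a compile-time constant, the compiler can discard the untaken arms of the conditional expression, and the condition is evaluated exactly once. An architecture where a wrong guess is expensive could raise the threshold; one where the hint is only a layout optimization could lower it.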

Rusty Russell has a more straightforward answer: these directives should rarely be used.

Sometimes, unlikely()/likely() help code readability. But generally it should be considered the register keyword of the 2000's: if the case isn't ABSOLUTELY CRYSTAL CLEAR, or doesn't show up on benchmarks, disdain is appropriate.

The "disdain" approach seems more likely to be adopted than a new macro. There will be very few code paths where these directives will make a measurable difference. And the fact is that programmers often guess wrong about how often each code path will be taken.

David would also like to add a probability to the get_unaligned() macro, which is used to access data which might not have the alignment required by the processor. Some architectures can handle any alignment; on those, get_unaligned() expands to a direct pointer dereference. Others require that unaligned access be done via multiple, smaller fetches or stores. Of those, some architectures can fix up an unaligned access attempt in an exception handler, and others cannot. For architectures which can fix unaligned accesses, it might be faster to take an occasional exception if the probability of an unaligned access is small. Adding a probability to the get_unaligned() macro (and put_unaligned() as well) would allow each architecture to optimize those accesses. Whether the resulting performance improvement would justify the effort remains to be seen.
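For readers unfamiliar with the problem, the following userspace sketch shows the conservative strategy: never issue a misaligned load or store at all. The function names are hypothetical (the kernel macros are type-generic and take no probability argument today), and memcpy() stands in for whatever sequence of smaller accesses a given architecture would actually use.

```c
#include <stdint.h>
#include <string.h>

/*
 * Portable fallback: go through memcpy() so that no misaligned 32-bit
 * load is ever issued. The value is read in native byte order.
 */
static uint32_t get_unaligned_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Store counterpart: writes exactly four bytes starting at p. */
static void put_unaligned_u32(uint32_t v, void *p)
{
    memcpy(p, &v, sizeof(v));
}
```

On an architecture which handles any alignment, a compiler will typically reduce these to a single load or store. A probability argument would let an architecture with exception-based fixups choose between this safe form and a direct access that occasionally traps.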


A warning for BSD pseudo terminal users

H. Peter Anvin wants to know if anybody is still using the old BSD pseudo terminal ("pty") interface. These devices show up on most systems as /dev/ptyXX; they were once used for applications like network logins. Most applications on most Linux systems have not used BSD ptys for some years now; instead, the newer /dev/pts devices are used.

Peter is asking because he has plans for the pseudo terminal subsystem; he'd like to clean it up, make it more dynamic, and make use of the larger device numbers available in 2.6. The need to maintain compatibility with the BSD interface is, it seems, interfering with that work. So Peter would like to remove the BSD pty interface if possible.

There have been a few complaints. The bootlogd utility used by some distributions apparently uses BSD terminals in some cases. Truly old systems may still use the old interface for network logins or terminal emulator windows; this is not functionality that one breaks lightly. Peter may yet find a way to maintain BSD pty support while making his other changes. Even so, the BSD pty interface may be headed toward the end of its life sometime in the 2.7 development series.


Safe sysfs support

It has long been intended that the sysfs virtual filesystem would contain information about all of the hardware (and more) installed on a given system. Implementation of this intention has lagged in places, however, and there are still parts of the system which lack sysfs support. One of those areas is the frame buffer device code. In an attempt to fill in that gap, James Simmons recently posted a patch adding sysfs support for frame buffer devices; this patch was merged into 2.6.3-rc1.

There is only one problem with this patch: it can oops the kernel when frame buffer driver modules are unloaded. The problem is the same one which has afflicted other subsystem sysfs implementations: lifecycle rules. Once a data structure has been exposed via sysfs, user space can hold references to that structure indefinitely. Open sysfs files can persist long after the underlying device has been removed from the system, and long after the relevant module has been unloaded. If the behavior of sysfs-exposed data structures has not been carefully laid out, the kernel can be left holding references to structures or code which no longer exist.

This sort of problem hit the networking subsystem hard. Once net_device structures were exposed via sysfs, it was no longer possible to allow individual network drivers to control the lifecycle of those structures. As a result, it is now necessary to allocate all net_device structures dynamically, and to let the networking subsystem decide when and how to free those structures. The networking code is also very careful not to access any module code after a net_device has been shut down. The end result is that net_device structures can persist in the system long after the module which created them has been removed. It all works, but the cost was a lengthy cleanup operation which has only now reached something close to completion.

The frame buffer patches attempted to do things right from the beginning by making the fb_info structure into a dynamic object. A support function exists to allocate the structure, and it is automatically freed when the last reference is removed. The only problem is that the frame buffer drivers do not use this interface; they allocate and destroy fb_info structures on their own. As a result, in the 2.6.3-rc1 (and -rc2) kernel, fb_info structures can be freed twice (or statically allocated structures can be freed once). That sort of error tends to create displays on the frame buffer that the user does not want to see.
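The "freed when the last reference is removed" pattern can be sketched in a few lines of userspace C. This is a simplified model, not the frame buffer code: struct demo_info and its helpers are hypothetical, and the counter is a plain int where kernel code would use an atomic count.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a structure such as fb_info. */
struct demo_info {
    int refcount;                       /* not atomic; illustration only */
    void (*release)(struct demo_info *info);
};

static int demo_released;               /* counts release calls, for the demo */

static void demo_release(struct demo_info *info)
{
    demo_released++;
    free(info);
}

/* Allocate with one reference, held by the caller. */
static struct demo_info *demo_alloc(void)
{
    struct demo_info *info = calloc(1, sizeof(*info));

    if (info) {
        info->refcount = 1;
        info->release = demo_release;
    }
    return info;
}

/* Take an additional reference, e.g. on behalf of an open sysfs file. */
static struct demo_info *demo_get(struct demo_info *info)
{
    info->refcount++;
    return info;
}

/* Drop a reference; returns 1 if it was the last and the object was freed. */
static int demo_put(struct demo_info *info)
{
    if (--info->refcount > 0)
        return 0;
    info->release(info);
    return 1;
}
```

Under this scheme nobody calls free() directly; the owner just drops its reference. A driver that frees the structure on its own, while the core also frees it on the final put, produces exactly the double free described above.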

Fixing this problem requires updating every frame buffer driver to use dynamically-allocated fb_info structures. James has stated his intent to make this change. In the meantime, the "stable" kernel release candidate has a known problem which will require a wide-ranging set of changes to fix. Al Viro, a master of this sort of transition, has grumbled that these changes should have been done in the opposite order, so as to avoid breaking things. Others have complained that this sort of change is too big for a stable kernel series and should have waited for 2.7.

Yet another approach, however, would be to use the "class_simple" interface, which was merged in 2.6.2-rc1. This interface makes it easy to retrofit a /sys/class interface into existing drivers without having to deal with some of the more complex lifecycle issues. The interface is straightforward; one starts by creating a class:

    struct class_simple *class_simple_create(struct module *owner,
                                             char *name);

The owner argument should almost always be passed as THIS_MODULE; the name will show up under /sys/class. The resulting class can be removed at some later time with:

    void class_simple_destroy(struct class_simple *class);

Entries for individual devices can be added with:

    struct class_device *class_simple_device_add(struct class_simple *class,
                                                 dev_t dev,
                                                 struct device *device,
                                                 const char *fmt, ...);

Here, class is the class which was created above, dev is the device number for the device, device is a struct device structure for this device (it can be NULL), and the rest is a printk()-style format string to create the name for the entry. The result (on success) is a sysfs directory with exactly one attribute: a file called dev which contains the device number. That is adequate for a tool like udev to create corresponding device nodes.

The entry can be removed, of course:

    void class_simple_device_remove(dev_t dev);

The whole thing works without maintaining references into the calling driver, so most of the lifetime rule issues are avoided. The class_simple interface has continued to evolve since it was merged; hotplug support was added in 2.6.3-rc.
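The order of calls a driver might make can be sketched as follows. To keep the example buildable outside a kernel tree, the class_simple functions here are userspace stubs written for this illustration; they are not the kernel implementation, they report failure by returning NULL (the real functions report errors their own way), and the "demo" class, DEMO_DEV number, and demo_init()/demo_exit() functions are all hypothetical. The removal call is spelled class_simple_device_remove(), which is the name used in the kernel source.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>          /* dev_t */

/* ---- Userspace stubs standing in for the kernel API (assumptions) ---- */
struct module;
struct device;
#define THIS_MODULE ((struct module *)0)

struct class_simple { const char *name; int ndevs; };
struct class_device { struct class_simple *cs; dev_t dev; char name[32]; };

static struct class_device stub_table[4];   /* tiny fixed table for the stub */

static struct class_simple *class_simple_create(struct module *owner, char *name)
{
    struct class_simple *cs = calloc(1, sizeof(*cs));

    (void)owner;
    if (cs)
        cs->name = name;
    return cs;
}

static void class_simple_destroy(struct class_simple *cs)
{
    free(cs);
}

static struct class_device *class_simple_device_add(struct class_simple *cs,
                                                    dev_t dev,
                                                    struct device *device,
                                                    const char *fmt, ...)
{
    va_list args;
    size_t i;

    (void)device;
    for (i = 0; i < sizeof(stub_table) / sizeof(stub_table[0]); i++) {
        struct class_device *cd = &stub_table[i];

        if (cd->cs)
            continue;                       /* slot already in use */
        cd->cs = cs;
        cd->dev = dev;
        va_start(args, fmt);
        vsnprintf(cd->name, sizeof(cd->name), fmt, args);
        va_end(args);
        cs->ndevs++;
        return cd;
    }
    return NULL;                            /* stub-only error convention */
}

static void class_simple_device_remove(dev_t dev)
{
    size_t i;

    for (i = 0; i < sizeof(stub_table) / sizeof(stub_table[0]); i++) {
        if (stub_table[i].cs && stub_table[i].dev == dev) {
            stub_table[i].cs->ndevs--;
            stub_table[i].cs = NULL;        /* mark the slot free */
            return;
        }
    }
}

/* ---- Driver side: the call order described in the text ---- */
#define DEMO_DEV ((dev_t)0x0f00)            /* hypothetical device number */
static struct class_simple *demo_class;

static int demo_init(void)
{
    demo_class = class_simple_create(THIS_MODULE, "demo");
    if (!demo_class)
        return -1;
    /* No struct device is available here, so NULL is passed. */
    if (!class_simple_device_add(demo_class, DEMO_DEV, NULL, "demo%d", 0)) {
        class_simple_destroy(demo_class);
        demo_class = NULL;
        return -1;
    }
    return 0;
}

static void demo_exit(void)
{
    class_simple_device_remove(DEMO_DEV);   /* entries first... */
    class_simple_destroy(demo_class);       /* ...then the class itself */
    demo_class = NULL;
}
```

The point of interest is the driver side: the class is created once at initialization, one entry is added per device, and teardown runs in the reverse order. The driver never hands the class code a pointer to its own data, which is why module unloading is less hazardous here than with a hand-rolled sysfs implementation.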



Page editor: Jonathan Corbet

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds