Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.6-rc3, which was announced by Linus on April 27. New patches this time around include an NTFS update, some generic snapshot support code for filesystems (taken from XFS), a CPU frequency control update, TCP "Vegas" congestion avoidance, a new single-threaded mode for workqueues, a CIFS update, various architecture updates, and lots of fixes. See the long-format changelog for the details.

Linus hopes to have a final 2.6.6 release out by the end of the week.

Linus's BitKeeper tree contains, as of this writing, a set of XFS patches and a few other fixes.

The current prepatch from Andrew Morton is 2.6.6-rc2-mm2. Recent additions to -mm include a set of reiserfs patches (see below), some more ext3 block reservation work, a "tickless" timer mode for the S/390 architecture, hotplug CPU support for ia-64 systems, and lots of fixes.

The current 2.4 prepatch is 2.4.27-pre1, released by Marcelo on April 22. This prepatch merges the 2.6 serial ATA drivers, but otherwise restricts itself to fixes and small updates. According to Marcelo, the serial ATA update is the last big change that will go into 2.4.x.

Kernel development news

On reiserfs and external attributes

The patch seemed relatively straightforward; Chris Mason had sent out a set of reiserfs changes which include data=journal support, an improved block allocator, metadata readahead, and external attribute support. One of those changes, however, does not sit well with Hans Reiser, the original creator of reiserfs.

External attributes are just a way of attaching extra metadata to files; they are used for things like access control lists and SELinux context information. Most of the standard Linux filesystems support external attributes in 2.6, but reiserfs does not yet have that capability. Given that features like SELinux will not work without external attributes, adding this capability has been high on the wish lists of many users and developers.

When the external attribute patch was posted, however, Hans Reiser sent out a protest asking that the patch not be applied. Those who have followed Hans's work over the years will know what his objection is: external attributes live in their own name space. Hans has dedicated much effort to the task of moving everything into the filesystem name space; he says:

The expressive power of an operating system is NOT proportional to the number of components, but instead is proportional to the number of possible connections between its components. If you fragment the namespaces of an OS, you reduce each component to effective interactions with only those components in its reduced size namespace. Designing the namespaces of an OS so that they possess closure and are unified may seem like a lot of effort, but it is very cost effective compared to building many times more other OS components to get the same expressive power.

The upcoming Reiser4 filesystem implements Hans's vision of how external attributes should be implemented; essentially, each attribute just looks like a small file containing the attribute value. The solution is fast and elegant; it may well be the way things are done in the future. For the moment, however, there are a few problems:

  • Reiser4 is still in beta testing, and has not yet been submitted for inclusion into the 2.6 kernel. Once it is submitted, it is not certain that it will be accepted quickly.

  • The Reiser4 external attribute API is different from the API used in the 2.6 kernel. Applications will have to be rewritten around the special-purpose reiser4() system call before they can use it.

  • Some users of reiserfs ("Reiser3") might be a little nervous about making an immediate jump to a completely new filesystem. They just might want to be able to continue using their existing filesystems and, simultaneously, make use of external attributes.

The solution seems reasonably clear: Reiser4, once it's ready, can be merged with its new ways of doing things. The existing reiserfs filesystem, meanwhile, can be augmented with the capabilities that its users would like to have now. This approach would seem to offer the best of both worlds. Mr. Reiser disagrees; he would rather not have (what he sees as) an inelegant hack grafted onto reiserfs to satisfy immediate needs. When code is released as free software, however, not even its creator can prevent its development in certain directions if that's what its users want.

Being honest with MODULE_LICENSE

MODULE_LICENSE() is a macro which allows loadable kernel modules to declare their license to the world. Its purpose is to let the kernel developers know when a non-free module has been inserted into a given kernel. If you submit an oops report showing a "tainted" kernel, chances are you will be asked to reproduce the problem without the proprietary module loaded, or to talk to that module's vendor about the problem. In general, the kernel hackers want to hear about problems, but their interest drops remarkably when they cannot get at the source to diagnose or fix the problem.

The declared module license is also used to decide whether a given module can have access to the small number of "GPL-only" symbols in the kernel.

There is no central authority which checks license declarations; it is assumed that module authors will not want to lie about the license they are using. That assumption has generally proved to be valid, so people were surprised when Linuxant was found to have put a false module declaration into its binary-only "linmodem" driver. Or, if it's not false, it does cleverly manage to not tell the whole story.

The actual license string in the Linuxant driver reads:

GPL\0for files in the "GPL" directory; for others, only LICENSE file applies

The \0 is an ASCII NUL character, which, in C programs, terminates a string. Thus, while the above declaration would appear fairly clear to human eyes, the kernel only sees a license declaration of "GPL".
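The effect is easy to reproduce in user space. The following is a minimal sketch, not code from the kernel or from the Linuxant driver; the names license, looks_like_gpl(), and stored_length() are made up for illustration. It shows that an ordinary C string comparison stops at the embedded NUL, while the compiler still stores the full text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The license text quoted above, embedded NUL included. */
static const char license[] =
    "GPL\0for files in the \"GPL\" directory; for others, only LICENSE file applies";

/* What a C-string comparison sees: everything up to the first NUL byte. */
static int looks_like_gpl(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "GPL") == 0;
}

/* Bytes the compiler actually stored, not counting the final terminator. */
static size_t stored_length(void)
{
    return sizeof(license) - 1;
}
```

Here looks_like_gpl(license) is true and strlen(license) is 3, even though stored_length() reports a much longer array.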

One might well wonder why Linuxant chose to do this. The driver in question does not use any GPL-only symbols, so it is not an attempt to get around the kernel's simplistic access control mechanism. According to Linuxant president Marc Boucher, they simply wanted to avoid bothering users with kernel warnings:

The purpose of the workaround is to avoid repetitive warning messages generated when multiple modules belonging to a single logical "driver" are loaded (even when a module is only probed but not used due to the hardware not being present). Although the issue may sound trivial/harmless to people on the lkml, it was a frequent cause of confusion for the average person.

Most developers seem to have taken this explanation at face value, though some remain unhappy about the approach that was used. Possible solutions include putting the "kernel tainted" warning in the system logfile only, simply suppressing the warning after the first time, or having the Linuxant drivers manually set the "tainted" flag themselves at load time. Finding a way to achieve Linuxant's aim (provide a driver which enables hardware that does not otherwise work with Linux while avoiding upsetting users with lots of scary messages) should not be that hard to do.

Meanwhile, of course, there is also interest in making it harder for others to get past the kernel license check. Carl-Daniel Hailfinger, who originally pointed out the problem, also submitted a patch which would explicitly "blacklist" modules from Linuxant; any such module would taint the kernel regardless of its claimed license. Linus suggested that the license be stored as a counted string as a way of defeating the "NUL attack." Rusty Russell, instead, noted that any check that would be accepted into the kernel can be defeated by an even moderately motivated attacker. His patch includes a quick compile-time check to defeat Linuxant's technique, but it explicitly avoids getting into a real arms race with potential violators.
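To see why a counted string defeats the trick, consider the sketch below. It is an illustration of the general idea only, under the assumption that the license is kept together with its stored length; struct counted_str, COUNTED(), and counted_equals() are hypothetical names and do not come from any of the patches discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the bytes plus an explicit length. */
struct counted_str {
    const char *bytes;
    size_t len;   /* stored bytes, embedded NULs included */
};

/* Initializer built from a string literal; only the final terminator is dropped. */
#define COUNTED(lit) { (lit), sizeof(lit) - 1 }

/* Equal only if the lengths match and every stored byte matches. */
static int counted_equals(struct counted_str s, const char *expected)
{
    size_t n;

    assert(expected != NULL);
    n = strlen(expected);
    return s.len == n && memcmp(s.bytes, expected, n) == 0;
}
```

With this comparison, COUNTED("GPL") matches "GPL", but a literal with text hidden behind a NUL does not, because its stored length is longer than three bytes.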

Chances are we will see this sort of behavior again - with, perhaps, a less benign intent. The nature of a free kernel makes it hard to shut out those who are unwilling to play by the rules. But, as Linus said:

...playing the above kinds of games makes it pretty clear to everybody that any infringement was done wilfully. They should be talking to their lawyers about things like that.

Given that a number of free software hackers are increasingly unwilling to see their licenses ignored, anybody who wants to engage in this sort of behavior should, indeed, be talking to their lawyers.

The cost of inline functions

The kernel makes heavy use of inline functions. In many cases, inline expansion of functions is necessary; some of these functions employ various sorts of assembly language trickery that must be part of the calling function. In many other cases, though, inline functions are used as a way of improving performance. The thinking is that, by eliminating the overhead of performing actual function calls, inline functions can make things go faster.

The truth turns out not to be so simple. Consider, for example, this patch from Stephen Hemminger which removes the inline attribute from a set of functions for dealing with socket buffers ("SKBs", the structure used to represent network packets inside the kernel). Stephen ran some benchmarks after applying his patch; those benchmarks ran 3% faster than they did with the functions being expanded inline.

The problem with inline functions is that they replicate the function body every time they are called. Each use of an inline function thus makes the kernel executable bigger. A bigger executable means more cache misses, and that slows things down. The SKB functions are called in many places all over the networking code. Each one of those calls creates a new copy of the function; Denis Vlasenko recently discovered that many of them expand to over 100 bytes of code. The result is that, while many places in the kernel are calling the same function, each one is working with its own copy. And each copy takes space in the processor instruction cache. That cache usage hurts; each cache miss costs more than a function call.
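The difference between the two forms can be sketched as follows. This is not the SKB code; struct buf and both functions are hypothetical stand-ins, and the always_inline and noinline attributes are GCC/Clang extensions used here only to force each behavior. The bodies are identical; what changes is whether the compiler copies the body into each caller or emits it once and calls it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer; not the kernel's struct sk_buff. */
struct buf {
    unsigned char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* total bytes available */
};

/* Inline variant: the body is duplicated at every call site. */
static inline __attribute__((always_inline))
unsigned char *buf_put_inline(struct buf *b, size_t n)
{
    unsigned char *tail;

    assert(b->len <= b->cap);
    if (n > b->cap - b->len)
        return NULL;            /* not enough room */
    tail = b->data + b->len;
    b->len += n;
    return tail;
}

/* Out-of-line variant: one shared copy, reached through a function call. */
static __attribute__((noinline))
unsigned char *buf_put_outofline(struct buf *b, size_t n)
{
    unsigned char *tail;

    assert(b->len <= b->cap);
    if (n > b->cap - b->len)
        return NULL;            /* not enough room */
    tail = b->data + b->len;
    b->len += n;
    return tail;
}
```

Both functions return the same results. Building a file with many callers of each and comparing the object sizes is one way to see how much text the inline copies add.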

Thus, the kernel hackers are taking a harder look at inline function declarations than they used to. An inline function may seem like it should be faster, but that is not necessarily the case. The notion of a "time/space tradeoff" which is taught in many computer science classes often turns out not to hold in the real world. Many times, smaller is also faster.

Ketchup with that?

Matt Mackall has released version 0.7 of his "ketchup" script. Ketchup can be thought of as a sort of apt-get for kernel trees; run "ketchup 2.6-bk" and it will go get the right combination of kernel tarballs and patch sets and put them together into a complete kernel tree. Several different trees are supported, including -mm, -tiny, and -mjb, and the script can string together a series of patches to get to the desired destination. If you find yourself playing with a number of different kernel trees, ketchup may prove to be a tasty condiment to add to your tool collection.

Single-threaded workqueues

The workqueue mechanism is the 2.6 kernel's replacement for task queues; a workqueue allows kernel code to defer work until some time in the future. Tasks submitted to workqueues are run in the context of a special process, so they can sleep if need be. Workqueues go out of their way to keep work on the same processor by creating a dedicated worker thread for each processor on the system.

For many applications, one process per CPU is far more than is needed; a single worker process is plenty. There is a shared, generic workqueue which can be used in many of these situations. In others, however, use of that queue is not appropriate; perhaps the code in question performs long sleeps, or it may deadlock with another use of that queue. In these cases, there has been no alternative to paying the cost of all those worker threads.

As of 2.6.6, thanks to Rusty Russell, there will be a new function for creating workqueues:

    struct workqueue_struct *create_singlethread_workqueue(char *name);

As you might expect, this function creates a workqueue that relies on a single worker thread. Chances are, many of the current users of workqueues could switch over to the single-threaded variety.

Page editor: Jonathan Corbet

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds