Brief items
The current 2.6 prepatch was announced by Linus on April 27. New patches this time around include an NTFS update, some generic snapshot support code for filesystems (taken from XFS), a CPU frequency control update, TCP "Vegas" congestion avoidance, a new single-threaded mode for workqueues, a CIFS update, various architecture updates, and lots of fixes. See the long-format changelog for the details.
Linus hopes to have a final 2.6.6 release out by the end of the week.
Linus's BitKeeper tree contains, as of this writing, a set of XFS patches and a few other fixes.
The current prepatch from Andrew Morton is 2.6.6-rc2-mm2. Recent additions to -mm include a set of reiserfs patches (see below), some more ext3 block reservation work, a "tickless" timer mode for the S/390 architecture, hotplug CPU support for ia-64 systems, and lots of fixes.
The current 2.4 prepatch is 2.4.27-pre1, released by Marcelo on April 22. This prepatch merges the 2.6 serial ATA drivers, but otherwise restricts itself to fixes and small updates. According to Marcelo, the serial ATA update is the last big change that will go into 2.4.x.
Kernel development news
The patch seemed relatively straightforward; Chris Mason had sent out a set of reiserfs changes which include data=journal support, an improved block allocator, metadata readahead, and external attribute support. One of those changes, however, does not sit well with Hans Reiser, the original creator of reiserfs.
External attributes are just a way of attaching extra metadata to files; they are used for things like access control lists and SELinux context information. Most of the standard Linux filesystems support external attributes in 2.6, but reiserfs does not yet have that capability. Given that features like SELinux will not work without external attributes, adding this capability has been high on the wish lists of many users and developers.
When the external attribute patch was posted, however, Hans Reiser sent out a protest asking that the patch not be applied. Those who have followed Hans's work over the years will know what his objection is: external attributes live in their own name space. Hans has dedicated much effort to the task of moving everything into the filesystem name space; he says:
The upcoming Reiser4 filesystem implements Hans's vision of how external attributes should be implemented; essentially, each attribute just looks like a small file containing the attribute value. The solution is fast and elegant; it may well be the way things are done in the future. For the moment, however, there are a few problems:
The solution seems reasonably clear: Reiser4, once it's ready, can be merged with its new ways of doing things. The existing reiserfs filesystem, meanwhile, can be augmented with the capabilities that its users would like to have now. This approach would seem to offer the best of both worlds. Mr. Reiser disagrees; he would rather not have (what he sees as) an inelegant hack grafted onto reiserfs to satisfy immediate needs. When code is released as free software, however, not even its creator can prevent its development in certain directions if that's what its users want.
The declared module license is also used to decide whether a given module can have access to the small number of "GPL-only" symbols in the kernel.
There is no central authority which checks license declarations; it is assumed that module authors will not want to lie about the license they are using. That assumption has generally proved to be valid, so people were surprised when Linuxant was found to have put a false module declaration into its binary-only "linmodem" driver. Or, if it's not false, it does cleverly manage to not tell the whole story.
The actual license string in the Linuxant driver reads:
The \0 is an ASCII NUL character, which, in C programs, terminates a string. Thus, while the above declaration would appear fairly clear to human eyes, the kernel only sees a license declaration of "GPL".
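The effect is easy to reproduce in user space. What follows is a minimal sketch: the license text and the helper name license_is_gpl() are hypothetical stand-ins, not the Linuxant string or the kernel's actual check. It shows that a NUL-terminated comparison stops at the embedded \0 even though the array holds more bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical declaration, NOT the real Linuxant string: the text after
 * the embedded \0 is stored in the array but invisible to str*() calls. */
static const char demo_license[] = "GPL\0 (hypothetical extra conditions)";

/* Simplified stand-in for a license check that trusts NUL termination. */
static int license_is_gpl(const char *license)
{
    return strcmp(license, "GPL") == 0;
}
```

Here strlen(demo_license) is 3 while sizeof(demo_license) covers the whole literal; that difference is the gap the trick exploits.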
One might well wonder why Linuxant chose to do this. The driver in question does not use any GPL-only symbols, so it is not an attempt to get around the kernel's simplistic access control mechanism. According to Linuxant president Marc Boucher, they simply wanted to avoid bothering users with kernel warnings:
Most developers seem to have taken this explanation at face value, though some remain unhappy about the approach that was used. Possible solutions include putting the "kernel tainted" warning in the system logfile only, simply suppressing the warning after the first time, or having the Linuxant drivers manually set the "tainted" flag themselves at load time. Finding a way to achieve Linuxant's aim (provide a driver which enables hardware that does not otherwise work with Linux while avoiding upsetting users with lots of scary messages) should not be that hard to do.
Meanwhile, of course, there is also interest in making it harder for others to get past the kernel license check. Carl-Daniel Hailfinger, who originally pointed out the problem, also submitted a patch which would explicitly "blacklist" modules from Linuxant; any such module would taint the kernel regardless of its claimed license. Linus suggested that the license be stored as a counted string as a way of defeating the "NUL attack." Rusty Russell, instead, noted that any check that would be accepted into the kernel can be defeated by an even moderately motivated attacker. His patch includes a quick compile-time check to defeat Linuxant's technique, but it explicitly avoids getting into a real arms race with potential violators.
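The counted-string idea can be sketched in user space as well. This is an illustration only, assuming a hypothetical helper, license_is_gpl_counted(), which is handed the full size of the stored declaration; it is not any of the patches that were posted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 'size' is the full size of the stored declaration,
 * including its terminating NUL (as sizeof would report for an array). */
static int license_is_gpl_counted(const char *license, size_t size)
{
    static const char gpl[] = "GPL";

    /* Any bytes hidden behind an embedded NUL change the size, so a
     * declaration like "GPL\0..." no longer matches. */
    return size == sizeof(gpl) && memcmp(license, gpl, sizeof(gpl)) == 0;
}
```

A declaration that is exactly "GPL" still passes, while one that carries extra text after a NUL is rejected even though strcmp() would treat the two as equal.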
Chances are we will see this sort of behavior again - with, perhaps, a less benign intent. The nature of a free kernel makes it hard to shut out those who are unwilling to play by the rules. But, as Linus said:
Given that a number of free software hackers are increasingly unwilling to see their licenses ignored, anybody who wants to engage in this sort of behavior should, indeed, be talking to their lawyers.
The truth turns out not to be so simple. Consider, for example, this patch from Stephen Hemminger which removes the inline attribute from a set of functions for dealing with socket buffers ("SKBs", the structure used to represent network packets inside the kernel). Stephen ran some benchmarks after applying his patch; those benchmarks ran 3% faster than they did with the functions being expanded inline.
The problem with inline functions is that they replicate the function body every time they are called. Each use of an inline function thus makes the kernel executable bigger. A bigger executable means more cache misses, and that slows things down. The SKB functions are called in many places all over the networking code. Each one of those calls creates a new copy of the function; Denis Vlasenko recently discovered that many of them expand to over 100 bytes of code. The result is that, while many places in the kernel are calling the same function, each one is working with its own copy. And each copy takes space in the processor instruction cache. That cache usage hurts; each cache miss costs more than a function call.
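The difference can be illustrated with a small user-space sketch. The function names are hypothetical, and __attribute__((noinline)) is a GCC/Clang extension used here only to force the out-of-line form; the SKB functions themselves are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Inline variant: the compiler may paste this body into every caller,
 * so each call site can carry its own copy of the code. */
static inline size_t room_inline(size_t size, size_t used)
{
    return used < size ? size - used : 0;
}

/* Out-of-line variant: one shared copy of the body; callers pay for a
 * function call but reuse the same instruction-cache lines. */
__attribute__((noinline))
static size_t room_outofline(size_t size, size_t used)
{
    return used < size ? size - used : 0;
}
```

Both versions compute the same result; what changes is how many copies of the body end up in the executable, which can be compared by running a tool such as size on the resulting object files.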
Thus, the kernel hackers are taking a harder look at inline function declarations than they used to. An inline function may seem like it should be faster, but that is not necessarily the case. The notion of a "time/space tradeoff" which is taught in many computer science classes often turns out not to hold in the real world. Many times, smaller is also faster.
Version 0.7 of the "ketchup" script has been released. Ketchup can be thought of as a sort of apt-get for kernel trees; run "ketchup 2.6-bk" and it will go get the right combination of kernel tarballs and patch sets and put them together into a complete kernel tree. Several different trees are supported, including -mm, -tiny, and -mjb, and the script can string together a series of patches to get to the desired destination. If you find yourself playing with a number of different kernel trees, ketchup may prove to be a tasty condiment to add to your tool collection.
The workqueue mechanism is the 2.6 kernel's replacement for task queues; a workqueue allows kernel code to defer work until some time in the future. Tasks submitted to workqueues are run in the context of a special process, so they can sleep if need be. Workqueues go out of their way to keep work on the same processor by creating a dedicated worker thread for each processor on the system.
For many applications, one process per CPU is far more than is needed; a single worker process is plenty. There is a shared, generic workqueue which can be used in many of these situations. In others, however, use of that queue is not appropriate; perhaps the code in question performs long sleeps, or it may deadlock with another use of that queue. In these cases, there has been no alternative to paying the cost of all those worker threads.
As of 2.6.6, thanks to Rusty Russell, there will be a new function for creating workqueues:
struct workqueue_struct *create_singlethread_workqueue(char *name);
As you might expect, this function creates a workqueue that relies on a single worker thread. Chances are, many of the current users of workqueues could switch over to the single-threaded variety.
Page editor: Jonathan Corbet
Copyright © 2004, Eklektix, Inc.