Brief items
The current 2.6 kernel is 2.6.6-rc3, which was
announced by Linus on April 27. New
patches this time around include an NTFS update, some generic snapshot
support code for filesystems (taken from XFS), a CPU frequency control
update, TCP "Vegas" congestion avoidance, a new single-threaded mode for
workqueues, a CIFS update, various architecture updates, and lots of fixes.
See the long-format changelog for the details.
Linus hopes to have a final 2.6.6 release out by the end of the week.
Linus's BitKeeper tree contains, as of this writing, a set of XFS patches
and a few other fixes.
The current prepatch from Andrew Morton is 2.6.6-rc2-mm2. Recent additions to -mm include
a set of reiserfs patches (see below), some more ext3 block reservation
work, a "tickless" timer mode for the S/390 architecture, hotplug CPU
support for ia-64 systems, and lots of fixes.
The current 2.4 prepatch is 2.4.27-pre1, released by Marcelo on April 22. This
prepatch merges the 2.6 serial ATA drivers, but otherwise restricts itself
to fixes and small updates. According to Marcelo, the serial ATA update is
the last big change that will go into 2.4.x.
Kernel development news
The patch seemed relatively straightforward;
Chris Mason had sent out a set of reiserfs changes which include
data=journal support, an improved block allocator, metadata
readahead, and external attribute support. One of those changes, however,
does not sit well with Hans Reiser, the original creator of reiserfs.
External attributes (more commonly called "extended attributes") are just a way of attaching extra metadata to files;
they are used for things like access control lists and SELinux context
information. Most of the standard Linux filesystems support external
attributes in 2.6, but reiserfs does not yet have that capability. Given that
features like SELinux will not work without external attributes, adding this
capability has been high on the wish lists of many users and developers.
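For readers who have not used the attribute interface that other 2.6 filesystems expose, a minimal user-space sketch follows. It assumes a Linux system whose C library provides <sys/xattr.h>; the helper names tag_file() and read_tag() are hypothetical, while setxattr() and getxattr() are the standard system call wrappers. Whether the calls succeed depends on the filesystem and its mount options.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: attach a string value to a file under the given
 * attribute name (for example "user.comment").
 * Returns 0 on success, -1 with errno set on failure. */
int tag_file(const char *path, const char *name, const char *value)
{
    return setxattr(path, name, value, strlen(value), 0);
}

/* Hypothetical helper: read an attribute back into buf and NUL-terminate
 * it. Returns the value length, or -1 on failure. */
ssize_t read_tag(const char *path, const char *name, char *buf, size_t buflen)
{
    ssize_t n;

    if (buflen == 0)
        return -1;
    n = getxattr(path, name, buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

Note that the attribute name is not a path; it lives in a name space of its own, which is exactly the property Hans objects to.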
When the external attribute patch was posted, however, Hans Reiser sent out
a protest asking that the patch not be
applied. Those who have followed Hans's work over the years will know what
his objection is: external attributes live in their own name space. Hans
has dedicated much effort to the task of moving everything into the
filesystem name space; he says:
The expressive power of an operating system is NOT proportional to
the number of components, but instead is proportional to the number
of possible connections between its components. If you fragment
the namespaces of an OS, you reduce each component to effective
interactions with only those components in its reduced size
namespace. Designing the namespaces of an OS so that they possess
closure and are unified may seem like a lot of effort, but it is
very cost effective compared to building many times more other OS
components to get the same expressive power.
The upcoming Reiser4 filesystem
implements Hans's vision of how external attributes should be implemented;
essentially, each attribute just looks like a small file containing the
attribute value. The solution is fast and elegant; it may well be the way
things are done in the future. For the moment, however, there are a few
problems:
- Reiser4 is still in beta testing, and has not yet been submitted for
inclusion into the 2.6 kernel. Once it is submitted, it is not
certain that it will be accepted quickly.
- The Reiser4 external attribute API is different from the API used in
the 2.6 kernel. Applications that want to use it will have to be
rewritten around the special-purpose reiser4() system call.
- Some users of reiserfs ("Reiser3") might be a little nervous about
making an immediate jump to a completely new filesystem. They just
might want to be able to continue using their existing filesystems
and, simultaneously, make use of external attributes.
The solution seems reasonably clear: Reiser4, once it's ready, can be
merged with its new ways of doing things. The existing reiserfs
filesystem, meanwhile, can be augmented with the capabilities that its
users would like to have now. This approach would seem to offer the best
of both worlds. Mr. Reiser disagrees; he would rather not have (what he
sees as) an inelegant hack grafted onto reiserfs to satisfy immediate
needs. When code is released as free software, however, not even its
creator can prevent its development in certain directions if that's what
its users want.
MODULE_LICENSE() is a macro which allows loadable kernel modules
to declare their license to the world. Its purpose is to let the kernel
developers know when a non-free module has been inserted into a given
kernel. If you submit an oops report showing a "tainted" kernel, chances
are you will be asked to reproduce the problem without the proprietary module
loaded, or to talk to that module's vendor about the problem. In general,
the kernel hackers want to hear about problems, but their interest drops
remarkably when they cannot get at the source to diagnose or fix the
problem.
The declared module license is also used to decide whether a given module
can have access to the small number of "GPL-only" symbols in the kernel.
There is no central authority which checks license declarations; it is
assumed that module authors will not want to lie about the license they are
using. That assumption has generally proved to be valid, so people were
surprised when Linuxant was found to have
put a false module declaration into its binary-only "linmodem" driver. Or,
if it's not false, it does cleverly manage to not tell the whole story.
The actual license string in the Linuxant driver reads:
GPL\0for files in the "GPL" directory; for others, only LICENSE file
applies
The \0 is an ASCII NUL character, which, in C programs, terminates
a string. Thus, while the above declaration would appear fairly clear to
human eyes, the kernel only sees a license declaration of "GPL".
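The effect is easy to reproduce in user space. The sketch below is not kernel code; the names declared_license, looks_gpl(), visible_length(), and stored_length() are made up for illustration. It only shows how the standard C string functions treat the string quoted above.

```c
#include <stddef.h>
#include <string.h>

/* The license string as quoted above, embedded NUL included. */
const char declared_license[] =
    "GPL\0for files in the \"GPL\" directory; "
    "for others, only LICENSE file applies";

/* A naive check of the kind any NUL-terminated comparison performs. */
int looks_gpl(const char *license)
{
    return strcmp(license, "GPL") == 0;
}

/* Length as seen by the C string functions: stops at the first NUL. */
size_t visible_length(void)
{
    return strlen(declared_license);
}

/* Length of what is actually stored, minus the final terminator. */
size_t stored_length(void)
{
    return sizeof(declared_license) - 1;
}
```

Here looks_gpl(declared_license) is true and visible_length() is 3, even though stored_length() covers the whole sentence.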
One might well wonder why Linuxant chose to do this. The driver in
question does not use any GPL-only symbols, so it is not an attempt to get
around the kernel's simplistic access control mechanism. According to Linuxant president Marc Boucher, they
simply wanted to avoid bothering users with kernel warnings:
The purpose of the workaround is to avoid repetitive warning
messages generated when multiple modules belonging to a single
logical "driver" are loaded (even when a module is only probed but
not used due to the hardware not being present). Although the
issue may sound trivial/harmless to people on the lkml, it was a
frequent cause of confusion for the average person.
Most developers seem to have taken this explanation at face value, though
some remain unhappy about the approach that
was used.
Possible solutions include putting the "kernel tainted" warning in the
system logfile only, simply suppressing the warning after the first
time, or having the Linuxant drivers manually set the "tainted" flag
themselves at load time. Finding a way to achieve Linuxant's aim (providing
a driver which enables hardware that does not otherwise work with Linux
without upsetting users with lots of scary messages) should not be that
hard to do.
Meanwhile, of course, there is also interest in making it harder for others
to get past the kernel license check. Carl-Daniel Hailfinger, who
originally pointed out the problem, also submitted a patch which would
explicitly "blacklist" modules from Linuxant; any such module would taint
the kernel regardless of its claimed license. Linus suggested that the license be stored as a
counted string as a way of defeating the "NUL attack." Rusty Russell,
instead, noted that any check that would be
accepted into the kernel can be defeated by an even moderately motivated
attacker. His patch includes a quick compile-time check to defeat
Linuxant's technique, but it explicitly avoids getting into a real arms
race with potential violators.
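The idea behind a length-aware check can be sketched in a few lines. This is neither Linus's proposal nor Rusty's patch (which works at compile time); license_is_plain() and the LICENSE_IS_PLAIN() macro are hypothetical names, and the check runs at run time purely for illustration. It compares the number of bytes stored with the position of the first NUL.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the first NUL in the stored bytes is also the last byte,
 * i.e. nothing is hidden behind an embedded terminator. */
int license_is_plain(const char *license, size_t stored_size)
{
    const char *first_nul;

    if (stored_size == 0)
        return 0;
    first_nul = (const char *)memchr(license, '\0', stored_size);
    return first_nul == license + stored_size - 1;
}

/* Only valid for string literals and arrays, where sizeof gives the
 * stored size including the final terminator. */
#define LICENSE_IS_PLAIN(lit) license_is_plain((lit), sizeof(lit))
```

LICENSE_IS_PLAIN("GPL") is true, while a literal with an embedded \0 fails the check. As Rusty points out, a motivated vendor could still get around a test like this one.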
Chances are we will see this sort of behavior again - with, perhaps, a less
benign intent. The nature of a free kernel makes it hard to shut out those
who are unwilling to play by the rules. But, as Linus said:
...playing the above kinds of games makes it pretty clear to
everybody that any infringement was done wilfully. They should be
talking to their lawyers about things like that.
Given that a number of free software hackers are increasingly unwilling to
see their licenses ignored, anybody who wants to engage in this sort of
behavior should, indeed, be talking to their lawyers.
The kernel makes heavy use of inline functions. In many cases, inline
expansion of functions is necessary; some of these functions employ various
sorts of assembly language trickery that must be part of the calling
function. In many other cases, though, inline functions are used as a
way of improving performance. The thinking is that, by eliminating the
overhead of performing actual function calls, inline functions can make
things go faster.
The truth turns out not to be so simple. Consider, for example, this patch from Stephen Hemminger which removes
the inline attribute from a set of functions for dealing with socket
buffers ("SKBs", the structure used to represent network packets inside the
kernel). Stephen ran some benchmarks after applying his patch; those
benchmarks ran 3% faster than they did with the functions being
expanded inline.
The problem with inline functions is that they replicate the function body
every time they are called. Each use of an inline function thus makes the
kernel executable bigger. A bigger executable means more cache misses, and
that slows things down. The SKB functions are called in many places all
over the networking code. Each one of those calls creates a new copy of
the function; Denis Vlasenko recently discovered that many of them expand to over 100
bytes of code. The result is that, while many places in the kernel are
calling the same function, each one is working with its own copy. And each
copy takes space in the processor instruction cache. That cache usage
hurts; each cache miss costs more than a function call.
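A rough way to see the size effect is to count bytes. The sketch below is illustrative only: struct buf, the two buf_pull_*() helpers, and the estimate functions are hypothetical names rather than kernel code, and the attributes are GCC/Clang extensions. Both helpers behave identically; they differ only in whether the body is copied into each caller.

```c
#include <stddef.h>

/* Hypothetical stand-in for a small packet-buffer structure. */
struct buf {
    unsigned char *data;
    size_t len;
};

/* Inline variant: the body is duplicated at every call site. */
static inline __attribute__((always_inline))
size_t buf_pull_inline(struct buf *b, size_t n)
{
    if (n > b->len)
        n = b->len;
    b->data += n;
    b->len -= n;
    return n;
}

/* Out-of-line variant: one shared copy; callers emit a call instruction. */
__attribute__((noinline))
size_t buf_pull_outofline(struct buf *b, size_t n)
{
    if (n > b->len)
        n = b->len;
    b->data += n;
    b->len -= n;
    return n;
}

/* Estimated text size when the body is expanded at every call site. */
size_t text_bytes_inline(size_t call_sites, size_t body_bytes)
{
    return call_sites * body_bytes;
}

/* Estimated text size with one copy of the body plus one call per site. */
size_t text_bytes_outofline(size_t call_sites, size_t body_bytes,
                            size_t call_bytes)
{
    return body_bytes + call_sites * call_bytes;
}
```

Assuming 50 call sites, a 100-byte body, and a 5-byte call instruction, the estimates come to 5000 bytes inline against 350 bytes out of line. The model ignores argument setup and says nothing about run time by itself; only a benchmark like Stephen's can show which version is faster.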
Thus, the kernel hackers are taking a harder look at inline function
declarations than they used to. An inline function may seem like it should
be faster, but that is not necessarily the case. The notion of a
"time/space tradeoff" which is taught in many computer science classes
often turns out not to hold in the real world. Many times, smaller is
also faster.
Matt Mackall has released version 0.7 of his "ketchup" script. Ketchup
can be thought of as a sort of apt-get for kernel trees; run
"ketchup 2.6-bk" and it will go get the right combination of kernel
tarballs and patch sets and put them together into a complete kernel tree.
Several different trees are supported, including -mm, -tiny, and -mjb, and
the script can string together a series of patches to get to the desired
destination.
If you find yourself playing with a number of different kernel trees,
ketchup may prove to be a tasty condiment to add to your tool collection.
The workqueue mechanism is the 2.6 kernel's
replacement for task queues; a workqueue allows kernel code to defer work
until some time in the future. Tasks submitted to workqueues are run in
the context of a special process, so they can sleep if need be. Workqueues
go out of their way to keep work on the same processor by creating a
dedicated worker thread for each processor on the system.
For many applications, one process per CPU is far more than is needed; a
single worker process is plenty. There is a shared, generic workqueue
which can be used in many of these situations. In others, however, use of
that queue is not appropriate; perhaps the code in question performs long
sleeps, or it may deadlock with another use of that queue. In these cases,
there has been no alternative to paying the cost of all those worker
threads.
As of 2.6.6, thanks to Rusty Russell, there will be a new function for
creating workqueues:
struct workqueue_struct *create_singlethread_workqueue(char *name);
As you might expect, this function creates a workqueue that relies on a
single worker thread. Chances are, many of the current users of workqueues
could switch over to the single-threaded variety.
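To make the idea concrete, here is a user-space model of a queue served by exactly one worker thread. It is not the kernel implementation and does not use the kernel API; it is a sketch built on POSIX threads, and every name in it (st_workqueue, st_workqueue_create(), st_workqueue_queue(), st_workqueue_destroy(), bump_counter()) is invented for the example.

```c
#include <pthread.h>
#include <stdlib.h>

typedef void (*work_fn)(void *);

struct work_item {
    work_fn fn;
    void *arg;
    struct work_item *next;
};

struct st_workqueue {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t more;
    struct work_item *head, *tail;
    int stopping;
};

/* The single worker: run queued items in order, exit once asked to stop
 * and the queue has drained. Work runs unlocked, so it may sleep. */
static void *worker_main(void *p)
{
    struct st_workqueue *wq = p;

    pthread_mutex_lock(&wq->lock);
    for (;;) {
        struct work_item *w;

        while (!wq->head && !wq->stopping)
            pthread_cond_wait(&wq->more, &wq->lock);
        if (!wq->head)
            break;
        w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        w->fn(w->arg);
        free(w);
        pthread_mutex_lock(&wq->lock);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

struct st_workqueue *st_workqueue_create(void)
{
    struct st_workqueue *wq = calloc(1, sizeof(*wq));

    if (!wq)
        return NULL;
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->more, NULL);
    if (pthread_create(&wq->thread, NULL, worker_main, wq) != 0) {
        pthread_cond_destroy(&wq->more);
        pthread_mutex_destroy(&wq->lock);
        free(wq);
        return NULL;
    }
    return wq;
}

/* Defer fn(arg) to the worker thread. Returns 0 on success, -1 if the
 * item could not be allocated. */
int st_workqueue_queue(struct st_workqueue *wq, work_fn fn, void *arg)
{
    struct work_item *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    w->fn = fn;
    w->arg = arg;
    w->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    return 0;
}

/* Let pending work finish, then stop the worker and free the queue. */
void st_workqueue_destroy(struct st_workqueue *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stopping = 1;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->thread, NULL);
    pthread_cond_destroy(&wq->more);
    pthread_mutex_destroy(&wq->lock);
    free(wq);
}

/* Example work function: increment the int that arg points to. */
void bump_counter(void *arg)
{
    ++*(int *)arg;
}
```

A caller creates the queue once, submits work with st_workqueue_queue(wq, bump_counter, &n), and destroys the queue when done. The cost is one thread in total, however many processors the system has.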
Page editor: Jonathan Corbet