The current 2.6 prepatch is 2.6.19-rc3, released
by Linus on
October 23. It contains a fairly long list of fixes, but things do seem to
be settling down a little bit. See the changelog
for the details.
A very small number of patches - all fixes - have been merged since
2.6.19-rc3 was released.
Adrian Bunk is maintaining a list of known
regressions in 2.6.19; it is surprisingly short.
The current -mm tree is 2.6.19-rc2-mm2. Recent changes
to -mm include the addition of the I/OAT DMA engine tree, a big set of x86
patches, sharing of page tables for huge TLB pages, a set of library
functions for reversing the bits in a value, initial support for
virtualizing process sessions, and some ongoing tty driver work.
Kernel development news
Developers like to joke about Al Viro's fearsome presence on linux-kernel,
but the truth of the matter is that he has been relatively quiet there for
some time. That does not mean, however, that he has become a full-time
Plan 9 developer. Instead, he has been steadily working to improve
the static analysis tools used to find kernel bugs before they bite users.
In recent times, Al's work has resulted in a long series of patches merged
into the mainline, almost all of which have been marked as "endianness
annotations." These patches mostly change the declared types for various
functions, variables, and structure members. The new types may be
unfamiliar to many, since they are relatively new - though not that
new; they were introduced in 2.6.9. These types are __le16,
__le32, __le64, __be16, __be32, and __be64.
What these types represent is an attempt to encode whether the (unsigned)
integer value is big-endian (most significant byte first) or
little-endian. For most programming, even within the kernel, endianness is
not a concern; things just work without much thought on the programmer's
part. Kernel code often must work with data encoded in a specific byte
ordering which might not match the processor's ordering, though. Network
protocols, filesystem on-disk data structures, and device registers are all
examples. In general, when the kernel works with data in a non-native
ordering, it must first swap the bytes around to match the processor's
expectations. Failure to do so can lead to all kinds of strange bugs.
There are a number of macros provided which can help with this task.
There are classic functions like htonl(), which converts a 32-bit
integer from host to "network" (big-endian) order. More generally, the
kernel provides macros like __le32_to_cpu(), which will convert a
little-endian 32-bit quantity to the ordering required by the processor.
These macros make for portable code; they perform the requested
transformation on systems where it is needed, and simply vanish elsewhere.
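To make the "vanish where not needed" behavior concrete, here is a minimal user-space sketch of what such conversions do. It is not the kernel's implementation: the names swab32_sketch(), host_is_little_endian(), le32_to_cpu_sketch() and be32_to_cpu_sketch() are invented for illustration, and the endianness test is done at run time where the kernel would resolve it at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a kernel API): reverse the four bytes of a value. */
static uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/*
 * Illustrative stand-in for __le32_to_cpu(): a no-op on little-endian
 * hosts, a byte swap on big-endian ones.
 */
static uint32_t le32_to_cpu_sketch(uint32_t le_value)
{
    return host_is_little_endian() ? le_value : swab32_sketch(le_value);
}

/* The mirror image for big-endian data, comparable to ntohl(). */
static uint32_t be32_to_cpu_sketch(uint32_t be_value)
{
    return host_is_little_endian() ? swab32_sketch(be_value) : be_value;
}
```

Copying the bytes 0x78 0x56 0x34 0x12 into a uint32_t and passing it through le32_to_cpu_sketch() should give 0x12345678 on any host, which is the portability property the real macros provide.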
The conversion functions only work, however, when the programmer remembers
to use them. In their absence, values in non-native byte orders simply
look like integers, and there is no way to catch the error until something
blows up. And that might not happen to the original developer at all; the
code may work flawlessly until somebody tries to run it on a different
architecture and things fall apart.
It would be nice to catch endianness mistakes at an earlier stage. That is
the purpose of types like __be32; they allow a programmer to mark
data with a specific ordering when it first enters the system. Thereafter,
a suitably smart tool can check the code which manipulates that data and
ensure that it does not mix that data with native-order data, does not try
to do arithmetic with it, etc. Once everything is suitably annotated,
whole classes of bugs can be caught before the kernel is even booted. And
that can only be a good thing.
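As a rough illustration of how such annotations can cost nothing at run time, consider the sketch below. It assumes the checker defines __CHECKER__ and understands "bitwise" and "force" attributes, so an ordinary compiler sees a plain integer type; the names bitwise_sketch, force_sketch, be32_sketch, and wire_header_sketch are made up for this example and are not the kernel's own definitions.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* Only the checker sees the attributes; the compiler sees empty macros. */
#ifdef __CHECKER__
#define bitwise_sketch __attribute__((bitwise))
#define force_sketch   __attribute__((force))
#else
#define bitwise_sketch
#define force_sketch
#endif

/* A 32-bit value that is known to be stored in big-endian order. */
typedef uint32_t bitwise_sketch be32_sketch;

/* Hypothetical on-the-wire structure; the field type records its ordering. */
struct wire_header_sketch {
    be32_sketch length;
};

/* Convert at the boundary: host order in, big-endian stored. */
static void set_length(struct wire_header_sketch *h, uint32_t host_len)
{
    h->length = (force_sketch be32_sketch)htonl(host_len);
}

/* Convert back to host order before doing any arithmetic on the value. */
static uint32_t get_length(const struct wire_header_sketch *h)
{
    return ntohl((force_sketch uint32_t)h->length);
}
```

The intent is that a direct assignment such as "h->length = host_len" or arithmetic like "h->length + 1" would draw a warning from the checker, while the explicit conversions in the two helpers pass cleanly.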
The "suitably smart tool" which does this work is "sparse," a static
checker which was originally written by Linus Torvalds. There is support
for sparse built into the kernel build system, making it easy to check code
for errors. The one thing which remains relatively difficult, for whatever
reason, is getting the "sparse" tool in the first place. Few distributors
package it, so prospective users must grab a copy and build it themselves.
The true source for sparse is the git repository on kernel.org. With git,
it's a simple matter of running:
git clone git://git.kernel.org/pub/scm/devel/sparse/sparse.git
A simple "make" in the resulting directory will yield a working
sparse binary. This tool changes quickly enough that updating
from the repository on a regular basis is probably a good idea. For people
who don't have git handy, it is also possible to grab a tarball snapshot.
Once sparse is installed, running it on the kernel is a simple
matter of going to your local source tree and running:
make C=2
The parameter C=2 causes sparse to be run on every
.c file; if C=1 is used instead, only files which must be
recompiled are checked. Checking for endianness problems requires an
additional parameter:
make C=2 CF=-D__CHECK_ENDIAN__
The number of warnings which result from this command can be large - though
it is dropping as Al works his way through the code.
Checking code submissions with sparse is highly recommended - it
is one of the steps in the patch submission
checklist packaged with the kernel. Use of sparse may still
be more of an exception than the rule, however. But it is easy enough -
and useful enough - that there really is no reason not to run the checker
on code before sending it out. It is, after all, much nicer to have the
computer find silly mistakes for you, in the privacy of your own computer,
before broadcasting them to the world.
The "ndiswrapper" module has been featured on this page before. It is a
special sort of glue module which allows Windows NDIS drivers to be loaded
into a Linux kernel. It can be found on systems using hardware (wireless
adapters in particular) which is not well supported by Linux drivers; by
gluing in the Windows driver, ndiswrapper allows this hardware to operate.
But, since it is a mechanism created to stuff the most proprietary of
binary modules into Linux, ndiswrapper was always going to raise some
eyebrows.
One of the many changes that went into the 2.6.16 kernel was an explicit
check for the ndiswrapper module. It is, in fact, this explicit:
if (strcmp(mod->name, "ndiswrapper") == 0)
    add_taint(TAINT_PROPRIETARY_MODULE);
This test means that any system which has had ndiswrapper loaded will have
the "proprietary module" taint flag set. As a result, the kernel
developers are highly unlikely to be interested in helping with any
problems encountered running that kernel.
Since 2.6.16 was released last March, one might well wonder why ndiswrapper
author Giridhar Pemmasani is only now getting around to complaining about that test. It turns out
that the kernel developers have quietly taken things one step further in
the 2.6.19-rc kernels.
The kernel has long exported symbols to modules in two modes. Symbols
exported with EXPORT_SYMBOL are available to all modules loaded
into the kernel, while those exported with EXPORT_SYMBOL_GPL are
only available to those which declare a GPL-compatible license. This distinction has
never been a problem for ndiswrapper, which is licensed under the GPL. So,
even after the explicit taint was added, ndiswrapper could load and
run as before.
For 2.6.19, a patch by Florin Malita was merged which changes the
calculation for GPL-only symbols slightly. Rather than checking whether a
module has a GPL-compatible license, the new code checks whether the module
has the "proprietary module" taint bit set. In most cases, the end result
is the same. For ndiswrapper, however, the result is that GPL-only symbols,
which were accessible in earlier kernels, are now unavailable. And that
means that ndiswrapper can no longer be loaded into the kernel. The
module's author thinks that this is unfair, since ndiswrapper is, in fact,
GPL-licensed.
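The difference between the two checks can be modeled in a few lines of user-space C. This is a sketch of the logic as described here, not the kernel's code: struct module_sketch, the flag value, the abbreviated list of license strings, and all of the function names are assumptions made for illustration.

```c
#include <string.h>

#define TAINT_PROPRIETARY_SKETCH 0x1u   /* hypothetical flag value */

/* Hypothetical, stripped-down picture of a loaded module. */
struct module_sketch {
    const char *name;
    const char *license;
    unsigned int taints;
};

/* Abbreviated license test; the real list of accepted strings is longer. */
static int license_is_gpl_compatible_sketch(const char *license)
{
    return strcmp(license, "GPL") == 0 ||
           strcmp(license, "GPL v2") == 0 ||
           strcmp(license, "Dual BSD/GPL") == 0;
}

/* Load-time tainting: a non-GPL license or the ndiswrapper name sets the bit. */
static void apply_taints_sketch(struct module_sketch *mod)
{
    if (!license_is_gpl_compatible_sketch(mod->license))
        mod->taints |= TAINT_PROPRIETARY_SKETCH;
    if (strcmp(mod->name, "ndiswrapper") == 0)
        mod->taints |= TAINT_PROPRIETARY_SKETCH;
}

/* Earlier kernels: GPL-only symbols are gated on the declared license. */
static int gpl_symbols_ok_old(const struct module_sketch *mod)
{
    return license_is_gpl_compatible_sketch(mod->license);
}

/* 2.6.19-rc: GPL-only symbols are gated on the taint bit instead. */
static int gpl_symbols_ok_new(const struct module_sketch *mod)
{
    return !(mod->taints & TAINT_PROPRIETARY_SKETCH);
}
```

For an ordinary GPL module or an ordinary proprietary one, the two predicates agree. A module named "ndiswrapper" with a GPL license is the one case in this model where the old check passes and the new one fails.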
Alan Cox's response reads like this:
EXPORT_SYMBOL_GPL() is used to assert that the symbol is absolutely
definitely not a public symbol. EXPORT_SYMBOL exports symbols which
might be but even then the GPL derivative work rules apply. When
you mark a driver GPL it is permitted to use _GPL symbols, but if
it does so it cannot then go and load other non GPL [modules] and
expect people not to question its validity.
The core idea makes sense: the GPL-only restrictions are not worth much if
they can be trivially circumvented by loading a glue module. One cannot
help but wonder, however, if the wrong target has been chosen in this case.
The purpose of GPL-only exports is to inhibit the creation of proprietary
derived products of the kernel. It is hard to imagine an argument that
could demonstrate that a typical NDIS module is, in any way, a derived
product of the Linux kernel. These are drivers written for another
operating system entirely by people who, likely as not, have never had any
sort of contact with Linux source. Unlike certain other types of
proprietary modules, NDIS drivers are clearly independent works. One may
well balk at the notion of loading such a driver into one's kernel, but it
is hard to make a case that copyright law somehow prohibits such an action.
It also seems a little strange to penalize a module for having the wrong
name. There are no explicit checks for, say, the MadWifi module, which
also loads a binary-only component. Simply renaming the module would
circumvent this check, opening a window which would take the kernel
developers some months to close again. One could imagine a determined
programmer coming up with a random name every time a module is built,
decisively winning that particular battle. The ndiswrapper author seems
uninclined to play those games, however; he has, instead, tried to work
within the kernel community. The module already takes pains to add a
kernel taint itself whenever an NDIS driver is loaded.
There does not seem to be any particular interest in the kernel community
in backing down on this change, however. That leaves the ndiswrapper author
in a position where he must either rework the code to avoid GPL-only
symbols or find some other way of enabling it to load once again. One
assumes that some sort of workaround will be found; it may not be an
optimal solution, but ndiswrapper does have a significant community which
depends on it to make its hardware work under Linux.
Here's a quick look at a few patches that have been posted recently.
802.11 regulatory domains
Standard wisdom says that putting policy decisions into the kernel is
generally a bad idea. Policies implemented in kernel space limit the
flexibility of the system, potentially keeping user-space from doing
everything it could possibly accomplish. There are times, however, when
that is exactly what one might want to do.
Wireless networking presents a number of challenges for the kernel. One of
them is imposed entirely from the outside: anything which can transmit
tends to be heavily regulated. So wireless networking adapters must not
transmit on unauthorized frequencies or at power levels above those allowed
by law. Needless to say, the applicable rules vary from one jurisdiction
to the next, making it impossible to work with a single set of constraints,
especially if one wants to use the hardware to its full, legal potential in
any given country. The need to adhere to regulatory constraints is one of
the favorite reasons given by wireless adapter vendors when asked why they
cannot release programming information for their hardware.
Luis Rodriguez is trying to address regulatory issues with a patch set implementing
regulatory domain information in the kernel (and in the Devicescape 802.11
stack in particular). At this point, the work is just infrastructure which
tracks the constraints imposed by any given domain and the current domain
under which the system is operating. Actually implementing compliance with
the current domain has been left for a future exercise - there are some
802.11 stack issues which need to be resolved first.
If this patch set is eventually accepted, there will be a single framework
by which all wireless adapters can be operated in a legal manner, wherever
the computer might happen to be located. Beyond doing the right thing with
regard to the spectrum, Luis hopes that this mechanism might be enough to
satisfy the various regulatory agencies that Linux has its act together in
this regard - and that vendors will no longer feel the need to keep their
programming information secret. Luis, it seems, is an optimistic sort of
person.
Meanwhile, things have been quiet for a while on the network channels
front. But that does not mean that nothing has been happening. As proof,
consider that Evgeniy Polyakov has just surfaced with a new net channels patch which,
he claims, can scale significantly better than the current networking
stack.
This version of network channels focuses more on the user-space interface
side of the problem, leaving most of the kernel infrastructure work for
another time. To that end, it adds a new system call,
netchannel_control(), to hook up channel functionality to
user-space code. netchannel_control() is another one of those
multiplexer interfaces that Evgeniy seems to favor; it functions like an
ioctl() call with three core operations:
- NETCHANNEL_CREATE creates a new channel bound to given local
and remote addresses. There is also a "type" specification which
describes how the channel operates with user space.
- NETCHANNEL_SEND will send a packet out on the network.
- NETCHANNEL_RECV blocks until an incoming packet is received,
then passes that packet to user space.
The kernel side of the implementation, for now, is simple and
straightforward: a NETCHANNEL_SEND call will allocate an
sk_buff structure and fill it with user data with
copy_from_user(); the packet is then sent on its way via the
network stack in the usual manner. The design envisions adding other,
faster ways of moving data around - using Evgeniy's network allocator
mechanism, for example - in the future.
The current patch adds a
user-space network stack which uses the new netchannel mechanism. It
claims to handle TCP and UDP currently, with a number of the expected
features; there is a "socket-like interface" presented to applications.
There has been no public reaction to this patch set so far, so it is hard
to say whether it makes sense to the other network developers or not.
Evgeniy appears to be a persistent sort of person, however, so expect to
see this code again.
Finally, this large patch set
posted by Avi Kivity may stir things up a bit in the virtualization area.
These patches implement support for Intel's virtualization extensions (AMD
support is said to be forthcoming), allowing Linux systems to easily run
virtual machines without the need for a full hypervisor like Xen. It
should be noted that the patch set includes a fair amount of Xen code,
however.
With this patch set added, a Linux system implements a new device called
/dev/kvm. Opening this device creates a new virtual machine which
can then be manipulated with a set of ioctl() calls. One
important operation creates virtual CPUs for this machine; currently only a
single virtual CPU is supported. There is an
operation which adds a memory region to the client machine. Memory is
organized into "slots" modeled after the physical slots on a motherboard;
they are useful for setting up structures like the memory hole at 640K
still found on PC-type systems. Other operations allow for the creation of
page table entries in the client, manipulating virtual machine registers,
intercepting privileged operations, and actually running a program in the
client. A set of debugging operations is provided as well.
There is a fair amount of interest in this patch set; it looks like it
could be a (relatively!) simple way of adding hardware virtualization
support to the kernel. One comment which has been posted remarks on the
similarities between this functionality and the work which has been done to
support the "synergistic processing units" (SPUs) on the Cell
architecture. The SPU support, which has been in the kernel since 2.6.16,
uses a special-purpose filesystem (rather than ioctl()) to control
the clients. Any sort of merger between these two subsystems would thus
likely involve the /dev/kvm interface being changed. So this
patch set could change quite a bit as it heads toward eventual inclusion.
Patches and updates
Core kernel code
- Junio C Hamano: GIT 1.4.3.
(October 19, 2006)
Page editor: Jonathan Corbet