Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.19-rc3, released by Linus on October 23. It contains a fairly long list of fixes, but things do seem to be settling down a little bit. See the long-format changelog for the details.

A very small number of patches - all fixes - have been merged since 2.6.19-rc3 was released.

Adrian Bunk is maintaining a list of known regressions in 2.6.19; it is surprisingly short.

The current -mm tree is 2.6.19-rc2-mm2. Recent changes to -mm include the addition of the I/OAT DMA engine tree, a big set of x86 patches, sharing of page tables for huge TLB pages, a set of library functions for reversing the bits in a value, initial support for virtualizing process sessions, and some ongoing tty driver work.

Kernel development news

Using sparse for endianness verification

Developers like to joke about Al Viro's fearsome presence on linux-kernel, but the truth of the matter is that he has been relatively quiet there for some time. That does not mean, however, that he has become a full-time Plan 9 developer. Instead, he has been steadily working to improve the static analysis tools used to find kernel bugs before they bite users.

In recent times, Al's work has resulted in a long series of patches merged into the mainline, almost all of which have been marked as "endianness annotations." These patches mostly change the declared types for various functions, variables, and structure members. The new types may be unfamiliar to many, since they are relatively new - though not that new; they were introduced in 2.6.9. These types are __le16, __le32, __le64, __be16, __be32, and __be64.

What these types represent is an attempt to encode whether the (unsigned) integer value is big-endian (most significant byte first) or little-endian. For most programming, even within the kernel, endianness is not a concern; things just work without much thought on the programmer's part. Kernel code often must work with data encoded in a specific byte ordering which might not match the processor's ordering, though. Network protocols, filesystem on-disk data structures, and device registers are all examples. In general, when the kernel works with data in a non-native ordering, it must first swap the bytes around to match the processor's expectations. Failure to do so can lead to all kinds of strange bugs.

There are a number of macros provided which can help with this task. There are classic functions like htonl(), which converts a 32-bit integer from host to "network" (big-endian) order. More generally, the kernel provides macros like __le32_to_cpu(), which will convert a little-endian 32-bit quantity to the ordering required by the processor. These macros make for portable code; they perform the requested transformation on systems where it is needed, and simply vanish in the remaining cases.
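
As a rough user-space illustration of what such a conversion accomplishes, the sketch below decodes a 32-bit value from raw bytes in a stated byte order. The names decode_le32() and decode_be32() are hypothetical and are not kernel interfaces; the kernel macros work on integer values rather than byte pointers, but the end result is the same.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel API): assemble a 32-bit value from four
 * raw bytes stored in a known order.  Because the bytes are combined with
 * shifts, the result is the same on big-endian and little-endian hosts.
 */
static uint32_t decode_le32(const unsigned char *p)
{
    /* Little-endian: least significant byte comes first. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static uint32_t decode_be32(const unsigned char *p)
{
    /* Big-endian ("network" order): most significant byte comes first. */
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

Given the bytes 0x78 0x56 0x34 0x12, decode_le32() yields 0x12345678 while decode_be32() yields 0x78563412; treating one ordering as the other is exactly the class of mistake discussed here.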

The conversion functions only work, however, when the programmer remembers to use them. In their absence, values in non-native byte orders simply look like integers, and there is no way to catch the error until something blows up. And that might not happen to the original developer at all; the code may work flawlessly until somebody tries to run it on a different architecture and things fall apart.

It would be nice to catch endianness mistakes at an earlier stage. That is the purpose of types like __be32; they allow a programmer to mark data with a specific ordering when it first enters the system. Thereafter, a suitably smart tool can check the code which manipulates that data and ensure that it does not mix that data with native-order data, does not try to do arithmetic with it, etc. Once everything is suitably annotated, whole classes of bugs can be caught before the kernel is even booted. And that can only be a good thing.
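
A minimal sketch of how such an annotated type can be built follows. It assumes sparse's "bitwise" and "force" attributes and the __CHECKER__ symbol that sparse defines when it runs; every sketch_ name is hypothetical, and the kernel's own definitions differ in detail.

```c
#include <stdint.h>

/*
 * Under sparse (__CHECKER__ defined) the attributes make sketch_be32 a
 * distinct type that cannot be silently mixed with plain integers.  Under
 * an ordinary compiler the macros expand to nothing.
 */
#ifdef __CHECKER__
#define sketch_bitwise __attribute__((bitwise))
#define sketch_force   __attribute__((force))
#else
#define sketch_bitwise
#define sketch_force
#endif

typedef uint32_t sketch_bitwise sketch_be32;   /* hypothetical stand-in for __be32 */

static uint32_t sketch_swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static int sketch_host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* The only sanctioned ways in and out of the annotated type. */
static sketch_be32 sketch_cpu_to_be32(uint32_t v)
{
    uint32_t raw = sketch_host_is_little_endian() ? sketch_swab32(v) : v;
    return (sketch_force sketch_be32)raw;
}

static uint32_t sketch_be32_to_cpu(sketch_be32 v)
{
    uint32_t raw = (sketch_force uint32_t)v;
    return sketch_host_is_little_endian() ? sketch_swab32(raw) : raw;
}

/*
 * With these in place, a line such as
 *     uint32_t n = wire_value + 1;      (wire_value being a sketch_be32)
 * is the kind of construct sparse is meant to flag, while
 *     uint32_t n = sketch_be32_to_cpu(wire_value) + 1;
 * passes cleanly.
 */
```

The design point is that the checking costs nothing at run time: the annotations exist only for the checker, and the generated code is the same with or without them.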

The "suitably smart tool" which does this work is "sparse," a static checker which was originally written by Linus Torvalds. There is support for sparse built into the kernel build system, making it easy to check code for errors. The one thing which remains relatively difficult, for whatever reason, is getting the "sparse" tool in the first place. Few distributors package it, so prospective users must grab a copy and build it themselves.

The true source for sparse is its git repository. With git, it's a simple matter of running:

    git clone  git://

A simple "make" in the resulting directory will yield a working sparse binary. This tool changes quickly enough that updating from the repository on a regular basis is probably a good idea. For people who don't have git handy, it is also possible to grab a tarball snapshot from Dave Jones's site.

Once sparse is installed, running it on the kernel is a simple matter of going to your local source tree and running:

    make C=2

The parameter C=2 causes sparse to be run on every .c file; if C=1 is used instead, only files which must be recompiled are checked. Checking for endianness problems requires an additional parameter:

    make C=2 CF=-D__CHECK_ENDIAN__

The number of warnings which result from this command can be large - though it is dropping as Al works his way through the code.

Checking code submissions with sparse is highly recommended - it is one of the steps in the patch submission checklist packaged with the kernel. Use of sparse may still be more of an exception than the rule, however. But it is easy enough - and useful enough - that there really is no reason not to run the checker on code before sending it out. It is, after all, much nicer to have the computer find silly mistakes for you, in the privacy of your own computer, before broadcasting them to the world.

GPL-only symbols and ndiswrapper

The "ndiswrapper" module has been featured on this page before. It is a special sort of glue module which allows Windows NDIS drivers to be loaded into a Linux kernel. It can be found on systems using hardware (wireless adapters in particular) which is not well supported by Linux drivers; by gluing in the Windows driver, ndiswrapper allows this hardware to operate. But, since it is a mechanism created to stuff the most proprietary of binary modules into Linux, ndiswrapper was always going to raise some eyebrows.

One of the many changes that went into the 2.6.16 kernel was an explicit check for the ndiswrapper module. It is, in fact, this explicit:

    if (strcmp(mod->name, "ndiswrapper") == 0)
        add_taint_module(mod, TAINT_PROPRIETARY_MODULE);

This test means that any system which has had ndiswrapper loaded will have the "proprietary module" taint flag set. As a result, the kernel developers are highly unlikely to be interested in helping with any problems encountered running that kernel. Since 2.6.16 was released last March, one might well wonder why ndiswrapper author Giridhar Pemmasani is only now getting around to complaining about that test. It turns out that the kernel developers have quietly taken things one step further in the 2.6.19-rc kernels.

The kernel has long exported symbols to modules in two modes. Symbols exported with EXPORT_SYMBOL are available to all modules loaded into the kernel, while those exported with EXPORT_SYMBOL_GPL are only available to those which declare a GPL-compatible license. This distinction has never been a problem for ndiswrapper, which is licensed under the GPL. So, even after the explicit taint was added, ndiswrapper could load and function normally.

For 2.6.19, a patch by Florin Malita was merged which changes the calculation for GPL-only symbols slightly. Rather than checking whether a module has a GPL-compatible license, the new code checks whether the module has the "proprietary module" taint bit set. In most cases, the end result is the same. For ndiswrapper, however, the result is that GPL-only symbols, which were accessible in earlier kernels, are now unavailable. And that means that ndiswrapper can no longer be loaded into the kernel. The module's author thinks that this is unfair, since ndiswrapper is, in fact, GPL-licensed code.
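
The difference between the two checks can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel's code: the structure, the sketch_ function names, and the abbreviated license list are all assumptions made for illustration.

```c
#include <string.h>

#define SKETCH_TAINT_PROPRIETARY_MODULE (1u << 0)

/* Hypothetical, pared-down stand-in for the kernel's module structure. */
struct sketch_module {
    const char *name;
    const char *license;
    unsigned int taints;
};

/* Simplified: the real kernel accepts a longer list of license strings. */
static int sketch_license_is_gpl_compatible(const char *license)
{
    return strcmp(license, "GPL") == 0 ||
           strcmp(license, "GPL v2") == 0 ||
           strcmp(license, "Dual BSD/GPL") == 0;
}

/* Load-time tainting: a non-GPL license, or the explicit name check. */
static void sketch_set_taints(struct sketch_module *mod)
{
    if (!sketch_license_is_gpl_compatible(mod->license))
        mod->taints |= SKETCH_TAINT_PROPRIETARY_MODULE;
    if (strcmp(mod->name, "ndiswrapper") == 0)
        mod->taints |= SKETCH_TAINT_PROPRIETARY_MODULE;
}

/* Older behavior: GPL-only symbols depend on the declared license. */
static int sketch_gpl_ok_old(const struct sketch_module *mod)
{
    return sketch_license_is_gpl_compatible(mod->license);
}

/* 2.6.19 behavior as described: GPL-only symbols depend on the taint bit. */
static int sketch_gpl_ok_new(const struct sketch_module *mod)
{
    return !(mod->taints & SKETCH_TAINT_PROPRIETARY_MODULE);
}
```

For an ordinary GPL module and for a proprietary one, both functions agree. Only a module that is GPL-licensed yet tainted by name, which is ndiswrapper's situation, gets a different answer from the two.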

Alan Cox's response reads like this:

EXPORT_SYMBOL_GPL() is used to assert that the symbol is absolutely definitely not a public symbol. EXPORT_SYMBOL exports symbols which might be but even then the GPL derivative work rules apply. When you mark a driver GPL it is permitted to use _GPL symbols, but if it does so it cannot then go and load other non GPL [modules] and expect people not to question its validity.

The core idea makes sense: the GPL-only restrictions are not worth much if they can be trivially circumvented by loading a glue module. One cannot help but wonder, however, if the wrong target has been chosen in this case.

The purpose of GPL-only exports is to inhibit the creation of proprietary derived products of the kernel. It is hard to imagine an argument that could demonstrate that a typical NDIS module is, in any way, a derived product of the Linux kernel. These are drivers written for another operating system entirely by people who, likely as not, have never had any sort of contact with Linux source. Unlike certain other types of proprietary modules, NDIS drivers are clearly independent works. One may well balk at the notion of loading such a driver into one's kernel, but it is hard to make a case that copyright law somehow prohibits such an action.

It also seems a little strange to penalize a module for having the wrong name. There are no explicit checks for, say, the MadWifi module, which also loads a binary-only component. Simply renaming the module would circumvent this check, opening a window which would take the kernel developers some months to close again. One could imagine a determined programmer coming up with a random name every time a module is built, decisively winning that particular battle. The ndiswrapper author seems uninclined to play those games, however; he has, instead, tried to work within the kernel community. The module already takes pains to add a kernel taint itself whenever an NDIS driver is loaded.

There does not seem to be any particular interest in the kernel community in backing down on this change, however. That leaves the ndiswrapper author in a position where he must either rework the code to avoid GPL-only symbols or find some other way of enabling it to load once again. One assumes that some sort of workaround will be found; it may not be an optimal solution, but ndiswrapper does have a significant community which depends on it to make its hardware work under Linux.

Patch summary: regulatory domains, network channels, and virtualization

Here's a quick look at a few patches that have been posted recently.

802.11 regulatory domains

Standard wisdom says that putting policy decisions into the kernel is generally a bad idea. Policies implemented in kernel space limit the flexibility of the system, potentially keeping user-space from doing everything it could possibly accomplish. There are times, however, when that is exactly what one might want to do.

Wireless networking presents a number of challenges for the kernel. One of them is imposed entirely from the outside: anything which can transmit tends to be heavily regulated. So wireless networking adapters must not transmit on unauthorized frequencies or at power levels above those allowed by law. Needless to say, the applicable rules vary from one jurisdiction to the next, making it impossible to work with a single set of constraints, especially if one wants to use the hardware to its full, legal potential in any given country. The need to adhere to regulatory constraints is one of the favorite reasons given by wireless adapter vendors when asked why they cannot release programming information for their hardware.

Luis Rodriguez is trying to address regulatory issues with a patch set implementing regulatory domain information in the kernel (and in the Devicescape 802.11 stack in particular). At this point, the work is just infrastructure which tracks the constraints imposed by any given domain and the current domain under which the system is operating. Actually implementing compliance with the current domain has been left for a future exercise - there are some 802.11 stack issues which need to be resolved first.

If this patch set is eventually accepted, there will be a single framework by which all wireless adapters can be operated in a legal manner, wherever the computer might happen to be located. Beyond doing the right thing with regard to the spectrum, Luis hopes that this mechanism might be enough to satisfy the various regulatory agencies that Linux has its act together in this regard - and that vendors will no longer feel the need to keep their programming information secret. Luis, it seems, is an optimistic sort of person.

Network channels

Meanwhile, things have been quiet for a while on the network channels front. But that does not mean that nothing has been happening. As proof, consider that Evgeniy Polyakov has just surfaced with a new net channels patch which, he claims, can scale significantly better than the current networking implementation.

This version of network channels focuses more on the user-space interface side of the problem, leaving most of the kernel infrastructure work for another time. To that end, it adds a new system call, netchannel_control(), to hook up channel functionality to user-space code. netchannel_control() is another one of those multiplexer interfaces that Evgeniy seems to favor; it functions like an ioctl() call with three core operations:

  • NETCHANNEL_CREATE creates a new channel bound to given local and remote addresses. There is also a "type" specification which describes how the channel operates with user space.

  • NETCHANNEL_SEND will send a packet out on the network.

  • NETCHANNEL_RECV blocks until an incoming packet is received, then passes that packet to user space.

The kernel side of the implementation, for now, is simple and straightforward: a NETCHANNEL_SEND call will allocate an sk_buff structure and fill it with user data with copy_from_user(); the packet is then sent on its way via the network stack in the usual manner. The design envisions adding other, faster ways of moving data around - using Evgeniy's network allocator mechanism, for example - in the future.
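
The multiplexer style of interface can be pictured with a toy user-space model. Everything here is assumed for illustration: the structures, the sketch_ names, and the in-memory loopback buffer are not taken from the patch, whose real argument layout is not described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes mirroring the three operations listed above. */
enum sketch_cmd {
    SKETCH_NETCHANNEL_CREATE,
    SKETCH_NETCHANNEL_SEND,
    SKETCH_NETCHANNEL_RECV
};

/* Toy channel: one buffered packet, looped back instead of hitting a network. */
struct sketch_channel {
    int created;
    unsigned char buf[64];
    size_t len;                 /* bytes queued; 0 means empty */
};

struct sketch_request {
    enum sketch_cmd cmd;
    void *data;
    size_t size;
};

/* One entry point, many operations: the ioctl()-like pattern. */
static long sketch_netchannel_control(struct sketch_channel *ch,
                                      struct sketch_request *req)
{
    size_t n;

    switch (req->cmd) {
    case SKETCH_NETCHANNEL_CREATE:
        memset(ch, 0, sizeof(*ch));
        ch->created = 1;
        return 0;
    case SKETCH_NETCHANNEL_SEND:
        if (!ch->created || req->size > sizeof(ch->buf))
            return -1;
        /* Stands in for copying user data into a freshly allocated sk_buff. */
        memcpy(ch->buf, req->data, req->size);
        ch->len = req->size;
        return (long)req->size;
    case SKETCH_NETCHANNEL_RECV:
        /* The real operation blocks; this toy version fails when empty. */
        if (!ch->created || ch->len == 0 || req->size < ch->len)
            return -1;
        n = ch->len;
        memcpy(req->data, ch->buf, n);
        ch->len = 0;
        return (long)n;
    }
    return -1;
}
```

A caller would issue a CREATE request first, then SEND and RECV requests against the same channel. The appeal of the pattern is that new operations can be added without new system calls; the cost is a loosely typed interface.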

The current patch adds a user-space network stack which uses the new netchannel mechanism. It claims to handle TCP and UDP currently, with a number of the expected features; there is a "socket-like interface" presented to applications. There has been no public reaction to this patch set so far, so it is hard to say whether it makes sense to the other network developers or not. Evgeniy appears to be a persistent sort of person, however, so expect to see this code again.


Virtualization

Finally, this large patch set posted by Avi Kivity may stir things up a bit in the virtualization area. These patches implement support for Intel's virtualization extensions (AMD support is said to be forthcoming), allowing Linux systems to easily run virtual machines without the need for a full hypervisor like Xen. It should be noted that the patch set includes a fair amount of Xen code, though.

With this patch set added, a Linux system implements a new device called /dev/kvm. Opening this device creates a new virtual machine which can then be manipulated with a set of ioctl() calls. One important operation creates virtual CPUs for this machine; currently only a single virtual CPU is supported. There is an operation which adds a memory region to the client machine. Memory is organized into "slots" modeled after the physical slots on a motherboard; they are useful for setting up structures like the memory hole at 640K still found on PC-type systems. Other operations allow for the creation of page table entries in the client, manipulating virtual machine registers, intercepting privileged operations, and actually running a program in the client. A set of debugging operations is provided as well.
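
The memory slot idea can be sketched as a small table of guest-physical ranges. This is an illustrative model only: the structure, the sketch_ names, and the 64MB guest size are assumptions, not the interface defined by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot description: a contiguous range of guest-physical memory. */
struct sketch_memory_slot {
    uint64_t guest_base;
    uint64_t size;
};

/*
 * Two slots for an assumed 64MB guest, leaving the traditional PC hole
 * between 640K and 1M unbacked.
 */
static const struct sketch_memory_slot sketch_slots[] = {
    { 0x00000000, 640 * 1024 },            /* 0 up to 640K */
    { 0x00100000, 63 * 1024 * 1024 },      /* 1M up to 64M */
};

/* Return the index of the slot backing guest_addr, or -1 if it is in a hole. */
static int sketch_find_slot(uint64_t guest_addr)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_slots) / sizeof(sketch_slots[0]); i++) {
        const struct sketch_memory_slot *s = &sketch_slots[i];

        if (guest_addr >= s->guest_base &&
            guest_addr - s->guest_base < s->size)
            return (int)i;
    }
    return -1;
}
```

An address such as 0xA0000 (640K) falls between the two slots and has no backing memory in this model, which is how a hole in the guest's address space is expressed.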

There is a fair amount of interest in this patch set; it looks like it could be a (relatively!) simple way of adding hardware virtualization support to the kernel. One comment which has been posted remarks on the similarities between this functionality and the work which has been done to support the "synergistic processing units" (SPUs) on the Cell architecture. The SPU support, which has been in the kernel since 2.6.16, uses a special-purpose filesystem (rather than ioctl()) to control the clients. Any sort of merger between these two subsystems would thus likely involve the /dev/kvm interface being changed. So this patch set could change quite a bit as it heads toward eventual inclusion.

Patches and updates

Development tools

  • Junio C Hamano: GIT 1.4.3. (October 19, 2006)

Page editor: Jonathan Corbet

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds