Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.20-rc1, released by Linus just after LWN came out on December 13. See the short-form changelog for a (long) list of patches merged for 2.6.20.

Several dozen patches (a relatively small number) have been merged into the mainline git repository since -rc1 came out.

The current -mm tree is 2.6.20-rc1-mm1. Recent changes to -mm include a new version of the user-space device driver feature, an idle notification facility for x86-64, the lumpy reclaim patch, and a new version of the dynamic tick patch.

For older kernels: 2.6.18.6 was released on December 18; it contains a fair number of fixes (including one which is security-related).

Adrian Bunk has released 2.6.16.36 with several patches and 2.6.16.37-rc1 with a few dozen more.

Willy Tarreau has been busy, having released 2.4.33.5 (two security patches), 2.4.33.6 (one more), and 2.4.34-rc3 (perhaps the last before the 2.4.34 final release).


Kernel development news

Quote of the week

Whilst Red Hat's medical coverage fully covers "mental health" issues, I'd really rather not proceed down this avenue. We can't support *one* kernel properly. On what planet does it make sense to throw more variants in the mix ?

-- Dave Jones


Why binary-only modules were not banned

For a moment, it seemed like things could happen pretty quickly. Martin Bligh suggested that, rather than trying to nickel-and-dime binary modules to death, it would be more honest to just ban them outright. Andrew Morton spoke out in favor of the idea as long as a one-year warning was provided. Greg Kroah-Hartman hacked up a patch to insert the warning. And Linus, at the outset, restricted himself to commenting on Greg's poetry.

The tide turned just as quickly, however. Linus spoke out against the change, and Greg withdrew it. It would appear that binary-only modules will continue to be loadable into the kernel for the foreseeable future - though other hazards may await those who distribute them.

The loading of proprietary modules was not banned for a few reasons, the first of which is that there is, in fact, nothing wrong with doing so. The GPL is quite clear in its statement that somebody who is in possession of GPL-licensed code can use it in any way they wish. If they want to combine their nice free kernel with a big, proprietary binary blob, they are fully within their rights to do so. So banning proprietary modules in the kernel source attacks the problem in the wrong place and attempts to forbid an activity which is allowed by the license.

Even if the GPL could be interpreted as forbidding the loading of binary-only modules, there is the fair use issue to consider. As a community, we generally favor a broad interpretation of fair-use rights. But fair use cuts both ways. A number of people in the discussion warned against adopting the tactics favored by the entertainment industry and taking an overly broad view of what the law allows copyright owners to do. As Ben Collins put it:

The gradual changes to lock down kernel modules to a particular license(s) tends to mirror the slow lock down of content (music/movies) that people complain about so loudly. It's basically becoming DRM for code.

The fact that some people were willing to discuss making use of the DMCA to make sure that nobody could patch a proprietary module ban out of the code tends to reinforce this view. Alan Cox noted that people tend to become that which they fight. Most people in the community would probably agree that the entertainment industry is not something we wish to become; this realization has, arguably, done a lot to erode support for the idea of banning proprietary modules.

What the GPL does cover is distribution; anybody who distributes something derived from GPL-licensed code must do so under the terms of the GPL. So it is the act of distributing proprietary modules which enters legally questionable territory. But, as Linus points out, the fact that a module can be loaded into the kernel does not imply that the module is necessarily a derived work of the kernel. The determination of derived work status is a complicated business, and can often require a court to provide the definitive word. But banning all proprietary modules on the idea that they are all illegal derived works is a hard action to defend.

The end result is that there will be no technical measures for the blocking of binary modules added to the kernel anytime soon. Unhappiness with these modules remains, however, as can be seen in Greg's message withdrawing the patch:

It's just that I'm so damn tired of this whole thing. I'm tired of people thinking they have a right to violate my copyright all the time. I'm tired of people and companies somehow treating our license in ways that are blatantly wrong and feeling fine about it. Because we are a loose band of a lot of individuals, and not a company or legal entity, it seems to give companies the chutzpah to feel that they can get away with violating our license.

It seems clear that the issue will not go away, even though this particular approach to addressing it has been rejected. The course which appears to be open to disgruntled kernel developers is legal action: if the distribution of a specific binary module can be shown to be a copyright violation, then the copyright owners have the right to go to court to put a stop to it. GPL enforcement efforts have, so far, tended to be successful. So it would not be surprising to see one or more developers decide to bring a suit against a binary module distributor in the next year or so. The discontent which is so visibly out there is unlikely to just fade away.


A gnarly 2.6.19 file corruption bug

When Linus released 2.6.19, he expressed a certain degree of confidence about its quality:

It's one of those rare "perfect" kernels. So if it doesn't happen to compile with your config (or it does compile, but then does unspeakable acts of perversion with your pet dachshund), you can rest easy knowing that it's all your own d*mn fault, and you should just fix your evil ways.

While this kernel may have lived up to expectations in a number of ways, it would appear that somebody's evil ways have messed things up - and dachshunds would be well advised to keep a low profile. It seems that this kernel can corrupt ext3 filesystems - behavior which was not in the original set of design goals.

The good news (for users) is that the bug is hard to trigger, and that most access patterns work just fine. The bulk of the trouble seems to come with a certain BitTorrent client, which has an unusual access pattern at best. On occasion, parts of a page will end up being written as zeroes, through to the end of the page. Please do not expect your editor to explain why this is happening; it seems that nobody really understands that yet. The solution, however, may involve some relatively serious low-level memory management surgery.

The apparent origin of the problem is a change in how dirty pages are tracked in the kernel. Prior to 2.6.19, this information lived in the page tables; the 2.6.19 kernel, however, moves some of this information into the page structure. This change enables better tracking of dirty pages in the system, which is a good thing, but it could also be bringing some old bugs out to play.

Not all of those bugs are necessarily in the kernel; at one point, Linus went off and wrote a demonstration program showing how a buggy program would work with older kernels but get surprising results in 2.6.19. What it comes down to is that if a program maps a file into memory, it cannot put data into that memory beyond the current length of the file and expect that data to make it to disk. It was a nice demonstration, but this behavioral change does not appear to be behind the problem reports.
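The rule Linus was illustrating can be seen with a small user-space sketch. This is not his demonstration program (which is not reproduced here); the helper name and the /tmp path used below are hypothetical, and it assumes a POSIX system whose page size exceeds 250 bytes. It maps a whole page over a 100-byte file, stores bytes past end of file, and then checks that the file did not grow and that reading those offsets returns nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: stores data past EOF through a shared mapping and
   returns the file's size afterwards (or -1 on error).  *tail_read receives
   what pread() returns when asked for the bytes that were "written". */
long write_past_eof(const char *path, ssize_t *tail_read)
{
    char buf[50];
    long size = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The file is 100 bytes long... */
    if (ftruncate(fd, 100) == 0) {
        long pagesz = sysconf(_SC_PAGESIZE);
        char *map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            /* ...but the mapping covers a whole page, so offsets 100 and up
               are addressable (no SIGBUS) even though they lie past EOF. */
            memset(map + 200, 'X', sizeof buf);
            msync(map, (size_t)pagesz, MS_SYNC);
            munmap(map, (size_t)pagesz);

            struct stat st;
            if (fstat(fd, &st) == 0)
                size = (long)st.st_size;
            /* Reading at offset 200 hits end of file: nothing is there. */
            *tail_read = pread(fd, buf, sizeof buf, 200);
        }
    }
    close(fd);
    unlink(path);
    return size;
}
```

A program that wants such data on disk must first extend the file (with ftruncate() or write()) so that the offsets lie within its length.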

Confusion surrounding the propagation and management of the page dirty bits is at the top of the suspect list, as of this writing. Nobody seems to be able to point at anything specific, however, beyond the fact that the code appears to be rather badly messed up. Says Linus:

A lot of this is actually historical cruft. Some of it may even be code that was never supposed to work, but because we maintained _other_ dirty bits in the PTE's, and never touched them before, we never even realized that the code that played with PG_dirty was totally insane.

So the approach being taken by Linus is to rework the dirty page accounting code into something a little more reasonable. To that end, test_clear_page_dirty() is no more, having been pronounced "insane" by Linus. Instead, the new code tries to define more precisely when the dirty bit on a page can be cleared; it comes down to either (1) the page is being written to backing store, or (2) the page is no longer relevant (when a file is truncated, for example). In typical fashion, Linus fixed enough to make his own configuration work, leaving the rest as an exercise for the reader.
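The two-case rule can be expressed as a tiny state model. This is a hypothetical user-space sketch, not kernel code: struct model_page, enum clear_reason, and model_clear_dirty() are invented names, and the real kernel keeps this state in struct page flags such as PG_dirty. The point is simply that a request to clear the dirty bit for any reason other than writeback or truncation is refused, since honoring it could silently lose data.

```c
#include <stdbool.h>

/* Hypothetical, greatly simplified model of a page-cache page. */
struct model_page {
    bool dirty;
    bool under_writeback;
    bool truncated;
};

/* Reasons a caller might give for clearing the dirty bit. */
enum clear_reason {
    CLEAR_FOR_WRITEBACK,  /* (1) data is being written to backing store */
    CLEAR_FOR_TRUNCATE,   /* (2) page no longer relevant, e.g. truncation */
    CLEAR_OTHER           /* anything else: refused in this model */
};

/* Returns true if the page was dirty and the bit has now been cleared. */
bool model_clear_dirty(struct model_page *p, enum clear_reason why)
{
    bool was_dirty = p->dirty;

    switch (why) {
    case CLEAR_FOR_WRITEBACK:
        if (was_dirty) {
            p->dirty = false;
            p->under_writeback = true;  /* the data now lives in the I/O */
        }
        return was_dirty;
    case CLEAR_FOR_TRUNCATE:
        p->dirty = false;               /* contents are being discarded */
        p->truncated = true;
        return was_dirty;
    default:
        return false;                   /* would lose data: leave it dirty */
    }
}
```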

He makes no claims that this rework will have solved the problem, only that it makes the code more sane than it was before. As of this writing, there have been no responses from the people who are able to reproduce this problem. If the problem goes away - and the developers can convince themselves that it has not just been papered over - then some version of this fix will likely need to be prepared for a 2.6.19 update. Then, maybe, the dachshunds can come out of hiding.


Reworking NAPI

NAPI ("new API," though it is not so new anymore) is an interrupt mitigation mechanism used with network devices. When network traffic is heavy, the kernel can safely predict that incoming packets will be available anytime it gets around to looking, so there is no need to have the adapter interrupting it (possibly thousands of times per second) to tell it about those packets. So a NAPI-compliant driver will turn off the packet receive interrupt and provide a poll() method to the kernel. When the kernel is ready to deal with more packets, poll() will be called with a maximum number of packets it is allowed to feed into the kernel; it should process up to that many packets and quit.
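The basic interrupt-then-poll cycle can be modeled in a few lines of user-space C. Everything here is hypothetical (struct model_adapter, model_rx_interrupt(), model_poll()); a packet ring is reduced to a counter. For simplicity the poll function returns the number of packets it processed; the interrupt is masked when traffic arrives and re-enabled only once the ring has been drained.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an adapter's receive side. */
struct model_adapter {
    int pending;     /* packets sitting in the receive ring */
    bool rx_irq_on;  /* is the receive interrupt enabled? */
    int delivered;   /* packets handed to the "network stack" so far */
};

/* Interrupt handler: rather than processing packets, mask the interrupt;
   a real driver would also ask the kernel to poll it. */
void model_rx_interrupt(struct model_adapter *a)
{
    a->rx_irq_on = false;
}

/* poll(): deliver at most 'budget' packets and return how many were
   processed.  Once the ring is empty, fall back to interrupts. */
int model_poll(struct model_adapter *a, int budget)
{
    int done = 0;

    while (done < budget && a->pending > 0) {
        a->pending--;
        a->delivered++;
        done++;
    }
    if (a->pending == 0)
        a->rx_irq_on = true;
    return done;
}
```

Under heavy load the ring never empties, so the interrupt stays off and the kernel keeps polling; that is where the savings come from.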

With NAPI in place, the kernel can process significantly higher packet loads. The reduction in interrupt load helps, but there are a couple of other advantages as well. The way NAPI works makes it less likely that packets will be reordered in the kernel. And if traffic reaches the point where the kernel is forced to drop packets, those packets can be dumped before they are ever fed into the network stack. For more information on NAPI, see this old LWN article or this page at OSDL, which is newer and more complete.

That page may require some updating soon, however, as Stephen Hemminger has proposed a newer NAPI (NNAPI?) which changes the driver API somewhat. In the current mainline, there are two NAPI-related fields in the net_device structure: poll(), being the function called to collect packets from the adapter, and weight, which is essentially the driver writer's best guess as to how important the interface is relative to any others which might be on the system. Stephen's patch moves these parameters into a separate structure (struct napi_struct), aggregating them with a few other NAPI-related fields.

The napi_struct structure is then embedded back into struct net_device, but drivers need not use that instance. The main purpose of this patch would appear to be to separate the NAPI-related information from specific network devices. Some adapters provide multiple ports, all of which share a single receive interrupt. The separated NAPI information allows all of those ports to share a single NAPI state and a single poll() function; this organization better fits the reality of the hardware.
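A sketch of that multi-port arrangement follows. It uses hypothetical user-space stand-ins (struct model_napi for napi_struct, struct model_board for the driver's private adapter structure) and defines its own container_of() so that it compiles outside the kernel. One NAPI context lives in the board structure, and its single poll function services every port round-robin within one budget.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct napi_struct. */
struct model_napi {
    int weight;
    int (*poll)(struct model_napi *napi, int budget);
};

/* Hypothetical multi-port board: one interrupt, one NAPI context. */
#define NPORTS 2
struct model_board {
    int pending[NPORTS];     /* packets waiting on each port */
    int delivered[NPORTS];
    struct model_napi napi;  /* shared by all ports */
};

/* One poll function serves every port, round-robin, within one budget. */
static int board_poll(struct model_napi *napi, int budget)
{
    struct model_board *b = container_of(napi, struct model_board, napi);
    int done = 0;
    int progress = 1;

    while (done < budget && progress) {
        progress = 0;
        for (int i = 0; i < NPORTS && done < budget; i++) {
            if (b->pending[i] > 0) {
                b->pending[i]--;
                b->delivered[i]++;
                done++;
                progress = 1;
            }
        }
    }
    return done;
}

void board_init(struct model_board *b)
{
    for (int i = 0; i < NPORTS; i++)
        b->pending[i] = b->delivered[i] = 0;
    b->napi.weight = 64;  /* hypothetical weight */
    b->napi.poll = board_poll;
}
```

Round-robin service keeps one busy port from starving the others; that is one plausible policy among several a driver might choose.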

This patch won't hit mainline before 2.6.21, so driver authors have some time to react. The changes are relatively simple to make. The first is to find a napi_struct structure for the device; absent a reason to do otherwise, the best solution is to use the new napi field in the net_device structure. So, if the current code initializes itself with something like:

    dev->weight = MY_WEIGHT;
    dev->poll = my_poll;

The new version would look like this:

    dev->napi.weight = MY_WEIGHT;
    dev->napi.poll = my_poll;

The prototype of the poll() function has changed a bit, however; it now looks like:

    int (*poll)(struct napi_struct *napi, int budget);

The pointer to the net_device structure has been replaced with a pointer to the napi_struct structure. In most cases, the net_device pointer can be had with a call like:

    struct net_device *dev = container_of(napi, struct net_device, napi);

The meaning of the budget parameter has changed slightly as well; it is now the only indicator of how many packets the poll() function may feed into the kernel. There is no longer any need to check the quota field separately. Finally, the return value should be the number of packets which were actually processed.
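Putting those rules together, a converted driver's poll() might look roughly like the following. To keep it compilable in user space, struct napi_struct and struct net_device below are minimal hypothetical stand-ins, not the kernel's definitions, and model_napi_complete() stands in for napi_complete(); the rx_pending, rx_delivered, and rx_irq_enabled fields are likewise invented. The shape follows the text: recover the device with container_of(), treat budget as the only limit, stop polling once the work runs out, and return the number of packets processed.

```c
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins; NOT the real kernel definitions. */
struct napi_struct {
    int weight;
    int (*poll)(struct napi_struct *napi, int budget);
    bool scheduled;
};

struct net_device {
    int rx_pending;           /* hypothetical: packets waiting in hardware */
    int rx_delivered;         /* hypothetical: packets passed up the stack */
    bool rx_irq_enabled;      /* hypothetical */
    struct napi_struct napi;  /* the embedded napi field */
};

/* Stand-in for napi_complete(): take this context off the poll list. */
static void model_napi_complete(struct napi_struct *napi)
{
    napi->scheduled = false;
}

/* New-style poll(): budget is the only limit; return packets processed. */
static int my_poll(struct napi_struct *napi, int budget)
{
    struct net_device *dev = container_of(napi, struct net_device, napi);
    int work_done = 0;

    while (work_done < budget && dev->rx_pending > 0) {
        dev->rx_pending--;    /* "receive" one packet... */
        dev->rx_delivered++;  /* ...and hand it to the stack */
        work_done++;
    }

    /* Ran out of packets before the budget: stop polling, and go back
       to interrupt-driven reception. */
    if (work_done < budget) {
        model_napi_complete(napi);
        dev->rx_irq_enabled = true;
    }
    return work_done;
}

#define MY_WEIGHT 16  /* hypothetical value */

void my_setup(struct net_device *dev)
{
    dev->napi.weight = MY_WEIGHT;
    dev->napi.poll = my_poll;
}
```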

The other NAPI-related functions in the network system have been modified in fairly predictable ways. NAPI polling is started with either of:

    void napi_schedule(struct napi_struct *napi);
    /* or */
    int napi_schedule_prep(struct napi_struct *napi);
    void __napi_schedule(struct napi_struct *napi);

Polling is turned off with:

    void napi_complete(struct napi_struct *napi);
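The split between napi_schedule_prep() and __napi_schedule() is easiest to see in a model. The sketch below is hypothetical user-space C using C11 atomics, and the model_* names are invented. The assumption it illustrates is the usual one for this style of interface: prep atomically claims a "scheduled" flag, and only the caller that wins may queue the context (typically after masking the device's interrupt), so a context is never queued twice; completion clears the flag so the next interrupt can schedule it again.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a NAPI context's scheduling state. */
struct model_napi_ctx {
    atomic_bool sched;  /* set while the context awaits or is polling */
    int times_queued;   /* how often it was actually queued */
};

/* Like napi_schedule_prep(): true only for the caller that flips
   sched from false to true. */
bool model_schedule_prep(struct model_napi_ctx *n)
{
    return !atomic_exchange(&n->sched, true);
}

/* Like __napi_schedule(): queue for polling; caller must have won prep. */
void model_do_schedule(struct model_napi_ctx *n)
{
    n->times_queued++;
}

/* Like napi_schedule(): the two steps combined. */
void model_schedule(struct model_napi_ctx *n)
{
    if (model_schedule_prep(n))
        model_do_schedule(n);
}

/* Like napi_complete(): polling finished; allow rescheduling. */
void model_complete(struct model_napi_ctx *n)
{
    atomic_store(&n->sched, false);
}
```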

The current patch is in an early state, so the interfaces could change over the next few months. Nobody has spoken out against it, though, so chances are good that it will be merged in some form.


Patches and updates

Kernel trees

Linus Torvalds Linux 2.6.20-rc1
Andrew Morton 2.6.20-rc1-mm1
Chris Wright Linux 2.6.18.6
Adrian Bunk Linux 2.6.16.37-rc1
Adrian Bunk Linux 2.6.16.36
Willy Tarreau Linux 2.4.34-rc2
Willy Tarreau Linux 2.4.34-rc3
Willy Tarreau Linux 2.4.33.6
Willy Tarreau Linux 2.4.33.5

Architecture-specific

Core kernel code

Development tools

Device drivers

Jaroslav Kysela alsa-git merge request
Jeff Garzik libata updates
Kristian Høgsberg New firewire stack - updated patches
David Brownell arch-neutral GPIO calls

Filesystems and block I/O

Networking

Security-related

Virtualization and containers

Serge E. Hallyn user namespace: Introduction

Miscellaneous

Page editor: Jonathan Corbet


Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds