Kernel development [LWN.net]

Kernel release status

The current 2.6 development kernel is 2.6.27-rc6, released on September 9. "Same old deal - except it's been almost two weeks since -rc5. That said, the diff is actually about the same size, so I guess that means things are calming down." Full details may be found in the long-format changelog.

As of this writing, no patches have been merged into the mainline repository since the 2.6.27-rc6 release.

The current stable 2.6 kernel is 2.6.26.5, released on September 7. It contains a single fix for a build error introduced by 2.6.26.4, released earlier the same day. 2.6.26.4 contains a fairly long list of bug fixes.

Also released on the 7th was 2.6.25.17, also containing a fair number of fixes.

For older kernels: the 2.4 process has restarted with the release of 2.4.36.7, fixing "several minor security issues" and a few other problems. 2.4.37-rc1 is also out; this one includes a number of enhancements; see the announcement for details.

Quotes of the week

There's patronage. This is where the Crown Prince of Bavaria, say, gives Linus Torvalds a castle and a moat, and bids him to write code for the pleasure of the court, or else be thrown in the dungeon with those BSD mongrels. Linus goes on to create great works, often prefaced with a large set of logon messages in praise of his honoured patron, only to die later in poverty following some dismissive comments he includes in a kernel driver about the CEO of OSDN's mistresses' pet lioness.

Critics of patronage point out to live on the whims of a distant, self-involved elite is a demeaning life for Linux programmers, reminiscent as it is of both medieval surfdom and being a mere Linux user, both of which being horrid epochs that as a civilisation we imagine we have transcended.

-- Danny O'Brien (a recycled column but still fun).

In Ubuntu we have in general considered upstream to be "our ROCK", by which we mean that we want upstream to be happy with the way we express their ideas and their work. More than happy - we want upstream to be delighted! We focus most of our effort on integration. Our competitors turn that into "Canonical doesn't contribute" but it's more accurate to say we measure our contribution in the effectiveness with which we get the latest stable work of upstream, with security maintenance, to the widest possible audience for testing and love. To my mind, that's a huge contribution.

-- Mark Shuttleworth

Grr. I'd love to say "I told you so", and write another rant about -rc series patches. But I'm too lazy, so people - please mentally insert my standard rant here.

-- Linus Torvalds

I didn't know that sending a test patch which is admittedly not pretty is a capital crime nowadays.

In future I'll restrict myself to look at such stuff only on Monday to Friday between 9AM and 5PM and send test/RFC patches only when they got approved by the nonshitapproval committee, which holds a meeting once a month.

-- Thomas Gleixner

Tightening the merge window rules

By Jonathan Corbet
September 9, 2008

The 2005 kernel summit included a discussion on a recurring topic: how can the community produce kernels with fewer bugs? One of the problems which was identified in that session was that significant changes were often being merged late in the development cycle with the result that there was not enough time for testing and bug fixing. In response, the summit attendees proposed the concept of the "merge window," a two-week period in which all major changes for a given development cycle would be merged into the mainline. Once the merge window closed, only fixes would be welcome.

Three years later, the merge window is a well established mechanism. Over that time, the discipline associated with the merge window has gotten stronger; it is now quite rare that significant changes go into the mainline outside of the merge window. The one notable exception is that new drivers can be accepted later in the cycle, based on the reasoning that a driver, being completely new and self-contained functionality, cannot cause regressions. Even then, there are hazards: the UVC webcam driver, merged quite late in the 2.6.26 cycle (in 2.6.26-rc9), brought a security hole with it.

The merge window rule is often expressed as "only fixes can go in after the -rc1 release." Recent discussions have made it clear, though, that Linus is starting to develop a rather more restrictive view of how development should go outside of the merge window. The imminent 2008 kernel summit may well find itself taking on this topic and making some changes to the rules.

In short, Linus has concluded that "fixes only" is not disciplined enough; a lot of work characterized as a "fix" can, itself, be a source of new regressions. So here's how Linus would like developers to operate now:

Here's a simple rule of thumb:

if it's not on the regression list
if it's not a reported security hole
if it's not on the reported oopses list

then why are people sending it to me?

There can be no doubt that the tighter rules have come as a surprise to a number of developers - if nothing else, the frequency with which Linus has found himself getting grumpy with patch submitters makes that clear.

And, the truth of the matter is that Linus has not enforced anything like the above rule in the past. Beyond new drivers, post-merge-window changes have typically included things like coding style and white space fixups, minor feature enhancements, defconfig updates, documentation updates, annotations for the sparse tool, and so on. Relatively few of these changes come equipped with an entry on the regression list.

To look at this another way, here's a table which appeared in the 2.6.26 development statistics article, updated with 2.6.27 (to date) information:

Release    Changesets merged
           For -rc1    After -rc1
2.6.23     4505        2570
2.6.24     7132        3221
2.6.25     9629        3078
2.6.26     7555        2577
2.6.27*    7733        2451

* (Through September 9).

2.6.27 appears to be following the trend set by previous kernels: on the order of 25% of the total changesets will be merged outside of the nominal merge window. The most recent 2.6.27 regression summary shows a total of 150 regressions during this development cycle, of which 33 were unresolved. That suggests that at least 2300 patches merged since 2.6.27-rc1 were not fixes for listed regressions.
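The arithmetic behind these figures can be reproduced with a short script. What follows is a minimal sketch that uses only the numbers from the table and the regression summary quoted above; the function and variable names are illustrative, and the "lower bound" rests on the assumption, stated in the code, that each listed regression accounts for one post-rc1 changeset.

```python
# Changeset counts from the table above: (merged for -rc1, merged after -rc1).
CHANGESETS = {
    "2.6.23": (4505, 2570),
    "2.6.24": (7132, 3221),
    "2.6.25": (9629, 3078),
    "2.6.26": (7555, 2577),
    "2.6.27": (7733, 2451),  # through September 9
}

# Regressions listed for the 2.6.27 cycle in the most recent summary.
LISTED_REGRESSIONS_2_6_27 = 150


def post_rc1_share(release):
    """Fraction of a release's changesets that were merged after -rc1."""
    for_rc1, after_rc1 = CHANGESETS[release]
    return after_rc1 / (for_rc1 + after_rc1)


def non_regression_estimate(release, listed_regressions):
    """Post-rc1 changesets left over once listed regressions are subtracted.

    Assumption (not from the article): every listed regression, resolved or
    not, is credited with exactly one post-rc1 changeset.
    """
    after_rc1 = CHANGESETS[release][1]
    return after_rc1 - listed_regressions


if __name__ == "__main__":
    for release in CHANGESETS:
        print(f"{release}: {post_rc1_share(release):.1%} merged after -rc1")
    print(non_regression_estimate("2.6.27", LISTED_REGRESSIONS_2_6_27))
```

Crediting all 150 listed regressions with a fix is generous, since 33 of them were still unresolved; the number of non-regression patches can only be higher than this estimate under that one-changeset-per-regression assumption.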

So the "regression fixes only" policy is truly new - and not really effective yet. Should this policy hold, it could have a number of interesting implications including, perhaps, an increase in the number of non-regression fixes shipped in distributor kernels. It might make developers become more diligent about reporting regressions so that the associated fix can be merged. With fewer changes going in later in the cycle, development cycles might just get a little shorter, perhaps even to the eight weeks that was, once, the nominal target. And, of course, we might just get kernel releases with fewer bugs, which would be a hard thing to complain about. In the short term, though, expect more grumpy emails to developers who are still trying to work by the older rules.

LIRC delurks

By Jonathan Corbet
September 10, 2008

The Linux Infrared Remote Control project (LIRC) provides drivers for a number of infrared receivers and transmitters. It is, perhaps, most heavily used by people running MythTV and similar packages; it would, after all, completely ruin the experience to have to get up from the couch to change channels. Despite their established user base, and despite the fact that a number of distributors ship the code, the LIRC drivers have never found their way into the mainline kernel. In more recent times, little effort has gone into their development and maintenance; the link to "Caldera OpenLinux" on the project's web site would seem to make that clear.

But LIRC is useful code, and, as is the case with most out-of-tree drivers, most people would really rather see LIRC in the mainline kernel. Merging into the mainline got a step closer on September 9, when Jarod Wilson posted a version of the LIRC drivers for consideration. Jarod, it seems, has been working (with Janne Grunau) on these drivers for some months; in the process, they have eliminated "tens of thousands" of complaints from the checkpatch.pl script and cleaned up a number of things.

Even after that work, though, the LIRC drivers are clearly not yet up to normal kernel standards. Some very strange coding conventions are used in places. Many of the drivers have broken (or completely absent) locking. Duplicated code abounds. One driver has implemented a command parser in its write() function. Another driver is for hardware which already has a different driver in the mainline. And, importantly, these drivers do not work with the input subsystem.

In the past, Linus Torvalds (and others) have argued for merging drivers as soon as possible. If the code is poor, its chances of being improved get much higher once it's in the mainline and others can fix it. The LIRC drivers would appear to strongly support the notion that out-of-tree code is, almost by necessity, worse code. These drivers have been around for almost a decade, have been packaged by distributors, and have been used by large numbers of people. Despite all of that, they contain a large number of serious problems which have never been addressed.

Now that the drivers have been posted to the linux-kernel list, quite a few of these problems are being pointed out; Jarod and Janne have been responding to reviews and fixing the issues. The "merge drivers early" philosophy would argue for pushing LIRC into 2.6.28, even if serious problems remain. Presence in the mainline will raise the visibility of the code, inspiring (one hopes) more developers to work on fixing it up. Merging LIRC will also free distributors from the need to create separate packages for those drivers.

One important question will have to be addressed before merging LIRC can be seriously considered, though: its user-space API. Once LIRC is merged, its user-space API will be set in stone, so any problems with that API need to be resolved first. LIRC, being out of the mainline, did not follow the development of the input subsystem, so it does not behave like other input drivers - even in-tree drivers for infrared remotes. The use of an in-kernel command-line parser in at least one driver is sure to raise eyebrows; that sort of interaction should really be handled via ioctl() or sysfs. All told, it is hard to imagine this code being merged until the API problems have been resolved.

Changing the LIRC API will, of course, lead to problems of its own. There is user-space code which depends on the current API; any changes will break that code. The kernel community will certainly understand this problem, but is unlikely to be swayed by it. There are a number of risks associated with maintaining production kernel code out of the mainline tree; one of those risks is that your established APIs will not be accepted by the kernel development community. So an API change may simply be part of the cost of getting LIRC into the mainline at this late date.

It should be a cost worth paying. Once LIRC is in the mainline, interested developers will work to continue to bring the code up to kernel standards. The community will maintain it going forward. All Linux users will get the LIRC drivers with their kernel, with no need to deal with external packages. Getting there may be a bit frustrating for users of remotes and (especially) for the developers who have taken on the task of getting this code into the mainline. But, once it's done, remotes will just be more normal hardware, supported by the kernel like everything else.

System calls and rootkits

By Jake Edge
September 10, 2008

A patch to add some security checks before making system calls would seem like a reasonable addition to the kernel, but because it is, at best, a half-measure, it received a less than enthusiastic response. Preventing rootkits—malware that alters the kernel to hide its presence and function—from altering the system call table was the rationale behind the patch, but it would only work for the current crop of rootkits. Once that change was made, rootkit authors would just change their modus operandi in response.

There are many possible ways that a root user—or malware running as root—can modify a Linux system to run rootkit code. Some currently "popular" rootkits modify the system call table, though it is ostensibly read-only. Some commercial malware scanners that run on Linux have also been known to use this technique. In both cases, certain system calls are re-routed from the standard kernel code to code that lives elsewhere. That code, running in kernel mode, can then do just about anything it wants with the system.

Arjan van de Ven proposed a patch that hooked into the system call entry code to check the address of the call to ensure that it was within the addresses occupied by kernel code. He describes the change and its impact this way:

The patch below, while obviously not perfect protection against malware, adds some cheap sanity checks to the syscall path to verify the system call is actually still in the kernel code region and not some external-to-this region such as a rootkit.

The overhead is very minimal; measured at 2 cycles or less. (this is because the branches get predicted right and the rest of the code is almost perfectly parallelizable... and an indirect function call is a branch issue anyway)

Various kernel hackers pointed out the flaws inherent in that scheme. As Andi Kleen succinctly puts it:

This just means that the root kits will switch to patch the first instruction of the entry points instead. [...] So the protection will be zero to minimal, but the overhead will be there forever.
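The check and the objection to it can both be illustrated with a toy model. The sketch below is not the patch itself and uses no kernel interfaces: addresses are plain integers, "code" is a dictionary mapping an address to a Python callable, and every name in it is hypothetical. It shows a range check on the system call table catching a redirected entry, and the same check passing when the entry point itself is overwritten.

```python
# Toy model only: none of these names or addresses exist in the kernel.
KERNEL_TEXT_START = 0x1000
KERNEL_TEXT_END = 0x2000    # exclusive upper bound of the kernel code region
MODULE_AREA = 0x9000        # an address outside the kernel code region


class SyscallCheckFailed(Exception):
    """Raised when a system call target lies outside the kernel code region."""


def make_system():
    """Return (code, table) for a pristine system with one system call."""
    entry = KERNEL_TEXT_START + 0x10
    code = {entry: lambda: "original handler"}
    table = [entry]          # system call number 0
    return code, table


def dispatch(code, table, nr):
    """Run system call nr after checking that its target is kernel code."""
    target = table[nr]
    if not (KERNEL_TEXT_START <= target < KERNEL_TEXT_END):
        raise SyscallCheckFailed(hex(target))
    return code[target]()


def rootkit_redirect_table(code, table, nr):
    """Point the table entry at code living outside the kernel region."""
    code[MODULE_AREA] = lambda: "rootkit handler"
    table[nr] = MODULE_AREA


def rootkit_patch_entry(code, table, nr):
    """Leave the table alone; overwrite the start of the real entry point."""
    code[MODULE_AREA] = lambda: "rootkit handler"
    hook = code[MODULE_AREA]
    code[table[nr]] = lambda: hook()   # stands in for a patched-in jump


if __name__ == "__main__":
    code, table = make_system()
    rootkit_redirect_table(code, table, 0)
    try:
        dispatch(code, table, 0)
    except SyscallCheckFailed as exc:
        print("blocked redirect to", exc)

    code, table = make_system()
    rootkit_patch_entry(code, table, 0)
    print(dispatch(code, table, 0))
```

In the second case the table still points into the kernel code region, so the check has nothing to object to; the rootkit code runs anyway.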

One of the more interesting ideas to come out of the discussion was Alan Cox's thoughts on using a hypervisor to enforce protections:

The only place you can expect to make a difference here is in virtualised environments by teaching KVM how to provide 'irrevocably read only' pages to guests where the guest OS isn't permitted to change the rights back or the virtual mapping of that page.

Ingo Molnar described a rather complicated scheme that might increase the likelihood of a rootkit being detected, but with a fairly high cost—in build complexity as well as the ability to debug the resulting kernel. The compiler would be changed to insert calls to rootkit checks randomly throughout the kernel binary in ways that would be difficult or impossible for a rootkit to detect and evade. In the end, though, a rootkit could simply install a new kernel that does exactly what it wants, then cause, or wait for, a reboot.

Without some kind of hardware enforcement (e.g. Trusted Platform Module) or locked-down virtualization, Linux is defenseless against attacks that run as root. The kernel could change to thwart a particular kind of attack, as van de Ven's patch does, but other kinds of attacks will still succeed. It is clearly a situation where "the only way to win is not to play this game", as Pavel Machek, amongst others, noted in the thread.

In the end, van de Ven wrote off the patch as an exercise in measuring the cost of this kind of runtime checking. It was a fairly low-cost solution, but without any major upside. The real upside was getting kernel hackers thinking about the problem, which could lead to some better solutions down the road.

Patches and updates

Linus Torvalds Linux 2.6.27-rc6

Andrew Morton 2.6.27-rc5-mm1

Greg KH Linux 2.6.26.4

Greg KH Linux 2.6.26.5

Steven Rostedt 2.6.26.5-rt8

Steven Rostedt 2.6.26.3-rt5

Steven Rostedt 2.6.26.3-rt6

Steven Rostedt 2.6.26.3-rt7

Greg KH Linux 2.6.25.17

Steven Rostedt 2.6.24.7-rt18

Willy Tarreau Linux 2.4.37-rc1

Willy Tarreau Linux 2.4.36.7

Thomas Gleixner TSC calibration improvements

Nir Tzachar ncurses based config

Mike Travis smp: reduce stack requirements for genapic send_IPI_mask functions

Vaidyanathan Srinivasan Tunable sched_mc_power_savings=n

Mathieu Desnoyers Priority Sifting Reader-Writer Lock v13

Oren Laadan Kernel based checkpoint/restart

jmerkey@wolfmountaingroup.com mdb: Merkey's Linux Kernel Debugger Release 2.6.27-rc6

Dmitry Baryshkov platform: add new device registration helper

Thomas Hellstrom TTM user interface.

Manu Abraham DVB Update [PATCH 0/31] multiproto tree

Bartlomiej Zolnierkiewicz ide: add generic ATA/ATAPI disk driver

Steve Glendinning SMSC LAN9500 USB2.0 10/100 ethernet adapter driver

Frank Zago V4L2 driver for some Fujifilm cameras

Jarod Wilson linux infrared remote control drivers

Alex Chiang PCI: let the core manage slot names

Mark Brown mfd: Core support for the WM8400 AudioPlus HiFi CODEC and PMU

Michael Kerrisk man-pages-3.09 is released

Takashi Sato freeze feature ver 1.11

Evgeniy Polyakov New distributed storage release.

Conrad Meyer FUSE-Tux3 (At your own risk)

Hugh Dickins discarding swap

Mark Fasheh Fiemap, an extent mapping ioctl

Parag Warudkar x86: sysfs - kill owner field from attribute

Andy Whitcroft Reclaim page capture v2

Hamid R. Jahanjou VM: Implements the swap-out page-clustering technique

Evgeniy Polyakov Network channels.

Luis R. Rodriguez New regulatory infrastructure for cfg80211 and drivers

Arjan van de Ven Add basic sanity checks to the syscall execution patch

Paul Moore Labeled networking patches for 2.6.28

Jeremy Fitzhardinge x86: lay groundwork for Xen domain 0 support

sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org : Enable multiple mounts of devpts

Rafael J. Wysocki 2.6.27-rc5-git8: Reported regressions from 2.6.26

Karel Zak util-linux-ng v2.14.1 (stable)
