
Kernel development

Brief items

Kernel release status

The current 2.6 development kernel is 2.6.25-rc9, released on April 11. The stable 2.6.25 release is imminent, and will likely be out by the time you read this; your editor suspects that Linus is just waiting for LWN to be published before shoving the release out the door.

The current -mm tree is 2.6.25-rc8-mm2. Recent changes to -mm include the new suspend and hibernation infrastructure, another long series of IDE patches, some wireless USB work, and kernel marker support for proprietary modules.


Kernel development news

Quotes of the week (review)

We need higher S/N on l-k. We need people looking into the subsystem trees as those grow and causing a stench when bad things are found, with design issues getting brought to l-k if nothing else helps. We need tree maintainers understanding that review, including out-of-community one, is needed (the need of testing is generally better understood - I _hope_).
-- Al Viro (read the whole thing)

That all sounds good and I expect few would disagree. But if it is to happen, it clearly won't happen by itself, automatically. We will need to force it upon ourselves and the means by which we will do that is process changes. The thing which is being disparaged as "bureaucracy".

The steps to be taken are:

a) agree that we have a problem

b) agree that we need to address it

c) identify the day-to-day work practices which will help address it (as you have done)

d) identify the process changes which will force us to adopt those practices

e) implement those process changes.

I have thus far failed to get us past step a).

-- Andrew Morton

I for one do not agree that we have a problem.
-- Arjan van de Ven


Atheros hires ath5k developer

When kernel developers talk about problematic hardware vendors, Atheros often appears near the top of their lists. So this announcement from Luis Rodriguez, a developer of the reverse-engineered ath5k driver, is intriguing: "I write to you to inform you that I have decided to join Atheros as a full time employee, as a Software Engineer, to help them with their goals and mission to get every device of Atheros supported upstream in the Linux kernel." What will come of this remains to be seen, but if it truly signals a change of heart at Atheros, it is a most welcome development.


TOMOYO Linux and pathname-based security

By Jonathan Corbet
April 14, 2008
It takes a certain kind of courage to head down a road when one can plainly see the unpleasant fate which befell those who went before. So one might think that the fate of AppArmor would deter others from following a similar path. The developers of TOMOYO Linux are not easily put off, though. Despite having a security subsystem which shares a number of features with AppArmor, these developers are pushing forward in an attempt to get their code into the mainline.

AppArmor, remember, is a Linux security module which uses pathnames to make security decisions. So it is entirely conceivable that two different security policies could apply to the same file if that file is accessed by way of two different names. This approach helps make AppArmor easier to administer than SELinux, but it has given AppArmor major problems in the review process for a few reasons:

  • There has been strong resistance to the addition of any new security modules at all, to the point that proposals to remove the LSM framework altogether have been floated.

  • Some security developers see a pathname-based mechanism as being fundamentally insecure. SELinux developers, in particular, have been very strongly against pathname-based security. To these developers, security policies should apply directly to objects (or to labels attached directly to objects) rather than to names given to objects.

  • The current Linux security module hooks, not being developed with pathname-based security in mind, do not provide sufficient information to the low-level file operation hooks. So AppArmor had to reconstruct pathnames within its security hooks. The method chosen for this reconstruction was, one might say, not universally admired.

If the TOMOYO Linux developers are serious about getting their code into the mainline, they will need to have answers to these objections.

As it happens, the first two obstructions have mostly gone away. Casey Schaufler's persistence finally resulted in the merging of the SMACK security module for 2.6.25; it is the only such module, other than SELinux, ever to get into the mainline. Now that SMACK has paved the way, talk of removing the LSM framework (which had been strongly vetoed by Linus in any case) has ended and the next security module should have an easier time of it.

Linus has also decreed that pathname-based security modules are entirely acceptable for inclusion into the kernel. So, while some developers remain highly skeptical of this approach, their skepticism cannot, on its own, be used as a reason to keep a pathname-based security module out. Pathname-based approaches appear to be "secure enough" for a number of applications, and there are some advantages to using that approach.

All of the above is moot, though, if the TOMOYO Linux developers are unable to implement pathname-based access control in a way which passes muster. The recent TOMOYO Linux patch took a different approach to this problem: since the LSM hooks do not provide the needed information, the developers just added a new set of hooks, outside of LSM, for use by TOMOYO Linux. And, while they were at it, they added new hooks at all enforcement points. This was not a popular decision, to say the least. The whole idea behind LSM was to have a single set of hooks for all security modules; if every module now adds its own set of hooks, that purpose will have been defeated and the kernel will turn into a big mess of security hooks. Duplicating the LSM framework is not the way to get a security module into the mainline.

So, somehow, the TOMOYO Linux developers will need to implement pathname-based security in a different way. The most obvious approach would be to modify the existing hooks to supply the requisite information - namely, a pointer to the vfsmount structure. The problem is that, at the point where the LSM hooks are called, that structure is not available; it is only used at the higher levels of the virtual filesystem code. So either some core VFS functions would have to be changed (so that the vfsmount pointer could be passed into them), or a new set of hooks would need to be placed at a level where that pointer is available. It appears that the second approach - adding new hooks in the namespace code - will be taken for the next version of the patch.

As the TOMOYO Linux developers work through this problem, they are likely to be closely watched by the (somewhat reduced in number) AppArmor group. There appears to be a resurgence of interest in getting AppArmor merged, so we will probably see AppArmor put forward again in the near future. That will be even more likely if TOMOYO Linux is able to solve the pathname problem in a way which survives review and gets into the kernel.


e1000 v. e1000e

By Jonathan Corbet
April 15, 2008
Ingo Molnar was recently bitten by a problem which, in one form or another, may affect a wider range of Linux users after 2.6.26. Linux currently has two drivers for Intel's e1000 network adapters, called "e1000" and "e1000e". The former driver, being the older of the two, supports all older, PCI-based e1000 adapters. There is, shall we say, a relative shortage of developers who are willing to stand up for the quality of the code in this driver, but it works and has a lot of users.

The e1000e driver, instead, supports PCI-Express adapters. It is a newer driver which is seen as being better written and easier to maintain. It is intended that all new hardware will be supported by this driver, and that, in particular, all PCI-Express hardware will use it. The only problem is that a few PCI-Express chipsets were added to the older e1000 driver before this policy was adopted. Since the newer driver also supports those chipsets, there are two drivers (with two completely different bodies of code) supporting the same hardware. The e1000 maintainers would like to end this duplication and put the e1000 driver into a stable maintenance mode.

To that end, it was announced earlier this month that, as of 2.6.26, the PCI IDs corresponding to PCI-Express devices would be removed from the e1000 driver, and that all users of the affected hardware will need to move over to e1000e. The e1000 developers had originally tried to make this move for 2.6.25, but they committed a fundamental faux pas in the process: they broke Linus's machine. So that change was reverted before 2.6.25-rc1 came out. Now the announcement is that the change is coming in the next development cycle (by which time, presumably, the e1000e problems will be fixed), and a bit of configuration trickery has been added: it causes the e1000 driver not to claim PCI-Express devices if the e1000e driver has been built into the kernel.
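That trickery works roughly along the following lines. This is a sketch only, with the Kconfig symbol name assumed rather than taken from the actual patch: a helper symbol records whether e1000e is enabled in any form, and the e1000 driver can then compile out its PCI-Express device IDs whenever that symbol is set.

```
# Hypothetical Kconfig fragment (symbol name assumed):
# E1000E_ENABLED is true whenever e1000e is built, whether
# into the kernel or as a module.
config E1000E_ENABLED
	def_bool E1000E != n
```

The e1000 source would then wrap the PCI-Express entries of its device ID table in a preprocessor test on that symbol, so those devices are claimed by whichever driver was actually built.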

Ingo's problem is that he built the e1000 driver into his kernel, but ended up with e1000e configured as a module which was never loaded. That combination leads to a network adapter which does not work at all, since the built-in driver no longer claims it. Ingo, a bit disgruntled at having to spend an hour tracking down the problem, has suggested that it is a regression which must be fixed. The e1000 maintainers resisted that conclusion, but Linus, having also been burned, agrees. So, while this transition is likely to go ahead as scheduled, 2.6.25 will probably have a configuration change designed to keep others from falling into a similar trap.


OMFS and the value of obscure filesystems

By Jonathan Corbet
April 15, 2008
Your editor has never dabbled in filesystems development. He has a suspicion, however, that there is a tense moment in every new filesystem developer's life: when Christoph Hellwig's review shows up in the mailbox. Christoph's reviews, while not always pleasant reading, tend to be right on the money with regard to problems in filesystem implementations - and problems in new filesystems are common. Christoph's stamp of approval is almost required for the merging of a filesystem, so, when the initial posting of a filesystem is greeted with reviews that read, nearly in their entirety, "looks good," one would assume that the path into the mainline would be straightforward.

The story of OMFS, though, shows that this assumption does not always hold. Reviewers have only been able to find the smallest of details to fix, but there is opposition to its merging, especially from Andrew Morton. The objection is that this filesystem - found on devices like the Rio Karma music player and ReplayTV boxes - has a very small user base. OMFS developer Bob Copeland, in his initial posting, suggested that fewer than twenty people might be using it at this time. New devices with this filesystem are no longer being made, so the chances of the user base growing significantly are small.

Andrew's objection is that the addition of any new code creates a new maintenance burden for kernel developers. Whenever a VFS interface is changed, all filesystems must be fixed to work with the new API. So the addition of a filesystem imposes costs which, he says, should be outweighed by the benefits that new filesystem brings. In the case of an obscure filesystem with a small and (presumably) decreasing user base, says Andrew, it is not clear that the benefits are sufficient. He asks:

Just as a thought exercise: should we merge a small and well-written driver which has zero users?

Andrew would rather see OMFS turned into a user-space filesystem using FUSE. Chris Mason is also concerned:

Even though OMFS seems to be using the generic interfaces well, there is still a testing burden for every change. Someone needs to try it, report any problems and get them fixed. Since none of the people making the changes is likely to have an OMFS test bed, all of that burden will fall on Bob, his users, and anyone who tries to compile the module (Andrew).

OMFS supporters note that the code is well written and can serve as an example for other filesystem authors. They also point out that code with small user bases is often merged - that, in fact, in some areas, developers have said they want all code, regardless of how few people are using it. Running OMFS through FUSE, they say, would be harder for users to set up and less efficient in operation. Says Christoph:

Moving a simple block based filesystem means it's more complicated, less efficient because of the additional context switches and harder to use because you need additional userspace packages and need to setup fuse.

We made writing block based filesystems trivial in the kernel to grow more support for filesystems like this one.

It looks like Andrew will back down on this one and let the next version of the OMFS patches into -mm. From there, if all goes well, it could make the jump into the mainline, possibly as early as 2.6.27. But Andrew is clearly unhappy about that outcome, and may well raise the question again in the future: is "well written" really sufficient to justify merging new filesystems into the kernel?


Bisection divides users and developers

By Jonathan Corbet
April 15, 2008
The last couple of years have seen a renewed push within the kernel community to avoid regressions. When a patch is found to have broken something that used to work, a fix must be merged or the offending patch will be removed from the kernel. It's a straightforward and logical idea, but there's one little problem: when a kernel series includes over 12,000 changesets (as 2.6.25 does), how does one find the patch which caused the problem? Sometimes it will be obvious, but, for other problems, there are literally thousands of patches which could be the source of the regression. Digging through all of those patches in search of a bug can be a needle-in-the-haystack sort of proposition.

One of the many nice tools offered by the git source code management system is called "bisect." The bisect feature helps the user perform a binary search through a range of patches until the one containing the bug is found. All that is needed is to specify the most recent kernel which is known to work (2.6.24, say), and the oldest kernel which is broken (2.6.25-rc9, perhaps), and the bisect feature will check out a version of the kernel at the midpoint between those two. Finding that midpoint is non-trivial, since, in git, the stream of patches is not a simple line. But that's the sort of task we keep computers around for. Once the midpoint kernel has been generated, the person chasing the bug can build and test it, then tell git whether it exhibits the bug or not. A kernel at the new midpoint will be produced, and the process continues. With bisect, the problematic patch can be found in a maximum of a dozen or so compile-boot-test cycles.
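The mechanics are simple enough to demonstrate outside of a kernel tree. The toy sketch below (the repository contents and the grep-based "test" are invented for illustration) builds a small throwaway repository in which one commit introduces a bug, then uses "git bisect run" to find that commit automatically:

```shell
# Build a disposable repository in which commit 11 of 16 introduces a
# "bug", then let git bisect find that commit on its own.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

for i in $(seq 1 16); do
    echo "change $i" >> file.txt
    if [ "$i" -eq 11 ]; then
        echo "BUG" >> file.txt       # the regression
    fi
    git add file.txt
    git commit -qm "change $i"
done

git bisect start HEAD HEAD~15        # bad revision first, then good
# The script's exit status marks each midpoint: 0 = good, 1 = bad.
# Bisect halves the candidate range on each iteration.
git bisect run sh -c '! grep -q BUG file.txt'
git show -s --format=%s refs/bisect/bad   # names the first bad commit
```

In a kernel bisection the endpoints would be tags like v2.6.24 and v2.6.25-rc9, and each step would be a compile-boot-test cycle performed by a human rather than a one-line grep.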

Bisect is not a perfect tool. If patch submitters are not careful, bisect can create a broken kernel when it splits a patch series. The patch which causes a bug to manifest itself may not be the one which introduced the bug. In the worst case, a developer may merge a long series of patches, finishing with one brief change which enables all the code added previously; in this case, bisect will find the final patch, which will only be marginally useful. If the person reporting the bug is running a distributor's kernel, it may be hard to get that kernel in a form which is amenable to the bisection process. Bisection might require unacceptable downtime on the only (production) system which is affected by the bug. And, of course, the process of checking out, building, booting, and testing a dozen kernels is not something which one fits into a coffee break. It requires a certain determination on the part of the tester and quite a bit of time.

All of the points above would suggest that requesting a bisection from a user reporting a bug should be done as a last resort. In that context, it is worth looking at the story of a recent bug report which suggests that some observers, at least, think that kernel developers are relying a little too heavily on this tool. On April 9, Mark Lord reported a regression in the networking stack; after making a couple of guesses, the network developers suggested that the problem be bisected.

Mark replied that he did not have the time to go through a full bisection, and that he would much rather be provided a list of commits which might be at fault. That list was not forthcoming, though; there were no developers who had an idea of where the problem might be and, as it turns out, the developer who introduced the bug lives in a time zone which caused him to miss the discussion. Mark's response was strong:

Years ago, Linus suggested that he opposed an in-kernel debugger mainly because he preferred that we *think* more about the problems, rather than just finding/fixing symptoms. This 100% reliance upon git-bisect is worse than that. It has people now just tossing regressions into the code left and right, knowing that they can toss all of the testing back at the poor folks whose systems end up not working.

Andrew Morton also worries that developers resort too quickly to a bisection request rather than working with users as was once done. Either that, he says, or developers just ignore the report from the beginning.

Other developers have answers to these worries, of course. Kernel developers often are not in a position to reproduce a reported bug; it may depend on the specifics of the user's hardware or workload. So they must depend on the user to try things and inform them when a change fixes the problem. Here's David Miller's view on how things used to work:

In fact, this is what Andrew's so-called "back and forth with the bug reporter" used to mainly consist of. Asking the user to try this patch or that patch, which most of the time were reverts of suspect changes. Which, surprise surprise, means we were spending lots of time bisecting things by hand.

We're able to automate this now and it's not a bad thing.

The other answer that one hears is that the situation now is much different, with far more users, much more code, and more problems to deal with. The old "back and forth" mode was better suited to smaller user and developer communities; in the current world, things must be done differently. David Miller again:

What people don't get is that this is a situation where the "end node principle" applies. When you have limited resources (here: developers) you don't push the bulk of the burden upon them. Instead you push things out to the resource you have a lot of, the end nodes (here: users), so that the situation actually scales.

There is another aspect of the problem which is spoken about a bit less frequently: developers must prioritize bug reports and decide which ones to work on. Unlike some projects, the kernel does not have anybody serving in any sort of bug triage role, so, in the absence of a disgruntled and paying customer, most developers make their own decisions on which problems to try to solve. It should not be surprising that problems with the most complete information are the ones which are most likely to be addressed first.

A bug report with a bisection that fingers a specific commit is a report with very good information, one which is generally easy to resolve. As an example, consider Mark Lord's report again; he did eventually take the time (five hours, apparently) to bisect the problem and report the results; the bug was found and fixed almost immediately thereafter - despite the fact that the responsible developer was still sleeping on the other side of the planet.

Even less spoken about is the fact that quite a few problems are one-off occurrences. Somewhere out there in the world, there is a single user who, due to a highly uncommon mixture of hardware and software, experiences a problem which affects (almost) nobody else. Marginal hardware, out-of-tree patches, and overclocking only make the problem worse. Arjan van de Ven's kernel oops summaries are illustrative in this regard; the statistics for the 2.6.25-rc kernels show that a half-dozen problems account for over half of the reports, while the vast majority of oopses have only a single occurrence.

Kernel developers have learned that this kind of problem report tends to go away by itself; the affected user finds a way around the issue (or just gives up) and nobody else ever complains. One can well argue that trying to chase down this kind of problem is not a good use of a kernel developer's time. The hard part is figuring out which reports are of this variety. One relatively straightforward way is to wait until reports from other users confirm the problem - or until a sufficiently determined user bisects the problem and provides a commit ID. In this sense, bisection serves as a sort of triage mechanism which requires users to perform enough work to show that the problem is real.

So the developers do have very good reasons for requesting bisections from users. That said, there is reason to worry that many users will simply stop sending in bug reports. If the only response they can expect is a bisection request (which they may be in no position to answer), they may see no point in reporting bugs at all. Fewer bug reports is not the path toward more solid kernel releases. So, as useful as it is, bisection will have to be a tool of last resort in most cases. The good news is that the development community does seem to understand that; bisection remains just one of the many tools we have for the isolation and solution of problems.

The not-quite-so-good news is that, as Al Viro and James Morris have pointed out, the real problem is in the review of code so that fewer bugs are created in the first place. That is not a problem which can be solved with bisection.


Patches and updates


Memory management

  • Nick Piggin: SLQB v2. (April 10, 2008)


Page editor: Jonathan Corbet

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds