The current 2.6 development kernel is 2.6.25-rc9, released
on April 11. The
stable 2.6.25 release is imminent, and will likely be out by the time you
read this; your editor suspects that Linus is just waiting for LWN to be
published before shoving the release out the door.
The current -mm tree is 2.6.25-rc8-mm2. Recent changes
to -mm include the new suspend
and hibernation infrastructure, another long series of IDE patches,
some wireless USB work, and kernel marker support for
Kernel development news
We need higher S/N on l-k. We need people looking into the
subsystem trees as those grow and causing a stench when bad things
are found, with design issues getting brought to l-k if nothing
else helps. We need tree maintainers understanding that review,
including out-of-community one, is needed (the need of testing is
generally better understood - I _hope_).
-- Al Viro
That all sounds good and I expect few would disagree. But if it is
to happen, it clearly won't happen by itself, automatically. We
will need to force it upon ourselves and the means by which we will
do that is process changes. The thing which is being disparaged as
The steps to be taken are:
a) agree that we have a problem
b) agree that we need to address it
c) identify the day-to-day work practices which will help address it (as
you have done)
d) identify the process changes which will force us to adopt those practices
e) implement those process changes.
I have thus far failed to get us past step a).
-- Andrew Morton
I for one do not agree that we have a problem.
-- Arjan van de Ven
When kernel developers talk about problematic hardware vendors, Atheros
often appears near the top of their lists. So this announcement from Luis
Rodriguez, a developer of the reverse-engineered ath5k driver, is
intriguing: "I write to you to inform you that I have decided to join
Atheros as a full time employee, as a Software Engineer, to help them
with their goals and mission to get every device of Atheros supported
upstream in the Linux kernel." What will come of this remains to be
seen, but if it truly signals a change of heart at Atheros, it is a most
It takes a certain kind of courage to head down a road when one can plainly
see the unpleasant fate which befell those who went before. So one might
think that the fate of AppArmor would deter others from following a similar
path. The developers of TOMOYO
are not easily put off, though. Despite having a security
subsystem which shares a number of features with AppArmor, these developers
are pushing forward in an attempt to get their code into the mainline.
AppArmor, remember, is a Linux security module which uses pathnames to make
security decisions. So it is entirely conceivable that two different
security policies could apply to the same file if that file is accessed by
way of two different names. This approach helps make AppArmor easier to
administer than SELinux, but it has given AppArmor major
problems in the review process for a few reasons:
- There has been strong resistance to the addition of any new security
modules at all, to the point that proposals to remove the LSM
framework altogether have been floated.
- Some security developers see a pathname-based mechanism as being
fundamentally insecure. SELinux developers, in particular, have been
very strongly against pathname-based security. To these developers,
security policies should apply directly to objects (or to labels
attached directly to objects) rather than to names given to objects.
- The current Linux security module hooks, not being developed with
pathname-based security in mind, do not provide sufficient information to
the low-level file operation hooks. So AppArmor had to reconstruct
pathnames within its security hooks. The method chosen for this
reconstruction was, one might say, not universally admired.
If the TOMOYO Linux developers are serious about getting their code into
the mainline, they will need to have answers to these objections.
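The objection to name-based policy can be made concrete with a small model. The sketch below is purely illustrative: the paths, the label "secret_t", and the rule tables are all hypothetical, and it reproduces neither AppArmor's nor SELinux's actual policy language. It only shows how a rule keyed by name can give two answers for one file, while a rule keyed by a label attached to the object cannot.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    """Stands in for the file object itself; the label is attached to it."""
    number: int
    label: str


# Two directory entries (hard links, say) naming the same underlying object.
# Paths and label are made up for this illustration.
shared = Inode(number=1234, label="secret_t")
namespace = {
    "/etc/shadow": shared,
    "/tmp/innocuous": shared,  # a second name for the same file
}

# Hypothetical pathname-based policy: rules are keyed by name.
path_policy = {"/etc/shadow": "deny", "/tmp/innocuous": "allow"}

# Hypothetical label-based policy: rules are keyed by the object's label.
label_policy = {"secret_t": "deny"}


def check_by_path(path: str) -> str:
    """Decide using only the name by which the file was reached."""
    return path_policy[path]


def check_by_label(path: str) -> str:
    """Resolve the name to the object, then decide using the object's label."""
    return label_policy[namespace[path].label]


for path in namespace:
    print(path, "path:", check_by_path(path), "label:", check_by_label(path))
```

In this toy model the name-based check returns different answers for the two names, while the label-based check returns the same answer for both, since the label travels with the object.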
As it happens, the first two obstructions have mostly gone away. Casey
Schaufler's persistence finally resulted in the merging of the SMACK
security module for 2.6.25; it is the only such module, other than SELinux,
ever to get into the mainline. Now that SMACK has paved the way, talk of
removing the LSM framework (which had been strongly vetoed by Linus in any
case) has ended and the next security module should have an easier time of it.
Linus has also decreed that pathname-based security modules are entirely
acceptable for inclusion into the kernel. So, while some developers remain
highly skeptical of this approach, their skepticism cannot, on its own, be
used as a reason to keep a pathname-based security module out.
Pathname-based approaches appear to be "secure enough" for a number of
applications, and there are some advantages
to using that approach.
All of the above is moot, though, if the TOMOYO Linux developers are unable
to implement pathname-based access control in a way which passes muster.
The recent TOMOYO Linux patch
took a different approach to this problem: since the LSM hooks do not
provide the needed information, the developers just added a new set of
hooks, outside of LSM, for use by TOMOYO Linux. And, while they were at
it, they added new hooks at all enforcement points. This was not a popular
decision, to say the least. The whole idea behind LSM was to have a single
set of hooks for all security modules; if every module now adds its own set
of hooks, that purpose will have been defeated and the kernel will turn
into a big mess of security hooks. Duplicating the LSM framework is not
the way to get a security module into the mainline.
So, somehow, the TOMOYO Linux developers will need to implement
pathname-based security in a different way. The most obvious thing to do
would be to modify the existing hooks to supply the requisite information
(being a pointer to the vfsmount structure). The problem here is
that, at the point where the LSM hooks are called, that structure is not
available; it is only used at the higher levels of the virtual filesystem
code. So either some core VFS functions would have to be changed (so the
vfsmount pointer could be passed into them), or a new set of hooks
would need to be placed at a level where that pointer is available. It appears that the second approach - adding new
hooks in the namespace code - will be taken for the next version of the
As the TOMOYO Linux developers work through this problem, they are likely
to be closely watched by the (somewhat reduced in number) AppArmor group.
There appears to be a resurgence of interest in getting AppArmor merged, so
we will probably see AppArmor put forward again in the near future. That
will be even more likely if TOMOYO Linux is able to solve the pathname
problem in a way which survives review and gets into the kernel.
Ingo Molnar was recently bitten
by a problem which, in one form or
another, may affect a wider range of Linux users after 2.6.26. Linux
currently has two drivers for Intel's e1000 network adapters, called
"e1000" and "e1000e". The former driver, being the older of the two,
supports all older, PCI-based e1000 adapters. There is, shall we say, a
relative shortage of developers who are willing to stand up for the quality
of the code in this driver, but it works and has a lot of users.
The e1000e driver, instead, supports PCI-Express adapters. It
is a newer driver which is seen as being better written and easier to
maintain. It is intended that all new hardware will be supported by this
driver, and that, in particular, all PCI-Express hardware will use it. The
only problem is that a few PCI-Express chipsets were added to the older
e1000 driver before this policy was adopted. Since the newer driver also
supports those chipsets, there are two drivers (with two completely
different bodies of code) supporting the same hardware. The e1000
maintainers would like to end this duplication and put the e1000 driver
into a stable maintenance mode.
To that end, earlier this month, it was announced that,
as of 2.6.26, the PCI IDs corresponding to PCI-Express devices would be
removed from the e1000 driver, and that all users of that affected hardware
need to move over to e1000e. The e1000 developers had originally tried
to make this move for 2.6.25, but they committed a fundamental faux
pas in the process: they broke Linus's machine. So that change got
reverted before 2.6.25-rc1 came out. Instead, now, we have the
announcement that the change is coming in the next cycle (when the e1000e
problems, presumably, will be fixed) and a bit of configuration trickery
has been added; it causes the e1000 driver to not claim PCI-Express
devices if the e1000e driver has been configured into the kernel, whether
built in or as a module.
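A rough model of that configuration logic may help in seeing which combinations work. The sketch below is not kernel code: the function name and the "y"/"m"/"n" values are borrowed from kernel configuration conventions for illustration, and it assumes that e1000 gives up its PCI-Express IDs whenever e1000e is enabled in any form, built in or modular.

```python
from typing import Optional


def claiming_driver(e1000: str, e1000e: str,
                    loaded_modules: frozenset = frozenset()) -> Optional[str]:
    """Return the driver that would claim a PCI-Express e1000 adapter.

    e1000 and e1000e are "y" (built in), "m" (module) or "n" (disabled).
    loaded_modules names the modules which have actually been loaded.
    Returns None if no driver ends up claiming the device.
    """
    def available(name: str, setting: str) -> bool:
        # A built-in driver is always present; a module only once loaded.
        return setting == "y" or (setting == "m" and name in loaded_modules)

    # Assumption of this sketch: e1000 keeps its PCI-Express IDs only
    # when e1000e is not configured at all.
    e1000_claims_pcie = (e1000e == "n")

    if available("e1000e", e1000e):
        return "e1000e"
    if available("e1000", e1000) and e1000_claims_pcie:
        return "e1000"
    return None


for config in [("y", "n"), ("y", "y"), ("y", "m")]:
    print(config, "->", claiming_driver(*config))
```

Under these assumptions, the one combination which leaves the adapter unclaimed is a usable e1000 paired with an e1000e module that never gets loaded.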
Ingo's problem is that he built the e1000 driver into his kernel, but
ended up with e1000e configured as a module which was never loaded. That combination leads
to a network adapter which does not work at all, since the built-in driver
no longer claims it. Ingo, a bit disgruntled at having to spend an hour
tracking down the problem, has suggested that it is a regression which must
be fixed. The e1000 driver maintainers have resisted doing so, but Linus,
having also been burned, agrees. So, while
this transition is likely to go ahead as scheduled, 2.6.25 will probably
have a configuration change designed to keep others from falling into a
Your editor has never dabbled in filesystems development. He has a
suspicion, however, that there is a tense moment in every new filesystem
developer's life: when Christoph Hellwig's review shows up in the mailbox.
Christoph's reviews, while not always being pleasant reading, tend to be
right on the money with regard to problems in filesystem implementations -
and problems in new filesystems are common. Christoph's stamp of approval
is almost required for the merging of a filesystem, so, when the initial
posting of a filesystem is greeted with reviews that read, nearly in their
entirety, "looks good," one would assume that the path into the mainline
would be straightforward.
The story of OMFS, though,
shows that this assumption does not always hold. Reviewers have only been able to find
the smallest of details to fix, but there is opposition to its merging,
especially from Andrew Morton. The objection is that this filesystem -
found on devices like the Rio Karma music player and ReplayTV boxes - has a
very small user base. OMFS developer Bob Copeland, in his initial posting,
suggested that fewer than twenty people might be using it at this time.
New devices with this filesystem are no longer being made, so the chances
of the user base growing significantly are small.
Andrew's objection is that the addition of any new code creates a new
maintenance burden for kernel developers. Whenever a VFS interface is
changed, all filesystems must be fixed to work with the new API. So the
addition of a filesystem imposes costs which, he says, should be outweighed
by the benefits that new filesystem brings. In the case of an obscure
filesystem with a small and (presumably) decreasing user base, says Andrew, it is not
clear that the benefits are sufficient. He asks:
Just as a thought exercise: should we merge a small and well-written
driver which has zero users?
Andrew would rather see OMFS turned into a user-space filesystem using
FUSE. Chris Mason is also concerned:
Even though OMFS seems to be using the generic interfaces well,
there is still a testing burden for every change. Someone needs to
try it, report any problems and get them fixed. Since none of the
people making the changes is likely to have an OMFS test bed, all
of that burden will fall on Bob, his users, and anyone who tries to
compile the module (Andrew).
OMFS supporters note that the code is written well and can serve as an
example for other filesystem authors. They also note that code with small
user bases is often merged - that, in fact, in some areas, developers have
said they want all code, regardless of how few people are using it.
Running OMFS through FUSE, they say, would be harder for users to set up
and less efficient in operation. Says
Moving a simple block based filesystem means it's more complicated,
less efficient because of the additional context switches and
harder to use because you need additional userspace packages and
need to setup fuse.
We made writing block based filesystems trivial in the kernel to
grow more support for filesystems like this one.
It looks like Andrew will back down on this one and let the
next version of the OMFS patches into -mm. From there, if all goes well,
it could make the jump into the mainline, possibly as early as 2.6.27. But
Andrew is clearly unhappy about that outcome, and may well raise the
question again in the future: is "well written" really sufficient to
justify merging new filesystems into the kernel?
The last couple of years have seen a renewed push within the kernel
community to avoid regressions. When a patch is found to have broken
something that used to work, a fix must be merged or the offending patch
will be removed from the kernel. It's a straightforward and logical idea,
but there's one little problem: when a kernel series includes over 12,000
changesets (as 2.6.25 does), how does one find the patch which caused the
problem? Sometimes it will be obvious, but, for other problems, there are
literally thousands of patches which could be the source of the
regression. Digging through all of those patches in search of a bug can be
a needle-in-the-haystack sort of proposition.
One of the many nice tools offered by the git source code management system
is called "bisect." The bisect feature helps the user perform a binary
search through a range of patches until the one containing the bug is
found. All that is needed is to specify the most recent kernel which is
known to work (2.6.24, say), and the oldest kernel which is broken
(2.6.25-rc9, perhaps), and the bisect feature will check out a version of
the kernel at the midpoint between those two. Finding that midpoint is
non-trivial, since, in git, the stream of patches is not a simple line.
But that's the sort of task we keep computers around for. Once the
midpoint kernel has been generated, the person
chasing the bug can build and
test it, then tell git whether it exhibits the bug or not. A
kernel at the new midpoint will be produced, and the process continues.
With bisect, the problematic patch can be found in a maximum of a dozen or
so compile-boot-test cycles.
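The arithmetic behind that estimate is just a binary search. The sketch below is a simplification and not git's algorithm: it treats the history as a straight line of numbered commits, which real git history is not, and the function name and the test callback are invented for illustration. It does show why some 12,000 changesets need no more than ceil(log2(12000)) = 14 tests.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a linear history.

    commits[0] is assumed to be good and commits[-1] to be bad.
    is_bad(commit) stands in for one compile-boot-test cycle.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # the "midpoint kernel"
        tests += 1
        if is_bad(commits[mid]):
            bad = mid                # bug is at or before the midpoint
        else:
            good = mid               # bug was introduced after the midpoint
    return commits[bad], tests


# Hypothetical history: commit 0 is the last good release, commit 12000
# is the broken kernel, and the bug arrived with commit 7777.
history = list(range(12001))
culprit = 7777
found, tests = bisect_first_bad(history, lambda c: c >= culprit)
print("first bad commit:", found)
print("tests run:", tests, "upper bound:", math.ceil(math.log2(12000)))
```

Each test halves the range of suspects, so the number of cycles grows with the logarithm of the number of changesets rather than with the number itself.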
Bisect is not a perfect tool. If patch submitters are not careful, bisect
can create a broken kernel when it splits a patch series. The patch which
causes a bug to manifest itself may not be the one which introduced the
bug. In the worst case, a developer may merge a long series of patches,
finishing with one brief change which enables all the code added
previously; in this case, bisect will find the final patch, which will only
be marginally useful. If the person reporting the bug is running a
distributor's kernel, it may be hard to get that kernel in a form which is
amenable to the bisection process. Bisection might require
unacceptable downtime on the only (production) system which is affected by
the bug. And, of course, the process of checking out, building, booting,
and testing a dozen kernels is not something which one fits into a coffee
break. It requires a certain determination on the part of the tester and
quite a bit of time.
All of the points above would suggest that requesting a bisection from a
user reporting a bug should be done as a last resort. In that context, it
is worth looking at the story of a recent bug report which suggests that
some observers, at least, think that kernel developers are relying a little
too heavily on this tool. On April 9, Mark Lord reported a regression in the networking stack;
after making a couple of guesses, the network developers suggested that the problem be bisected.
Mark replied that he did not have the time to go through a full
bisection, and that he would much rather be provided a list of commits
which might be at fault. That list was not forthcoming, though; there were
no developers who had an idea of where the problem might be and, as it
turns out, the developer who introduced the bug lives in a time zone which
caused him to miss the discussion. Mark's response was strong:
Years ago, Linus suggested that he opposed an in-kernel debugger
mainly because he preferred that we *think* more about the
problems, rather than just finding/fixing symptoms. This 100%
reliance upon git-bisect is worse than that. It has people now
just tossing regressions into the code left and right, knowing that
they can toss all of the testing back at the poor folks whose
systems end up not working.
Andrew Morton also worries that developers
resort too quickly to a bisection request rather than working with users as
was once done. Either that, he says, or developers just ignore the report
from the beginning.
Other developers have answers to these worries, of course. Kernel
developers often are not in a position to reproduce a reported bug; it may
depend on the specifics of the user's hardware or workload. So they must
depend on the user to try things and inform them when a change fixes the
problem. Here's David Miller's view on how
things used to work:
In fact, this is what Andrew's so-called "back and forth with the
bug reporter" used to mainly consist of. Asking the user to try
this patch or that patch, which most of the time were reverts of
suspect changes. Which, surprise surprise, means we were spending
lots of time bisecting things by hand.
We're able to automate this now and it's not a bad thing.
The other answer that one hears is that the situation now is much
different, with far more users, much more code, and more problems to deal
with. The old "back and forth" mode was better suited to smaller user
and developer communities; in the current world, things must be done
differently. David Miller again:
What people don't get is that this is a situation where the "end
node principle" applies. When you have limited resources (here:
developers) you don't push the bulk of the burden upon them.
Instead you push things out to the resource you have a lot of, the
end nodes (here: users), so that the situation actually scales.
There is another aspect of the problem which is spoken about a bit less
frequently: developers must prioritize bug reports and decide which ones to
work on. Unlike some projects, the kernel does not have anybody serving in
any sort of bug triage role, so, in the absence of a disgruntled and paying
customer, most developers make their own decisions on which problems to try
to solve. It should not be surprising that problems with the most complete
information are the ones which are most likely to be addressed first.
A bug report with a bisection that fingers a specific commit is a report
with very good information, one which is generally easy to resolve. As an
example, consider Mark Lord's report again; he did eventually take the time
(five hours, apparently)
to bisect the problem and report the
results; the bug was found and fixed almost immediately thereafter -
despite the fact that the responsible developer was still sleeping
on the other side of the planet.
Even less spoken about is the fact that quite a few problems are one-off
occurrences. Somewhere out there in the world, there is a single user who,
due to a highly uncommon mixture of hardware and software, experiences a
problem which affects (almost) nobody else. Marginal hardware, out-of-tree
patches, and overclocking only make the problem worse. Arjan van de Ven's
kernel oops summaries are illustrative in this regard; the
statistics for the 2.6.25-rc kernels show that a half-dozen problems
account for over half of the reports, while the vast majority of oopses
have only a single occurrence.
Kernel developers have learned that this kind of problem report tends to go
away by itself; the affected user finds a way around the issue (or just
gives up) and nobody else ever complains. One can well argue that trying
to chase down this kind of problem is not a good use of a kernel
developer's time. The hard part is figuring out which reports are of this
variety. One relatively straightforward way is to wait until reports from
other users confirm the problem - or until a sufficiently determined user
bisects the problem and provides a commit ID. In this sense, bisection
serves as a sort of triage mechanism which requires users to perform enough
work to show that the problem is real.
So the developers do have very good reasons for requesting bisections from
users. That said, there is reason to worry that many users will simply
stop sending in bug reports. If the only response they can expect is a
bisection request (which they may be in no position to answer), they may
see no point in reporting bugs at all. Fewer bug reports is not the path
toward more solid kernel releases. So, as useful as it is, bisection will
have to be a tool of last resort in most cases. The good news is that the
development community does seem to understand that; bisection remains just
one of the many tools we have for the isolation and solution of problems.
The not-quite-so-good news is that, as Al
Viro and James Morris have pointed out,
the real problem is in the review of code so that fewer bugs are created in
the first place. That is not a problem which can be solved with
Patches and updates
Core kernel code
Filesystems and block I/O
- Nick Piggin: SLQB v2.
(April 10, 2008)
Virtualization and containers
Benchmarks and bugs
Page editor: Jonathan Corbet