The 2.6.26 kernel is out, released
by Linus on July 13.
For those just tuning in, some of the bigger changes in 2.6.26 include PAT
support in the x86 architecture, read-only bind mounts, the KGDB debugger,
a lot of virtualization
work, and more. See the
KernelNewbies 2.6.26 page
for lots of details.
The 2.6.27 merge window has opened, with some 3000 changesets incorporated
as of this writing. See the separate article below for a summary of what
has been merged (so far) for the next development cycle.
The 2.6.25.11 stable kernel
update was released on July 13. It contains a single fix for a
locally-exploitable vulnerability (limited to x86_64 systems).
Kernel development news
That said, I didn't actually _test_ my patch. That's what users are for.
-- Linus Torvalds
6924d1ab8b7bbe5ab416713f5701b3316b2df85b is a work of art.
Is it ascii-art tetris? a magic eye picture? you decide!
-- Dave Jones
It looks even more spectacular in gitk.
-- Ingo Molnar
But I could also see the second number as being the "year", and
2008 would get 2.8, and then next year I'd make the first release
of 2009 be 2.9.1 (and probably avoid the ".0" just because it again
has the connotations of a "big new untested release", which is not
true in a date-based numbering scheme). And then 2010 would be
Anyway, I have to say that I personally don't have any hugely
strong opinions on the numbering. I suspect others do, though, and
I'm almost certain that this is an absolutely _perfect_
"bikeshed-painting" subject where thousands of people will be very
passionate and send me their opinions on why _their_ particular
shed color is so much better.
-- Linus Torvalds opens the can of worms
Indeed, I apologise for reviewing the code on a monitor that is
wider than yours. If only we could make sure that all Linux
developers used smaller monitors then the code quality would surely
improve.
-- Herbert Xu
And we should obviously have _a_ version of the firmware available
with the kernel when that is possible. But I'd hate for it to be
1:1 with a particular driver version - because at that point it
smells of being a single work, and if it is more than mere
aggregation it's no longer viable with most of our firmware (I
don't think we have source for more than one or two cases).
-- Linus Torvalds
Linus wasted no time after the 2.6.26 release; he opened the 2.6.27 merge
window less than 24 hours later. As of this writing, the process has
barely begun with a mere 3000 changesets merged. So we do not have a
complete picture of what will be in the next kernel release. But we can
look at what has been merged so far.
User-visible changes include:
- New drivers for CompuLab EM-x270 audio devices (as found on the
Toshiba e800 PDA),
Philips UDA1380 codecs,
Wolfson Micro WM8510 and WM8990 codecs,
Atmel AT32 audio devices,
SGI HAL2 audio devices (as found in Indy and Indigo2 workstations),
SGI O2 audio boards,
crypto engines found in Intel IXP4xx processors,
Freescale Security Engine processors,
AMD I/O memory management units,
Marvell Loki (88RC8480), Kirkwood (88F6000), and Discovery Duo
(MV78xx0) system-on-chip processors,
IBM Power Virtual Fibre Channel Adapters, and
GEFanuc C2K cPCI single-board computers.
- The old "ppc" architecture has been removed; all platforms are now
supported by the integrated "powerpc" architecture code.
- The SCSI command filter - which controls which SCSI commands can be
sent to a device by which kind of user - is now per-device and can be
changed via sysfs.
- The block subsystem now has support for hardware which can perform
data integrity checking; this will allow some kinds of errors to be
caught before the associated data is lost forever. See this article for more
information on the block-layer integrity feature.
- The "dummy" Linux security module has been removed; the default module
is now the capabilities module.
- The crypto code has gained support for the RIPEMD-128, RIPEMD-160,
RIPEMD-256, and RIPEMD-320 hash algorithms. Asynchronous hashing is
now supported and is implemented by the "cryptd" software crypto daemon.
- Xen now has support for the saving and restoring of virtual machines -
possibly migrating them to different hosts in between.
- The new virtual file /sys/firmware/memmap shows the memory
map as it was configured by the system BIOS before the kernel booted.
- The ftrace lightweight tracing framework has been merged. See
Documentation/ftrace.txt for more
information on ftrace.
- The mmiotrace tool has
been merged. Mmiotrace will capture and print out memory-mapped I/O
accesses, making it a useful tool for the reverse-engineering of
binary-only graphics drivers.
- The ARM and powerpc architectures now support the latencytop tool.
- The RDMA code has acquired support for the InfiniBand "base memory
management extension" operations. The IP-over-InfiniBand code can now
perform large receive offload (LRO).
- Delayed allocation support has been added to the ext4 filesystem,
which is getting quite close to its target feature set.
- The SATA layer now has enclosure management support; this allows the
system to do things like blink an LED to indicate a specific drive in
a large enclosure.
- The SGI IRIX binary compatibility layer has been removed.
Changes visible to kernel developers include:
- The register_security() function has been removed. Security
modules which wish to implement stacking must now do so explicitly.
- The request_queue_t type is gone at last; block drivers
should use struct request_queue instead.
- Quite a bit of big kernel
lock removal work has been merged. For
char devices, the open() method from struct
file_operations is no longer protected by the BKL. Calls to
fasync() have also lost BKL protection.
- Many drivers have been converted to use the firmware loader, making it
possible to strip the firmware from the kernel for those who are
inclined to do so. See this
article for more information on the firmware work.
- The API work in the i2c layer continues; there is now an autodetection
capability which allows new-style drivers to detect devices on their
buses.
- The SCSI layer has gained new support for "device handlers," which are
mostly concerned with multipath management. Some of this code has
been moved over from the device mapper.
Come back next week for the next episode in the "what's coming in 2.6.27"
series.
One likes to think of disk drives as being a reliable store of data. As
long as nothing goes so wrong as to let the smoke out of the device, blocks
written to the disk really should come back with the same bits set in the
same places. The reality of the situation is a bit less encouraging,
especially when one is dealing with the sort of hardware which is available
at the local computer store. Stories of blocks which have been corrupted,
or which have been written to a location other than the one which was
intended, are common.
For this reason, there is steady interest in filesystems which use
checksums on data stored
to block devices. Rather than take the device's word that it successfully
stored and retrieved a block, the filesystem can compare checksums and be sure. A
certain amount of checksumming is also done by paranoid applications in
user space. The checksums used by BitKeeper are said to have caught a
number of corruption problems; successor tools like git have checksums
wired deeply into their data structures. If a disk drive corrupts a git
repository, users will know about it sooner rather than later.
Checksums are a useful tool, but they have one minor problem: checksum
failures tend to come when they are too late to be useful. By the time a
filesystem or application notices that a disk block isn't quite what it
once was, the original data may be long-gone and unrecoverable. But disk
block corruption often happens in the process of getting the data to the
disk; it would sure be nice if the disk itself could use a checksum to
ensure that (1) the data got to the disk intact, and (2) the disk
itself hasn't mangled it.
To that end, a few standards groups have put together schemes for the
incorporation of data integrity checking into the hardware itself. These
mechanisms generally take the form of an additional eight-byte checksum
attached to each 512-byte block. The host system generates the checksum
when it prepares a block for writing to the drive; that checksum will
follow the data through the series of host controllers, RAID
controllers, network fabrics, etc., with the hardware verifying the
checksum along each step of the way. The checksum is stored with the data,
and, when the data is read in the future, the checksum travels back with
it, once again being verified at each step. The end result should be that
data corruption problems are caught immediately, and in a way which
identifies which component of the system is at fault.
Needless to say, this integrity mechanism requires operating system
support. As of the 2.6.27 kernel, Linux will have such support, at least
for SCSI and SATA drives, thanks to Martin Petersen. The well-written documentation file included with the data
integrity patches envisions three places where checksum generation and
verification can be performed: in the block layer, in the filesystem, and
in user space. Truly end-to-end protection seems to need user-space
verification, but, for now, the emphasis is on doing this work in the block
layer or filesystem - though, as of this writing, no integrity-aware
filesystems exist in the mainline repository.
Drivers for block devices which can manage integrity data need to register
some information with the block layer. This is done by filling in a
blk_integrity structure and passing it to
blk_integrity_register(). See the document for the full details;
in short, this structure contains two function pointers.
generate_fn() generates a checksum for a block of data, and
verify_fn() will verify a checksum. There are also functions for
attaching a tag to a block - a feature supported by some drives. The data
stored in the tag can be used by filesystem-level code to, for example,
ensure that the block is really part of the file it is supposed to belong
to.
The block layer will, in the absence of an integrity-aware filesystem,
prepare and verify checksum data itself. To that end, the bio
structure has been extended with a new bi_integrity field,
pointing to a bio_vec structure describing the checksum
information and some additional housekeeping. Happily, the integrity
standards were written to allow the checksum information to be stored
separately from the actual data; the alternative would have been to modify
the entire Linux memory management system to accommodate that information.
The bi_integrity area is where that information goes;
scatter/gather DMA operations are used to transfer the checksum and data
to and from the drive together.
Integrity-aware filesystems, when they exist, will be able to take over the
generation and verification of checksum data from the block layer. A call
to bio_integrity_prep() will prepare a given bio
structure for integrity verification; it's then up to the filesystem to
generate the checksum (for writes) or check it (for reads). There's also a
set of functions for managing the tag data; again, see the document for the
details.
One of the more long-lived annoyances in the Linux block layer
has been the limit on the number of partitions which can be created on any
one device. IDE devices can handle up to 64 partitions, which is usually
enough, but SCSI devices can only manage 16 - including one reserved for
the full device. As these devices get larger, and as applications which
benefit from filesystem isolation (virtualization, for example) become more
popular, this limit only becomes more irksome.
The interesting thing is that the work needed to circumvent this problem
was done some years ago when device numbers were extended to 32 bits. Some
complicated schemes were
proposed back in 2004 as a way of extending the number of partitions while
not changing any existing device numbers, but that approach was never
adopted. In the meantime, increasing use of tools like udev has
pretty much eliminated the need for device number compatibility; on most
distributions, there are no persistent device files anymore.
So when Tejun Heo revisited the
partition limit problem, he didn't bother with obscure bit-shuffling
schemes. Instead, with his patch set, block devices simply move to a new
major device number and have all minor numbers dynamically assigned. That
means that no block device has a stable (across boots) number; it also
means that the minor numbers for partitions on the same device are not
necessarily grouped together. But, since nobody really ever sees the
device numbers on a contemporary distribution, none of this should matter.
Tejun's patch series is an interesting exercise in slowly evolving an
interface toward a final goal, with a number of intermediate states. In
the end, the API as seen by block drivers changes very little. There is a
new flag (GENHD_FL_EXT_DEVT) which allows the disk to use extended
partition numbers; once the number of minor numbers given to
alloc_disk() is exhausted, any additional partitions will be
numbered in the extended space. The intended use, though, would appear to
be to allocate no traditional minor numbers at all - allocating disks with
alloc_disk(0) - and creating all partitions in that extended
space. Tejun's patch causes both the IDE and sd drivers to allocate
gendisk structures in that way, moving all disks on most systems
into the (shared) extended number space.
Even though modern distributions are comfortable with dynamic device
numbers (and names, for that matter), it seems hard to imagine that a
change like this would be entirely free of systems management problems
across the full Linux user base. Distributors may still be a little
nervous from the grief they took after the shift to the PATA drivers
changed drive names on installed systems. So it's not really clear when
Tejun's patches might make it into the mainline, or when distributors would
make use of that functionality. The pressure for more partitions is
unlikely to go away, though, so these patches may find their way in before
too long.
Even the most casual observer of the linux-kernel mailing list must have noticed
that, in the shadow of the firmware flame war, there is also a heated
discussion over the management of security issues. There have also been
attempts to turn this local battle
into a multi-list, regional conflict. Finding the right way to deal with
security problems is difficult for any project, and the kernel is no
exception. Whether this discussion will lead to any changes remains to be
seen, but it does at least provide a clear view of where the disagreements
lie.
Things flared up this time in response to the 2.6.25.10 stable kernel update.
The announcement stated that "any users of the 2.6.25 kernel series
are STRONGLY encouraged to upgrade to this release," but did not say
why; none of the patches found in this release were marked as security
problems. As it happens, there were security-related fixes in that update;
some users are upset that they were not explicitly called out as such.
They have reached the point of accusing the kernel developers of hiding
security problems.
These problems, it is said, are fixed with relatively
benign-sounding commit messages ("x86_64 ptrace: fix sys32_ptrace
task_struct leak," for example) and users are not told that a security fix
has been made. This, in turn, is thought to put users at risk because (1) they
do not know when they need to apply an update, and (2) there is no
clear picture of how many security problems are surfacing in the kernel
code. So, as "pageexec" (or "PaX Team") put it:
the problem i raised was that there's one declared policy in
Documentation/SecurityBugs (full disclosure) yet actual actions are
completely different and now Linus even admitted it. the problem
arising from such inconsistency is that people relying on the
declared disclosure policy will make bad decisions and potentially
endanger their users. there're two ways out of this sitution:
either follow full disclosure in practice or let the world at large
know that you (well, Linus) don't want to. in either case people
will adjust their security bug handling processes and everyone will
be better off.
There are two aspects to the charge that the kernel is not following a full
disclosure policy: commit messages are said to obscure security fixes, and
kernel releases do not highlight the fact that security problems have been
fixed. There is an aspect of truth to the first charge, in that Linus will
freely admit to changing commit logs which
discuss security problems too explicitly:
I literally draw the line at anything that is simply greppable
for. If it's not a very public security issue already, I don't want
a simple "git log + grep" to help find it.
That said, I don't _plan_ messages or obfuscate them, so "overflow"
might well be part of the message just because it simply describes
the fix. So I'm not claiming that the messages can never help
somebody pinpoint interesting commits to look at, I'm just also not
at all interested in doing so reliably.
His goal here is clear: make life just a little harder for people who are
searching the commit logs for vulnerabilities to exploit. One may argue
over whether this policy amounts to hiding security problems, or whether it
will be effective in reducing exploits (and plenty of people have shown
their willingness to do such arguing), but the fact remains that it
is the policy followed by Linus at this time. In his view, the
committing of a fix is the disclosure of the problem, and there is no need
to be more explicit than that.
That view extends to the whole security update process found in much of the
community. He has no respect for embargo policies or delayed disclosure, and he
criticizes the "whole security circus"
which, in his opinion, emphasizes the wrong thing:
It makes "heroes" out of security people, as if the people who don't just
fix normal bugs aren't as important.
In fact, all the boring normal bugs are _way_ more important, just
because there's a lot more of them. I don't think some spectacular
security hole should be glorified or cared about as being any more
"special" than a random spectacular crash due to bad locking.
Beyond that, it is often hard to know which patches are truly security
fixes. It has been argued at times that all bugs have security
relevance; it's mostly just a matter of figuring out how to exploit them.
So explicitly marking security fixes risks taking attention away from all
of the other fixes, many of which may also, in fact, fix security issues.
Thus, Linus says:
If people think that they are safer for only applying (or upgrading
to) certain patches that are marked as being security-specific,
they are missing all the ones that weren't marked as such. Making
them even _believe_ that the magic security marking is meaningful
is simply a lie. It's not going to be.
So why would I add some marking that I most emphatically do not
believe in myself, and think is just mostly security theater?
That said, the stable kernel updates go out with patches which are known to
be security fixes. Some people clearly believe that being STRONGLY
encouraged to update is not sufficient notification of that fact. It does
seem that there has been a trend away from explicit recognition of security
issues in the stable releases. The inclusion of CVE numbers was once
common; in the 2.6.25 series, only three of the point releases had such numbers in the
changelogs. It is, indeed, true that a straightforward reading of the
stable release changelogs will not tell users whether those releases fix
relevant security issues.
There are a number of answers to that complaint too, of course. The real
information is in the source code, and that is always public. The fixes in
the stable series are unlikely to be all that relevant to most users
anyway; they are running distributor kernels which are many months behind
even the -stable series and which may (or may not) be affected by a specific
problem. In the end, users who are concerned about security issues in
their kernels have somebody to turn to: their distributors. Linux
distributors follow disclosure rules and
tend to do a pretty thorough job of fixing the known security problems and
propagating those fixes to users. For users who need a high level of
long-term support, there are distributors who are more than willing to
provide that kind of service for a fee.
As is often the case, what it really comes down to here is resources. It
would be nice if somebody were to follow the patch stream (well over 100
patches/day into the mainline) and identify each one which has security
implications. For each patch, this person could then figure out which
kernel version was first affected by the vulnerability, obtain a CVE
number, and issue a nicely-formatted advisory. But this is a huge job, one
which nobody is likely to do without compensation for any length of time.
So somebody would have to pay for this work. And, to a great extent, that
is just what the distributors are doing now - with the nice addition that
they backport the fixes into the kernels they support.
It is worth noting that those distributors have not been doing a whole lot
of complaining about how security fixes are handled now. Instead, the
complaining has come, primarily, from the maintainers of the out-of-tree grsecurity project which, from a
cynical point of view, could be seen to benefit from raising the profile of
Linux kernel security problems.
But, regardless of the validity of any such charge, there may be some value
in what they are asking. It is good to have a clear sense for what
the security problems in a piece of code are. If nothing else, it helps
the project itself to understand where it stands with regard to security
and whether things are getting better or worse. So it would be nice if the
kernel developers could be a bit more diligent and organized in how they
track security issues, much like the tracking of regressions has improved
over the last couple of years. But this kind of improvement will not
happen until somebody decides to put the work into it. Actually putting
some time into documenting kernel security issues will accomplish far more
than complaining on mailing lists.
I would like to try to clarify a few points in the article, "Handling
kernel security problems" by Jonathan Corbet.
First off, I speak only for myself, not for the other half of the Linux
-stable team, Chris Wright, who might totally disagree with me, nor for
the other kernel developers who help out with the stable@kernel.org
alias, nor for my current employer Novell. Also note that all of my
-stable development is done on my own time, and is not part of my role
at my current job.
All of that out of the way, I object to a few things stated in the
It does seem that there has been a trend away from explicit
recognition of security issues in the stable releases. The
inclusion of CVE numbers was once common; in the 2.6.25 series,
only three of the point releases had such numbers in the
changelogs. It is, indeed, true that a straightforward reading of
the stable release changelogs will not tell users whether those
releases fix relevant security issues.
A number of times, when we do -stable releases, there are no CVE numbers
issued for the "security" related issues that are fixed in there. This
happens when the fix is first made in Linus's tree, and is either
forwarded to the stable@kernel.org alias saying, "we need to get this
out now", or just by the fact that it is only later that people realize
that a CVE number should be allocated.
And yes, the trend is away from explicit recognition of security issues,
exactly following Linus's statement that you quote from.
It comes down to who are the users of the -stable kernel series. I
personally see these kernels for two different groups of people:
- Those who want to follow the latest kernel.org releases and not rely
on a distribution for their kernel versions.
- For distributions to base releases on, and to pick and choose
patches from.
The first group should always update to the latest -stable kernel update
as they are relying on the -stable team to always provide them the
latest fixes that are known to be needed for them. Simply marking
things as "security related" can be misguided as Linus points out. The
change log entries should show all users what was fixed, and if they run
a machine where this code is used, then they should upgrade. It's as
simple as that.
In fact, in the 2.6.25.11 release I tried to say exactly that:
It contains one bugfix, any user of the 2.6.25 kernel on x86-64
with untrusted local users is very STRONGLY recommended to
upgrade.
How much clearer can I be? Does a user of the -stable tree, who has to be
technically competent to be able to do such a thing in the first place,
need to know more to decide if they need to upgrade their machines or not?
It seems people are upset that I am no longer using the magic words
"security fix", and that is true, I am not saying that anymore.
As Linus and others have noted, marking some bugs as being
"security-related" is not helpful, especially as not everyone can even
agree - or sometimes even know at release time - whether a bug has security
implications or not.
Also note that this release does not refer to a CVE number. This is
because, as of this moment, there still is not a number assigned,
despite asking the relevant groups for such an assignment. I never want
to hold up a release by waiting for any such number, so I personally
will just not use them in the future in -stable releases unless they are
already contained in the original changelog entry in Linus's tree.
The second group, the distributions, all seem very happy with how the
-stable releases are conducted. They have the capability to pick and
choose from the fixes and apply them to their older kernel versions and
ship them to their customers as they see fit. The distros all know what
things are security related by the fact that they know and understand
the code and the threat model as they have developers assigned to
handle such security issues, and have done so for years.
In your summary, you state:
It is good to have a clear sense for what the security problems in a
piece of code are. If nothing else, it helps the project itself to
understand where it stands with regard to security and whether things
are getting better or worse. So it would be nice if the kernel
developers could be a bit more diligent and organized in how they
track security issues, much like the tracking of regressions has
improved over the last couple of years.
I think the individual developers of the kernel all know quite well what
the security problems for their code are. This is backed up by the fact
that these developers are the ones usually making the fix and telling
the -stable team that a specific patch is needed to be added.
What you seem to be asking for is a way to somehow classify bugs and
fixes in the kernel tree as "security related" or not. And that goes
back to Linus's original point. To try to do so marginalizes bugs which
are somehow not so designated as not worth fixing.
However, if someone wants to do this work for the kernel community,
and it proves to be useful over time, I'll be the first in line to say
that I was wrong.
Page editor: Jonathan Corbet