Brief items

The 2.6.26 kernel was released by Linus on July 13. For those just tuning in, some of the bigger changes in 2.6.26 include PAT support in the x86 architecture, read-only bind mounts, the KGDB debugger, a lot of virtualization work, and more. See the KernelNewbies 2.6.26 page for lots of details.
The 2.6.27 merge window has opened, with some 3000 changesets incorporated as of this writing. See the separate article below for a summary of what has been merged (so far) for the next development cycle.
The 2.6.25.11 stable kernel update was released on July 13. It contains a single fix for a locally-exploitable vulnerability (limited to x86_64 systems).
Kernel development news
Anyway, I have to say that I personally don't have any hugely strong opinions on the numbering. I suspect others do, though, and I'm almost certain that this is an absolutely _perfect_ "bikeshed-painting" subject where thousands of people will be very passionate and send me their opinions on why _their_ particular shed color is so much better.
User-visible changes include:
Changes visible to kernel developers include:
Come back next week for the next episode in the "what's coming in 2.6.27" series.
For this reason, there is steady interest in filesystems which use checksums on data stored to block devices. Rather than take the device's word that it successfully stored and retrieved a block, the filesystem can compare checksums and be sure. A certain amount of checksumming is also done by paranoid applications in user space. The checksums used by BitKeeper are said to have caught a number of corruption problems; successor tools like git have checksums wired deeply into their data structures. If a disk drive corrupts a git repository, users will know about it sooner rather than later.
Checksums are a useful tool, but they have one minor problem: checksum failures tend to come when they are too late to be useful. By the time a filesystem or application notices that a disk block isn't quite what it once was, the original data may be long-gone and unrecoverable. But disk block corruption often happens in the process of getting the data to the disk; it would sure be nice if the disk itself could use a checksum to ensure that (1) the data got to the disk intact, and (2) the disk itself hasn't mangled it.
To that end, a few standards groups have put together schemes for the incorporation of data integrity checking into the hardware itself. These mechanisms generally take the form of an additional eight-byte checksum attached to each 512-byte block. The host system generates the checksum when it prepares a block for writing to the drive; that checksum will follow the data through the series of host controllers, RAID controllers, network fabrics, etc., with the hardware verifying the checksum along each step of the way. The checksum is stored with the data, and, when the data is read in the future, the checksum travels back with it, once again being verified at each step. The end result should be that data corruption problems are caught immediately, and in a way which identifies which component of the system is at fault.
Needless to say, this integrity mechanism requires operating system support. As of the 2.6.27 kernel, Linux will have such support, at least for SCSI and SATA drives, thanks to Martin Petersen. The well-written documentation file included with the data integrity patches envisions three places where checksum generation and verification can be performed: in the block layer, in the filesystem, and in user space. Truly end-to-end protection seems to need user-space verification, but, for now, the emphasis is on doing this work in the block layer or filesystem - though, as of this writing, no integrity-aware filesystems exist in the mainline repository.
Drivers for block devices which can manage integrity data need to register some information with the block layer. This is done by filling in a blk_integrity structure and passing it to blk_integrity_register(). See the document for the full details; in short, this structure contains two function pointers. generate_fn() generates a checksum for a block of data, and verify_fn() will verify a checksum. There are also functions for attaching a tag to a block - a feature supported by some drives. The data stored in the tag can be used by filesystem-level code to, for example, ensure that the block is really part of the file it is supposed to belong to.
The block layer will, in the absence of an integrity-aware filesystem, prepare and verify checksum data itself. To that end, the bio structure has been extended with a new bi_integrity field, pointing to a bio_vec structure describing the checksum information and some additional housekeeping. Happily, the integrity standards were written to allow the checksum information to be stored separately from the actual data; the alternative would have been to modify the entire Linux memory management system to accommodate that information. The bi_integrity area is where that information goes; scatter/gather DMA operations are used to transfer the checksum and data to and from the drive together.
Integrity-aware filesystems, when they exist, will be able to take over the generation and verification of checksum data from the block layer. A call to bio_integrity_prep() will prepare a given bio structure for integrity verification; it's then up to the filesystem to generate the checksum (for writes) or check it (for reads). There's also a set of functions for managing the tag data; again, see the document for the details.
One of the longer-lived annoyances in the Linux block layer has been the limit on the number of partitions which can be created on any one device. IDE devices can handle up to 64 partitions, which is usually enough, but SCSI devices can only manage 16 - including one reserved for the full device. As these devices get larger, and as applications which benefit from filesystem isolation (virtualization, for example) become more popular, this limit only becomes more irksome.
The interesting thing is that the work needed to circumvent this problem was done some years ago when device numbers were extended to 32 bits. Some complicated schemes were proposed back in 2004 as a way of extending the number of partitions while not changing any existing device numbers, but that approach was never adopted. In the mean time, increasing use of tools like udev has pretty much eliminated the need for device number compatibility; on most distributions, there are no persistent device files anymore.
So when Tejun Heo revisited the partition limit problem, he didn't bother with obscure bit-shuffling schemes. Instead, with his patch set, block devices simply move to a new major device number and have all minor numbers dynamically assigned. That means that no block device has a stable (across boots) number; it also means that the minor numbers for partitions on the same device are not necessarily grouped together. But, since nobody really ever sees the device numbers on a contemporary distribution, none of this should matter.
Tejun's patch series is an interesting exercise in slowly evolving an interface toward a final goal, with a number of intermediate states. In the end, the API as seen by block drivers changes very little. There is a new flag (GENHD_FL_EXT_DEVT) which allows the disk to use extended partition numbers; once the number of minor numbers given to alloc_disk() is exhausted, any additional partitions will be numbered in the extended space. The intended use, though, would appear to be to allocate no traditional minor numbers at all - allocating disks with alloc_disk(0) - and creating all partitions in that extended space. Tejun's patch causes both the IDE and sd drivers to allocate gendisk structures in that way, moving all disks on most systems into the (shared) extended number space.
Even though modern distributions are comfortable with dynamic device numbers (and names, for that matter), it seems hard to imagine that a change like this would be entirely free of systems management problems across the full Linux user base. Distributors may still be a little nervous from the grief they took after the shift to the PATA drivers changed drive names on installed systems. So it's not really clear when Tejun's patches might make it into the mainline, or when distributors would make use of that functionality. The pressure for more partitions is unlikely to go away, though, so these patches may find their way in before too long.

The latest flare-up in the debate over kernel security disclosure has included attempts to turn this local battle into a multi-list, regional conflict. Finding the right way to deal with security problems is difficult for any project, and the kernel is no exception. Whether this discussion will lead to any changes remains to be seen, but it does at least provide a clear view of where the disagreements are.
Things flared up this time in response to the 2.6.25.10 stable kernel update. The announcement stated that "any users of the 2.6.25 kernel series are STRONGLY encouraged to upgrade to this release," but did not say why; none of the patches found in this release were marked as security problems. As it happens, there were security-related fixes in that update; some users are upset that they were not explicitly called out as such. They have reached the point of accusing the kernel developers of hiding security problems.
These problems, it is said, are fixed with relatively benign-sounding commit messages ("x86_64 ptrace: fix sys32_ptrace task_struct leak," for example) and users are not told that a security fix has been made. This, in turn, is thought to put users at risk because (1) they do not know when they need to apply an update, and (2) there is no clear picture of how many security problems are surfacing in the kernel code. So, as "pageexec" (or "PaX Team") put it:
There are two aspects to the charge that the kernel is not following a full disclosure policy: commit messages are said to obscure security fixes, and kernel releases do not highlight the fact that security problems have been fixed. There is an aspect of truth to the first charge, in that Linus will freely admit to changing commit logs which discuss security problems too explicitly:
That said, I don't _plan_ messages or obfuscate them, so "overflow" might well be part of the message just because it simply describes the fix. So I'm not claiming that the messages can never help somebody pinpoint interesting commits to look at, I'm just also not at all interested in doing so reliably.
His goal here is clear: make life just a little harder for people who are searching the commit logs for vulnerabilities to exploit. One may argue over whether this policy amounts to hiding security problems, or whether it will be effective in reducing exploits (and plenty of people have shown their willingness to do such arguing), but the fact remains that it is the policy followed by Linus at this time. In his view, the committing of a fix is the disclosure of the problem, and there is no need to be more explicit than that.
That view extends to the whole security update process found in much of the community. He has no respect for embargo policies or delayed disclosure, and he criticizes the "whole security circus" which, in his opinion, emphasizes the wrong thing:
In fact, all the boring normal bugs are _way_ more important, just because there's a lot more of them. I don't think some spectacular security hole should be glorified or cared about as being any more "special" than a random spectacular crash due to bad locking.
Beyond that, it is often hard to know which patches are truly security fixes. It has been argued at times that all bugs have security relevance; it's mostly just a matter of figuring out how to exploit them. So explicitly marking security fixes risks taking attention away from all of the other fixes, many of which may also, in fact, fix security issues. Thus, Linus says:
So why would I add some marking that I most emphatically do not believe in myself, and think is just mostly security theater?
That said, the stable kernel updates go out with patches which are known to be security fixes. Some people clearly believe that being STRONGLY encouraged to update is not sufficient notification of that fact. It does seem that there has been a trend away from explicit recognition of security issues in the stable releases. The inclusion of CVE numbers was once common; in the 2.6.25 series, only three updates had such numbers in the changelogs. It is, indeed, true that a straightforward reading of the stable release changelogs will not tell users whether those releases fix relevant security issues.
There are a number of answers to that complaint too, of course. The real information is in the source code, and that is always public. The fixes in the stable series are unlikely to be all that relevant to most users anyway; they are running distributor kernels which are many months behind even the -stable series and which may (or may not) be affected by a specific problem. In the end, users who are concerned about security issues in their kernels have somebody to turn to: their distributors. Linux distributors follow disclosure rules and tend to do a pretty thorough job of fixing the known security problems and propagating those fixes to users. For users who need a high level of long-term support, there are distributors who are more than willing to provide that kind of service for a fee.
As is often the case, what it really comes down to here is resources. It would be nice if somebody were to follow the patch stream (well over 100 patches/day into the mainline) and identify each one which has security implications. For each patch, this person could then figure out which kernel version was first affected by the vulnerability, obtain a CVE number, and issue a nicely-formatted advisory. But this is a huge job, one which nobody is likely to do in an uncompensated mode for any period of time. So somebody would have to pay for this work. And, to a great extent, that is just what the distributors are doing now - with the nice addition that they backport the fixes into the kernels they support.
It is worth noting that those distributors have not been doing a whole lot of complaining about how security fixes are handled now. Instead, the complaining has come, primarily, from the maintainers of the out-of-tree grsecurity project which, from a suitably cynical point of view, could be seen to benefit from raising the profile of Linux kernel security problems.
But, regardless of the validity of any such charge, there may be some value in what they are asking. It is good to have a clear sense for what the security problems in a piece of code are. If nothing else, it helps the project itself to understand where it stands with regard to security and whether things are getting better or worse. So it would be nice if the kernel developers could be a bit more diligent and organized in how they track security issues, much like the tracking of regressions has improved over the last couple of years. But this kind of improvement will not happen until somebody decides to put the work into it. Actually putting some time into documenting kernel security issues will accomplish far more than complaining on mailing lists.
First off, I speak only for myself, not for the other half of the Linux -stable team, Chris Wright, who might totally disagree with me, nor for the other kernel developers who help out with the stable@kernel.org alias, nor for my current employer Novell. Also note that all of my -stable development is done on my own time, and is not part of my role at my current job.
All of that out of the way, I object to a few things stated in the original article:
A number of times, when we do -stable releases, there are no CVE numbers issued for the "security" related issues that are fixed in there. This happens when the fix is first made in Linus's tree, and is either forwarded to the stable@kernel.org alias saying, "we need to get this out now", or just by the fact that it is only later that people realize that a CVE number should be allocated.
And yes, the trend is away from explicit recognition of security issues, exactly following Linus's statement that you quote from.
It comes down to who are the users of the -stable kernel series. I personally see these kernels for two different groups of people:

- individual users and administrators who build and run their own kernels from the kernel.org sources; and
- distributions, which base their shipped kernels on the -stable series.
The first group should always update to the latest -stable kernel update as they are relying on the -stable team to always provide them the latest fixes that are known to be needed for them. Simply marking things as "security related" can be misguided, as Linus points out. The change log entries should show all users what was fixed, and if they run a machine where this code is used, then they should upgrade. It's as simple as that.
In fact, in the 2.6.25.10 release I tried to say exactly that:
How much clearer can I be? Does a user of the -stable tree, who has to be technically competent to be able to do such a thing in the first place, need to know more to decide if they need to upgrade their machines or not? It seems people are upset that I am no longer using the magic words "security fix", and that is true, I am not saying that anymore. As Linus and others have noted, marking some bugs as being "security-related" is not helpful, especially as not everyone can even agree - or sometimes even know at release time - whether a bug has security implications or not.
Also note that this release does not refer to a CVE number. This is because, as of this moment, there still is not a number assigned, despite asking the relevant groups for such an assignment. I never want to hold up a release by waiting for any such number, so I personally will just not use them in the future in -stable releases unless they are already contained in the original changelog entry in Linus's tree.
The second group, the distributions, all seem very happy with how the -stable releases are conducted. They have the capability to pick and choose from the fixes and apply them to their older kernel versions and ship them to their customers as they see fit. The distros all know what things are security related by the fact that they know and understand the code and the threat model as they have developers assigned to handle such security issues, and have done so for years.
In your summary, you state:
I think the individual developers of the kernel all know quite well what the security problems for their code are. This is backed up by the fact that these developers are the ones usually making the fix and telling the -stable team that a specific patch is needed to be added.
What you seem to be asking for is a way to somehow classify bugs and fixes in the kernel tree as "security related" or not. And that goes back to Linus's original point. To try to do so marginalizes bugs which are somehow not so designated as not worth fixing. However, if someone wants to do this work for the kernel community, and it proves to be useful over time, I'll be the first in line to say that I was wrong.
Patches and updates
Core kernel code
Filesystems and block I/O
Virtualization and containers
Benchmarks and bugs
Page editor: Jonathan Corbet
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds