LWN Weekly Edition Front pageSecurity Kernel development Distributions Development Linux in the news Announcements ->One big page
This page Previous weekFollowing week |
Kernel developmentRelease status Kernel release status The 2.6.26 merge window remains open, so there is no released 2.6 development kernel. See the article below for a summary of patches merged over the last week.No stable kernel releases have been made over the last week. As of this writing, the 2.6.24.6 and 2.6.25.1 stable updates are in the review process; if all goes well, these updates should be released on May 1.
Kernel development news Quotes of the week Those who have been watching the linux-kernel list know that the 2.6.26 merge window has been a little rougher than some of those which came before. That has led to some fairly strong discussion over how changes find their way into the mainline. Here's a few selections.
I'm not saying the patch is wrong ... or that just because it broke
voyager it shouldn't be done. What I'm saying is that it shouldn't have
been put into the x86 tree without mailing list review.
-- James Bottomley
Running a git tree isn't a private fiefdom, it's a public trust; to keep the trust of other developers, you have to run the tree in a transparent fashion ... and making the mailing list the only input to it is one way of ensuring this. It also helps with review that we're all so worried about so little being done ...
But, we'd not mind at all posting 1000 x86.git patches to lkml (or
another list) every 3 months (or more frequently), if people request
that.
-- Ingo Molnar
You can post whatever patches you like a million times to lkml.
That's not the problem.
It's that the patches don't get reviewed, posting them more or to a
different place doesn't help that.
-- David Miller
Sorting x86 arch code is inevitably going to break a few eggs, but I
suspect the time cost has been more in Dave v Ingo (12 rounds, two falls,
two submissions or a knockout) than actually sorting out the fallout of a
couple of problem cases.
-- Alan Cox
So here's how we're going to fix David's problem:
-- Andrew Morton
- Everyone gets their stuff into linux-next. - Lots of people _test_ linux-next. Just once a week. Those two steps will improve the merge-window chaos a lot. Things will get better.
IMO, the merge window is way too short for actually testing anything. I rebuild
the kernel once or even twice a day and there's no way I can really test it.
I can only check if it breaks right away. And if it does, there's no time to
find out what broke it before the next few hundreds of commits land on top of
that.
-- Rafael Wysocki
And yes, there is a solution: don't develop so much. Don't allow thousands
of developers to be involved. Do a small core group, and make development
so hard or inconvenient that you only have a few tens of people who write
code, and vet them and force them to jump through hoops when adding new
features (or fixing old ones, for that matter).
-- Linus Torvalds
The 2.6.26 merge window, part 2 Since last week's summary was written, another 3700 changesets have found their way into the mainline git repository. The most significant user-visible changes include:
Changes visible to kernel developers include:
The merge window remains open; tune in next week for (what should be) the final set of changes merged for 2.6.26.
Restricting root with per-process securebits Linux capabilities have had a long and somewhat tortuous journey as part of the Linux kernel. Slowly—and very carefully—functionality is being added to this security feature to get it to a point where it is a viable alternative to the all-or-nothing setuid(0) model. A recently merged patch adds a per-process securebits feature that will allow capabilities-based daemons or subsystems to coexist with existing setuid utilities. Linux capabilities break up the privileged tasks normally associated with root (i.e. uid 0) into finer-grained abilities which can be individually granted or revoked for specific processes. The idea is to change the standard Unix model that root has all special privileges while all other users have none. The terminology is always a bit contentious, though, as Linux capabilities are derived from a POSIX proposal that was never adopted, but shares the name "capabilities" with an entirely different approach; this article is only concerned with capabilities of the Linux variety. There has long been interest in creating a Linux system that did not rely upon a single root account. Capabilities are seen as the way to get there, but they have suffered from a bit of a chicken-and-egg problem. With the recent work to add file-based capabilities and restore CAP_SETPCAP to its original meaning, a true capabilities-based system is becoming possible. In the patch, which has been merged for 2.6.26, Andrew Morgan describes the new functionality:
The feature added by this patch can be leveraged to suppress the privilege
associated with (set)uid-0. This suppression requires CAP_SETPCAP to
initiate, and only immediately affects the 'current' process (it is inherited
through fork()/exec()). This reimplementation differs significantly from the
historical support for securebits which was system-wide, unwieldy and which
has ultimately withered to a dead relic in the source of the modern kernel.
The patch removes the global securebits variable, replacing it with an entry in struct task_struct, that can be manipulated by a process, but only for itself—and any children. Morgan envisions hybrid systems that have some utilities using capabilities to get their privileges along with some setuid(0) utilities. In that scenario, a capabilities-based utility or daemon may wish to limit what its children can do, even if they execute a setuid(0) binary. As part of the evolution, process trees can be created that cannot get root privileges. Processes which have the CAP_SETPCAP capability can change their securebits setting via the prctl() system call. There are three separate bits that govern the interaction of capabilities and setuid:
prctl(PR_SET_SECUREBITS, 0x2f);
This is the equivalent of setting SECURE_NOROOT, SECURE_NO_ROOT_LOCKED,
SECURE_NO_SETUID_FIXUP, SECURE_NO_SETUID_FIXUP_LOCKED, and
SECURE_KEEP_CAPS_LOCKED.
The memory of the sendmail-capabilities bug from 2000 makes some a bit queasy—or worse—about any patches that involve capabilities and setuid. Andrew Morton asks: "what was the bug which caused us to cripple capability inheritance back in the days of yore? (Some sendmail thing?)" That bug was caused because unprivileged users could take away the CAP_SETUID capability from setuid binaries like sendmail. When sendmail then used setuid to drop its privileges, it failed, but sendmail did not check, so it was still running with full privilege. This could be leveraged by a user to gain root privileges. It was a disconnect between capabilities and the longstanding behavior of Unix-like systems when dropping privileges. Morgan has written a detailed description of the sendmail-capabilities bug in response to Morton's questions. He makes it clear that he wants to move toward full capability support without breaking existing code:
I'm basically interested in evolving the capability implementation
back to the POSIX.1e model and making it whole - but most certainly
*without crippling legacy superuser support in the process* .
As folk get more comfortable with this full capability model. I believe we can delete more cruft from the main kernel, but even that clean up will leave a fully functional legacy model in place. I feel it should be for something like init, or one of its children to be able to run subsystems in capability-only or legacy modes. Morton seemed satisfied that his concerns had been addressed, but still wonders about the future for capabilities: "So how do we ever get to the stage where we can recommend that distributors turn these things on, and have them agree with us?" This was echoed by Ismail Dönmez, who was looking for concrete examples of how to use the per-process securebits feature. Morgan provides a pointer to some examples along with his belief that sometime soon the capabilities developers will become confident enough to recommend turning off the "experimental" flag for the SECURITY_FILE_CAPABILITIES kernel configuration. That flag governs both the file-based capabilities as well as the per-process securebits. In addition, Morgan says:
More importantly I'm hopeful that in that time we'll have accumulated
enough documentation and user-space experience and examples to convince
others that this is, indeed, a viable feature to support in mainstream
distributions.
A developerWorks article on file-based capabilities by Serge Hallyn and a web page on POSIX capabilities by Chris Friedhoff were both mentioned in the thread as good references for the work being done to actually use capabilities in systems. Those pre-date the securebits work, so Dönmez was looking for use-cases for the new feature. Morgan replied that containers were one, deferring to Hallyn who has some ideas on using securebits:
We tend to talk about 'system containers' versus 'application
containers'. A system container would be like a vserver or openvz
instance, something which looks like a separate machine. I was
going to say I don't imagine per-process securebits being useful
there, but actually since a system container doesn't need to do any
hardware setup it actually might be a much easier start for a full
SECURE_NOROOT distro than a real machine. Heck, on a real machine init
and a few legacy [daemons] could run in the init namespace, while users
log in and apache etc run in a SECURE_NOROOT container.
But I especially like the thought of for instance postfix running in a carefully crafted application container (with its own virtual network card and limited file tree and no visibility of other processes) with SECURE_NOROOT on. Capabilities are an interesting, but complicated, security feature. For most of the ten years they have been part of the Linux kernel, they have either been broken, ignored, or both. With the latest work being done by Hallyn, Morgan, and others, capabilities are finally becoming a fully-working alternative to things like SELinux. It will be interesting to see if more user utilities will become capability-aware and whether distributions start using capabilities. Some day, root may just fade away.
Ksplice: kernel patches without reboots The kernel developers are generally quite good about responding to security problems. Once a vulnerability in the kernel has been found, a patch comes out in short order; system administrators can then apply the patch (or get a patched kernel from their distributor), reboot the system, and get on with life knowing that the vulnerability has been fixed. It is a system which works pretty well.One little problem remains, though: rebooting the system is a pain. At a minimum, it requires a few minutes of down time. In many situations, that down time cannot be tolerated. Reboots also disrupt any ongoing work, break existing network connections, and can cause the loss of results from long-running processes. And, most importantly of all, reboots prove traumatic for a certain subset of Linux administrators who prize a long uptime above almost all other things. Administrators currently have to choose between multi-year uptimes and security fixes; anything which frees them from a dilemma of this magnitude can only be welcome. That "anything" might just be a recently-announced project called ksplice. With ksplice, system administrators can have the best of both worlds: security fixes without unsightly reboots. An in-depth explanation of how ksplice works can be found in this document [PDF]. In short, ksplice requires as input the source tree for the running kernel and the security patch. It will then build two kernels, one with the patch and one without; the kernels are built with a special set of options which makes it easy to figure out which functions change as a result of the patch. The two kernels will be compared, with the purpose of finding those functions. Changes can propagate further than one might expect, especially if, for example, an inline function is modified. Once a list of changed functions has been made, the updated code for those functions is packaged into a kernel module and loaded into the system. Then comes the tricky part: getting the running kernel to start using the new code. That requires patching the running code, which is a risky thing to do. Ksplice starts with a call to stop_machine_run(), which dumps a high-priority thread onto each processor, thus taking control of all processors in the system. It then examines all threads in the system to ensure that none of them are running in the functions to be replaced; if so, trampoline jumps are patched into the beginning of each replaced function (they "bounce" the call to the old code into the replacement code) and life continues. Otherwise ksplice will back off and try again later. This method imposes a number of limitations. One is that only code changes can be patched in with ksplice; patches which make changes to data structures cannot be accommodated. Another comes from the retry-based approach to ensuring that no threads are running in the patched functions; what happens if one of those functions is never free? Kernel functions like schedule(), sys_poll(), or sys_waitid() are likely to always have processes running within them. In cases like this, ksplice will eventually give up and inform the user that the patch cannot be done; it is simply not possible to make changes to those particular functions. These limitations mean that, out of 50 security patches examined by the ksplice developers, eight could not be applied with ksplice. So multi-year uptimes are probably still incompatible with the application of all security patches. Even so, ksplice certainly has the potential to reduce patch-related downtime considerably. Chances are good that there will be a fair amount of interest in ksplice in sites running high-uptime, mission-critical systems. There are few things in the way of an immediate merge of this code into the mainline. One is a matter of coding quality and can be fixed. Then, there is the matter of the lead developer being unconvinced that merging this code makes sense since it is, essentially, a standalone feature. Andi Kleen's response made the (usual) reasons for merging the code clear:
To be honest you weren't the first to come up with something like
this (although you're the first to post to l-k as far as I
know). But the usual problem of something that is kept out of tree
is that it eventually bitrots and gets forgotten. The only sane way
to make such extensions a generically usable linux feature is to
merge them to mainline.
So, presumably, the code will eventually be proposed for a mainline merge. But there is one other little difficulty pointed out by Tomasz Chmielewski: Microsoft holds a patent described this way:
A system and method for automatically updating software components
on a running computer system without requiring any interruption of
service. A software module is hotpatched by loading a patch into
memory and modifying an instruction in the original module to jump
to the patch.
Microsoft came up with this novel new technique in the distant past: 2002. The posting immediately brought out a crowd of surprised graybeards who distinctly remember using such techniques on their PDP-11 systems some decades before Microsoft "invented" hot-patching. The basic claim of the patent would thus appear to be invalidated by some decades' worth of prior art, but some of the dependent claims include features (such as capturing all other processors on the system) which were unlikely to be useful on PDP-11s. Given that the kernel developers are now well aware of this patent, they must take it into account when deciding whether to accept this code into the mainline. It would not be surprising if they chose to avoid baiting the Microsoft FUD machine in this way, even if they all agreed that the patent lacked validity. So a promising technology risks being left out of the kernel as the result of a software patent which was filed at least 30 years too late.
Patches and updates Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Janitorial
Memory management
Networking
Architecture-specific
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet |
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
Powered by Rackspace Managed Hosting.