Kernel development

Brief items

Kernel release status

The current 2.6 development kernel is 2.6.26-rc8, released by Linus on June 24. "It hasn't been a week, I know, and this is a pretty small set of changes since -rc7, but I'm going to be mostly incommunicado for the next week or so, so I just released what will hopefully be the last -rc." See the long-format changelog for all the details.

2.6.26-rc7 was released on June 20; it contains another set of fixes and support for some new graphics cards.

As of this writing, no patches have been merged into the mainline git repository since 2.6.26-rc8.

The current stable 2.6 kernel is 2.6.25.9, released on June 24. It contains a small set of fixes, a couple of which have security implications. 2.6.25.8 was released on June 21 with about a dozen fixes.

Comments (3 posted)

Kernel development news

Quotes of the week

The problem with leaving everything tweakable is that you're asking users to make choices about things but not giving them the information they need to make those choices. Whether you get a power saving from hard drive spindown depends on whether the drive is idle for long enough to save the power you'll spend spinning it back up. Get it wrong and you'll be putting your drive under extra load, reducing performance and consuming more power than you were to begin with.
-- Matthew Garrett

If somebody wants to play with it, go wild. I didn't do "change_bit()", because nobody sane uses that thing anyway. I guarantee nothing. And if it breaks, nobody saw me do anything. You can't prove this email wasn't sent by somebody who is good at forging smtp.
-- Linus Torvalds

Look at it this way: there is no way in which the reviewer of this patch (ie: me) can work out why this function exists. Hence there will be no way in which future readers of this code will be able to work out why this function exists either. This is bad. These things should be described in code comments and in the changelog (whichever is most appropriate).
-- Andrew Morton

Comments (5 posted)

A position statement on closed-source kernel modules

A position statement on the distribution of closed-source kernel modules has been issued and signed by a long list of developers. "We, the undersigned Linux kernel developers, consider any closed-source Linux kernel module or driver to be harmful and undesirable. We have repeatedly found them to be detrimental to Linux users, businesses, and the greater Linux ecosystem. Such modules negate the openness, stability, flexibility, and maintainability of the Linux development model and shut their users off from the expertise of the Linux community. Vendors that provide closed-source kernel modules force their customers to give up key Linux advantages or choose new vendors. Therefore, in order to take full advantage of the cost savings and shared support benefits open source has to offer, we urge vendors to adopt a policy of supporting their customers on Linux with open-source kernel code." Click below for associated information and the names of the signatories.

Full Story (comments: 47)

Linux Graphics, a Tale of Three Drivers

James Bottomley has posted an essay on graphics drivers on the Linux Foundation site. "For Linux, the best way of demonstrating user satisfaction objectively is with the kerneloops project, which tracks reported problems with various kernels (an oops is something that's equivalent to a panic on Unix or blue screen on windows). For instance looking at the recently released 2.6.25 kernel one can see that both the binary Nvidia driver and binary ATI firegl driver account for positions in the top 15 oopses. If one follows the history, one finds that the binary drivers are always significant contributors to this list, whereas open source drivers appear and disappear (corresponding to people actually seeing the bugs and fixing them). This provides objective support for a significant kernel developer contention that it's harder to get fixes for binary drivers. The other bright spot is that the Intel graphics drivers rarely figure at all in the list also showing that if you want graphics to 'just work' then Intel is the one to choose."

Comments (36 posted)

A day in the life of linux-next

By Jonathan Corbet
June 23, 2008
The merge window phase of the kernel development cycle is a hectic time. Over a period of about two weeks, between 5,000 and 10,000 changesets find their way into the mainline git repository. Simply managing that many patches would be hard enough, but the job is made more complicated by the fact that these changesets are not all independent of each other. The first changes to be merged can change the code base in ways that cause later patches to fail to apply. So merge windows have traditionally required maintainers to rework their queued patches to resolve conflicts which arise as other trees are merged. Given the tight time constraints (patches which aren't ready when the merge window closes generally sit out until the next cycle starts), this integration process has been known to put a fair amount of pressure on subsystem maintainers.

The other person feeling the stress was Andrew Morton; one of his many jobs was to bash subsystem trees together in his -mm releases. That took a lot of his time and didn't really solve the problem in the end; much of the work which shows up in -mm isn't necessarily intended for the next development cycle. The end result was that each merge window brought together large amounts of code which had never been integrated before.

Back in February, the linux-next tree was announced as a way to help ease some of these problems. We are now nearing the end of the first full development cycle to use linux-next, so it's worth taking a look to see how it is working out.

The idea behind this tree is relatively simple. Linux-next maintainer Stephen Rothwell keeps a list of trees (maintained with git or quilt) which are intended to be merged in the next development cycle. As of this writing, that list contains 95 trees, all full of patches aimed at 2.6.27. Once a day, Stephen goes through the process of applying these trees to the mainline, one at a time. With each merge, he looks for merge conflicts and build failures. The original plan for linux-next stated that trees causing conflicts or build failures would simply be dropped. In reality, so far, Stephen usually takes the time to figure out the problem; he'll then fix up or drop an individual patch to make everything fit again.

When this process is done, he releases the result as the linux-next tree for the day. Others then grab it and perform build testing on it; some people even boot and run the daily linux-next releases. All this results in a steady stream of problem reports, small fixes, patches moving from one tree to another, and so on - various bits of integration work required to make all of the pieces fit together nicely.

There is an interesting sort of implicit hierarchy in the ordering of the trees. Subsystem trees which are merged early in the process are less likely to run into conflicts than those which come later. When two trees do come into conflict, it's the owner of the later tree - the one which actually shows the conflict - who feels the most pressure to fix things up. The history so far, though, shows that there has been very little in the way of finger-pointing when conflicts arise, as they do almost every day. All of the developers understand that they are working on the same kernel, and they share a common interest in solving problems.

So, thus far, linux-next appears to be functioning as intended. It is serving as an integration point for the next kernel and helping to get many of the merging problems out of the way ahead of time. One aspect of this whole system remains untested, though: the movement of patches from linux-next into the mainline. As things stand now, there is no automatic movement between the trees; instead, maintainers will send their pull requests directly to Linus as always. If Linus refuses to merge certain trees, or if he merges them in an order different from their ordering in linux-next, integration problems could return. In the end, it seems likely that linux-next will have to drive the final integration process more than is anticipated now, but it will probably take a few development cycles to figure out how to make it all work.

Meanwhile, anybody who is interested in 2.6.27 can, to a great extent, run it now by grabbing linux-next. This tree has clarified one aspect of the development process: the 2-3 month "development cycle" run by Linus is, in fact, just the tip of the kernel development iceberg. It is the final integration and stabilization stage. Linux-next nearly doubles the length of the visible development cycle by assembling the next kernel long before Linus starts working on it. And even linux-next only comes into play toward the end of a patch's life.

In the past, Linus has pointedly worked to avoid overlapping the development and stabilization phases of the development cycle. There was no development tree at all for almost a year while 2.4 was beaten into reasonable shape. This separation was maintained out of a simple fear that an open development tree would distract developers from the more important task of finding and fixing bugs in the current stable release.

That separation is a thing of the past now; there are literally dozens of development trees which are open for business at all times. That can only be worrisome to those who are concerned about the quality of kernel releases; why should developers concern themselves with 2.6.26 bugs when 2.6.27 is being assembled and 2.6.28 is already on the radar? Whether such concerns are valid is likely to be a matter of ongoing debate.

Meanwhile, however, linux-next appears to have settled in as a long-term feature of the kernel development landscape. It is serving its purpose as a place to find and resolve integration problems; it has also had the effect of taking much of that integration work off of Andrew Morton's shoulders. And that, in turn, should free him to spend more time trying to get developers to fix all those bugs.

(See the linux-next wiki for more information on how to work with this tree).

Comments (4 posted)

What's AdvFS good for?

By Jonathan Corbet
June 25, 2008
On June 23, HP announced that it was releasing the source for the "Tru64 Advanced Filesystem" (or AdvFS) under version 2 of the GPL. This is, clearly, a large release of code from HP. What is a bit less clear is what the value of this release will be for Linux. In the end, that value is likely to be significant, but it will probably be realized in relatively indirect and difficult-to-measure ways.

AdvFS was originally developed by Digital Equipment Corporation for its version of Unix; HP picked it up when it acquired Compaq, which had acquired DEC in 1998. This filesystem offers a number of the usual features. It is intended to be a high-performance filesystem, naturally. Extent-based block management and directory indexes are provided. It does journaling for fast crash recovery. There is an undelete feature. AdvFS is also designed to work in clustered environments.

Much of the thought that went into AdvFS was concerned with avoiding the need to take the system down. There is a snapshot feature which can be used to make consistent backups of running systems. Defragmentation can be done online. There is a built-in volume management layer which allows storage devices to be added to (or removed from) a running filesystem; files can also be relocated across devices. The internal volume manager can perform striping of files across devices, but nothing more advanced than that; AdvFS will happily work on top of a more capable volume manager, though.

There are a few things which AdvFS does not have. There is no checksumming of data and, thus, no ability to catch corruption. Online filesystem integrity checking does not appear to be supported. The maximum filesystem size (16TB) probably seemed infinite in the early 1990s, but it is starting to look a little tight now. In general, AdvFS looks like something which was a very nice filesystem ten or fifteen years ago, but it has little that is not either available in Linux now or in the works for the near future. And AdvFS doesn't even work with Linux - no porting effort has been made, and it's not clear that one ever will be. So is this release just another dump of code being abandoned by its corporate owner?

One could make a first answer by saying that, even if this were true, it would still be welcome. If a company gives up on a piece of code, it's far preferable to put it out for adoption under the GPL than to let it rot until nobody can find it anymore. But there may well be value in this release.

Even if there is no point in trying to make it work under Linux, the AdvFS code is the repository of more than a decade of experience of making a high-end filesystem work in a commercial environment. Your editor had stopped working with DEC systems by the time AdvFS came out, but the word he heard from others is that the early releases were, shall we say, something that taught administrators about the value of frequent backups. But after a few major releases, AdvFS had stabilized into a fast, solid, and reliable filesystem. The current code will embody all of the hard lessons that were learned in the process of getting to that point.

Chris Mason, who is currently working on the Btrfs filesystem, puts it this way:

The idea is that well established filesystems can teach us quite a lot about layout, and about the optimizations that were added in response to customer demand. Having the code to these optimizations is very useful.

Having that code licensed under the GPL is especially useful: any code which is useful in its current form can be pulled quickly into Linux. And, even when the code itself cannot be used, the ideas that it embodies can be borrowed without fear. And that is exactly what HP was hoping to encourage with this release:

In case its not clear, this is a GPLv2 technology release, not an actual port to Linux. We're hoping that the code and documentation will be helpful in the development of new file systems for Linux that will provide similar capabilities, and perhaps used to make tweaks to existing file systems.

And that seems likely to happen. Over time, the best ideas and experience from AdvFS should find their way into the filesystems supported by Linux, even if AdvFS, itself, never becomes one of those filesystems. So HP has made a significant contribution to the kernel development process, one which will probably never show up in the changeset counts and other easily-obtained metrics.

(Those interested in learning more about AdvFS would be well advised to grab the documentation tarball from the AdvFS sourceforge page. The "Hitchhiker's guide" is a good starting place, though, at 229 pages, it's not for hitchhikers who prefer to travel light.)

Comments (1 posted)

Freezing filesystems and containers

By Jake Edge
June 25, 2008

Freezing seems to be on the minds of some kernel hackers these days; whether it is the northern summer or the southern winter causing it is unclear. Two recent patches posted to linux-kernel look at freezing (suspending, essentially) two different pieces of the kernel: filesystems and containers. For containers, freezing is a step along the path to being able to migrate running processes elsewhere; for filesystems, it will allow backup systems to snapshot a consistent filesystem state. Beyond the concept, the patches have little to do with each other, but each is fairly small and self-contained, so a combined look seemed in order.

Takashi Sato proposes taking an XFS-specific feature and moving it into the generic filesystem code. The patch provides an ioctl() for suspending write access to a filesystem (freezing it), along with a thawing operation to resume writes. For backup tools that snapshot the state of a filesystem or otherwise operate directly on the block device, this can ensure that the filesystem is in a consistent state.

Essentially, the patch just exports the freeze_bdev() kernel function in a user-accessible way. freeze_bdev() locks a filesystem into a consistent state by flushing the superblock and syncing the device. The patch also adds tracking of the frozen state to the state field of struct block_device. In its simplest form, freezing or thawing a filesystem would be done as follows:

    ioctl(fd, FIFREEZE, 0);   /* suspend writes to the filesystem */

    ioctl(fd, FITHAW, 0);     /* resume writes */

Here fd is a file descriptor referring to the mount point; the argument is ignored.

In another part of the patch set, Sato adds a timeout value as the argument to the ioctl(). For compatibility with XFS (whose own ioctl() is removed by a patch from David Chinner), an argument value of 1 means that no timeout is set. A value of 0 also means no timeout; any other value is treated as a pointer to a timeout value, in seconds. Since removing the XFS-specific ioctl() will break any applications that currently use it anyway, preserving the special meaning of the value 1 seems of dubious benefit.

If the timeout expires, the filesystem will be automatically thawed; this protects against a backup application that hangs or dies while the filesystem is frozen. Another ioctl() command, FIFREEZE_RESET_TIMEOUT, has been added so that an application can periodically reset its timeout while it is working. If the application deadlocks, or otherwise fails to reset the timeout, the filesystem will be thawed. A FIFREEZE_RESET_TIMEOUT issued after that has happened will return EINVAL, allowing the application to detect that the filesystem was thawed out from under it.

Moving on to containers, Matt Helsley posted a patch which reuses the software suspend (swsusp) infrastructure to implement freezing of all the processes in a control group (i.e. cgroup). This could be used now to checkpoint and restart tasks, but eventually could be used to migrate tasks elsewhere entirely for load balancing or other reasons. Helsley's patch set is a forward port of work originally done by Cedric Le Goater.

The first step is to make the freeze option, in the form of the TIF_FREEZE flag, available to all architectures. Once that is done, moving two functions, refrigerator() and freeze_task(), from the power management subsystem to the new kernel/freezer.c file makes freezing tasks available even to architectures that don't support power management.

As is usual for cgroups, controlling the freezing and thawing is done through the cgroup filesystem. Adding the freezer option when mounting will allow access to each container's freezer.state file. This can be read to get the current freezer state or written to change it as follows:

    # cat /containers/0/freezer.state
    RUNNING
    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FROZEN

It should be noted that tasks in a cgroup may be busy doing something that prevents them from being frozen; in that case, the state will read FREEZING. The freeze can then be retried by writing FROZEN again, or canceled by writing RUNNING; moving the offending tasks out of the cgroup will also allow it to be frozen. Once the state reaches FROZEN, the cgroup can be thawed by writing RUNNING.
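The retry-on-FREEZING behavior lends itself to a small helper that writes the desired state and polls until the transition completes. A sketch, assuming a freezer cgroup directory such as /containers/0 (the path and retry counts are illustrative, not part of the patch):

```python
import os
import time

def set_freezer_state(cgroup_dir, target, retries=5, delay=0.1):
    """Write target ("FROZEN" or "RUNNING") to a cgroup's freezer.state
    file, then read the state back, retrying while the kernel reports the
    transitional FREEZING state.  Returns the final state observed."""
    state_file = os.path.join(cgroup_dir, 'freezer.state')
    state = None
    for _ in range(retries):
        with open(state_file, 'w') as f:
            f.write(target)
        with open(state_file) as f:
            state = f.read().strip()
        if state != 'FREEZING':
            break                 # reached FROZEN or RUNNING
        time.sleep(delay)         # tasks still busy; retry the freeze
    return state
```

Against a real kernel with this patch applied, the cgroup filesystem would first be mounted with the freezer option; a checkpointing tool could then call set_freezer_state(dir, 'FROZEN') before saving task state and set_freezer_state(dir, 'RUNNING') afterward.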

In order for swsusp and cgroups to share the refrigerator() it is necessary to ensure that frozen cgroups do not get thawed when swsusp is waking up the system after a suspend. The last patch in the set ensures that thaw_tasks() checks for a frozen cgroup before thawing, skipping over any that it finds.

There has not been much in the way of discussion about the patches on linux-kernel, but an ACK from Pavel Machek would seem to be a good sign. Some comments by Paul Menage, who developed cgroups, also indicate interest in seeing this feature merged.

Comments (4 posted)

Patches and updates

Kernel trees

Core kernel code

Development tools

Device drivers

Documentation

Filesystems and block I/O

Janitorial

Memory management

Networking

Security-related

Virtualization and containers

Benchmarks and bugs

Page editor: Jonathan Corbet

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds