Kernel development

Brief items

Kernel release status

The current 2.6 prepatch remains 2.6.13-rc3. Linus's git repository continues to accumulate patches; most of them are fixes, but there is also a set of SCSI updates and a set of cleanups for the system shutdown and reboot code.

The current -mm tree is 2.6.13-rc3-mm2. Quite a few patches have been added to -mm recently, but they are almost exclusively fixes for various problems. Andrew estimates there are over 100 patches in -mm which need to go straight into 2.6.13.

The current 2.4 prepatch is 2.4.32-pre2, released by Marcelo on July 27. It includes a small number of fixes, including one which closes a security hole.

Kernel development news

A pair of suspend2 patches

One of the outcomes from the power management summit held on July 17 was a decision to move toward merging Nigel Cunningham's suspend2 patches - at least, those which appear to make sense to the wider community. Suspend2 is an out-of-tree implementation of the suspend-to-disk and suspend-to-ram features which are so nice to have on laptop systems. The suspend2 implementation offers a number of features not found in the mainline version, including nice displays, progress bars, interruptible operation, and, it is said, greater reliability. Getting the better parts of suspend2 into the mainline seems like a clearly desirable goal. Since the summit, Nigel has posted a pair of patches which provide some clues as to what is coming, and how it will be received.

A big part of the software suspend problem is getting the system into a quiescent state before putting it on ice. To that end, processes are put into the "refrigerator," a special sort of suspended animation. When suspend time comes around, every process on the system is sent a special signal telling it that refrigeration time has come; each process, once it gets to a good stopping point, checks itself into the fridge and does not run again until after the system has been resumed.

The problem that this scheme runs into is that some processes are dependent on others. If, for example, a process which is involved with getting data written to disk is refrigerated too early, it may never be possible to get the system to a state where it can be suspended. So the software suspend patches try to figure out which processes must be allowed to continue running while the system is being quiesced. It has always been a bit of a hit-and-miss business. The current suspend2 patches try to make that selection a bit cleaner.

Many of the processes which should not be refrigerated are associated with various driver workqueues. So the mainline suspend code marks every workqueue process with the special PF_NOFREEZE flag, keeping it out of the clutches of the refrigerator. But most of those processes can be refrigerated just fine with no ill effect, and they should be. Having unneeded processes running when the system is trying to suspend itself can only serve to destabilize the entire situation.
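
The effect of such a flag can be illustrated with a small user-space model. This is only a sketch and not kernel code: the structure, the flag values, and the fridge_toy_freeze_pass() function are hypothetical names invented here, and only the idea of a per-task "do not freeze" flag comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this toy model; not the kernel's definitions. */
#define FRIDGE_TOY_NOFREEZE 0x1u  /* task must be skipped by the refrigerator */
#define FRIDGE_TOY_FROZEN   0x2u  /* task has checked itself into the fridge */

struct fridge_toy_task {
    const char *name;
    unsigned int flags;
};

/*
 * One pass of a simplified refrigerator: freeze every task which is not
 * marked FRIDGE_TOY_NOFREEZE.  Returns the number of tasks left running.
 */
static size_t fridge_toy_freeze_pass(struct fridge_toy_task *tasks, size_t n)
{
    size_t running = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].flags & FRIDGE_TOY_NOFREEZE)
            running++;                            /* exempt: keeps running */
        else
            tasks[i].flags |= FRIDGE_TOY_FROZEN;  /* put on ice */
    }
    return running;
}
```

In this model, marking every workqueue thread with the exempt flag means that the return value grows with the number of workqueues, which is the "unneeded processes still running" situation described above.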

Previous versions of the suspend2 patches changed the workqueue API so that every creator of a workqueue had to explicitly state whether it should be refrigerated or not. That approach worked, but it broke every create_workqueue() call. The current patch, instead, leaves the existing calls alone, but extends the API with a couple of new calls:

    struct workqueue_struct *create_nofreeze_workqueue(const char *name);
    struct workqueue_struct *create_nofreeze_singlethread_workqueue(const char *name);

As an aside, one notes that the kernel namespace is starting to acquire some very long function names. One might almost wish for the good old days, when only the first six characters of a function name were used.

Seriously, however, these functions show how refrigeration is now handled with workqueues. By default, worker tasks associated with workqueues will be put on ice when the system is suspended. Anybody wishing to create a workqueue which does not behave that way must call one of the new functions.

This change has been propagated down to the generic kernel threads layer, which also picked up a new function:

    struct task_struct *kthread_nofreeze_create(int (*fn)(void *data),
                                                void *data,
                                                const char *namefmt, ...);

This patch seems likely to be merged with, at most, minor tweaks. Nigel's second patch, however, got a somewhat less friendly reception. It creates a new process flag called PF_SYNCTHREAD. Any process which is actively trying to flush data to disk is marked with this flag; the end result is that it will be passed over by the refrigerator during the early part of the suspend process. In this way, processes which are creating dirty pages can be put on hold prior to those which are trying to clean those pages up. This patch is not popular, however; it has been criticized for being overly intrusive when simply flushing all pages to disk prior to beginning the suspend process would do the trick. So, unless things change, this patch will not go in.
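
The two-stage ordering which PF_SYNCTHREAD is meant to achieve can be sketched in user space. What follows is a toy model under stated assumptions, not the posted patch: the structure, the flag values, and sync_toy_freeze() are hypothetical names, and the real refrigerator works by signalling tasks rather than by setting bits in an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
#define SYNC_TOY_SYNCTHREAD 0x1u  /* task is actively flushing data to disk */
#define SYNC_TOY_FROZEN     0x2u  /* task is in the refrigerator */

struct sync_toy_task {
    const char *name;
    unsigned int flags;
};

/*
 * Freeze tasks in one of two phases.  With skip_syncers nonzero (the early
 * phase), tasks marked SYNC_TOY_SYNCTHREAD are passed over so that they can
 * keep cleaning pages; with skip_syncers zero (the late phase), everything
 * still running is frozen.  Returns the number of tasks still running.
 */
static size_t sync_toy_freeze(struct sync_toy_task *tasks, size_t n,
                              int skip_syncers)
{
    size_t running = 0;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].flags & SYNC_TOY_FROZEN)
            continue;                       /* already in the fridge */
        if (skip_syncers && (tasks[i].flags & SYNC_TOY_SYNCTHREAD)) {
            running++;                      /* let it finish writing */
            continue;
        }
        tasks[i].flags |= SYNC_TOY_FROZEN;
    }
    return running;
}
```

Calling sync_toy_freeze() first with skip_syncers set to 1 and then with 0 stops the tasks which dirty pages before the tasks which write them out. The objection noted above is that a single flush before freezing anything might make the first phase unnecessary.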

In any case, these patches are just preparatory work for a larger event: the merging of a new refrigerator implementation. That code has not (recently) been posted; stay tuned.

Kernel testing and regressions: an example

Kernel testing, or the lack thereof, is considered to be a significant part of the kernel quality problem. Recent kernels, while quite good in many regards, contain more bugs than they should because people have not gotten around to testing them before the final release. Many regressions are in device drivers, which present special testing problems: drivers can only be tested by people who have the relevant hardware. Core kernel code, however, is hardware independent and should be easier to test. But bugs can slip through in that code as well.

Consider, for example, the realtime rlimits feature, which can be used to enable otherwise unprivileged users to run processes with elevated priority. Andreas Steinmetz recently noticed that this feature does not work in the 2.6.13-rc3 kernel. This would seem to be just the sort of feedback the process needs: a user, testing a feature in a -rc kernel, found a bug and provided a patch to fix it. As a result, that particular bug will not be present in 2.6.13.

The only problem is that, as confirmed by Ingo Molnar, the bug is a little older than that. In fact, the realtime resource limit feature does not work at all in the stable 2.6.12 kernel, and nobody noticed until now. This is a feature which can be tested by just about anybody, but that work clearly had not been done. Given that nobody appears to be using this feature, Ingo is not confident that the fix can go into a 2.6.12 stable release; this one will have to wait for 2.6.13.

It should be said that testing realtime resource limits is not an entirely straightforward operation; setting that limit requires changes to the PAM library, C library, and the shells as well. Very few distributions - and no major ones - are shipping those changes at this time. Even so, unprivileged realtime scheduling is a feature that a number of people had been asking for. It is a little surprising that none of those people noticed that it failed to work in a major kernel release. Getting comprehensive testing coverage for the kernel is clearly still a problem - even before drivers are taken into account.
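
For those who would like to poke at the feature, a check along the following lines may be a starting point. It is a minimal sketch which assumes that the system headers expose the limit as RLIMIT_RTPRIO (a name which does not appear above) and that the limit acts as a ceiling on the realtime priority a task may request; the helper names rtprio_permitted() and rtprio_soft_limit() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Return 1 if a soft limit of `soft` would permit an unprivileged task to
 * request realtime priority `prio`, 0 otherwise.  A limit of zero permits
 * no realtime priority at all.
 */
static int rtprio_permitted(rlim_t soft, int prio)
{
    if (prio <= 0)
        return 0;                 /* not a realtime priority */
    if (soft == RLIM_INFINITY)
        return 1;
    return (rlim_t)prio <= soft;
}

/*
 * Fetch the calling process's soft realtime-priority limit.  Returns 0 on
 * success, or -1 if the call fails or the headers lack RLIMIT_RTPRIO.
 */
static int rtprio_soft_limit(rlim_t *soft)
{
    assert(soft != NULL);
#ifdef RLIMIT_RTPRIO
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
#else
    return -1;                    /* limit not known to these headers */
#endif
}
```

A test program could fetch the limit with rtprio_soft_limit(), then attempt to set a realtime priority at and just above that value; on a kernel with the bug described above, even a request within the limit would presumably be refused for an unprivileged user.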

Is CKRM worth it?

Anytime your editor gives a talk on kernel development efforts, there seems to be one project which inspires scattered boos and hisses from the audience. The lucky project this year was Class-based Kernel Resource Management (CKRM). The CKRM patches have been under development for some time, and the developers involved have been pushing for inclusion. The future of the CKRM patches seems uncertain, however; there is significant opposition to them being merged.

The idea behind CKRM is to give system administrators a high degree of control over how the resources on a system are used. To that end, it puts every process into a "class," then applies rules specifying which resources are available to each class. On the classification side, CKRM includes a rule-based classification engine which can pigeonhole processes in a number of ways: its user or group IDs, the command it is running, which ports it is listening to, etc. Classification engines are pluggable, however, so a site with specific needs could write its own. It is also possible for an administrator to directly shove a process into a given class by way of a virtual filesystem interface.
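
To make the classification idea concrete, here is a toy first-match rule engine. It is a sketch under stated assumptions and not CKRM's interface: the rule structure, the wildcard convention, the class names, and rule_toy_classify() are all hypothetical, and only two of the criteria mentioned above (user ID and command) are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RULE_TOY_ANY_UID (-1L)  /* hypothetical wildcard: matches every user */

struct rule_toy_rule {
    long uid;                /* RULE_TOY_ANY_UID matches any user ID */
    const char *command;     /* NULL matches any command */
    const char *class_name;  /* class assigned when the rule matches */
};

/*
 * Return the class of the first rule matching (uid, command), or "default"
 * when no rule matches.  Earlier rules take precedence over later ones.
 */
static const char *rule_toy_classify(const struct rule_toy_rule *rules,
                                     size_t n, long uid, const char *command)
{
    assert(command != NULL);
    for (size_t i = 0; i < n; i++) {
        if (rules[i].uid != RULE_TOY_ANY_UID && rules[i].uid != uid)
            continue;
        if (rules[i].command && strcmp(rules[i].command, command) != 0)
            continue;
        return rules[i].class_name;
    }
    return "default";
}
```

First-match ordering is just one possible policy; a pluggable engine of the sort described above could resolve overlapping rules in some other way.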

The controlling side regulates how much of the system each class can use. Maximum limits can be applied, in a way similar to the resource limits built into the kernel now. There is also a mechanism for specifying a "guarantee," a minimum amount of resource which will be allocated to a class. So an administrator can set things up such that the web server will not take more than half the CPU, or that the X server will always get at least 20% if it needs it.
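
The interaction between limits and guarantees can be shown with a little arithmetic. The following is a toy allocator, not a CKRM controller: the structure, the two-pass policy, and share_toy_allocate() are assumptions made for illustration, with all quantities expressed as percentages of one resource.

```c
#include <assert.h>
#include <stddef.h>

struct share_toy_class {
    const char *name;
    unsigned int guarantee;  /* minimum share, granted only if demanded */
    unsigned int limit;      /* maximum share this class may ever receive */
    unsigned int demand;     /* share the class would like right now */
    unsigned int granted;    /* result, filled in by share_toy_allocate() */
};

static unsigned int share_toy_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Divide `capacity` among the classes.  Pass 1 honors guarantees; pass 2
 * hands out what remains, in array order, up to each class's limit and
 * demand.  Returns the capacity left unallocated.
 */
static unsigned int share_toy_allocate(struct share_toy_class *c, size_t n,
                                       unsigned int capacity)
{
    unsigned int left = capacity;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int want = share_toy_min(c[i].demand, c[i].guarantee);

        want = share_toy_min(want, left);
        c[i].granted = want;
        left -= want;
    }
    for (size_t i = 0; i < n; i++) {
        unsigned int cap = share_toy_min(c[i].demand, c[i].limit);
        unsigned int extra = cap > c[i].granted ? cap - c[i].granted : 0;

        extra = share_toy_min(extra, left);
        c[i].granted += extra;
        left -= extra;
    }
    return left;
}
```

With a web server limited to 50 and demanding 90, an X server guaranteed 20 and demanding 30, and an unconstrained batch class demanding 100, this model grants 50, 30, and 20 out of a capacity of 100. A real controller would also have to decide how to share the excess fairly rather than in array order.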

That leads to another component of CKRM: controllers. Each controller manages the allocation of one specific resource in the system. CPU usage is regulated by the CPU controller; as it happens, the CKRM patches in the -mm tree do not currently include that controller. The CPU controller extends its fingers fairly deeply into the Linux scheduler, and the developers do not feel that it is ready for inclusion quite yet. In fact, the only controllers currently in -mm handle the total number of tasks and the rate at which processes can fork. Many other controllers are in development, handling resources like main memory, disk I/O bandwidth, network bandwidth, and more.

The CKRM patches are large - over 14,000 lines in -mm. They also must place hooks into many sensitive parts of the kernel in order to be able to monitor process transitions and enforce resource limits and guarantees. Any patch which digs into parts of the core kernel in this way is going to see a fair amount of scrutiny, and CKRM is no exception. In this case, many developers see CKRM as an overly complex subsystem which is aimed at the needs of the customers of one specific vendor. Most Linux users simply do not need to have such fine-grained control over resource usage on their systems.

CKRM looks like a bit of a long-term maintenance headache as well. Every subsystem which requires distributing hooks around the kernel (think of the Linux security modules, the audit subsystem, or inotify as other subsystems of this type) is essentially overlaying a new structure on top of the base kernel. Any changes to the kernel must be done carefully so that none of the overlaid structures will break. So each one of these structures makes kernel programming a little harder; it is one more thing a developer must keep in mind when making changes. Mix in the fact that most kernel developers (and testers, for that matter) will not have CKRM configured into their kernels, and it becomes clear that a subsystem like CKRM could turn out to be relatively fragile.

Supporters of CKRM see it as a useful tool for the management of larger systems (they see applications for smaller systems as well). In particular, it can be used with virtualization systems (Xen or UML, for example) to consolidate servers onto a smaller hardware base while providing appropriate resource guarantees for the guest servers. Thus, says Gerrit Huizenga, CKRM can be thought of as part of the "eco-computing movement." CKRM imposes no overhead on the system if it is configured out, and almost no overhead if it is built in but not used. Since CKRM is useful for some users, and stays out of the way for the rest, it is worth adding to the kernel.

For now, CKRM is in -mm for people to play with; Andrew Morton has noted that it is not, yet, on a path toward inclusion in the near future. He wants to see a real debate, however, and not a simple, offhand rejection:

But there's been a lot of work put into this and if we're to flatly reject the feature then the developers are owed a much better reason than "eww yuk".

So far, that reason has not been provided in any definitive way. So expect to see this topic come up again as the developers try to get a real answer on whether CKRM is headed for the mainline or not.

Older kernel history in git format

Linus has announced the availability of a git repository containing all of the kernel development history back to the beginning of the BitKeeper era. Using the new "pack" format, the entire history fits in less than 200MB of disk space - less than a single, uncompressed kernel source tree. This history is not yet tied into the current mainline, though there are ways to stitch it all together. Note that this history is obtained by way of the CVS repository; some information is lost by taking this path, but potential disputes over the use of the BitKeeper metadata are avoided.

Linus's note does not say where the repository can be found; it will be on your favorite kernel.org mirror under /pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git.

Page editor: Jonathan Corbet

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds