
Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.0-test4; Linus has not released any kernels since August 22.

That situation may have changed by the time you read this, however; Linus is back from his vacation and has merged a great many changes into his BitKeeper tree. Patches there include a reworked de4x5 driver, "very basic" VIA 8237 serial ATA controller support, a set of MODULE_ALIAS() calls (see below), support for a software-implemented hard disk activity LED, Intel High Precision Event Timers support, Al Viro's first set of large dev_t support patches (covered here last week), some IDE work, a large USB update, lots of network driver fixes, a new set of iptables modules, and many other fixes.

The current stable kernel is 2.4.22. Marcelo released the second 2.4.23 prepatch on August 30; it contains a set of IDE patches, some USB and networking fixes, and a number of other updates. 2.4.23-pre3 followed on September 3, with a fair amount of networking work, a backport of the 2.6 request_firmware() interface, a DRI/DRM update, gcc-3.3 support, and various other fixes.

Marcelo also notes that he has left Conectiva; his kernel work is now being supported by Cyclades.


Kernel development news

Class-based Kernel Resource Management

The Class-based Kernel Resource Management (CKRM) project is an effort at IBM to provide the hooks for better control over resource consumption by processes. The CKRM project sees the existing resource management tools (nice, ulimit) as not being up to the task. So the CKRM hackers have set out to provide a whole new infrastructure for process control. The ideas were presented at the Ottawa Linux Symposium last July; now, the first set of patches has been posted. The overview posting describes the other patches in the set and gives some pointers to further information.

The core concept behind CKRM is the division of processes into distinct classes, each of which has a separate set of policies applied to it. A kernel API has been provided which enables the loading of classifier modules, enabling different sites to have entirely different ways of classifying processes. Most would likely stick with the rule-based classifier, which is provided with the CKRM patch set; it allows classification based on various task structure fields. So, for example, processes can be classified based on their UID, which program they are running, etc.
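A rule-based classifier of this sort can be pictured with a small user-space sketch. Nothing here is taken from the CKRM patches: the task_info and class_rule structures, the classify() function, and the "first matching rule wins" semantics are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the task fields a classifier might examine. */
struct task_info {
    int uid;
    const char *comm;   /* name of the program being run */
};

/* One rule; a field left at its wildcard value matches anything. */
struct class_rule {
    int uid;            /* -1 matches any UID */
    const char *comm;   /* NULL matches any program */
    int class_id;       /* class assigned when the rule matches */
};

/* Return the class of the first matching rule, or default_class. */
int classify(const struct task_info *t,
             const struct class_rule *rules, size_t nrules,
             int default_class)
{
    assert(t != NULL && t->comm != NULL);
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].uid != -1 && rules[i].uid != t->uid)
            continue;
        if (rules[i].comm != NULL && strcmp(rules[i].comm, t->comm) != 0)
            continue;
        return rules[i].class_id;
    }
    return default_class;
}
```

In this model, reclassification simply means calling classify() again whenever one of the examined fields changes.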

Tasks can be reclassified any number of times over their lifetime. The CKRM core patch places hooks in the logical spots where a process could change classification: when a user or group ID is changed, when a program calls exec(), when a new process is forked, etc. There is also a plan for a system call allowing a process to request reclassification at any time, but that call does not appear to be present in the current patches.

Once a task is classified, the system can apply policies to that task. So, for example, the CPU control patch enforces CPU usage policies on processes. Essentially, each class (as a whole) can be both guaranteed and limited to an administrator-specified percentage of the available processor time. To implement this policy, the patch modifies the scheduler by creating a new run queue for each class. Before the scheduler picks a new process to run, it first decides which class has the highest-priority claim on the CPU. The process to run can then be chosen from that class's queue in the usual way.
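The two-level decision (pick a class first, then a process within it) might look something like the sketch below. This is not the CKRM code; the cpu_class structure, the pick_class() function, and the policy of favoring the class that has consumed the least CPU time relative to its share are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class accounting; not the CKRM data structure. */
struct cpu_class {
    unsigned long share;      /* administrator-assigned percentage */
    unsigned long used;       /* CPU ticks consumed so far */
    unsigned long nr_running; /* tasks waiting on this class's run queue */
};

/*
 * Return the index of the class with runnable tasks whose used/share
 * ratio is smallest, or -1 if no class has anything to run. Ratios are
 * compared by cross-multiplication to avoid floating point.
 */
int pick_class(const struct cpu_class *c, size_t n)
{
    int best = -1;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].nr_running == 0 || c[i].share == 0)
            continue;
        if (best < 0 ||
            c[i].used * c[best].share < c[best].used * c[i].share)
            best = (int)i;
    }
    return best;
}
```

Once a class has been picked, the ordinary scheduling logic would select a task from that class's own run queue.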

The memory control patch, instead, implements policies stating how much physical memory each class can use. The patch hooks into the page reclamation code, making that code rather more selective in how it chooses pages to kick out of main memory. Whenever possible, the page reclaimer only chooses pages from classes which are going over their maximum allowed share of physical memory. As memory gets tighter, each class will be trimmed down to its minimum share, as set up by the administrator. If there is no real pressure on memory, however, processes are allowed to grow beyond the bounds set for their class.
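As a rough illustration of that policy, the sketch below chooses which class the reclaimer should take pages from. It is a guess at the shape of the logic, not the patch itself: the mem_class structure, the pick_victim_class() function, and the single "severe" flag standing in for memory pressure are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class memory accounting, in pages. */
struct mem_class {
    unsigned long min_pages;  /* share the class is trimmed down to */
    unsigned long max_pages;  /* share the class should not exceed */
    unsigned long used_pages; /* pages currently charged to the class */
};

/*
 * Return the index of the class furthest over its limit, or -1 if no
 * class is over. Under mild pressure the limit is max_pages; under
 * severe pressure it drops to min_pages.
 */
int pick_victim_class(const struct mem_class *c, size_t n, int severe)
{
    int best = -1;
    unsigned long best_excess = 0;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long limit = severe ? c[i].min_pages : c[i].max_pages;
        if (c[i].used_pages <= limit)
            continue;
        if (c[i].used_pages - limit > best_excess) {
            best_excess = c[i].used_pages - limit;
            best = (int)i;
        }
    }
    return best;
}
```

When the function returns -1 under mild pressure, no class is over its maximum, which corresponds to the case where classes are left alone.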

The memory control problem is complicated by shared pages: what happens when pages are shared between processes in different classes? The documentation on the CKRM web site describes an elaborate mechanism where classes are set up in a hierarchy and shared pages are divided across the appropriate parts of that hierarchy. What the current code appears to do, however, is to simply assign shared pages to the class with the largest share of physical memory.

The CKRM team also describes mechanisms which allow control over the disk I/O bandwidth used by each class and the number of incoming network connections each class can be handling at a given time. The I/O limitations are implemented by adding per-class queues to the disk I/O scheduler and merging requests into a single dispatch queue with the bandwidth policies taken into account. The networking policies involve the creation of yet another set of class-specific queues; in this case, incoming connections are divided into classes through the use of iptables rules. Patches for I/O bandwidth and incoming network connection control have not been released at this time, however.

CKRM is clearly a work in progress; much of the structure is in place, but not everything has been implemented and the code is full of "this needs to be cleaned up" comments. The CKRM hackers hope to get their work into 2.7, however, so they have some time yet to work things into shape.


Power management arrives

One of the surprises in 2.6.0-test4 was the merge of a pile of power management patches from Patrick Mochel. The patches themselves were not a surprise; their arrival has been expected for some time. In fact, at the Ottawa Linux Symposium, Patrick had promised to try to get them in by August 20. The surprising part is that they went straight to Linus, with no prior appearance on linux-kernel.

The -test4 patches made a number of changes. Perhaps the most significant was the move of the device suspend() and resume() methods out of the device structure and into the bus_type structure. Bus-level code now is explicitly responsible for handling power management operations on devices attached to the bus.
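The effect of putting the methods in the bus can be seen in a simplified user-space model. The structures below borrow the names bus_type and device, but their fields and method signatures are cut down for illustration and should not be taken as the real 2.6 definitions; the demo bus and the device_suspend_one() and device_resume_one() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Simplified model: the bus, not the device, carries the PM methods. */
struct bus_type {
    const char *name;
    int (*suspend)(struct device *dev, unsigned state);
    int (*resume)(struct device *dev);
};

struct device {
    struct bus_type *bus;
    int powered;
};

/* Hypothetical bus-level handlers applied to every device on the bus. */
int demo_bus_suspend(struct device *dev, unsigned state)
{
    (void)state;        /* a real bus would choose a power state here */
    dev->powered = 0;
    return 0;
}

int demo_bus_resume(struct device *dev)
{
    dev->powered = 1;
    return 0;
}

/* The core only ever goes through the bus the device is attached to. */
int device_suspend_one(struct device *dev, unsigned state)
{
    assert(dev != NULL);
    if (dev->bus != NULL && dev->bus->suspend != NULL)
        return dev->bus->suspend(dev, state);
    return 0;
}

int device_resume_one(struct device *dev)
{
    assert(dev != NULL);
    if (dev->bus != NULL && dev->bus->resume != NULL)
        return dev->bus->resume(dev);
    return 0;
}
```

A device whose bus provides no methods is simply skipped in this model.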

Also changed in -test4 was the software suspend code; this code has been massively reworked and cleaned up. A number of panic() calls have been removed, requirements have been made explicit, the underlying mechanisms are more flexible, and the code is somewhat easier to read. The only problem is that, in -test4, software suspend also no longer works. The various problems which were introduced are being fixed, but one kernel developer in particular - the 2.6 software suspend maintainer - has been very loud in his criticisms and complaints. As a result, Patrick has stated that he will no longer go anywhere near the software suspend code. He evidently has his own implementation which he has chosen not to merge so far; it may put in an appearance in the near future.

Patrick also took some grief for the removal of /proc/acpi/sleep, which no longer fits well into the power management structure. It is, however, an interface which has been present for a while, and can thus break user-space programs.

Given all that, it is perhaps not surprising that Patrick announced his next set of changes on linux-kernel before sending them off to Linus. With these changes, the various suspend states all work with ACPI - at least, on a system without much going on. There is still a lot of work to do, especially with regard to adding driver support. But things appear to be heading in the right direction.

The new set of patches restores /proc/acpi/sleep, and the older software_suspend() function (as a wrapper for the current pm_suspend() function) as well. A number of software suspend improvements have been added. And various other aspects of the code have been cleaned up. With one exception, the developers are not complaining about the new power management code. With luck, one of the remaining 2.6 rough edges will soon be smoothed out.


MODULE_ALIAS

The Linux kernel has long had the capability to load modules on demand when external events make their presence necessary. In many cases, the kernel knows exactly which module is required, and can simply ask for it by name. So, for example, the IDE subsystem can call:

    request_module("ide-cd");

should it encounter a CD needing a driver. In many cases, however, the kernel does not know exactly which module should be loaded; in these cases it punts the question into user space. So, for example, if a user program tries to open a block device node with major number 100, and no driver has registered that number, the kernel will try to load a module called block-major-100. The job of finding a module then falls on modprobe, which will expect to find an alias line in /etc/modules.conf telling it what module should really be loaded.
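The two halves of that lookup can be modeled in user space as follows. This is only a sketch: block_module_name() and resolve_alias() are hypothetical helpers, and the table entry stands in for a line like "alias block-major-100 mydriver" in /etc/modules.conf, where mydriver is an invented module name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One alias line from the configuration file, as a table entry. */
struct alias_entry {
    const char *alias;
    const char *module;
};

/* Hypothetical equivalent of: alias block-major-100 mydriver */
static const struct alias_entry alias_table[] = {
    { "block-major-100", "mydriver" },
};

/* Kernel side of the model: build the generic name to ask for. */
int block_module_name(char *buf, size_t len, unsigned major)
{
    return snprintf(buf, len, "block-major-%u", major);
}

/* User-space side: map the generic name to a real module, if any. */
const char *resolve_alias(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++)
        if (strcmp(alias_table[i].alias, name) == 0)
            return alias_table[i].module;
    return NULL;
}
```

A name with no table entry resolves to NULL, which corresponds to modprobe finding nothing to load.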

The only problem with this scheme is that device drivers usually already know which device numbers they are prepared to support. Adding configuration information to /etc/modules.conf is, at best, redundant. It can also be misleading; the poor administrator who tries to connect a driver to a different device number via modules.conf is unlikely to experience much joy.

When the new module loader was added - almost one year ago, now - it included a new MODULE_ALIAS macro. The purpose of this macro is to allow driver authors to specify directly which aliases the module should be responsible for. It is an idea that makes sense, but uptake has been slow; a quick grep of the 2.6.0-test4 source shows that there is not a single use of MODULE_ALIAS in the kernel tree.

That situation appears to be about to change, now that Rusty Russell has released a set of patches which insert actual MODULE_ALIAS calls into the kernel source. The actual variants used depend on the subsystem; block drivers use MODULE_ALIAS_BLOCKDEV, for example, while char devices use MODULE_ALIAS_CHARDEV or MODULE_ALIAS_MISCDEV and network protocols use MODULE_ALIAS_NETPROTO.
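The general idea of such macros can be shown with a user-space imitation. The DEMO_ names below are hypothetical and are not the kernel's macros; the sketch assumes that an alias ends up as an "alias=" string embedded in the module itself, and the exact strings produced by the real MODULE_ALIAS variants may differ.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x) DEMO_STR_1(x)

/* Hypothetical imitations of the alias macros; not the kernel's own. */
#define DEMO_MODULE_ALIAS(name) "alias=" name
#define DEMO_ALIAS_BLOCKDEV(major) \
    DEMO_MODULE_ALIAS("block-major-" DEMO_STR(major))

#define DEMO_MAJOR 100

/* What a driver would embed: the preprocessor builds the whole string. */
const char demo_modinfo[] = DEMO_ALIAS_BLOCKDEV(DEMO_MAJOR);

/* Does this embedded string satisfy a request for the given name? */
int modinfo_provides(const char *modinfo, const char *requested)
{
    static const char prefix[] = "alias=";
    size_t plen = sizeof(prefix) - 1;

    assert(modinfo != NULL && requested != NULL);
    if (strncmp(modinfo, prefix, plen) != 0)
        return 0;
    return strcmp(modinfo + plen, requested) == 0;
}
```

Because the string is generated from the same constant the driver uses to register its device number, the alias cannot drift out of sync with the code the way a hand-edited configuration line can.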

There are still situations which require alias commands in the modules.conf file: there is no way for a driver author to know which module should be loaded to implement eth0, for example. But many of the boilerplate aliases can be, eventually, removed. Internal alias support has been present in module-init-tools for some time, so all that's needed before the alias commands can be cleaned up is to get rid of all those legacy 2.4 (and prior) kernels.


Patches and updates

Kernel trees

  • Andrew Morton: 2.6.0-test4-mm5. "Dropped out Con's CPU scheduler work, added Nick's. This is to help us in evaluating the stability, efficacy and relative performance of Nick's work." (September 3, 2003)
  • Andrea Arcangeli: 2.4.22aa1. (September 2, 2003)

Core kernel code

  • Con Kolivas: O19int. (August 29, 2003)
  • Con Kolivas: O20int. (September 3, 2003)


Page editor: Jonathan Corbet

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds