Brief items
The current development kernel is 2.6.0-test4; Linus has not
released any kernels since August 22.
That situation may have changed by the time you read this, however; Linus
is back from his vacation and has merged a great many changes into his
BitKeeper tree. Patches there include
a reworked de4x5 driver, "very basic" VIA 8237 serial ATA controller
support, a set of MODULE_ALIAS() calls (see below), support for a
software-implemented hard disk activity LED, Intel High Precision
Event Timers support, Al Viro's first set of large dev_t
support patches (covered here last week),
some IDE work, a large USB update, lots of network driver fixes, a new set
of iptables modules, and many other fixes.
The current stable kernel is 2.4.22. Marcelo released the second 2.4.23 prepatch on
August 30; it contains a set of IDE patches, some
USB and networking fixes, and a number of other updates.
2.4.23-pre3 followed on September 3,
with a fair amount of networking work, a backport of the 2.6
request_firmware() interface, a DRI/DRM update, gcc-3.3 support, and
various other fixes.
Marcelo also notes that he has left Conectiva; his kernel work is now being
supported by Cyclades.
Kernel development news
The Class-based Kernel Resource Management (CKRM) project is an effort at
IBM to provide the hooks for better control over resource consumption by
processes. The CKRM project sees the existing resource management tools
(nice, ulimit) as not being up to the task. So the CKRM hackers have set
out to provide a whole new infrastructure for process control. The ideas
were presented at the Ottawa Linux Symposium last July; now, the first set
of patches has been posted. The overview posting describes the other
patches in the set and gives some pointers to further information.
The core concept behind CKRM is the division of processes into distinct
classes, each of which has a separate set of policies applied to it. A
kernel API has been provided which enables the loading of classifier
modules, enabling different sites to have entirely different ways of
classifying processes. Most would likely stick with the rule-based classifier, which is provided with
the CKRM patch set; it allows
classification based on various task structure fields. So, for example,
processes can be classified based on their UID, which program they are
running, etc.
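A rule-based classifier can be thought of as an ordered list of match rules, where the first rule that fits a task decides its class. The sketch below illustrates only that idea; the structures, field names and the "first match wins, else default class" policy are assumptions made for illustration, not the CKRM classifier's actual interface.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified view of the task attributes a classifier inspects. */
struct task_info {
    uid_t uid;
    const char *exe;   /* program the task is running */
};

/* One rule: match on UID and/or program; -1 and NULL act as wildcards. */
struct class_rule {
    long uid;          /* -1 matches any UID */
    const char *exe;   /* NULL matches any program */
    int class_id;      /* class assigned when the rule matches */
};

#define DEFAULT_CLASS 0

/* Return the class of the first matching rule, or DEFAULT_CLASS if none match. */
int classify(const struct task_info *t, const struct class_rule *rules, size_t nrules)
{
    for (size_t i = 0; i < nrules; i++) {
        const struct class_rule *r = &rules[i];
        if (r->uid != -1 && (uid_t)r->uid != t->uid)
            continue;
        if (r->exe != NULL && strcmp(r->exe, t->exe) != 0)
            continue;
        return r->class_id;
    }
    return DEFAULT_CLASS;
}
```

With rules such as "UID 0 goes to class 1" followed by "anything running /usr/sbin/httpd goes to class 2", a root shell lands in class 1, the web server in class 2, and everything else in the default class.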
Tasks can be reclassified any number of times over their lifetime. The CKRM core patch places hooks in the logical
spots where a process could change classification: when a user or group ID
is changed, when a program calls exec(), when a new process is forked,
etc. There is also a plan for a system call allowing a process to request
reclassification at any time, but that call does not appear to be present
in the current patches.
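The hook placement can be pictured as a single reclassification call made from each place where a task's identity changes, handing the task to whatever classifier module is currently registered. This is a user-space sketch under assumed names (register_classifier(), task_fork(), task_exec(), task_setuid() and the example uid_classifier() are all hypothetical), not the CKRM core patch.

```c
/* Hypothetical task state relevant to classification. */
struct task {
    int uid;
    const char *exe;
    int class_id;
};

/* A loadable classifier supplies one of these; NULL means none is loaded. */
typedef int (*classifier_fn)(const struct task *t);
static classifier_fn active_classifier;

void register_classifier(classifier_fn fn) { active_classifier = fn; }

/* Common path taken from every hook point. */
static void reclassify(struct task *t)
{
    if (active_classifier)
        t->class_id = active_classifier(t);
}

/* Hook points mirroring those named in the text: fork, exec, ID changes. */
void task_fork(const struct task *parent, struct task *child)
{
    *child = *parent;      /* child starts out in the parent's class... */
    reclassify(child);     /* ...but the classifier gets a say */
}

void task_exec(struct task *t, const char *exe) { t->exe = exe; reclassify(t); }
void task_setuid(struct task *t, int uid)       { t->uid = uid; reclassify(t); }

/* Example classifier: root goes to class 1, everyone else to class 2. */
int uid_classifier(const struct task *t) { return t->uid == 0 ? 1 : 2; }
```

Until a classifier registers, tasks simply inherit their parent's class; afterwards each hook re-evaluates the task, so a process that drops root privileges moves to a different class.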
Once a task is classified, the system can apply policies to that task. So,
for example, the CPU control patch enforces
CPU usage policies on processes. Essentially, each class (as a whole) can
be limited to, and guaranteed, an administrator-specified percentage of the
available processor time. To implement this policy, the
patch modifies the scheduler by creating a new run queue for each class.
Before the scheduler picks a new process to run, it first decides which
class has the highest-priority claim on the CPU. The process to run can
then be chosen from that class's queue in the usual way.
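The class-selection step might look something like the following: among classes with runnable tasks, pick the one that has used the least CPU time relative to its configured share. The "lowest usage-to-share ratio" rule is an assumption chosen for illustration; the text does not say exactly how the CKRM scheduler ranks classes, and this sketch does not enforce the upper limit.

```c
#include <stddef.h>

/* Hypothetical per-class scheduling state. */
struct cpu_class {
    unsigned share_pct;     /* administrator-specified share of CPU time */
    unsigned long used;     /* ticks consumed in the current accounting period */
    unsigned nr_runnable;   /* tasks waiting on this class's run queue */
};

/*
 * Pick the class with runnable tasks that is furthest below its share,
 * i.e. the smallest used/share ratio. Ratios are compared by
 * cross-multiplication to avoid floating point. Returns -1 if no class
 * has runnable tasks.
 */
int pick_class(const struct cpu_class *c, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (c[i].nr_runnable == 0 || c[i].share_pct == 0)
            continue;
        if (best < 0 ||
            c[i].used * c[best].share_pct < c[best].used * c[i].share_pct)
            best = (int)i;
    }
    return best;
}
```

Once a class is chosen, the ordinary per-queue logic picks the task, which is the two-level structure the patch describes.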
The memory control patch, instead,
implements policies stating how much physical memory each class can use.
The patch hooks into the page reclamation code, making that code rather
more selective in how it chooses pages to kick out of main memory. Whenever
possible, the page reclaimer only chooses pages from classes which are going
over their maximum allowed share of physical memory. As memory gets
tighter, each class will be trimmed down to its minimum share, as set up by
the administrator. If there is no real pressure on memory, however,
processes are allowed to grow beyond the bounds set for their class.
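The reclaim preference described above (over-maximum classes first, then classes above their minimum as pressure rises) can be sketched as a victim-class selector. The page counts, the "largest excess first" ordering and the single heavy_pressure flag are simplifying assumptions; the real patch works inside the kernel's page reclamation code. The selector only runs when reclaim is needed at all, which is why classes may exceed their maximum when memory is plentiful.

```c
#include <stddef.h>

/* Hypothetical per-class memory accounting, in pages. */
struct mem_class {
    unsigned long pages;    /* physical pages currently charged to the class */
    unsigned long min;      /* guaranteed minimum */
    unsigned long max;      /* allowed maximum */
};

/*
 * Choose the class to take a page from. Classes over their maximum are
 * preferred (largest excess first); under heavy pressure, classes above
 * their minimum become eligible too. Returns -1 if no class qualifies.
 */
int pick_victim(const struct mem_class *c, size_t n, int heavy_pressure)
{
    int best = -1;
    unsigned long best_excess = 0;

    for (size_t i = 0; i < n; i++)
        if (c[i].pages > c[i].max && c[i].pages - c[i].max > best_excess) {
            best = (int)i;
            best_excess = c[i].pages - c[i].max;
        }
    if (best >= 0 || !heavy_pressure)
        return best;

    /* Tighter memory: trim classes down toward their minimum share. */
    for (size_t i = 0; i < n; i++)
        if (c[i].pages > c[i].min && c[i].pages - c[i].min > best_excess) {
            best = (int)i;
            best_excess = c[i].pages - c[i].min;
        }
    return best;
}
```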
The memory control problem is complicated by shared pages: what happens
when pages are shared between processes in different classes? The
documentation on the CKRM web site describes an elaborate mechanism where
classes are set up in a hierarchy and shared pages are divided across the
appropriate parts of that hierarchy. What the current code appears to do,
however, is to simply assign shared pages to the class with the largest
share of physical memory.
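The simple rule the current code appears to follow can be expressed in a few lines. Here "share" is taken to mean each class's configured share of physical memory, indexed by class ID; that interpretation, and the function name charge_shared_page(), are assumptions for illustration.

```c
#include <stddef.h>

/*
 * Given the classes of all tasks mapping a shared page, charge the page
 * to the class with the largest share. class_share[] is indexed by class
 * ID. Returns -1 if the page has no mappers.
 */
int charge_shared_page(const int *mapper_classes, size_t nmappers,
                       const unsigned *class_share)
{
    int owner = -1;
    for (size_t i = 0; i < nmappers; i++) {
        int c = mapper_classes[i];
        if (owner < 0 || class_share[c] > class_share[owner])
            owner = c;
    }
    return owner;
}
```

The appeal of this rule is that it needs no hierarchy at all; the cost is that the largest class absorbs the accounting for pages other classes also use.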
The CKRM team also describes mechanisms which allow control over the disk
I/O bandwidth used by each class and the number of incoming network
connections each class can be handling at a given time. The I/O
limitations are implemented by adding per-class queues to the disk I/O
scheduler and merging requests into a single dispatch queue with the
bandwidth policies taken into account. The networking policies involve the
creation of yet another set of class-specific queues; in this case,
incoming connections are divided into classes through the use of iptables
rules. Patches for I/O bandwidth and incoming network connection control
have not been released at this time, however.
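Since the I/O patches have not been released, the exact merging policy is unknown. One common way to merge per-class queues into a single dispatch queue according to bandwidth shares is deficit round robin; the sketch below uses that technique purely as an illustration, and the structure layout, weights and quantum are all assumptions.

```c
#include <stddef.h>

#define MAX_REQS 16

/* Hypothetical per-class I/O queue holding pending request sizes in sectors. */
struct io_class {
    unsigned weight;              /* relative bandwidth share */
    unsigned long deficit;        /* credit carried between rounds */
    unsigned reqs[MAX_REQS];      /* FIFO of request sizes; append at tail */
    size_t head, tail;
};

/*
 * One deficit-round-robin pass: each class with pending work earns
 * quantum * weight sectors of credit and moves requests into the shared
 * dispatch queue while its credit covers them. Each dispatched entry
 * records the class index. Returns the number of entries written.
 */
size_t drr_round(struct io_class *c, size_t n, unsigned quantum,
                 int *dispatch, size_t dispatch_cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].head == c[i].tail) {
            c[i].deficit = 0;     /* idle classes do not hoard credit */
            continue;
        }
        c[i].deficit += (unsigned long)quantum * c[i].weight;
        while (c[i].head != c[i].tail && out < dispatch_cap &&
               c[i].reqs[c[i].head] <= c[i].deficit) {
            c[i].deficit -= c[i].reqs[c[i].head];
            c[i].head++;
            dispatch[out++] = (int)i;
        }
    }
    return out;
}
```

With weights of 3 and 1 and equal-sized requests, the first class gets three requests into the dispatch queue for every one from the second, roughly matching a 75/25 bandwidth split.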
CKRM is clearly a work in progress; much of the structure is in place, but
not everything has been implemented and the code is full of "this needs to
be cleaned up" comments. The CKRM hackers hope to get their work into 2.7,
however, so they have some time yet to work things into shape.
One of the surprises in 2.6.0-test4 was the merge of a pile of power
management patches from Patrick Mochel. The patches themselves were not a
surprise; their arrival has been expected for some time. In fact, at the
Ottawa Linux Symposium, Patrick had promised to try to get them in by
August 20. The surprising part is that they went straight to Linus,
with no prior appearance on linux-kernel.
The -test4 patches made a number of changes. Perhaps the most significant
was the move of the device suspend() and resume()
methods out of the device structure and into the bus_type
structure. Bus-level code now is explicitly responsible for handling power
management operations on devices attached to the bus.
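The shape of the change can be shown with simplified stand-ins for the kernel structures: the power management methods live in the bus type, and the core calls through the device's bus. This is a user-space mock, not the real kernel headers; the helper names device_suspend_one() and device_resume_one() and the demo bus are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (not the real headers). */
struct device;

struct bus_type {
    const char *name;
    int (*suspend)(struct device *dev, unsigned state);
    int (*resume)(struct device *dev);
};

struct device {
    const char *name;
    struct bus_type *bus;
    unsigned power_state;   /* 0 = fully on */
};

/* Core power management code delegates to the device's bus. */
int device_suspend_one(struct device *dev, unsigned state)
{
    if (dev->bus && dev->bus->suspend)
        return dev->bus->suspend(dev, state);
    return 0;   /* no bus method: nothing to do */
}

int device_resume_one(struct device *dev)
{
    if (dev->bus && dev->bus->resume)
        return dev->bus->resume(dev);
    return 0;
}

/* Example bus implementation that just records the requested state. */
static int demo_bus_suspend(struct device *dev, unsigned state)
{
    dev->power_state = state;
    return 0;
}

static int demo_bus_resume(struct device *dev)
{
    dev->power_state = 0;
    return 0;
}

struct bus_type demo_bus = { "demo", demo_bus_suspend, demo_bus_resume };
```

The bus code, which knows how devices of its type are powered down, is now the only place that needs to implement the operations.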
Also changed in -test4 was the software suspend code; this code has been
massively reworked and cleaned up. A number of panic() calls have
been removed, requirements have been made explicit, the underlying
mechanisms are more flexible, and the code is somewhat easier to read. The
only problem is that, in -test4, software suspend also no longer works.
The various problems which were introduced are being fixed, but one kernel
developer in particular - the 2.6 software suspend maintainer - has been
very loud in his criticisms and complaints. As a result, Patrick has
stated that he will no longer go anywhere near the software suspend code.
He evidently has his own implementation which he has chosen not to merge so
far; it may put in an appearance in the near future.
Patrick also took some grief for the removal of /proc/acpi/sleep,
which no longer fits well into the power management structure. It is,
however, an interface which has been present for a while, so its removal
can break user-space programs.
Given all that, it is perhaps not surprising that Patrick announced his next set of changes on
linux-kernel before sending them off to Linus. With these changes, the
various suspend states all work with ACPI - at least, on a system without
much going on. There is still a lot of work to do, especially with regard
to adding driver support. But things appear to be heading in the right
direction.
The new set of patches restores /proc/acpi/sleep, along with the older
software_suspend() function (now a wrapper around the current
pm_suspend() function). A number of software suspend
improvements have been added, and various other aspects of the code
have been cleaned up. With one exception, the developers are not
complaining about the new power management code. With luck, one of the
remaining 2.6 rough edges will soon be smoothed out.
The Linux kernel has long had the capability to load modules on demand when
external events make their presence necessary. In many cases, the kernel
knows exactly which module is required, and can simply ask for it by name.
So, for example, the IDE subsystem can call:
request_module("ide-cd");
should it encounter a CD needing a driver. In many cases, however, the
kernel does not know exactly which module should be loaded; in these cases
it punts the question into user space. So, for example, if a user program
tries to open a block device node with major number 100, and no driver has
registered that number, the kernel will try to load a module called
block-major-100. The job of finding a module then falls on
modprobe, which will expect to find an alias line in
/etc/modules.conf telling it what module should really be loaded.
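The division of labor can be illustrated in user space: the kernel side builds a generic name such as block-major-100, and the modprobe side maps it to a real module via an alias line, or uses the name as-is when it already names a module (as with ide-cd). The alias table and the module name "mydriver" are hypothetical stand-ins for /etc/modules.conf contents.

```c
#include <stdio.h>
#include <string.h>

/* A tiny stand-in for "alias" lines in /etc/modules.conf. */
struct alias { const char *name; const char *module; };

static const struct alias conf_aliases[] = {
    { "block-major-100", "mydriver" },   /* hypothetical entry */
};

/* Build the name requested when no driver has claimed a block major. */
void block_major_name(char *buf, size_t len, int major)
{
    snprintf(buf, len, "block-major-%d", major);
}

/* Resolve a requested name modprobe-style: alias if present, else literal. */
const char *resolve_module(const char *requested)
{
    for (size_t i = 0; i < sizeof(conf_aliases) / sizeof(conf_aliases[0]); i++)
        if (strcmp(conf_aliases[i].name, requested) == 0)
            return conf_aliases[i].module;
    return requested;    /* e.g. "ide-cd" is already a real module name */
}
```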
The only problem with this scheme is that device drivers usually already
know which device numbers they are prepared to support. Adding
configuration information to /etc/modules.conf is, at best,
redundant. It can also be misleading; the poor administrator who tries to
connect a driver to a different device number via modules.conf is
unlikely to experience much joy.
When the new module loader was added - almost one year ago, now - it
included a new MODULE_ALIAS macro. The purpose of this macro is
to allow driver authors to specify directly which aliases the module should
be responsible for. It is an idea that makes sense, but uptake has been
slow; a quick grep of the 2.6.0-test4 source shows that there is not a
single use of MODULE_ALIAS in the kernel tree.
That situation appears to be about to change, now that Rusty Russell has
released a set of patches which insert actual MODULE_ALIAS calls
into the kernel source. The actual variants used depend on the subsystem;
block drivers use MODULE_ALIAS_BLOCKDEV, for example, while char
devices use MODULE_ALIAS_CHARDEV or MODULE_ALIAS_MISCDEV
and network protocols use MODULE_ALIAS_NETPROTO.
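With aliases carried inside the modules themselves, the lookup becomes a matter of matching the requested name against each module's alias patterns. The sketch below assumes shell-style wildcard patterns matched with POSIX fnmatch(); the module table and the "block-major-N-*" pattern format are illustrative assumptions rather than the exact strings Rusty's patches generate.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Aliases a module might carry in its module info (hypothetical data). */
struct module_aliases {
    const char *module;
    const char *patterns[4];   /* NULL-terminated list of alias patterns */
};

static const struct module_aliases modinfo_db[] = {
    { "floppy", { "block-major-2-*", NULL } },
    { "loop",   { "block-major-7-*", NULL } },
};

/* Return the index of the first module whose alias matches, or -1. */
int lookup_alias(const char *request)
{
    for (size_t i = 0; i < sizeof modinfo_db / sizeof modinfo_db[0]; i++)
        for (const char *const *p = modinfo_db[i].patterns; *p; p++)
            if (fnmatch(*p, request, 0) == 0)
                return (int)i;
    return -1;
}
```

Because the patterns come from the driver, the device numbers stay in one place, and no modules.conf entry is needed for these cases.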
There are still situations which require alias commands in the
modules.conf file: there is no way for a driver author to know
which module should be loaded to implement eth0, for example. But
many of the boilerplate aliases can eventually be removed. Internal
alias support has been present in module-init-tools for some time, so all
that's needed before the alias commands can be cleaned up is to
get rid of all those legacy 2.4 (and prior) kernels.
Patches and updates
Kernel trees
- Andrew Morton: 2.6.0-test4-mm5. "Dropped out Con's CPU scheduler work, added Nick's. This is to help us
in evaluating the stability, efficacy and relative performance of Nick's
work."
(September 3, 2003)
- Andrea Arcangeli: 2.4.22aa1.
(September 2, 2003)
Core kernel code
- Con Kolivas: O19int.
(August 29, 2003)
- Con Kolivas: O20int.
(September 3, 2003)
Page editor: Jonathan Corbet