The current 2.6 prepatch remains 2.6.13-rc3. Linus's git repository
continues to accumulate patches; most of them are fixes, but there is also
a set of SCSI updates and a set of cleanups for the system shutdown and reboot code.
The current -mm tree is 2.6.13-rc3-mm2. Quite a few
patches have been added to -mm recently, but they are almost exclusively
fixes for various problems. Andrew estimates there are over 100 patches in
-mm which need to go straight into 2.6.13.
The current 2.4 prepatch is 2.4.32-pre2, released by Marcelo on July 27. It
includes a small number of fixes, one of which closes a security hole.
Kernel development news
One of the outcomes from the power management summit held on July 17
was a decision to move toward merging Nigel Cunningham's suspend2 patches -
at least, those which appear to make sense to the wider community.
Suspend2 is an out-of-tree implementation of the suspend-to-disk and
suspend-to-ram features which are so nice to have on laptop systems. The
suspend2 implementation offers a number of features not found in the
mainline version, including nice displays, progress bars, interruptible
operation, and, it is said, greater reliability. Getting the better parts
of suspend2 into the mainline seems like a clearly desirable goal. Since
the summit, Nigel has posted a pair of patches which provide some clues as
to what is coming, and how it will be received.
A big part of the software suspend problem is getting the system into a
quiescent state before putting it on ice. To that end, processes are put
into the "refrigerator," a special sort of suspended animation. When
suspend time comes around, every process on the system is sent a special
signal telling it that refrigeration time has come; each process, once it
gets to a good stopping point, checks itself into the fridge and does not
run again until after the system has been resumed.
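The refrigerator protocol can be sketched as a small user-space simulation: every task is asked to freeze, and each one keeps running until it reaches a safe stopping point, then checks itself in. Everything here is hypothetical; toy_task and toy_freeze_all() are illustrations rather than the kernel's freezer code, and the max_passes bound merely stands in for a timeout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model of the freezer; not kernel code.  Each task needs
 * some number of scheduling passes before it reaches a safe stopping point. */
struct toy_task {
    const char *name;
    int steps_to_safe_point; /* passes until the task can stop cleanly */
    bool freeze_requested;   /* stands in for the "please freeze" signal */
    bool frozen;             /* true once the task is in the refrigerator */
};

/* One scheduling pass: a task that has been asked to freeze keeps running
 * until it reaches its stopping point, then checks into the refrigerator. */
static void toy_run_once(struct toy_task *t)
{
    if (t->frozen)
        return;
    if (t->steps_to_safe_point > 0)
        t->steps_to_safe_point--;
    if (t->freeze_requested && t->steps_to_safe_point == 0)
        t->frozen = true;
}

/* Ask every task to freeze, then run passes until all of them are frozen.
 * Returns the number of passes needed, or -1 if max_passes was exceeded. */
int toy_freeze_all(struct toy_task *tasks, size_t n, int max_passes)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].freeze_requested = true;

    for (int pass = 1; pass <= max_passes; pass++) {
        bool all_frozen = true;
        for (size_t i = 0; i < n; i++) {
            toy_run_once(&tasks[i]);
            if (!tasks[i].frozen)
                all_frozen = false;
        }
        if (all_frozen)
            return pass;
    }
    return -1;
}
```

A task that never reaches its stopping point (because, say, it is waiting on something that has already been frozen) makes toy_freeze_all() give up, which is exactly the failure mode described next.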
The problem that this scheme runs into is that some processes are dependent
on others. If, for example, a process involved in getting data written to
disk is refrigerated too early in the suspend sequence, it may never be
possible to get the system to a state where it can be suspended. So the
software suspend patches try to figure out which processes must be allowed
to continue running while the system is being quiesced. It has always been
a bit of a hit-and-miss business; the current suspend2 patches try to
clean up that process.
Many of the processes which should not be refrigerated are associated with
various driver workqueues. So the mainline suspend code marks every
workqueue process with the special PF_NOFREEZE flag, keeping it
out of the clutches of the refrigerator. But most of those processes can
be refrigerated just fine with no ill effect, and they should be. Having
unneeded processes running when the system is trying to suspend itself can
only serve to destabilize the entire situation.
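A toy model makes the cost of the blanket exemption concrete: the freezer simply skips every process carrying the flag, so each needlessly exempted worker stays runnable while the system is being quiesced. The flag value, structure, and process names below are invented for illustration; the real PF_NOFREEZE definition lives in the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this toy model; the real PF_NOFREEZE value
 * is defined in the kernel's headers and is not reproduced here. */
#define TOY_PF_NOFREEZE 0x1u

struct toy_proc {
    const char *name;
    unsigned int flags;
    bool frozen;
};

/* Freeze every process that is not marked TOY_PF_NOFREEZE and return the
 * number left running.  Each survivor is still free to touch hardware or
 * dirty memory while the suspend is in progress. */
size_t toy_freeze_unexempt(struct toy_proc *p, size_t n)
{
    size_t running = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].flags & TOY_PF_NOFREEZE)
            running++;
        else
            p[i].frozen = true;
    }
    return running;
}
```

Flagging only the workers the suspend path actually depends on shrinks the returned count, which is the point of the suspend2 change.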
Previous versions of the suspend2 patches changed the workqueue API so that
every creator of a workqueue had to explicitly state whether it should be
refrigerated or not. That approach worked, but it broke every
create_workqueue() call. The current patch, instead, leaves
the existing calls alone, but extends the API with a couple of new calls:
struct workqueue_struct *create_nofreeze_workqueue(const char *name);
struct workqueue_struct *create_nofreeze_singlethread_workqueue(const char *name);
As an aside, one notes that the kernel namespace is starting to acquire some
very long function names. One might almost wish for the good old days,
when only the first six characters of a function name were used.
Seriously, however, these functions show how refrigeration is now handled
with workqueues. By default, worker tasks associated with workqueues will
be put on ice when the system is suspended. Anybody wishing to create a
workqueue which does not behave that way must call one of the new functions.
This change has been propagated down to the generic kernel threads
layer, which also picked up a new function:
struct task_struct *kthread_nofreeze_create(int (*fn)(void *data),
const char *namefmt, ...);
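The design choice here is "freeze by default, opt out explicitly," which leaves every existing create_workqueue() caller untouched. A user-space sketch of that shape follows; toy_workqueue and its constructors are hypothetical stand-ins, not the kernel's workqueue implementation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a workqueue; the real struct workqueue_struct is
 * private to the kernel and looks nothing like this. */
struct toy_workqueue {
    char name[32];
    bool freezeable;
};

static struct toy_workqueue *toy_alloc_wq(const char *name, bool freezeable)
{
    struct toy_workqueue *wq = calloc(1, sizeof(*wq)); /* zeroed, so name stays terminated */
    if (!wq)
        return NULL;
    strncpy(wq->name, name, sizeof(wq->name) - 1);
    wq->freezeable = freezeable;
    return wq;
}

/* Default constructor: existing callers keep their signature and now get
 * workers that go into the refrigerator at suspend time. */
struct toy_workqueue *toy_create_workqueue(const char *name)
{
    return toy_alloc_wq(name, true);
}

/* Opt-out constructor, mirroring the shape of create_nofreeze_workqueue(). */
struct toy_workqueue *toy_create_nofreeze_workqueue(const char *name)
{
    return toy_alloc_wq(name, false);
}
```

The kthread_nofreeze_create() variant follows the same pattern for kernel threads: a separately named constructor carries the exception, so the common call stays unchanged.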
This patch seems likely to be merged with, at most, minor tweaks. Nigel's
second patch, however, got a
somewhat less friendly reception.
It creates a new process flag called PF_SYNCTHREAD.
Any process which is actively trying to flush data to disk is marked with
this flag; the end result is that it will be passed over by the
refrigerator during the early part of the suspend process. In this way,
processes which are creating dirty pages can be put on hold prior to those which
are trying to clean those pages up. This patch is not popular, however; it
has been criticized for being overly intrusive when simply flushing all
pages to disk prior to beginning the suspend process would do the trick.
So, unless things change, this patch will not go in.
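The ordering PF_SYNCTHREAD is meant to enforce can be illustrated with a two-phase loop: producers of dirty pages are frozen first, and the threads writing those pages out go last. As before, the flag value, structure, and thread names are invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical flag bit for this toy model only. */
#define TOY_PF_SYNCTHREAD 0x2u

struct toy_thread {
    const char *name;
    unsigned int flags;
    int freeze_order;   /* 0 = still running, else the phase it froze in */
};

/* Phase 1 freezes everything that is not currently flushing data, so no
 * new dirty pages appear; phase 2 then freezes the threads that were
 * writing the existing dirty pages out.  Returns the phase-1 count. */
size_t toy_two_phase_freeze(struct toy_thread *t, size_t n)
{
    size_t first = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(t[i].flags & TOY_PF_SYNCTHREAD)) {
            t[i].freeze_order = 1;
            first++;
        }
    }
    /* In a real system the sync threads would finish their I/O here. */
    for (size_t i = 0; i < n; i++) {
        if (t[i].freeze_order == 0)
            t[i].freeze_order = 2;
    }
    return first;
}
```

The critics' alternative needs no flag at all: flush all dirty pages before freezing anything, then freeze everything in a single pass.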
In any case, these patches are just preparatory work for a larger event:
the merging of a new refrigerator implementation. That code has not
(recently) been posted; stay tuned.
Kernel testing, or the lack thereof, is considered to be a significant part
of the kernel quality problem. Recent kernels, while quite good in many
regards, contain more bugs than they should because people have not gotten
around to testing them before the final release. Many regressions are in
device drivers, which present special testing problems: drivers can only be
tested by people who have the relevant hardware. Core kernel code,
however, is hardware independent and should be easier to test. But bugs
can slip through in that code as well.
Consider, for example, the realtime rlimits feature, which can be used to
enable otherwise unprivileged users to run processes with elevated
priority. Andreas Steinmetz recently noticed that this feature does not work in the
2.6.13-rc3 kernel. This would seem to be just the sort of feedback the
process needs: a user, testing a feature in a -rc kernel, found a bug and
provided a patch to fix it. As a result, that particular bug will not be
present in 2.6.13.
The only problem is that, as confirmed by
Ingo Molnar, the bug is a little older than that. In fact, the realtime
resource limit feature does not work at all in the stable 2.6.12 kernel, and nobody
noticed until now. This is a feature which can be tested by just about
anybody, but that work clearly had not been done. Given that nobody
appears to be using this feature, Ingo is not
confident that the fix can go into a 2.6.12 stable release; this one
will have to wait for 2.6.13.
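For readers who want to exercise the feature, the C sketch below reads the soft RLIMIT_RTPRIO limit with getrlimit() and requests SCHED_FIFO with sched_setscheduler(); both are standard calls. rtprio_permits() is a hypothetical helper encoding a simplified version of the rule a working kernel applies: without CAP_SYS_NICE, a process may not request a realtime priority above its soft limit. The real kernel check has additional cases.

```c
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Read the soft RLIMIT_RTPRIO limit for the calling process.
 * Returns 0 on success, or the errno value from getrlimit(). */
int rtprio_soft_limit(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_RTPRIO, &lim) != 0)
        return errno;
    *out = lim.rlim_cur;
    return 0;
}

/* Hypothetical, simplified model of the check: a process without
 * CAP_SYS_NICE may not request a realtime priority above its soft
 * RLIMIT_RTPRIO value.  RLIM_INFINITY compares above any priority. */
bool rtprio_permits(rlim_t soft_limit, int prio, bool has_cap_sys_nice)
{
    if (has_cap_sys_nice)
        return true;
    return prio >= 0 && (rlim_t)prio <= soft_limit;
}

/* Ask for SCHED_FIFO at the given priority.  Returns 0 on success or the
 * errno value (for example EPERM when the limit does not allow it). */
int try_sched_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

On a kernel where the feature works, an unprivileged process with a soft limit of 10 should succeed with try_sched_fifo(5) and get EPERM from try_sched_fifo(20).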
It should be said that testing realtime resource limits is not an entirely
straightforward operation; setting that limit requires changes to the PAM
library, the C library, and the shells. Very few distributions - and
no major ones - are shipping those changes at this time. Even so,
unprivileged realtime scheduling is a feature that a number of people had
been asking for. It is a little surprising that none of those people
noticed that it failed to work in a major kernel release. Getting
comprehensive testing coverage for the kernel is clearly still a problem -
even before drivers are taken into account.
Anytime your editor gives a talk on kernel development efforts, there seems
to be one project which inspires scattered boos and hisses from the audience. The
lucky project this year was Class-based Kernel Resource Management (CKRM).
The CKRM patches have been under development for some time, and the
developers involved have been pushing for inclusion. The future of the
CKRM patches seems uncertain, however; there is significant opposition to
them being merged.
The idea behind CKRM is to give system administrators a high degree of
control over how the resources on a system are used. To that end, it puts
every process into a "class," then applies rules specifying which resources
are available to each class. On the classification side, CKRM includes a
rule-based classification engine which can pigeonhole processes in a number
of ways: by their user or group IDs, the command they are running, the
ports they are listening on, etc. Classification engines are pluggable, however, so a
site with specific needs could write its own. It is also possible for an
administrator to directly shove
a process into a given class by way of a virtual filesystem interface.
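A rule-based classifier of the kind described can be sketched as a first-match rule table. The rule format, field names, and "default" class below are assumptions for illustration; CKRM's actual engine has its own rule syntax.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule format: a field of -1 or NULL means "don't care". */
struct toy_rule {
    long uid;               /* match this user ID, or -1 */
    long gid;               /* match this group ID, or -1 */
    const char *cmd;        /* match this command name, or NULL */
    int port;               /* match a listening port, or -1 */
    const char *class_name; /* class assigned on a match */
};

struct toy_proc_info {
    long uid, gid;
    const char *cmd;
    int listen_port;        /* -1 if not listening */
};

/* First matching rule wins; unmatched processes land in "default". */
const char *toy_classify(const struct toy_rule *rules, size_t n,
                         const struct toy_proc_info *p)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_rule *r = &rules[i];
        if (r->uid != -1 && r->uid != p->uid)
            continue;
        if (r->gid != -1 && r->gid != p->gid)
            continue;
        if (r->cmd && strcmp(r->cmd, p->cmd) != 0)
            continue;
        if (r->port != -1 && r->port != p->listen_port)
            continue;
        return r->class_name;
    }
    return "default";
}
```

Making the engine pluggable amounts to letting a site replace toy_classify() with its own policy while the rest of the machinery stays the same.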
The controlling side regulates how much of the system each class can use.
Maximum limits can be applied, in a way similar to the resource limits
built into the kernel now. There is also a mechanism for specifying a
"guarantee," a minimum amount of a resource which will be allocated to a
class. So an administrator can set things up such that the web server will
not take more than half the CPU, or that the X server will always get at
least 20% if it needs it.
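Limits and guarantees interact in a way that is easiest to see with numbers. The hypothetical allocator below grants each class its guarantee (up to its actual demand) first, then hands out the remaining capacity round-robin without exceeding any class's limit. It is one plausible policy, not CKRM's controller algorithm.

```c
#include <stddef.h>

struct toy_class {
    const char *name;
    int demand;     /* percent of CPU the class would use if it could */
    int guarantee;  /* minimum percent reserved for it */
    int limit;      /* maximum percent it may ever get */
    int alloc;      /* output: percent actually granted */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Grants percentages of a 100% resource.  Assumes the guarantees sum to
 * at most 100. */
void toy_allocate(struct toy_class *c, size_t n)
{
    int left = 100;

    /* Pass 1: honour guarantees, but only up to what each class wants. */
    for (size_t i = 0; i < n; i++) {
        c[i].alloc = min_int(c[i].demand, c[i].guarantee);
        left -= c[i].alloc;
    }

    /* Pass 2: hand out the rest one percent at a time, round-robin,
     * never exceeding a class's demand or its limit. */
    int progress = 1;
    while (left > 0 && progress) {
        progress = 0;
        for (size_t i = 0; i < n && left > 0; i++) {
            int cap = min_int(c[i].demand, c[i].limit);
            if (c[i].alloc < cap) {
                c[i].alloc++;
                left--;
                progress = 1;
            }
        }
    }
}
```

With a web server limited to 50%, an X server guaranteed 20% but wanting 30%, and an unconstrained batch class, all busy, toy_allocate() gives web 35%, X 30%, and batch 35%. A web server running alone still stops at 50%, leaving the remainder idle.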
That leads to another component of CKRM: controllers. Each controller
manages the allocation of one specific resource in the system. CPU usage
is regulated by the CPU controller; as it happens, the CKRM patches in the
-mm tree do not currently include that controller. The CPU controller
extends its fingers fairly deeply into the Linux scheduler, and the
developers do not feel that it is ready for inclusion quite yet. In fact,
the only controllers currently in -mm handle the total number of tasks and
the rate at which processes can fork. Many other controllers are in
development, handling resources like main memory, disk I/O bandwidth,
network bandwidth, and more.
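The task-count controller is the simplest of these to picture: a per-class counter charged at fork time and released at exit. The structure and function names are hypothetical; they show the charge/uncharge shape rather than CKRM's code.

```c
#include <stdbool.h>

/* Toy per-class task counter in the spirit of a "number of tasks"
 * controller. */
struct toy_numtasks {
    int count;   /* tasks currently in the class */
    int max;     /* ceiling on tasks in the class */
};

/* Called before a fork: reserve a slot, or refuse the fork. */
bool toy_numtasks_charge(struct toy_numtasks *c)
{
    if (c->count >= c->max)
        return false;   /* the caller would make the fork fail */
    c->count++;
    return true;
}

/* Called when a task exits or leaves the class. */
void toy_numtasks_uncharge(struct toy_numtasks *c)
{
    if (c->count > 0)
        c->count--;
}
```

A fork-rate controller has the same hook points but counts forks per time window instead of live tasks.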
The CKRM patches are large - over 14,000 lines in -mm. They also must
place hooks into many sensitive parts of the kernel in order to be able to
monitor process transitions and enforce resource limits and guarantees.
Any patch which digs into parts of the core kernel in this way is going to
see a fair amount of scrutiny, and CKRM is no exception. In this case,
many developers see CKRM as an overly complex subsystem which is aimed at
the needs of the customers of one specific vendor. Most Linux users simply
do not need to have such fine-grained control over resource usage on their systems.
CKRM looks like a bit of a long-term maintenance headache as well. Every
subsystem which requires distributing hooks around the kernel (think of the
Linux security modules, the audit subsystem, or inotify as other subsystems
of this type) is essentially overlaying a new structure on top of the base
kernel. Any changes to the kernel must be done carefully so that none of
the overlaid structures will break. So each one of these structures makes
kernel programming a little harder; it is one more thing a developer must
keep in mind when making changes. Mix in the fact that most kernel
developers (and testers, for that matter) will not have CKRM configured
into their kernels, and it becomes clear that a subsystem like CKRM could
turn out to be relatively fragile.
Supporters of CKRM see it as a useful tool for the management of larger
systems (they see applications for smaller systems as well). In
particular, it can be used with virtualization systems (Xen or
UML, for example) to consolidate servers onto a smaller hardware base while
providing appropriate resource guarantees for the guest servers. Thus, says Gerrit Huizenga, CKRM can be thought of
as part of the "eco-computing movement." CKRM imposes no overhead on the
system if it is configured out, and almost no overhead if it is built in
but not used. Since CKRM is useful for some users and stays out of the
way for the rest, its supporters argue, it is worth adding to the kernel.
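The "no overhead if configured out" claim typically rests on a common kernel idiom: when the option is disabled, the hook becomes an empty static inline function that the compiler discards, and when it is enabled but unused, the cost is a single NULL check. The sketch below shows that pattern with hypothetical names (TOY_CONFIG_CKRM, toy_ckrm_cb_fork()); it is not CKRM's actual hook code.

```c
/* Build with -DTOY_CONFIG_CKRM to compile the hook in; without it, the
 * call in toy_do_fork() compiles to nothing. */
struct toy_task {
    int pid;
};

#ifdef TOY_CONFIG_CKRM
/* Built in: a callback may or may not have been registered. */
static void (*toy_fork_callback)(struct toy_task *);

static inline void toy_ckrm_cb_fork(struct toy_task *t)
{
    if (toy_fork_callback)      /* "built in but not used": one test */
        toy_fork_callback(t);
}
#else
/* Configured out: an empty inline stub the compiler removes entirely. */
static inline void toy_ckrm_cb_fork(struct toy_task *t)
{
    (void)t;
}
#endif

/* A core-kernel path calls the hook unconditionally. */
int toy_do_fork(int pid)
{
    struct toy_task child = { .pid = pid };
    toy_ckrm_cb_fork(&child);
    return child.pid;
}
```

The maintenance concern raised above is the flip side of this idiom: the call sites in core code exist in every build, even when the stub makes them free at run time.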
For now, CKRM is in -mm for people to play with; Andrew Morton has noted
that it is not, yet, on a path toward inclusion in the near future. He
wants to see a real debate, however, and
not a simple, offhand rejection:
But there's been a lot of work put into this and if we're to flatly
reject the feature then the developers are owed a much better
reason than "eww yuk".
So far, that reason has not been provided in any definitive way. So expect
to see this topic come up again as the developers try to get a real answer
on whether CKRM is headed for the mainline or not.
Linus has announced
the availability of a
git repository containing all of the kernel development history back to the
beginning of the BitKeeper era. Using the new "pack" format, the entire
history fits in less than 200MB of disk space - less than a single,
uncompressed kernel source tree. This history does not yet tie into
the current mainline, though there are ways to stitch it all together.
Note that this history is obtained by way of the CVS repository; some
information is lost by taking this path, but potential disputes over the
use of the BitKeeper metadata are avoided.
Linus's note does not say where the repository can be found; it will be on
your favorite kernel.org mirror under
Page editor: Jonathan Corbet