Brief items
The current stable 2.6 kernel is 2.6.11.8,
released on April 29.
The current 2.6 prepatch remains 2.6.12-rc3.
Linus's git repository contains a number of new "sparse" annotations, a
CIFS update, various architecture updates, resource limits for niceness and
realtime scheduling (see below), a new valid_signal() function
(for testing signal numbers), a JFS update, some networking tweaks, and
lots of fixes.
The current -mm tree is 2.6.12-rc3-mm2. Recent changes
to -mm include a number of new git trees, a cpufreq update, a new
/proc/zoneinfo file, some preparatory patches for Xen, and some
ext3 latency reduction work.
Kernel development news
We're still miles away from 2.6.12.
-- Andrew Morton
Further evidence that the kernel source code management situation is
slowly stabilizing: there is now a web interface to the kernel.org git
repositories. Most people, perhaps, will be interested in Linus's tree,
where the latest patches merged into the mainline can be viewed, but there
are several developer trees available as well. (Thanks to Steven Cole.)
The long debate on how to provide preferential scheduling for audio
applications would appear to have come to an end. The realtime Linux
security module has not been merged; instead, the mainline now includes
a version of the rlimit patch. This is
not the outcome which was most favored by the audio development community,
but it will still be useful for them.
The patch creates two new resource limits. RLIMIT_NICE controls
how far a process can raise its own priority (that is, lower its nice
value) in the normal timesharing scheduler. The limit has a range of 0..39,
with 39 corresponding to an internal niceness value of -20 - the highest
priority. The difference between the resource limit value and the actual
niceness values may seem confusing, but apparently it's unavoidable: the
Single UNIX Specification requires that resource limits be unsigned values.
The other limit is RLIMIT_RTPRIO; it can have a range of
0..100. If it is nonzero, the process is empowered to use the
realtime scheduling classes up to the indicated priority.
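The relationship between the two limits and the values the scheduler works with can be sketched in a few lines of C. This is only an illustration, not the code from the patch: the function names are made up, and the linear mapping (limit = 19 - nice) is an assumption inferred from the 0..39 range and the "39 means -20" endpoint given above. Capability checks such as CAP_SYS_NICE are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed mapping, inferred from the text: nice values 19..-20
 * correspond to limit values 0..39. Hypothetical helper name.
 */
int nice_to_rlimit(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return 19 - nice;
}

/*
 * May a process whose RLIMIT_NICE is rlimit_nice lower its nice
 * value to 'nice'? Raising the nice value (lowering priority) is
 * not restricted by the limit, so it is not modeled here.
 */
bool can_lower_nice_to(unsigned long rlimit_nice, int nice)
{
    return (unsigned long)nice_to_rlimit(nice) <= rlimit_nice;
}

/*
 * RLIMIT_RTPRIO: zero means no realtime scheduling at all; a nonzero
 * value allows realtime priorities up to and including the limit.
 */
bool rtprio_allowed(unsigned long rlimit_rtprio, int rt_priority)
{
    return rt_priority > 0 && (unsigned long)rt_priority <= rlimit_rtprio;
}
```

Under these assumptions, a user given an RLIMIT_NICE of 29 could renice down to -10 but no further, and a user with an RLIMIT_RTPRIO of 50 could request realtime priority 50 but not 51.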
The problem with this approach, from the point of view of the audio
community, is that it is not currently supported by any distribution. It
is easy to set up PAM to give expanded limits to specific users or groups -
once PAM has been patched to understand the new limits. Shells, too, must
be patched before their ulimit commands can be used to change the
limits. So it will be some time before an "out of the box" Linux system
will be able to take advantage of this new capability.
In the long term, however, the rlimit patch looks like a minimally invasive
way of making realtime scheduling available, in a relatively safe way, to
ordinary users. Anybody wanting to play with the new mechanism before
their distribution catches up can find instructions and patches on this web page.
The read-copy-update mechanism works with the fundamental assumption that,
if no pointer to an RCU-protected data structure exists, there will be no
references to that structure after every processor on the system has
scheduled at least once. This assumption works because the rules require
that accesses to RCU-protected data structures be atomic: code may not
sleep (and thus schedule) while holding such a reference. When RCU was
added to the kernel, it brought with it a function called
synchronize_kernel(), which would wait for every processor to schedule.
Since it seemed that this capability could be useful outside of RCU itself,
synchronize_kernel() was exported to the world.
A quick grep of the 2.6.12-rc kernel shows a fair number of
synchronize_kernel() calls. The module loader uses it to let
things calm down when an attempted load fails. The AT keyboard driver
calls it at disconnect time to ensure that no processor is still trying to
work with the device. The kernel profiling code uses
synchronize_kernel() to ensure that all processors notice the
unregistration of its timer hook. And so on.
The external uses of synchronize_kernel() have reached a point
where they are putting extra demands on the RCU code. RCU, after all, does
not really have to wait until every processor has scheduled; the
important constraint, instead, is that every processor running within
rcu_read_lock() exits from the critical section. This distinction
has become more important as the kernel developers have sought ways to make
RCU more compatible with the low-latency work.
So, as of 2.6.12-rc4, synchronize_kernel() will be officially
deprecated. Its replacements will be synchronize_sched(), which
retains the current "wait for all processors to schedule" semantics, and
synchronize_rcu(), which is only guaranteed to wait until any
processors executing within rcu_read_lock() critical sections have
exited those sections. Most external users probably need to be switched
over to synchronize_sched(). The comments suggest that a
synchronize_irq() variant is also envisioned, but it has not been
added as of this writing.
One other significant change: unlike synchronize_kernel(), the two
replacements are exported GPL-only.
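The difference between the two waiting conditions can be made concrete with a toy userspace model. This is emphatically not kernel code: the structures and function names below are invented for illustration, and the model simply tracks, for each simulated processor, how often it has scheduled and whether it is inside a read-side critical section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping for this model; not a kernel structure. */
struct cpu_state {
    unsigned long context_switches;     /* times this CPU has scheduled */
    unsigned long read_sections_exited; /* completed read-side sections */
    bool in_read_section;               /* currently inside such a section */
};

/* What a waiter records about one CPU when it starts waiting. */
struct cpu_snapshot {
    unsigned long context_switches;
    unsigned long read_sections_exited;
    bool was_reading;
};

void take_snapshot(const struct cpu_state *cpus, struct cpu_snapshot *snap,
                   size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++) {
        snap[i].context_switches = cpus[i].context_switches;
        snap[i].read_sections_exited = cpus[i].read_sections_exited;
        snap[i].was_reading = cpus[i].in_read_section;
    }
}

/* synchronize_sched()-style condition: every CPU has scheduled since the snapshot. */
bool sched_wait_done(const struct cpu_state *cpus,
                     const struct cpu_snapshot *snap, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (cpus[i].context_switches == snap[i].context_switches)
            return false;
    return true;
}

/*
 * synchronize_rcu()-style condition: every CPU that was inside a
 * read-side critical section at snapshot time has since left it.
 * CPUs that were not reading impose no wait at all.
 */
bool rcu_wait_done(const struct cpu_state *cpus,
                   const struct cpu_snapshot *snap, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (snap[i].was_reading &&
            cpus[i].read_sections_exited == snap[i].read_sections_exited)
            return false;
    return true;
}
```

In this model, once the only reading processor leaves its critical section, rcu_wait_done() is satisfied even though no processor has scheduled yet; sched_wait_done() still has to wait for every processor. That is the extra freedom the RCU code gains from the split.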
Standard wisdom says that the proper defense against fork bomb attacks
(where a simple script forks children until the system chokes under the
load) is to use resource limits. Put a cap on the number of processes
which can be created, and the problem goes away. In reality it's not quite
so simple; the limit can be softened by logging in multiple times. And, in
any case, some people feel that the system should not collapse when faced
with such an attack. A Linux system, it is said, should not be so easy to
bring down in its default configuration.
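For reference, the "standard wisdom" defense looks something like the following when done from a program rather than from a shell or PAM configuration. This is a minimal sketch using the POSIX getrlimit() and setrlimit() calls with RLIMIT_NPROC; the helper name cap_process_count() is made up for this example.

```c
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_NPROC of the calling process to at most
 * max_procs. Returns 0 on success, -1 on error. The limit is
 * inherited by any children created afterward.
 */
int cap_process_count(rlim_t max_procs)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NPROC, &lim) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (max_procs > lim.rlim_max)
        max_procs = lim.rlim_max;

    lim.rlim_cur = max_procs;
    return setrlimit(RLIMIT_NPROC, &lim);
}
```

A login wrapper might call cap_process_count() before starting the user's shell; fork() calls beyond the cap then fail instead of consuming more of the system.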
The last defense against fork bombs is typically the out-of-memory (OOM)
killer. As the system fills up with processes, it will eventually run out
of memory and, in its desperation, start looking for processes to kill.
The OOM killer has a set of heuristics which attempt to choose the "best"
process to kill. These rules help the system to avoid (sometimes) killing
processes which are vital to the continued operation of the system. They
are not particularly helpful in dealing with fork bombs, however.
Coywolf Qi Hunt has posted a patch which
tries to do a better job of defending against fork bombs in the OOM killer.
It works by
extending the task structure to keep better track of a process's
"biological" parent and children. These lists are maintained separately
from the regular process hierarchy pointers, and are not actually used
during normal system operation. They are, in other words, pure overhead
most of the time.
Things change, however, when an out-of-memory situation hits. When the OOM
killer starts up, it will select its first victim in the usual way. When a
second process is chosen for an untimely death, however, the new lists come
into play. For both the current and previous victim, the OOM killer will
traverse the "biological parent" pointers to create a path through the
process hierarchy. Using those paths, the code can select the "least
common ancestor," the lowest process which is an ancestor to both victims.
Then, rather than killing the second chosen victim directly, the OOM killer
goes after the ancestor - and all of its children. If the OOM situation
persists, the killer should be able to quickly work its way up the process
hierarchy until it finds (and eliminates) the process responsible for the
whole mess.
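The ancestor search described above can be sketched as follows. This is not the code from the patch: struct proc and its bio_parent field are hypothetical stand-ins for the extended task structure, and the sketch makes the assumption that a process counts as its own ancestor (so if one victim is a direct ancestor of the other, that victim is returned).

```c
#include <stddef.h>

/* Hypothetical stand-in for a task; the real patch extends the task structure. */
struct proc {
    int pid;
    struct proc *bio_parent; /* "biological" parent, NULL at the root */
};

/* Number of parent links between p and the root of its tree. */
static size_t depth(const struct proc *p)
{
    size_t d = 0;

    while (p->bio_parent) {
        p = p->bio_parent;
        d++;
    }
    return d;
}

/*
 * Walk both victims up the "biological parent" chain until the paths
 * meet. Returns NULL if the two processes share no ancestor.
 */
struct proc *least_common_ancestor(struct proc *a, struct proc *b)
{
    if (!a || !b)
        return NULL;

    size_t da = depth(a), db = depth(b);

    /* Bring the deeper process up to the same level as the other. */
    while (da > db) {
        a = a->bio_parent;
        da--;
    }
    while (db > da) {
        b = b->bio_parent;
        db--;
    }

    /* Climb in lockstep until the two paths converge. */
    while (a != b) {
        a = a->bio_parent;
        b = b->bio_parent;
    }
    return a;
}
```

For two victims that are both descendants of a fork bomb, the result is the bomb process (or one of its descendants), which is then the target along with all of its children.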
Coywolf has a set of test cases and a system he is willing to run them on;
for all but the nastiest of the three, the patched system was able to put
an end to the fork bomb attack without any ill effects beyond a temporary
slowdown. In the worst case, the system still recovered, but with some
collateral damage. The patch adds some significant overhead (one pointer
and two list_head structures) to each process in the system, so it
may encounter some resistance - most systems will pay that overhead, but
never actually need to run the OOM killer. But, for systems which are
exposed to that sort of attack, this patch could be a useful last line of
defense.
The 2.6.12-rc kernels include, among many other things, the long-awaited
return of the Philips web camera driver. This driver, remember, was
removed at the original author's request; that author (known as "Nemosoft
Unv") objected to the removal of a special-purpose hook which allowed a
non-free decompression module to be loaded into the kernel. After the
removal, Luc Saillard took over the driver, with the goal of getting it
back into the mainline. As part of that process, he reverse engineered the
image decompression code and included it in the GPL-licensed module. It
would appear that this episode has led to a good result: the Philips driver
is back, and more free than before.
Nemosoft has recently resurfaced, however,
to make the claim that things may not be quite as good as they seem.
According to Nemosoft, no real reverse engineering job was done. Instead:
In case you hadn't noticed, that code has been reverse compiled (I
would not even call it "reverse engineered"), and is simply
illegal. Maybe not in every country, but certainly in some. There
are still some intellectual property rights being violated here,
you know, and I'm surprised at the contempt you and Linux kernel
maintainers show in this regard for a few lines of the law.
Mr. Saillard has been silent on how he performed the reverse engineering
task. A look at the code (pwc-kiara.c, for example) is somewhat
unenlightening: the decompression code
consists mostly of a set of tables filled with mysterious numbers. It is
hard to imagine how those tables could be created in any way other than
extracting them from the binary decompressor module.
If the code was truly decompiled and relicensed, there could be a copyright
issue here. On the other hand, the tables used for decompression will be
hard to protect if they are truly the only way to interpret images produced
by the camera. Alan Cox (who forwarded the PWC patches for merging) acknowledges that there could be an issue with
the decompression code, but he is not overly worried about it:
The legal position on reverse engineering is in general fairly
clear. What you describe might not be. If so then we need to find
someone who hasn't read the code to rewrite it from the algorithm
description of the current code. Shouldn't take more than a week.
Alan also points out an issue others have raised: by Nemosoft's admission,
the non-disclosure agreement which forced the decompression code to be
proprietary ran out some time ago. Nemosoft could thus resolve the
licensing issues by simply releasing the decompression code under a free
license.
Page editor: Jonathan Corbet