Kernel development
Brief items
Kernel release status
The current stable 2.6 kernel is 2.6.11.8, released on April 29.
The current 2.6 prepatch remains 2.6.12-rc3.
Linus's git repository contains a number of new "sparse" annotations, a CIFS update, various architecture updates, resource limits for niceness and realtime scheduling (see below), a new valid_signal() function (for testing signal numbers), a JFS update, some networking tweaks, and lots of fixes.
The current -mm tree is 2.6.12-rc3-mm2. Recent changes to -mm include a number of new git trees, a cpufreq update, a new /proc/zoneinfo file, some preparatory patches for Xen, and some ext3 latency reduction work.
Kernel development news
Quote of the week
A web interface to git
Further evidence that the kernel source code management situation is slowly stabilizing: there is now a web interface to the kernel.org git repositories. Most people, perhaps, will be interested in Linus's tree, where the latest patches merged into the mainline can be viewed, but there are several developer trees available as well. (Thanks to Steven Cole.)
Audio latency - resource limits win
The long debate on how to provide preferential scheduling for audio applications would appear to have come to an end. The realtime Linux security module has not been merged; instead, the mainline now includes a version of the rlimit patch. This is not the outcome which was most favored by the audio development community, but it will still be useful for them.
The patch creates two new resource limits. RLIMIT_NICE controls how far a process can raise its own priority (that is, how low it can set its nice value) in the normal timesharing scheduler. The limit has a range of 0..39, with 39 corresponding to an internal niceness value of -20 - the highest priority. The difference between the resource limit value and the actual niceness values may seem confusing, but apparently it's unavoidable: the Single UNIX Specification requires that resource limits be unsigned values.
The other limit is RLIMIT_RTPRIO; it can have a range of 0..100. If it is nonzero, the process is empowered to use the realtime scheduling classes up to the indicated priority.
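As a rough illustration of the numbering described above, the following userspace sketch models how the two limit values might be interpreted. It is not the kernel's code: the helper names are hypothetical, the linear mapping (19 minus the limit) is simply inferred from the stated endpoints (0..39, with 39 meaning -20), and the assumption that usable realtime priorities start at 1 is ours.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper end of the RLIMIT_NICE range as given in the text. */
#define NICE_LIMIT_MAX 39

/*
 * Lowest (most favorable) nice value that a given RLIMIT_NICE setting
 * would permit, ASSUMING the linear mapping implied by the text:
 * limit 39 -> nice -20, limit 0 -> nice 19.
 */
int nice_floor_for_limit(unsigned int limit)
{
    assert(limit <= NICE_LIMIT_MAX);
    return 19 - (int)limit;
}

/* Would a process with this limit be allowed to request this nice value? */
bool may_set_nice(unsigned int limit, int nice)
{
    return nice >= nice_floor_for_limit(limit);
}

/*
 * Would a process with this RLIMIT_RTPRIO setting be allowed to use the
 * given realtime priority?  A limit of zero grants nothing; otherwise
 * priorities up to and including the limit are allowed.  Treating 1 as
 * the lowest realtime priority is an assumption of this sketch.
 */
bool may_use_rt_priority(unsigned int rtprio_limit, int priority)
{
    return rtprio_limit != 0 &&
           priority >= 1 &&
           (unsigned int)priority <= rtprio_limit;
}
```

Under this model, a user given an RLIMIT_NICE of 29 could renice down to -10 but no further, and a user with an RLIMIT_RTPRIO of zero would have no access to the realtime classes at all. The exact offset used by a given kernel should be checked against its own documentation.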
The problem with this approach, from the point of view of the audio community, is that it is not currently supported by any distribution. It is easy to set up PAM to give expanded limits to specific users or groups - once PAM has been patched to understand the new limits. Shells, too, must be patched before their ulimit commands can be used to change the limits. So it will be some time before an "out of the box" Linux system will be able to take advantage of this new capability.
In the long term, however, the rlimit patch looks like a minimally invasive way of making realtime scheduling available, in a relatively safe way, to ordinary users. Anybody wanting to play with the new mechanism before their distribution catches up can find instructions and patches on this web page.
API change: synchronize_kernel() deprecated
The read-copy-update mechanism works with the fundamental assumption that, if no pointer to an RCU-protected data structure exists, there will be no references to that structure after every processor on the system has scheduled at least once. This assumption works because the rules require that accesses to RCU-protected data structures be atomic; scheduling while holding such a reference is not legal. When RCU was added to the kernel, it brought with it a function called synchronize_kernel() which would wait for every processor to schedule. Since it seemed that this capability could be useful outside of RCU itself, synchronize_kernel() was exported to the world.
A quick grep of the 2.6.12-rc kernel shows a fair number of synchronize_kernel() calls. The module loader uses it to let things calm down when an attempted load fails. The AT keyboard driver calls it at disconnect time to ensure that no processor is still trying to work with the device. The kernel profiling code uses synchronize_kernel() to ensure that all processors notice the unregistration of its timer hook. And so on.
The external uses of synchronize_kernel() have reached a point where they are putting extra demands on the RCU code. RCU, after all, does not really have to wait until every processor has scheduled; the important constraint, instead, is that every processor running within rcu_read_lock() exits from the critical section. This distinction has become more important as the kernel developers have sought ways to make RCU more compatible with the low-latency work.
So, as of 2.6.12-rc4, synchronize_kernel() will be officially deprecated. Its replacements will be synchronize_sched(), which retains the current "wait for all processors to schedule" semantics, and synchronize_rcu(), which is only guaranteed to wait until any processors executing within rcu_read_lock() critical sections have exited those sections. Most external users probably need to be switched over to synchronize_sched(). The comments suggest that a synchronize_irq() variant is also envisioned, but it has not been added as of this writing.
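The "wait for all processors to schedule" rule can be pictured with a small, single-threaded toy model. This is only a sketch of the idea and bears no relation to the kernel's actual implementation: the per-CPU counters, the NCPUS constant, and all of the function names are hypothetical. A waiter snapshots each processor's schedule count, and the wait is over once every count has moved past its snapshot.

```c
#include <stdbool.h>

#define NCPUS 4   /* hypothetical number of processors in the model */

/* Number of times each modeled processor has passed through the scheduler. */
static unsigned long sched_count[NCPUS];

/* State kept by one waiter: the counts observed when the wait began. */
struct grace_period {
    unsigned long snapshot[NCPUS];
};

/* Called by the model whenever a processor schedules. */
void note_schedule(int cpu)
{
    sched_count[cpu]++;
}

/* Begin waiting: remember where every processor currently stands. */
void grace_period_start(struct grace_period *gp)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        gp->snapshot[cpu] = sched_count[cpu];
}

/*
 * The wait is complete only when every processor has scheduled at least
 * once since grace_period_start().  At that point no processor can still
 * hold a reference obtained before the wait began, since scheduling while
 * holding such a reference is not allowed.
 */
bool grace_period_done(const struct grace_period *gp)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (sched_count[cpu] == gp->snapshot[cpu])
            return false;
    return true;
}
```

In this model, a single processor which never schedules holds up the waiter indefinitely. That is the cost of the stricter semantics; a wait which only needs readers to leave their critical sections can, in principle, finish sooner.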
One other significant change: unlike synchronize_kernel(), the two replacements are exported GPL-only.
Defending against fork bombs
Standard wisdom says that the proper defense against fork bomb attacks (where a simple script forks children until the system chokes under the load) is to use resource limits. Put a cap on the number of processes which can be created, and the problem goes away. In reality it's not quite so simple; the limit can be softened by logging in multiple times. And, in any case, some people feel that the system should not collapse when faced with such an attack. A Linux system, it is said, should not be so easy to bring down in its default configuration.
The last defense against fork bombs is typically the out-of-memory (OOM) killer. As the system fills up with processes, it will eventually run out of memory and, in its desperation, start looking for processes to kill. The OOM killer has a set of heuristics which attempt to choose the "best" process to kill. These rules help the system to avoid (sometimes) killing processes which are vital to the continued operation of the system. They are not particularly helpful in dealing with fork bombs, however.
Coywolf Qi Hunt has posted a patch which tries to do a better job of defending against fork bombs in the OOM killer. It works by extending the task structure to keep better track of a process's "biological" parent and children. These lists are maintained separately from the regular process hierarchy pointers, and are not actually used during normal system operation. They are, in other words, pure overhead most of the time.
Things change, however, when an out-of-memory situation hits. When the OOM killer starts up, it will select its first victim in the usual way. When a second process is chosen for an untimely death, however, the new lists come into play. For both the current and previous victim, the OOM killer will traverse the "biological parent" pointers to create a path through the process hierarchy. Using those paths, the code can select the "least common ancestor," the lowest process which is an ancestor to both victims. Then, rather than killing the second chosen victim directly, the OOM killer goes after the ancestor - and all of its children. If the OOM situation persists, the killer should be able to quickly work its way up the process hierarchy until it finds (and eliminates) the process responsible for the whole mess.
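The ancestor search described above can be sketched in a few lines of C. This is not the code from the patch; the structure, the bio_parent field name, and the function names are hypothetical, and the sketch only shows one common way of finding the lowest shared ancestor by walking parent pointers.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the task structure. */
struct task {
    int pid;
    struct task *bio_parent;   /* "biological" parent; NULL at the root */
};

/* Number of parent links between a task and the root of its tree. */
static int task_depth(const struct task *t)
{
    int depth = 0;

    while (t->bio_parent) {
        t = t->bio_parent;
        depth++;
    }
    return depth;
}

/*
 * Return the lowest task which is an ancestor of both a and b, or NULL
 * if they share no ancestor.  Both arguments must be non-NULL.  In this
 * sketch a task counts as its own ancestor, so if a is an ancestor of b
 * the result is a itself.
 */
struct task *least_common_ancestor(struct task *a, struct task *b)
{
    int da = task_depth(a);
    int db = task_depth(b);

    /* Bring the deeper task up to the level of the shallower one. */
    while (da > db) {
        a = a->bio_parent;
        da--;
    }
    while (db > da) {
        b = b->bio_parent;
        db--;
    }

    /* Climb in lockstep until the two paths meet. */
    while (a != b) {
        a = a->bio_parent;
        b = b->bio_parent;
    }
    return a;
}
```

With a fork bomb, successive victims will tend to share the bomb's original process as an ancestor, so each round of this search should either land on that process or move closer to it.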
Coywolf has a set of test cases and a system he is willing to run them on; for all but the nastiest of the three, the patched system was able to put an end to the fork bomb attack without any ill effects beyond a temporary slowdown. In the worst case, the system still recovered, but with some collateral damage. The patch adds some significant overhead (one pointer and two list_head structures) to each process in the system, so it may encounter some resistance - most systems will pay that overhead, but never actually need to run the OOM killer. But, for systems which are exposed to that sort of attack, this patch could be a useful last line of defense.
The Philips webcam driver - again
The 2.6.12-rc kernels include, among many other things, the long-awaited return of the Philips web camera driver. This driver, remember, was removed at the original author's request; that author (known as "Nemosoft Unv") objected to the removal of a special-purpose hook which allowed a non-free decompression module to be loaded into the kernel. After the removal, Luc Saillard took over the driver, with the goal of getting it back into the mainline. As part of that process, he reverse engineered the image decompression code and included it in the GPL-licensed module. It would appear that this episode has led to a good result: the Philips driver is back, and more free than before.
Nemosoft has recently resurfaced, however, to make the claim that things may not be quite as good as they seem. According to Nemosoft, no real reverse engineering job was done. Instead:
Mr. Saillard has been silent on how he performed the reverse engineering task. A look at the code (example - pwc-kiara.c) is somewhat unenlightening - the decompression code consists mostly of a set of tables filled with mysterious numbers. It is hard to imagine how those tables could be created in any way other than extracting them from the binary decompressor module.
If the code was truly decompiled and relicensed, there could be a copyright issue here. On the other hand, the tables used for decompression will be hard to protect if they are truly the only way to interpret images produced by the camera. Alan Cox (who forwarded the PWC patches for merging) acknowledges that there could be an issue with the decompression code, but he is not overly worried about it:
Alan also points out an issue others have raised: by Nemosoft's admission, the non-disclosure agreement which forced the decompression code to be proprietary ran out some time ago. Nemosoft could thus resolve the licensing issues by simply releasing the decompression code under a free license.
Page editor: Jonathan Corbet