See also: last week's Kernel page.
The current development kernel release is 2.4.0-test1. Prior to taking a three-week break, Linus announced the release of a new development kernel series: 2.4.0-test1. This kernel is essentially 2.3.99-pre10 with a couple of additional tweaks; see last week's LWN kernel page for a description of the new features.
Now that Linus is gone for a while, there won't be any official development kernel releases. So Alan Cox has taken up the mantle and has started putting out "ac" releases with no end of new stuff. The current version, as of this writing, is 2.4.0-test1-ac7. This release isn't really meant for general consumption; there are several problems that the kernel hackers are trying to track down.
The current stable kernel release is 2.2.15. The 2.2.16 prepatch is up to 2.2.16pre7, which is a release candidate version.
No NFS update in 2.2.16? The 2.2.16 prepatch contains a number of worthwhile fixes, but no NFS update at all. The 2.2.15 kernel runs an older version of NFS with a number of problems, which does nothing to help Linux's reputation for having a second-rate NFS implementation. Some kernel developers have expressed disappointment that the NFS updates did not make it into 2.2.16.
The story seems to be this: the NFS fixes appear to be stable and well tested. But there are a few remaining questions, including whether their inclusion would require users to upgrade their userland utilities. Meanwhile there is pressure to get 2.2.16 out quickly - 2.2.15 has a number of problems, especially the memory management issues that have been discussed in this space over the last few weeks. So 2.2.16 ships without NFS updates. Maybe in 2.2.17...
ReiserFS in 2.4? Hans Reiser has finally submitted the ReiserFS filesystem for inclusion in 2.4. Of course, he did so just as Linus left town, so no decision will be made for a while. Still, the ReiserFS developers consider it ready, which is a step in the right direction.
Trouble with timers. Programming SMP systems can be a tricky business. See, for example, this posting from Andrew Morton regarding kernel timer races. Here is a bit of unpleasantness that, thankfully, was found before 2.4.0 came out. Fixing it is going to require some core API changes, though - not something one wants to do while trying to stabilize a new major release.
Kernel timers are simple in concept. When some part of the kernel wants something done at a specific point in the future, it sets up a timer. Once the timer expires, a handler function is called to take care of whatever needs to be done. Timers have all kinds of uses: handling the timing aspects of network protocols, making sure a driver does not hang if a device fails to respond, and so on. The <linux/timer.h> header is included by over 400 source files in recent kernels.
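The mechanism can be modeled in a few dozen lines of user-space C. This is a minimal sketch, not kernel code: the field names expires, data, and function mirror struct timer_list of this era, but the sim_ functions, the fixed-size table, and the simulated sim_jiffies tick counter are hypothetical stand-ins for add_timer() and the kernel's own tick handling.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of a kernel timer. Field names follow struct
 * timer_list of this era; everything prefixed sim_ is hypothetical. */
struct sim_timer {
    unsigned long expires;              /* tick at which the handler fires */
    unsigned long data;                 /* argument passed to the handler */
    void (*function)(unsigned long);    /* handler called on expiry */
    int pending;                        /* 1 while queued, 0 otherwise */
};

#define SIM_MAX_TIMERS 8
static struct sim_timer *sim_queue[SIM_MAX_TIMERS];
static unsigned long sim_jiffies;       /* stand-in for the kernel tick count */

/* Queue a timer; returns 0 on success, -1 if the table is full. */
static int sim_add_timer(struct sim_timer *t)
{
    assert(t->function != NULL);
    for (size_t i = 0; i < SIM_MAX_TIMERS; i++) {
        if (sim_queue[i] == NULL) {
            sim_queue[i] = t;
            t->pending = 1;
            return 0;
        }
    }
    return -1;
}

/* Advance the clock one tick and run every expired timer once.
 * This model ignores jiffies wraparound, which real code must handle. */
static void sim_tick(void)
{
    sim_jiffies++;
    for (size_t i = 0; i < SIM_MAX_TIMERS; i++) {
        struct sim_timer *t = sim_queue[i];
        if (t != NULL && t->expires <= sim_jiffies) {
            sim_queue[i] = NULL;        /* one-shot: dequeue, then call */
            t->pending = 0;
            t->function(t->data);
        }
    }
}

/* Example handler: records the argument it was called with. */
static unsigned long sim_fired_with;
static void sim_record(unsigned long data)
{
    sim_fired_with = data;
}
```

A caller fills in the three fields (for instance, expires = sim_jiffies + 3 and data = 42 with sim_record as the handler), queues the timer, and the handler runs on the third tick.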
The real purpose of a kernel timer, much of the time, is to put a bound on the time that the kernel will spend waiting for a specific event. Normally, that event happens before the timer expires; at that point the timer is no longer needed and can be deleted. And that's where the problem comes in: what if the timer expires - and the handler function is called - just before it is deleted?
When that happens, there is a timer function running that the main thread of control does not expect to be there. That thread and the timer function may well then conflict with each other. In the best case, this sort of race condition could produce erratic device behavior; in the worst it can corrupt and crash the kernel. The result is an obscure, "once in a million times" bug that is extremely difficult to track down.
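The interleaving is easier to see with the second CPU made explicit. The sketch below is a hypothetical single-threaded model: each call stands for one step taken by one CPU, and sim_del_timer() is modeled on the contemporary del_timer() convention of returning 1 if it removed a still-pending timer and 0 otherwise. None of these names are kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the del_timer() race. The timer is queued
 * (pending), its handler is executing on "another CPU" (running), or
 * neither (idle). Splitting the steps lets a single-threaded program
 * reproduce an interleaving that SMP hardware produces only rarely. */
enum sim_state { SIM_IDLE, SIM_PENDING, SIM_RUNNING };

struct sim_dev {
    enum sim_state timer;   /* state of the device's timeout timer */
    bool freed;             /* set when the waiter tears the device down */
    bool use_after_free;    /* set if the handler touches a freed device */
};

/* Other CPU: the timer expires and the handler starts executing. */
static void sim_timer_fires(struct sim_dev *d)
{
    if (d->timer == SIM_PENDING)
        d->timer = SIM_RUNNING;
}

/* Other CPU: the handler finishes its work on the device. */
static void sim_handler_finishes(struct sim_dev *d)
{
    if (d->timer != SIM_RUNNING)
        return;
    if (d->freed)
        d->use_after_free = true;   /* the bug: handler touches dead state */
    d->timer = SIM_IDLE;
}

/* Asynchronous delete: returns 1 if the timer was still queued,
 * 0 otherwise. It does not wait for a handler already running. */
static int sim_del_timer(struct sim_dev *d)
{
    if (d->timer == SIM_PENDING) {
        d->timer = SIM_IDLE;
        return 1;
    }
    return 0;
}
```

Called in the order fires, delete, free, finish, the steps show the failure: the delete returns 0 because the handler has already started, the waiter frees the device anyway, and the handler then touches freed state. If the delete comes first, it returns 1 and the handler never runs.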
The obvious fix would be to have the function that deletes kernel timers, del_timer(), wait until no handler is running for that timer. In fact, that is how things worked in the 2.1 days. But this "synchronous" behavior has its own problem: it can easily deadlock the kernel if the code deleting the timer holds a lock that the handler needs. The problem was severe enough that the synchronous behavior was removed before 2.2.0 came out.
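The deadlock can be modeled the same way. In this hypothetical sketch, sim_del_timer_sync() waits for a running handler, but the handler must take a lock before it can return; when the deleting code already holds that lock, neither side can make progress. A bounded spin count stands in for the kernel's unbounded wait so that the sketch terminates; the names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a synchronous timer delete. */
enum sim_state { SIM_IDLE, SIM_PENDING, SIM_RUNNING };

struct sim_ctx {
    enum sim_state timer;   /* state of the timer and its handler */
    bool lock_held;         /* a lock the handler must take to finish */
};

/* One step of the handler on the other CPU: it can only return once
 * it has been able to acquire the lock. */
static void sim_handler_step(struct sim_ctx *c)
{
    if (c->timer == SIM_RUNNING && !c->lock_held)
        c->timer = SIM_IDLE;   /* took and released the lock, returned */
}

/* Synchronous delete: dequeue a pending timer, or wait for a running
 * handler to return. Returns true once no handler can be running and
 * false if the spin bound runs out, which stands in for a deadlock. */
static bool sim_del_timer_sync(struct sim_ctx *c, int max_spins)
{
    assert(max_spins >= 0);
    if (c->timer == SIM_PENDING) {
        c->timer = SIM_IDLE;
        return true;
    }
    for (int i = 0; i < max_spins && c->timer == SIM_RUNNING; i++)
        sim_handler_step(c);   /* let the other CPU make progress */
    return c->timer == SIM_IDLE;
}
```

With the lock free, the wait ends as soon as the handler returns; with the lock held by the deleter, the handler is stuck and the wait never completes.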
The real solution is going to require some changes to timer semantics and a detailed auditing of the almost 700 del_timer() calls. Mr. Morton has put together a plan and a patch which gets the process going. In the end, the del_timer() function will go away, having been replaced by a synchronous version (to be called only when deadlocks can't happen) and an asynchronous version (which requires that other arrangements be made to avoid race conditions). The work, hopefully, will be done by 2.4.0, though some residual problems may well remain in the more obscure drivers.
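One possible arrangement for the asynchronous case, shown purely as an illustration and not as the scheme in Mr. Morton's patch, is reference counting: the queued timer holds its own reference to the object, so the object cannot be freed while a handler might still use it. All names below are hypothetical, and the "freed" flag stands in for actually releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object whose lifetime is shared between its owner and
 * its timeout timer. */
struct sim_obj {
    int refcount;        /* owner's reference plus one for a queued timer */
    bool timer_pending;  /* timer is still queued */
    bool freed;          /* stands in for the memory having been freed */
};

/* Drop a reference; the last one frees the object. */
static void sim_put(struct sim_obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        o->freed = true;
}

/* Arm the timer; the queued timer takes its own reference. */
static void sim_arm(struct sim_obj *o)
{
    o->refcount++;
    o->timer_pending = true;
}

/* Timer core on another CPU: the timer expires and is dequeued; from
 * here on, the handler owns the timer's reference. */
static void sim_expire(struct sim_obj *o)
{
    o->timer_pending = false;
}

/* Handler body: do the timeout work, then drop the timer's reference. */
static void sim_handler_body(struct sim_obj *o)
{
    assert(!o->freed);   /* the timer's reference keeps the object alive */
    sim_put(o);
}

/* Asynchronous delete: never waits. If the timer was still queued, the
 * timer's reference is dropped here; otherwise the handler drops it.
 * In a real kernel the test and the dequeue would have to be atomic
 * with respect to expiry (for example, under the timer lock). */
static void sim_del_timer_async(struct sim_obj *o)
{
    if (o->timer_pending) {
        o->timer_pending = false;
        sim_put(o);
    }
}
```

In the late-expiry case described earlier, the handler has already been dequeued, so the delete drops nothing; the owner releases its own reference and the object survives until the handler releases the last one. The cost of this design is that every timer user must follow the reference discipline, which is exactly the kind of per-call-site auditing the article describes.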
Section Editor: Jonathan Corbet
June 1, 2000