See also: last week's Kernel page.
The current development kernel release is still 2.5.1; Linus's prepatch series is now up to 2.5.2-pre10. Many of the recent prepatches were dominated by kdev_t fixes, but that task now appears to be mostly complete. 2.5.2-pre10 thus includes buffer cache changes from Al Viro, more USB updates, a large ARM update, and, incidentally, a new scheduler (see below).
As an aside, the kdev_t change is a classic example of how Linus manages kernel development. A proprietary software architect would, after deciding to proceed with this sort of data structure change, instruct one or more lower-level engineers to go through and make the new structure work everywhere. Linus, instead, just fixes the stuff that he uses personally, and assumes that people will come in with patches to deal with the rest. And that is, generally, exactly what happens. And if nobody bothers to fix a certain piece of code, that, too, is useful: it implies that nobody is using that subsystem, and it can be considered for removal.
The current "dj" patch from Dave Jones is 2.5.1-dj13; it is synchronized with 2.5.2-pre9, and includes some additional fixes.
The current stable kernel release is 2.4.17. Marcelo has released 2.4.18-pre2 as the next step toward the 2.4.18 stable release.
Those working with ancient kernels may be interested in the first 2.0.40 release candidate from David Weinehall.
Other kernel trees:
A new scheduler goes into 2.5.2. One can assume that the worst of the block I/O and kdev_t problems in the 2.5.2-pre series has been dealt with: Linus has decided to stir things up again by replacing the scheduler outright. To make things even more fun, the new code is not one of the scheduler implementations that have been floating around for a while; instead, it is a completely new development from Ingo Molnar.
Ingo included a detailed description of how the new code works. The following discussion is drawn mostly from that posting; those interested in the details may well want to go to the original source.
The new scheduler starts, like most of the proposed alternatives, by creating per-CPU run queues. As the number of processors increases, the overhead caused by the single global run queue becomes intolerable. A separate run queue for each processor eliminates the need for a global spinlock and avoids a great deal of cache line contention between processors.
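The per-CPU idea can be modeled in a few lines. The following is a rough Python sketch, not the kernel's C code: each queue carries its own lock, so two processors scheduling at the same time never touch the same lock.

```python
from collections import deque
from threading import Lock

class RunQueue:
    """One run queue per CPU: its own lock, its own set of runnable tasks.
    (An illustrative model; the real structure is C and far more involved.)"""
    def __init__(self, cpu):
        self.cpu = cpu
        self.lock = Lock()    # per-queue lock replaces the old global spinlock
        self.tasks = deque()  # runnable tasks bound to this CPU

    def enqueue(self, task):
        with self.lock:       # contention is limited to this one CPU's queue
            self.tasks.append(task)

# One queue per processor; schedulers on different CPUs share no lock at all.
runqueues = [RunQueue(cpu) for cpu in range(8)]
runqueues[0].enqueue("task-a")
runqueues[3].enqueue("task-b")
```

Since each CPU schedules out of its own queue, lock hold times no longer grow with the number of processors.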
The current Linux scheduler works by passing over every runnable process and calculating a "goodness" value; the process with the highest goodness is the one that gets to run. As the number of runnable processes increases, the goodness calculation gets more expensive. So the new scheduler does away with all of that. It implements, instead, an array of run queues, sorted by priority. When it is time to select a new process to run, the scheduler need only consult a bitmap to find the runnable process with the highest priority and pick it off the appropriate queue. The selection process is thus O(1): it does not take more time as the number of processes grows.
The code is a little more complicated than that, in that there are actually two run queue arrays. Once a process exhausts its time slice, it is moved from the "active" to the "expired" array, where the priority sorting is retained. When all processes have used up their time slices, the expired and active arrays are swapped and the cycle begins anew.
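The two mechanisms just described - the priority-indexed buckets with a bitmap, and the active/expired swap - fit together as in this Python sketch. It is a simplified model of the design, not Ingo's code; the 140 priority levels follow the kernel convention that a lower number means a higher priority.

```python
PRIO_LEVELS = 140   # kernel convention: lower number = higher priority

class PrioArray:
    """Runnable tasks bucketed by priority, plus a bitmap of non-empty buckets."""
    def __init__(self):
        self.queues = [[] for _ in range(PRIO_LEVELS)]
        self.bitmap = 0

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def dequeue_highest(self):
        if self.bitmap == 0:
            return None
        # Lowest set bit = best priority; a constant-time find-first-set,
        # regardless of how many processes are runnable.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].pop(0)
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task

class CPURunQueue:
    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def pick_next(self):
        task = self.active.dequeue_highest()
        if task is None:
            # Everybody has used up their slice: swap the arrays and start over.
            self.active, self.expired = self.expired, self.active
            task = self.active.dequeue_highest()
        return task

    def timeslice_expired(self, prio, task):
        self.expired.enqueue(prio, task)   # parked here until the next swap
```

A quick run-through: enqueue an interactive task at priority 10 and a batch task at 120, and pick_next() always returns the priority-10 task first; once both have expired, the swap puts them back into circulation.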
The per-CPU run queue arrays also have the effect of keeping processes on the same processor. Efficient use of memory caches requires this behavior; the old scheduler also tried to enforce CPU affinity, but not always very effectively. With the new scheduler, instead, a different problem could arise: all of the runnable processes could find themselves contending for a single CPU, while other CPUs sit idle. To avoid this situation, the scheduler will occasionally move processes between processors if the run queues get too far out of balance.
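The occasional rebalancing pass can be sketched as follows; the 25% threshold and the "pull one task" policy here are illustrative assumptions, not the patch's actual tuning.

```python
def load_balance(runqueues, this_cpu, imbalance_pct=25):
    """Pull a task from the busiest queue when it exceeds ours by more than
    imbalance_pct percent. (A sketch of the idea only; the real patch's
    thresholds and migration policy differ.)"""
    this_len = len(runqueues[this_cpu])
    busiest_cpu = max(range(len(runqueues)), key=lambda c: len(runqueues[c]))
    if busiest_cpu == this_cpu:
        return None
    busiest_len = len(runqueues[busiest_cpu])
    if busiest_len * 100 <= (this_len + 1) * (100 + imbalance_pct):
        return None                        # not far enough out of balance
    task = runqueues[busiest_cpu].pop()    # migrate one task to this CPU
    runqueues[this_cpu].append(task)
    return task
```

Because migration runs only occasionally and only on real imbalance, tasks usually stay put and keep their caches warm.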
Real-time processes are handled a little differently, and in a way that has drawn a few complaints. There is a single, global run queue for real-time processes. When such a process becomes runnable, all processors are interrupted and race to see who can execute the process first. The idea here is that real-time processes require the minimum possible latency, and there is no way to know in advance which processor will be able to drop what it's doing to run a real-time process. By setting all processors on the task, the new scheduler, through brute force, identifies the processor which can respond most quickly each time. But there will be a cost paid by all other processes on the system, which will find themselves interrupted (briefly, usually) every time a real-time process wants to run.
There are a few other features in the new code. One is that it restored the "run a newly forked child before the parent" behavior that was briefly tried in the 2.4 series; that change has been backed out in more recent versions of the patch, however. The new scheduler also tracks CPU usage over time (a few seconds' worth) in an attempt to identify the real CPU hogs on the system. Those processes will find their priorities decreased somewhat so that interactive tasks can run first.
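The hog-demotion idea reduces to a simple calculation, sketched below; the four-second window and five-level maximum penalty are invented for illustration and are not the patch's actual numbers.

```python
def effective_priority(static_prio, recent_cpu_ms, window_ms=4000, max_penalty=5):
    """Demote a task by how much of the recent window it spent on the CPU.
    A task that ran flat-out loses max_penalty priority levels; one that
    mostly slept keeps its static priority. (Constants are illustrative,
    not the patch's tuning; larger number = lower priority.)"""
    usage = min(recent_cpu_ms, window_ms) / window_ms   # 0.0 .. 1.0
    return static_prio + int(usage * max_penalty)

# A compute-bound hog sorts below an interactive task of equal static priority:
hog = effective_priority(100, recent_cpu_ms=4000)    # ran the whole window
editor = effective_priority(100, recent_cpu_ms=200)  # mostly waiting for input
```

Since the priority arrays are indexed by this effective value, the demotion costs nothing extra at selection time.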
Ingo's explanation also includes a number of benchmark results, all of which show improved performance with the new scheduler. The most impressive, perhaps, is the one that tests pure context-switching behavior. With this benchmark, the 2.5.2-pre6 scheduler could do just over 240,000 context switches per second on a two-processor system, but only 108,000 on an eight-processor system. In other words, the eight-processor system ran slower, thanks to increased contention for the global run queue lock. With the new scheduler, the results were 977,000 switches per second (two processors) and 6,117,000 per second (eight processors). Not bad.
For another set of benchmarks, see this posting from Mike Kravetz. These results show that the new scheduler could still use some work in places.
As of this writing, Ingo's latest patch is version E1, which is quite small (since most of the patch is already in 2.5.2-pre10). Early versions of the patch generated a fair number of bug reports, but those seem to have slowed down. It's hard to say what will happen in later 2.5 releases, but the scheduler wars may be over, for now.
ALSA gets ready. Advanced Linux Sound Architecture founder and maintainer Jaroslav Kysela has posted a patch containing the latest ALSA code, ready for inclusion into the 2.5.2-pre9 kernel. Among other things, this version of the subsystem shows the beginning of an effort to strip out the 2.2 and 2.4 compatibility code.
The thing that drew everybody's attention, however, was not the ALSA code itself, but its new directory organization. The patch creates a new top-level sound directory in the kernel tree; many developers had expected to see ALSA in the drivers/sound directory instead. The thinking is that ALSA includes much more than just drivers; there is a whole set of higher-level functions as well. Linus compares the ALSA code to the networking subsystem; network drivers are part of it, but there is much more involved.
The final organization is, in fact, likely to mirror how the networking code is handled. The sound directory will contain most of the ALSA code, but the low-level drivers will stay in drivers/sound. There is still no word on when ALSA will actually be merged, and there probably won't be until it shows up in a prepatch.
How to improve latency? The question of latency - the time it takes the kernel to respond to events - is back on the agenda. There seems to be a consensus that the 2.4 kernel is still too unresponsive for many needs. People working with streaming media tend to be the first to complain, but they are not the only ones.
There are two approaches out there for addressing the latency problems: the low-latency patches, which insert explicit rescheduling points into kernel code paths that can run for a long time, and the preemptible kernel patch, which allows kernel code to be preempted outright whenever a higher-priority process becomes runnable.
The main objection to the preemptible kernel patch is that it breaks a long-standing, fundamental assumption: that code running in kernel space will not be preempted until it explicitly gives up the processor. This could lead to no end of weird, difficult-to-debug problems. But the truth of the matter is that a preemptible kernel adds little complexity that SMP systems have not already introduced: the critical sections which must be protected against preemption are, for the most part, the same ones that spinlocks already protect against concurrent execution on another processor. Preemptability appears to have a good chance of winning in the end, but that, of course, is a call for Linus, and he has been silent on the topic.
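The "SMP already did the hard work" argument can be modeled with a per-CPU preemption count: every spinlock acquisition raises the count, and preemption is permitted only when the count is zero. The sketch below is a toy Python model of that rule, not the real implementation.

```python
class CPU:
    """Toy model of the preemptible-kernel rule: kernel code may be
    preempted only when its preemption count is zero, and every spinlock
    acquisition raises the count. (A sketch of the idea, nothing more.)"""
    def __init__(self):
        self.preempt_count = 0

    def spin_lock(self):
        self.preempt_count += 1   # entering an SMP-protected critical section

    def spin_unlock(self):
        self.preempt_count -= 1   # leaving it; preemption may resume at zero

    def preemptible(self):
        return self.preempt_count == 0

cpu = CPU()
assert cpu.preemptible()      # ordinary kernel code: fair game for preemption
cpu.spin_lock()
inside = cpu.preemptible()    # False: the same region SMP already had to protect
cpu.spin_unlock()
```

The point of the model: the regions where preemption must stay off are exactly the lock-protected ones, so correctly written SMP code needs no new annotations.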
Other patches and updates released this week include:
Section Editor: Jonathan Corbet
January 10, 2002