See also: last week's Kernel page.
The current development kernel release is 2.3.40. The changes in this release are as described last week, with the addition of drivers for Moxa serial cards.
There is a 2.3.41 prepatch available (in its third revision as of this writing). It contains a bunch of Sparc fixes, a driver for 3ware storage controllers, some SCSI code reorganization, an IBM USB camera driver (and many other USB changes), and a large number of networking tweaks.
The current stable kernel release remains 2.2.14. The 2.2.15 prepatch is up to 2.2.15pre4.
Messing with the scheduler has been a topic of discussion as a result of the now-famous IBM paper on the scheduling of Java threads in Linux. The authors of the paper found that, when large numbers of threads are contending for the CPU, the Linux kernel spends a great deal of time (up to 20%) in the scheduler.
Two reasons were found for the problem. The first has to do with the ordering of the fields in the task_struct structure, which describes processes in the kernel. By rearranging the fields in this (large) structure, the IBM folks were able to obtain improved cache behavior in the scheduler, and thus improve its performance. This patch is relatively straightforward, and was incorporated into kernel 2.3.39.
The other problem is that the scheduler, at every switch, goes through the entire queue of runnable processes and calculates a "goodness" value for each one. The "goodest" process then gets to run. When the run queue is short (as is usually the case), the cost of this calculation is small. When the queue is long, however, it gets to be significant.
Leading the "fix the scheduler" charge is Davide Libenzi, who has posted a patch which keeps processes in the run queue clustered by their "goodness" value. When the run queue is organized in this way, it is no longer necessary to pass through the entire queue to pick the next process to run. The result is better performance under high loads.
There is, however, very little consensus on whether this optimization is necessary or desirable. The fear that most people have is that, by optimizing the scheduler for high loads, the patch will make life worse in the low-load case. Since low loads are the usual condition for most systems out there, most users would end up being worse off.
Even the question of whether the high-load case is worth optimizing for is controversial. Numerous people make the point that large numbers of threads lead to poor cache usage and poor performance in general. No amount of scheduler tweaking can make up for bad cache behavior. The claim has been made that it is always much better to rewrite the application in a non-threaded mode; the best performance will be achieved in this way, and there is no need to mess with the scheduler.
The real point here is just how expensive cache misses are. A single cache miss can stall the processor for dozens of clock cycles. That cost is so high that it can easily outweigh any advantages gained through additional parallelism in a multi-threaded application - even on multiprocessor systems. As long as memory speeds lag processor speeds, improving performance by splitting tasks across threads will be hard to do.
The challenge has been thrown to proponents of highly-threaded applications to recode their programs in a single-threaded mode. The challengers believe that the recoded version will perform better; if not, it will be time to revisit the scheduler question. Until such a time, it's unlikely that any scheduler changes will get into the kernel.
Another file_operations change? Abramo Bagnara of the ALSA Project has posted a proposal for an interface change which would add "readv" and "writev" methods to the file_operations structure. Since this structure is central to the Linux device driver interface, such a change can have widespread implications. The block device changes have already changed file_operations once in this time of alleged feature freeze; is it really appropriate to change the kernel API again at this late date?
The answer might just be "yes." The readv and writev system calls are used for "scatter/gather" I/O, where the data moves from or to multiple distinct areas of memory. The Linux kernel, thus far, simply turns each segment of a readv or writev operation into a separate read or write for the underlying device driver or file system (except for sockets, which can already handle readv and writev).
It turns out that some devices - sound cards, in particular - do not work well this way. The I/O requirements of some of these devices can be quite complicated, and problems can arise when I/O operations are split apart before the driver sees them. If, instead, readv and writev are passed straight through to the driver, reliable, low-latency audio becomes much easier to implement.
The change is not difficult, and can be done in a way that does not require changes in any other drivers. Only those which can make use of the new operations would need to be modified, and that can happen whenever the driver maintainer gets around to it. Given the benefits and relatively low risk, this change might go in even this late in the development cycle.
Other patches and updates released this week include:
Section Editor: Jonathan Corbet
January 27, 2000