See also: last week's Kernel page.
The current development kernel release is 2.5.5. Linus's latest prepatch is 2.5.6-pre3; it contains a fair amount in the way of fixes and updates, but the most visible change will be the integration of the JFS journaling filesystem from IBM. Also included are ARM and X86-64 updates, USB updates, VFS updates, IDE driver reworking (see below), a parport update, and more.
Dave Jones's latest prepatch is 2.5.5-dj3. It is caught up to 2.5.6-pre2 and 2.4.19-pre2, and throws in several more fixes as well.
Guillaume Boissiere's 2.5 status summary for March 6 is available.
The current stable kernel release is 2.4.18. The current 2.4.19 prepatch from Marcelo is 2.4.19-pre2. It contains the struct page shrinkage patch, but otherwise confines itself to fixes and cleanups.
Alan Cox's current prepatch is 2.4.19-pre2-ac2; the most significant addition in that patch is Ingo Molnar's O(1) scheduler, which has been in the 2.5 series for some time. Also from Alan is 2.4.18-ac3, which adds a much smaller set of fixes to 2.4.18.
How clean should the IDE code get? A regular feature on the linux-kernel list over the last few weeks has been a series of "IDE cleanup" patches by Martin Dalecki. These patches have been aimed at making the IDE driver code easier to read, and at removing duplicated and unnecessary code. They have been, for the most part, uncontroversial, and Linus has merged most of them into his recent releases. (Of course, Andre Hedrick, the author of much of the IDE code, is not pleased with this work, but that's a story in its own right...)
Things changed a bit, however, with the posting of IDE cleanup 16 which, among other things, takes away direct access (via ioctl()) to the IDE taskfile commands. Martin didn't like providing the ability for userspace programs to talk to the drives directly in that manner, and he complained about the command parsing code that was there as part of that functionality. According to Martin, the taskfile ioctl has only been there since 2.5.3, and is used by nobody.
That reasoning ignores one important little fact: Andre's IDE patches have been around for some time, and have been extensively used despite the fact that they only now have found their way into a mainline kernel. There are users who have found reasons to employ the TASKFILE interface, and they are not pleased at its disappearance. To many, this change goes beyond a simple "cleanup."
Martin seems to have come to agree that some sort of taskfile access is necessary. That issue will thus probably come to a resolution, but there is still a larger question that needs answering. Martin appears to have performed a hostile takeover of the maintainership of the IDE code. Is he truly the IDE maintainer now, and how far does his mandate for change extend?
Protesting BitKeeper. The only surprise is how long it took for this to happen. A group of Ohio State students has posted a petition protesting the increasing use of BitKeeper by the kernel development community. In particular, the petitioners fear that the day will come when use of BitKeeper will be required to participate in the kernel development process.
The problem, of course, is that the BitKeeper license is not a free software license. The BitKeeper source (or a version of it, anyway) is available, and modifications and redistribution are allowed. But there are certain things that you cannot do (in particular: disabling the "open logging" feature); thus the software is not free. (See LWN's 1999 BitKeeper coverage or Jack Moffitt's critique of the BitKeeper license for more information.)
The response to the petition has ranged from weak to hostile. There are certainly kernel hackers who choose not to use BitKeeper as a result of its licensing, but few seem to be worried about their continued ability to contribute, and none feel the need to impose their decisions on others. BitKeeper seems to be winning converts in the kernel development community, and petitions are unlikely to change that.
Linux device number registration resumes. Back in May of 2001, Linus decreed that no more major device numbers would be handed out; his goal was to force the kernel developers to come up with a reasonable alternative to static numbers. Now John Cagle, who has taken over management of the Linux device list, has announced that device number registrations will resume - at least for kernels released by Marcelo Tosatti and Alan Cox (i.e. in the 2.4 series). Linus is presumably still not accepting new numbers for 2.5, so any numbers allocated now may well not show up in 2.6 until Linus hands that series over to a new maintainer.
(See the May 17, 2001 LWN Kernel Page for coverage of the moratorium on new device numbers.)
Delayed disk block allocation. When a Linux process writes data to a disk file, the kernel calls into the appropriate filesystem code to get disk space allocated for that data. This allocation happens even though the kernel could (and often does) decide to not actually write that data to disk for some time. The early allocation offers simplicity and reliability - it is nice to know where the data will eventually end up - and it has been good enough for the Linux kernel until now.
Early allocation is not ideal, however, for a few reasons. Foremost among those is that early allocation makes it hard for the filesystem code to optimize the layout of files on disk. The best performance is achieved when the blocks of a file are placed contiguously on the disk; they can then be read or written in a single, fast operation. If the filesystem allocates new blocks one at a time, however, contiguous placement can be hard to accomplish. In particular, if multiple processes are writing files in the same filesystem simultaneously, their data may end up being interleaved on the disk.
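The interleaving problem is easy to see in a toy model. The sketch below is not kernel code; it treats the disk as a simple list of block slots and compares handing out one block per write with allocating each file's blocks in a single batch. All of the function names are invented for illustration.

```python
# Toy model: the "disk" is a list whose entries name the file owning each block.

def early_allocation(writes):
    """Allocate one block at each write, in arrival order."""
    disk = []
    for file_name in writes:
        disk.append(file_name)  # the next free block goes to whoever wrote
    return disk

def batched_allocation(writes):
    """Count the blocks each file needs first, then allocate each file in one run."""
    pending = {}
    for file_name in writes:
        pending[file_name] = pending.get(file_name, 0) + 1
    disk = []
    for file_name, count in pending.items():
        disk.extend([file_name] * count)
    return disk

def fragments(disk):
    """Number of contiguous runs on the disk (fewer is better)."""
    return sum(1 for i, owner in enumerate(disk)
               if i == 0 or disk[i - 1] != owner)

# Two processes writing to files A and B at the same time.
writes = ["A", "B", "A", "B", "A", "B"]
print(early_allocation(writes), fragments(early_allocation(writes)))
print(batched_allocation(writes), fragments(batched_allocation(writes)))
```

With two writers alternating, the one-block-at-a-time policy leaves every block in its own fragment, while the batched policy yields one contiguous run per file. A real filesystem allocator is far more sophisticated, but the underlying issue is the same.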
Another worthwhile consideration is that some files never get written to disk at all. Many applications create short-lived temporary files that are deleted before their blocks are ever committed to the drive. For such files, it is better to never bother with the allocation of blocks at all.
These concerns argue for delaying the allocation of blocks for files until it is absolutely necessary. A proper delayed allocation implementation should measurably improve performance. That assertion has now been put to the test, as Andrew Morton has posted a patch implementing delayed allocation for the 2.5.6-pre kernels.
Delayed allocation, of course, requires cooperation from the filesystem code, since that is where the allocation actually occurs. It is important, after all, to know that the required disk blocks will be available when the system finally does get around to allocating them - applications want to know right away if their writes are not going to work. Andrew's patch thus extends the address_space_operations structure with a few new methods. When a process writes into a new file block, the kernel can call the reservepage method to tell the filesystem to set some space aside. Later on, the new writeback_mapping method can be called to commit blocks to disk, allocating the space at that time.
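The reserve-now, allocate-later split can be sketched in a few lines. What follows is a toy model in Python, not the patch's actual interface: only the method names reservepage and writeback_mapping come from the patch description, while the class, the arguments, the bump allocator, and the discard_unwritten helper are assumptions made for illustration.

```python
class ToyFilesystem:
    """Toy model of delayed allocation: reserve at write time, allocate at writeback."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks  # blocks neither reserved nor allocated
        self.reserved = {}              # file name -> number of reserved blocks
        self.next_block = 0             # trivial bump allocator (assumption)

    def reservepage(self, name):
        """Write time: set one block aside, or fail right away if the disk is full."""
        if self.free_blocks == 0:
            raise OSError("no space left on device")
        self.free_blocks -= 1
        self.reserved[name] = self.reserved.get(name, 0) + 1

    def writeback_mapping(self, name):
        """Writeback time: turn the file's reservations into one contiguous run."""
        count = self.reserved.pop(name, 0)
        blocks = list(range(self.next_block, self.next_block + count))
        self.next_block += count
        return blocks

    def discard_unwritten(self, name):
        """Hypothetical helper: a file deleted before writeback never gets blocks."""
        self.free_blocks += self.reserved.pop(name, 0)


fs = ToyFilesystem(free_blocks=4)
for _ in range(3):
    fs.reservepage("data")
fs.reservepage("tmp")
fs.discard_unwritten("tmp")          # short-lived file: no allocation ever happens
print(fs.writeback_mapping("data"))  # blocks chosen only now, in a single run
```

The point of the reservation step is that a write into a full filesystem still fails immediately, even though the choice of actual disk blocks is put off until writeback.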
A fair amount of effort (and code) has gone into trying to handle those writebacks in an intelligent way. A set of tunable thresholds determines when (and to what extent) the kernel will go out of its way to write dirty pages to disk. At the lowest threshold, writebacks will start happening as a kernel background task. If the number of dirty pages reaches a substantial portion of the total, processes performing writes can be blocked while their pages are written out synchronously.
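The two-level policy might look something like the following sketch. The percentages and names here are invented for illustration; the patch's real tunables and their defaults are not described above.

```python
# Hypothetical thresholds, as percentages of total memory pages.
BACKGROUND_PCT = 30   # assumption: start background writeback here
SYNC_PCT = 50         # assumption: block writers and flush synchronously here

def writeback_action(dirty_pages, total_pages):
    """Decide what the kernel would do for a given number of dirty pages."""
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if dirty_pages * 100 >= total_pages * SYNC_PCT:
        return "synchronous"   # the writing process waits for its pages
    if dirty_pages * 100 >= total_pages * BACKGROUND_PCT:
        return "background"    # a kernel thread writes pages out
    return "none"

for dirty in (100, 300, 500):
    print(dirty, writeback_action(dirty, total_pages=1000))
```

Keeping the background threshold well below the synchronous one gives the background threads a chance to catch up before any process has to be blocked.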
Much of the writeout is intended to happen in the background mode, however. To this end, the delayed allocation patch introduces yet another set of kernel threads, called "pdflush." The number of pdflush threads will go up as the amount of writeback work increases - their number is managed through a simple, Apache-style pool scheme. The purpose of having multiple threads is to try to keep multiple disk devices busy, even if one is doing most of the work.
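A pool of this kind can be modeled roughly as follows. This is only a sketch of the general idea (grow when every thread is busy, shrink when too many sit idle); the class name, the limits, and the exact grow and shrink rules are assumptions, not a description of the pdflush code.

```python
class FlushPool:
    """Toy model of a worker pool that grows under load and shrinks when idle."""

    def __init__(self, min_threads=2, max_threads=8):  # limits are hypothetical
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.threads = min_threads  # current pool size
        self.busy = 0               # threads currently writing pages

    def start_work(self):
        """Hand a writeback job to a thread; returns False if the pool is saturated."""
        if self.busy == self.threads and self.threads < self.max_threads:
            self.threads += 1       # everyone is busy: add a thread
        if self.busy < self.threads:
            self.busy += 1
            return True
        return False

    def finish_work(self):
        """A thread completed its job; retire one if too many are now idle."""
        if self.busy > 0:
            self.busy -= 1
        if self.threads - self.busy > self.min_threads:
            self.threads -= 1


pool = FlushPool(min_threads=2, max_threads=4)
print([pool.start_work() for _ in range(5)], pool.threads)
for _ in range(4):
    pool.finish_work()
print(pool.threads)
```

The pool never drops below its minimum size, so a burst of writeback work can always start immediately, and it never grows past the maximum, so a heavy write load cannot spawn an unbounded number of threads.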
How well does the patch work? Randy Hron, kernel benchmarker extraordinaire, has compiled an extensive set of results. The bottom line: for disk operations, and heavy writes in particular, the delayed allocation patch increases performance by 20-25%. Probably worth the trouble, in other words. As one kernel hacker put it: "My only comment is: how fast can we get delalloc into 2.5.x for further testing and development?"
Section Editor: Jonathan Corbet
March 7, 2002