See also: last week's Kernel page.
The current development kernel release is 2.5.3, which was released on January 30 (changelog). The biggest change in the more recent prepatches has been the split of the massive (> 1MB) Configure.help file into multiple, smaller files spread out over the source tree. This change will make those files easier to maintain (it is hoped); in the meantime, however, it has broken a number of the configuration tools. Other changes include a large ReiserFS update and the inclusion of Nathan Scott's extended attribute patch, which paves the way for access control lists and other useful stuff in the future.
Dave Jones's latest is 2.5.2-dj7, which is caught up to 2.5.3-pre6 and 2.4.18-pre7. It adds a number of small fixes, and, of course, the input layer changes (which require some configuration changes - see last week's LWN kernel page).
Guillaume Boissiere's 2.5 status summary has been updated to reflect the current and near-future state of affairs.
The current stable kernel release is still 2.4.17; Marcelo has not released any new prepatches over the last week. Alan Cox has released 2.4.18-pre7-ac1, which he describes as "a standing still release;" it mostly just catches up to the -pre7 prepatch.
For those with more modest hardware, SnapGear has announced the release of a new uClinux kernel based on 2.4.17. Your processor may not have a memory management unit, but now you can run things like ext3 anyway.
Alternate kernel tree of the week: Marcus Grando has announced 2.4.18-pre7-mg1, which adds the reverse mapping VM patch and some netfilter fixes to the 2.4.18 prepatch.
ACPI followup. Andy Grover, Linux ACPI developer, took exception to the discussion of ACPI, and its problems, in last week's LWN kernel page. His note challenges the complaints that have been made against ACPI, and states:
My hope is, the more people gain familiarity of Linux's ACPI code by testing and helping in its development, the more we all can accept it on its merits, and start improving Linux's PnP and power management by using the improved functionality ACPI provides.
His note is worth a read. The simple fact is that ACPI is in our future, whether we like it or not, and we will have to deal with it. The concerns remain, however, and those will have to be dealt with too.
It all started, of course, with Rob Landley's 'modest proposal' calling for a "patch penguin" to help Linus manage patches from developers.
Okay everybody, this is getting rediculous. Patches FROM MAINTAINERS are getting dropped on the floor on a regular basis. This is burning out maintainers and is increasing the number of different kernel trees (not yet a major fork, but a lot of cracks and fragmentation are showing under the stress). Linus needs an integration lieutenant, and he needs one NOW.
Rob points out that there have been unofficial "patch penguins" in the past. Alan Cox filled that role through much of the 2.3 and 2.4 series, and Dave Jones is doing it in 2.5. In general, the "ac" or "dj" trees have indeed served as a useful staging area for patches on their way to Linus; Rob claims that there should be one such tree with some sort of official blessing from Linus.
The complaints are echoed by a number of developers who feel that their patches have been ignored for too long. Alan Cox goes so far as to suggest that Linus could find himself replaced: "Think gcc, think egcs. History is merely beginning to repeat itself."
Linus, for his part, feels that there is no real problem in how kernel development works. Adding a patch penguin would not help, since said penguin would scale no better than Linus does. The solution to dropped patches is to route them through the appropriate maintainers:
In short: don't try to come up with a "patch penguin". Instead try to help existing maintainers, or maybe help grow new ones. THAT is the way to scalability.
A number of high-profile kernel developers seem to agree with Linus that the system still works.
That is the core of the dispute. The more interesting part, perhaps, is what changes might result from the discussion. It appears that there might actually be a few:
Much of the coverage of this discussion has portrayed it as a major rift among kernel developers, with ominous overtones of an impending "fork" of the kernel project. The truth of the matter is that no large, collaborative project can continue to function without occasionally taking a look at how its processes work. Kernel development is certainly not without its challenges; with luck, this discussion will help bring about changes that will keep the kernel project sustainable into the future.
rmap, fork, and COW. Last week's discussion of the reverse mapping VM patch omitted a couple of important things that are worth a mention. First and easiest is the fact that the hashed page wait queues discussed as part of the rmap patch were actually implemented by William Lee Irwin. Credit where credit is due.
The discussion of the costs of the rmap patch concentrated on memory use, but (as Daniel Phillips pointed out) we overlooked one other important factor. When a child process is created with the fork() system call, one task that must be performed is the copying of the parent's page tables. When the rmap patch is applied, fork() must also copy all of the reverse mapping entries. The computational cost of this copying is not small; with rmap, the time required for a fork increases by anywhere from 10% (for small applications) to 400% (for something large). A fast fork() implementation is important for overall system performance; a 400% increase is likely to be seen as unacceptable.
There is a fix in the works, however, as described by Daniel Phillips: copy-on-write page tables. The COW idea has the potential to speed up fork() with or without rmap; it can also lead the way to other interesting page table optimizations in the future.
Under the COW scheme, a call to fork() does not result in the copying of the parent process's page tables. Instead, the tables are marked read-only, and their reference count is increased. Both processes then go off and execute with the (now shared) page tables. When either process makes a write access, it will be trapped with a page fault. At that point, the kernel copies the relevant page table (as well as the page being written to) and decrements the reference count on the shared original. The process, which now has its own page table, is then allowed to continue with its write operation.
Forks become very fast, since page tables are no longer copied at that time. If a process eventually accesses much of its memory, those copies will happen, but they will be more evenly spread out over the life of the process. The usual pattern, however, is for a fork() call to be quickly followed by an exec() call, which wipes out the page tables entirely. In this case, the overhead of copying most of the page tables is avoided altogether.
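The scheme described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the names PageTable, Process, and exec_image are hypothetical, a single flat table stands in for the real multi-level page tables, and a Python dictionary copy stands in for the kernel's page fault handling. It is not taken from the (unreleased) patch.

```python
class PageTable:
    """Toy page table: maps a virtual page number to that page's contents."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self.refcount = 1  # number of processes using this table


class Process:
    def __init__(self, table):
        self.table = table
        self.table_copies = 0  # page table copies this process has paid for

    def fork(self):
        # No copying at fork time: share the table and bump its refcount.
        self.table.refcount += 1
        return Process(self.table)

    def read(self, vpn):
        # Reads never fault in this model; shared tables are readable.
        return self.table.entries[vpn]

    def write(self, vpn, value):
        if self.table.refcount > 1:
            # Stand-in for the page fault on a shared (read-only) table:
            # drop our reference and continue with a private copy.
            self.table.refcount -= 1
            self.table = PageTable(self.table.entries)
            self.table_copies += 1
        self.table.entries[vpn] = value

    def exec_image(self, new_entries):
        # exec() discards the old address space, so nothing is copied.
        self.table.refcount -= 1
        self.table = PageTable(new_entries)


parent = Process(PageTable({0: "code", 1: "data"}))

writer = parent.fork()
writer.write(1, "child data")        # first write triggers the one copy

execer = parent.fork()
execer.exec_image({0: "new program"})  # fork followed by exec: no copy

print(writer.table_copies, execer.table_copies, parent.read(1))
```

In this model the child that writes pays for exactly one table copy, at the time of its first write, while the child that goes straight to exec never copies anything. The parent's view of its memory is unchanged in both cases.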
So COW page tables are a win even in the absence of the rmap patch, and a bigger win when reverse mapping is used. The patch (which has not yet been released) is perhaps even more significant, however, in that it creates the first structure in the Linux kernel for the sharing of page tables. Linux processes can share mappings of memory or files (e.g. shared libraries), but they each have their own page tables for that shared memory. Private page tables are easier to manage, but there are some inefficiencies that result.
Example: most Linux processes have a shared mapping of the C library which occupies just over 1MB of address space (on the author's Debian 'sid' system). This mapping requires almost 300 page table entries (on an i386 system) for every process - and all of them live in unswappable kernel memory. KDE and GNOME applications tend to have many such library mappings, many of which are substantially larger. There would be a real performance advantage in being able to share the page tables for these mappings. The initial COW patch will probably not include support for sharing page tables in this manner, but it is a step in the right direction.
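The arithmetic behind the "almost 300" figure can be checked with a short calculation. This sketch assumes 4KB pages and 4-byte page table entries (i386 without PAE); the 1,200,000-byte mapping size and the count of 100 processes are hypothetical values picked for illustration, not measurements.

```python
PAGE_SIZE = 4096  # bytes per page on i386
PTE_SIZE = 4      # bytes per page table entry (assumes non-PAE i386)


def pte_count(mapping_bytes, page_size=PAGE_SIZE):
    """Entries needed to map mapping_bytes of address space (rounded up)."""
    return (mapping_bytes + page_size - 1) // page_size


def private_pte_bytes(mapping_bytes, processes):
    """Bytes of entries used when each process has its own for the mapping."""
    return pte_count(mapping_bytes) * PTE_SIZE * processes


libc_mapping = 1_200_000  # hypothetical size, "just over 1MB"

print(pte_count(1 << 20))                    # exactly 1MB: 256 entries
print(pte_count(libc_mapping))               # 293 entries
print(private_pte_bytes(libc_mapping, 100))  # 117200 bytes over 100 processes
```

The per-process cost of a single library is small, but it is multiplied by every process and every mapped library, and none of it can be swapped out. With shared page tables the entries for a given mapping would need to exist only once.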
Much of this is speculative, however, until the COW page table patch is posted and benchmarked. If it works as expected, and frees the rmap patch of its fork() penalty, the whole mess may well make its way into the 2.5 series. As Linus told Rik van Riel:
You may not believe me when I say so, but I personally _really_ hope your rmap patches will work out. I may not have believed in your patches in a 2.4.x kind of timeframe, but for 2.6.x I'm more optimistic.
If we're really lucky, the 2.6 (or, perhaps, 3.0?) kernel will have a top-quality VM implementation before it's released.
Asynchronous I/O patch writeup. Writing up Ben LaHaise's asynchronous I/O patch has been on the "todo" list for this page for some time. It is an interesting patch; it provides capabilities that some users seem to really need, but it also makes some fundamental changes to the I/O subsystem. We may still take a shot at the AIO patch, but, for now, Suparna Bhattacharya has beaten us to it. Have a look at her writeup for a thorough, detailed treatment of the patch and the reasons for its existence.
Other patches and updates. This section has gotten steadily longer over the years; we're experimenting a bit with its formatting in an attempt to make it more readable.
Core kernel code:
Section Editor: Jonathan Corbet
January 31, 2002