Linus's BitKeeper repository contains an x86 signal delivery optimization, an IDE update, I/O space write barrier support, a frame buffer driver update, more scheduler tweaks, some big kernel lock preemption patches, various architecture updates, and lots of fixes.
The current tree from Andrew Morton is 2.6.10-rc1-mm1. Recent changes to -mm include a massive cleanup of (deprecated) MODULE_PARM() calls, a configuration option for dnotify (in anticipation of adding inotify), an ext3 reservation update, and more fixes. The size of -mm has dropped considerably since many patches have found their way into the mainline.
The current 2.4 prepatch is 2.4.28-rc1, announced by Marcelo on October 22. A relatively small set of fixes has been added since -pre4.
Kernel development news
Wow. That was deep. Time to go watch TV again.
-- Andrew Morton lowering expectations for 2.6.10-rc1-mm1.
TCCBOOT, for the ultimate source-based distribution.
Ingo Molnar continues to crank out patches at a high rate. The latest, -RT-2.6.9-mm1-V0.3, is advertised as being rather more experimental than its predecessors - which is saying something. This patch set brings preemptible mutexes to (almost) the last, most difficult parts of the kernel, including the low-level memory allocators, wait queue code, kernel timers, and more. Says Ingo:
Some small pieces of this work have been put forward as independent patches; these include the enhancements to the completion interface mentioned last week. Linus has also made a couple of changes to the big kernel lock code in support of this sort of work: the BKL functions are now entirely out-of-line, and some of the code for managing the BKL itself has been made preemptible.
Ingo's patch also changes a number of semaphores in the kernel over to completions. For situations where one kernel thread needs to notify another that some task has been finished, completions are a better interface: they make the intent of the code clear, and they are better optimized for that use. Some of those patches have been posted separately as well, leading to some pushback from kernel developers who believe that their use of semaphores for that purpose is entirely legitimate. Bill Huey, the developer behind the mmlinux realtime project, is the person who has been pushing some of those patches; he responded to the resistance in this way:
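The difference in intent can be illustrated outside the kernel. What follows is a minimal userspace sketch in Python, not kernel code: the Completion class and the run_worker_and_wait() helper are hypothetical names, and the class only mimics the shape of the kernel's complete() and wait_for_completion() calls (a "done" counter plus a wait queue).

```python
import threading


class Completion:
    """Userspace analogy of a kernel completion: a 'done' counter plus a wait queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = 0  # number of complete() calls not yet consumed by a waiter

    def complete(self):
        """Signal that the task has finished and wake one waiter, if any."""
        with self._cond:
            self._done += 1
            self._cond.notify()

    def wait_for_completion(self):
        """Block until complete() has been called, then consume one signal."""
        with self._cond:
            while self._done == 0:
                self._cond.wait()
            self._done -= 1


def run_worker_and_wait():
    """Start a worker thread, wait for it to signal, and return what it produced."""
    result = []
    done = Completion()

    def worker():
        result.append("work finished")  # the task itself
        done.complete()                 # the intent is explicit: "I am done"

    thread = threading.Thread(target=worker)
    thread.start()
    done.wait_for_completion()          # sleeps until the worker signals
    thread.join()
    return result
```

A semaphore initialized to zero could do the same job, which is the argument made by the developers pushing back; the completion simply names the operation for what it is.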
This, of course, is just the sort of talk which will put many kernel developers off the realtime patches entirely; some of Mr. Huey's subsequent postings, being rather more inflammatory, did not help the situation either. Ingo went into damage control mode and smoothed things over, for now. If and when the realtime preemption patch is put forward for inclusion, however, chances are that the discussion could get heated indeed.
Paul McKenney, meanwhile, expressed a discomfort with the realtime work which must certainly be felt by many:
His message included a patch intended to point toward such a path. This patch, which assumes an SMP system, works by setting aside one CPU as a purely realtime processor; it is not part of the regular scheduling mechanism. Realtime processes may be assigned to that CPU by the system administrator. If they mostly work in user mode, all is well; they have a dedicated processor and need not worry about latency. As soon as a realtime process invokes a system call, however, it goes into non-deterministic mode and is booted out to one of the system's other processors. In this way, the dedicated, real-time processor never gets stuck waiting for a lock.
The downside, of course, is that, every now and then, it is actually nice to be able to use system calls. Paul's idea was that each Linux system call could be examined individually, and, if warranted, modified to be entirely preemptible. When any particular system call reaches a state where it is considered to be deterministic, it could be added to a list of such calls, and realtime processes using it need not be shifted away from the realtime processor. Over time, this list would grow to the point that realtime tasks which do actual, interesting work could be run on the mainline Linux kernel. In the meantime, there would be no need for a major flag day where the entire kernel locking scheme is changed at once.
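The migration rule described above can be modelled in a few lines. This is a toy model in Python and not code from Paul's patch: the CPU numbers, the cpu_for_syscall() function, and the contents of the deterministic list are all assumptions made for illustration.

```python
REALTIME_CPU = 3            # hypothetical: the CPU set aside for realtime work
GENERAL_CPUS = [0, 1, 2]    # hypothetical: CPUs under the regular scheduler

# Hypothetical list of system calls that have been audited and judged
# deterministic; the real list would grow as calls are examined one by one.
DETERMINISTIC_SYSCALLS = frozenset({"getpid", "gettimeofday"})


def cpu_for_syscall(syscall, deterministic=DETERMINISTIC_SYSCALLS):
    """Return the CPU on which a realtime task should execute `syscall`.

    Calls on the deterministic list stay on the dedicated processor;
    anything else sends the task to a general-purpose CPU, so the
    realtime processor never waits on a lock.
    """
    if syscall in deterministic:
        return REALTIME_CPU
    return GENERAL_CPUS[0]
```

Growing the list is then the only change needed to keep more work on the dedicated processor: cpu_for_syscall("read") moves the task away, while the same call with "read" added to the list keeps it in place.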
The real challenge with this approach would be to make I/O deterministic, since realtime processes usually must act in response to external events. That cannot be done until it is clear that all filesystems and device drivers have been made entirely preemptible - and, at that point, much of the system has been affected. Meanwhile, it turns out that the 2.6.9 kernel already has part of this mechanism: the new isolcpus= boot parameter excludes one or more processors from regular scheduling. The scheme for migrating realtime processes when they invoke a non-deterministic system call is not present, however.
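To see which processors a given boot configuration has excluded, the kernel command line can be inspected from user space. The sketch below is in Python; parse_isolcpus() is a hypothetical helper, and it assumes that the parameter's value is a comma-separated list of CPU numbers, with "a-b" ranges also accepted. Any other syntax would raise a ValueError.

```python
def parse_isolcpus(cmdline):
    """Return the sorted list of CPUs named by isolcpus= in a kernel command line."""
    prefix = "isolcpus="
    cpus = set()
    for word in cmdline.split():
        if not word.startswith(prefix):
            continue
        for item in word[len(prefix):].split(","):
            if "-" in item:
                # A range such as "3-5" covers both endpoints.
                first, last = item.split("-", 1)
                cpus.update(range(int(first), int(last) + 1))
            elif item:
                cpus.add(int(item))
    return sorted(cpus)
```

On a running system the input would normally come from /proc/cmdline, for example parse_isolcpus(open("/proc/cmdline").read()).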
The interesting thing with this discussion is that the people objecting to the current development mode have not been able to point to much in the way of specific problems that have resulted from it. A few specific bugs have been listed, but most of those have been around for some time and cannot be said to have resulted from recently-merged new features. The only complaint which holds water might be this one regarding the plight of some out-of-tree kernel development projects (PaX in particular). PaX, it seems, is stuck at 2.6.7 because its developers have not yet been able to respond to subsequent changes in internal interfaces.
This argument, of course, does not get very far with most kernel developers. There is an increasing amount of pressure to get out-of-tree projects to submit their code and become part of the mainline tree. Code which is in the mainline gets fixed (usually) when internal interfaces change, but only the original developers can maintain external code. So the standard answer to this sort of complaint is "merge your patches." Changes in the development model to accommodate out-of-tree projects are unlikely.
Not every 2.6 kernel release has been 100% stable, but the same can be said of previous stable kernel series as well. What is different this time is that 2.6 has a great many new features and improvements which would not have been merged under the older model. Many of those improvements would, instead, have been backported by distributors and shipped as part of the "stable" kernel anyway. Under the new scheme, those patches are merged into the mainline, are debugged by everybody involved, and are available to all users. It seems unlikely that most users truly wish to go back to the old days, when distributors shipped highly divergent kernels with (literally) hundreds of patches.
There are occasional requests for bugfix-only "point" releases for the major 2.6 kernels. Rather than wait for 2.6.10, and take all of the other changes which come with that kernel, some people wish for a 2.6.9.1 (and so on) with just the important fixes. The standard response to that request is that anybody can create and maintain such a tree. So far, however, there has not been sufficient demand for this tree to motivate somebody to actually do the work. (It should be noted, though, that Alan Cox has restarted posting his "-ac" patches, which contain fixes that are, in his opinion, important.)
All of the above distracts from the real development model discussion: what Linus should call his intermediate releases. There is a steady stream of objections to the "-rc" scheme, since, in fact, very few such kernels are actually release candidates. Linus pondered the issue, but decided to call the first 2.6.10 prepatch 2.6.10-rc1 anyway:
So the -rc names will continue for the foreseeable future.
The -mm tree contains an approach to crash dumps which may inspire a bit more trust. The core idea is to get the failing kernel out of the way entirely, as soon as possible, and to boot into a new kernel which can handle the real crash dump tasks.
The mechanism used is the kexec system call, which loads and boots directly into a new kernel. The original goal was simply to speed up reboots by avoiding the BIOS and the whole set of time-consuming boot-time rituals which it performs; it's the sort of feature which appeals to kernel developers. Kexec patches have been circulating for some time, though the call has yet to make its way into a mainline kernel.
Using kexec to perform crash dumps requires some additional work and advance planning. A separate kernel must be built to run when the crash dump capability is desired. This kernel needs to be as small as possible, and it must be specially configured to load into a portion of memory not used by the primary kernel. This kernel is also set up so that it only uses a small piece of the total system memory; it must be able to boot and run without changing the primary kernel's memory.
When the system is running, kexec is used to preload the crash dump kernel into its reserved portion of memory. If all goes well, it simply sits there, wasting memory, and never gets run. That overhead is simply the price one pays for running an enterprise-class kernel.
Should the system panic, however, the crash dump kernel has its day. The primary kernel, once it decides that something has gone drastically wrong, must first make a copy of the very bottom part of memory (it will get stepped on in the booting process). Once that is done, kexec is invoked to boot directly into the crash dump kernel. That kernel starts up, aware of all memory in the system, but only using the small portion which was reserved to it before. The result is a full, running Linux system with complete access to the memory state of the failed kernel.
To help with debugging of kernel crashes, the crash dump kernel provides a couple of mechanisms for inspecting the failed kernel's memory. The file /proc/vmcore provides the old kernel's memory as an ELF-format core dump; it can be looked at with gdb or another debugging tool. If need be, a char device (/dev/oldmem) can also be set up; it provides raw access to the old kernel's memory.
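Since /proc/vmcore is presented as an ELF core file, a script can sanity-check it before handing it to a debugger. The following Python sketch relies only on the ELF header layout (the magic bytes, the byte-order flag in e_ident, and the 16-bit e_type field at offset 16); is_elf_core() is a hypothetical helper name and is not part of the kdump patch.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value which the ELF specification assigns to core files


def is_elf_core(header):
    """Return True if `header` (the first 18 or more bytes of a file) is an ELF core header."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return False
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return False
    # e_type sits right after the 16-byte e_ident array in both ELF32 and ELF64.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type == ET_CORE
```

In the crash dump kernel, the check would be applied to the first 18 bytes read from /proc/vmcore in binary mode.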
A developer might choose to work with the crash dump kernel and try to track down the problem immediately. In most deployed situations, instead, the crash dump kernel may be configured to simply copy the old kernel's memory image to a disk file somewhere, then reboot back into the primary system. The crash dump file can then be examined at leisure, perhaps by remote support staff.
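The "copy the image to disk" step needs very little code. The Python sketch below shows one way the crash dump kernel's user space might do it, copying in fixed-size chunks so that the small amount of memory reserved for that kernel is not exhausted. The copy_image() function and the /var/crash/vmcore destination mentioned afterward are assumptions for illustration, not names taken from the patch.

```python
def copy_image(src, dst, chunk_size=1024 * 1024):
    """Copy everything from file object `src` to `dst`; return the byte count.

    Only one chunk is held in memory at a time, which matters when the
    running kernel has been confined to a small reserved region.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:        # an empty read means end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

A saving script would open /proc/vmcore for binary reading and a destination such as the hypothetical /var/crash/vmcore for binary writing, pass both to copy_image(), and then trigger the reboot into the primary system.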
The end result of all this work should be a mechanism which can be used to track down the cause of infrequent crashes at remote sites. That should be good for the stability of the kernel as a whole - and the bottom line of enterprise support companies. See Documentation/kdump.txt from the patch for more information.
Page editor: Jonathan Corbet
Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds