Kernel development

Kernel release status

The current 2.6 prepatch is 2.6.10-rc1, which was released by Linus on October 22. Changes from 2.6.9 include a big USB update, the kernel events notification mechanism, the switchable I/O schedulers patch (and a new version of the CFQ scheduler), an NTFS update, in-kernel keyring management, an IRQ subsystem code rework, version 17 of the wireless extension API, the BSD secure levels module, an NFSv4 update, some scheduler tweaks, DVD+RW and CDRW packet writing support, lots of networking changes, and a number of architecture updates. Internal API changes include a new atomic_inc_return() function, changing most of the core device model functions to be exported GPL-only, the removal of the "BIO walking" helper functions, changing remap_page_range() to remap_pfn_range(), and a new generic circular buffer type (covered in last week's Kernel Page). See the long-format changelog for the details.

Linus's BitKeeper repository contains an x86 signal delivery optimization, an IDE update, I/O space write barrier support, a frame buffer driver update, more scheduler tweaks, some big kernel lock preemption patches, various architecture updates, and lots of fixes.

The current tree from Andrew Morton is 2.6.10-rc1-mm1. Recent changes to -mm include a massive cleanup of (deprecated) MODULE_PARM() calls, a configuration option for dnotify (in anticipation of adding inotify), an ext3 reservation update, and more fixes. The size of -mm has dropped considerably since many patches have found their way into the mainline.

The current 2.4 prepatch is 2.4.28-rc1, announced by Marcelo on October 22. A relatively small set of fixes has been added since -pre4.

Kernel development news

Quotes of the week

I want Linux development to be fluid, and I think the best way to reach that goal is to make people _think_ of it as being fluid. It's the old "perception changes reality" thing. It's really true. How you think about something quite heavily influences what you do.

Wow. That was deep. Time to go watch TV again.

-- Linus Torvalds.

This kernel is probably pretty crappy - there is a _lot_ of stuff happening and the quality of the patches which I am receiving seems to be gradually dropping off.

-- Andrew Morton lowering expectations for 2.6.10-rc1-mm1.

TCCBOOT is a boot loader able to compile and boot a Linux kernel directly from its source code. TCCBOOT is only 138 KB big (uncompressed code) and it can compile and run a typical Linux kernel in less than 15 seconds on a 2.4 GHz Pentium 4.

TCCBOOT, for the ultimate source-based distribution.

The ongoing realtime story

The efforts to bring hard realtime response to Linux continue. For those of you following along at home, here is a summary of the latest realtime Linux developments.

Ingo Molnar continues to crank out patches at a high rate. The latest, -RT-2.6.9-mm1-V0.3, is advertised as being rather more experimental than its predecessors - which is saying something. This patch set brings preemptible mutexes to (almost) the last, most difficult parts of the kernel, including the low-level memory allocators, wait queue code, kernel timers, and more. Says Ingo:

this is probably the last 'big leap forward' in terms of the scope of the patch. (having reached the ultimate scope: it now encompasses everything ;)

Some small pieces of this work have been put forward as independent patches; these include the enhancements to the completion interface mentioned last week. Linus has also made a couple of changes to the big kernel lock code in support of this sort of work: the BKL functions are now entirely out-of-line, and some of the code for managing the BKL itself has been made preemptible.

Ingo's patch also changes a number of semaphores in the kernel over to completions. For situations where one kernel thread needs to notify another that some task has been finished, completions are a better interface: they make the intent of the code clear, and they are better optimized for that use. Some of those patches have been posted separately as well, leading to some pushback from kernel developers who believe that their use of semaphores for that purpose is entirely legitimate. Bill Huey, the developer behind the mmlinux realtime project, is the person who has been pushing some of those patches; he responded to the resistance in this way:

Well, this is something that's got to be considered by the larger Linux community and whether these conventions are to be kept or removed. It's a larger issue than what can be address in Ingo's preemption patch, but with inevitable need for something like this in the kernel (hard RT) it's really unavoidable collision. IMO, it's got to go, which is a nasty change.

This, of course, is just the sort of talk which will put many kernel developers off the realtime patches entirely; some of Mr. Huey's subsequent postings, being rather more inflammatory, did not help the situation either. Ingo went into damage control mode and smoothed things over, for now. If and when the realtime preemption patch is put forward for inclusion, however, chances are that the discussion will get heated indeed.
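The notification pattern at the heart of the semaphore-versus-completion debate (one thread tells another that a task has finished) can be imitated in user space. What follows is only a sketch built on POSIX threads, not kernel code: the function names are borrowed from the kernel's completion interface for readability, and struct job, worker() and run_job() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space imitation of a completion: a "done" flag protected by a
 * mutex, plus a condition variable used to wake the waiting thread. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    bool            done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->done = false;
}

/* Called by the thread that has finished the task. */
static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}

/* Called by the thread that needs to know the task is finished. */
static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->wait, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical task: the worker stores a result, then signals. */
struct job {
    struct completion finished;
    int result;
};

static void *worker(void *arg)
{
    struct job *j = arg;

    j->result = 42;
    complete(&j->finished);
    return NULL;
}

/* Start the worker, wait for its completion, return its result
 * (or -1 if the thread could not be created). */
static int run_job(void)
{
    struct job j;
    pthread_t tid;

    init_completion(&j.finished);
    j.result = 0;
    if (pthread_create(&tid, NULL, worker, &j) != 0)
        return -1;
    wait_for_completion(&j.finished);
    pthread_join(tid, NULL);
    return j.result;
}
```

The point of the dedicated type is visible in run_job(): the code says "wait until this is finished" rather than "take a lock", which is the clarity of intent the article attributes to completions.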

Paul McKenney, meanwhile, expressed a discomfort with the realtime work which must certainly be felt by many:

The problem is that the entire OS kernel must be modified to ensure that all code paths are deterministic. It would be much better if there was an evolutionary path to hard realtime.

His message included a patch intended to point toward such a path. This patch, which assumes an SMP system, works by setting aside one CPU as a purely realtime processor; it is not part of the regular scheduling mechanism. Realtime processes may be assigned to that CPU by the system administrator. If they mostly work in user mode, all is well; they have a dedicated processor and need not worry about latency. As soon as a realtime process invokes a system call, however, it goes into non-deterministic mode and is booted out to one of the system's other processors. In this way, the dedicated, real-time processor never gets stuck waiting for a lock.
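The idea of tying a process to one particular processor can be tried from user space with the existing CPU affinity calls. The sketch below is not the interface from Paul's patch, which is not shown here; it only uses sched_getaffinity() and sched_setaffinity() as a stand-in, and the helper names first_allowed_cpu() and pin_self_to_cpu() are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the caller may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU; returns 0 on success. */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t mask;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

Pinning alone does not give the behavior described above: other tasks may still be scheduled on the chosen CPU, and nothing moves the process away when it enters the kernel.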

The downside, of course, is that, every now and then, it is actually nice to be able to use system calls. Paul's idea was that each Linux system call could be examined individually, and, if warranted, modified to be entirely preemptible. When any particular system call reaches a state where it is considered to be deterministic, it could be added to a list of such calls, and realtime processes using it need not be shifted away from the realtime processor. Over time, this list would grow to the point that realtime tasks which do actual, interesting work could be run on the mainline Linux kernel. In the meantime, there would be no need for a major flag day where the entire kernel locking scheme is changed at once.
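The decision described here, staying on the realtime processor only for calls on the approved list, amounts to a table lookup. The following is a toy model and not code from the patch: the two entries in the list are hypothetical placeholders (the article does not say which calls would qualify), and a real implementation would presumably key on system call numbers rather than names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of calls that have been audited and declared
 * deterministic; the entries are placeholders for illustration only. */
static const char *const deterministic_calls[] = {
    "getpid",
    "gettimeofday",
};

enum placement {
    STAY_ON_RT_CPU,          /* call is on the list */
    MIGRATE_TO_GENERAL_CPU   /* call is not (yet) on the list */
};

static bool call_is_deterministic(const char *name)
{
    size_t n = sizeof(deterministic_calls) / sizeof(deterministic_calls[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(deterministic_calls[i], name) == 0)
            return true;
    return false;
}

/* Decide where a realtime process should run while making this call. */
static enum placement placement_for_call(const char *name)
{
    assert(name != NULL);
    return call_is_deterministic(name) ? STAY_ON_RT_CPU
                                       : MIGRATE_TO_GENERAL_CPU;
}
```

The evolutionary path in this model is simply the list growing: each audited system call becomes one more entry, with no change needed elsewhere.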

The real challenge with this approach would be to make I/O deterministic, since realtime processes usually must act in response to external events. That cannot be done until it is clear that all filesystems and device drivers have been made entirely preemptible - and, at that point, much of the system has been affected. Meanwhile, it turns out that the 2.6.9 kernel already has part of this mechanism: the new isolcpus= boot parameter excludes one or more processors from regular scheduling. The scheme for migrating realtime processes when they invoke a non-deterministic system call is not present, however.

Some development model notes

There has been an increase in complaints about the 2.6 development model recently. Some observers are dismayed by the continued high rate of change in 2.6, and have posted calls for the creation of a 2.7 branch and restricting 2.6 to critical bug fixes only. Failure to separate development and maintenance in this way, it is said, hurts the reputation of the Linux kernel and subjects users to needless regressions.

The interesting thing with this discussion is that the people objecting to the current development model have not been able to point to much in the way of specific problems that have resulted from it. A few specific bugs have been listed, but most of those have been around for some time and cannot be said to have resulted from recently-merged new features. The only complaint which holds water might be this one regarding the plight of some out-of-tree kernel development projects (PaX in particular). PaX, it seems, is stuck at 2.6.7 because its developers have not yet been able to respond to subsequent changes in internal interfaces.

This argument, of course, does not get very far with most kernel developers. There is an increasing amount of pressure to get out-of-tree projects to submit their code and become part of the mainline tree. Code which is in the mainline gets fixed (usually) when internal interfaces change, but only the original developers can maintain external code. So the standard answer to this sort of complaint is "merge your patches." Changes in the development model to accommodate out-of-tree projects are unlikely.

Not every 2.6 kernel release has been 100% stable, but the same can be said of previous stable kernel series as well. What is different this time is that 2.6 has a great many new features and improvements which would not have been merged under the older model. Many of those improvements would, instead, have been backported by distributors and shipped as part of the "stable" kernel anyway. Under the new scheme, those patches are merged into the mainline, are debugged by everybody involved, and are available to all users. It seems unlikely that most users truly wish to go back to the old days, when distributors shipped highly divergent kernels with (literally) hundreds of patches.

There are occasional requests for bugfix-only "point" releases for the major 2.6 kernels. Rather than wait for 2.6.10, and take all of the other changes which come with that kernel, some people wish for a series of such point releases containing just the important fixes. The standard response to that request is that anybody can create and maintain such a tree. So far, however, there has not been sufficient demand for this tree to motivate somebody to actually do the work. (It should be noted, though, that Alan Cox has restarted posting his "-ac" patches, which contain fixes that are, in his opinion, important).

All of the above distracts from the real development model discussion: what Linus should call his intermediate releases. There is a steady stream of objections to the "-rc" scheme, since, in fact, very few such kernels are actually release candidates. Linus pondered the issue, but decided to call the first 2.6.10 prepatch 2.6.10-rc1 anyway:

And the fact is, I can't see the point. I'll just call it all "-rcX", because I (very obviously) have no clue where the cut-over-point from "pre" to "rc" is, or (even more painfully obviously) where it will become the final next release. So to not overtax my poor brain, I'll just call them all -rc releases, and hope that developers see them as a sign that there's been stuff merged, and we should start calming down and seeing to the merged patches being stable soon enough.

So the -rc names will continue for the foreseeable future.

Crash dumps with kexec

One of the longstanding wishlist items for the Linux kernel is a built-in crash dump capability. Companies providing support for Linux, such as vendors of "enterprise" distributions, want this capability for the help it can provide in tracking down those obnoxious problems which only happen at the customer's site. Numerous implementations exist, but none have made it into the mainline kernel. Among the reasons for this is a lack of comfort with the crash dump code itself. That code runs when the state of the system is known to be compromised; people tend to worry that the kernel, in that state, could do unpleasant things, like corrupting filesystems. Even code which takes pains to never touch a disk is not entirely to be trusted when the system is reeling from a panic.

The -mm tree contains an approach to crash dumps which may inspire a bit more trust. The core idea is to get the failing kernel out of the way entirely, as soon as possible, and to boot into a new kernel which can handle the real crash dump tasks.

The mechanism used is the kexec system call, which loads and boots directly into a new kernel. The original goal was simply to speed up reboots by avoiding the BIOS and the whole set of time-consuming boot-time rituals which it performs; it's the sort of feature which appeals to kernel developers. Kexec patches have been circulating for some time, though the call has yet to make its way into a mainline kernel.

Using kexec to perform crash dumps requires some additional work and advance planning. A separate kernel must be built to run when the crash dump capability is desired. This kernel needs to be as small as possible, and it must be specially configured to load into a portion of memory not used by the primary kernel. This kernel is also set up so that it only uses a small piece of the total system memory; it must be able to boot and run without changing the primary kernel's memory.

When the system is running, kexec is used to preload the crash dump kernel into its reserved portion of memory. If all goes well, it simply sits there, wasting memory, and never gets run. That overhead is simply the price one pays for running an enterprise-class kernel.

Should the system panic, however, the crash dump kernel has its day. The primary kernel, once it decides that something has gone drastically wrong, must first make a copy of the very bottom part of memory (it will get stepped on in the booting process). Once that is done, kexec is invoked to boot directly into the crash dump kernel. That kernel starts up, aware of all memory in the system, but only using the small portion which was reserved to it before. The result is a full, running Linux system with complete access to the memory state of the failed kernel.

To help with debugging of kernel crashes, the crash dump kernel provides a couple of mechanisms for inspecting the failed kernel's memory. The file /proc/vmcore provides the old kernel's memory as an ELF-format core dump; it can be looked at with gdb or another debugging tool. If need be, a char device (/dev/oldmem) can also be set up; it provides raw access to the old kernel's memory.
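Since /proc/vmcore is described as an ELF-format core dump, a capture script might want to check that what it has copied really looks like one before shipping it to support staff. The sketch below checks only the ELF magic number and the file type field; it assumes the standard ELF header layout from <elf.h> and says nothing about the rest of the vmcore contents. The function names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return true if buf begins with an ELF header whose type is ET_CORE.
 * The e_type field sits right after the 16-byte e_ident array in both
 * the 32-bit and 64-bit header layouts. */
static bool is_elf_core(const unsigned char *buf, size_t len)
{
    unsigned int type;

    if (buf == NULL || len < EI_NIDENT + 2)
        return false;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return false;

    /* Decode the 16-bit e_type according to the declared byte order. */
    if (buf[EI_DATA] == ELFDATA2LSB)
        type = buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8);
    else if (buf[EI_DATA] == ELFDATA2MSB)
        type = (buf[EI_NIDENT] << 8) | buf[EI_NIDENT + 1];
    else
        return false;

    return type == ET_CORE;
}

/* Read the first bytes of a file (for example a saved copy of
 * /proc/vmcore) and apply the same check. */
static bool file_is_elf_core(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return false;
    got = fread(hdr, 1, sizeof(hdr), f);
    fclose(f);
    return is_elf_core(hdr, got);
}
```

A file that passes this check can then be handed to gdb or another ELF-aware tool, as the article suggests.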

A developer might choose to work with the crash dump kernel and try to track down the problem immediately. In most deployed situations, instead, the crash dump kernel may be configured to simply copy the old kernel's memory image to a disk file somewhere, then reboot back into the primary system. The crash dump file can then be examined at leisure, perhaps by remote support staff.

The end result of all this work should be a mechanism which can be used to track down the cause of infrequent crashes at remote sites. That should be good for the stability of the kernel as a whole - and the bottom line of enterprise support companies. See Documentation/kdump.txt from the patch for more information.

Patches and updates

Kernel trees

  • Andrew Morton: 2.6.9-mm1. (October 22, 2004)


Page editor: Jonathan Corbet

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds