The current 2.6 prepatch is 2.6.10-rc1, which was
released by Linus on October 22.
Changes from 2.6.9 include a big USB update, the kernel events notification
mechanism, the switchable I/O schedulers patch (and a new version of the
CFQ scheduler), an NTFS update, in-kernel keyring management, an IRQ
subsystem code rework, version 17 of the wireless extension API, the BSD
secure levels module, an NFSv4 update, some scheduler tweaks, DVD+RW and
CDRW packet writing support, lots of networking changes, and a number of
architecture updates. Internal API changes include a new
atomic_inc_return() function, changing most of the core device
model functions to be exported GPL-only, the removal of the "BIO walking"
helper functions, changing remap_page_range() to remap_pfn_range(), and a
new generic circular buffer type (covered in last week's Kernel Page). See
the long-format changelog for the details.
Linus's BitKeeper repository contains an x86 signal delivery optimization,
an IDE update, I/O space write barrier support, a frame buffer driver
update, more scheduler tweaks, some big kernel lock preemption patches,
various architecture updates, and lots of fixes.
The current tree from Andrew Morton is 2.6.10-rc1-mm1. Recent changes to -mm include
a massive cleanup of (deprecated) MODULE_PARM() calls, a
configuration option for dnotify (in anticipation of adding inotify), an
ext3 reservation update, and more fixes. The size of -mm has dropped
considerably since many patches have found their way into the mainline.
The current 2.4 prepatch is 2.4.28-rc1, announced by Marcelo on October 22.
A relatively small set of fixes has been added since -pre4.
Kernel development news
I want Linux development to be fluid, and I think the best way to
reach that goal is to make people _think_ of it as being fluid.
It's the old "perception changes reality" thing. It's really
true. How you think about something quite heavily influences what
you do.
Wow. That was deep. Time to go watch TV again.
-- Linus Torvalds.
This kernel is probably pretty crappy - there is a _lot_ of stuff
happening and the quality of the patches which I am receiving seems
to be gradually dropping off.
-- Andrew Morton, lowering expectations for 2.6.10-rc1-mm1.
TCCBOOT is a boot loader able to compile and boot a Linux kernel
directly from its source code. TCCBOOT is only 138 KB big
(uncompressed code) and it can compile and run a typical Linux
kernel in less than 15 seconds on a 2.4 GHz Pentium 4.
-- TCCBOOT, for the ultimate source-based distribution.
The efforts to bring hard realtime response to Linux continue. For those
of you following along at home, here is a summary of the latest
realtime Linux developments.
Ingo Molnar continues to crank out patches at a high rate. The latest,
-RT-2.6.9-mm1-V0.3, is advertised as being
rather more experimental than its predecessors - which is saying
something. This patch set brings preemptible mutexes to (almost) the last,
most difficult parts of the kernel, including the low-level memory
allocators, wait queue code, kernel timers, and more. Says Ingo:
this is probably the last 'big leap forward' in terms of the scope
of the patch. (having reached the ultimate scope: it now
encompasses everything ;)
Some small pieces of this work have been put forward as independent
patches; these include the enhancements to the completion interface
mentioned last week. Linus has also made a
couple of changes to the big kernel lock code in support of this sort of
work: the BKL functions are now entirely out-of-line, and some of the code
for managing the BKL itself has been made preemptible.
Ingo's patch also changes a number of semaphores in the kernel over to
completions. For situations where one kernel thread needs to notify
another that some task has been finished, completions are a better
interface: they make the intent of the code clear, and they are better
optimized for that use. Some of those patches have been posted separately
as well, leading to some pushback from kernel developers who believe that
their use of semaphores for that purpose is entirely legitimate. Bill
Huey, the developer behind the mmlinux realtime project, is the person who
has been pushing some of those patches; he responded to the resistance in this way:
Well, this is something that's got to be considered by the larger
Linux community and whether these conventions are to be kept or
removed. It's a larger issue than what can be address in Ingo's
preemption patch, but with inevitable need for something like this
in the kernel (hard RT) it's really unavoidable collision. IMO,
it's got to go, which is a nasty change.
This, of course, is just the sort of talk which will put many kernel
developers off the realtime patches entirely; some of Mr. Huey's subsequent
postings, being rather more inflammatory, did not help the situation
either. Ingo went into damage control mode
and smoothed things over, for now. If and when the realtime preemption
patch is put forward for inclusion, however, chances are that the
discussion will get heated indeed.
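For readers unfamiliar with the distinction at the heart of this disagreement, here is a minimal userspace sketch of the "one thread tells another that a task is finished" pattern. It is an analogy only: the kernel's completion interface is C code (init_completion(), wait_for_completion(), and complete()), while this sketch borrows Python's threading.Event to show the same intent. The Completion class and the run_worker_and_wait() helper are hypothetical names invented for illustration.

```python
import threading


class Completion:
    """Userspace analogue of a kernel completion: a one-shot 'done' signal."""

    def __init__(self):
        self._done = threading.Event()

    def complete(self):
        # Signal that the awaited task has finished.
        self._done.set()

    def wait_for_completion(self, timeout=None):
        # Block until complete() has been called; True once signalled.
        return self._done.wait(timeout)


def run_worker_and_wait():
    """Start a worker thread and wait until it reports that it is done."""
    results = []
    done = Completion()

    def worker():
        results.append("work finished")
        done.complete()  # the intent ("I am done") is explicit here

    thread = threading.Thread(target=worker)
    thread.start()
    signalled = done.wait_for_completion(timeout=5.0)
    thread.join()
    return signalled, results
```

A semaphore initialized to zero can be pressed into the same service, which is what the objecting developers do; the argument for completions is simply that the names say what the code means.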
Paul McKenney, meanwhile, expressed a discomfort with the realtime work
which must certainly be felt by many:
The problem is that the entire OS kernel must be modified to ensure
that all code paths are deterministic. It would be much better if
there was an evolutionary path to hard realtime.
His message included a patch intended to point toward such a path. This
patch, which assumes an SMP system, works by setting aside one CPU as a
purely realtime processor; it is not part of the regular scheduling
mechanism. Realtime processes may be assigned to that CPU by the system
administrator. If they mostly work in user mode, all is well; they have a
dedicated processor and need not worry about latency. As soon as a
realtime process invokes a system call, however, it goes into
non-deterministic mode and is booted out to one of the system's other
processors. In this way, the dedicated realtime processor never gets
stuck waiting for a lock.
The downside, of course, is that, every now and then, it is actually nice
to be able to use system calls. Paul's idea was that each Linux system
call could be examined individually, and, if warranted, modified to be
entirely preemptible. When any particular system call reaches a state
where it is considered to be deterministic, it could be added to a list of
such calls, and realtime processes using it need not be shifted away from
the realtime processor. Over time, this list would grow to the point that
realtime tasks which do actual, interesting work could be run on the
mainline Linux kernel. In the meantime, there would be no need for a
major flag day where the entire kernel locking scheme is changed at once.
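The policy described above can be modeled in a few lines. What follows is a toy simulation, not Paul's patch: the CPU numbers, the RealtimeTask class, the invoke_syscall() function, and the contents of the deterministic-call list are all assumptions made for illustration. The sketch also does not model how (or whether) a migrated task returns to the realtime processor.

```python
from dataclasses import dataclass, field

REALTIME_CPU = 3          # hypothetical: the CPU set aside for realtime work
GENERAL_CPUS = (0, 1, 2)  # hypothetical: CPUs under the regular scheduler


@dataclass
class RealtimeTask:
    name: str
    cpu: int = REALTIME_CPU
    migrations: list = field(default_factory=list)


def invoke_syscall(task, syscall, deterministic_calls):
    """Model what happens when a realtime task makes a system call.

    Returns the CPU the task runs on while the call is serviced.
    """
    if syscall in deterministic_calls:
        # Audited call: the task may stay where it is.
        return task.cpu
    if task.cpu == REALTIME_CPU:
        # Unaudited call: move the task off the realtime CPU before it
        # can get stuck waiting for a lock there.
        task.cpu = GENERAL_CPUS[0]
        task.migrations.append(syscall)
    return task.cpu
```

As individual system calls are examined and fixed, they would be added to deterministic_calls, and fewer tasks would find themselves evicted.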
The real challenge with this approach would be to make I/O deterministic,
since realtime processes usually must act in response to external events.
That cannot be done until it is clear that all filesystems and device
drivers have been made entirely preemptible - and, at that point, much of
the system has been affected. Meanwhile, it turns out that the 2.6.9
kernel already has part of this mechanism: the new isolcpus= boot
parameter excludes one or more processors from regular scheduling. The
scheme for migrating realtime processes when they invoke a
non-deterministic system call is not present, however.
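As a rough illustration of how the isolcpus= setting might be inspected from user space, the sketch below pulls the isolated CPU numbers out of a kernel command line string such as the one found in /proc/cmdline. It assumes the simple form of the parameter, a comma-separated list of CPU numbers; the isolated_cpus() function is a hypothetical helper, not part of any kernel interface.

```python
def isolated_cpus(cmdline):
    """Return the set of CPU numbers named by isolcpus= in a command line.

    Assumes the value is a plain comma-separated list of CPU numbers;
    anything else in the list is ignored.
    """
    prefix = "isolcpus="
    cpus = set()
    for token in cmdline.split():
        if token.startswith(prefix):
            for part in token[len(prefix):].split(","):
                if part.isdigit():
                    cpus.add(int(part))
    return cpus
```

A process would then be placed on one of those processors by setting its CPU affinity, for example with os.sched_setaffinity() on a Linux system.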
There has been an increase in complaints about the 2.6 development model
recently. Some observers are dismayed by the continued high rate of change
in 2.6, and have posted calls for the creation of a 2.7 branch and
restricting 2.6 to critical bug fixes only. Failure to separate
development and maintenance in this way, it is said, hurts the reputation
of the Linux kernel and subjects users to needless regressions.
The interesting thing about this discussion is that the people objecting to
the current development model have not been able to point to much in the way
of specific problems that have resulted from it. A few specific bugs have
been listed, but most of those have been around for some time and cannot be
said to have resulted from recently-merged new features. The only
complaint which holds water might be this
one regarding the plight of some out-of-tree kernel development projects
(PaX in particular). PaX, it seems, is stuck at 2.6.7 because its
developers have not yet been able to respond to subsequent changes in
internal interfaces.
This argument, of course, does not get very far with most kernel
developers. There is an increasing amount of pressure to get out-of-tree
projects to submit their code and become part of the mainline tree. Code
which is in the mainline gets fixed (usually) when internal interfaces
change, but only the original developers can maintain external code. So
the standard answer to this sort of complaint is "merge your patches."
Changes in the development model to accommodate out-of-tree projects are
unlikely.
Not every 2.6 kernel release has been 100% stable, but the same can be said
of previous stable kernel series as well. What is different this
time is that 2.6 has a great many new features and improvements which would
not have been merged under the older model. Many of those improvements
would, instead, have been backported by distributors and shipped as part of
the "stable" kernel anyway. Under the new scheme, those patches are merged
into the mainline, are debugged by everybody involved, and are available to
all users. It seems unlikely that most users truly wish to go back to the
old days, when distributors shipped highly divergent kernels with
(literally) hundreds of patches.
There are occasional requests for bugfix-only "point" releases for the
major 2.6 kernels. Rather than wait for 2.6.10, and take all of the other
changes which come with that kernel, some people wish for a 2.6.9.1 (and so
on) with just the important fixes. The standard response to that
request is that anybody can create and maintain such a tree. So far,
however, there has not been sufficient demand for this tree to motivate
somebody to actually do the work. (It should be noted, though, that Alan
Cox has restarted posting his "-ac" patches, which contain fixes that are,
in his opinion, important.)
All of the above distracts from the real development model
discussion: what Linus should call his intermediate releases. There is a
steady stream of objections to the "-rc" scheme, since, in fact, very few
such kernels are actually release candidates. Linus pondered the issue,
but decided to call the first 2.6.10 prepatch 2.6.10-rc1 anyway:
And the fact is, I can't see the point. I'll just call it all
"-rcX", because I (very obviously) have no clue where the
cut-over-point from "pre" to "rc" is, or (even more painfully
obviously) where it will become the final next release. So to not
overtax my poor brain, I'll just call them all -rc releases, and
hope that developers see them as a sign that there's been stuff
merged, and we should start calming down and seeing to the merged
patches being stable soon enough.
So the -rc names will continue for the foreseeable future.
One of the longstanding wishlist items for the Linux kernel is a built-in
crash dump capability. Companies providing support for Linux, such as
vendors of "enterprise" distributions, want this capability for the help it
can provide in tracking down those obnoxious problems which only happen at
the customer's site. Numerous implementations exist, but none have made it
into the mainline kernel. Among the reasons for this is a lack of comfort
with the crash dump code itself. That code runs when the state of the
system is known to be compromised; people tend to worry that the kernel, in
that state, could do unpleasant things, like corrupting filesystems. Even
code which takes pains to never touch a disk is not entirely to be trusted
when the system is reeling from a panic.
The -mm tree contains an approach to crash dumps which may inspire a bit
more trust. The core idea is to get the failing kernel out of the way
entirely, as soon as possible, and to boot into a new kernel which can
handle the real crash dump tasks.
The mechanism used is the kexec system call,
which loads and boots directly into a new kernel. The original goal was
simply to speed up reboots by avoiding the BIOS and the whole set of
time-consuming boot-time rituals which it performs; it's the sort of
feature which appeals to kernel developers. Kexec patches have been
circulating for some time, though the call has yet to make its way into a
mainline kernel.
Using kexec to perform crash dumps requires some additional work and
advance planning. A separate kernel must be built to run when the crash
dump capability is desired. This kernel needs to be as small as possible,
and it must be specially configured to load into a portion of memory not
used by the primary kernel. This kernel is also set up so that it only
uses a small piece of the total system memory; it must be able to boot and
run without changing the primary kernel's memory.
When the system is running, kexec is used to preload the crash dump kernel
into its reserved portion of memory. If all goes well, it simply sits
there, wasting memory, and never gets run. That overhead is simply the
price one pays for running an enterprise-class kernel.
Should the system panic, however, the crash dump kernel has its day. The
primary kernel, once it decides that something has gone drastically wrong,
must first make a copy of the very bottom part of memory (it will get
stepped on in the booting process). Once that is done, kexec is invoked to
boot directly into the crash dump kernel. That kernel starts up, aware of
all memory in the system, but only using the small portion which was
reserved to it before. The result is a full, running Linux system with
complete access to the memory state of the failed kernel.
To help with debugging of kernel crashes, the crash dump kernel provides a
couple of mechanisms for inspecting the failed kernel's memory. The file
/proc/vmcore provides the old kernel's memory as an ELF-format
core dump; it can be looked at with gdb or another debugging
tool. If need be, a char device (/dev/oldmem) can also be set up;
it provides raw access to the old kernel's memory.
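Since /proc/vmcore is described as an ELF-format core dump, a quick sanity check is to read its ELF header and confirm that the file type is "core." The sketch below does only that; it relies on the standard ELF header layout (magic bytes, class and data-encoding bytes, then 16-bit e_type and e_machine fields at offset 16) and on the ELF value 4 for core files. The describe_elf_header() and describe_vmcore() functions are hypothetical helpers, and nothing here is taken from the crash dump patch itself.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # ELF e_type value for core files


def describe_elf_header(header):
    """Summarize the leading bytes of an ELF image."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    bits = {1: 32, 2: 64}.get(header[4])          # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA
    if bits is None or byte_order is None:
        raise ValueError("unrecognized ELF class or data encoding")
    # e_type and e_machine follow the 16-byte identification block.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {"bits": bits, "is_core": e_type == ET_CORE, "machine": e_machine}


def describe_vmcore(path="/proc/vmcore"):
    """Read and summarize the ELF header of the old kernel's memory image."""
    with open(path, "rb") as f:
        return describe_elf_header(f.read(64))
```

On a running crash dump kernel, describe_vmcore() would be expected to report a core file; real analysis is, of course, a job for gdb or a similar tool.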
A developer might choose to work with the crash dump kernel and try to
track down the problem immediately. In most deployments, however,
the crash dump kernel may be configured to simply copy the old kernel's
memory image to a disk file somewhere, then reboot back into the primary
system. The crash dump file can then be examined at leisure, perhaps by
remote support staff.
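The "copy the image to disk" step could be as simple as the following sketch, which streams the memory image out in fixed-size chunks so that the crash dump kernel, with its small memory reservation, never holds the whole image at once. The function names, the chunk size, and the /var/crash/vmcore destination are assumptions for illustration; the reboot that follows is left out.

```python
def copy_memory_image(src, dst, chunk_size=1 << 20):
    """Copy one binary stream to another in chunks; return bytes copied."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied


def save_crash_dump(source="/proc/vmcore", destination="/var/crash/vmcore"):
    # The destination path is hypothetical; any writable location would do.
    with open(source, "rb") as src, open(destination, "wb") as dst:
        return copy_memory_image(src, dst)
```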
The end result of all this work should be a mechanism which can be used to
track down the cause of infrequent crashes at remote sites. That should be
good for the stability of the kernel as a whole - and the bottom line of
enterprise support companies. See Documentation/kdump.txt from the patch for
more information.
Page editor: Jonathan Corbet