What to merge for 2.6.13?
[Posted June 22, 2005 by corbet]
Andrew Morton, looking forward to 2.6.13, has
posted a list of major patches
which, in his opinion, will (or will not) be merged soon. Reviewing the
list, along with the subsequent discussion, gives a good sense for what the
next 2.6 kernel might look like. Of course, the final product is still
likely to contain a few surprises.
Some of the decisions are not particularly controversial. Andrew is
likely to merge the OCFS2
filesystem, some Xen precursor patches, execute in place support,
software suspend support for SMP systems, some kernel timer performance
improvements, various KProbes updates, the RapidIO subsystem, some
scheduler tweaks, and some memory management work. Nobody has really
complained about the inclusion of any of these patches (yet), so their path into
the kernel might be relatively smooth.
One patch which has gotten surprising support is kexec, which was first
covered here in November
2002. The ability to quickly boot a new kernel without going through
the system firmware is nice, but the real payoff for kexec comes when it is
combined with kernel crash
dumps. Crash dumps can be a useful diagnostic tool, especially for
vendors who are trying to track down a bizarre crash which only occurs at a
customer's site. So various distributors have included some sort of crash
dump capability in their kernels for some time. These patches will
typically write kernel memory to a disk or network device, then reboot the
system.
The approaches taken to crash dumps so far share one significant problem:
they all rely on the kernel to create its own dump. But this is a kernel
which has just gone into panic mode; it is not in a stable state.
The chances of an oopsing kernel completing a satisfactory crash dump are
not all that high (Arjan van de Ven estimates that it works about 10% of the
time). The real problem, however, is the risk involved in allowing an
unstable kernel to continue performing I/O; there is a very real
possibility that a (corrupted) crash dump could end up being written on top
of something that the owner would have preferred to keep.
The kexec approach gets around this problem by rebooting the system before
performing the dump. The normal, production kernel is configured to set
aside a small range of memory, which it never uses. Instead, a different
kernel is loaded into that memory; this kernel will be small, and
configured to do little other than perform crash dumps. If the system
should panic, kexec is used to immediately boot into the crash dump
kernel. This kernel, which will be starting fresh and in a known state,
can then write the contents of memory to some sort of permanent store
before rebooting into a new production kernel. This approach is safer and
more reliable; the mailing list discussion has been favorable enough that
kexec/kdump appears likely to be merged.
The reiser4 filesystem has sat in the -mm tree for some time, and Andrew
indicated that he might merge it this time around. Reiser4 has run into trouble in the past,
mostly as a result of its "file as a directory" semantics which change how
Linux works, can confuse tools, and, crucially, can lead to system
deadlocks. This feature has been disabled for now, but there is still
opposition to merging reiser4 into the mainline.
The main issue this time around would appear to be the plugin architecture
used by reiser4. Plugins can be used to change the behavior of the
filesystem in many ways, from adding compression to completely changing how
the file is laid out on disk. The plugin mechanism is a key part of Hans
Reiser's longer-term vision of how filesystems should work; he hopes to
eventually move all kinds of functionality into the filesystem level. The
kernel developers, however, do not think that this sort of mechanism should
be built into a filesystem; instead, much of what plugins do belongs in the
VFS layer. So they would like to see reiser4 slimmed down into a much
smaller, dumber system, with the plugin capability added on top of it and
made available for other filesystems as well.
Hans is resisting making this (large) change; he asks that the review process take a different
tack:
"How about review by benchmark instead? It works, it runs faster
than the competition, users like it, we addressed the core kernel
patch complaints, it should go in and receive the exposure that
will result in lots of useful improvements and suggestions. It
seems like we are getting an unusual review process."
Things appear to be at a standoff which could block the inclusion of
reiser4 for some time.
Yet another change under consideration is configurable clock frequencies
for the i386 and ia-64 architectures. The current value (1000Hz) turns out
not to be optimal for all users; lower clock frequencies can improve
throughput on some systems at the cost of coarser timer resolution and
possibly increased latencies. There have been complaints about the new
default (250Hz) and the fact that the patch is going in at all when more
sweeping changes to the timer system (such as the dynamic tick patch) are waiting
in the wings. Your editor's guess is that the patch will be merged, but
the default may be changed to keep the current HZ value.
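The resolution side of that tradeoff comes down to simple arithmetic, which can be sketched in a few lines of Python. This is a simplified model rather than the kernel's own conversion code: it assumes a timeout is simply rounded up to a whole number of ticks, the function names are made up for illustration, and the 100Hz row is included only as an extra comparison point.

```python
def tick_period_ms(hz):
    """Length of one timer tick, in milliseconds, for a given HZ value."""
    return 1000.0 / hz


def timeout_in_ticks(timeout_ms, hz):
    """Round a requested timeout up to a whole number of ticks.

    Assumption: plain round-up using integer arithmetic; the real
    kernel conversion helpers may behave differently at the edges.
    """
    return (timeout_ms * hz + 999) // 1000


def effective_timeout_ms(timeout_ms, hz):
    """Shortest delay, in milliseconds, that the tick-based timer can deliver."""
    return timeout_in_ticks(timeout_ms, hz) * tick_period_ms(hz)


if __name__ == "__main__":
    requested_ms = 5  # hypothetical timeout requested by some caller
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: tick={tick_period_ms(hz)}ms, "
              f"{requested_ms}ms request -> {effective_timeout_ms(requested_ms, hz)}ms")
```

Under this model, a 5ms request is stretched to 8ms at 250Hz but is met exactly at 1000Hz. That is the coarser resolution being complained about, set against the benefit of taking only a quarter as many timer interrupts.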
FUSE (user-space filesystems) is being discussed again. FUSE has run into opposition due to the way it
overrides the file permissions checking done at the VFS level. There does
not appear to be any solution to this issue that pleases everybody, so it
is hard to say where this one might go. It is possible that FUSE will be
merged, but without its particular permissions behavior - a solution which
would leave a number of FUSE users still needing to apply a patch to get
the behavior they want.
It didn't appear on Andrew's list, but the removal of devfs has also been a
discussion item. Andrew didn't entirely like the full patch set which
completely removed devfs from the kernel; he wondered what would happen if
enough people complained and devfs had to be restored at some point in the
future. So the current approach is to simply remove the devfs
configuration option, making the functionality inaccessible. Eventually,
if no major problems turn up, the code can be removed for real.