The current 2.6 kernel is 2.6.11, released, finally, on March 2. Only a
small number of fixes went in after 2.6.11-rc5, which had, itself, consisted of a
slightly larger number of fixes. For those just tuning in, 2.6.11 includes
four-level page tables, a rework of the direct rendering code,
in-inode extended attributes for ext3 (which give better Samba performance),
a new pipe implementation,
a bunch of latency reduction work (though some latency issues remain),
the Big Kernel Semaphore, and lots more. The long-format changelog
has the details.
As of this writing, no post-2.6.11 patches have been merged into Linus's tree.
It's worth noting that Linus has started a
discussion on making some (relatively small) changes to the kernel development process.
The current -mm tree is 2.6.11-rc5-mm1.
Recent changes to -mm include a new set of scheduler patches, a reiser4
update, some /dev/mem tweaks to get around cache coherency
problems, a new NFS access control list patch set, and a big set of PCMCIA
patches which make that subsystem work with the hotplug mechanism (and
obsolete the longstanding cardmgr daemon).
The LWN 2.6 API changes
document has recently been updated, and should be current as of the 2.6.11 release.
The current 2.4 prepatch remains 2.4.30-pre2; there have been no 2.4
prepatches released since February 23.
Kernel development news
It's a pity: for a while we were thinking 2.6.11 would be a big
step forward for mainline latency; but it now looks to me like
these tests have come too late in the cycle to be dealt with
-- Hugh Dickins
It seems that a lock-breaking patch in the VM subsystem got pushed aside by
the four-level page table work, and thus didn't make it into 2.6.11. Hugh
has posted a fix, but, by the time it came,
2.6.11 was close enough that putting in locking changes didn't seem like a good idea.
The book has been out for a couple of weeks, but now that there is a
press release, it's official: Linux Device Drivers, Third Edition,
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman,
is now available. Look for it at your favorite bookstore. This book will
also be released online under a Creative Commons license, but we do not currently have an
estimate for when it will be available.
As has been described in previous Kernel Page articles, the Linux kernel works with a four-level, hierarchical page
table mechanism. A virtual address is translated to a physical address by
walking down the table until the relevant page table entry is found. When
running on hardware which does not implement a four-level tree, the kernel
transparently "folds" the missing layers out of existence. So the same
high-level memory management code runs on all hardware, regardless of the
depth of page table tree that hardware implements.
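To make the walk concrete, here is a minimal user-space sketch of how a virtual address is split into one index per level. The bit widths are assumptions modeled on the usual x86-64 layout (9 index bits per level, 4KB pages) and on classic two-level i386 (10 + 10 bits). A level with zero index bits is "folded": it has exactly one entry, so its index is always 0 and the walk passes straight through it. The struct and function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a page-table layout: how many index bits
 * each level consumes, from the top (pgd) down to the bottom (pte). */
struct pt_layout {
    unsigned int pgd_bits, pud_bits, pmd_bits, pte_bits;
    unsigned int page_shift;            /* bits of offset within a page */
};

struct pt_indices {
    uint64_t pgd, pud, pmd, pte, offset;
};

/* Extract 'bits' bits of 'addr' starting at bit 'shift'.
 * A folded level has bits == 0 and therefore always yields index 0. */
static uint64_t field(uint64_t addr, unsigned int shift, unsigned int bits)
{
    if (bits == 0)
        return 0;
    return (addr >> shift) & ((UINT64_C(1) << bits) - 1);
}

/* Split a virtual address into per-level indices, bottom level first. */
static struct pt_indices split_address(const struct pt_layout *l, uint64_t addr)
{
    struct pt_indices ix;
    unsigned int shift = l->page_shift;

    assert(l->pgd_bits + l->pud_bits + l->pmd_bits + l->pte_bits +
           l->page_shift <= 64);

    ix.offset = field(addr, 0, l->page_shift);
    ix.pte = field(addr, shift, l->pte_bits);
    shift += l->pte_bits;
    ix.pmd = field(addr, shift, l->pmd_bits);
    shift += l->pmd_bits;
    ix.pud = field(addr, shift, l->pud_bits);
    shift += l->pud_bits;
    ix.pgd = field(addr, shift, l->pgd_bits);
    return ix;
}

/* Four real levels, as on x86-64: 9+9+9+9 index bits, 4KB pages. */
static const struct pt_layout four_level = { 9, 9, 9, 9, 12 };
/* Two real levels, as on non-PAE i386: pud and pmd are folded away. */
static const struct pt_layout two_level = { 10, 0, 0, 10, 12 };
```

With the two-level layout, every address yields pud and pmd indices of 0. That is the whole trick of folding: generic code still takes four steps down the tree, but two of them are no-ops.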
There is one interesting issue with this scheme: not all hardware uses this
sort of hierarchical page table mechanism. It matches the i386 hardware well,
to the point that the processor works directly from the same page tables
that the generic kernel memory management code manipulates. Other
processors handle address translation differently, however.
The ia-64 architecture uses a linear page table which is, itself, mapped in
virtual memory; there is a "virtual hashed page table walker" hardware
function which can quickly resolve TLB misses in many situations. The
hierarchical page tables carefully maintained by the core kernel are never
used directly by the hardware; instead, the architecture-specific code
takes care of moving information between the core kernel tables and the
hardware versions. This impedance matching requires extra code and work;
it also makes it harder to take advantage of any high-level features that
the hardware may offer.
(See the online chapter from the book ia-64 Linux Kernel for a detailed description of
how the ia-64 architecture handles page tables.)
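By contrast with the four-step walk, a linear page table needs no walk at all: the entry for a virtual page sits at a fixed offset from the table's base, indexed by virtual page number. The sketch below shows only that address arithmetic, under assumed parameters (16KB pages, 8-byte entries, a 50-bit virtual address space, and a made-up base address); it does not model the ia-64 hardware walker or its hashed long format.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters for illustration only. */
#define LPT_PAGE_SHIFT 14                       /* 16KB pages */
#define LPT_ENTRY_SIZE 8                        /* bytes per entry */
#define LPT_VA_BITS    50                       /* size of the mapped address space */
#define LPT_BASE UINT64_C(0xe000000000000000)   /* hypothetical virtual base of the table */

/* Virtual address of the page-table entry that maps 'vaddr'.
 * Because the table itself lives in virtual memory, this is a single
 * shift-and-add; no pgd/pud/pmd levels are visited. */
static uint64_t lpt_entry_addr(uint64_t vaddr)
{
    assert(vaddr < (UINT64_C(1) << LPT_VA_BITS));
    uint64_t vpn = vaddr >> LPT_PAGE_SHIFT;     /* virtual page number */
    return LPT_BASE + vpn * LPT_ENTRY_SIZE;
}
```

With these assumed numbers the table spans 2^36 entries, or 2^39 bytes of address space. That is why it has to be mapped virtually: only the parts of it that are actually touched need physical backing.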
Christoph Lameter would like to get rid of the disconnect between in-kernel
and hardware page tables; to that end, he has proposed a new abstraction layer which would handle
access to the processor's memory management unit (MMU). With the new layer
in place, there would be no more hierarchical page tables in the core
kernel. If the hardware uses hierarchical tables, the
architecture-specific code would still work with them, but they would be
hidden from the core. The proposed replacement interface is somewhat vague
at this stage, but some features have been sketched out:
- A new type, mmu_entry_t, would represent a translation from
a virtual address to the corresponding physical address. It thus
functions like a page table entry, but it could contain information
not necessarily found in page table entries now, such as "large page"
information and, possibly, statistics.
- A translation set (mmu_translation_set_t) represents the
address space for a process; it is a collection of
mmu_entry_t values and required housekeeping information.
- The new interface would also implement transactions
(mmu_transaction_t), so that complex changes to page tables
could be performed in an atomic manner. The transaction abstraction
hides the page table locking within the architecture-specific code,
since that locking may be done in very different ways.
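The proposal names these types but does not yet define them. Purely as a sketch of how the three pieces might fit together, the code below gives them invented fields and pairs them with invented functions (mmu_tx_begin(), mmu_tx_add(), mmu_tx_commit(), mmu_lookup()). Nothing here comes from the proposal beyond the three type names. The point is only to show a transaction being applied all-or-nothing, with the locking hidden inside the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one virtual-to-physical translation. */
typedef struct {
    uint64_t vaddr;
    uint64_t paddr;
    unsigned int page_shift;    /* 12 = 4KB page, 21 = 2MB page, ... */
} mmu_entry_t;

#define SET_CAPACITY 8
#define TX_CAPACITY 4

/* Hypothetical: a process's address space as a set of translations. */
typedef struct {
    mmu_entry_t entries[SET_CAPACITY];
    size_t nr;
    bool locked;                /* stand-in for arch-specific locking */
} mmu_translation_set_t;

/* Hypothetical: a batch of changes to be applied atomically to one set. */
typedef struct {
    mmu_translation_set_t *set;
    mmu_entry_t pending[TX_CAPACITY];
    size_t nr;
} mmu_transaction_t;

static void mmu_tx_begin(mmu_transaction_t *tx, mmu_translation_set_t *set)
{
    tx->set = set;
    tx->nr = 0;
}

/* Queue an entry; the set itself is not touched until commit. */
static bool mmu_tx_add(mmu_transaction_t *tx, mmu_entry_t e)
{
    if (tx->nr == TX_CAPACITY)
        return false;
    tx->pending[tx->nr++] = e;
    return true;
}

/* Apply every queued entry or none of them. The "lock" is taken and
 * released here, so callers never see how the set is protected. */
static bool mmu_tx_commit(mmu_transaction_t *tx)
{
    mmu_translation_set_t *set = tx->set;
    bool ok;

    assert(!set->locked);       /* no nested commits in this toy model */
    set->locked = true;
    ok = set->nr + tx->nr <= SET_CAPACITY;
    if (ok) {
        for (size_t i = 0; i < tx->nr; i++)
            set->entries[set->nr++] = tx->pending[i];
    }
    set->locked = false;
    tx->nr = 0;
    return ok;
}

/* Translate vaddr through whichever entry covers it; false if unmapped. */
static bool mmu_lookup(const mmu_translation_set_t *set, uint64_t vaddr,
                       uint64_t *paddr)
{
    for (size_t i = 0; i < set->nr; i++) {
        const mmu_entry_t *e = &set->entries[i];
        uint64_t size = UINT64_C(1) << e->page_shift;
        if (vaddr >= e->vaddr && vaddr - e->vaddr < size) {
            *paddr = e->paddr + (vaddr - e->vaddr);
            return true;
        }
    }
    return false;
}
```

Note that entries of different page sizes sit side by side in one set, which hints at how such an interface could carry the "large page" information mentioned above.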
Initially, the new interface would be implemented on top of the existing
hierarchical page tables. The transition could thus be made a little
smoother, and architectures which actually use the hierarchical tables
could continue to function as always. Eventually, however, direct access
to those tables from the core kernel code would be removed, and
architectures with different ideas of how page tables should be managed
would be able to drop the hierarchical tables.
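The layering can be sketched as an operations table: core code calls one lookup function, and each architecture supplies a backend. Below, a hypothetical mmu_ops structure has two invented backends. One walks a tiny two-level table (standing in for "implemented on top of the existing hierarchical page tables"), and the other uses a flat array (standing in for an architecture that has dropped them). Table sizes and all names are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define LEVEL_BITS 4                         /* tiny tables: 16 entries per level */
#define LEVEL_SIZE (1u << LEVEL_BITS)
#define NR_PAGES (LEVEL_SIZE * LEVEL_SIZE)   /* 256 pages of address space */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* Hypothetical arch hook: core code sees only this interface. */
struct mmu_ops {
    bool (*lookup)(void *priv, uint64_t vaddr, uint64_t *paddr);
};

/* Backend 1: two-level hierarchical table; an entry of 0 means unmapped. */
struct two_level {
    uint64_t *dir[LEVEL_SIZE];               /* second-level tables, or NULL */
};

static bool two_level_lookup(void *priv, uint64_t vaddr, uint64_t *paddr)
{
    struct two_level *t = priv;
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    if (vpn >= NR_PAGES)
        return false;
    uint64_t *table = t->dir[vpn >> LEVEL_BITS];
    if (table == NULL || table[vpn & (LEVEL_SIZE - 1)] == 0)
        return false;
    *paddr = table[vpn & (LEVEL_SIZE - 1)] | (vaddr & OFFSET_MASK);
    return true;
}

/* Backend 2: one flat array indexed by virtual page number. */
struct flat {
    uint64_t frame[NR_PAGES];                /* physical page base, 0 = unmapped */
};

static bool flat_lookup(void *priv, uint64_t vaddr, uint64_t *paddr)
{
    struct flat *f = priv;
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    if (vpn >= NR_PAGES || f->frame[vpn] == 0)
        return false;
    *paddr = f->frame[vpn] | (vaddr & OFFSET_MASK);
    return true;
}

static const struct mmu_ops two_level_ops = { two_level_lookup };
static const struct mmu_ops flat_ops = { flat_lookup };

/* "Core kernel" code: identical no matter which backend is plugged in. */
static bool core_translate(const struct mmu_ops *ops, void *priv,
                           uint64_t vaddr, uint64_t *paddr)
{
    assert(ops != NULL && ops->lookup != NULL);
    return ops->lookup(priv, vaddr, paddr);
}
```

Function pointers just make both backends visible in one file; a real implementation might well select the architecture's version at compile time instead.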
Once the transition has been made, other things would become possible as
well. The current memory management system is really only comfortable when
pages are all the same size. The support for huge pages has been bolted onto
the side, and it does not really hide the fact that different processors
handle large pages in very different ways. The new scheme would present a
simple mksize() function to change the size of a page, and would
hide from the kernel the details of how that size change is actually done.
In addition, the new scheme would allow for global pages which appear in
every process's address space, and for keeping statistics of the various
types of pages in the system.
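The proposal does not say how mksize() would behave. One constraint any implementation would face is alignment: on common hardware such as x86, a large page can only map a virtual range to a physical range when both start on a boundary of the large page size. The sketch below is a hypothetical mksize() over an invented entry type that refuses the change when that condition fails; its signature, return convention, and supported size range are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry with a variable page size. */
struct mmu_entry {
    uint64_t vaddr;
    uint64_t paddr;
    unsigned int page_shift;    /* log2 of the page size */
};

/* Hypothetical mksize(): switch 'e' to pages of 2^new_shift bytes.
 * Fails (returns false, leaves 'e' untouched) if the size is outside the
 * assumed supported range or either address is misaligned for it.
 * Only the size field changes here; a real implementation would also
 * have to merge or split the neighbouring translations. */
static bool mksize(struct mmu_entry *e, unsigned int new_shift)
{
    assert(e != NULL);
    if (new_shift < 12 || new_shift > 30)   /* assumed: 4KB .. 1GB */
        return false;
    uint64_t mask = (UINT64_C(1) << new_shift) - 1;
    if ((e->vaddr & mask) != 0 || (e->paddr & mask) != 0)
        return false;
    e->page_shift = new_shift;
    return true;
}
```

The caller only asks for a size; whether the hardware does this with a bit in a page table entry, a separate TLB, or something else stays behind the interface.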
Discussion of the proposal has been muted. Actually, it has been almost
nonexistent. Unfortunately, things often happen that way when abstract
proposals are posted to the kernel lists. Kernel developers respect actual
code far more than design ideas; they will often wait until an
implementation is posted for review, then talk about how it should
have been done. So the new memory management interface may have to make
some more progress before the discussion can truly begin.
The kernel developers have set a long term goal: reduce the number of
kernel symbols exported to modules. There is a general feeling that the
module interface has gone out of control, and that modules are allowed to
reach into too many parts of the core kernel. Additionally, there seems to
be no reason for many exports; quite a few exported symbols are not used by
any modules in the mainline kernel. So almost every 2.6.x release has
unexported at least a handful of symbols, sometimes to the detriment of
out-of-tree modules.
It looked like more of the same when Adrian Bunk posted a patch unexporting
do_settimeofday(), which is not used by any mainline modules.
There didn't seem to be any reason to allow modules to change the kernel's
idea of what time it is, so the symbol could go.
Andrew Morton has drawn the line, however,
on symbol removals. He now wants them to be marked as being deprecated
(when used in a module), added to the feature removal schedule, and
actually removed a year down the line. His position is:
I don't see much point in playing these games. Deprecate it, pull
it out next year, done.
If this view sticks, it means that the days of abrupt disappearance of
exported symbols are done. Symbols can still go away, but there will be
some advance warning before it happens. Whether it will stick remains to
be seen, however; there is a definite subset of kernel hackers who feel
that there is no need to make life easier for out-of-tree modules.
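Marking a symbol as deprecated only "when used in a module" can be done by applying the compiler's deprecation attribute conditionally. Kbuild defines MODULE when compiling code as a loadable module, so a header can attach the attribute only in that case. The user-space sketch below shows the technique with GCC/Clang's __attribute__((deprecated)); the macro name and the stand-in function are hypothetical, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: warn only when compiled as part of a module.
 * Kbuild defines MODULE for code built as a loadable module. */
#ifdef MODULE
#define deprecated_for_modules __attribute__((deprecated))
#else
#define deprecated_for_modules
#endif

/* Stand-in for an exported symbol scheduled for removal.
 * Built-in callers compile silently; callers compiled with -DMODULE get a
 * "deprecated" warning at compile time, but the call still works. */
deprecated_for_modules void old_exported_helper(int *counter);

void old_exported_helper(int *counter)
{
    assert(counter != NULL);
    (*counter)++;
}
```

Compiling a caller with -DMODULE produces the warning while the code keeps working, which is the "advance warning" a year-long deprecation period relies on.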
So what happened with the patch? It turns
out that the ARM architecture has a number of out-of-tree real-time
clock modules which need to be able to call do_settimeofday(). So
Adrian withdrew the patch, and the symbol remains exported.
The Xen virtual machine monitor
has been getting a great deal of attention. Xen allows several virtual
systems to run on one physical machine with high performance. Each virtual machine can run
a different operating system (perhaps even Windows, eventually), can have
its resource usage limited, and can even be moved between physical hosts
while it is running. Xen is of interest to people doing kernel
development, as well as to those interested in providing virtual hosting services.
Xen works by creating its own virtual hardware architecture, to which guest
kernels are ported. The separate architecture is required to enable Xen to
truly isolate guest systems in such a way that they cannot break out. This
approach also allows Xen to perform various performance-enhancing tricks,
such as allowing Xen systems to communicate by transparently remapping
pages between them. For Linux, the Xen patches create a completely new
architecture (arch/xen) which, while resembling the i386
architecture (and copying many files from it), is separate from it.
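The page-remapping trick can be illustrated with a toy model. Instead of copying a page's contents from one guest to another, ownership of the underlying frame is moved by editing the two guests' mappings. Everything below (the frame pool, the per-domain mapping arrays, flip_page(), guest_write(), guest_page()) is invented for illustration and is not Xen's actual interface; it only shows why remapping avoids the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_FRAMES 8
#define PAGES_PER_DOMAIN 4
#define FRAME_SIZE 64              /* tiny "pages" to keep the demo small */
#define NO_FRAME (-1)

/* Machine memory shared by all domains. */
static char frames[NR_FRAMES][FRAME_SIZE];

/* Hypothetical per-domain mapping: guest page number -> machine frame. */
struct domain {
    int map[PAGES_PER_DOMAIN];
};

/* Move the frame behind src's page 'sp' into dst's page 'dp'.
 * No data is copied: dst now maps the same frame, and src receives
 * dst's previous frame (possibly NO_FRAME) in exchange, so the two
 * guests never share a frame. */
static bool flip_page(struct domain *src, int sp, struct domain *dst, int dp)
{
    int frame = src->map[sp];
    if (frame == NO_FRAME)
        return false;
    src->map[sp] = dst->map[dp];
    dst->map[dp] = frame;
    return true;
}

/* Write a string into a guest page through the guest's own mapping. */
static void guest_write(struct domain *d, int page, const char *s)
{
    assert(d->map[page] != NO_FRAME);
    char *p = frames[d->map[page]];
    strncpy(p, s, FRAME_SIZE - 1);
    p[FRAME_SIZE - 1] = '\0';
}

/* Read through a domain's mapping, the only view a guest has. */
static const char *guest_page(const struct domain *d, int page)
{
    assert(d->map[page] != NO_FRAME);
    return frames[d->map[page]];
}
```

After a flip, the receiving guest reads the very same frame the sender wrote, so the cost is a couple of mapping updates rather than a copy of the whole page.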
For some time now, certain kernel developers have been saying that the
merging of Xen was imminent. Nobody seems to object to having support for
Xen in the mainline kernel, but there is one little glitch: back in
December, Andi Kleen objected to the
creation of a separate Xen architecture. The creation of a completely new
architecture which duplicates much of the i386 code will, says Andi, lead
to long-term maintenance problems. He would much rather see Xen support
merged into an i386 subarchitecture.
Xen developer Ian Pratt initially responded
that such a merge was not feasible, and, besides, maintaining the separate
architecture had not been a problem for them so far. Andi remained
convinced, however, that things would not work well in the long term. The
discussion slowed to a halt without any real decisions being made, one way or the other.
Andrew Morton recently decided to restart the
conversation with an opinion of his own:
I tend to agree with Andi, and I'm not sure that the Xen team fully
appreciate the downside of having an own-architecture in the
kernel.org kernel and the upside of having their code integrated
with the most-maintained architecture. It could be that the
potential problems haven't been sufficiently well communicated.
Ian Pratt came back with a new proposal.
The Xen group would start by doing the easy parts of merging the Xen code
directly into the i386 architecture. Most of this work, he says, would
involve cleaning up the i386 code; the result would be a halving of the
number of files modified by the Xen patches. The remaining changes would
then go in as an i386 subarchitecture except for any Xen code which is
useful for all architectures; that, instead, would end up in
drivers/xen/core. Further unification and cleanup could happen
after the merge takes place.
This approach appears to have satisfied the critics, the obligatory minor
quibbles notwithstanding. So that is probably the path Xen will take to
get into the mainline. There is, it would seem, a fair amount of work to
be done before that mainline merge can actually happen, though, so it's not
at all clear that it can be done in time for 2.6.12.
Page editor: Jonathan Corbet