
Kernel development

Brief items

Kernel release status

The current 2.6 kernel is 2.6.11, released, finally, on March 2. Only a small number of fixes went in after 2.6.11-rc5, which had, itself, consisted of a slightly larger number of fixes. For those just tuning in, 2.6.11 includes InfiniBand support, four-level page tables, debugfs, a rework of the direct rendering code, in-inode extended attributes for ext3 (gives better Samba performance), a new pipe implementation, a bunch of latency reduction work (though some latency issues remain), the Big Kernel Semaphore patch, and lots more. The long-format changelog has the details.

As of this writing, no post-2.6.11 patches have been merged into Linus's BitKeeper repository.

It's worth noting that Linus has started a discussion on making some (relatively small) changes to the kernel release process.

The current -mm tree is 2.6.11-rc5-mm1. Recent changes to -mm include a new set of scheduler patches, a reiser4 update, some /dev/mem tweaks to get around cache coherency problems, a new NFS access control list patch set, and a big set of PCMCIA patches which make that subsystem work with the hotplug mechanism (and obsolete the longstanding cardmgr daemon).

The LWN 2.6 API changes document has recently been updated, and should be current to the 2.6.11 release.

The current 2.4 prepatch remains 2.4.30-pre2; there have been no 2.4 prepatches released since February 23.


Kernel development news

Quote of the week

It's a pity: for a while we were thinking 2.6.11 would be a big step forward for mainline latency; but it now looks to me like these tests have come too late in the cycle to be dealt with safely.

-- Hugh Dickins

It seems that a lock-breaking patch in the VM subsystem got pushed aside by the four-level page table work, and thus didn't make it into 2.6.11. Hugh has posted a fix, but, by the time it came, 2.6.11 was close enough that putting in locking changes didn't seem like a good idea.


Linux Device Drivers, 3rd Edition released

The book has been out for a couple of weeks, but now that there is a press release, it's official: Linux Device Drivers, Third Edition by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman, is now available. Look for it at your favorite bookstore. This book will also be released online under the Creative Commons Attribution-ShareAlike license, but we do not currently have an estimate for when it will be available.


A proposal for a major memory management rework

As has been described in previous Kernel Page articles, the Linux kernel works with a four-level, hierarchical page table mechanism. A virtual address is translated to a physical address by walking down the table until the relevant page table entry is found. When running on hardware which does not implement a four-level tree, the kernel transparently "folds" the missing layers out of existence. So the same high-level memory management code runs on all hardware, regardless of the depth of page table tree that hardware implements.
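The walk and the "folding" of missing levels can be illustrated with a toy model. This is a minimal sketch in Python, not kernel code: the dictionary-based tables and the function names are invented for illustration, and a 4 KiB page size is assumed. The level names (pgd, pud, pmd, pte) and the two bit layouts (non-PAE i386 with 10+10 index bits, x86-64 with 9+9+9+9) reflect the real architectures. A folded level is modeled as an index zero bits wide, so it always selects entry 0 and the same walker works for any depth.

```python
"""Toy model of a four-level page table walk with folded levels."""

PAGE_SHIFT = 12                        # assumption: 4 KiB pages
LEVELS = ("pgd", "pud", "pmd", "pte")  # top level first


def make_layout(bits_per_level):
    """Map each level name to its index width in bits; 0 means folded."""
    return dict(zip(LEVELS, bits_per_level))


# Non-PAE i386 has two real levels; the middle two are folded away.
I386_TWO_LEVEL = make_layout((10, 0, 0, 10))
# x86-64 uses all four levels, nine index bits each.
X86_64_FOUR_LEVEL = make_layout((9, 9, 9, 9))


def split_address(vaddr, layout):
    """Return ([index per level, top first], offset within the page)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    shift = PAGE_SHIFT
    indices = {}
    # Peel index fields off the address from the lowest level upward.
    for level in reversed(LEVELS):
        bits = layout[level]
        # A zero-bit (folded) level has mask 0, so its index is always 0.
        indices[level] = (vaddr >> shift) & ((1 << bits) - 1)
        shift += bits
    return [indices[level] for level in LEVELS], offset


def map_page(root, vaddr, frame, layout):
    """Install a translation, creating intermediate tables as needed."""
    indices, _ = split_address(vaddr, layout)
    table = root
    for index in indices[:-1]:
        table = table.setdefault(index, {})
    table[indices[-1]] = frame


def translate(root, vaddr, layout):
    """Walk the tree; return the physical address or None if unmapped."""
    indices, offset = split_address(vaddr, layout)
    entry = root
    for index in indices:
        entry = entry.get(index)
        if entry is None:
            return None  # real hardware would raise a page fault here
    return (entry << PAGE_SHIFT) | offset
```

The same translate() function serves both layouts; only the table of index widths changes, which is the essence of the folding trick.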

There is one interesting issue with this scheme: not all hardware uses this sort of hierarchical page table mechanism. It matches the i386 hardware well - to the point that the processor works directly from the same page tables that the generic kernel memory management code manipulates. Other processors have different ways of handling address translation, however. The ia-64 architecture uses a linear page table which is, itself, mapped in virtual memory; there is a "virtual hashed page table walker" hardware function which can quickly resolve page faults in many situations. The hierarchical page tables carefully maintained by the core kernel are never used directly by the hardware; instead, the architecture-specific code takes care of moving information between the core kernel tables and the hardware versions. This impedance matching requires extra code and work; it also makes it harder to take advantage of any high-level features that the hardware may offer.

(See this chapter from ia-64 Linux Kernel for a detailed description of how the ia-64 architecture handles page tables.)

Christoph Lameter would like to get rid of the disconnect between in-kernel and hardware page tables; to that end, he has proposed a new abstraction layer which would handle access to the processor's memory management unit (MMU). With the new layer in place, there would be no more hierarchical page tables in the core kernel. If the hardware uses hierarchical tables, the architecture-specific code would still work with them, but they would be hidden from the core. The proposed replacement interface is somewhat vague at this stage, but some features have been sketched out:

  • A new type, mmu_entry_t, would represent a translation from a virtual address to the corresponding physical address. It thus functions like a page table entry, but it could contain information not necessarily found in page table entries now, such as "large page" information and, possibly, statistics information.

  • A translation set (mmu_translation_set_t) represents the address space for a process; it is a collection of mmu_entry_t values and required housekeeping information.

  • The new interface would also implement transactions (mmu_transaction_t), so that complex changes to page tables could be performed in an atomic manner. The transaction abstraction hides the page table locking within the architecture-specific code, since that locking may be done in very different ways.
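Since the proposal is still vague, any concrete rendering of it is guesswork. The following is a purely hypothetical sketch in Python of how the three abstractions might relate. The class names merely echo mmu_entry_t, mmu_translation_set_t, and mmu_transaction_t; every method, field, and the lock are assumptions made for illustration and do not come from the posted proposal. The point shown is that callers batch changes in a transaction, and the locking stays hidden behind the interface.

```python
"""Hypothetical model of the proposed MMU abstraction (not the real API)."""

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class MmuEntry:
    """Stand-in for mmu_entry_t: one virtual-to-physical translation."""
    frame: int             # physical frame number
    page_size: int = 4096  # assumed place for "large page" information


class TranslationSet:
    """Stand-in for mmu_translation_set_t: one process's address space."""

    def __init__(self):
        self._entries = {}             # virtual page number -> MmuEntry
        self._lock = threading.Lock()  # hidden from callers

    def lookup(self, vpn):
        """Return the entry for a virtual page number, or None."""
        return self._entries.get(vpn)

    def transaction(self):
        """Start a batch of changes against this address space."""
        return Transaction(self)


class Transaction:
    """Stand-in for mmu_transaction_t: apply all changes or none."""

    def __init__(self, tset):
        self._tset = tset
        self._pending = {}  # vpn -> MmuEntry, or None to remove

    def set(self, vpn, entry):
        self._pending[vpn] = entry

    def clear(self, vpn):
        self._pending[vpn] = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit under the lock so readers never see a partial update.
            with self._tset._lock:
                for vpn, entry in self._pending.items():
                    if entry is None:
                        self._tset._entries.pop(vpn, None)
                    else:
                        self._tset._entries[vpn] = entry
        # On an exception the pending changes are simply discarded.
        self._pending.clear()
        return False  # never swallow the caller's exception
```

A caller would write something like `with tset.transaction() as tx: tx.set(vpn, MmuEntry(frame=7))`; an architecture with a different locking scheme could change the commit step without touching such callers.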

Initially, the new interface would be implemented on top of the existing hierarchical page tables. The transition could thus be made a little smoother, and architectures which actually use the hierarchical tables could continue to function as always. Eventually, however, direct access to those tables from the core kernel code would be removed, and architectures with different ideas of how page tables should be managed would be able to drop the hierarchical tables.

Once the transition has been made, other things would become possible as well. The current memory management system is really only comfortable when pages are all the same size. The support for huge pages has been bolted on to the side, and it does not really hide the fact that different processors handle large pages in very different ways. The new scheme would present a simple mksize() function to change the size of a page, and would hide from the kernel the details of how that size change is actually done. In addition, the new scheme would allow for global pages which appear in every process's address space, and for keeping statistics of the various types of pages in the system.

Discussion of the proposal has been muted. Actually, it has been almost nonexistent. Unfortunately, things often happen that way when abstract proposals are posted to the kernel lists. Kernel developers respect actual code far more than design ideas; they will often wait until an implementation is posted for review, then talk about how it should have been done. So the new memory management interface may have to make some more progress before the discussion can truly begin.


Removing exported symbols in a stable kernel

The kernel developers have set a long-term goal: reduce the number of kernel symbols exported to modules. There is a general feeling that the module interface has gotten out of control, and that modules are allowed to reach into too many parts of the core kernel. Additionally, there seems to be no reason for many exports; quite a few exported symbols are not used by any modules in the mainline kernel. So almost every 2.6.x release has unexported at least a handful of symbols, sometimes to the detriment of out-of-tree modules.

It looked like more of the same when Adrian Bunk posted a patch unexporting do_settimeofday(), which is not used by any mainline modules. There didn't seem to be any reason to allow modules to change the kernel's idea of what time it is, so the symbol could go.

Andrew Morton has drawn the line, however, on symbol removals. He now wants them to be marked as being deprecated (when used in a module), added to the feature removal schedule, and actually removed a year down the line. His position is:

I don't see much point in playing these games. Deprecate it, pull it out next year, done.

If this view sticks, it means that the days of abrupt disappearance of exported symbols are done. Symbols can still go away, but there will be some advance warning before it happens. Whether it will stick remains to be seen, however; there is a definite subset of kernel hackers who feel that there is no need to make life easier for out-of-tree modules.

So what happened with the patch? It turns out that the ARM architecture has a number of out-of-tree real-time clock modules which need to be able to call do_settimeofday(). So Adrian withdrew the patch, and the symbol remains exported.


Toward the merging of Xen

The Xen virtual machine has been getting a great deal of attention. Xen allows virtual systems to be run, over Linux, with high performance. Each machine can run a different operating system (perhaps even Windows, eventually), can have its resource usage limited, and can even be moved between physical hosts while it is running. Xen is of interest to people doing kernel development and to those interested in providing virtual hosting services.

Xen works by creating its own virtual hardware architecture, to which guest kernels are ported. The separate architecture is required to enable Xen to truly isolate guest systems in such a way that they cannot break out. This approach also allows Xen to perform various performance-enhancing tricks, such as allowing Xen systems to communicate by transparently remapping pages between them. For Linux, the Xen patches create a completely new architecture (arch/xen) which, while resembling the i386 architecture (and copying many files from it), is separate from it.

For some time now, certain kernel developers have been saying that the merging of Xen was imminent. Nobody seems to object to having support for Xen in the mainline kernel, but there is one little glitch: back in December, Andi Kleen objected to the creation of a separate Xen architecture. The creation of a completely new architecture which duplicates much of the i386 code will, says Andi, lead to long-term maintenance problems. He would much rather see Xen support merged into an i386 subarchitecture.

Xen developer Ian Pratt initially responded that such a merge was not feasible, and, besides, maintaining the separate architecture had not been a problem for them so far. Andi remained convinced, however, that things would not work well in the long term. The discussion slowed to a halt without any real decisions being made, one way or another.

Andrew Morton recently decided to restart the conversation with an opinion of his own:

I tend to agree with Andi, and I'm not sure that the Xen team fully appreciate the downside of having an own-architecture in the kernel and the upside of having their code integrated with the most-maintained architecture. It could be that the potential problems haven't been sufficiently well communicated.

Ian Pratt came back with a new proposal. The Xen group would start by doing the easy parts of merging the Xen code directly into the i386 architecture. Most of this work, he says, would involve cleaning up the i386 code; the result would be a halving of the number of files modified by the Xen patches. The remaining changes would then go in as an i386 subarchitecture, except for any Xen code which is useful for all architectures; that code would instead end up in drivers/xen/core. Further unification and cleanup could happen after the merge takes place.

This approach appears to have satisfied the critics, the obligatory minor quibbles notwithstanding. So that is probably the path Xen will take to get into the mainline. There is, it would seem, a fair amount of work to be done before that mainline merge can actually happen, though, so it's not at all clear that it can be done in time for 2.6.12.


Page editor: Jonathan Corbet

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds