Kernel development

Brief items

Kernel release status

The current development kernel is 2.5.43, which was announced by Linus on October 15. He described this release as "a huge merging frenzy for the feature freeze." It includes the read-copy-update patch (described in the July 18 LWN Kernel Page), more network asynchronous I/O patches, SMP support for User-mode Linux, a version of the InterMezzo filesystem that works in 2.5, more memory management work, the removal of kiobufs (see below), JFS and XFS updates, an AFS filesystem implementation, the "oprofile" profiler, IBM "Summit" architecture support, an ARM update, and many other fixes and updates. The long-format changelog is also available.

2.5.42 was released on October 11. There was a lot of stuff in this patch, including NFS work, numerous patches from the -dj tree, the 64-bit sector ("large block device") patch, more asynchronous I/O patches, the IDE tagged command queueing patch, and a lot of other fixes and updates. See the long-format changelog for all the details.

The latest prepatch from Alan Cox is 2.5.42-ac1. He has taken a stand in the LVM debate (see below) by merging the LVM2 device mapper; other than that, this prepatch consists mostly of compilation fixes.

The current 2.5 status summary from Guillaume Boissiere is dated October 16.

The current stable kernel is 2.4.19. Marcelo took another step toward 2.4.20 with 2.4.20-pre11, which was released on October 15.

Alan Cox released 2.4.20-pre10-ac1 on October 10; the only item in the changelog is "resync with Marcelo."

Kernel development news

Choosing a volume manager for 2.5

As the feature freeze date gets closer, people are starting to worry about some of the unresolved issues in the 2.5 series. Currently at the top of the list is volume management. The LVM code in the 2.4 kernel is not much loved by kernel developers; it has gone unmaintained in 2.5 and simply does not work there. One thing that everybody seems to agree on is that LVM has reached the end of its life and needs to be removed.

But that, of course, raises the question of what will replace LVM. There are two contenders out there:

  • LVM2 is a new version of LVM, reimplemented from the ground up by Sistina Software, which also wrote the original LVM. LVM2 is actually the name given to the user-level interface; the kernel code for LVM2 is called the "device mapper" or "DM".

  • The Enterprise Volume Management System (or EVMS) is a new, independent development from IBM.

Both volume managers have been proposed for inclusion into 2.5 as replacements for LVM. There is currently very little consensus on which, if either, should go in, and Linus has stated that he is undecided on the issue.

LVM2/DM is the smaller and simpler of the two volume managers. Its goal is to be a cleaner, better implementation of LVM, so it does not add a great many features. It can combine volumes in a linear (appending one partition to another) or striped (interleaving data across partitions) manner, but does not support higher-level RAID features. The lack of RAID 4/5 support is not necessarily a problem, since the kernel "md" driver provides those capabilities. LVM2 also does not try to understand the filesystems on the volumes it manages, so resizing a volume can be a multi-step process: the volume and the filesystem on it must be resized separately. LVM2 is backward compatible with LVM, and provides a very similar interface to administrators.
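The difference between the two layouts comes down to how a sector of the logical volume is translated into a (device, sector) pair on the underlying disks. The sketch below is a hypothetical illustration of that arithmetic, not code from the device mapper; struct target_loc, map_linear(), and map_striped() are invented names, and all sizes are counted in sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Where a sector of the logical volume ends up: which underlying
 * device, and which sector on that device. */
struct target_loc {
        unsigned dev;
        uint64_t sector;
};

/* Linear mapping: devices are appended end to end, so skip past every
 * device that lies entirely below the requested sector.  dev_sectors[i]
 * is the size of device i.  Sectors beyond the end are simply left on
 * the last device here; a real driver would reject them. */
static struct target_loc map_linear(const uint64_t *dev_sectors,
                                    unsigned ndevs, uint64_t lsec)
{
        struct target_loc loc = { 0, lsec };

        assert(ndevs > 0);
        while (loc.dev < ndevs - 1 && loc.sector >= dev_sectors[loc.dev]) {
                loc.sector -= dev_sectors[loc.dev];
                loc.dev++;
        }
        return loc;
}

/* Striped mapping: the volume is cut into chunks of chunk_sectors,
 * which are dealt out round-robin across ndevs devices. */
static struct target_loc map_striped(unsigned ndevs, uint64_t chunk_sectors,
                                     uint64_t lsec)
{
        struct target_loc loc;
        uint64_t chunk, offset;

        assert(ndevs > 0 && chunk_sectors > 0);
        chunk = lsec / chunk_sectors;   /* index of the chunk in the volume */
        offset = lsec % chunk_sectors;  /* position inside that chunk */
        loc.dev = (unsigned)(chunk % ndevs);
        loc.sector = (chunk / ndevs) * chunk_sectors + offset;
        return loc;
}
```

With two devices and 8-sector chunks, for example, logical sector 17 falls in chunk 2, which is the second chunk placed on device 0, so map_striped() sends it to sector 9 of that device.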

EVMS is a much larger, more complex development. It supports RAID 4 and 5, along with other features such as bad block remapping. EVMS comes with a comprehensive graphical interface. It can also work with several filesystem types to make filesystem resizing easy. From the user's point of view, EVMS comes across as a far more complete tool.

There is substantial resistance in the kernel hacker community to merging EVMS, however. A number of coding style issues have been raised; for example, declaring static variables within header files is considered to be in poor taste. There are objections to the duplication of the RAID functionality already provided by the md driver. EVMS also hides the internal structure of its volumes. Imagine creating two large volumes, each by combining two drives in a linear mapping, then making one big volume by striping across the two linear volumes. The internal, linear volumes would not be visible as separate devices. Critics of this implementation dislike the duplication of block layer code implied by creating a new type of hidden block device; hiding the internal devices also complicates operations that need to be performed directly on them. So there has been pressure to expose the internal devices, or even to work many of these volume management functions directly into the block layer API.

LVM2 has not been subjected to the same level of criticism; the consensus seems to be that the code is relatively clean and correct. The level of capability offered by LVM2 is lower, however.

The development teams for both EVMS and LVM2 have stated their willingness to address complaints in order to get their projects merged. The problem, of course, is that the feature freeze date is getting closer, and neither project will be "complete" by then. Some developers are talking seriously about merging neither volume manager, and simply doing without until the next development series opens.

Releasing a stable kernel without a logical volume manager is probably not a realistic option, however. Something will probably go in. Linus stated in the 2.5.42 announcement that he was leaning toward EVMS; EVMS also appears to be the choice of people who use volume management, as opposed to those who have to deal with the code. So the odds probably favor an EVMS merge, but it is far from a sure bet at this point.

Kiobufs removed

One of the advantages of the new "commits" mailing list is that one can see the patches which slip quietly into the kernel without public discussion. One of those is this patch by Christoph Hellwig, via Andrew Morton, which removes the "kiobuf" infrastructure from the kernel. This patch has been merged by Linus, and will show up in the 2.5.43 development kernel.

The kiobuf structure was developed by Stephen Tweedie as a way, initially, of implementing the raw block I/O devices in the 2.3 development series. Using kiobufs, kernel code can perform operations directly to and from user-space buffers without having to worry about walking page tables, pinning pages into memory, and so on. Kiobufs did the job they were designed to do, and they found their way into a number of kernel developments.

Not everybody was happy with the kiobuf interface, however. Many saw it as a heavyweight structure, requiring a lot of time (and memory) to set up and tear down. Kiobufs also forced the splitting of large I/O operations into small chunks, often as small as a single 512-byte sector and never larger than 64KB. As a result, kiobufs never became the high-performance I/O mechanism they were intended to be.

So what replaces kiobufs in the 2.5 kernel? Modern direct I/O code uses the get_user_pages() function:

        int get_user_pages (struct task_struct *tsk,
                            struct mm_struct *mm,
                            unsigned long start, int len,
                            int write, int force, 
                            struct page **pages, 
                            struct vm_area_struct **vmas);

This function faults in len user pages starting at start and pins them in memory; it returns the number of pages actually pinned, or a negative error code. The struct page pointers are returned in pages, and pointers to the associated VMA structures in vmas; either array pointer can be NULL if the caller is not interested in that information. Code which used kiobufs will want the struct page pointers, which can be used to set up DMA operations or other direct transfers; most callers do not need the VMA pointers. When the operation is complete, each page should be passed individually to page_cache_release().
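Because len counts pages rather than bytes, a caller starting from a byte-addressed user buffer must first work out how many pages that buffer touches, and where the data begins within the first page. The helpers below are a hypothetical user-space sketch of that arithmetic only; the names pages_spanned() and first_page_offset() are invented, and 4KB pages are assumed.

```c
#include <assert.h>

/* Assumed page geometry: 4KB pages, as on i386.  Kernel code would use
 * the architecture's own PAGE_SHIFT and PAGE_SIZE instead. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Number of pages touched by the user buffer [uaddr, uaddr + count);
 * this is the page count a caller would hand to get_user_pages() as len. */
static unsigned long pages_spanned(unsigned long uaddr, unsigned long count)
{
        unsigned long first, last;

        assert(count > 0);
        first = uaddr >> PAGE_SHIFT;
        last = (uaddr + count - 1) >> PAGE_SHIFT;
        return last - first + 1;
}

/* Offset of the buffer within the first page: the transfer involving
 * pages[0] starts here rather than at the page boundary. */
static unsigned long first_page_offset(unsigned long uaddr)
{
        return uaddr & (PAGE_SIZE - 1);
}
```

For example, an 8-byte buffer at address 0x1ffc straddles a page boundary, so pages_spanned(0x1ffc, 8) is 2 and the data begins at offset 0xffc in the first page; both of those pages would later go to page_cache_release().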

The asynchronous I/O patches have also, at times, included a new kvec structure which looks like a lighter, faster version of kiobufs. No patches with kvecs have been merged by Linus, however.

Kiobufs, meanwhile, have reached a dead end. It's worth remembering, though, that kiobufs were the pioneering effort into the use of struct page pointers for direct I/O. The code may be gone, but the lessons learned from kiobufs live on in the current implementation.

Xbox Linux kernel patches

For those who are wondering what it takes to make Linux run on an Xbox: Michael Steil of the Xbox Linux Project has posted a note describing the project's kernel patches (and asking how to get them merged). The required changes include a workaround for an Xbox PCI bug, compensation for a faster system timer, a different way of shutting down and rebooting, handling for the lack of a keyboard controller, support for the "FATX" filesystem, and a driver for the "Xpad" controller. The changes seem to be uncontroversial; expect Xbox support in the mainline kernel before too long.

Making security hooks optional

The Linux Security Module effort ran into a bit of a snag this week as its developers tried to get another set of hooks merged into the 2.5 mainline. The result was a "back to the drawing board" experience which is likely to improve the quality of the LSM patch overall.

The LSM team posted a set of hooks for networking operations for inclusion. There has been concern about the performance impact of the networking hooks since last June's Kernel Summit, so the LSM developers have put quite a bit of effort into minimizing any potential slowdowns. The current patches reportedly have no measurable impact on 100Mb/s networking, and cause a 1-2% slowdown on gigabit networks.

That is a small impact, but it was still too much for the networking hackers. Those folks have put a great deal of effort into creating the fastest networking on the planet, and they are not much interested in patches which slow things down. They take particular exception to just how these hooks are implemented. Consider one piece from the network hooks patches:

        if (skb) {
                security_ops->skb_recv_datagram(skb, sk, flags);
                return skb;
        }

The LSM patch, of course, adds the security_ops line.

The problem here is that the security hook is always called. If no particular security module has been loaded, then a dummy hook is called. So, even in the case where no security policy is being implemented (the usual case for most systems into the foreseeable future), a long-distance, indirect call is being made, with the usual effects on cache and TLB performance. The impact may be small, but it is still too much for the networking developers.

The solution, as posted by Greg Kroah-Hartman, is to move the hook invocation into a separate (inline) function. So the code fragment above would change to something like:

        if (skb) {
                security_skb_recv_datagram(skb, sk, flags);
                return skb;
        }

where security_skb_recv_datagram() would look like:

        static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                                      struct sock *sk,
                                                      unsigned flags)
        {
                security_ops->skb_recv_datagram(skb, sk, flags);
        }

This approach may not seem all that different. But now it is easy to introduce a CONFIG_SECURITY configuration option that makes all of the security hook invocations disappear. Thus, for people who know that they will not load security modules (and for distributors who choose not to support security modules), the overhead of the module hooks vanishes entirely. With this change in place, the networking team is happier.
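A user-space sketch can show the shape of that change. The structure, the hook_calls counter, and the dummy hook below are simplified, hypothetical stand-ins rather than the LSM code; only the pattern (an inline wrapper whose body depends on CONFIG_SECURITY) follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only pointers to these types are passed. */
struct sk_buff;
struct sock;

/* A cut-down operations table with a single hook, modeled loosely on
 * the LSM security_ops structure. */
struct security_operations {
        void (*skb_recv_datagram)(struct sk_buff *skb, struct sock *sk,
                                  unsigned flags);
};

int hook_calls;         /* counts how often a hook body actually ran */

/* Default policy used when no security module is loaded: it does
 * nothing useful, but still costs an indirect call when hooks are on. */
static void dummy_skb_recv_datagram(struct sk_buff *skb, struct sock *sk,
                                    unsigned flags)
{
        (void)skb; (void)sk; (void)flags;
        hook_calls++;
}

static struct security_operations dummy_ops = {
        .skb_recv_datagram = dummy_skb_recv_datagram,
};
struct security_operations *security_ops = &dummy_ops;

#ifdef CONFIG_SECURITY
/* Hooks configured in: every call goes through the function pointer. */
static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                              struct sock *sk, unsigned flags)
{
        security_ops->skb_recv_datagram(skb, sk, flags);
}
#else
/* Hooks configured out: an empty inline body, which an optimizing
 * compiler drops at each call site, so no indirect call is made. */
static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                              struct sock *sk, unsigned flags)
{
        (void)skb; (void)sk; (void)flags;
}
#endif
```

Building this with -DCONFIG_SECURITY routes each call through security_ops; building it without that flag leaves the networking call site with an empty wrapper, while the call site itself is identical in both cases.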

This change will also help address a couple of other problems that Rusty Russell (fresh back from his honeymoon) has pointed out. There is currently a small race condition with module removal: a security module could be removed from memory while other threads are still executing within the module's code. Fixing this problem will require the addition of some sort of reference counting, or the use of the recently merged read-copy-update mechanism. It may also be desirable to control the environment in which security hooks run; for example, it could be decided that security hooks should run with preemption disabled. Both problems are more easily solved if each hook invocation is wrapped within another function.
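One way a wrapper could help with the removal race is by counting the threads currently inside a hook, so that the unload path can tell when the module's code is no longer in use. The following is a hypothetical user-space illustration using C11 atomics; hook_users, security_module_busy(), and the toy hook are invented names, and this is not presented as the fix the LSM developers adopted (read-copy-update, mentioned above, is another candidate).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: only pointers to these types are passed. */
struct sk_buff;
struct sock;

struct security_operations {
        void (*skb_recv_datagram)(struct sk_buff *skb, struct sock *sk,
                                  unsigned flags);
};

struct security_operations *security_ops;  /* NULL: no module loaded */

/* Number of threads currently executing inside a hook. */
static atomic_int hook_users;

/* Unload side: the module's code may only be freed once nobody is
 * inside a hook.  A real implementation would also have to keep new
 * callers out between this check and the actual unload. */
static int security_module_busy(void)
{
        return atomic_load(&hook_users) != 0;
}

/* Caller side: the wrapper brackets every hook call with the counter,
 * which would be awkward to add at each open-coded call site. */
static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                              struct sock *sk, unsigned flags)
{
        struct security_operations *ops;

        atomic_fetch_add(&hook_users, 1);  /* mark the hook as in use */
        ops = security_ops;
        if (ops)
                ops->skb_recv_datagram(skb, sk, flags);
        atomic_fetch_sub(&hook_users, 1);  /* done with the module's code */
}

/* A toy module hook that records whether the module looked busy while
 * its own code was running. */
static int busy_during_hook;

static void toy_skb_recv_datagram(struct sk_buff *skb, struct sock *sk,
                                  unsigned flags)
{
        (void)skb; (void)sk; (void)flags;
        busy_during_hook = security_module_busy();
}

struct security_operations toy_ops = {
        .skb_recv_datagram = toy_skb_recv_datagram,
};
```

The same wrapper is also the natural place to impose a uniform environment on the hooks, such as disabling preemption around the call, since every invocation passes through it.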

Patches and updates

Janitorial

  • Tim Schmielau: tasks.h. (October 13, 2002)

Memory management

  • Andrew Morton: 2.5.43-m3. (October 15, 2002)

Benchmarks and bugs

  • Con Kolivas: 2.5.42-mm3. (Benchmark results). (October 15, 2002)

Page editor: Jonathan Corbet

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds