Kernel release status
The current development kernel is 2.5.43, which was
announced by Linus on October 15. He
described this release as "a huge merging frenzy for the feature freeze."
It includes the read-copy-update patch (described in
the July 18 LWN Kernel Page),
more network asynchronous I/O patches, SMP support for User-mode Linux, a
version of the InterMezzo filesystem that works in 2.5, more memory
management work, the removal of kiobufs (see below), JFS and XFS updates,
an AFS filesystem implementation, the "oprofile" profiler, IBM "Summit"
architecture support, an ARM update, and many other fixes and updates. The
long-format changelog is also available.
2.5.42 was released on October 11.
It was a large patch, including NFS work, numerous
patches from the -dj tree, the 64-bit sector ("large block device") patch,
more asynchronous I/O patches, the IDE tagged command queueing patch, and a
lot of other fixes and updates. See the
long-format changelog for all the details.
The latest prepatch from Alan Cox is 2.5.42-ac1. He has taken a stand in the LVM
debate (see below) by merging the LVM2 device mapper; other than that, this
prepatch consists mostly of compilation fixes.
The current 2.5 status summary from
Guillaume Boissiere is dated October 16.
The current stable kernel is 2.4.19. Marcelo took another step
toward 2.4.20 with 2.4.20-pre11, which was
released on October 15.
Alan Cox released 2.4.20-pre10-ac1 on
October 10; the only item in the changelog is "resync with Marcelo."
Kernel development news
Choosing a volume manager for 2.5
As the feature freeze date gets closer, people are starting to get worried
about some of the unresolved issues in the 2.5 series. At the top of the
list, currently, is volume managers. The LVM code in the 2.4 kernel is not
much loved by kernel developers; it has gone unmaintained in 2.5 and simply
does not work. One thing that everybody seems to agree on is that
LVM has reached the end of its life and needs to be removed.
But that, of course, raises the question of what will replace LVM. There are
two contenders out there:
- LVM2 is a new version of LVM, reimplemented from the ground up by
Sistina Software, which also wrote the original LVM. LVM2 is actually
the name given to the user-level interface; the kernel code for LVM2
is called the "device mapper" or "DM".
- The Enterprise Volume
Management System (or EVMS) is a new, independent development from
IBM.
Both volume managers have been proposed for inclusion into 2.5 as
replacements for LVM. There is currently very little consensus on which,
if either, should go in, and Linus has stated that he is undecided on the
issue.
LVM2/DM is the smaller and simpler of the two volume managers. Its goals
are to be a cleaner, better implementation of LVM, so it does not add a
great many features. It can combine volumes in a linear (appending one
partition to another) or striped (interleaving data across partitions)
manner, but does not support higher-level RAID features. The lack of RAID
4/5 support is not necessarily a problem, since the kernel "md" driver
provides those capabilities. LVM2 also does not try to understand the
filesystems on the volumes it manages, so changing the sizes of volumes can
be a multi-step process. LVM2 is backward compatible with LVM, and
provides a very similar interface to administrators.
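
The difference between the linear and striped layouts comes down to how a sector number on the combined volume is translated to a partition and an offset within it. The following is a minimal userspace sketch of that arithmetic, not the device mapper's actual code; the names struct target, map_linear(), and map_striped() are hypothetical, and the striped case assumes equally sized partitions and a fixed chunk size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: which underlying partition, and where on it. */
struct target {
    size_t   dev;     /* index of the underlying partition */
    uint64_t sector;  /* sector offset within that partition */
};

/*
 * Linear mapping: partitions are appended one after another.
 * sizes[i] is the length of partition i in sectors.
 * Returns 0 on success, -1 if the sector lies beyond the volume.
 */
int map_linear(const uint64_t *sizes, size_t ndevs,
               uint64_t sector, struct target *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < ndevs; i++) {
        if (sector < sizes[i]) {
            out->dev = i;
            out->sector = sector;
            return 0;
        }
        sector -= sizes[i];   /* skip past this partition */
    }
    return -1;
}

/*
 * Striped mapping: chunks of "chunk" sectors are dealt out round-robin
 * across ndevs partitions (assumed here to be equally sized).
 * Returns 0 on success, -1 on invalid parameters.
 */
int map_striped(size_t ndevs, uint64_t chunk,
                uint64_t sector, struct target *out)
{
    assert(out != NULL);
    if (ndevs == 0 || chunk == 0)
        return -1;

    uint64_t chunk_no = sector / chunk;           /* which chunk overall */
    out->dev = (size_t)(chunk_no % ndevs);        /* which partition gets it */
    out->sector = (chunk_no / ndevs) * chunk      /* full chunks before it */
                  + sector % chunk;               /* offset inside the chunk */
    return 0;
}
```

With two partitions of 100 and 50 sectors, the linear mapping sends sector 120 to sector 20 of the second partition. With two partitions and 8-sector chunks, the striped mapping sends sector 20 to sector 12 of the first partition.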
EVMS is a much larger, more complex development. It supports RAID 4
and 5, and other features such as bad block remapping. EVMS comes
with a comprehensive graphical interface. It also can work with several
filesystem types to make filesystem resizing easy. From the user level,
EVMS comes across as a far more complete tool.
There is substantial resistance in the kernel hacker community to merging
EVMS, however. A number of coding style issues have been raised; for
example, the declaration of static variables within header files is
considered
to be in poor taste. There are objections to the duplication of the RAID
functionality already provided by the md driver. EVMS also hides the
internal structure of its volumes. Imagine creating two large volumes by
combining two drives (for each) in a linear mapping, then making one big
volume by striping across the two linear volumes. The internal, linear
volumes would not be visible as separate devices. Critics of this
implementation dislike the duplication of code (against the block layer)
implied by creating a new type of hidden block device; it also complicates
operations that need to be performed directly on the internal devices.
So there has been pressure to expose the internal devices, or, even, to
work many of these volume management functions directly into the block
layer API.
LVM2 has not been subjected to the same level of criticism; the consensus
seems to be that the code is relatively clean and correct. The level of
capability offered by LVM2 is lower, however.
The development teams for both EVMS and LVM2 have stated their willingness
to address complaints in order to get their projects merged. The problem,
of course, is that the feature freeze date is getting closer, and neither
project will be "complete" by then. Some developers are talking seriously
about merging neither volume manager, and simply doing without until the
next development series opens.
Releasing a stable kernel without a logical volume manager is probably not
a realistic option, however. Something will probably go in.
Linus stated
in the 2.5.42 announcement that he was leaning toward EVMS; EVMS also
appears to be the choice of people who use volume management, as
opposed to those who have to deal with the code. So the odds probably
favor an EVMS merge, but it is far from a sure bet at this point.
Kiobufs removed
One of the advantages of the new "commits" mailing list is that one can see
the patches which slip quietly into the kernel without public discussion.
One of those is
this patch by Christoph
Hellwig, via Andrew Morton, which removes the "kiobuf" infrastructure from
the kernel. This patch has been merged by Linus, and will show up in the
2.5.43 development kernel.
The kiobuf structure was developed by Stephen Tweedie as a way, initially,
of implementing the raw block I/O devices in the 2.3 development series.
Using kiobufs, kernel code can perform operations directly to and from
user-space buffers without having to worry about walking page tables,
pinning pages into memory, and so on. Kiobufs did the job they were
designed to do, and they found their way into a number of kernel
developments.
Not everybody was happy with the kiobuf interface, however. Many saw it as
a heavyweight structure, requiring a lot of time (and memory) to set up and
tear down. Kiobufs also forced the splitting of large I/O operations into
small chunks - often as small as a single 512-byte sector, but never larger
than 64KB. As a result, kiobufs never became the high-performance I/O
mechanism that they were intended to be.
So what replaces kiobufs in the 2.5 kernel? Modern direct I/O code uses
the get_user_pages() function:
    int get_user_pages (struct task_struct *tsk,
                        struct mm_struct *mm,
                        unsigned long start, int len,
                        int write, int force,
                        struct page **pages,
                        struct vm_area_struct **vmas);
This function faults in len user pages starting at start,
and locks them into the page cache. Return values include the
struct page pointers (in pages) and pointers to the
associated VMA structures (in vmas); either can be NULL
if the caller is not interested in that information. Code which used
kiobufs will want the struct page pointers, which can be used
to set up DMA operations or other direct transfers; most callers do not
need the VMA pointers. The pages should be passed (individually) to
page_cache_release() when the operation is complete.
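
The calling pattern described above (pin the pages, use the struct page pointers, release each page) might be sketched as follows. This is a userspace mock so that it compiles outside the kernel: the structure definitions, the bodies of get_user_pages() and page_cache_release(), and the return convention (number of pages pinned) are stand-ins assumed for illustration, and do_direct_io() is a hypothetical caller. Locking and error handling that real kernel code would need are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; these are NOT the kernel's definitions. */
struct page           { int refcount; };
struct task_struct    { int unused; };
struct mm_struct      { int unused; };
struct vm_area_struct { int unused; };

#define MOCK_NPAGES 8
static struct page mock_pages[MOCK_NPAGES];

/*
 * Mock with the prototype quoted in the article. It "pins" len pages by
 * bumping a counter and returns the number of pages pinned (an assumed
 * convention for this sketch).
 */
static int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                          unsigned long start, int len,
                          int write, int force,
                          struct page **pages,
                          struct vm_area_struct **vmas)
{
    (void)tsk; (void)mm; (void)start; (void)write; (void)force;
    if (len > MOCK_NPAGES)
        len = MOCK_NPAGES;
    for (int i = 0; i < len; i++) {
        mock_pages[i].refcount++;
        if (pages)              /* either output array may be NULL */
            pages[i] = &mock_pages[i];
        if (vmas)
            vmas[i] = NULL;
    }
    return len;
}

/* Mock release: drops the reference taken by the mock above. */
static void page_cache_release(struct page *page)
{
    page->refcount--;
}

/* Hypothetical caller: pin pages, do the transfer, release them one by one. */
int do_direct_io(struct task_struct *tsk, struct mm_struct *mm,
                 unsigned long uaddr, int npages, int write)
{
    struct page *pages[MOCK_NPAGES];
    int got, i;

    if (npages > MOCK_NPAGES)
        return -1;

    /* Most callers do not need the VMA pointers, so pass NULL. */
    got = get_user_pages(tsk, mm, uaddr, npages, write, 0, pages, NULL);
    if (got <= 0)
        return got;

    /* ... set up DMA or another direct transfer using pages[0..got-1] ... */

    for (i = 0; i < got; i++)
        page_cache_release(pages[i]);
    return got;
}
```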
The asynchronous I/O patches have also, at times, included a new
kvec structure which looks like a lighter, faster version of
kiobufs. No patches with kvecs have been merged by Linus, however.
Kiobufs, meanwhile, have reached a dead end. It's worth remembering,
though, that kiobufs were the pioneering effort into the use of
struct page pointers for direct I/O. The code may be gone,
but the lessons learned from kiobufs live on in the current
implementation.
Xbox Linux kernel patches
For those who are wondering what it takes to make Linux run on an Xbox:
Michael Steil of the Xbox Linux Project has posted
a note describing the project's kernel patches
(and asking how to get them merged). The required changes include a
workaround for an Xbox PCI bug, compensation for a faster system timer, a
different way of shutting down and rebooting, the lack of a keyboard
controller, support for the "FATX" filesystem, and a driver for the "Xpad"
controller. The changes seem to be uncontroversial; expect Xbox support in
the mainline kernel before too long.
Making security hooks optional
The Linux Security Module effort ran into a bit of a snag this week as its
developers tried to get another set of hooks merged into the 2.5 mainline.
The result was a "back to the drawing board" experience which is likely to
improve the quality of the LSM patch overall.
The LSM team posted a set of hooks for
networking operations for inclusion. There has been concern about the
performance impact of the networking hooks since last June's Kernel Summit,
so the LSM developers have put quite a bit of effort into minimizing any
potential slowdowns. The current patches, it is said, have no measurable
impact on 100Mb/s networking, and a 1-2% slowdown with gigabit networks.
That is a small impact, but it was still too much for the networking
hackers. Those folks have put a great deal of effort into creating the
fastest networking on the planet, and they are not much interested in
patches which slow things down. They take particular exception to just how
these hooks are implemented. Consider one piece from the network hooks
patches:
    if (skb) {
        security_ops->skb_recv_datagram(skb, sk, flags);
        return skb;
    }
The LSM patch, of course, adds the security_ops line.
The problem here is that the security hook is always called. If no
particular security module has been loaded, then a dummy hook is called.
So, even in the case where no security policy is being implemented (the
usual case for most systems into the foreseeable future), a long-distance,
indirect call is being made, with the usual effects on cache and TLB
performance. The impact may be small, but it is still too much for the
networking developers.
The solution, as posted by Greg
Kroah-Hartman, is to move the hook invocation into a separate (inline)
function. So the code fragment above would change to something like:
    if (skb) {
        security_skb_recv_datagram(skb, sk, flags);
        return skb;
    }
where security_skb_recv_datagram() would look like:
    static inline void security_skb_recv_datagram(...)
    {
        security_ops->skb_recv_datagram(...);
    }
This approach may not seem all that different. But now it is easy to
introduce a CONFIG_SECURITY configuration option that makes all of
the security hook invocations disappear entirely. Thus, for people who
know that they will not load security modules (and for distributors who
choose not to support security modules), the overhead of the module hooks
vanishes entirely. With this change in place, the networking team is
happier.
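
To see how the configuration option could make the hooks vanish, consider the following userspace sketch. It is not the LSM patch itself: the structure layouts, the dummy hook, the hook_calls counter, and the recv_one() caller are assumptions made for illustration. Only the names security_ops, skb_recv_datagram, security_skb_recv_datagram(), and CONFIG_SECURITY come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (illustration only). */
struct sk_buff { int len; };
struct sock    { int id; };

/* Counts hook invocations so the effect of the option can be observed. */
int hook_calls;

#ifdef CONFIG_SECURITY

struct security_operations {
    void (*skb_recv_datagram)(struct sk_buff *skb, struct sock *sk,
                              unsigned int flags);
};

/* Dummy hook, called when no security module has been loaded. */
static void dummy_skb_recv_datagram(struct sk_buff *skb, struct sock *sk,
                                    unsigned int flags)
{
    (void)skb; (void)sk; (void)flags;
    hook_calls++;
}

static struct security_operations dummy_ops = {
    .skb_recv_datagram = dummy_skb_recv_datagram,
};
static struct security_operations *security_ops = &dummy_ops;

/* With the option set, the wrapper makes the indirect call. */
static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                              struct sock *sk,
                                              unsigned int flags)
{
    security_ops->skb_recv_datagram(skb, sk, flags);
}

#else /* !CONFIG_SECURITY */

/* Without the option, the wrapper is empty and the compiler can drop it. */
static inline void security_skb_recv_datagram(struct sk_buff *skb,
                                              struct sock *sk,
                                              unsigned int flags)
{
    (void)skb; (void)sk; (void)flags;
}

#endif

/* Hypothetical caller; its source is identical in both configurations. */
struct sk_buff *recv_one(struct sk_buff *skb, struct sock *sk,
                         unsigned int flags)
{
    if (skb) {
        security_skb_recv_datagram(skb, sk, flags);
        return skb;
    }
    return NULL;
}
```

Building with -DCONFIG_SECURITY routes every call through security_ops; building without it leaves recv_one() with no trace of the hook, while the calling code stays free of #ifdef clutter.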
This change will also help address a couple of other problems that Rusty
Russell (fresh back from his honeymoon) has pointed out. There is currently a (small) race
condition with module removal; it is possible that a security module could
be removed from memory while other threads are still executing within the
module's code. Fixing this problem will require the addition of some sort
of reference counting, or the use of the recently-merged read-copy-update
mechanism. It may also be desirable to control the environment in which
security hooks run; for example, it could be decided that security hooks
should run with preemption
disabled. Both problems are more easily solved if the invocation of the
hooks is wrapped within another function.
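
As an illustration of why a wrapper helps with the removal race, here is a userspace sketch using C11 atomics in which the wrapper counts the hook calls in flight, and removal waits for that count to drain. It shows only one possible reference-counting scheme, not what the LSM developers will necessarily adopt; the check hook, deny_odd(), example_module, and the register/unregister function names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical hook table with a single hook. */
struct security_operations {
    int (*check)(int arg);
};

/* Dummy hook used when no module is loaded: allow everything. */
static int dummy_check(int arg) { (void)arg; return 0; }
static struct security_operations dummy_ops = { dummy_check };

/* Example "module" hook: reject odd arguments. */
static int deny_odd(int arg) { return (arg & 1) ? -1 : 0; }
struct security_operations example_module = { deny_odd };

static _Atomic(struct security_operations *) security_ops = &dummy_ops;
static atomic_int hooks_in_flight;

/*
 * Every hook invocation goes through this wrapper, so the counting
 * lives in one place instead of at every call site.
 */
static inline int security_check(int arg)
{
    struct security_operations *ops;
    int ret;

    atomic_fetch_add(&hooks_in_flight, 1);   /* announce before loading ops */
    ops = atomic_load(&security_ops);
    ret = ops->check(arg);
    atomic_fetch_sub(&hooks_in_flight, 1);
    return ret;
}

void register_security(struct security_operations *ops)
{
    atomic_store(&security_ops, ops);
}

/*
 * Switch back to the dummy hooks, then wait until no thread can still be
 * running the old module's code. Only after this returns would it be
 * safe to free the module.
 */
void unregister_security(void)
{
    atomic_store(&security_ops, &dummy_ops);
    while (atomic_load(&hooks_in_flight) != 0)
        ;   /* busy-wait for brevity; real code would sleep or use RCU */
}
```

The same wrapper would also be the natural place to disable and re-enable preemption around the call, should that be decided upon.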
Patches and updates
Janitorial
- Tim Schmielau: tasks.h.
(October 13, 2002)
- Con Kolivas: 2.5.42-mm3. (Benchmark results).
(October 15, 2002)
Page editor: Jonathan Corbet