Looking forward to 2.7
[Posted June 25, 2003 by corbet]
The bulk of the development effort on the kernel is currently aimed at
stabilizing things for the 2.6 release. Chances are that things will stay
that way for the better part of a year - remember that a fair amount of
stabilization work has to happen after 2.6.0 is released. Even so,
we're starting to see hints (and even code) showing where some things might
go in 2.7.
A number of people maintain their own special-purpose kernel trees. Most
of them are aimed at adding features to the 2.4 or 2.5 kernels; many serve
as staging areas for patches which, it is hoped, will be merged into the
mainline soon. Those of you who find 2.5.x to be overly stable and boring,
though, may want to have a look at William Lee Irwin's -wli patch series,
which is full of stuff that no rational person would consider putting into
2.5 at this point. Some of the work to be found there includes:
- Single-page kernel stacks and interrupt stacks. This work, discussed
here last December, increases the number of processes a system can
support by reducing the per-process memory usage for stacks.
- Object-based reverse mapping (covered in February). This technique
cuts down on virtual memory management overhead in most cases. In
2.5.73-wli-1, object-based reverse mapping for anonymous memory (i.e.
user-space memory not backed by a file) was added as well.
- High-memory page mid-level directories. The PMD is the middle tier
on systems which use three-level page table schemes - such as x86
systems with massive amounts of memory. The "highpmd" patch moves
these page directories into high memory, thus reducing the amount of
low memory required by each process on the system. Low memory (the
memory, usually below 1GB, which is directly addressable by the
kernel) tends to be scarce on truly huge systems, so any change which
shifts data structures to high memory can be helpful.
As a result of these (and numerous other) patches, William claims a
five-fold increase in the number of processes which can be supported by a
massive system. This work certainly improves scalability, and may well
make it into the mainline - but not in 2.5. (The -wli patches do not
currently include his page clustering work, which is even more
bleeding-edge. Page clustering, too, may well become a 2.7 feature.)
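
A rough back-of-the-envelope sketch shows why trimming per-process
low-memory usage raises the ceiling on the number of processes. Every
number here is an assumption chosen for illustration, not a figure taken
from the -wli patches: a 4KB page, roughly 896MB of low memory, a kernel
stack shrinking from two pages to one, and three PMD pages per process
moving out of low memory. The function names are hypothetical.

```python
PAGE_SIZE = 4096                 # bytes; assumed x86 page size
LOWMEM = 896 * 1024 * 1024       # bytes; assumed low-memory budget


def low_memory_per_process(stack_pages, lowmem_pmd_pages):
    """Bytes of low memory pinned per process by its kernel stack and
    by any PMD pages that must live in low memory."""
    return (stack_pages + lowmem_pmd_pages) * PAGE_SIZE


def max_processes(lowmem_bytes, per_process_bytes):
    """Upper bound on processes if nothing else consumed low memory."""
    return lowmem_bytes // per_process_bytes


# Assumed "before" case: two-page stack, three PMD pages in low memory.
before = low_memory_per_process(stack_pages=2, lowmem_pmd_pages=3)

# Assumed "after" case: one-page stack, PMD pages moved to high memory.
after = low_memory_per_process(stack_pages=1, lowmem_pmd_pages=0)

print("per-process bytes before:", before)
print("per-process bytes after: ", after)
print("process ceiling before:  ", max_processes(LOWMEM, before))
print("process ceiling after:   ", max_processes(LOWMEM, after))
```

A real system has many other consumers of low memory (page tables, task
structures, slab caches), so the ratio that falls out of these inputs is a
property of the inputs and should not be read as a reconstruction of
William's measurement.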
More in the realm of vaporware currently is Daniel Phillips's 2.7 agenda.
Daniel has been the source of numerous interesting ideas in the past
(though somewhat fewer completed implementations). Among other things, the
shared page table patch (which could also be a 2.7 candidate) was
originally written by Daniel. Looking forward to 2.7, Daniel has a few
topics of interest:
- Memory defragmentation. Once a Linux system has been running for a
bit, it can get hard for kernel code to allocate blocks of two or more
physically contiguous pages. In most cases, kernel hackers don't even
try. Daniel suggests the creation of a defragmentation daemon which
would move pages around in an attempt to create larger contiguous
blocks of free memory. Additions made to the kernel in 2.5 (such as
the reverse-mapping VM) will help in this regard, since pages cannot
be moved unless the kernel knows where all the pointers to the page
are.
- Variable-size pages. This idea includes page clustering to create
large pages along with "sub-pages" which are smaller than the physical
page size. Daniel claims to have a prototype implementation which
makes the kernel smaller and faster, and which simplifies a number of
things.
- A physical block cache. This would be a separate address space which
tracks physical blocks on a given volume. There are various
performance benefits which would come from such a structure.
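
The dependence of defragmentation on reverse mapping can be illustrated
with a toy model. This is a minimal sketch, not Daniel's design: physical
memory is a list of frames, each mapping is a small dictionary standing in
for a page table entry, and the names (compact, largest_free_run, rmap)
are hypothetical. A page can only be moved because the reverse map lists
every mapping that must be rewritten to point at the new frame.

```python
FREE = None  # marker for an unused physical frame


def largest_free_run(frames):
    """Length of the longest run of physically contiguous free frames."""
    best = run = 0
    for frame in frames:
        run = run + 1 if frame is FREE else 0
        best = max(best, run)
    return best


def compact(frames, rmap):
    """Slide in-use pages toward the low end of memory, in place.

    frames: list where frames[i] is a page id or FREE
    rmap:   dict mapping page id -> list of mappings (dicts with a
            "frame" key) that point at that page
    """
    dest = 0
    for src, page in enumerate(frames):
        if page is FREE:
            continue
        if src != dest:
            frames[dest] = page
            frames[src] = FREE
            # Without the reverse map there would be no way to find
            # these mappings, and the page could not be moved safely.
            for mapping in rmap[page]:
                mapping["frame"] = dest
        dest += 1
    return frames


# Page "B" is mapped twice, as a shared page would be.
frames = ["A", FREE, "B", FREE, FREE, "C", FREE, "D"]
rmap = {
    "A": [{"frame": 0}],
    "B": [{"frame": 2}, {"frame": 2}],
    "C": [{"frame": 5}],
    "D": [{"frame": 7}],
}

run_before = largest_free_run(frames)
compact(frames, rmap)
run_after = largest_free_run(frames)
print("largest free run:", run_before, "->", run_after)
```

A real daemon would move pages selectively rather than sliding all of
them, and would have to cope with pages that cannot be moved at all; the
sketch only shows why knowing every pointer to a page is a precondition.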
It is far too soon to say with any kind of certainty where the 2.7
development series will go. Linus resists creating any sort of explicit
plan, preferring to see what sorts of developments prove
interesting enough to actually get implemented and used. Still, one can
read from these early hints that the developers expect to remain interested
in virtual memory topics for a while yet.