The current development kernel is 2.5.30, which was released
by Linus on August 1. It includes
the usual IDE patches (through IDE 111), changes to the "generic disk"
data structure, the "strict overcommit" VM patch, the removal of the
"khttpd" in-kernel web server, a number of devfs changes (by Greg
Kroah-Hartman, and not entirely to devfs author Richard Gooch's liking), a
long list of driverfs changes, and many other fixes and updates. See the long-format changelog
for all the details.
Linus's BitKeeper tree (which will become 2.5.31) includes an ISDN update,
more driverfs work, a JFS update, a lot of ethernet driver updates, and
more. Interestingly, this tree also includes the "User-mode Linux
preparation" patches, which make various changes to core code needed by
UML. UML itself is not there yet, but the presence of these patches
suggests that it is coming soon.
The current prepatch from Dave Jones is 2.5.30-dj1, which contains a small set of fixes
and some rubble from his switch over to BitKeeper. "Chances are this
won't even boot for many people (if any at all)."
The latest 2.5 kernel status summary from
Guillaume Boissiere came out on August 7.
The current stable kernel is 2.4.19. The much-awaited final release
was announced by Marcelo on August 2; it
contained no changes after the -rc5 release candidate. The full list of changes in 2.4.19 is available - be
warned that it is long.
Marcelo has already released 2.4.20-pre1, the
first prepatch for the 2.4.20 kernel. The list of changes is long, but it
consists mostly of fixes and driver updates. Marcelo did initially include
a backport of NAPI (high performance networking; see the October 4, 2001
LWN Kernel Page), but backed parts of it out at the last minute; he is
waiting for justification to include it for real. Says Marcelo: "2.4.20
will be a much faster release cycle than 2.4.19 was."
The current prepatch from Alan Cox is 2.4.20-pre1-ac1.
Kernel development news
Most modern processors have the ability to work with "large pages" - single
page table entries which cover large (up to multiple megabyte) ranges of
contiguous physical memory. With one exception, this feature is not used
in the Linux kernel, which works with a 4K or 8K page size (depending on
architecture) in all situations. Smaller pages reduce internal
fragmentation, are quick to swap in and out, don't require the virtual
memory system to maintain large, contiguous chunks of memory, and help to
ensure that exactly the virtual memory that is in use now is resident in
physical memory. Small pages are the best choice for most situations.
Because of the complexity of supporting multiple page sizes in the
Linux VM implementation, no such support has been merged so far.
There are advantages to working with large pages, however. 4MB of memory
in 4KB pages requires 1024 page table entries (PTEs) - that is a lot of
memory devoted to overhead, and significant processor time to set up, tear
down, and maintain those PTEs. This overhead is multiplied when shared
memory segments are in use, since Linux is currently unable to share page
tables. But the real savings with large pages come from the
processor's translation lookaside buffer (TLB) - a small cache which remembers the results
of virtual-to-physical address translations. An address lookup through the
translation buffer is quick; one that must actually walk the page table
is slow. Large pages greatly extend the reach of the translation buffer,
and simply make applications run faster; performance improvements of 30%
have been claimed at times.
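The arithmetic behind these numbers is easy to check. A quick sketch (the 64-entry TLB size below is an illustrative assumption, not any particular processor's figure):

```python
# Page table entries needed to map a 4MB region at each page size.
REGION = 4 * 1024 * 1024      # a 4MB mapping
SMALL_PAGE = 4 * 1024         # 4KB base page
LARGE_PAGE = 4 * 1024 * 1024  # 4MB large page

ptes_small = REGION // SMALL_PAGE  # 1024 PTEs to set up and maintain
ptes_large = REGION // LARGE_PAGE  # a single PTE covers the whole region
print(ptes_small, ptes_large)

# Memory the TLB can cover ("reach") at each page size.
# 64 entries is an assumed, illustrative TLB size.
TLB_ENTRIES = 64
print(TLB_ENTRIES * SMALL_PAGE // 1024, "KB of reach with small pages")
print(TLB_ENTRIES * LARGE_PAGE // (1024 * 1024), "MB of reach with large pages")
```

The three-orders-of-magnitude jump in TLB reach is where the claimed speedups come from: far fewer lookups fall through to the slow page table walk.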
The fact that Oracle uses lots of large, shared memory regions and would
like to see large page support in the kernel is also helping to drive
development in this area.
The most recent large page patch is this one
by Rohit Seth. It allows processes to explicitly request a chunk of large
page memory with a new get_large_pages system call; there is also
a share_large_pages call for creating shared memory regions. The
patch avoids much of the complexity of supporting large pages in the VM by,
well, avoiding it. Large pages are handled completely outside of the
normal memory management mechanisms. When the system boots, a percentage
of memory (25%, by default) is simply set aside to satisfy large page
requests. These pages are handed out when requested (as long as they last)
and are not swapped.
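The reservation scheme is simple enough to model in a few lines. This is a toy sketch of the idea only - the pool and page sizes are illustrative, and none of this is the patch's actual code:

```python
class LargePagePool:
    """Toy model of a boot-time large page reservation pool.

    A fixed fraction of memory is set aside at "boot"; large page
    requests are satisfied from the pool until it runs dry, and the
    pages are never swapped.  Illustrative only - not the real
    kernel data structures.
    """

    def __init__(self, total_mem, fraction=0.25, large_page=4 * 1024 * 1024):
        reserved = int(total_mem * fraction)
        self.free = reserved // large_page  # large pages set aside at boot

    def get_large_pages(self, n):
        """Hand out n large pages, or fail once the pool is exhausted."""
        if n > self.free:
            return None  # the request fails; no fallback to normal pages
        self.free -= n
        return n

pool = LargePagePool(total_mem=1024 * 1024 * 1024)  # a 1GB machine
print(pool.free)                 # 64 large pages reserved at boot
print(pool.get_large_pages(60))  # succeeds
print(pool.get_large_pages(10))  # fails - only 4 pages remain
```

The appeal of this approach is visible in the model: allocation is a counter decrement, with no interaction with the page cache, the swapper, or the rest of the VM.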
This patch is thus (relatively) simple. It gets the job done in certain
situations - imagine a large box whose job is to run a relational database
system; nailing down a quarter of memory to improve database performance is
a reasonable thing to do. But this patch (intentionally) does not address
the larger problem. In fact, as Linus points
out, this isn't really a "large page" patch at all:
The current largepage patch is really nothing but an interface to
the TLB. Please view it as that - a direct TLB interface that has
zero impact on the VFS or VM layers, and that is meant _purely_ as
a way to expose hw capabilities to the few applications that really
really want them
So what might a real large page patch provide? Wishes that have
been expressed include:
- Support for large page file I/O. Performing I/O operations in 4K
  chunks is increasingly a bandwidth bottleneck; filesystems could gain
  some performance benefits by working with larger chunks. So the size
  of filesystem pages - as seen in the page cache - will someday need
  to grow.
- No need for separate system calls. The most common suggestion has
been that the mmap system call needs a new flag to request
large page allocations.
- David Miller asks: why have system calls
or even mmap flags? Instead, applications should be given
large pages any time they request enough memory and the system is able
to do it. Then the performance benefits would be available without
the need to recode applications (in a nonportable way) to use large pages.
The automatic use of large pages would be helped by another suggestion from
David: if it becomes necessary to swap out a large page, simply split it
back into a long list of regular pages and proceed as usual. Then most of
the swap complexity would go away.
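The split-on-swap idea is easy to picture. A toy sketch (the base address and page sizes are illustrative):

```python
def split_large_page(base, large_size=4 * 1024 * 1024, small_size=4 * 1024):
    """Split one large page mapping into the equivalent list of
    small-page addresses, as a swap-out path might do before treating
    them as ordinary pages.  Illustrative model only."""
    return [base + off for off in range(0, large_size, small_size)]

pages = split_large_page(0x40000000)
print(len(pages))     # 1024 small pages per 4MB large page
print(hex(pages[1]))  # 0x40001000 - the second 4KB page
```

Once split, each 4KB piece can go through the existing swap-out path unchanged, which is exactly why the approach would keep the large-page swap complexity out of the VM core.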
Of course, the October deadline is getting closer. So all of these ideas
are almost certainly destined to wait until after the next stable series.
But one of the variants of the simpler "TLB interface" patches may yet get
in this time around and make the database vendors (and others) happy.
(What, you may ask, is the "one exception" where the kernel uses large
pages now? The mapping of the kernel image itself - a single, large chunk
of non-swappable memory - is handled with a large page PTE.)
Patches and updates
Core kernel code
- Ingo Molnar: tls-2.5.30-A1. Thread-local storage enhanced with better support for WINE and debuggers.
(August 7, 2002)
Filesystems and block I/O
Page editor: Jonathan Corbet