Brief items
The current development kernel is 2.5.49, released by Linus on
November 22. "Architecture updates, threading improvements, shm fix (the
cause of the Oracle problems), networking, scsi, modules, you name it, it's
here." Details are in the long-format changelog.
Linus's (pre-2.5.50) BitKeeper tree has a great many patches, the bulk of
which come from the -ac and -dj trees. It also has some latency reduction
patches from Andrew Morton, real-time swap space accounting, a number of
IDE enhancements, an LSM update, and a big ISDN update.
The current prepatch from Alan Cox is 2.5.49-ac1. It consists mostly of compile
fixes and other small repairs.
The current stable kernel is 2.4.19. 2.4.20 is getting closer,
though; 2.4.20-rc4 was released by Marcelo
on November 26.
Alan Cox has released 2.4.20-rc4-ac1, which
adds a few fixes to the 2.4.20 release candidate.
Kernel development news
Andrew Morton's -mm patch series continues to be the staging area for no
end of interesting patches in the memory management area. As of this
writing, Andrew's latest patch is
2.5.49-mm1. Here's a look at a few of the
items in that patch that are (1) interesting, and (2) not so
complicated as to give your editor severe brain strain.
The shared
page table patch is an important part of -mm1. This work was
originally done by Daniel Phillips, but the patch has been beaten into
shape and turned into something useful by David McCracken. The standard
Linux virtual memory implementation does not share page tables between
processes; even if two processes are sharing a large chunk of memory, they
access that memory through separate page tables. With this patch,
processes that fork() share their page tables (on a copy-on-write
basis) with their child processes; page tables can also be shared when
processes use mmap() to create a large shared memory region.
This patch can speed up fork() significantly (i.e. by a factor of
almost 20 for very large processes) since it is no longer necessary to copy
page tables and set up the associated reverse mapping data structures. It also
greatly reduces the memory used for page tables and rmap entries; the
savings can be hundreds of megabytes in the "large Oracle server"
scenario. Shared page tables currently only work on x86 systems with high
memory. The patch appears stable (the last bug that had been biting people
just got stomped), but merging it into 2.5 would push the feature freeze
pretty hard at this point. On the other hand, if it does not go into 2.5,
it would not be surprising to see this patch worked into various
distributor kernels.
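The sharing scheme described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the `PageTable` and `Process` classes are hypothetical, a whole table is shared as a single unit, and a plain reference count stands in for whatever bookkeeping the real patch does. It shows why fork() gets cheaper: nothing is copied until one of the sharers actually changes a mapping.

```python
class PageTable:
    """Toy page table: maps virtual page number -> physical frame number."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self.refcount = 1  # number of processes currently sharing this table


class Process:
    """Hypothetical process that holds a reference to a page table."""

    def __init__(self, table):
        self.table = table

    def fork(self):
        # Share the parent's table instead of copying every entry.
        self.table.refcount += 1
        return Process(self.table)

    def write(self, vpn, new_frame):
        # Copy-on-write: take a private copy the first time a sharer
        # modifies a table that others still reference.
        if self.table.refcount > 1:
            self.table.refcount -= 1
            self.table = PageTable(self.table.entries)
        self.table.entries[vpn] = new_frame


parent = Process(PageTable({vpn: 1000 + vpn for vpn in range(1024)}))
child = parent.fork()
shared_after_fork = child.table is parent.table    # no copy made at fork time

child.write(0, 5000)
shared_after_write = child.table is parent.table   # child now has its own copy
```

In this model the cost of copying 1024 entries is paid only when the child first writes; a child that immediately calls exec() would never pay it at all.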
The asynchronous
direct I/O patch extends the asynchronous I/O infrastructure into the
direct (block) I/O subsystem. It is part of the stated goal of making all
I/O within the kernel be asynchronous.
Jens Axboe's rbtree I/O scheduler addresses
a performance problem with the current block I/O scheduler: it has to scan
through the list of pending requests every time it needs to add a new one.
As the request queue gets long (and a reasonably long queue yields better
performance), this scan takes more and more time. So the new scheduler
replaces the linear list of requests with a tree (using the generic
red/black tree implementation in the 2.5 kernel).
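To see why the data structure matters, here is a rough sketch that counts how many queued requests must be examined to find the insertion point for a new one. Note the assumptions: requests are reduced to bare sector numbers, the function names are made up for illustration, and a binary search over a sorted Python list stands in for the red/black tree (the search cost is logarithmic in both cases, though a real tree also makes the insertion itself cheap).

```python
def linear_insert(queue, sector):
    """Insert by scanning from the head; return the number of entries examined."""
    steps = 0
    for i, queued in enumerate(queue):
        steps += 1
        if queued > sector:
            queue.insert(i, sector)
            return steps
    queue.append(sector)
    return steps


def binary_insert(queue, sector):
    """Insert using binary search; return the number of entries examined."""
    lo, hi, steps = 0, len(queue), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if queue[mid] <= sector:
            lo = mid + 1
        else:
            hi = mid
    queue.insert(lo, sector)
    return steps


# Two identical queues of 1024 pending requests, sorted by sector number.
list_queue = list(range(0, 2048, 2))
tree_queue = list(range(0, 2048, 2))

# A request beyond the last queued sector is the worst case for the scan.
linear_steps = linear_insert(list_queue, 2047)
binary_steps = binary_insert(tree_queue, 2047)
```

With 1024 queued requests the linear scan looks at every one of them, while the logarithmic search needs only about ten comparisons; the gap widens as the queue grows.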
The "currently untested and unused" page
reservation API is meant to deal with situations where the kernel must
be able to allocate pages without sleeping - and without failing. A call
to reserve_local_pages() sets aside a given number of pages which
are guaranteed to be available for a subsequent allocation (with the
GFP_RESERVED allocation flag). There is also a new page
walking API which simplifies the task of wandering through a process's
address space. As a special case, this API includes support for the
creation of scatter/gather lists for zero-copy I/O operations.
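The reservation idea mentioned above is easy to model in miniature. The sketch below is purely illustrative: only the name reserve_local_pages() comes from the patch description, while the `PagePool` class, the `alloc_page()` method, and the `use_reserve` argument (standing in for the reserved allocation flag) are assumptions made for this example, not the patch's actual interface.

```python
class PagePool:
    """Toy page allocator with a separate reserve (hypothetical model)."""

    def __init__(self, free_pages):
        self.free = free_pages   # pages available to ordinary allocations
        self.reserved = 0        # pages set aside ahead of time

    def reserve_local_pages(self, count):
        # Reserving happens in a context where failure is acceptable.
        if count > self.free:
            return False
        self.free -= count
        self.reserved += count
        return True

    def alloc_page(self, use_reserve=False):
        # A reserved allocation draws on the reserve first, so it
        # succeeds even when the ordinary pool is empty.
        if use_reserve and self.reserved > 0:
            self.reserved -= 1
            return True
        if self.free > 0:
            self.free -= 1
            return True
        return False


pool = PagePool(free_pages=4)
reserved_ok = pool.reserve_local_pages(2)

# Other users drain the ordinary pool completely.
drained = [pool.alloc_page() for _ in range(2)]
ordinary_fails = not pool.alloc_page()

# The critical path can still get exactly the pages it reserved.
critical = [pool.alloc_page(use_reserve=True) for _ in range(2)]
```

The point of the pattern is that the possibility of failure is moved to the reservation step, which happens when failing (or waiting) is still safe.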
There's a lot of other work rolled into the 2.5.49-mm1 patch; see Andrew's
posting for the full list.
User-Mode Linux (UML) is Jeff Dike's "port" of the Linux kernel to itself;
a UML instance runs as a set of processes on a "real" Linux system. UML
has long been useful as a kernel development tool - it's nice to have a
development environment which can be tweaked with normal debuggers, and
which can crash without taking down the host system. In recent times,
there has been a growing level of interest in UML for virtual hosting and
honeypot applications as well. Users (or attackers) can be given root
access to a UML instance without, one hopes, endangering the host system.
UML has traditionally worked by running every UML process as a process on
the host system. The kernel lives up at the top of each process's address
space; transitions to and from "kernel mode" are handled with signals. The
problem with this mode of operation is that it is hard to make secure,
since the UML kernel's memory range is accessible to the processes it is
running. This mode is also slow, since it involves frequent memory
protection changes and signals.
So Jeff has released a patch which fixes
these problems by radically changing how UML works. In the new scheme, a
UML instance runs as exactly two processes on the host system. One is the
UML kernel, while the other takes turns running user-space processes. The
result is more secure (kernel space, being in a separate process, is now
completely inaccessible), and significantly faster as well. There is,
according to Jeff, only one disadvantage to the new way of doing things: it
can't actually be implemented on a stock Linux kernel. This is the sort of
nagging little problem that has been the downfall of many a great
development project.
The problem has to do with how the user-space process works. That process
needs to run each UML process in its own address space. In other words,
every time the UML kernel decides to switch to a new process, the
host-system process running the UML processes needs a whole new memory
management data structure. The Linux kernel does not currently have the
ability to switch a process's memory environment in this manner.
Jeff's solution is to create a magic file called /proc/mm.
Opening this file creates a new address space; that address space can be
modified by writing to the file. When the file is closed, the address
space is deleted. Then, there is a set of ptrace() extensions,
one of which allows the caller to change the address space of the traced
process. By using /proc/mm to create a separate address space for
each UML process, the UML kernel can give each of its processes its own
view of the world within a single host system process. Problem solved.
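The arrangement can be pictured with a small model. Everything here is a hypothetical stand-in: the class and method names are invented for illustration, a dictionary plays the part of an address space, and the comments note which step of Jeff's scheme (opening, writing to, or closing /proc/mm, or the ptrace() extension) each operation is meant to represent. None of it reflects the patch's real interface.

```python
class AddressSpace:
    """Stand-in for the address space created by opening /proc/mm."""

    def __init__(self):
        self.mappings = {}  # address -> contents


class HostProcess:
    """The single host process that runs all UML user-space code."""

    def __init__(self):
        self.current = None

    def switch_address_space(self, space):
        # Stand-in for the ptrace() extension that changes the
        # traced process's address space.
        self.current = space

    def read(self, addr):
        return self.current.mappings.get(addr)


class UmlKernel:
    def __init__(self):
        self.spaces = {}
        self.host = HostProcess()

    def create_process(self, pid):
        self.spaces[pid] = AddressSpace()        # like opening /proc/mm

    def map(self, pid, addr, contents):
        self.spaces[pid].mappings[addr] = contents  # like writing to the file

    def schedule(self, pid):
        self.host.switch_address_space(self.spaces[pid])

    def exit_process(self, pid):
        del self.spaces[pid]                     # like closing the file


uml = UmlKernel()
uml.create_process(1)
uml.create_process(2)
uml.map(1, 0x1000, "process 1 data")
uml.map(2, 0x1000, "process 2 data")

uml.schedule(1)
seen_by_first = uml.host.read(0x1000)
uml.schedule(2)
seen_by_second = uml.host.read(0x1000)
```

The same address yields different contents depending on which UML process is scheduled, even though only one host process ever does the reading.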
It all looks like it works well. The /proc/mm approach may run
into some rough sailing on linux-kernel; a system call
implementation (or even /dev) might be better received. However
it is implemented, this new feature is exactly that: a new
feature. Adding new features into the virtual memory and process
management subsystems is exactly what is not supposed to happen during this
phase of 2.5 development.
Patches and updates
Kernel trees
Core kernel code
- Andries.Brouwer@cwi.nl: kill i_dev.
(November 22, 2002)
Development tools
Device drivers
Documentation
Filesystems and block I/O
Kernel building
Memory management
- Rik van Riel: rmap 15a.
(November 26, 2002)
Networking
Architecture-specific
Security-related
Benchmarks and bugs
Miscellaneous
Page editor: Jonathan Corbet