A look at 2.5.49-mm1
[Posted November 26, 2002 by corbet]
Andrew Morton's -mm patch series continues to be the staging area for no
end of interesting patches in the memory management area. As of this
writing, Andrew's latest patch is
2.5.49-mm1. Here's a look at a few of the
items in that patch that are (1) interesting, and (2) not so
complicated as to give your editor severe brain strain.
The shared
page table patch is an important part of -mm1. This work was
originally done by Daniel Phillips, but the patch has been beaten into
shape and turned into something useful by David McCracken. The standard
Linux virtual memory implementation does not share page tables between
processes; even if two processes are sharing a large chunk of memory, they
access that memory through separate page tables. With this patch,
processes that fork() share their page tables (on a copy-on-write
basis) with their child processes; page tables can also be shared when
processes use mmap() to create a large shared memory region.
This patch can speed up fork() significantly (i.e. by a factor of
almost 20 for very large processes) since it is no longer necessary to copy
page tables and set up the associated reverse mapping data structures. It also
greatly reduces the memory used for page tables and rmap entries; the
savings can be hundreds of megabytes in the "large Oracle server"
scenario. Shared page tables currently only work on x86 systems with high
memory. The patch appears stable (the last bug that had been biting people
just got stomped), but merging it into 2.5 would push the feature freeze
pretty hard at this point. On the other hand, if it does not go into 2.5,
it would not be surprising to see this patch worked into various
distributor kernels.
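To get a feel for where savings of that size could come from, here is a back-of-the-envelope sketch. The region size and process count are made-up numbers, not figures from the patch. The 8-byte page table entry assumes an x86 system running in PAE mode (entries are 4 bytes otherwise). Only last-level page tables are counted; rmap entries and upper-level directories are ignored.

```python
PAGE_SIZE = 4096   # bytes per page on x86
PTE_SIZE = 8       # bytes per page table entry; assumes PAE (4 without it)


def page_table_bytes(region_bytes, processes, shared):
    """Bytes of last-level page tables needed to map one shared region."""
    ptes = -(-region_bytes // PAGE_SIZE)    # ceiling division: one PTE per page
    per_copy = ptes * PTE_SIZE
    copies = 1 if shared else processes     # sharing keeps a single copy
    return per_copy * copies


region = 2 * 1024**3   # hypothetical 2 GB shared memory segment
procs = 100            # hypothetical number of server processes

private_cost = page_table_bytes(region, procs, shared=False)
shared_cost = page_table_bytes(region, procs, shared=True)
print(private_cost // 2**20, "MB of page tables without sharing")
print(shared_cost // 2**20, "MB of page tables with sharing")
```

Under these assumptions each process needs 4 MB of page tables for the region, so a hundred unshared copies add up to 400 MB, while one shared copy stays at 4 MB.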
The asynchronous
direct I/O patch extends the asynchronous I/O infrastructure into the
direct (block) I/O subsystem. It is part of the stated goal of making all
I/O within the kernel be asynchronous.
Jens Axboe's rbtree I/O scheduler addresses
a performance problem with the current block I/O scheduler: it has to scan
through the list of pending requests every time it needs to add a new one.
As the request queue gets long (and a longer queue, up to a point, yields
better performance), this scan takes time. The new scheduler therefore
replaces the linear list of requests with a tree (using the generic
red/black tree implementation in the 2.5 kernel).
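The cost difference can be sketched in user space by counting comparisons. This is not the kernel's code: binary search over a sorted Python list is swapped in for the red/black tree descent, since both find the insertion point in a logarithmic number of steps. The function names and the random "sector numbers" are invented for illustration.

```python
import random


def linear_insert(queue, sector):
    """Find the slot by scanning from the front; return comparisons made."""
    comparisons = 0
    pos = 0
    while pos < len(queue):
        comparisons += 1
        if queue[pos] > sector:
            break
        pos += 1
    queue.insert(pos, sector)
    return comparisons


def binary_insert(queue, sector):
    """Find the slot by binary search (stand-in for a tree descent)."""
    comparisons = 0
    lo, hi = 0, len(queue)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if queue[mid] <= sector:
            lo = mid + 1
        else:
            hi = mid
    # Note: list.insert still shifts elements; a real tree avoids that too.
    queue.insert(lo, sector)
    return comparisons


def total_comparisons(insert, sectors):
    """Insert every sector into an empty queue; return (comparisons, queue)."""
    queue = []
    total = 0
    for sector in sectors:
        total += insert(queue, sector)
    return total, queue


rng = random.Random(0)                      # fixed seed for repeatability
sectors = [rng.randrange(1_000_000) for _ in range(2000)]
linear_total, linear_queue = total_comparisons(linear_insert, sectors)
binary_total, binary_queue = total_comparisons(binary_insert, sectors)
print("linear scan comparisons:  ", linear_total)
print("binary search comparisons:", binary_total)
```

Both approaches end up with the same sorted queue; the difference is in how much work each insertion takes as the queue grows.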
The "currently untested and unused" page
reservation API is meant to deal with situations where the kernel must
be able to allocate pages without sleeping - and without failing. A call
to reserve_local_pages() sets aside a given number of pages which
are guaranteed to be available for a subsequent allocation (with the
GPF_RESERVED allocation flag). There is also a new page
walking API which simplifies the task of wandering through a process's
address space. As a special case, this API includes support for the
creation of scatter/gather lists for zero-copy I/O operations.
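The reservation idea described above can be modeled with a toy allocator. Everything here is hypothetical: the class, its methods, and the `use_reserve` argument are invented for the sketch and do not reflect the actual interface of reserve_local_pages() or the reserved allocation flag. The point is only that the step which can fail happens up front, so that later allocations from the reserve cannot.

```python
class OutOfPages(Exception):
    """Raised when the toy allocator cannot satisfy a request."""


class ToyAllocator:
    """Toy page allocator with a reserve pool (a model, not kernel code)."""

    def __init__(self, free_pages):
        self.free = free_pages   # pages available to any caller
        self.reserved = 0        # pages set aside for reserved allocations

    def reserve(self, count):
        """Set aside `count` pages; this is the step that is allowed to fail."""
        if count > self.free:
            raise OutOfPages("cannot reserve %d pages" % count)
        self.free -= count
        self.reserved += count

    def alloc(self, use_reserve=False):
        """Allocate one page, drawing on the reserve only when asked to."""
        if use_reserve and self.reserved > 0:
            self.reserved -= 1
            return "page"
        if self.free > 0:
            self.free -= 1
            return "page"
        raise OutOfPages("no free pages")


pool = ToyAllocator(free_pages=4)
pool.reserve(2)            # done early, where failure can be handled

pool.alloc()               # other users drain the general pool
pool.alloc()

try:
    pool.alloc()           # an ordinary allocation now fails
except OutOfPages:
    print("ordinary allocation failed")

pool.alloc(use_reserve=True)   # reserved allocations still succeed
pool.alloc(use_reserve=True)
print("reserved allocations succeeded")
```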
There's a lot of other work rolled into the 2.5.49-mm1 patch; see Andrew's
posting for the full list.