The object-based reverse-mapping VM
[Posted February 25, 2003 by corbet]
The reverse-mapping VM (RMAP) was merged into 2.5 to solve a specific problem:
there was no easy way for the kernel to find out which page tables referred
to a given physical page. Certain activities - swapping being at the top
of the list - require making changes to all relevant page tables. You
simply cannot swap a page to disk until all of the page table entries
pointing to it have been invalidated. The 2.4 kernel handles swapping by
scanning through the page tables, one process at a time, and invalidating
entries for pages that look like suitable victims. If it happens to find
all of the page table entries in time, the page can then be evicted to disk.
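As a rough illustration of why the 2.4 approach is hit-or-miss, here is a toy user-space model in C. It is not kernel code: the toy_* names, the flat per-process array of entries, and the scan "budget" are all assumptions made for the sake of the example. A page becomes evictable only once the scan has stumbled across every entry that maps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page table entry: a frame number and a flag. */
struct toy_pte { unsigned long frame; bool present; };

/* Hypothetical process: nothing but a flat array of page table entries. */
struct toy_process { struct toy_pte *ptes; size_t nr_ptes; };

/* Invalidate every entry in one process that maps 'frame'. */
static size_t toy_scan_process(struct toy_process *p, unsigned long frame)
{
    size_t cleared = 0;

    for (size_t i = 0; i < p->nr_ptes; i++) {
        if (p->ptes[i].present && p->ptes[i].frame == frame) {
            p->ptes[i].present = false;
            cleared++;
        }
    }
    return cleared;
}

/*
 * Scan at most 'budget' processes, one at a time.  '*mapcount' is the
 * number of entries still mapping the frame; the page can be evicted
 * only if the scan manages to drive that count to zero.
 */
static bool toy_try_to_evict(struct toy_process *procs, size_t nr_procs,
                             size_t budget, unsigned long frame,
                             size_t *mapcount)
{
    assert(procs && mapcount);
    for (size_t i = 0; i < nr_procs && i < budget && *mapcount > 0; i++)
        *mapcount -= toy_scan_process(&procs[i], frame);
    return *mapcount == 0;
}
```

With three processes mapping the same frame and a budget of two, the function returns false: the work was done, but the page still cannot go to disk.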
In 2.5, a new data structure was added to make this process easier.
Initially, each page in the system (as represented by its
struct page structure in the system memory map) had a linked
list of reverse mapping entries pointing to every page table entry
referencing that page. That worked, but it introduced some problems of its
own. The reverse mapping entries consumed a lot of memory and took quite a
bit of time to maintain. Operations which required working with a lot of pages
slowed down. And the fork() system call, which must add a new
reverse mapping entry for every page in the process's address space, slowed
significantly. As a result, there has been an ongoing effort to mitigate
RMAP's costs.
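The per-page chain, and where its cost comes from, can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not the kernel's implementation: every toy_* name is hypothetical, and a page table entry is reduced to a single word with a "present" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page table entry: one word with a "present" bit. */
typedef struct { unsigned long val; } toy_pte;
#define TOY_PTE_PRESENT 0x1UL

/* One reverse-mapping entry, pointing back at a PTE that maps the page. */
struct toy_rmap_entry {
    toy_pte *pte;
    struct toy_rmap_entry *next;
};

/* Stand-in for struct page: just the head of the reverse-mapping chain. */
struct toy_page {
    struct toy_rmap_entry *rmap_head;
};

/*
 * Record that 'pte' now maps 'page'.  One allocation per mapping is the
 * memory cost; doing this for every page at fork() is the time cost.
 */
static bool toy_rmap_add(struct toy_page *page, toy_pte *pte)
{
    struct toy_rmap_entry *e;

    assert(page && pte);
    e = malloc(sizeof(*e));
    if (!e)
        return false;
    pte->val |= TOY_PTE_PRESENT;
    e->pte = pte;
    e->next = page->rmap_head;
    page->rmap_head = e;
    return true;
}

/* Invalidate every PTE mapping the page; return how many were cleared. */
static size_t toy_rmap_unmap_all(struct toy_page *page)
{
    size_t n = 0;

    while (page->rmap_head) {
        struct toy_rmap_entry *e = page->rmap_head;

        page->rmap_head = e->next;
        e->pte->val &= ~TOY_PTE_PRESENT;
        free(e);
        n++;
    }
    return n;
}
```

Unmapping is now a straightforward list walk with no guesswork, but every mapping operation anywhere in the system pays for an extra entry.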
Now a new technique, as embodied in this
patch by Dave McCracken, has been proposed. This approach, called
"object-based reverse mapping," is based on the realization that, in some
cases at least, there are other paths from a struct page to a
page table entry. If those paths can be used, the full RMAP overhead is
unnecessary and can be cut out.
By one reckoning, there are two basic types of user-mode page in a Linux
system. Anonymous pages are just plain memory, the kind a process
would get from malloc(). Most other pages are file-backed
in some way; this means that, behind the scenes, the contents of that page
are associated with a file somewhere in the system. File-backed pages
include program code and files mapped in with
mmap(). For these pages, it is possible to find their page table
entries without using RMAP entries. To see how, let us refer to the
following low-quality graphic, the result of your editor's nonexistent
drawing skills:
[Diagram not reproduced here: struct page -> address_space -> vm_area_struct -> page table entry]
The struct page structure for a given page is in the upper left
corner. One of the fields of that structure is called mapping; it
points to an address_space structure describing the object which
backs up that page. That structure includes the inode for the file,
various data structures for managing the pages belonging to the file, and
two linked lists (i_mmap and i_mmap_shared) containing
the vm_area_struct structures for each process which has a mapping
into the file. The vm_area_struct (usually called a "VMA")
describes how the mapping appears in a particular process's address space;
the file /proc/pid/maps lists out the VMAs for the process
with ID pid. The VMA provides the information needed to
find out what a given page's virtual address is in that process's address
space, and that, in turn, can be used to find the correct page table
entry.
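The path described above can be sketched as a toy user-space model in C. The toy_* structures are hypothetical stand-ins that keep only the fields needed for the walk; a single list stands in for both i_mmap and i_mmap_shared, and the final step (looking up the page table entry at the computed address in the owning process) is left out. The address arithmetic assumes that each VMA records the file offset, in pages, at which its mapping begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12   /* assumes 4KB pages */

/* Hypothetical stand-in for vm_area_struct. */
struct toy_vma {
    unsigned long vm_start;   /* first virtual address of the mapping */
    unsigned long vm_end;     /* one past the last virtual address */
    unsigned long vm_pgoff;   /* file offset of vm_start, in pages */
    struct toy_vma *next;     /* next VMA mapping the same file */
};

/* Hypothetical stand-in for address_space: just the list of VMAs. */
struct toy_address_space {
    struct toy_vma *i_mmap;
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
    struct toy_address_space *mapping;  /* NULL for an anonymous page */
    unsigned long index;                /* offset within the file, in pages */
};

/* If 'vma' covers the page, store its virtual address there in '*addr'. */
static bool toy_vma_address(const struct toy_page *page,
                            const struct toy_vma *vma, unsigned long *addr)
{
    unsigned long npages = (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;

    if (page->index < vma->vm_pgoff ||
        page->index - vma->vm_pgoff >= npages)
        return false;
    *addr = vma->vm_start +
            ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
    return true;
}

/* Walk page -> address_space -> VMAs, collecting up to 'max' addresses. */
static size_t toy_find_mappings(const struct toy_page *page,
                                unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(page && out);
    if (!page->mapping)
        return 0;   /* no backing object, so no path to follow */
    for (const struct toy_vma *v = page->mapping->i_mmap;
         v && n < max; v = v->next) {
        unsigned long addr;

        if (toy_vma_address(page, v, &addr))
            out[n++] = addr;
    }
    return n;
}
```

Note that the walk visits every VMA attached to the file, including those which do not cover the page in question; nothing is stored per page, so the price is paid at lookup time instead.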
So all the object-based RMAP patch does is remove the direct reverse
mapping entry (pointing from the page structure directly to the
page table entry). When it is necessary to find that entry, the virtual
memory subsystem simply takes the longer way around, via the
address_space and vm_area_struct structures. Finding a
page table entry this way certainly will take longer than following a
direct pointer, but it should come out cheaper when one considers all of
the RMAP information that no longer needs to be maintained.
The object-based RMAP patch does not change the handling of anonymous
pages, which do not have an associated address_space structure.
Martin Bligh has posted some initial
benchmarks showing moderate improvement in the all-important
kernel compilation test. The object-based approach does seem to help with
some of the worst RMAP performance regressions. Andrew Morton pointed out
a worst-case performance scenario for this approach, but it is not clear
how big a problem it would really be.
Andrew has included this patch in his 2.5.62-mm3 tree.
Assuming that this patch goes in (it's late in the development process, but
that hasn't stopped Linus from taking rather more disruptive VM patches
before...), one might wonder if a complete object-based implementation
might follow. The answer is "probably not." Anonymous pages tend to be
private to individual processes, so there is no long chain of reverse
mappings to manage in any case. So even if such pages came to look like
file-backed pages (as could happen, say, with a rework of the swapping
code), there isn't necessarily much to be gained from the object-based
approach.