Fixing up the shared page table patch
[Posted December 31, 2002 by corbet]
One patch that is still apparently being considered for 2.5 is the shared
page table code. Since this patch makes significant changes to the VM
subsystem, it is worth looking at why it is interesting, and what its
prospects are.
Shared page tables do exactly what one would expect: they allow processes to
share their page tables. The primary application of this technique is at
fork() time; when a process creates a new child, the two processes
share the same low-level page tables. These tables are shared in a "copy
on write" mode; when either process changes memory, both the page being
changed and the page table that points to it are copied. The idea is that
if the new process calls exec() before changing much memory, much
of the page table copying overhead can be avoided entirely.
Shared page tables can also save significant amounts of memory when large
processes (or large shared memory segments) are involved, but the
fork() overhead is the real driving force behind this patch. The
2.5 kernel has a significantly slower fork() than 2.4, as a result
of the reverse mapping VM code. Copying page tables requires copying the
reverse map entries, which slows fork() down. Shared page tables,
it is hoped, can eliminate that copy and get fork() back to
something close to its 2.4 performance.
So it was a little disappointing when Andrew Morton ran some benchmarks and discovered that shared
page tables made fork() even slower than it was before. The
optimization, it seems, is really a pessimization - at least when
relatively small processes are involved, which is the case that matters to
most users.
Dave McCracken figured out what is going
on. Most smaller processes, it seems, have three distinct areas of
writable memory: the data area, the stack, and the C library's data
area. On most systems, a single page table page holds enough page table
entries to map 4MB of actual memory. Unless the process is fairly large,
then, there will be exactly one page table page for each of the three
writable areas, or three in all.
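The 4MB figure and the count of three can be checked with a little arithmetic. The sketch below assumes a 32-bit x86 system without PAE (4KB pages, 4-byte page table entries); the addresses are a made-up layout chosen only for illustration, not taken from any real process.

```python
# Assumed: 32-bit x86 without PAE. Other architectures use different sizes.
PAGE_SIZE = 4096                        # bytes per page
PTE_SIZE = 4                            # bytes per page table entry
PTES_PER_PAGE = PAGE_SIZE // PTE_SIZE   # entries held by one page table page
SPAN = PTES_PER_PAGE * PAGE_SIZE        # memory mapped by one page table page


def page_table_pages(areas):
    """Return the indices of the page table pages needed to map the
    given (start, end) address ranges; end is exclusive."""
    used = set()
    for start, end in areas:
        first = start // SPAN
        last = (end - 1) // SPAN
        used.update(range(first, last + 1))
    return used


# Hypothetical writable areas of a small process.
areas = [
    (0x08050000, 0x08060000),  # program data
    (0x40140000, 0x40148000),  # C library data
    (0xBFFFE000, 0xC0000000),  # stack
]

print(SPAN // (1024 * 1024), "MB per page table page")
print(len(page_table_pages(areas)), "page table pages in use")
```

The three areas sit far apart in the address space, so each one lands in a different page table page even though each is much smaller than 4MB.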
The shared page table patch thus allows the deferral of the copying of
three pages' worth of page table entries. As soon as either process changes
the memory mapped by one of those page table pages, that page can no longer
be shared and all page table entries within that page must be copied.
Unfortunately, even a process which does nothing but call exec()
will almost certainly write memory in all three areas, requiring the
unsharing of all three page table pages.
In other words, the shared page table patch is introducing the extra
overhead required to share and unshare page table pages, but, in most
cases, all of those pages will have to be unshared and copied anyway. So
the extra overhead just makes things even slower than they were before.
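The effect can be seen in a toy model of copy-on-write page table sharing. This is only a sketch of the idea described above, not the patch's code; the class names, the reference-counting scheme, and the eight-entry tables are all assumptions made for the illustration.

```python
class PageTablePage:
    """One page table page: a set of entries plus a share count."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.refcount = 1


class Process:
    def __init__(self, tables):
        self.tables = tables        # area name -> PageTablePage
        self.entries_copied = 0     # copying work done so far

    def fork_shared(self):
        """Fork without copying: the child points at the same pages."""
        for pt in self.tables.values():
            pt.refcount += 1
        return Process(dict(self.tables))

    def write(self, area):
        """Writing through a shared page table page forces a private copy."""
        pt = self.tables[area]
        if pt.refcount > 1:
            pt.refcount -= 1
            self.tables[area] = PageTablePage(pt.entries)
            self.entries_copied += len(pt.entries)


AREAS = ("data", "libc", "stack")
parent = Process({a: PageTablePage({i: (a, i) for i in range(8)}) for a in AREAS})
total_entries = sum(len(pt.entries) for pt in parent.tables.values())

child = parent.fork_shared()
for area in AREAS:          # the child touches all three areas before exec()
    child.write(area)

print(child.entries_copied, "of", total_entries, "entries copied anyway")
```

In this model the child ends up copying every entry that an ordinary fork() would have copied up front, so the bookkeeping needed to share and unshare the pages is pure added cost.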
There are a couple of things that can be done to address this problem.
Dave posted a relatively simple fix: do not
share page tables unless the forking process has at least four pages' worth
of them. It turns out that, if even one page table page need not be copied,
the sharing overhead is worthwhile. So, if you turn off sharing in the case
where it doesn't help, you get back to where you were before, and can enjoy
the benefits of page table sharing for very large processes.
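The heuristic amounts to a single comparison at fork() time. A minimal sketch follows; the function and constant names are hypothetical and do not come from the posted patch.

```python
# Threshold described in the article; the name is made up for this sketch.
MIN_PT_PAGES_TO_SHARE = 4


def should_share_page_tables(pt_page_count):
    """Share only when at least one page table page is likely to
    survive unsharing; small processes fall back to a plain copy."""
    return pt_page_count >= MIN_PT_PAGES_TO_SHARE


print(should_share_page_tables(3))   # typical small process
print(should_share_page_tables(40))  # large process
```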
A more involved approach would be to spread out a process's writable memory
so that it is mapped by more than one page table page. Writable process
memory comes in numerous distinct chunks; a look at the
/proc/.../maps entry for the emacs process being used to write
this article shows 33 separate, writable virtual memory areas (VMAs). If
each VMA is mapped on its own 4MB boundary, and thus has its own page table
page, then writing in one VMA does not require copying the page table
entries for all the other VMAs.
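The following sketch shows why spreading helps. It parses text in the /proc/.../maps format, picks out the writable VMAs, and counts how many other VMAs share a page table page with a given one, before and after each VMA is placed on its own 4MB boundary. The sample maps text and the spread() placement policy are invented for illustration, and the 4MB span assumes 32-bit x86 without PAE.

```python
SPAN = 4 * 1024 * 1024   # memory mapped by one page table page (assumed)

# Hypothetical maps excerpt; real files have the same column layout.
SAMPLE = """\
08048000-080c0000 r-xp 00000000 03:01 1234 /usr/bin/example
080c0000-080c8000 rw-p 00078000 03:01 1234 /usr/bin/example
080c8000-08100000 rw-p 00000000 00:00 0
40015000-40016000 rw-p 00014000 03:01 5678 /lib/ld-example.so
bfffe000-c0000000 rw-p 00000000 00:00 0
"""


def writable_vmas(maps_text):
    """Return (start, end) for every mapping whose permissions include 'w'."""
    vmas = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "w" not in fields[1]:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        vmas.append((start, end))
    return vmas


def slots(vma):
    """Indices of the page table pages that map this VMA."""
    start, end = vma
    return set(range(start // SPAN, (end - 1) // SPAN + 1))


def neighbours(vmas, target):
    """Count other VMAs that share a page table page with target."""
    return sum(1 for v in vmas if v != target and slots(v) & slots(target))


def spread(vmas):
    """Place each VMA at the start of its own 4MB-aligned region."""
    out, base = [], 0
    for start, end in vmas:
        size = end - start
        out.append((base, base + size))
        base += -(-size // SPAN) * SPAN   # round the size up to whole spans
    return out


vmas = writable_vmas(SAMPLE)
spaced = spread(vmas)
print(neighbours(vmas, vmas[0]), "neighbour(s) before spreading")
print(neighbours(spaced, spaced[0]), "neighbour(s) after spreading")
```

In the packed layout a write to the first data area also forces a copy of the entries for the adjacent area; once every VMA has a page table page to itself, a write copies only that VMA's own entries.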
Andrew Morton gave this approach a try, and
saw a 5-10% speedup. Performance is improved, in other words, but is still
far short of what a 2.4 kernel can do.
The bottom line appears to be this: the shared page table patch, while
providing some benefits, is failing in its goal of mitigating the extra
fork() overhead brought by the reverse mapping VM. Unless
somebody finds a way to address this problem, shared page tables seem
unlikely to find their way into the 2.5 kernel.