LWN.net

Fixing up the shared page table patch

One patch that is still apparently being considered for 2.5 is the shared page table code. Since this patch makes significant changes to the VM subsystem, it is worth looking at why it is interesting, and what its prospects are.

Shared page tables do exactly what one would expect: they allow processes to share their page tables. The primary application of this technique is at fork() time; when a process creates a new child, the two processes share the same low-level page tables. These tables are shared in a "copy on write" mode: when either process changes memory, both the page being changed and the page table that points to it are copied. The idea is that if the new process calls exec() before changing much memory, much of the page table copying overhead can be avoided entirely.
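
As a rough illustration of the copy-on-write idea, here is a toy Python model, not kernel code. The PtePage class and the share_at_fork() and unshare_on_write() helpers are hypothetical names, and the 1024-entry page assumes 32-bit x86 without PAE. The model tracks only the page table page itself and ignores the copy of the data page that a real write fault also performs.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model, not kernel code: one page table page holds 1024
# entries, as on 32-bit x86 without PAE.
PTES_PER_PAGE = 1024


@dataclass
class PtePage:
    """A simulated page table page plus a count of processes sharing it."""
    entries: List[int] = field(default_factory=lambda: [0] * PTES_PER_PAGE)
    refcount: int = 1


def share_at_fork(page: PtePage) -> PtePage:
    """fork(): the child reuses the parent's page instead of copying it."""
    page.refcount += 1
    return page


def unshare_on_write(page: PtePage) -> PtePage:
    """Write fault: a shared page table page must be copied first.

    Returns the page the writing process should use from now on.
    """
    if page.refcount == 1:
        return page  # already private, nothing to copy
    page.refcount -= 1
    return PtePage(entries=list(page.entries), refcount=1)  # copy every entry


parent = PtePage()
parent.entries[0] = 0x1234
child = share_at_fork(parent)          # fork(): no entries copied yet
assert child is parent and parent.refcount == 2

child = unshare_on_write(child)        # the child writes to memory
assert child is not parent and parent.refcount == 1
assert child.entries[0] == 0x1234      # entries were copied on unshare
```

The point of the model is where the cost lands: fork() only bumps a counter, and the full copy of the entries is deferred until the first write.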

Shared page tables can also save significant amounts of memory when large processes (or large shared memory segments) are involved, but the fork() overhead is the real driving force behind this patch. The 2.5 kernel has a significantly slower fork() than 2.4, as a result of the reverse mapping VM code. Copying page tables requires copying the reverse map entries, which slows fork() down. Shared page tables, it is hoped, can eliminate that copy and get fork() back to something close to its 2.4 performance.

So it was a little disappointing when Andrew Morton ran some benchmarks and discovered that shared page tables made fork() even slower than it was before. The optimization, it seems, is really a pessimization - at least when relatively small processes are involved, which is the case that matters to most users.

Dave McCracken figured out what is going on. Most small processes, it seems, have three distinct areas of writable memory: the data area, the stack, and the C library's data area. On most systems (32-bit x86 without PAE, for example), a single page table page holds enough page table entries to map 4MB of memory. Unless the process is fairly large, then, there will be exactly one page table page for each of the three writable areas, or three in all.
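
The arithmetic behind the 4MB figure, and a way to count how many page table pages a set of writable areas touches, can be sketched as follows. The sizes assume 4KB pages and 4-byte entries (32-bit x86 without PAE); the three address ranges are made-up values for a small process, not measurements from a real system.

```python
# Assumptions: 4KB pages and 4-byte page table entries (32-bit x86, no PAE).
PAGE_SIZE = 4096
PTES_PER_PAGE = PAGE_SIZE // 4                 # 1024 entries per page table page
PTE_PAGE_SPAN = PTES_PER_PAGE * PAGE_SIZE      # 4MB of memory mapped per page


def pte_pages_needed(areas):
    """Count the distinct page table pages covering (start, end) ranges.

    `end` is exclusive. Areas that fall in the same 4MB-aligned slot
    share a single page table page.
    """
    slots = set()
    for start, end in areas:
        first = start // PTE_PAGE_SPAN
        last = (end - 1) // PTE_PAGE_SPAN
        slots.update(range(first, last + 1))
    return len(slots)


# Made-up layout for a small process: data, C library data, stack.
small_process = [
    (0x08049000, 0x0804B000),   # program data
    (0x40140000, 0x40145000),   # C library data
    (0xBFFFE000, 0xC0000000),   # stack
]
print(PTE_PAGE_SPAN // (1024 * 1024), "MB per page table page")  # prints: 4 MB per page table page
print(pte_pages_needed(small_process), "page table pages")        # prints: 3 page table pages
```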

The shared page table patch thus defers the copying of just three pages' worth of page table entries. As soon as either process changes the memory mapped by one of those page table pages, that page can no longer be shared, and all of the page table entries within it must be copied. Unfortunately, even a process which does nothing but call exec() will almost certainly write to memory in all three areas, forcing all three page table pages to be unshared.

In other words, the shared page table patch introduces the extra overhead of sharing and unsharing page table pages, but in most cases all of those pages have to be unshared and copied anyway. The extra overhead just makes things slower than they were before.

There are a couple of things that can be done to address this problem. Dave posted a relatively simple fix: do not share page tables unless the forking process has at least four pages' worth of them. It turns out that if even one page table page need not be copied, the sharing overhead is worthwhile. So, if sharing is turned off in the cases where it doesn't help, performance returns to where it was before, while very large processes can still enjoy the benefits of page table sharing.
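
The threshold in Dave's fix amounts to a simple check at fork() time. The Python sketch below is hypothetical: the function and constant names are invented here and do not come from the patch; only the cutoff of four page table pages comes from the description above.

```python
# Cutoff taken from the description of the fix; the names are hypothetical.
MIN_PTE_PAGES_TO_SHARE = 4


def should_share_page_tables(nr_pte_pages: int) -> bool:
    """Share at fork() only if the parent has at least four page table pages.

    A typical small process has three (data, stack, C library data), and
    all three end up unshared anyway, so sharing would only add overhead.
    """
    return nr_pte_pages >= MIN_PTE_PAGES_TO_SHARE


for nr in (3, 4, 500):
    action = "share" if should_share_page_tables(nr) else "copy"
    print(f"{nr} page table pages: {action}")
```

A cutoff like this keeps the common small-process case on the old copying path while leaving sharing available for the large processes where it can actually save work.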

A more involved approach would be to spread out a process's writable memory so that it is mapped by more than one page table page. Writable process memory comes in numerous distinct chunks; a look at the /proc/.../maps entry for the emacs process being used to write this article shows 33 separate, writable virtual memory areas (VMAs). If each VMA is mapped on its own 4MB boundary, and thus has its own page table page, then writing in one VMA does not require copying the page table entries for all the other VMAs.
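
The alternative layout can be sketched as an address-placement rule: round each writable area's start up to the next 4MB boundary so that it gets a page table page of its own. This is a hypothetical illustration, not the change Andrew Morton benchmarked; it assumes non-PAE x86 (4MB per page table page), positive area sizes, and areas no larger than 4MB. The base address and sizes are made up.

```python
# Assumption: one page table page maps 4MB (32-bit x86 without PAE).
PTE_PAGE_SPAN = 4 * 1024 * 1024


def align_up(addr: int, boundary: int = PTE_PAGE_SPAN) -> int:
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


def place_vmas(base: int, sizes):
    """Hypothetical placement: start every writable area on its own 4MB boundary.

    Returns a list of (start, end) ranges, end exclusive. Sizes must be
    positive; an area larger than 4MB would still need several pages.
    """
    layout = []
    addr = align_up(base)
    for size in sizes:
        layout.append((addr, addr + size))
        addr = align_up(addr + size)
    return layout


layout = place_vmas(0x08049000, [0x2000, 0x5000, 0x2000])
for start, end in layout:
    print(f"{start:#010x}-{end:#010x} -> page table page {start // PTE_PAGE_SPAN}")
```

The trade-off is visible in the layout: every area consumes its own page table page and a 4MB slice of virtual address space, in exchange for limiting each unshare to the area actually written.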

Andrew Morton gave this approach a try, and saw a 5-10% speedup. Performance is improved, in other words, but is still far short of what a 2.4 kernel can do.

The bottom line appears to be this: the shared page table patch, while providing some benefits, is failing in its goal of mitigating the extra fork() overhead brought by the reverse mapping VM. Unless somebody finds a way to address this problem, shared page tables seem unlikely to find their way into the 2.5 kernel.



Emacs is tiny.

Posted Jan 2, 2003 5:47 UTC (Thu) by kbob (guest, #1770)

the emacs process being used to write this article shows 33 separate, writable virtual memory areas
And the mozilla process I'm using to reply shows 267 separate writable areas! Emacs has definitively lost the title of Unix's most bloated app.

It depends on how it does memory management

Posted Jan 2, 2003 15:57 UTC (Thu) by pflugstad (subscriber, #224)

See the section "Buried in VMA's" here:

http://lwn.net/2001/0809/kernel.php3

in particular, read Chris Wedgwood's detailed analysis. It seems
that Mozilla allocates memory in small chunks and GLIBC's handling
of this is rather weird, which leads to lots of these problems
and lots of VMAs.

I think some of the fixes for this have gone into the 2.5.X kernel,
but I don't know about 2.4, so simply switching kernels or
updating your GLIBC may reduce the number of VMAs a lot.

Emacs is tiny.

Posted Jan 3, 2003 23:57 UTC (Fri) by mbligh (subscriber, #7720)

Doesn't really matter ... unless they're large enough to span multiple PTE pages (i.e. 4MB blobs on non-PAE).

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds