Improving shared memory performance
[Posted August 31, 2005 by corbet]
When a process forks, the kernel must copy that process's memory space for
the new child. Linux has long avoided copying the memory itself; anything
which cannot be shared is simply marked "copy on write" and left in place
until one process or the other does something to force a particular page to
be copied. The kernel
does copy the process's page tables,
however. If the parent process has a large address space, that copy can
take a long time.
Recently, Ray Fucillo noted that the amount
of time required to create a new process increased notably with the size of
any shared memory segments that process was using. After some discussion,
Nick Piggin came up with a quick fix: don't
bother copying page tables in cases where the kernel will be able to
reconstruct them at page fault time anyway. This small patch takes away
the fork() penalty for large shared mappings. In many cases, it
will make fork() more efficient in general; if the child process
never uses those parts of its address space (if it simply uses
exec() to run another program, say), the setup and teardown
overhead can be avoided altogether. On the other hand, if the child
process does use those mappings, a higher cost will be paid
overall. Rebuilding page tables one-by-one in response to faults is more
expensive than simply copying them in bulk at fork() time. The
consensus seems to be that the tradeoff is worthwhile, however, and this
patch has been merged for 2.6.14. If any serious performance regressions
result, they will hopefully be found before 2.6.14 is released.
One might well ask, however: why bother copying page tables for shared
mappings at all? Since the mappings are shared, the associated page tables
might as well be too. Sharing page tables would cut down on
fork() overhead, save the memory used to store multiple copies of
the tables, improve translation buffer performance, and reduce the number
of page faults handled by the kernel. To this end, Dave
McCracken has posted a new shared
page table patch. This patch is simpler than previous versions in that
it does not attempt
to perform copy-on-write sharing of private mappings; instead, it restricts
itself to mappings which are, themselves, shared. Since most processes
have a few of these (consider shared libraries, for example), even the
smaller patch can achieve a fair amount of sharing.
For the most part, sharing of page tables is straightforward; the kernel
need only avoid copying them and point a new process's page directories to
the shared tables. The one problem which does come up is reference
counting. When each process has its own page tables, it is easy to know
when those tables are no longer used. When a page table can be used by
more than one process, however, the kernel needs a way to keep track of how
many users each table has. The shared page table patch addresses this by
using the _mapcount field in the page structure
describing the page table page itself.
[Yes, page tables can already be shared by threads which share an entire
address space. In that case, however, the kernel can track usage by
looking at references to the full address space, rather than to individual
portions of it.]
Not everybody is convinced that shared page tables are a good idea. The
added complexity may not be justified by the resulting performance gains.
Dave claims a 3% improvement on an unnamed "industry standard database
benchmark," which is significant. There is also a fundamental conflict between
shared page tables and address space
randomization. For page tables to be shared, the corresponding
mappings must be at the same virtual address in every process, but
randomization explicitly breaks that assumption. Dave apparently has ideas
for making the patch work in the presence of randomization (if the
alignment of the mappings works out), but, for now, the two features are
incompatible.
It has also been asked: do shared page tables still yield a performance
benefit when Nick's deferred page table copying patch is taken into
account? The answer would appear to be "yes." The deferred copying patch
is entirely aimed at shortening the process creation time. Shared page
tables should also help in that regard, but, unlike the copying patch
(which may hurt ongoing performance slightly until the page tables are
populated), shared page tables speed things up throughout the life of the
process. So there may well be room in the kernel for both patches.
(
Log in to post comments)