Improving shared memory performance

[Posted August 31, 2005 by corbet]

When a process forks, the kernel must copy that process's memory space for the new child. Linux has long avoided copying the memory itself; anything which cannot be shared is simply marked "copy on write" and left in place until one process or the other does something to force a particular page to be copied. The kernel does copy the process's page tables, however. If the parent process has a large address space, that copy can take a long time.
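The difference between private (copy-on-write) memory and shared memory across fork() can be observed from user space. What follows is a minimal Python sketch, assuming a Unix-like system; the function name fork_and_modify() is made up for illustration. It shows only the semantics visible to the two processes and says nothing about how the kernel manages the page tables behind them.

```python
import mmap
import os
import struct


def fork_and_modify():
    """Fork, let the child write 42 into both buffers, and return
    (private_value, shared_value) as the parent sees them afterward."""
    private = bytearray(8)                 # ordinary process memory: copy-on-write after fork
    shared = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous mapping; MAP_SHARED by default on Unix

    pid = os.fork()
    if pid == 0:
        # Child: write to both buffers, then exit without running parent code.
        try:
            private[:8] = struct.pack("q", 42)  # forces a private copy of the page
            shared[:8] = struct.pack("q", 42)   # lands in the page both processes map
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    private_value = struct.unpack("q", bytes(private[:8]))[0]
    shared_value = struct.unpack("q", shared[:8])[0]
    shared.close()
    return private_value, shared_value
```

The parent should still see zero in its private buffer, since the child's write went to the child's own copy of the page, while the value written through the shared mapping is visible to both.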

Recently, Ray Fucillo noted that the time required to create a new process grows noticeably with the size of any shared memory segments that process is using. After some discussion, Nick Piggin came up with a quick fix: don't bother copying page tables in cases where the kernel will be able to reconstruct them at page fault time anyway. This small patch takes away the fork() penalty for large shared mappings. In many cases, it will make fork() more efficient in general; if the child process never uses those parts of its address space (if it simply uses exec() to run another program, say), the setup and teardown overhead can be avoided altogether. On the other hand, if the child process does use those mappings, a higher cost will be paid overall, since rebuilding page tables one by one in response to faults is more expensive than simply copying them in bulk at fork() time. The consensus seems to be that the tradeoff is worthwhile, however, and this patch has been merged for 2.6.14. If any serious performance regressions result, they will hopefully be found before 2.6.14 is released.
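The tradeoff can be made concrete with a toy model. The sketch below is not the kernel's code; the Mapping class, the fork_mapping() function, and its defer flag are all invented for illustration. It counts only how many page table entries are copied at fork time and how many faults the child takes afterward, and makes no claim about the real cost of either operation.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Toy model of one shared mapping as seen by one process."""
    backing: dict                                   # page number -> frame; stands in for the shared object
    page_table: dict = field(default_factory=dict)  # entries this process has populated so far
    faults: int = 0                                 # faults taken to populate missing entries

    def access(self, page):
        """Touch a page, "faulting" the entry in from the backing object if absent."""
        if page not in self.page_table:
            self.faults += 1
            self.page_table[page] = self.backing[page]
        return self.page_table[page]


def fork_mapping(parent, defer):
    """Create the child's view of the mapping.

    Returns (child, entries_copied). With defer=False the parent's entries
    are copied in bulk; with defer=True the child starts with an empty
    table and rebuilds entries one fault at a time.
    """
    child = Mapping(backing=parent.backing)
    copied = 0
    if not defer:
        child.page_table = dict(parent.page_table)
        copied = len(child.page_table)
    return child, copied
```

In this model, a parent with 1000 populated pages costs 1000 copied entries at an eager fork and none at a deferred one. A deferred child that goes straight to exec() touches nothing and takes no faults; one that touches every page takes 1000 of them.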

One might well ask, however: why bother copying page tables for shared mappings at all? Since the mappings are shared, the associated page tables might as well be too. Sharing page tables would cut down on fork() overhead, save the memory used to store multiple copies of the tables, improve translation lookaside buffer (TLB) performance, and reduce the number of page faults handled by the kernel. To this end, Dave McCracken has posted a new shared page table patch. This patch is simpler than previous versions in that it does not attempt to perform copy-on-write sharing of private mappings; instead, it restricts itself to mappings which are, themselves, shared. Since most processes have a few of these (consider shared libraries, for example), even the smaller patch can achieve a fair amount of sharing.
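A back-of-the-envelope calculation gives a feel for the memory at stake. The sketch below assumes 4096-byte pages and 8-byte page table entries, and counts only the lowest level of the page table hierarchy; those figures, and the function name page_table_bytes(), are assumptions chosen for illustration, not numbers taken from the patch.

```python
def page_table_bytes(segment_bytes, page_size=4096, pte_size=8):
    """Bytes of lowest-level page table needed to map a segment once.

    page_size and pte_size are illustrative assumptions; upper levels of
    the page table hierarchy are ignored.
    """
    pages = -(-segment_bytes // page_size)  # ceiling division
    return pages * pte_size


GIB = 1 << 30
MIB = 1 << 20

per_process = page_table_bytes(1 * GIB)  # one process mapping a 1 GiB segment
unshared_total = 100 * per_process       # 100 processes, each with its own copy
shared_total = per_process               # 100 processes sharing one copy
```

Under these assumptions, a 1 GiB shared segment needs 2 MiB of page table entries per process: 200 MiB in total for 100 processes with private copies, against 2 MiB if the tables are shared.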

For the most part, sharing of page tables is straightforward; the kernel need only avoid copying them and point a new process's page directories to the shared tables. The one problem which does come up is reference counting. When each process has its own page tables, it is easy to know when those tables are no longer used. When a page table can be used by more than one process, however, the kernel needs a way to keep track of how many users each table has. The shared page table patch addresses this by using the _mapcount field in the page structure describing the page table page itself.
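The reference counting problem can be sketched in a few lines. This is a toy model only: the PageTablePage and Process classes and the attach() and detach() helpers are hypothetical, and the counter starts at zero for simplicity, which need not match the kernel's own conventions for _mapcount.

```python
class PageTablePage:
    """Stands in for the page structure describing one page table page."""

    def __init__(self):
        self.mapcount = 0   # number of page directories pointing at this table
        self.freed = False


class Process:
    """A process reduced to its page directory: slot -> PageTablePage."""

    def __init__(self):
        self.directory = {}


def attach(proc, slot, table):
    """Point a directory slot at a (possibly shared) table and count the new user."""
    proc.directory[slot] = table
    table.mapcount += 1


def detach(proc, slot):
    """Drop one user; the table is freed only when the last user goes away."""
    table = proc.directory.pop(slot)
    table.mapcount -= 1
    if table.mapcount == 0:
        table.freed = True
```

With a parent and child both attached to the same table, detaching the parent leaves the table in place for the child; only the second detach() frees it.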

[Yes, page tables can already be shared by threads which share an entire address space. In that case, however, the kernel can track usage by looking at references to the full address space, rather than to individual portions of it.]

Not everybody is convinced that shared page tables are a good idea. The added complexity may not be justified by the resulting performance gains. Dave claims a 3% improvement on an unnamed "industry standard database benchmark," which is significant. There is also a fundamental conflict between shared page tables and address space randomization. For page tables to be shared, the corresponding mappings must be at the same virtual address in every process, but randomization explicitly breaks that assumption. Dave apparently has ideas for making the patch work in the presence of randomization (if the alignment of the mappings works out), but, for now, the two features are incompatible.
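The address constraint comes down to arithmetic on virtual addresses. The sketch below assumes 4096-byte pages and 512 entries per page table, so that one table covers 2 MiB; those numbers and all of the function names are assumptions for illustration. The second check is one speculative reading of "if the alignment works out" (a mapping shifted by a whole number of table spans lands in the same slots of its page table) and is not a description of what Dave's patch does.

```python
PAGE_SIZE = 4096                         # assumed page size
PTES_PER_TABLE = 512                     # assumed entries per page table
TABLE_SPAN = PAGE_SIZE * PTES_PER_TABLE  # address range covered by one table (2 MiB here)


def pte_index(addr):
    """Slot within its page table that a virtual address falls into."""
    return (addr // PAGE_SIZE) % PTES_PER_TABLE


def same_address(start_a, start_b):
    """The condition stated above: the mapping sits at the same virtual address."""
    return start_a == start_b


def same_table_layout(start_a, start_b):
    """Speculative weaker condition: the two start addresses differ by a
    whole number of table spans, so every page lands in the same slot."""
    return start_a % TABLE_SPAN == start_b % TABLE_SPAN


base = 0x40000000                   # hypothetical mapping address in one process
shifted = base + 3 * TABLE_SPAN     # randomized by a whole number of table spans
misaligned = base + 5 * PAGE_SIZE   # randomized by a few pages only
```

Here base and shifted fail the same-address test but occupy the same slots in their respective tables, while misaligned is offset by five slots and could not reuse a table laid out for base.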

It has also been asked: do shared page tables still yield a performance benefit when Nick's deferred page table copying patch is taken into account? The answer would appear to be "yes." The deferred copying patch is entirely aimed at shortening the process creation time. Shared page tables should also help in that regard, but, unlike the copying patch (which may hurt ongoing performance slightly until the page tables are populated), shared page tables speed things up throughout the life of the process. So there may well be room in the kernel for both patches.
