/dev/ksm: dynamic memory sharing
Copy-on-write with fork() works because the kernel knows that each process expects to find the same contents in those pages. When the kernel lacks that knowledge, though, it will generally be unable to arrange sharing of identical pages. One might not think that this would ordinarily be a problem, but the KVM developers have come up with a couple of situations where this kind of sharing opportunity might come about. Your editor cannot resist this case proposed by Avi Kivity:
Beyond such typical systems, though, consider the case of a host running a
number of virtualized guests. Those guests will not share a process-tree
relationship which makes the sharing of pages between them easy, but they
may well be using a substantial portion of their memory to hold identical
contents. If that host could find a way to force the sharing of pages with
identical contents, it should be able to make much better use of its memory
and, as a result, run more guests.
This is the kind of thing which gets the attention of virtualization
developers. So the hackers at Red Hat (Izik
Eidus, Andrea Arcangeli, and Chris Wright in particular) have put
together a mechanism to make that kind of sharing happen. The resulting
code, called KSM, was recently posted for wider review.
KSM takes the form of a device driver for a single, virtual device: /dev/ksm. A process which wants to take part in the page sharing regime can open that device and register (with an ioctl() call) a portion of its address space with the KSM driver. Once the page sharing mechanism is turned on (via another ioctl()), the kernel will start looking for pages to share.
The algorithm is relatively simple. The KSM driver, inside a kernel thread, picks one of the memory regions registered with it and starts scanning over it. For each page which is resident in memory, KSM will generate an SHA1 hash of the page's contents. That hash will then be used to look up other pages with the same hash value. If a subsequent memcmp() call shows that the contents of the pages are truly identical, all processes with a reference to the scanned page will be pointed (in COW mode) to the other one, and the redundant page will be returned to the system. As long as nobody modifies the page, the sharing can continue; once a write operation happens, the page will be copied and the sharing will end.
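The hash-then-compare step can be illustrated with a small user-space sketch. This is not the KSM code: it works on an in-memory array of page-sized buffers, it substitutes a simple FNV-1a hash for SHA1 to stay self-contained, and the names page_hash(), merge_scan(), and NBUCKETS are invented for the illustration. The point it tries to show is that the hash only nominates candidates; a full memcmp() decides whether two pages may actually be merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NBUCKETS  1024   /* hypothetical table size for this sketch */

/* FNV-1a stands in for SHA1 here; the hash only selects candidates. */
static uint32_t page_hash(const unsigned char *page)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Scan npages pages. On return, canon[i] is the index of the page that
 * page i would share with (i itself if no identical earlier page exists).
 * Returns the number of pages that could be given back to the system.
 */
static size_t merge_scan(unsigned char (*pages)[PAGE_SIZE], size_t npages,
                         size_t *canon)
{
    size_t   slot_page[NBUCKETS];
    uint32_t slot_hash[NBUCKETS];
    unsigned char used[NBUCKETS] = {0};
    size_t freed = 0;

    /* Keep the open-addressed table at most half full. */
    assert(npages <= NBUCKETS / 2);

    for (size_t i = 0; i < npages; i++) {
        uint32_t h = page_hash(pages[i]);
        size_t s = h % NBUCKETS;

        canon[i] = i;
        while (used[s]) {
            /* Same hash is not enough: confirm with a full comparison. */
            if (slot_hash[s] == h &&
                memcmp(pages[slot_page[s]], pages[i], PAGE_SIZE) == 0) {
                canon[i] = slot_page[s];
                freed++;
                break;
            }
            s = (s + 1) % NBUCKETS;   /* linear probing */
        }
        if (!used[s]) {
            /* First page seen with these contents: remember it. */
            used[s] = 1;
            slot_hash[s] = h;
            slot_page[s] = i;
        }
    }
    return freed;
}
```

In the real driver, a merged page would additionally be mapped copy-on-write into every process that referenced the duplicate; the sketch only records which page each one would end up pointing at.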
The kernel thread will scan up to a maximum number of pages before going to sleep for a while. Both the number of pages to scan and the sleep period are passed in as parameters to the ioctl() call which starts scanning. A user-space control process can also pause scanning via another ioctl() call.
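The pacing described above might look something like the following in user space. This is only a sketch under assumptions: struct scan_params and its fields are hypothetical stand-ins for the two values passed to the ioctl() call (the patch's actual structure is not reproduced here), the per-page work is reduced to a counter, and the loop makes a single pass over one region rather than running as a long-lived kernel thread.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <stddef.h>
#include <time.h>

/* Hypothetical names for the two tunables handed to the scanner. */
struct scan_params {
    size_t pages_per_batch;   /* maximum pages to scan before sleeping */
    long   sleep_ms;          /* how long to sleep between batches */
};

/*
 * Walk total_pages pages in batches, sleeping between batches.
 * *visited counts the pages examined; the return value is the number
 * of batches that were needed.
 */
static size_t scan_in_batches(size_t total_pages,
                              const struct scan_params *p,
                              size_t *visited)
{
    size_t next = 0, batches = 0;

    *visited = 0;
    if (p->pages_per_batch == 0)
        return 0;   /* a zero budget would never make progress */

    while (next < total_pages) {
        size_t end = next + p->pages_per_batch;

        if (end > total_pages)
            end = total_pages;
        for (; next < end; next++)
            (*visited)++;   /* placeholder for hashing and comparing */
        batches++;

        if (next < total_pages) {
            /* Budget used up: yield the CPU for the configured period. */
            struct timespec ts = {
                .tv_sec  = p->sleep_ms / 1000,
                .tv_nsec = (p->sleep_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return batches;
}
```

A smaller batch or a longer sleep lowers the CPU cost of scanning at the price of finding duplicate pages more slowly, which is presumably why both values are left to the control process.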
The initial response to the patch from Andrew Morton was not entirely enthusiastic:
The answer from Avi Kivity was reasonably clear:
Izik Eidus adds that, with this patch, a host running a bunch of Windows guests is able to overcommit its memory 300% without terribly ill effects. This technique, it seems, is especially effective with Windows guests: Windows apparently zeroes all freed memory, so each guest's list of free pages can be coalesced down to a single, shared page full of zeroes.
What has not been done (or, at least, not posted) is any sort of
benchmarking of the impact KSM has on a running system. The scanning,
hashing, and comparing of pages will require some CPU time, and it is
likely to have noticeable cache effects as well. If you are trying to run
dozens of Windows guests, cache effects may well be relatively low on your
list of problems. But that cost may be sufficient to prevent the more
general use of KSM, even though systems which are not using virtualization
at all may still have a lot of pages with identical contents.
| Index entries for this article | |
|---|---|
| Kernel | Memory management/Kernel samepage merging |
