Unmapped page cache control
The Linux page cache keeps copies of pages from on-disk files in main memory in the hopes of avoiding I/O operations when those pages are accessed. At any given time, the page cache can easily account for more than half of the system's total memory usage. The actual size of the page cache varies over time; as other types of memory use (kernel memory and anonymous pages) grow, the page cache shrinks to make room. Balancing the demands of the page cache with other memory users can be a challenge, but Linux seems to get it close to right most of the time.
Balbir Singh's patch is intended to give the system administrator a bit more control over page cache usage; to that end, it provides a new boot-time parameter (unmapped_page_control) which sets an upper bound on the number of unmapped pages in the cache. "Unmapped" pages are those which are not mapped into any process's address space; that is, they do not appear in any page tables. Unmapped pages arguably have a lower chance of being needed in the near future; they are also a bit easier for the system to get rid of, since there are no page table entries to tear down first. This patch thus gives the system administrator a relatively easy way to minimize page cache memory usage.
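For a rough feel of how much of the page cache is unmapped on a running system, one can compare the `Cached` and `Mapped` fields of `/proc/meminfo`. The sketch below is only an approximation and is not something the patch itself does: `Cached` also counts shared-memory pages, so the difference overstates the truly unmapped file cache. The helper names `parse_meminfo`, `estimate_unmapped_kb`, and `read_unmapped_kb` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def estimate_unmapped_kb(text):
    """Rough estimate (in kB): page cache size minus file pages mapped by processes."""
    fields = parse_meminfo(text)
    return max(fields["Cached"] - fields["Mapped"], 0)


def read_unmapped_kb(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file on a Linux system."""
    with open(path) as f:
        return estimate_unmapped_kb(f.read())
```

On a Linux machine, `read_unmapped_kb()` returns the estimate in kilobytes; the parsing functions can also be fed saved copies of the file for comparison over time.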
The obvious question is: why? Page cache pages will be reclaimed anyway if the system has other needs for the memory, so there would seem to be little point in shrinking it prematurely. The problem, it seems, is virtualization. When a process on a virtualized system reads a page from a file, the guest operating system will dutifully store a copy of that page in its page cache. The actual read operation, though, will be passed through to (and executed by) the host, which will also store a copy in its page cache. So the page gets cached twice, or perhaps even more times if it is used by multiple virtual machines. Caching a page can be a good thing, but caching multiple copies is likely to be too much of a good thing.
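The arithmetic behind the duplication is simple enough to sketch. The toy model below is an illustration, not a measurement: it assumes that every guest reads the same set of file pages from a shared backing file, that each guest and the host keep one cached copy apiece, and that pages are 4 KB. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; an assumed page size for illustration


def cached_copies(num_guests, shared_pages):
    """Total cached page copies when every guest reads the same file pages.

    Each guest holds its own copy and the host holds one more.
    """
    return shared_pages * (num_guests + 1)


def duplicate_bytes(num_guests, shared_pages):
    """Memory spent on copies beyond the single copy that is strictly needed."""
    extra_copies = cached_copies(num_guests, shared_pages) - shared_pages
    return extra_copies * PAGE_SIZE
```

Under these assumptions, a single guest doubles the memory used to cache a file, and each additional guest reading the same data adds one more full copy.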
So what Balbir's patch is doing can be explained this way: it is forcibly cleaning copies of pages out of guest page caches to minimize duplicate copies. The memory freed in this way can be captured by a balloon driver and returned to the host, making it available for more productive use elsewhere in the system.
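The effect described above can be sketched with a toy model. The code below is not the patch's implementation; the class names `GuestPageCache` and `Balloon`, the oldest-first eviction order, and the `max_unmapped` limit are all assumptions made for illustration.

```python
from collections import OrderedDict


class GuestPageCache:
    """Toy guest page cache: maps page id -> True if the page is mapped."""

    def __init__(self, max_unmapped):
        self.max_unmapped = max_unmapped  # cap on unmapped pages (assumed knob)
        self.pages = OrderedDict()        # oldest entries come first

    def add(self, page_id, mapped=False):
        """Insert or refresh a page, making it the most recently used entry."""
        self.pages[page_id] = mapped
        self.pages.move_to_end(page_id)

    def trim_unmapped(self):
        """Drop the oldest unmapped pages until the cap is met; return freed ids."""
        unmapped = [p for p, mapped in self.pages.items() if not mapped]
        excess = len(unmapped) - self.max_unmapped
        freed = unmapped[:excess] if excess > 0 else []
        for page_id in freed:
            del self.pages[page_id]
        return freed


class Balloon:
    """Toy balloon driver: counts pages handed back to the host."""

    def __init__(self):
        self.returned_to_host = 0

    def inflate(self, npages):
        self.returned_to_host += npages


def reclaim_for_host(cache, balloon):
    """Trim the guest cache and pass the freed pages to the balloon."""
    freed = cache.trim_unmapped()
    balloon.inflate(len(freed))
    return freed
```

Mapped pages are never touched in this model; only unmapped pages beyond the cap are dropped, and the count of freed pages is what the balloon would return to the host.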
This technique should clearly improve the situation. Less duplication is good, and, if the guest ends up needing some of the freed pages, those pages stand a good chance of being found in the host's page cache. But one can't help but wonder if it might not be an overly indirect approach. Rather than forcibly reclaim pages from the guests' caches, might it be better to have all of the systems share the same page cache? A single, unified page cache could be managed to maximize the performance of the system as a whole; that should yield better results than managing a number of seemingly independent page caches.
Virtualization based on containers has exactly this type of unified page cache since all of the containers are running on the same kernel. That may be one of the reasons why containers are seen to perform better than fully virtualized systems. Bringing a shared page cache to the virtualized world would be a bit of a challenge, though, which probably has a lot to do with why it has not already been done.
To begin with, there would be some clear security issues. A virtualized system should be unable to access any resources which have not been explicitly provided to it. Any sort of shared page cache would have to be designed in a way which would leave the host in control of which pages are visible to each guest. In practice, that would probably mean using the virtualized block drivers which make filesystems available to virtualized guests now. Rather than "read" a page into a page controlled by the guest, the driver could somehow just map the host's copy of the page into the guest's address space.
Making that work correctly would require the addition of a significant new, Linux-only API between the host and the guest. It would be hard to do it in a way which maintained any sort of illusion that the guest is running on hardware of its own. Such a scheme would complicate memory management in the guest - hardware is increasingly dynamic, but individual pages of memory still do not come and go spontaneously. A shared page cache would also frustrate attempts to use huge pages for guest memory.
In other words, the difficulties of sharing the page cache between hosts and guests look to be decidedly nontrivial. It is not surprising that we are still living in a world where scarce memory pages can be soaked up with duplicate copies of data. As long as that situation holds, there will be a place for patches which cause guests to behave in ways which are more friendly to the system as a whole.
Posted Dec 16, 2010 14:51 UTC (Thu)
by Halmonster (guest, #4537)
Posted Dec 16, 2010 15:00 UTC (Thu)
by corbet (editor, #1)
As noted right above the comment box, we do prefer to receive these reports as email.
Posted Dec 19, 2010 4:47 UTC (Sun)
by quotemstr (subscriber, #45331)
Posted Dec 19, 2010 6:42 UTC (Sun)
by Fowl (subscriber, #65667)
What am I missing?
Posted Dec 19, 2010 7:07 UTC (Sun)
by quotemstr (subscriber, #45331)
Posted Dec 20, 2010 11:39 UTC (Mon)
by joern (guest, #22392)
It does - if only a single guest is caching the pages in question.
Posted Dec 20, 2010 14:32 UTC (Mon)
by rilder (guest, #59804)
Posted Dec 20, 2010 14:39 UTC (Mon)
by quotemstr (subscriber, #45331)
Posted Jan 3, 2011 5:17 UTC (Mon)
by balbir_singh (guest, #34142)
Posted Dec 16, 2010 21:45 UTC (Thu)
by alonz (subscriber, #815)
(Except that KSM solves the problem much more directly, no?...)
Posted Dec 17, 2010 11:00 UTC (Fri)
by balbir_singh (guest, #34142)
Posted Jan 4, 2011 21:38 UTC (Tue)
by Russ.Dill@gmail.com (guest, #52805)
Typo
Given that the name was correctly spelled two times out of three, one assumes we know the correct spelling. The mistake has been fixed.
Typo
Typo
Obvious Answer?
Obvious Answer?
Typo
Interesting
Interesting
Interesting
Isn't this doing (almost) the same as KSM?
KSM
KSM
Unmapped page cache control