
Unmapped page cache control

By Jonathan Corbet
December 13, 2010
Virtualization places some interesting demands on the host system, many of which are related to memory management. When two agents within the system both believe that they are in charge of memory, interesting conflicts are bound to arise. A recent patch from Balbir Singh shows the kind of effort which is being made to address these conflicts, but it also gives a hint at how a more ambitious effort might really solve the problem.

The Linux page cache keeps copies of pages from on-disk files in main memory in the hopes of avoiding I/O operations when those pages are accessed. At any given time, the page cache can easily account for more than half of the system's total memory usage. The actual size of the page cache varies over time; as other types of memory use (kernel memory and anonymous pages) grow, the page cache shrinks to make room. Balancing the demands of the page cache with other memory users can be a challenge, but Linux seems to get it close to right most of the time.

Balbir's patch is intended to give the system administrator a bit more control over page cache usage; to that end, it provides a new boot-time parameter (unmapped_page_control) which sets an upper bound on the number of unmapped pages in the cache. "Unmapped" pages are those which are not mapped into any process's address space - they do not appear in any page tables. Unmapped pages arguably have a lower chance of being needed in the near future; they are also a bit easier for the system to get rid of. This patch thus gives the system administrator a relatively easy way to minimize page cache memory usage.

The obvious question is: why? Page cache pages will be reclaimed anyway if the system has other needs for the memory, so there would seem to be little point in shrinking it prematurely. The problem, it seems, is virtualization. When a process on a virtualized system reads a page from a file, the guest operating system will dutifully store a copy of that page in its page cache. The actual read operation, though, will be passed through to (and executed by) the host, which will also store a copy in its page cache. So the page gets cached twice - perhaps even more times if it is used by multiple virtual machines. Caching a page can be a good thing, but caching multiple copies is likely to be too much of a good thing.

So what Balbir's patch is doing can be explained this way: it is forcibly cleaning copies of pages out of guest page caches to minimize duplicate copies. The memory freed in this way can be captured by a balloon driver and returned to the host, making it available for more productive use elsewhere in the system.

This technique should clearly improve the situation. Less duplication is good, and, if the guest ends up needing some of the freed pages, those pages stand a good chance of being found in the host's page cache. But one can't help but wonder if it might not be an overly indirect approach. Rather than forcibly reclaim pages from the guests' caches, might it be better to have all of the systems share the same page cache? A single, unified page cache could be managed to maximize the performance of the system as a whole; that should yield better results than managing a number of seemingly independent page caches.

Virtualization based on containers has exactly this type of unified page cache since all of the containers are running on the same kernel. That may be one of the reasons why containers are seen to perform better than fully virtualized systems. Bringing a shared page cache to the virtualized world would be a bit of a challenge, though, which probably has a lot to do with why it has not already been done.

To begin with, there would be some clear security issues. A virtualized system should be unable to access any resources which have not been explicitly provided to it. Any sort of shared page cache would have to be designed in a way which would leave the host in control of which pages are visible to each guest. In practice, that would probably mean using the virtualized block drivers which make filesystems available to virtualized guests now. Rather than "read" a page into a page controlled by the guest, the driver could somehow just map the host's copy of the page into the guest's address space.

Making that work correctly would require the addition of a significant new, Linux-only API between the host and the guest. It would be hard to do it in a way which maintained any sort of illusion that the guest is running on hardware of its own. Such a scheme would complicate memory management in the guest - hardware is increasingly dynamic, but individual pages of memory still do not come and go spontaneously. A shared page cache would also frustrate attempts to use huge pages for guest memory.

In other words, the difficulties of sharing the page cache between hosts and guests look to be decidedly nontrivial. It is not surprising that we are still living in a world where scarce memory pages can be soaked up with duplicate copies of data. As long as that situation holds, there will be a place for patches which cause guests to behave in ways which are more friendly to the system as a whole.

Index entries for this article
Kernel: Memory management/Virtualization
Kernel: Virtualization



Typo

Posted Dec 16, 2010 14:51 UTC (Thu) by Halmonster (guest, #4537) [Link] (8 responses)

Typo in the article. The patch author's name is Balbir, not Bilbir.

Typo

Posted Dec 16, 2010 15:00 UTC (Thu) by corbet (editor, #1) [Link]

Given that the name was correctly spelled two times out of three, one assumes we know the correct spelling. The mistake has been fixed.

As noted right above the comment box, we do prefer to receive these reports as email.

Typo

Posted Dec 19, 2010 4:47 UTC (Sun) by quotemstr (subscriber, #45331) [Link] (6 responses)

Doesn't having the host use uncached IO to service virtualized guest IO eliminate the duplicate caching?

Obvious Answer?

Posted Dec 19, 2010 6:42 UTC (Sun) by Fowl (subscriber, #65667) [Link] (1 responses)

Or just disabling the cache on the guests, thereby allowing them all to share the host's?

What am I missing?

Obvious Answer?

Posted Dec 19, 2010 7:07 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

It's easier and less invasive to configure unbuffered IO for a particular program's access to a particular file (a virtual disk in this case) than to disable buffering for a whole operating system, which might actually be impossible.

Typo

Posted Dec 20, 2010 11:39 UTC (Mon) by joern (guest, #22392) [Link]

> Doesn't having the host use uncached IO to service virtualized guest IO eliminate the duplicate caching?

It does - if only a single guest is caching the pages in question.

Interesting

Posted Dec 20, 2010 14:32 UTC (Mon) by rilder (guest, #59804) [Link] (2 responses)

I don't think it is possible to differentiate requests from the host and the guest, so that the guest is served uncached and the host cached. If that is possible, then it is great.

Interesting

Posted Dec 20, 2010 14:39 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (1 responses)

It's not only possible, but trivial: IO from guests goes through the hypervisor, which can use simple uncached file IO (e.g., O_DIRECT) to service these requests. Host IO goes through the normal path and uses normal buffering.

Interesting

Posted Jan 3, 2011 5:17 UTC (Mon) by balbir_singh (guest, #34142) [Link]

Yes, it is possible, but there have been reports of large overheads in doing so. Please see the email from Chris at http://www.mail-archive.com/kvm@vger.kernel.org/msg30821.... for throughput issues.

KSM

Posted Dec 16, 2010 21:45 UTC (Thu) by alonz (subscriber, #815) [Link] (1 responses)

Isn't this doing (almost) the same as KSM?

(Except that KSM solves the problem much more directly, no?...)

KSM

Posted Dec 17, 2010 11:00 UTC (Fri) by balbir_singh (guest, #34142) [Link]

KSM today deduplicates pages between guests, not between guest and host. One of the future TODOs is to look at KSM as well, but KSM currently deals only with anonymous pages; deduplicating between the guest's page cache (which the host sees as anonymous memory) and the host's own page cache is not a trivial task.

Unmapped page cache control

Posted Jan 4, 2011 21:38 UTC (Tue) by Russ.Dill@gmail.com (guest, #52805) [Link]

It seems like this could be much easier, at least for the read-only case, and could work similarly to the page-merging code. If the IO driver does DMA, then rather than actually copying data from the virtual hardware into the guest's page, the host's page containing that data and the guest's page that should contain it could be merged. If either side writes to the page, it splits.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds