Memory management for virtualization

By Jonathan Corbet
April 7, 2010

For some time now, your editor has asserted that, at the kernel level, the virtualization problem is mostly solved. Much of the remaining work is in the performance area. That said, making virtualized systems perform well is not a small or trivial problem. One of the most interesting aspects of this problem is in the interaction between virtualized guests and host memory management. A couple of patch sets under discussion illustrate where the work in this area is being done.

The transparent huge pages patch set was discussed here back in October. This patch seeks to change how huge pages are used by Linux applications. Most current huge page users must be set up explicitly to use huge pages, which, in turn, must be set aside by the system administrator ahead of time; see the recent series by Mel Gorman for more information on how this is done. The "some assembly required" nature of huge pages limits their use in many situations.

The transparent huge page patch, instead, works to provide huge pages to applications without those applications even being aware that such pages exist. When large pages are available, applications may have their scattered pages joined together into huge pages automatically; those pages can also be split back apart when the need arises. When the system operates in this mode, huge pages can be used in many more situations without the need for application or administrator awareness. This feature turns out to be especially beneficial when running virtualized guests; huge pages map well to how guests tend to see and use their address spaces.

The transparent huge page patches have been working their way toward acceptance, though it should be noted that some developers still have complaints about this work. Andrew Morton recently pointed out a different problem with this patch set:

It appears that these patches have only been sent to linux-mm. Linus doesn't read linux-mm and has never seen them. I do think we should get things squared away with him regarding the overall intent and implementation approach before trying to go further... [T]his is a *large* patchset, and it plays in an area where Linus is known to have, err, opinions.

It didn't take long for Linus to join the conversation directly; after a couple of digressions into areas not directly related to the benefits of the transparent huge pages patch, he realized that this work was motivated by the needs of virtualization. At that point, he lost interest:

So I thought it was a more interesting load than it was. The virtualization "TLB miss is expensive" load I can't find it in myself to care about. "Get a better CPU" is my answer to that one.

He went on to compare the transparent huge page work to high memory, which, in turn, he called "a failure". The right solution in both cases, he says, is to get a better CPU.

It should be pointed out that high memory was a spectacularly successful failure, extending the useful life of 32-bit systems for some years. It still shows up in surprising places - you editor's phone is running a high-memory-enabled kernel. So calling high memory a failure is something like calling the floppy driver a failure; it may see little use now, but there was a time when we were glad we had it.

Perhaps, someday, advances in processor architecture will make transparent huge pages unnecessary as well. But, while the alternative to high memory (64-bit processors) has been in view for a long time, it's not at all clear what sort of processor advance might make transparent huge pages irrelevant. So, should this code get into the kernel, it may well become one of those failures which is heavily used for many years.

A related topic under discussion was the recently-posted VMware balloon driver patch. A balloon driver has an interesting task; its job is to "inflate" within a guest system, taking up memory and making it unavailable for processes running within the guest. The pages absorbed by the balloon can then be released back to the host system which, presumably, has a more pressing need for them elsewhere. Letting "air" out of the balloon makes memory available to the guest once again.

The purpose of this driver, clearly, is to allow the host to dynamically balance the memory needs of its guest systems. It's a bit of a blunt instrument, but it's the best we have. But Andrew Morton questioned the need for a separate memory control mechanism. The kernel already has a function, called shrink_all_memory(), which can be used to force the release of memory. This function is currently used for hibernation, but Andrew suspects that it could be adapted to the needs of virtualization as well.

Whether that is really true remains to be seen; it seems that the bulk of the complexity lies not with the freeing of memory but in the communication between the guest and the hypervisor. Beyond that, the longer-term solution is likely to be something more sophisticated than simply applying memory pressure and watching the guest squirm until it releases enough pages. As Dan Magenheimer put it:

Historically, all OS's had a (relatively) fixed amount of memory and, since it was fixed in size, there was no sense wasting any of it. In a virtualized world, OS's should be trained to be much more flexible as one virtual machine's "waste" could/should be another virtual machine's "want".

His answer to this problem is the transcendent memory patch, which allows the operating system to designate memory which is available for the taking should the need arise, but which can contain useful data in the mean time.

This is clearly an area that needs further work. The whole point of virtualization is to isolate guests from each other, but a more cooperative approach to memory requires that these guests, somehow, be aware of the level of contention for resources like memory and respond accordingly. Like high memory and transparent huge pages, balloon drivers may eventually be consigned to the pile of failed technologies. Until something better comes along, though, we'll still need them.

Index entries for this article
Kernel	Huge pages
Kernel	Memory management/Virtualization
Kernel	Virtualization

Memory management for virtualization

Posted Apr 8, 2010 2:42 UTC (Thu) by Thalience (subscriber, #4217) [Link]

Indeed, CPU vendors seem decidedly uninterested in telling you how many TLB entries a particular CPU (and/or northbridge) sports.

Considering how important this can be for some workloads, this has always puzzled me. After all, they'll cheerfully tout the amount of L2 cache and if you care to look around, you can find details such as the number of ALUs, FPUs and vector units.

Memory management for virtualization

Posted Apr 8, 2010 8:07 UTC (Thu) by liljencrantz (guest, #28458) [Link] (2 responses)

> That said, making virtualized systems perform well is not a small or nontrivial problem.
«is not a ... nontrivial problem» is a double negation and is unlikely to be what was intended. This is, of course, nit picking, bit I enjoy the editors writing style and feel that it deserves correct editing.

Memory management for virtualization

Posted Apr 8, 2010 13:32 UTC (Thu) by clugstj (subscriber, #4020) [Link] (1 responses)

Since we are picking nits, it is a bit confusingly worded. I would suggest that "not a small nor trivial problem". To me, at least, the "nor" assures that the "not" applies to both "small" and "trivial".

The author could say, however, that the use of "or" implies that the "not" does not apply to "nontrivial" and that the original is correct.

But, IANAET (I am not an English teacher), so what do I know.

Memory management for virtualization

Posted Apr 8, 2010 16:46 UTC (Thu) by mebrown (subscriber, #7960) [Link]

"neither a small nor trivial problem" would be the correct construction.

Memory management for virtualization

Posted Apr 8, 2010 18:49 UTC (Thu) by anton (subscriber, #25547) [Link] (3 responses)

My guess is that the "better CPU" features that Linus refers to is stuff like the AMD K10's nested paging ("Rapid Virtualization Indexing"), and maybe also having a large enough (L2) TLB (although the latter does not just help virtualization).

Larger pages are also helpful for non-virtualized applications that perform lots of memory accesses with low spatial locality with a large-enough footprint. An extreme case would be walking through memory with a 4160 byte stride: Every step would consume a TLB entry and a cache line entry; Once you have run out of TLB entries (on AMD K10: 48 L1 TLB entries, 512 L2 TLB entries, 1024 L1 cache lines, 8192 L2 cache lines), you can start over, and you will have a workload that hits the cache and misses the TLB all the time.

Memory management for virtualization

Posted Apr 8, 2010 21:31 UTC (Thu) by avik (guest, #704) [Link] (2 responses)

Nested paging actually makes things worse: the cache footprint of the page table doubles, and a tlb fill needs 24 memory accesses instead of 4. Large pages are a way to reduce the cache footprint and tlb fill cost to something tolerable.

Memory management for virtualization

Posted Apr 9, 2010 10:36 UTC (Fri) by anton (subscriber, #25547) [Link] (1 responses)

Nested paging makes things worse than what? If it's worse than whatever they do in virtualization without them, why were they introduced?

Memory management for virtualization

Posted Apr 9, 2010 10:43 UTC (Fri) by avik (guest, #704) [Link]

Nested paging has worse cache footprint and tlb fill latency than shadow paging. However shadow paging is a lot worse in terms of scalability and mmu operation costs, so nested paging is an overall win (by a large margin).

So, nested paging is overall much better than shadow paging (but worse in some aspects), large pages bridge the gap and make nested paging better overall.

still rockin' highmem today

Posted Apr 9, 2010 2:19 UTC (Fri) by vapier (guest, #15768) [Link]

i'm still happy with the highmem "failure" on servers that have no need to be replaced since the current performance is more than acceptable

Huge pages and KSM

Posted Apr 11, 2010 11:17 UTC (Sun) by trekker.dk (guest, #65149) [Link]

Are there any tests when transparent huge pages run together with KSM? I mean, my guess is that you can find less identical 2MB pages and thus save less memory compared to 4kB pages.

Memory management for virtualization

Posted Apr 15, 2010 1:11 UTC (Thu) by dcoutts (guest, #5387) [Link]

Isn't one reason we don't have transparent support for bigger pages that huge pages are just too huge? It's 4mb on x86 iirc. Why don't they support something like 64k, 512k as well as the really large ones? Surely that would make it much easier to use bigger pages transparently. Apps probably still need to give hints to be able to use the really really big pages.

As I understand it, POWER cpus do have a range of sensible sizes as well as the massive pages. AIX (I think) support the smaller ones transparently and the big ones explicitly. Seems quite reasonable.

Memory management for virtualization

Posted Apr 15, 2010 19:59 UTC (Thu) by rilder (guest, #59804) [Link]

FreeBSD(7.2 onwards) has support for large-pages in the name of super pages. Promotion of normal pages to large without awareness of applications in FreeBSD.

Memory management for virtualization

Posted Apr 18, 2010 11:48 UTC (Sun) by dafu (guest, #42913) [Link]

Linus raised a general question: do we need a software optimization in kernel if there is a (immature) upgrading in hardware?

In my opinion, YES!