Memory management for virtualization
The transparent huge pages patch set was discussed here back in October. This work seeks to change how huge pages are used by Linux applications. Most current huge page users must be set up explicitly to use huge pages, which, in turn, must be set aside by the system administrator ahead of time; see the recent series by Mel Gorman for more information on how this is done. The "some assembly required" nature of huge pages limits their use in many situations.
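For reference, the explicit path looks roughly like this: the administrator reserves a pool of huge pages ahead of time (echo 64 > /proc/sys/vm/nr_hugepages, say), and the application must then ask for them by name. What follows is a minimal sketch, assuming an x86 system and the MAP_HUGETLB flag added in 2.6.32; a hugetlbfs mount is the other common route.

    /* Sketch of the "explicit" huge page path: the administrator must first
     * reserve pages (e.g. echo 64 > /proc/sys/vm/nr_hugepages), then the
     * application has to request huge-page backing explicitly. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MAP_HUGETLB
    #define MAP_HUGETLB 0x40000   /* x86 value; older libc headers lack it */
    #endif

    #define LENGTH (16UL * 1024 * 1024)  /* must be a multiple of the huge page size */

    int main(void)
    {
        void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");       /* fails if no huge pages were reserved */
            return 1;
        }
        /* ... use the huge-page-backed region ... */
        munmap(addr, LENGTH);
        return 0;
    }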
The transparent huge page patch, instead, works to provide huge pages to applications without those applications even being aware that such pages exist. When large pages are available, applications may have their scattered pages joined together into huge pages automatically; those pages can also be split back apart when the need arises. When the system operates in this mode, huge pages can be used in many more situations without the need for application or administrator awareness. This feature turns out to be especially beneficial when running virtualized guests; huge pages map well to how guests tend to see and use their address spaces.
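Under the transparent scheme, an application need not change at all; at most it can hint that a region is a good candidate for huge pages. Here is a sketch of what such a hint might look like, assuming the MADV_HUGEPAGE advice flag carried in the patch set; on kernels without this work the madvise() call simply fails and can be ignored.

    /* Hint (not require) that a mapping be backed by transparent huge pages. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MADV_HUGEPAGE
    #define MADV_HUGEPAGE 14      /* not yet in most system headers; value from the patches */
    #endif

    #define LENGTH (16UL * 1024 * 1024)

    int main(void)
    {
        void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED)
            return 1;
        if (madvise(addr, LENGTH, MADV_HUGEPAGE) != 0)
            perror("madvise");    /* no THP support: the region still works, just with 4K pages */
        /* The kernel may now assemble this region into huge pages behind the
         * application's back, and split them apart again if needed. */
        munmap(addr, LENGTH);
        return 0;
    }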
The transparent huge page patches have been working their way toward acceptance, though it should be noted that some developers still have complaints about this work. Andrew Morton recently pointed out a different problem with this patch set:
It didn't take long for Linus to join the conversation directly; after a couple of digressions into areas not directly related to the benefits of the transparent huge pages patch, he realized that this work was motivated by the needs of virtualization. At that point, he lost interest:
He went on to compare the transparent huge page work to high memory, which, in turn, he called "a failure". The right solution in both cases, he says, is to get a better CPU.
It should be pointed out that high memory was a spectacularly successful failure, extending the useful life of 32-bit systems for some years. It still shows up in surprising places - your editor's phone is running a high-memory-enabled kernel. So calling high memory a failure is something like calling the floppy driver a failure; it may see little use now, but there was a time when we were glad we had it.
Perhaps, someday, advances in processor architecture will make transparent huge pages unnecessary as well. But, while the alternative to high memory (64-bit processors) has been in view for a long time, it's not at all clear what sort of processor advance might make transparent huge pages irrelevant. So, should this code get into the kernel, it may well become one of those failures which is heavily used for many years.
A related topic under discussion was the recently-posted VMware balloon driver patch. A balloon driver has an interesting task; its job is to "inflate" within a guest system, taking up memory and making it unavailable for processes running within the guest. The pages absorbed by the balloon can then be released back to the host system which, presumably, has a more pressing need for them elsewhere. Letting "air" out of the balloon makes memory available to the guest once again.
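In outline, the guest side of a balloon driver is little more than a loop that allocates pages when asked to inflate and frees them when asked to deflate, telling the hypervisor about each page so the memory can be put to use elsewhere. A much-simplified sketch of that logic appears below; the vmballoon_tell_host_*() calls are placeholders for the real guest-to-hypervisor channel, which is specific to each hypervisor and is not shown here.

    /* Much-simplified sketch of the guest side of a balloon driver; the
     * vmballoon_tell_host_*() functions are placeholders, not the real
     * driver's interface. */
    #include <linux/gfp.h>
    #include <linux/list.h>
    #include <linux/mm.h>

    static LIST_HEAD(balloon_pages);    /* pages currently held by the balloon */
    static unsigned int balloon_size;   /* in pages */

    static int vmballoon_tell_host_locked(struct page *page)    { return 0; }
    static void vmballoon_tell_host_released(struct page *page) { }

    /* Inflate: take pages away from the guest and hand them to the host. */
    static void balloon_inflate(unsigned int target)
    {
        while (balloon_size < target) {
            struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NOWARN);

            if (!page)
                break;              /* guest is under pressure; try again later */
            if (vmballoon_tell_host_locked(page)) {
                __free_page(page);
                break;
            }
            list_add(&page->lru, &balloon_pages);
            balloon_size++;
        }
    }

    /* Deflate: give pages back to the guest's own allocator. */
    static void balloon_deflate(unsigned int target)
    {
        struct page *page, *next;

        list_for_each_entry_safe(page, next, &balloon_pages, lru) {
            if (balloon_size <= target)
                break;
            list_del(&page->lru);
            vmballoon_tell_host_released(page);
            __free_page(page);
            balloon_size--;
        }
    }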
The purpose of this driver, clearly, is to allow the host to dynamically balance the memory needs of its guest systems. It's a bit of a blunt instrument, but it's the best we have. Andrew Morton, however, questioned the need for a separate memory control mechanism. The kernel already has a function, called shrink_all_memory(), which can be used to force the release of memory. This function is currently used for hibernation, but Andrew suspects that it could be adapted to the needs of virtualization as well.
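shrink_all_memory() takes the number of pages the caller would like reclaimed and returns the number it actually managed to free. An adaptation along the lines Andrew suggests might have the balloon driver reclaim memory before inflating, so that its allocations come out of freed page cache rather than out of running tasks. The helper below is purely hypothetical; the function is currently declared only for the hibernation code and is not exported to modules, so this sketch assumes core-kernel changes.

    /*
     * Hypothetical sketch only: shrink_all_memory() lives in <linux/swap.h>
     * for the benefit of hibernation and is not exported to modules, so
     * wiring it into a balloon driver would need core-kernel changes too.
     */
    #include <linux/kernel.h>
    #include <linux/swap.h>

    /* Before growing the balloon by nr_pages, ask the VM to reclaim about
     * that many pages so the balloon's allocations come from freed page
     * cache rather than from squeezing running tasks. */
    static void balloon_make_room(unsigned long nr_pages)
    {
        unsigned long reclaimed = shrink_all_memory(nr_pages);

        if (reclaimed < nr_pages)
            pr_debug("balloon: wanted %lu pages, reclaimed %lu\n",
                     nr_pages, reclaimed);
    }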
Whether that is really true remains to be seen; it seems that the bulk of the complexity lies not with the freeing of memory but in the communication between the guest and the hypervisor. Beyond that, the longer-term solution is likely to be something more sophisticated than simply applying memory pressure and watching the guest squirm until it releases enough pages. As Dan Magenheimer put it:
His answer to this problem is the transcendent memory patch, which allows the operating system to designate memory which is available for the taking should the need arise, but which can contain useful data in the meantime.
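The key property is that a "put" into transcendent memory may be refused, and a later "get" may find the data gone, so a guest should only offer data it can regenerate - a clean page-cache page, for example, can always be reread from disk. The toy program below models that contract; the tmem_put()/tmem_get() names and signatures are illustrative stand-ins, not the interface actually defined by the patch.

    /* Toy model of the transcendent-memory idea: the guest may "put" data it
     * can afford to lose into a pool the hypervisor owns, and must cope with
     * a later "get" finding nothing there. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Stand-in for the hypervisor side: it may accept, refuse, or silently
     * discard anything put here.  This toy version just refuses everything. */
    static bool tmem_put(unsigned long index, const char *page)
    {
        (void)index; (void)page;
        return false;
    }

    static bool tmem_get(unsigned long index, char *page)
    {
        (void)index; (void)page;
        return false;
    }

    static void read_from_disk(unsigned long index, char *page)
    {
        memset(page, 0, PAGE_SIZE);   /* pretend we did real I/O */
        printf("index %lu: reread from disk\n", index);
    }

    int main(void)
    {
        char page[PAGE_SIZE] = "clean page-cache data";

        /* Eviction of a clean page: offer it to tmem, then drop our copy.
         * Success is optional; the data can always be regenerated. */
        tmem_put(42, page);

        /* Later miss: try tmem first, fall back to disk if it is gone. */
        if (!tmem_get(42, page))
            read_from_disk(42, page);
        return 0;
    }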
This is clearly an area that needs further work. The whole point of virtualization is to isolate guests from each other, but a more cooperative approach to memory requires that these guests, somehow, be aware of the level of contention for resources like memory and respond accordingly. Like high memory and transparent huge pages, balloon drivers may eventually be consigned to the pile of failed technologies. Until something better comes along, though, we'll still need them.
Index entries for this article
Kernel: Huge pages
Kernel: Memory management/Virtualization
Kernel: Virtualization
Memory management for virtualization
Posted Apr 8, 2010 2:42 UTC (Thu) by Thalience (subscriber, #4217) [Link]
Considering how important this can be for some workloads, this has always puzzled me. After all, they'll cheerfully tout the amount of L2 cache and if you care to look around, you can find details such as the number of ALUs, FPUs and vector units.
Memory management for virtualization
Posted Apr 8, 2010 8:07 UTC (Thu) by liljencrantz (guest, #28458) [Link] (2 responses)
> That said, making virtualized systems perform well is not a small or nontrivial problem.
«is not a ... nontrivial problem» is a double negation and is unlikely to be what was intended. This is, of course, nit picking, but I enjoy the editor's writing style and feel that it deserves correct editing.
Memory management for virtualization
Posted Apr 8, 2010 13:32 UTC (Thu) by clugstj (subscriber, #4020) [Link] (1 responses)
The author could say, however, that the use of "or" implies that the "not" does not apply to "nontrivial" and that the original is correct.
But, IANAET (I am not an English teacher), so what do I know.
Memory management for virtualization
Posted Apr 8, 2010 16:46 UTC (Thu) by mebrown (subscriber, #7960) [Link]
Memory management for virtualization
Posted Apr 8, 2010 18:49 UTC (Thu) by anton (subscriber, #25547) [Link] (3 responses)
My guess is that the "better CPU" features that Linus refers to is stuff like the AMD K10's nested paging ("Rapid Virtualization Indexing"), and maybe also having a large enough (L2) TLB (although the latter does not just help virtualization).
Larger pages are also helpful for non-virtualized applications that perform lots of memory accesses with low spatial locality and a large enough footprint. An extreme case would be walking through memory with a 4160-byte stride: every step would consume a TLB entry and a cache line entry; once you have run out of TLB entries (on the AMD K10: 48 L1 TLB entries, 512 L2 TLB entries, 1024 L1 cache lines, 8192 L2 cache lines), you can start over, and you will have a workload that hits the cache and misses the TLB all the time.
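[For the curious, the access pattern anton describes is easy to reproduce; the short program below is an illustration by your editor, not anton's code. It walks a buffer with that 4160-byte stride; running it under something like "perf stat -e dTLB-load-misses", with and without huge pages backing the buffer, shows the effect.]

    /* Walk memory with a 4160-byte (4096 + 64) stride: each access lands on
     * a new page but touches only one cache line per page, so the TLB runs
     * out long before the cache does. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define STRIDE 4160u        /* one new page, one new cache line per step */
    #define NSTEPS (64u * 1024) /* footprint around 260 MB */
    #define REPEAT 100

    int main(void)
    {
        size_t size = (size_t)STRIDE * NSTEPS;
        char *buf = malloc(size);
        volatile char *p = buf;
        unsigned long sum = 0;

        if (!buf)
            return 1;
        memset(buf, 1, size);   /* fault all of the pages in first */
        for (int r = 0; r < REPEAT; r++)
            for (size_t i = 0; i < size; i += STRIDE)
                sum += (unsigned char)p[i];   /* hits the cache, misses the TLB */
        printf("sum = %lu\n", sum);
        free(buf);
        return 0;
    }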
Memory management for virtualization
Posted Apr 8, 2010 21:31 UTC (Thu) by avik (guest, #704) [Link] (2 responses)
Memory management for virtualization
Posted Apr 9, 2010 10:36 UTC (Fri) by anton (subscriber, #25547) [Link] (1 responses)
Nested paging makes things worse than what? If it's worse than whatever they do in virtualization without them, why were they introduced?
Memory management for virtualization
Posted Apr 9, 2010 10:43 UTC (Fri) by avik (guest, #704) [Link]
So, nested paging is overall much better than shadow paging (but worse in some aspects); large pages bridge the gap and make nested paging better overall.
still rockin' highmem today
Posted Apr 9, 2010 2:19 UTC (Fri) by vapier (guest, #15768) [Link]
Huge pages and KSM
Posted Apr 11, 2010 11:17 UTC (Sun) by trekker.dk (guest, #65149) [Link]
Memory management for virtualization
Posted Apr 15, 2010 1:11 UTC (Thu) by dcoutts (guest, #5387) [Link]
As I understand it, POWER cpus do have a range of sensible sizes as well as the massive pages. AIX (I think) support the smaller ones transparently and the big ones explicitly. Seems quite reasonable.
Memory management for virtualization
Posted Apr 15, 2010 19:59 UTC (Thu) by rilder (guest, #59804) [Link]
Memory management for virtualization
Posted Apr 18, 2010 11:48 UTC (Sun) by dafu (guest, #42913) [Link]
In my opinion, YES!