2003 Kernel Summit: Transparent superpages and page clustering

This article is part of LWN's 2003 Kernel Developers' Summit coverage.
A couple of Kernel Summit sessions were devoted to the problem of making larger pages from smaller ones. The first, led by Dave Mosberger, was on the concept of transparent superpages. "Transparent" means that the application is not involved; the idea behind transparent superpages is that a process can be working with large pages (and the performance benefits they can bring) without needing to do anything to bring that about.

The main goal behind superpages is to improve use of the processor's translation lookaside buffer (TLB). On modern systems where small pages (i.e. the 4K pages used by Linux on most architectures) are in use, the TLB might not even cover enough memory to fill the cache. The "hugetlb" feature (already in 2.6) can make things better for specific applications, but it does not really solve the problem. Hugetlb pages are not transparent (the application must set them up explicitly) and they are a scarce resource. If the application does not nail down its huge pages soon after boot, system memory is likely to fragment to the point that no such pages are available.
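
The coverage problem comes down to simple arithmetic, sketched below. The TLB entry count and cache size are assumed, illustrative figures (neither was given in the session), and tlb_reach() and report() are hypothetical helper names.

```python
KB = 1024

# Assumed, illustrative hardware figures; not taken from the session.
TLB_ENTRIES = 64
CACHE_SIZE = 512 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Return how many bytes of memory a fully used TLB can map at once."""
    return entries * page_size


def report(page_size: int) -> str:
    """Describe whether the TLB reach covers the (assumed) cache size."""
    reach = tlb_reach(TLB_ENTRIES, page_size)
    verdict = "covers" if reach >= CACHE_SIZE else "does not cover"
    return f"{page_size // KB}K pages: reach {reach // KB} KB, {verdict} the cache"


if __name__ == "__main__":
    for size in (4 * KB, 16 * KB, 64 * KB):
        print(report(size))
```

With these assumed numbers, 4K pages give the TLB a reach of 256 KB, half of the cache, while larger pages cover it comfortably. Real processors differ in both figures, so only the shape of the calculation carries over.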

The transparent superpage scheme works in a different way. When a process requests a page of memory, the kernel allocates a larger, superpage frame, but only maps the small page needed by the process. If the TLB starts to fill up, the kernel can automatically "promote" the pages in that frame to a superpage. If, instead, the system is suffering from memory pressure, the superpage can be demoted, and the component pages swapped out. This scheme requires some extra housekeeping information (to track promotion and demotion states), and it requires the system to allocate most memory in superpage-sized chunks to avoid fragmentation. For these reasons, the maximum size of superpages is limited to 64KB or so - much smaller than what can be achieved with hugetlb.
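
The bookkeeping described above might look roughly like the following toy model. It is only a sketch: the SuperpageFrame class and its methods are hypothetical names, and the rule that a frame is promoted only once every small page in it is mapped is an assumption, not something stated in the session.

```python
from dataclasses import dataclass, field

SMALL_PAGE = 4 * 1024        # base page size used on most architectures
SUPERPAGE = 64 * 1024        # rough upper bound mentioned in the session
PAGES_PER_FRAME = SUPERPAGE // SMALL_PAGE


@dataclass
class SuperpageFrame:
    """Hypothetical bookkeeping for one superpage-sized frame."""
    mapped: set = field(default_factory=set)   # indices of small pages in use
    promoted: bool = False

    def fault(self, index: int) -> None:
        """Map just the one small page the process touched."""
        if not 0 <= index < PAGES_PER_FRAME:
            raise IndexError("page index outside the frame")
        self.mapped.add(index)

    def tlb_entries(self) -> int:
        """TLB entries needed to cover what is mapped in this frame."""
        if self.promoted:
            return 1
        return len(self.mapped)

    def promote(self) -> bool:
        """Switch to a single superpage mapping.

        Assumed policy: only a fully populated frame is promoted.
        """
        if len(self.mapped) == PAGES_PER_FRAME:
            self.promoted = True
        return self.promoted

    def demote(self) -> list:
        """Fall back to small mappings; return pages that may be swapped out."""
        self.promoted = False
        return sorted(self.mapped)
```

In this model a fully mapped frame costs sixteen TLB entries before promotion and one afterward, which is the benefit the scheme is after. Demotion hands the small pages back as individually swappable units.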

A fair amount of transparent superpage work has already been done. J. Navarro at Rice University has a FreeBSD implementation. There is a Linux implementation by Naohiko Shimizu, but it only works for anonymous memory (that which is not backed by a file somewhere). William Irwin and Hubertus Franke are doing some work at IBM, and Lucy Chubb at UNSW is also working on the problem.

William Irwin got up to talk about some of the implementation details. These include a "speculative reservation" mechanism to allow a process to tentatively grab a superpage; the reservation can be broken in some situations. Page replacement becomes a hard problem once memory gets fragmented. The page table API would need to be enhanced to be able to work with superpage concepts. There is also the need for some sort of page scanning algorithm - and/or a fancy tree data structure - to manage promotion of pages.

At this point, Linus broke in to say that he is not entirely thrilled with the superpage concept. He would rather see the entire system switch over to larger pages, with "sub-pages" used when needed. This approach may seem similar, but a larger page size has its own advantages - in particular, a reduction in the size of the system memory map. Shrinking the memory map is especially helpful for 32-bit systems, which are increasingly constrained by the amount of low memory available. Larger pages would thus be more helpful to the x86 architecture, which is the one Linus really cares about.
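
The memory map argument can be illustrated with a back-of-the-envelope calculation. The sketch below assumes one fixed-size descriptor per page; the 40-byte descriptor size and the 16 GB machine are assumptions chosen for illustration, and mem_map_bytes() is a hypothetical helper.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Assumed size of one per-page descriptor, in bytes; the real figure
# depends on kernel version and configuration.
DESCRIPTOR_BYTES = 40


def mem_map_bytes(ram_bytes: int, page_size: int) -> int:
    """Size of a memory map holding one descriptor per page of RAM."""
    return (ram_bytes // page_size) * DESCRIPTOR_BYTES


if __name__ == "__main__":
    ram = 16 * GB  # assumed machine size
    for page_size in (4 * KB, 8 * KB, 16 * KB):
        size = mem_map_bytes(ram, page_size)
        print(f"{page_size // KB}K pages: memory map of {size // MB} MB")
```

Under these assumptions, quadrupling the page size cuts the memory map to a quarter of its size. On a 32-bit system, where the map must live in low memory, that saving goes directly to the constrained resource.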

There was a discussion of how big pages could really get with such a scheme; 16K was seen as the limit. The page size could, however, become a configuration option, or even a decision made at boot time.

William Irwin then talked about page clustering - his patch was briefly covered here last February. Page clustering differs from superpages in that the operating system creates larger pages in software by logically grouping the system's smaller physical pages. Page clustering is mainly intended to shrink the memory map and other system data structures.

William talked briefly about the changes forced by page clustering; they mostly have to do with code which makes assumptions about what PAGE_SIZE means. He has a patch which works and passes "light functional tests," but it still does not perform all that well. William apparently knows how to fix most of the problems and will be doing so in the near future.
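
To illustrate why PAGE_SIZE assumptions matter, here is a minimal sketch of the address arithmetic once the software page is larger than the hardware page. The constant names and the choice of four hardware pages per cluster are illustrative assumptions, not taken from the patch itself.

```python
MMUPAGE_SHIFT = 12                       # 4K hardware (MMU) page
PAGE_MMUSHIFT = 2                        # assumed: 4 hardware pages per cluster
MMUPAGE_SIZE = 1 << MMUPAGE_SHIFT
PAGE_MMUCOUNT = 1 << PAGE_MMUSHIFT
PAGE_SHIFT = MMUPAGE_SHIFT + PAGE_MMUSHIFT
PAGE_SIZE = 1 << PAGE_SHIFT              # 16K software page


def split_address(addr: int) -> tuple:
    """Split an address into (software page, hardware subpage, byte offset)."""
    page = addr >> PAGE_SHIFT
    subpage = (addr >> MMUPAGE_SHIFT) & (PAGE_MMUCOUNT - 1)
    offset = addr & (MMUPAGE_SIZE - 1)
    return page, subpage, offset


def descriptors_needed(ram_bytes: int) -> int:
    """Memory-map entries needed when one descriptor covers a whole cluster."""
    return ram_bytes >> PAGE_SHIFT
```

Code that shifts by PAGE_SHIFT when it really means "one hardware mapping" would now be off by a factor of PAGE_MMUCOUNT. That is the sort of assumption a clustering patch has to find and fix.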

(This article has been updated to fix a couple of misspelled names).


Bill Irvine?

Posted Jul 23, 2003 5:02 UTC (Wed) by error27 (subscriber, #8346)

My guess is that "Bill Irvine" is really "William Irwin" but pronounced with an accent.

Bill Irvine?

Posted Jul 23, 2003 12:38 UTC (Wed) by corbet (editor, #1)

...except that I took the name directly from the slide. William Irwin was in the room, of course, and didn't take exception. So I think the name is right...

Bill Irvine?

Posted Jul 23, 2003 21:06 UTC (Wed) by StevenCole (guest, #3068)

It looks like someone added an extra 'c' to the name of Hubertus Franke. Googling for [Bill William] Irvine IBM would seem to indicate that wli the 3rd is the one indicated, unless the real Mr. Irvine keeps a very low Google profile.

Bill Irvine?

Posted Jul 24, 2003 14:57 UTC (Thu) by mbligh (subscriber, #7720)

Nah, David M just misspelled it. It's just Bill Irwin in a cunning disguise.

2003 Kernel Summit: Transparent superpages and page clustering

Posted Jul 24, 2003 14:59 UTC (Thu) by mbligh (subscriber, #7720)

Note that the "subpages" that Linus is asking for *is* page clustering, just by a different name ....

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds