
KHB: Transparent support for large pages

Posted Jun 22, 2006 3:08 UTC (Thu) by ianw (subscriber, #20143)
Parent article: KHB: Transparent support for large pages

There is a lot of current work on superpages.

The current Linux approach to large pages is HugeTLB. This is a static approach, and not transparent. People are working on making it easier to use with things like libhugetlbfs [1], and I have heard rumors of dynamic, per-process hugetlb.

The other approach, as the article mentions, is a transparent one. Naohiko Shimizu came up with an approach [2] which was implemented on Alpha, SPARC and i386. This showed good results, but the patch never ended up going anywhere.

Gelato@UNSW is actively working on large page support for Itanium Linux [3], using a Shimizu-inspired approach. Itanium has excellent support for multiple page sizes, and with suitable modifications can use a hardware walker to refill the TLB with superpages with very little OS intervention. The project is currently in the hands of a master's student, but even with a hacked-together proof of concept we can see great potential [4].

Clearly, as the article identifies, fragmentation is an issue with larger pages. We are keeping an eye on the above-mentioned projects, and others such as Chris Yeoh's work on fragmentation avoidance [5].

For Itanium, we believe we could get a working superpage implementation with very few overall lines of code difference, as mentioned. There is some doubt about how generic this could be: the Rice paper's design was implemented as a FreeBSD module using hooks into the VM layer, which is not an easy proposition with Linux.

Dynamic, transparent superpages are really not suited to the multi-level tree design used by the Linux VM. There is a range of more suitable page table designs that incorporate support for large, sparse address spaces and superpages. To this end, Gelato@UNSW are working on a page table abstraction interface [6]. One of the most promising approaches is a guarded page table [7], which we are actively developing behind our interface. Our long-term goal is to marry a guarded page table with dynamic superpages.

If others are working in this area, please contact us at


KHB: Transparent support for large pages

Posted Jun 22, 2006 16:06 UTC (Thu) by dododge (subscriber, #2870) [Link]

The current Linux approach to large pages is HugeTLB. This is a static approach, and not transparent.

Yeah, "not transparent" is an understatement. For those who've never dealt with hugetlb, it goes something like this:

You have to explicitly allocate the pages from the kernel. You can dynamically allocate and free the pages, but since they have to be physically contiguous, the number of hugepages you can get at any particular time depends on things like memory fragmentation from prior applications. So the best time to reserve them is at kernel startup (you can do this on the kernel command line), but even then the available number of hugepages can vary. For example, I had a dataset that required a large number of hugepages, such that even at startup there were only one or two extra to spare -- then we added a few more CPUs to the machine and the next time it booted it could no longer construct enough hugepages to hold the data. When we updated the kernel a short while later the number changed again, thankfully back in our favor.
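
Since the number of hugepages you actually get can differ from boot to boot, it is worth checking before starting a job. The following is a minimal Python sketch, not something from the original setup: the helper names parse_hugepage_info and pages_needed are made up for illustration, and it assumes the HugePages_Total, HugePages_Free and Hugepagesize fields that Linux reports in /proc/meminfo.

```python
def parse_hugepage_info(meminfo_text):
    """Extract the hugepage counters from /proc/meminfo-style text.

    Returns a dict such as {"HugePages_Total": 20, "Hugepagesize": 2048}.
    Hugepagesize is reported by the kernel in kB.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields = rest.split()
            if fields:
                info[key] = int(fields[0])
    return info


def pages_needed(dataset_bytes, hugepage_kb):
    """Number of hugepages required to hold dataset_bytes (rounded up)."""
    page_bytes = hugepage_kb * 1024
    return -(-dataset_bytes // page_bytes)  # ceiling division


def enough_hugepages(meminfo_text, dataset_bytes):
    """True if the free hugepages reported can hold the dataset."""
    info = parse_hugepage_info(meminfo_text)
    if "HugePages_Free" not in info or "Hugepagesize" not in info:
        return False  # kernel without hugetlb support, or unexpected format
    return info["HugePages_Free"] >= pages_needed(dataset_bytes, info["Hugepagesize"])
```

One would typically feed it the contents of /proc/meminfo, for example enough_hugepages(open("/proc/meminfo").read(), dataset_size), and refuse to start the job if it returns False.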

While allocated to hugepages, that memory can only be used for hugepages. So if you grab them early in order to be sure you can get enough for a later job, and end up devoting most of your RAM to hugepages, that memory is not available for normal use even if the pages aren't holding anything yet.

Access to hugepages is only available through the "hugetlbfs" filesystem, which basically acts like a ramdisk where files you store in it will be backed by hugepages. But hugetlbfs has a nasty little property in that it doesn't support normal I/O such as read(), write(), and ftruncate() on its files. All I/O has to be done through mmap(). This isn't so bad once the data is there, but copying files in and out of hugetlbfs is a big pain because the usual tools like "cp" and "dd" don't work.
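
For the "copying out" direction, the workaround is to map the file and write the mapped bytes to an ordinary file. Here is a rough Python sketch of that idea, under stated assumptions: copy_out_via_mmap is a hypothetical helper name, the chunk size is arbitrary, and the code has only been reasoned about for regular files, not tested on an actual hugetlbfs mount. Copying data in is not covered, since sizing the destination file depends on what the kernel's hugetlbfs allows.

```python
import mmap
import os


def copy_out_via_mmap(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, reading the source only through mmap().

    The destination is written with ordinary write() calls, so it should
    live on a normal filesystem. Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # mmap cannot map an empty file
        # Length 0 maps the whole file; ACCESS_READ gives a read-only mapping.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, size, chunk_size):
                chunk = view[offset:offset + chunk_size]
                dst.write(chunk)
                copied += len(chunk)
    return copied
```

Note that the copy is as long as the source file reports itself to be, so if the file size has been rounded up to a whole number of hugepages, any trailing padding gets copied too.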

That said, hugetlbfs is useful. You can store large amounts of data in hugetlbfs and it will stay memory-resident until reboot, giving you pretty much instant startup and shutdown times when you open/mmap/close the dataset. Alternatives such as tmpfs and mlock() are problematic when the amount of data gets into tens of gigabytes or is nearing the total system RAM.

If someone is working on a way to get the benefits of hugetlbfs without the downsides of preallocation and limited I/O, that would be great.
