KHB: Transparent support for large pages
Posted Jun 22, 2006 16:06 UTC (Thu) by dododge (subscriber, #2870)
In reply to: KHB: Transparent support for large pages by ianw
Parent article: KHB: Transparent support for large pages
The current Linux approach to large pages is HugeTLB. This is a static approach, and not transparent.
Yeah, "not transparent" is an understatement.
For those who've never dealt with hugetlb, it goes something like this:
You have to explicitly allocate the pages from the kernel. You can dynamically allocate and free the pages, but since they have to be physically contiguous, the number of hugepages you can get at any particular time depends on things like memory fragmentation left by prior applications. So the best time to reserve them is at kernel startup (you can do this on the kernel command line), but even then the available number of hugepages can vary. For example, I had a dataset that required a large number of hugepages, such that even at startup there were only one or two extra to spare -- then we added a few more CPUs to the machine, and the next time it booted it could no longer construct enough hugepages to hold the data. When we updated the kernel a short while later, the number changed again, thankfully back in our favor.
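Since the number of hugepages you actually get can differ from what you asked for, it is worth checking before starting a job. Here is a minimal sketch in Python, assuming the kernel reports the HugePages_Total, HugePages_Free and Hugepagesize fields in /proc/meminfo; the function name parse_hugepage_counters is just something I made up for illustration.

```python
def parse_hugepage_counters(meminfo_text):
    """Extract the hugepage fields from the text of /proc/meminfo."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    counters = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if sep and name in wanted:
            # The value is the first token after the colon;
            # Hugepagesize is additionally followed by a "kB" unit.
            counters[name] = int(rest.split()[0])
    return counters
```

You would feed it the contents of /proc/meminfo, e.g. parse_hugepage_counters(open("/proc/meminfo").read()), and compare HugePages_Total against the number your dataset needs.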
While allocated to hugepages, that memory can only be used for hugepages. So if you grab them early in order to be sure you can get enough for a later job, and end up devoting most of your RAM to hugepages, that memory is not available for normal use even if the pages aren't holding anything yet.
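To put a number on that cost, the arithmetic is simple enough to sketch. This is only an illustration: the function name and its parameters are hypothetical, and the inputs are assumed to be page counts plus sizes in kB, such as the values /proc/meminfo reports.

```python
def hugepage_reservation(total_pages, free_pages, page_size_kb, mem_total_kb):
    """Summarize how much RAM is tied up in the hugepage pool."""
    reserved_kb = total_pages * page_size_kb   # unavailable for normal use
    idle_kb = free_pages * page_size_kb        # reserved but holding nothing yet
    return {
        "reserved_kb": reserved_kb,
        "idle_kb": idle_kb,
        "fraction_of_ram": reserved_kb / mem_total_kb,
    }
```

The idle_kb figure is the part that hurts: memory that is neither usable by ordinary processes nor holding any data yet.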
Access to hugepages is only available through the "hugetlbfs" filesystem, which basically acts like a ramdisk where files you store in it will be backed by hugepages. But hugetlbfs has a nasty little property: it doesn't support normal file operations such as read(), write(), and ftruncate() on its files. All I/O has to be done through mmap(). This isn't so bad once the data is there, but copying files in and out of hugetlbfs is a big pain because the usual tools like "cp" and "dd" don't work.
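As a rough idea of what a replacement for "cp" looks like in the copy-out direction, here is a sketch that reads the source only through a mapping and writes the destination with ordinary file I/O. The name copy_out_via_mmap and the chunk_size parameter are my own inventions, and I have only reasoned about this against regular files, not tried it on a real hugetlbfs mount. On hugetlbfs I would expect the copy to include any padding up to a hugepage boundary, since it copies whatever size the file reports.

```python
import mmap
import os


def copy_out_via_mmap(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, reading the source only through mmap()."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped; nothing to copy
        # A length of 0 asks Python to map the whole file.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for start in range(0, size, chunk_size):
                # Slicing the mapping copies bytes out of the mapped pages;
                # the destination is an ordinary file, so write() is fine.
                copied += dst.write(view[start:start + chunk_size])
    return copied
```

Copying in is the harder direction, because the destination file has to be given its size through the mapping itself rather than by writing to it.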
That said, hugetlbfs is useful. You can store large amounts of data in hugetlbfs and it will stay memory-resident until reboot, giving you pretty much instant startup and shutdown times when you open/mmap/close the dataset. Alternatives such as tmpfs and mlock() are problematic when the amount of data gets into tens of gigabytes or is nearing the total system RAM.
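The open/mmap/close pattern is roughly the following. This is a sketch under assumptions: the record layout (one little-endian 64-bit integer per record) and the name read_record are purely hypothetical, and the same code works on any mappable file, not just one in hugetlbfs.

```python
import mmap
import struct

# Hypothetical layout: the dataset is an array of little-endian uint64 records.
RECORD = struct.Struct("<Q")


def read_record(path, index):
    """Return record number `index` without reading the rest of the dataset."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Only the page containing this record is actually touched.
            return RECORD.unpack_from(view, index * RECORD.size)[0]
```

Startup cost is just the open and the mapping; when the file is already resident in memory there is nothing to load, which is where the near-instant startup comes from.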
If someone is working on a way to get the benefits of hugetlbfs without the downsides of preallocation and limited I/O, that would be great.