KHB: Transparent support for large pages
Posted Jun 24, 2006 1:16 UTC (Sat) by dododge (subscriber, #2870)
In reply to: KHB: Transparent support for large pages by smooth1x
Parent article: KHB: Transparent support for large pages
> (we have broken the 4GB for 1 shared memory segment size recently!)
Just FYI, there shouldn't be much trouble with mappings that size. On a system with 96GB of RAM, I regularly do single 80GB shared mappings, and I've managed to push it as high as 90GB while keeping it all in core. This system is actually a small configuration for its hardware, and it wouldn't surprise me if people with bigger machines are doing mappings in the hundreds of gigabytes.
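For a rough sense of why large pages matter at this scale, here is the page-table arithmetic for an 80GB mapping as a small C sketch. It assumes "80GB" means 80 GiB, one 8-byte last-level entry per page (as on x86-64 for both 4 KiB PTEs and 2 MiB PMD entries), and ignores the upper-level tables; the function names are made up for illustration.

```c
#include <stdint.h>

/* Pages of `page_size` bytes needed to cover `bytes`, rounded up. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Bytes of last-level page-table entries for a mapping of `bytes`,
 * assuming one 8-byte entry per page (x86-64 PTEs for 4 KiB pages,
 * PMD entries for 2 MiB pages). Upper-level tables are ignored. */
uint64_t table_bytes(uint64_t bytes, uint64_t page_size)
{
    const uint64_t entry_size = 8;
    return pages_needed(bytes, page_size) * entry_size;
}
```

With 4 KiB pages that works out to 20,971,520 entries, or 160 MiB of page tables for each process that maps the whole region with ordinary pages; with 2 MiB pages it is 40,960 entries, or 320 KiB.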
One limit you can run into is that POSIX shm_open (and SVR4 shmget?) is typically implemented using a file in /dev/shm, and the tmpfs mounted there is usually sized to half your RAM. If you want to go larger, you can mount a larger tmpfs, mmap some other file or block device (for example, a striped LVM volume), or use hugetlbfs instead of tmpfs.
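Since the limit is just the size of the tmpfs, a program can at least ask how much room is left on /dev/shm before creating a big segment. A minimal sketch using POSIX statvfs(); the helper names fs_avail_bytes and segment_fits are hypothetical, and it uses f_bavail, the space available to an unprivileged caller.

```c
#include <sys/statvfs.h>

/* Store in *out the bytes an unprivileged user can still allocate on the
 * filesystem holding `dir` (e.g. "/dev/shm"). Returns 0, or -1 on error. */
int fs_avail_bytes(const char *dir, unsigned long long *out)
{
    struct statvfs st;
    if (statvfs(dir, &st) != 0)
        return -1;
    /* f_bavail is counted in units of f_frsize. */
    *out = (unsigned long long)st.f_bavail * st.f_frsize;
    return 0;
}

/* 1 if `bytes` would fit in the free space on `dir`, 0 if not,
 * -1 if the filesystem could not be queried. */
int segment_fits(const char *dir, unsigned long long bytes)
{
    unsigned long long avail;
    if (fs_avail_bytes(dir, &avail) != 0)
        return -1;
    return avail >= bytes;
}
```

For example, segment_fits("/dev/shm", 80ULL << 30) would return 0 on a 96GB machine whose otherwise empty /dev/shm is still at the half-of-RAM default. To enlarge /dev/shm itself, tmpfs accepts a size= mount option, e.g. `mount -o remount,size=90G /dev/shm` as root.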
Another thing about /dev/shm is that it won't stop you from creating and mapping a sparse file bigger than it can actually hold. I don't know whether shm_open checks for this. I found out about it the hard way: I mapped a new 50GB file in a 48GB tmpfs, and a few hours later the application died with a bus error when /dev/shm ran out of pages.
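One way to get that failure up front rather than hours later is to reserve the space when the file is created. The sketch below uses posix_fallocate() instead of relying on a sparse ftruncate(). It opens the file directly (on Linux with glibc, shm_open("/name") refers to /dev/shm/name), and the function name map_preallocated is hypothetical. Native fallocate support in tmpfs only arrived in Linux 3.5; where the filesystem lacks it, glibc emulates posix_fallocate by writing to each block, which is slower but reserves the space just the same.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` (must not already exist), reserve `size` bytes of backing
 * store for it, and map it shared. Returns the mapping, or MAP_FAILED with
 * errno set (ENOSPC if the filesystem cannot hold `size` bytes). */
void *map_preallocated(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* ftruncate() alone would leave a sparse file: tmpfs accepts any size
     * and the shortfall only shows up later as SIGBUS on first touch.
     * posix_fallocate() claims the blocks now, so it fails here instead. */
    int rc = posix_fallocate(fd, 0, size);
    if (rc != 0) {
        close(fd);
        unlink(path);
        errno = rc;              /* posix_fallocate returns the error code */
        return MAP_FAILED;
    }

    void *addr = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);                   /* the mapping keeps the file alive */
    if (addr == MAP_FAILED) {
        unlink(path);
        errno = saved;
    }
    return addr;
}
```

On a 48GB tmpfs, asking for 50GB this way should fail immediately with ENOSPC rather than mapping successfully and faulting later.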
The biggest issue we have is simply getting the data in and out of RAM, especially if the shared memory is directly backed by disk. Imagine hitting control-C in an application and having to wait 20-30 minutes for the shell prompt to return while the OS flushes a zillion pages back to the drive(s).
> Large pages for large (>1GB?) shared memory allocations is all we need.
> Oh, and pinned into physical memory (non-pageable and NOT looked at by the paging/swapping code).
I think hugetlbfs will do this for you today, if you want it immediately.
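On current kernels, the simplest way to get huge pages without mounting hugetlbfs is mmap() with MAP_HUGETLB. That flag was added in Linux 2.6.32, well after this thread; in 2006 you would mmap a file on a mounted hugetlbfs instead. Either way, pages must be reserved first (for example via /proc/sys/vm/nr_hugepages), and hugetlb pages are not swapped out, which is close to the "pinned" behaviour asked for. The sketch below falls back to normal pages when none are reserved. map_prefer_huge is a hypothetical name, and the 2 MiB size in the usage is an x86-64 assumption (the Hugepagesize line in /proc/meminfo gives the real value).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Round `len` up to a multiple of `align` (a power of two). */
static size_t round_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Map `len` bytes of shared anonymous memory, preferring huge pages of
 * `huge_size` bytes. On return *mapped_len is the length to pass to
 * munmap() and *is_huge says whether huge pages were obtained. Returns
 * MAP_FAILED only if the normal-page fallback also fails. */
void *map_prefer_huge(size_t len, size_t huge_size,
                      size_t *mapped_len, int *is_huge)
{
#ifdef MAP_HUGETLB
    size_t hlen = round_up(len, huge_size);
    void *p = mmap(NULL, hlen, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *mapped_len = hlen;
        *is_huge = 1;
        return p;
    }
    /* Usually ENOMEM here: no huge pages reserved in nr_hugepages. */
#endif
    *mapped_len = len;
    *is_huge = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}
```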
You can also use mlock to keep things resident, but be aware that the last time I tried it (admittedly on a 2.4 kernel), it instantly dirtied all of the pages in the mapping. So when the mapping (backed by a disk file) was later unlocked, the kernel insisted on flushing the entire thing even though it hadn't been modified, and the flushing was done single-threaded in the kernel. For a large mapping, this can take a long time.
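A minimal sketch of the mlock route on a modern Linux: map some memory, lock it, then confirm with mincore() that every page is resident. It uses a small anonymous mapping, so it neither reproduces nor rules out the 2.4-era write-back behaviour with file-backed mappings described above; map_locked and resident_pages are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of resident pages in [addr, addr + len), via mincore().
 * `addr` must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages ? npages : 1);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long n = 0;
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;         /* low bit set = page is in RAM */
    free(vec);
    return n;
}

/* Map `len` bytes of anonymous memory and mlock() it; mlock faults every
 * page in before returning. Returns MAP_FAILED with errno set on failure
 * (typically ENOMEM or EPERM when RLIMIT_MEMLOCK is too small). */
void *map_locked(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    if (mlock(p, len) != 0) {
        int saved = errno;
        munmap(p, len);
        errno = saved;
        return MAP_FAILED;
    }
    return p;
}
```

munlock() releases the lock. For regions the size discussed here, an unprivileged process would also need a larger RLIMIT_MEMLOCK (ulimit -l) or the CAP_IPC_LOCK capability.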