remap_file_pages()
Many kinds of applications use mmap() to map a file into virtual memory. mmap() makes a simple, linear mapping between a region of virtual memory and a corresponding part of the file on disk. Some applications, however, have more complicated needs; they typically want to map several pieces of a file into different parts of memory. This sort of nonlinear mapping is used, for example, by large database management systems as a way of managing the movement of data to and from the disk.
Nonlinear mappings can be created on any system which supports mmap(); it's just a matter of creating a separate mapping for each piece of the file. Such mappings can be expensive to set up, however, and even more expensive to use. In the Linux kernel, each mapping creates a separate virtual memory area (VMA). Each VMA uses kernel memory; the presence of large numbers of VMAs will also slow down the VM subsystem.
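To make the cost concrete, here is a minimal sketch of that portable approach. It assumes Linux or another POSIX system that provides MAP_ANONYMOUS, and the helpers make_test_file() and map_reversed() are hypothetical names used only for illustration. The code maps the pages of a small file in reverse order into one contiguous address range, using one mmap() call per page. Because the pieces are not in file order, Linux cannot merge them, so each page ends up in its own VMA.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temporary file of npages pages.
 * Page i is filled with the byte 'A' + i so each page is recognizable.
 * Returns a file descriptor, or -1 on failure. */
static int make_test_file(size_t npages, size_t page)
{
    char path[] = "/tmp/nonlinear-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    char *buf = malloc(page);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        memset(buf, 'A' + (int)i, page);
        if (write(fd, buf, page) != (ssize_t)page) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);
    return fd;
}

/* Hypothetical helper: map the file's pages in reverse order into one
 * contiguous range, one mmap() call (and so one VMA) per page.
 * Returns the start of the range, or NULL on failure. */
static char *map_reversed(int fd, size_t npages, size_t page)
{
    /* Reserve the whole range first so the pieces land side by side;
     * each MAP_FIXED call below replaces one page of the reservation. */
    char *base = mmap(NULL, npages * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < npages; i++) {
        size_t file_page = npages - 1 - i;
        void *p = mmap(base + i * page, page, PROT_READ,
                       MAP_SHARED | MAP_FIXED, fd, (off_t)(file_page * page));
        if (p == MAP_FAILED) {
            munmap(base, npages * page);
            return NULL;
        }
    }
    return base;
}
```

With four pages, virtual page 0 shows file page 3, and so on; every one of those pages costs the kernel a separate VMA.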
The remap_file_pages() system call addresses these problems by allowing a process to rearrange the memory mapping of a file on the fly. It is called as:
int remap_file_pages(unsigned long start, unsigned long size, unsigned long prot, unsigned long pgoff, unsigned long flags);
Essentially, this call says that size bytes of the file (rounded down to a whole number of pages), starting at page offset pgoff, should be mapped into the process's virtual memory beginning at start. The file must already be mapped into a VMA that contains start. Since the system call works entirely through page table manipulation, it is quite fast, and it can create complicated nonlinear mappings without needing to create new VMAs.
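The same reverse-order layout can be built with a single mmap() and a series of remap_file_pages() calls. This is a minimal sketch assuming Linux with glibc, which declares remap_file_pages() in <sys/mman.h> when _GNU_SOURCE is defined; make_test_file() and map_reversed_nonlinear() are hypothetical helpers. Note that the call only applies to shared file mappings, and that size is given in bytes while pgoff is given in pages. Kernels since 3.16 deprecate remap_file_pages() and emulate it with ordinary VMAs, so on a current system this sketch shows the interface, not the page-table-only behavior described here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: unlinked temp file of npages pages, page i filled
 * with 'A' + i. Returns a file descriptor, or -1 on failure. */
static int make_test_file(size_t npages, size_t page)
{
    char path[] = "/tmp/remap-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    char *buf = malloc(page);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        memset(buf, 'A' + (int)i, page);
        if (write(fd, buf, page) != (ssize_t)page) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);
    return fd;
}

/* Hypothetical helper: map the file linearly with one MAP_SHARED mmap(),
 * then rearrange it so virtual page i shows file page npages-1-i.
 * Returns the mapping, or NULL with errno set on failure. */
static char *map_reversed_nonlinear(int fd, size_t npages, size_t page)
{
    size_t len = npages * page;
    /* remap_file_pages() works only on shared file mappings. */
    char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < npages; i++) {
        /* size: one page, in bytes; prot: 0; pgoff: in pages; flags: 0. */
        if (remap_file_pages(base + i * page, page, 0,
                             npages - 1 - i, 0) != 0) {
            int saved = errno;
            munmap(base, len);
            errno = saved;
            return NULL;
        }
    }
    return base;
}
```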
remap_file_pages(), as found in the 2.5.64 kernel, has only one little problem: the remapping information is lost if the page is swapped out. Users must thus either lock the area in memory (which is generally not a problem for the "big database management system" scenario, since such systems tend to perform this locking anyway), or take pains to re-establish the mapping on swap-in. Ingo's latest patch clears up that last bit of trouble by storing the mapping information in the page table entry when a page is swapped out. On 32-bit systems, this technique limits the maximum size of a nonlinear mapping to 1-2TB (depending on the architecture), because some of the PTE bits are not available for this use. Given the trouble most 32-bit systems have in simply addressing that much memory, this limitation is not likely to bother many people.
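The idea of stashing the file offset in a non-present PTE, and the size limit that follows from it, can be illustrated with a toy encoder. The layout below is entirely hypothetical: the bit positions and the names PTE_FILE, FILE_PGOFF_BITS, and the helper functions are invented for this sketch and do not describe any real architecture. A PTE with the present bit clear and a "file" bit set carries the file page offset in its upper 29 bits.

```c
#include <stdint.h>

/* HYPOTHETICAL 32-bit PTE layout, for illustration only:
 *   bit 0      present bit; clear, so the MMU ignores the entry
 *   bit 1      "file" bit: entry holds a file page offset
 *   bits 3-31  file page offset (29 bits) */
#define PTE_PRESENT      0x1u
#define PTE_FILE         0x2u
#define FILE_PGOFF_SHIFT 3
#define FILE_PGOFF_BITS  (32 - FILE_PGOFF_SHIFT)

/* Can this file page offset be stored in the available bits? */
static int pgoff_fits(uint64_t pgoff)
{
    return pgoff < ((uint64_t)1 << FILE_PGOFF_BITS);
}

/* Encode a page offset into a non-present PTE (caller checks pgoff_fits). */
static uint32_t pgoff_to_pte(uint32_t pgoff)
{
    return (pgoff << FILE_PGOFF_SHIFT) | PTE_FILE;
}

/* True if the PTE is non-present and marked as holding a file offset. */
static int pte_is_file(uint32_t pte)
{
    return !(pte & PTE_PRESENT) && (pte & PTE_FILE);
}

/* Recover the file page offset when the page is faulted back in. */
static uint32_t pte_to_pgoff(uint32_t pte)
{
    return pte >> FILE_PGOFF_SHIFT;
}

/* Largest nonlinear mapping expressible with `bits` offset bits. */
static uint64_t max_nonlinear_bytes(unsigned bits, uint64_t page_size)
{
    return ((uint64_t)1 << bits) * page_size;
}
```

With 4KB pages, 29 offset bits cover 2^29 * 4096 = 2^41 bytes (2TB) and 28 bits cover 1TB, consistent with the 1-2TB range quoted above.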
For now, it is not possible to change protections within a single VMA (the prot parameter to remap_file_pages() is ignored). At some future point, that could change. Some applications (e.g. memory debuggers) currently struggle to control memory protection in a fine-grained manner. Being able to simply set protections on a per-page basis (without creating new VMAs) would make things much easier.
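For comparison, the per-page tool such applications use now is mprotect(), which on Linux works by splitting VMAs. This Linux-specific sketch relies on /proc/self/maps listing one line per VMA; count_vmas() and vmas_added_by_per_page_protect() are hypothetical helpers. It maps three anonymous pages, makes only the middle one read-only, and reports how many VMAs that single call added: the one area is split into three, so the answer is two.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count lines in /proc/self/maps, i.e. the VMAs in this process.
 * A static buffer and raw read() avoid allocating memory (and thus
 * creating mappings) while counting. Returns -1 on error. */
static long count_vmas(void)
{
    static char buf[1 << 16];
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd < 0)
        return -1;
    long lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    close(fd);
    return n < 0 ? -1 : lines;
}

/* Map three read-write pages, make only the middle one read-only, and
 * return how many VMAs that mprotect() call added (-1 on error). */
static long vmas_added_by_per_page_protect(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long before = count_vmas();
    long after = -1;
    if (mprotect(p + page, page, PROT_READ) == 0)
        after = count_vmas();
    munmap(p, 3 * page);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```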