remap_file_pages()
[Posted March 5, 2003 by corbet]
Ingo Molnar's new
remap_file_pages() system call was first merged
into the 2.5.46 kernel. The final pieces of the implementation are only now
circulating in patch form, however, so it seems like a good time to look at
what this system call does.
Many kinds of applications use mmap() to map a file into virtual
memory. mmap() makes a simple, linear mapping between a region of
virtual memory and a corresponding part of the file on disk. Some
applications, however, have more complicated needs; they typically want to
map several pieces of a file into different parts of memory. This sort of
nonlinear mapping is used, for example, by large database management
systems as a way of managing the movement of data to and from the disk.
Nonlinear mappings can be created on any system which supports
mmap(); it's just a matter of creating a separate mapping for each
piece of the file. Such mappings can be expensive to set up, however, and
even more expensive to use. In the Linux kernel, each mapping creates a
separate virtual memory area (VMA). Each VMA uses kernel memory; the
presence of large numbers of VMAs will also slow down the VM subsystem.
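As a rough illustration of the mmap()-only approach, the sketch below places several pages of a file, in arbitrary order, into one contiguous window by issuing one mmap() call per piece. The helper names map_pieces() and demo_map_pieces(), the temporary file, and the page order are all made up for this example; the point is only that every piece costs a separate mapping, and normally a separate VMA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map n single pages of fd, in the order given by
 * file_pages[], into one contiguous window. Each piece needs its own
 * mmap() call, and normally ends up as its own VMA. */
static char *map_pieces(int fd, const size_t *file_pages, size_t n, size_t psz)
{
    assert(n > 0 && psz > 0);

    /* Reserve a contiguous address range to place the pieces into. */
    char *base = mmap(NULL, n * psz, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        /* MAP_FIXED replaces the reservation at exactly this address. */
        void *p = mmap(base + i * psz, psz, PROT_READ,
                       MAP_SHARED | MAP_FIXED, fd,
                       (off_t)(file_pages[i] * psz));
        if (p == MAP_FAILED) {
            munmap(base, n * psz);
            return NULL;
        }
    }
    return base;
}

/* Hypothetical demo: returns 0 if the window shows the requested pages. */
static int demo_map_pieces(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t psz = (size_t)page;

    /* Temporary file with four pages filled with 'A', 'B', 'C', 'D'. */
    char path[] = "/tmp/pieces-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    char *buf = malloc(psz);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < 4; i++) {
        memset(buf, 'A' + i, psz);
        if (write(fd, buf, psz) != (ssize_t)psz) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);

    /* Window pages 0, 1, 2 should show file pages 3, 0, 2. */
    const size_t order[] = { 3, 0, 2 };
    char *win = map_pieces(fd, order, 3, psz);
    close(fd);
    if (win == NULL)
        return -1;

    int ok = win[0] == 'D' && win[psz] == 'A' && win[2 * psz] == 'C';
    munmap(win, 3 * psz);
    return ok ? 0 : -1;
}
```

With thousands of pieces instead of three, the per-piece system calls and the resulting VMAs are where the cost described above comes from.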
The remap_file_pages() system call addresses these problems by
allowing a process to rearrange the memory mapping of a file on the fly.
It is called as:
    int remap_file_pages(unsigned long start, unsigned long size,
                         unsigned long prot, unsigned long pgoff,
                         unsigned long flags);
Essentially, this call says that size bytes of the file (a whole
number of pages), starting at page offset pgoff, should be mapped into
the process's virtual memory beginning at start. The file should already be
mapped into a VMA which contains start. Since the system call
works entirely through page table manipulation, it is quite fast. It also
can create complicated nonlinear mappings without needing to create new
VMAs.
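A minimal sketch of how the call might be used follows. It invokes the system call directly through syscall() so that the arguments line up with the prototype above; the wrapper remap_pages(), the function demo_remap(), and the temporary file are invented for the example. It assumes a shared mapping, passes size in bytes and pgoff in pages, and passes 0 for prot and flags. Kernels that do not offer the call will simply make the demo report that fact.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper with the same argument list as the prototype. */
static long remap_pages(unsigned long start, unsigned long size,
                        unsigned long prot, unsigned long pgoff,
                        unsigned long flags)
{
    return syscall(SYS_remap_file_pages, start, size, prot, pgoff, flags);
}

/* Hypothetical demo. Returns 0 if the window was rearranged as expected,
 * 1 if the running kernel refuses the call, -1 on any other failure. */
static int demo_remap(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t psz = (size_t)page;

    /* Temporary file with four pages filled with 'A', 'B', 'C', 'D'. */
    char path[] = "/tmp/remap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    char *buf = malloc(psz);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < 4; i++) {
        memset(buf, 'A' + i, psz);
        if (write(fd, buf, psz) != (ssize_t)psz) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);

    /* One linear, shared mapping: a single VMA covering the whole file. */
    char *win = mmap(NULL, 4 * psz, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (win == MAP_FAILED)
        return -1;

    int rc;
    /* Swap the ends: window page 0 shows file page 3 and vice versa. */
    if (remap_pages((unsigned long)win, psz, 0, 3, 0) != 0 ||
        remap_pages((unsigned long)(win + 3 * psz), psz, 0, 0, 0) != 0)
        rc = 1;
    else if (win[0] == 'D' && win[psz] == 'B' &&
             win[2 * psz] == 'C' && win[3 * psz] == 'A')
        rc = 0;
    else
        rc = -1;

    munmap(win, 4 * psz);
    return rc;
}
```

The window is set up with a single mmap() call; only the two remap calls are needed to turn it into a nonlinear view of the file.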
remap_file_pages(), as found in the 2.5.64 kernel, only has one
little problem: the remapping information is lost if the page is swapped
out. Users must thus either lock the area in memory (which is generally
not a problem for the "big database management system" scenario, which
tends to perform this locking anyway), or take pains to reestablish the
mapping on swapin. Ingo's latest patch
clears up that last bit of trouble by storing the mapping information into
the page table entry when a page is swapped out. On 32-bit systems, this
technique limits the maximum size of a nonlinear mapping to 1-2TB
(depending on the architecture) because some of the PTE bits are not
available for this use. Given the trouble most 32-bit systems have in
simply addressing that much memory, this limitation is not likely to bother
too many people.
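The arithmetic behind that limit can be sketched as follows. The bit counts used here are not taken from the patch; they are simply what the 1-2TB figures imply if one assumes 4KB pages, in which case 28 or 29 bits of the PTE would be left to hold the file page offset. The function name is likewise made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest file span, in bytes, that can be described
 * when offset_bits of a PTE hold a page offset and pages are
 * (1 << page_shift) bytes long. */
static uint64_t max_nonlinear_bytes(unsigned offset_bits, unsigned page_shift)
{
    assert(offset_bits + page_shift < 64);
    return (uint64_t)1 << (offset_bits + page_shift);
}

/* Assuming 4KB pages (page_shift == 12):
 *   28 offset bits give 2^40 bytes, or 1TB;
 *   29 offset bits give 2^41 bytes, or 2TB. */
```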
For now, it is not possible to change protections within a single VMA (the
prot parameter to remap_file_pages() is ignored). At
some future point, that could change. Some applications (e.g. memory
debuggers) currently struggle to control memory protection in a
fine-grained manner. Being able to simply set protections on a per-page
basis (without creating new VMAs) would make things much easier.
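To see why per-page protection is awkward today, consider the usual tool for the job, mprotect(). The sketch below (not from the patch; the function names are invented) changes the protection of the middle page of a three-page anonymous mapping and counts the lines in /proc/self/maps before and after. Each line there corresponds to one VMA, so the difference shows how many VMAs the change cost.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the VMAs of this process: one line per VMA in /proc/self/maps.
 * Uses read() and a stack buffer so that counting allocates no memory
 * and therefore creates no new mappings of its own. */
static int count_vmas(void)
{
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    int lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    close(fd);
    return n < 0 ? -1 : lines;
}

/* Hypothetical demo: returns how many VMAs were added by write-protecting
 * a single page in the middle of a mapping, or -1 on error. */
static int vma_growth_after_mprotect(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t psz = (size_t)page;

    char *area = mmap(NULL, 3 * psz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return -1;

    int before = count_vmas();
    /* The middle page now differs from its neighbors, so the kernel
     * has to split the area into separate VMAs. */
    int rc = mprotect(area + psz, psz, PROT_READ);
    int after = count_vmas();

    munmap(area, 3 * psz);
    if (rc != 0 || before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A memory debugger that wants guard pages around every allocation pays this price for each one, which is why protections stored per page rather than per VMA would be attractive.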