The object-based reverse-mapping VM

[Posted February 25, 2003 by corbet]

The reverse-mapping VM (RMAP) was merged into 2.5 to solve a specific problem: there was no easy way for the kernel to find out which page tables referred to a given physical page. Certain activities (swapping being at the top of the list) require making changes to all relevant page tables. You simply cannot swap a page to disk until all of the page table entries pointing to it have been invalidated. The 2.4 kernel handles swapping by scanning through the page tables, one process at a time, and invalidating entries for pages that look like suitable victims. If it happens to find all of the page table entries in time, the page can then be evicted to disk.
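Why is the forward direction so expensive? The following deliberately simplified C model shows it. Each toy process owns a flat, single-level page table. Without any reverse mapping, the only way to find every entry that points at one frame is to visit every entry of every process. The names (`toy_pte`, `toy_process`, `scan_and_unmap`) are hypothetical. The real 2.4 code is more opportunistic, since it unmaps victims as it encounters them during its scan. So read this as an illustration of the cost, not as the 2.4 algorithm.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PTES 16   /* toy assumption: each process maps at most 16 pages */

/* Toy page table entry: a frame number plus a present bit. */
struct toy_pte {
    unsigned long frame;
    int present;
};

/* Toy process: nothing but a flat, single-level page table. */
struct toy_process {
    struct toy_pte ptes[TOY_NR_PTES];
};

/* With no reverse mappings, finding every PTE that maps 'frame' means
   scanning all page tables of all processes. Each matching PTE is
   invalidated; the return value is how many were found. */
int scan_and_unmap(struct toy_process *procs, size_t nprocs, unsigned long frame)
{
    int cleared = 0;

    assert(procs != NULL || nprocs == 0);
    for (size_t p = 0; p < nprocs; p++) {
        for (size_t i = 0; i < TOY_NR_PTES; i++) {
            struct toy_pte *pte = &procs[p].ptes[i];
            if (pte->present && pte->frame == frame) {
                pte->present = 0;
                cleared++;
            }
        }
    }
    return cleared;
}
```

The work grows with the total number of page table entries in the system, not with the number that actually map the victim page. That cost is the problem RMAP was meant to remove.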

In 2.5, a new data structure was added to make this process easier. Initially each page in the system (as represented by its struct page structure in the system memory map) had a linked list of reverse mapping entries pointing to every page table entry referencing that page. That worked, but it introduced some problems of its own. The reverse mapping entries took up a lot of memory, and quite a bit of time to maintain. Operations which required working with a lot of pages slowed down. And the fork() system call, which must add a new reverse mapping entry for every page in the process's address space, slowed significantly. As a result, there has been an ongoing effort to mitigate RMAP's costs.
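The per-page reverse mapping can be modeled with a short linked list hanging off each page. This is a sketch of the idea only. The structure and function names (`toy_page`, `rmap_entry`, `toy_add_rmap`, `toy_unmap_all`) are hypothetical, and the real 2.5 code differs in its details. It does show where the costs described above come from: every mapping allocates an entry, and fork() repeats that allocation once for every mapped page it copies.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy page table entry: a frame number plus a present bit. */
struct toy_pte {
    unsigned long frame;
    int present;
};

/* One reverse-mapping entry, pointing back at a PTE that maps the page. */
struct rmap_entry {
    struct toy_pte *pte;
    struct rmap_entry *next;
};

/* Simplified stand-in for struct page: a frame and its chain of entries. */
struct toy_page {
    unsigned long frame;
    struct rmap_entry *chain;
};

/* Point 'pte' at 'page' and record the reverse mapping.
   Every new mapping (a fault, or each page copied by fork()) pays for
   one allocation here. Returns 0 on success, -1 if out of memory. */
int toy_add_rmap(struct toy_page *page, struct toy_pte *pte)
{
    struct rmap_entry *e;

    assert(page && pte);
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    pte->frame = page->frame;
    pte->present = 1;
    e->pte = pte;
    e->next = page->chain;
    page->chain = e;
    return 0;
}

/* Before the page can be swapped out, follow the chain straight to every
   PTE that maps it and invalidate each one. Returns the number cleared. */
int toy_unmap_all(struct toy_page *page)
{
    int cleared = 0;
    struct rmap_entry *e = page->chain;

    while (e) {
        struct rmap_entry *next = e->next;
        e->pte->present = 0;
        free(e);
        cleared++;
        e = next;
    }
    page->chain = NULL;
    return cleared;
}
```

Unmapping becomes a direct walk over exactly the PTEs that matter. The price is memory and bookkeeping proportional to the number of mappings in the system.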

Now a new technique, as embodied in this patch by Dave McCracken, has been proposed. This approach, called "object-based reverse mapping," is based on the realization that, in some cases at least, there are other paths from a struct page to a page table entry. If those paths can be used, the full RMAP overhead is unnecessary and can be cut out.

By one reckoning, there are two basic types of user-mode page in a Linux system. Anonymous pages are just plain memory, the kind a process would get from malloc(). Most other pages are file-backed in some way; this means that, behind the scenes, the contents of that page are associated with a file somewhere in the system. File-backed pages include program code and files mapped in with mmap(). For these pages, it is possible to find their page table entries without using RMAP entries. To see how, let us refer to the following low-quality graphic, the result of your editor's nonexistent drawing skills:

The struct page structure for a given page is in the upper left corner. One of the fields of that structure is called mapping; it points to an address_space structure describing the object which backs up that page. That structure includes the inode for the file, various data structures for managing the pages belonging to the file, and two linked lists (i_mmap and i_mmap_shared) containing the vm_area_struct structures for each process which has a mapping into the file. The vm_area_struct (usually called a "VMA") describes how the mapping appears in a particular process's address space; the file /proc/pid/maps lists out the VMAs for the process with ID pid. The VMA provides the information needed to find out what a given page's virtual address is in that process's address space, and that, in turn, can be used to find the correct page table entry.
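The step "the VMA provides the information needed to find out what a given page's virtual address is" comes down to a little arithmetic. A VMA records where it starts in the process (`start`) and which file offset is mapped there. A file page's position in its file (which the kernel keeps in page units in `page->index`) then fixes its virtual address. The sketch below reads those fields from a `/proc/pid/maps` line, whose third column is the file offset in bytes. It assumes 4 KiB pages, and the helper names `parse_maps_line` and `file_page_to_vaddr` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* The fields of one /proc/pid/maps line that matter here. */
struct maps_vma {
    unsigned long start, end;  /* virtual range [start, end) */
    unsigned long offset;      /* file offset mapped at 'start', in bytes */
};

/* Parse the leading "start-end perms offset" part of a maps line.
   Returns 1 on success, 0 otherwise. */
int parse_maps_line(const char *line, struct maps_vma *vma)
{
    char perms[5];
    return sscanf(line, "%lx-%lx %4s %lx",
                  &vma->start, &vma->end, perms, &vma->offset) == 4;
}

/* Given a page's index within the file (in pages), compute the virtual
   address at which this VMA maps it. Returns 1 and sets *addr if the
   page falls inside the VMA, 0 if it does not. */
int file_page_to_vaddr(const struct maps_vma *vma, unsigned long pgindex,
                       unsigned long *addr)
{
    unsigned long pgoff, a;

    assert(vma->offset % TOY_PAGE_SIZE == 0);  /* mappings are page-aligned */
    pgoff = vma->offset / TOY_PAGE_SIZE;       /* first file page in the VMA */
    if (pgindex < pgoff)
        return 0;
    a = vma->start + (pgindex - pgoff) * TOY_PAGE_SIZE;
    if (a >= vma->end)
        return 0;
    *addr = a;
    return 1;
}
```

For example, take a VMA that starts at 0x400000 with a file offset of 0x2000 (file page 2). File page 5 in that VMA lives at 0x403000. Once the virtual address is known, the usual page-table walk for that process leads to the entry.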

So all the object-based RMAP patch does is remove the direct reverse mapping entry (pointing from the page structure directly to the page table entry). When it is necessary to find that entry, the virtual memory subsystem simply takes the longer way around, via the address_space and vm_area_struct structures. Finding a page table entry this way certainly will take longer than following a direct pointer, but it should come out cheaper when one considers all of the RMAP information that no longer needs to be maintained.
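The "longer way around" can be sketched as a loop over the VMAs hanging off a page's address_space. No per-page chain is needed. This toy model is an illustration under stated assumptions and not the patch itself. Each process has a flat page table instead of the kernel's multi-level one, locking is ignored, and all names (`toy_mm`, `toy_vma`, `toy_unmap_file_page`, and so on) are hypothetical. The list names `i_mmap` and `i_mmap_shared` follow the article.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define TOY_NR_PTES    64   /* toy page table covers 64 virtual pages */

struct toy_pte { unsigned long frame; int present; };

/* Toy process address space: a flat, single-level page table. */
struct toy_mm { struct toy_pte ptes[TOY_NR_PTES]; };

struct toy_vma {
    struct toy_mm *mm;                   /* the process this VMA belongs to */
    unsigned long vm_start, vm_end;      /* virtual range [start, end) */
    unsigned long vm_pgoff;              /* file page mapped at vm_start */
    struct toy_vma *next;                /* next VMA mapping the same file */
};

struct toy_address_space {
    struct toy_vma *i_mmap;              /* private mappings of the file */
    struct toy_vma *i_mmap_shared;       /* shared mappings of the file */
};

struct toy_page {
    unsigned long frame;                 /* physical frame holding the page */
    struct toy_address_space *mapping;   /* the file that backs the page */
    unsigned long index;                 /* page offset within the file */
};

/* Walk one VMA list, compute where each VMA would map the page, and clear
   the PTE there if it really points at this frame. */
static int unmap_in_list(struct toy_page *page, struct toy_vma *vma)
{
    int cleared = 0;

    for (; vma; vma = vma->next) {
        unsigned long addr, vpn;

        if (page->index < vma->vm_pgoff)
            continue;                    /* page lies before this VMA */
        addr = vma->vm_start + ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
        if (addr >= vma->vm_end)
            continue;                    /* page lies after this VMA */
        vpn = addr >> TOY_PAGE_SHIFT;
        assert(vpn < TOY_NR_PTES);       /* toy table is tiny */
        /* A private mapping may have copied the page on write, so check
           that the PTE still maps this very frame before clearing it. */
        if (vma->mm->ptes[vpn].present && vma->mm->ptes[vpn].frame == page->frame) {
            vma->mm->ptes[vpn].present = 0;
            cleared++;
        }
    }
    return cleared;
}

/* Object-based unmap: page -> address_space -> VMAs -> PTEs. */
int toy_unmap_file_page(struct toy_page *page)
{
    return unmap_in_list(page, page->mapping->i_mmap) +
           unmap_in_list(page, page->mapping->i_mmap_shared);
}
```

In this sketch, the cost of an unmap grows with the number of VMAs on the two lists rather than with the number of PTEs that actually map the page. That is the trade the article describes: a slower lookup in exchange for no per-mapping bookkeeping.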

The object-based RMAP patch does not change the handling of anonymous pages, which do not have an associated address_space structure.

Martin Bligh has posted some initial benchmarks showing some moderate improvement in the all-important kernel compilation test. The object-based approach does seem to help with some of the worst RMAP performance regressions. Andrew Morton pointed out a worst-case performance scenario for this approach, but it is not clear how big a problem it would really be. Andrew has included this patch in his 2.5.62-mm3 tree.

Assuming that this patch goes in (it's late in the development process, but that hasn't stopped Linus from taking rather more disruptive VM patches before...), one might wonder if a complete object-based implementation might follow. The answer is "probably not." Anonymous pages tend to be private to individual processes, so there is no long chain of reverse mappings to manage in any case. So even if such pages came to look like file-backed pages (as could happen, say, with a rework of the swapping code), there isn't necessarily much to be gained from the object-based approach.


Posted Feb 27, 2003 19:44 UTC (Thu) by giraffedata (guest, #1954)

Just a comment on clear terminology:

The professor who first taught me what virtual memory is emphasized that we must never confuse pages with page frames, and over the years I've seen his wisdom. A page is a bunch of bytes of data. A page frame is an area of real memory that can hold a page.

A virtual memory system has a lot more pages than it has page frames.

A Linux 'struct page' describes a page frame, not a page. A page table entry, among other things, describes a page.


Posted Feb 28, 2003 9:09 UTC (Fri) by IkeTo (subscriber, #2122)

>A Linux 'struct page' describes a page frame, not a page. A page table entry,
>among other things, describes a page.

I teach operating systems as well, so I'm the kind of person who insists that people use the right terminology. ;p

A page table entry describes *both* a page and a frame, so the above statement is quite misleading. In particular, a page table is a mapping from pages to frames for those pages that are actually in physical memory. For pages that do not yet have a frame in physical memory, the page table says nearly nothing about them. So it is more a frame thing than a page thing.
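The "mapping from pages to frames" can be shown with a toy translation routine. A virtual address splits into a virtual page number and an offset within the page. The page number indexes the table, and a present entry supplies the frame. This is a minimal sketch under assumptions: 4 KiB pages, a flat single-level table of 16 entries, and the hypothetical name `toy_translate`. Real hardware tables are multi-level.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12                              /* assumes 4 KiB pages */
#define TOY_PAGE_MASK  ((1UL << TOY_PAGE_SHIFT) - 1)   /* offset-within-page bits */
#define TOY_NR_PAGES   16                              /* toy address space: 16 pages */

struct toy_pte { unsigned long frame; int present; };

/* Translate a virtual address through a flat page table indexed by
   virtual page number. Returns 1 and the physical address if the page
   has a frame; returns 0 (a page fault) if it does not. */
int toy_translate(const struct toy_pte *table, unsigned long vaddr,
                  unsigned long *paddr)
{
    unsigned long vpn = vaddr >> TOY_PAGE_SHIFT;   /* which page */
    unsigned long off = vaddr & TOY_PAGE_MASK;     /* byte within the page */

    assert(table);
    if (vpn >= TOY_NR_PAGES || !table[vpn].present)
        return 0;   /* no frame: the table has nothing more to say */
    *paddr = (table[vpn].frame << TOY_PAGE_SHIFT) | off;
    return 1;
}
```

If page 3 is mapped to frame 7, virtual address 0x3abc translates to physical address 0x7abc. An access to an unmapped page simply fails, which is where the logical side of the memory descriptor has to take over.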

In Linux, the memory of a process is described by an "mm_struct" memory descriptor. The descriptor has two important parts: the physical part, which includes the page table described above (about the frames), and the logical part, which includes the "VM area structs". A vm_area_struct is what really knows what to do when a page that has no frame is accessed. In particular, it knows whether the page is mapped to a disk file, so that its contents must be loaded from disk; or whether the page is mapped to a device, so that the device driver must be called to generate the page contents; or whether the page is copy-on-write, which means that the kernel must make a copy of the read-only frame when it is written to; etc.
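The following sketch illustrates the idea that the VMA, not the page table, decides how a fault is resolved. It is hypothetical throughout: the enums, `toy_vma`, and `toy_handle_fault` are invented, and the kernel actually dispatches through a per-VMA operations table (vm_operations_struct) rather than a switch on a backing type. Zero-filling fresh anonymous memory is included as the usual default case.

```c
#include <assert.h>

/* Hypothetical, simplified view of what a VMA knows about its pages. */
enum toy_backing { TOY_FILE, TOY_DEVICE, TOY_ANON };

enum toy_action {
    TOY_READ_FROM_FILE,   /* load the page contents from the mapped file */
    TOY_CALL_DRIVER,      /* ask the device driver to supply the page */
    TOY_ZERO_FILL,        /* fresh anonymous memory starts out zeroed */
    TOY_COPY_ON_WRITE     /* copy the read-only frame before the write */
};

struct toy_vma {
    enum toy_backing backing;
    int private_writable; /* private mapping: writes must not reach a shared frame */
};

/* Decide how to handle a fault in this VMA. 'has_frame' says whether the
   PTE already maps a (read-only) frame; 'write' says whether the access
   is a write. */
enum toy_action toy_handle_fault(const struct toy_vma *vma, int has_frame,
                                 int write)
{
    if (has_frame) {
        /* The only fault on a present page modeled here: a write to a
           copy-on-write page. */
        assert(write && vma->private_writable);
        return TOY_COPY_ON_WRITE;
    }
    switch (vma->backing) {
    case TOY_FILE:
        return TOY_READ_FROM_FILE;
    case TOY_DEVICE:
        return TOY_CALL_DRIVER;
    default:
        return TOY_ZERO_FILL;
    }
}
```

The page table only answers "is there a frame?". Every other decision comes from the VMA that covers the faulting address.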

Of course, there is the second half of the story, which is about swap. Here the page table really does hold the relevant information: the "swapped-out page identifier", which records where in swap the page is stored. If you treat swap as an extension of the physical frames, it is natural for this to be stored in the page table, because it is information about the physical frames (indeed, their extension), not about the logical page.
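Here is one way a "swapped-out page identifier" can live in a page table entry. When the present bit is clear, the hardware ignores the other bits, so software can reuse them to record which swap area holds the page and at which slot. The 32-bit layout below is invented for illustration and is not any real architecture's format. The helper names (`toy_mk_swap_pte`, `toy_swp_type`, `toy_swp_offset`) are hypothetical, modeled loosely on the kernel's swap-entry helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit PTE layout, for illustration only:
 *   present:     bit 0 = 1, remaining bits hold the frame number
 *   not present: bit 0 = 0, bits 1..5 = swap type (which swap area),
 *                bits 6..31 = offset (slot within that swap area)
 */
#define TOY_PTE_PRESENT       0x1u
#define TOY_SWP_TYPE_SHIFT    1
#define TOY_SWP_TYPE_BITS     5
#define TOY_SWP_OFFSET_SHIFT  (TOY_SWP_TYPE_SHIFT + TOY_SWP_TYPE_BITS)

/* Build a non-present PTE recording where the page went in swap. */
uint32_t toy_mk_swap_pte(uint32_t type, uint32_t offset)
{
    assert(type < (1u << TOY_SWP_TYPE_BITS));
    assert(offset < (1u << (32 - TOY_SWP_OFFSET_SHIFT)));
    /* The present bit stays 0, so any access to the page faults. */
    return (type << TOY_SWP_TYPE_SHIFT) | (offset << TOY_SWP_OFFSET_SHIFT);
}

int toy_pte_present(uint32_t pte)
{
    return (pte & TOY_PTE_PRESENT) != 0;
}

/* Recover the swap area and slot from a non-present PTE. */
uint32_t toy_swp_type(uint32_t pte)
{
    return (pte >> TOY_SWP_TYPE_SHIFT) & ((1u << TOY_SWP_TYPE_BITS) - 1);
}

uint32_t toy_swp_offset(uint32_t pte)
{
    return pte >> TOY_SWP_OFFSET_SHIFT;
}
```

On a fault, the handler sees the clear present bit and decodes the type and offset. It can then read the page back from the right swap slot, without consulting anything but the page table entry itself.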


Posted Mar 8, 2003 12:35 UTC (Sat) by thefly (guest, #10011)

If I understand correctly, this way the rmap code isn't necessary anymore, so the initial merge of the -rmap code is now useless. Do I understand correctly?