Rik van Riel discussed the reverse mapping VM subsystem. The first part of
the talk reviewed the motivations and mechanisms of rmap, which have been
covered elsewhere. The real discussion centered around two areas.
The first had to do with locking to protect the rmap chains. The initial
implementation used a global lock, which, of course, did not perform
particularly well. A finer-grained locking scheme has since been
implemented, but it is built on bit operations. That
implementation also does not perform as well as one might like on some
architectures, since the required atomic operations can be expensive. The
next step appears to be a move to a hashed spinlock scheme.
Hashed spinlocks are a compromise between the use of a single lock (which
has big contention problems) and using a lock for each rmap chain (which is
a lot of locks). An intermediate number of locks can be set up,
with the hashing scheme deciding which lock protects each chain. Hashed
spinlocks are popping up in a few places in the kernel.
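As a rough illustration of the idea, the following user-space sketch hashes an object's address into a fixed table of locks. Everything in it is an assumption made for the example: the table size, the multiplicative hash, the toy `struct rmap_chain`, and the use of pthread mutexes as a stand-in for kernel spinlocks. It is not the rmap code, nor any kernel's actual hashed lock implementation.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size: 2^6 = 64 locks shared by all chains. */
#define HASHED_LOCK_BITS 6
#define NUM_HASHED_LOCKS (1u << HASHED_LOCK_BITS)

static pthread_mutex_t hashed_locks[NUM_HASHED_LOCKS];

/* Must be called once before any lock is taken. */
void hashed_locks_init(void)
{
    for (size_t i = 0; i < NUM_HASHED_LOCKS; i++)
        pthread_mutex_init(&hashed_locks[i], NULL);
}

/* Map an object's address to a slot in the lock table.
 * The multiplier is an arbitrary odd 64-bit constant chosen for mixing;
 * the top HASHED_LOCK_BITS bits of the product select the slot. */
size_t hashed_lock_index(const void *obj)
{
    uint64_t h = (uint64_t)(uintptr_t)obj * 0x9E3779B97F4A7C15ULL;
    return (size_t)(h >> (64 - HASHED_LOCK_BITS));
}

pthread_mutex_t *hashed_lock_for(const void *obj)
{
    return &hashed_locks[hashed_lock_index(obj)];
}

/* Toy stand-in for an rmap chain: it only counts its mappings. */
struct rmap_chain {
    int nr_mappings;
};

/* Update a chain under whichever hashed lock covers it. */
void rmap_chain_add(struct rmap_chain *chain)
{
    pthread_mutex_t *lock = hashed_lock_for(chain);

    pthread_mutex_lock(lock);
    chain->nr_mappings++;
    pthread_mutex_unlock(lock);
}
```

Two unrelated chains may hash to the same lock. That costs some contention but not correctness, as long as no code path tries to hold the locks for two chains at once without a defined ordering.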
In fact, there are enough hashed spinlock implementations that Linus broke
in to request that rmap not create another one. Instead, Linus would like
to see a single, global hashed spinlock implementation that could be used
wherever hashed spinlocks make sense. Some concerns were raised about
preserving lock ordering rules with a shared spinlock arrangement, but the
feeling seems to be that any such problems can be dealt with.
The other topic of interest was the object-based reverse mapping scheme.
Objrmap attempts to eliminate the reverse mapping pointer overhead for at
least some kinds of virtual memory area by taking advantage of information
available in the VMA structures; it was covered in LWN last February. This scheme works
in a number of situations, but it has not yet been merged - meaning that it
is unlikely to appear in 2.6 at this point.
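A toy model may help show what "information available in the VMA structures" buys. The sketch below assumes each VMA maps a contiguous, in-order run of file pages, so the virtual address of a given file page can be computed from the VMA's start address and file offset instead of being stored per page. The `toy_` types and functions are hypothetical names invented for this example; only the field names `vm_start`, `vm_end` and `vm_pgoff` are borrowed from the kernel's VMA structure.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Hypothetical, stripped-down VMA: one mapping of a file object. */
struct toy_vma {
    unsigned long vm_start;     /* first virtual address of the mapping */
    unsigned long vm_end;       /* first address past the mapping */
    unsigned long vm_pgoff;     /* file offset of vm_start, in pages */
    struct toy_vma *next_share; /* next VMA mapping the same object */
};

/* Hypothetical file object holding the list of VMAs that map it. */
struct toy_object {
    struct toy_vma *vmas;
};

/* If this VMA maps file page `index`, store its virtual address in
 * *addr and return 1; otherwise return 0. */
int toy_vma_address(const struct toy_vma *vma, unsigned long index,
                    unsigned long *addr)
{
    unsigned long npages = (vma->vm_end - vma->vm_start) / TOY_PAGE_SIZE;

    if (index < vma->vm_pgoff || index - vma->vm_pgoff >= npages)
        return 0;
    *addr = vma->vm_start + (index - vma->vm_pgoff) * TOY_PAGE_SIZE;
    return 1;
}

/* Collect up to `max` virtual addresses at which file page `index`
 * is mapped, by walking every VMA attached to the object. */
size_t toy_find_mappings(const struct toy_object *obj, unsigned long index,
                         unsigned long *out, size_t max)
{
    size_t n = 0;
    unsigned long addr;

    for (const struct toy_vma *v = obj->vmas; v && n < max; v = v->next_share)
        if (toy_vma_address(v, index, &addr))
            out[n++] = addr;
    return n;
}
```

In this model no per-page pointer is stored; the price is a walk over every VMA attached to the object for each lookup, and the address arithmetic holds only while the pages inside a VMA appear in file order.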
Rik pointed out some of the concerns about objrmap; for some situations, it
can be vastly slower than the rmap scheme it replaces. There are also some
denial of service scenarios with objrmap. Most of the worst
problems have been fixed, however. One remaining problem involves file
truncation: if a truncate operation races with a page fault, the
result can be an unswappable page. There is apparently a fix for this.
The harder problem is how to deal with the remap_file_pages() system call
(covered here last March). remap_file_pages() creates a nonlinear virtual
memory area, which messes up the objrmap mechanism. The problem can be
fixed, but it involves adding some ugly hacks to the VM subsystem.
Rik observed that remap_file_pages() is really only used by a
couple of applications - it is a highly specialized system call. Those
applications also tend to lock the remapped file pages into physical
memory. If you add a requirement that memory areas using
remap_file_pages() always be locked in this manner, there is no
need to maintain reverse mapping information for those areas - they cannot
be swapped out, after all. Rather than mess up the VM for an obscure case,
wouldn't it be better to change the rules and make the problem go away?
There was some discussion of the issue, and it was pointed out that it
needs to be made easier for ordinary user processes to lock pages into
memory. In the end, though, it looks like the requirement will be added.
Now is the time to make such a change, given that
remap_file_pages() has not yet appeared in a stable kernel.