2003 Kernel Summit: VM Topics
[Posted July 22, 2003 by corbet]
Martin Bligh talked about an assortment of virtual memory topics. The
first was Ingo Molnar's 4G/4G patch. He was
shocked, he says, by "how not-bad it is." The problem with this patch is
that it requires a translation lookaside buffer (TLB) flush every time the system
switches between user and kernel space. The cost of that flush, according
to Martin, is an 8% performance hit. Many of the techniques being employed
to deal with the low memory crunch (e.g. moving page tables to high memory)
have similar costs. So, asks Martin, might it not be better to just go
with the 4G/4G approach and be done with it?
Linus is willing to consider the idea, though he expects the
distributors to apply the patch themselves for a while first. Not
everybody accepts the 8% figure, however. Others, with different
processors and workloads, have seen an impact closer to 30%. This patch
will require quite a bit more study before it finds itself in a mainline
kernel.
Martin talked a bit about looking for better page replacement algorithms.
The current least recently used scheme does not yield optimum results -
though decades of virtual memory research have shown that no other
algorithm does either. Martin talked about having processes page against
themselves so that no single process can take over the system; this
technique was employed by VMS over two decades ago, but don't tell the
kernel hackers that. The real problem with this scheme, as noted by Linus,
is how you balance the resident set sizes of the various processes on the
system. One idea suggested by Martin was to track page faults and see how
many of them correspond to pages which were recently kicked out of memory;
the system could then try to equalize that rate across the system. In the
end, though, there really is no way to get it truly right.
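
The refault-tracking idea can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not anything shown at the summit: each process pages against its own LRU resident set, remembers a short history of the pages it recently evicted, and counts faults that hit that history. The names (Process, rebalance, the history length) and the policy of moving a single frame at a time are hypothetical choices made for illustration.

```python
from collections import OrderedDict, deque


class Process:
    """Hypothetical per-process state: an LRU resident set plus a short
    history of recently evicted pages. Names are illustrative only."""

    def __init__(self, name, frames, history=8):
        self.name = name
        self.frames = frames                  # resident set size limit
        self.resident = OrderedDict()         # page -> None, oldest first
        self.evicted = deque(maxlen=history)  # recently evicted pages
        self.faults = 0
        self.refaults = 0                     # faults on recently evicted pages

    def touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)   # hit: now most recently used
            return
        self.faults += 1
        if page in self.evicted:
            self.refaults += 1                # this page was evicted too early
            self.evicted.remove(page)
        self.resident[page] = None
        self.shrink_to_limit()

    def shrink_to_limit(self):
        # Page against ourselves: evict our own oldest pages, nobody else's.
        while len(self.resident) > self.frames:
            victim, _ = self.resident.popitem(last=False)
            self.evicted.append(victim)


def rebalance(procs):
    """Move one frame from the process with the fewest refaults to the
    process with the most, then start a new measurement interval."""
    needy = max(procs, key=lambda p: p.refaults)
    donor = min(procs, key=lambda p: p.refaults)
    if needy.refaults > donor.refaults and donor.frames > 1:
        donor.frames -= 1
        needy.frames += 1
        donor.shrink_to_limit()
    for p in procs:
        p.refaults = 0
```

A process cycling through three pages with only two frames refaults constantly, while a process touching a single page never does; one call to rebalance() shifts a frame from the second to the first. The sketch also shows why the balancing problem is hard: the history length, the interval between rebalances, and the number of frames to move are all arbitrary knobs.
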
Martin then suggested that, perhaps, the time has come to replace the
kernel's page allocator, which is based on the "buddy system." That was a
hard sell, however; few people are convinced that it is possible to write a
better allocator with decent performance. The only possible next step in
this area would be to show some code.
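
For readers who have not looked at a buddy allocator, here is a toy version of the technique. It is a minimal sketch, not the kernel's implementation: memory is divided into blocks of 2^k pages, a request splits a larger block in half until the right size is reached, and a freed block is merged with its "buddy" whenever the buddy is also free. The class and method names are invented for this example.

```python
class BuddyAllocator:
    """Toy buddy allocator over page frame numbers (not the kernel's code)."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free[k] holds the starting frame of every free block of 2**k pages.
        self.free = [set() for _ in range(max_order + 1)]
        self.free[max_order].add(0)

    def alloc(self, order):
        """Return the first frame of a block of 2**order pages, or None."""
        k = order
        while k <= self.max_order and not self.free[k]:
            k += 1                      # look for a larger block to split
        if k > self.max_order:
            return None
        start = min(self.free[k])
        self.free[k].remove(start)
        while k > order:
            k -= 1
            # Keep the lower half, put the upper half back on the free list.
            self.free[k].add(start + (1 << k))
        return start

    def release(self, start, order):
        """Free a block and coalesce it with its buddy as far as possible."""
        while order < self.max_order:
            buddy = start ^ (1 << order)   # buddy differs in exactly one bit
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free[order].add(start)
```

The appeal of the scheme is that finding a block's buddy is a single XOR, so splitting and merging are cheap. Any replacement would have to match that cost.
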
Finally, Martin talked about rearranging the layout of user-space memory.
Currently, the stack is placed up at the high end of the address range;
Martin would like to move it into the 120MB hole at the bottom of the
address space. Among other things, POSIX apparently specifies that the
stack should be down there. Moving the stack would also allow the system
to allocate mmap() regions from the top of the address space,
yielding a bit more flexibility in how that space is used.
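
Some rough arithmetic shows where the extra flexibility would come from. The constants below are assumptions about a typical 32-bit x86 system with a 3G/1G user/kernel split (user space ending at 0xC0000000, program text loaded at 0x08048000, and bottom-up mmap() searches starting at one third of user space). None of them were quoted in the session, and the computed hole size only approximately matches the figure Martin gave.

```python
MB = 1 << 20

# Assumed values for 32-bit x86 Linux with a 3G/1G split. These are
# illustrative and were not part of the summit discussion.
TASK_SIZE = 0xC0000000       # top of user space, where the stack sits today
TEXT_START = 0x08048000      # conventional ELF load address
MMAP_BASE = TASK_SIZE // 3   # where bottom-up mmap() searches begin


def low_hole_mb():
    """Size of the unused region below the program text."""
    return TEXT_START / MB


def heap_window_mb():
    """Current layout: the heap can only grow until it reaches MMAP_BASE."""
    return (MMAP_BASE - TEXT_START) / MB


def shared_window_mb():
    """Proposed layout: with the stack in the low hole and mmap() working
    down from the top, the heap and mmap() regions share one large window."""
    return (TASK_SIZE - TEXT_START) / MB


if __name__ == "__main__":
    print(f"hole below text: {low_hole_mb():.2f} MB")
    print(f"heap window now: {heap_window_mb():.2f} MB")
    print(f"shared window:   {shared_window_mb():.2f} MB")
```

Under these assumptions the heap is currently capped at a bit under 900MB regardless of how little mmap() space a program uses. In the rearranged layout the heap and mmap() areas would grow toward each other and divide up nearly 3GB however the program needs.
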
There wasn't a whole lot of opposition to moving the stack, though there
was some concern that some programs - especially scientific applications -
actually need more than 120MB of stack space. So a relocation of the stack
would have to be accompanied by an option that would allow it to stay in
its old place for programs that need it there.