From: Ulrich Drepper <drepper-AT-redhat.com>
To: Blaisorblade <blaisorblade-AT-yahoo.it>
Subject: Re: [patch 00/14] remap_file_pages protection support
Date: Sat, 06 May 2006 09:05:29 -0700
Cc: Nick Piggin <nickpiggin-AT-yahoo.com.au>,
    Andrew Morton <akpm-AT-osdl.org>, linux-kernel-AT-vger.kernel.org,
    Linux Memory Management <linux-mm-AT-kvack.org>,
    Val Henson <val.henson-AT-intel.com>
> I've not seen the numbers indeed, I've been told of a problem with a "customer
> program" and Ingo connected my work with this problem. Frankly, I've been
> always astonished about how looking up a 10-level tree can be slow. Poor
> cache locality is the only thing that I could think about.
It might be good if I explain a bit how much we use mmap in libc. The
numbers can really add up quickly.
- for each loaded DSO we might have up to 5 VMAs today:
1. text segment, protection r-xp, normally never written to
2. a gap, protection ---p (i.e., no access)
3. a relro data segment, protection r--p
4. a data segment, protection rw-p
5. a bss segment, anonymous memory
The first four are mapped from the file. In fact, the mapping of the
first segment "allocates" the entire address space of all segments, even
if that is longer than the file.
Then the gap is created using mprotect(PROT_NONE). Then the area for
segments 3 and 4 is mapped in one mmap() call. It's in the same file but
the offset used in this mmap() is not the same as the offset which
naturally is already established through the first mmap(). I.e., if the
first mmap() starts at offset 0 and continues for 1000 pages, the
gap might start at an offset of, say, 4 pages and continue for 500 pages.
Then the "natural" offset of the first data page would be 504 pages but
the second mmap() call would in fact use the offset 4 because the text
and data segments are contiguous in the _file_ (although not in memory).
Anyway, once relocations are done the protection of the relro segment is
changed, splitting the data segment in two.
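As a rough illustration of the mapping sequence just described, here is a
minimal C sketch for Linux. It is not the actual ld.so code: the function
names (map_dso_like, count_vmas_in_range), the tiny page counts and the use
of an ordinary file descriptor are assumptions made up for the example. The
text segment is mapped PROT_READ rather than r-x so that the sketch also
works on filesystems mounted noexec, and the bss segment is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout in pages, far smaller than the 1000/4/500 example. */
enum { TEXT_PAGES = 4, GAP_PAGES = 2, RELRO_PAGES = 1, DATA_PAGES = 1 };
enum { TOTAL_PAGES = TEXT_PAGES + GAP_PAGES + RELRO_PAGES + DATA_PAGES };

/* Count the VMAs in /proc/self/maps that overlap [start, start + len). */
static int count_vmas_in_range(const void *start, size_t len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    unsigned long lo = (unsigned long)(uintptr_t)start;
    unsigned long hi = lo + len;
    char *line = NULL;
    size_t cap = 0;
    int count = 0;

    while (getline(&line, &cap, maps) != -1) {
        unsigned long a, b;
        /* Each line starts with "start-end" in hex. */
        if (sscanf(line, "%lx-%lx", &a, &b) == 2 && a < hi && b > lo)
            count++;
    }
    free(line);
    fclose(maps);
    return count;
}

/* Map fd the way the email describes; returns the base address or NULL. */
static void *map_dso_like(int fd, size_t page)
{
    assert(page > 0);

    /* 1. One mapping from file offset 0 reserves the whole address range,
     *    even though it is longer than the file contents. */
    char *base = mmap(NULL, TOTAL_PAGES * page, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* 2. Turn the pages after the text segment into the inaccessible gap. */
    if (mprotect(base + TEXT_PAGES * page, GAP_PAGES * page, PROT_NONE) != 0)
        goto fail;

    /* 3. Map relro + data over the tail of the range. The file offset is
     *    TEXT_PAGES, not TEXT_PAGES + GAP_PAGES, because text and data are
     *    contiguous in the file. */
    char *data = base + (TEXT_PAGES + GAP_PAGES) * page;
    if (mmap(data, (RELRO_PAGES + DATA_PAGES) * page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED, fd,
             (off_t)(TEXT_PAGES * page)) == MAP_FAILED)
        goto fail;

    /* 4. After "relocation", make the relro part read-only. This is the
     *    mprotect() call that splits the data VMA in two. */
    if (mprotect(data, RELRO_PAGES * page, PROT_READ) != 0)
        goto fail;

    return base;

fail:
    munmap(base, TOTAL_PAGES * page);
    return NULL;
}
```

Called with a descriptor for a file of at least six pages, map_dso_like()
should leave four VMAs in the reserved range (text, gap, relro, data), which
count_vmas_in_range(base, TOTAL_PAGES * page) can confirm by reading
/proc/self/maps.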
So, for DSO loading there would be two steps of improvement:
1. if a mprotect() call wouldn't split the VMA we would have 3 VMAs in
the end instead of 5. 40% gain.
2. if I could use remap_file_pages() for the data segment mapping and
the call would allow changing the protection and it would not split the
VMAs, then we'd be down to 2 mappings. 60% down.
A second big VMA user is thread stacks. I think the application which
was mentioned in this thread briefly used literally thousands of
threads. Leaving aside the insanity of this (it's unfortunately how
many programmers work) this can create problems because we get at least
two (on IA-64 three) VMAs per thread. I.e., thread stack allocation
works like this:
1. allocate area big enough for stack and guard (we don't use automatic
growing, this cannot work)
2. change the protection of the guard end of the stack to PROT_NONE.
So, for say 1000 threads we'll end up with 2000 VMAs. Threads are also
important to mention here because
- often they are short-lived and we have to recreate them often. We
usually reuse stacks but only keep so much allocated memory around.
So more often than we like we actually free and later re-allocate stacks.
- these thousands of stack VMAs are really used concurrently. All
threads are woken over a period of time.
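The two-step stack allocation described above can be sketched in C roughly
as follows. This is not the libpthread implementation: the struct and
function names are hypothetical, both sizes are assumed to be multiples of
the page size, and the guard is placed at the low end of the mapping on the
assumption that the stack grows downwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for one stack plus its guard area. */
struct guarded_stack {
    void  *base;        /* start of the whole mapping, guard included */
    size_t total;       /* guard_size + stack_size */
    void  *usable;      /* lowest usable stack address, above the guard */
    size_t usable_size; /* equals stack_size */
};

/* Returns 0 on success, -1 on failure with errno set. */
static int guarded_stack_alloc(struct guarded_stack *s,
                               size_t stack_size, size_t guard_size)
{
    assert(s != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || stack_size == 0 ||
        stack_size % (size_t)page != 0 || guard_size % (size_t)page != 0) {
        errno = EINVAL;
        return -1;
    }

    /* Step 1: one mapping big enough for stack and guard (one VMA so far). */
    size_t total = stack_size + guard_size;
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Step 2: make the guard end inaccessible. The differing protection
     * splits the mapping, so the thread now costs two VMAs. */
    if (guard_size > 0 && mprotect(base, guard_size, PROT_NONE) != 0) {
        int saved = errno;
        munmap(base, total);
        errno = saved;
        return -1;
    }

    s->base = base;
    s->total = total;
    s->usable = base + guard_size;
    s->usable_size = stack_size;
    return 0;
}

static void guarded_stack_free(struct guarded_stack *s)
{
    assert(s != NULL);
    munmap(s->base, s->total);
}
```

The usable/usable_size pair is what one would hand to
pthread_attr_setstack() if the stack were given to a real thread. Repeating
the allocation for 1000 threads gives the roughly 2000 VMAs mentioned above.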
A third source of VMAs is anonymous memory allocation. mmap is used in
the malloc implementation and directly in various places. For
randomization reasons there isn't really much we can do here; we
shouldn't lump all these allocations together.
A fourth source of VMAs is the programs themselves, which mmap files.
Often these are read-only mappings of many small files.
Put all this together and non-trivial apps as written today (I don't say
they are high-quality apps) can easily have a few thousand, maybe even
10,000 to 20,000 VMAs. Firefox on my machine currently uses ~560
VMAs and this is with only a handful of threads. Are these the numbers
the VM system is optimized for? I think what our people running the
experiments at the customer site saw is that it's not. The VMA
traversal showed up on the profile lists.
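For anyone who wants to check such numbers on their own system, a small C
sketch like the following counts a process's VMAs on Linux. It relies on
the fact that /proc/<pid>/maps prints one line per VMA; the function name
count_vmas is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count VMAs by counting lines in a /proc/<pid>/maps style file.
 * Returns -1 if the file cannot be opened. */
static long count_vmas(const char *maps_path)
{
    assert(maps_path != NULL);

    FILE *maps = fopen(maps_path, "r");
    if (maps == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    long count = 0;

    /* getline() grows the buffer as needed, so long path names are safe. */
    while (getline(&line, &cap, maps) != -1)
        count++;

    free(line);
    fclose(maps);
    return count;
}
```

count_vmas("/proc/self/maps") reports on the calling process; pointing it
at another process's maps file (permissions allowing) gives a figure
comparable to the ~560 quoted for Firefox above.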
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖