Reorganizing the address space

[Posted June 30, 2004 by corbet]

The traditional organization of the virtual address space (as seen from user space, on x86 systems) is as shown in the diagram to the right. The very bottom part of the address space is unused; it is there to catch NULL pointers and such. Starting at 0x08000000 is the program text: the read-only, executable code. The text is followed by the heap region, which is the memory obtainable via the brk() system call. Functions like malloc() typically obtain their memory from this area; non-automatic program data is also stored there.

The heap differs from the first two regions in that it grows in response to program needs. A program like cat will not make a lot of demands on the heap (one hopes), while running a yum update can grow the heap in a truly disturbing way. The heap can expand up to 1GB (0x40000000), at which point it runs into the mmap area; this is where shared libraries and other regions created by the mmap() system call live. The mmap area, too, grows upward to accommodate new mappings.

Meanwhile, the kernel owns the last 1GB of address space, up at 0xc0000000. The kernel is inaccessible to user space, but it occupies that portion of the address space regardless. Immediately below the kernel is the stack region, where things like automatic variables live. The stack grows downward. On a really bad day, the stack and the mmap area can run into each other, at which point things start to fail.

This organization has worked for some time, but it does have a couple of disadvantages. It fragments the address space, such that neither the heap nor the mmap area can make use of the entire space. If one program makes heavy use of the heap, it could run out of memory, even though a large chunk of space is available between the mmap area and the stack. Normally, not even yum can occupy that much heap, but there are other applications out there which are up to that challenge.

[Figure: revised memory layout]

As a way of making life safer for the true memory hogs out there, Ingo Molnar has posted a patch which rearranges user space along the lines of the revised diagram on the left. The mmap area has been moved up to the top of the address space, and it now grows downward toward the heap. As a result, the bulk of the address space is preserved in a single, contiguous chunk which can be allocated to either the heap or mmap, as the application requires.
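The arithmetic behind the two layouts can be sketched roughly as follows. The base addresses are the ones quoted in the article; the 128MB set aside for the stack is an assumed figure chosen only for illustration, and the function and constant names are made up for this sketch. The size of the program text itself is ignored.

```python
# Rough model of the 32-bit x86 user address space described above.
# STACK_RESERVE is an assumed value for illustration; the real figure
# depends on the process's stack size limit plus a cushion.
MB = 1 << 20

TEXT_BASE = 0x08000000      # program text starts here (per the article)
OLD_MMAP_BASE = 0x40000000  # traditional start of the mmap area
KERNEL_BASE = 0xC0000000    # the kernel owns everything from here up
STACK_RESERVE = 128 * MB    # assumption, not taken from the patch


def traditional_layout():
    """Return (max heap size, max mmap size) in bytes for the old layout."""
    heap_max = OLD_MMAP_BASE - TEXT_BASE
    mmap_max = (KERNEL_BASE - STACK_RESERVE) - OLD_MMAP_BASE
    return heap_max, mmap_max


def revised_layout():
    """Return the size of the single region shared by heap and mmap."""
    mmap_top = KERNEL_BASE - STACK_RESERVE
    return mmap_top - TEXT_BASE


if __name__ == "__main__":
    heap_max, mmap_max = traditional_layout()
    print(f"traditional: heap <= {heap_max // MB}MB, mmap <= {mmap_max // MB}MB")
    print(f"revised: heap and mmap share {revised_layout() // MB}MB")
```

Under these assumptions the total is the same in both cases; what changes is that the revised layout no longer splits it into two fixed pieces, so either the heap or the mmap area can use nearly all of it.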

As an added bonus, this organization reduces the amount of kernel memory required to hold each process's page tables, since the fragment at 0x40000000 is no longer present.

There are a couple of disadvantages to this approach. One is that the stack area is rather more confined than it used to be. The actual size of the stack area is determined by the process's stack size resource limit, with a sizable cushion added, so problems should be rare. The other problem is that, apparently, a very small number of applications get confused by the new layout. Any application which is sensitive to how virtual memory is laid out is buggy to begin with; according to Arjan van de Ven, the most common case is applications which store pointers in integer variables and then do the wrong thing when they see a "negative" value.
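The "negative pointer" failure can be illustrated with a small sketch. It models what happens when a 32-bit address is stored in a signed C int; the two sample addresses and the buggy_is_valid() check are hypothetical, chosen to stand for a mapping in the old mmap area and one near the top of user space.

```python
def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def buggy_is_valid(addr):
    """Hypothetical broken check: treats a 'negative' pointer as an error."""
    return as_signed_32(addr) > 0


if __name__ == "__main__":
    # 0x40001000: typical of the old mmap area, below 0x80000000.
    # 0xBF7FE000: typical of a mapping placed near the top of user space.
    for addr in (0x40001000, 0xBF7FE000):
        print(hex(addr), as_signed_32(addr), buggy_is_valid(addr))
```

Any address at or above 0x80000000 has its top bit set, so it looks negative once squeezed into a signed integer. Such code could already fail under the old layout once enough memory was mapped; the new layout simply makes high addresses the common case.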

The fact is that most users will never notice the change; for a demonstration, consider that Fedora kernels have been shipping with this patch for some time. Even a vanilla Fedora Core 1 system has it; a command like "cat /proc/self/maps" will show the new layout at work. The patch is currently part of the -mm kernel, and will probably find its way into the mainline before too long.

Reorganizing the address space

Posted Jul 1, 2004 0:30 UTC (Thu) by parimi (guest, #5773) [Link] (1 responses)

The explanation and the figures provided are excellent. Thanks Jon for such a great article!

Reorganizing the address space

Posted Jan 5, 2005 7:21 UTC (Wed) by jcm (subscriber, #18262) [Link]

I hope that diagram is straight out of LDD3, since if it is then that's something which can only serve to make the next book more successful.

Jon.

Reorganizing the address space

Posted Jul 1, 2004 1:54 UTC (Thu) by ajax (guest, #7251) [Link] (2 responses)

> Any application which is sensitive to how virtual memory is
> laid out is buggy to begin with; according to Arjan van de
> Ven, the most common case is applications which store
> pointers in integer variables and then do the wrong thing
> when they see a "negative" value.

Changing the address space layout rules also affects those applications that specify where in the address space their mmaps are to be placed. Changing the rules about where the 'holes' are can, and usually does, break such applications.

Reorganizing the address space

Posted Jul 1, 2004 17:48 UTC (Thu) by vmole (guest, #111) [Link] (1 responses)

Such applications are already broken. There's never been any guarantee that the specified start address would be honored, and the mmap documentation has always been clear on that.

Reorganizing the address space

Posted Jul 1, 2004 18:56 UTC (Thu) by obobo (guest, #684) [Link]

There's a difference between non-portable and broken. For example, I've used the mmap start address specification to do emulation and testing (on my desktop machine) of a flash filesystem that would run on an embedded device (and that was located at a certain address on that device). While the mmap call wasn't guaranteed to work, it did, and saved me a few weeks of effort re-writing the filesystem.

If this change broke my program (it didn't) I wouldn't have cause to yell too loud; it was not guaranteed to continue to work. But I still wouldn't call the program "broken".

-Bill

Reorganizing the address space

Posted Jul 1, 2004 2:16 UTC (Thu) by jreiser (subscriber, #11027) [Link] (1 responses)

The default .text base on x86 is 0x08048000, chosen by the default linker script for static binding (which is revealed by ld --verbose). I believe that the value 0x08048000 originates in early Unix-like software for x86 from Santa Cruz. The value 0x40000000 is the kernel symbol TASK_UNMAPPED_BASE. Another reason to avoid the address range 0 to 0x110000 (1MB + 64KB) is for use by vm86 mode.

Fedora Core 1 and 2 also have features exec-shield and exec-shield-randomize, which default to 1 ("on") in /proc/sys/kernel. Exec-shield tries to put an mmap() that specifies PROT_EXEC (and not MAP_FIXED) into the range 1MB to 16MB, because the return address to a CALL from that range has a high byte of 0. This tends to limit damage by malware that overwrites the stack (and hence return addresses) with strings, because '\0' occurs infrequently in the new data. Exec-shield-randomize also varies the base address of such an mmap(), which tends to improve the chance of avoiding a malware attack that depends on fixed addresses. Preference for low addresses with PROT_EXEC also enhances the effectiveness of setting the segment limit for the cs code segment to the least upper bound of pages having PROT_EXEC. This is another [partial] defense against malware.
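The "high byte of 0" argument can be checked with a few lines of code. This is only a sketch of the idea: the function names are invented here, the sample addresses are arbitrary, and it assumes the little-endian 32-bit layout of x86.

```python
import struct


def return_address_bytes(addr):
    """Bytes of a 32-bit return address as stored on a little-endian stack."""
    return struct.pack("<I", addr)


def contains_nul(addr):
    """True if the stored address contains a NUL byte somewhere."""
    return b"\x00" in return_address_bytes(addr)


if __name__ == "__main__":
    # One address inside the 1MB to 16MB range, one in the old mmap area.
    for addr in (0x00A3B123, 0x40123456):
        print(hex(addr), return_address_bytes(addr).hex(), contains_nul(addr))
```

Every address between 1MB and 16MB has a zero top byte, so it cannot appear in the middle of data that is copied as a C string: the copy stops at the first NUL. Addresses elsewhere can of course contain zero bytes too; the point is only that in this range they always do.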

The details of kernel policy on mmap() are more important than they should be, partly because Linux lacks a binary structure interface to /proc/pid/maps. It is tedious and painful to have to parse variable-length character data that contains non-quoted literal strings. Nothing prevents a filename from having a path component that looks like a line from /proc/pid/maps. Win32 has a binary structure interface VirtualQuery() which is concise, fast, and easy to decode. With such an interface, it is easy for the user to manage placement policy for all the pages of the address space. On Linux, it requires a dirty hack: http://www.BitWagon.com/tub.html
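For readers who have not had to do it, parsing the text format looks roughly like the sketch below. It assumes the usual field order of a /proc/pid/maps line (address range, permissions, offset, device, inode, optional path); the Mapping type and function names are invented for this example.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    # Split at most five times so that a path containing spaces
    # stays in one piece as the final field.
    fields = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5] if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


def read_maps(pid="self"):
    """Read and parse all mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps:
        return [parse_maps_line(line) for line in maps if line.strip()]


if __name__ == "__main__":
    for mapping in read_maps():
        print(f"{mapping.start:08x}-{mapping.end:08x} {mapping.perms} {mapping.path}")
```

The sketch simply trusts that each line is well formed, which is exactly the weakness described above: nothing in the format itself distinguishes an odd path name from a field.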

Reorganizing the address space

Posted Jul 5, 2004 8:04 UTC (Mon) by glettieri (subscriber, #15705) [Link]

> I believe that the value 0x08048000 originates in early Unix-like
> software for x86 from Santa Cruz

IIRC, the value 0x08048000 was chosen to accommodate the stack below the .text section (i.e., in the unused black space in the illustrations), growing downward. The 0x48000 bytes could be mapped by the same page table already required by the .text section (thus saving a page table in most cases), while the remaining 0x08000000 would allow more room for stack-hungry applications.
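The page table claim is easy to verify, assuming classic two-level (non-PAE) x86 paging, where 4KB pages are grouped into page tables that each map 4MB. The names in this sketch are made up for illustration.

```python
PDE_SHIFT = 22          # each page table maps 1 << 22 bytes = 4MB (non-PAE x86)
TEXT_BASE = 0x08048000  # default .text base quoted above


def page_table_index(addr):
    """Index of the page directory entry (and thus page table) for addr."""
    return addr >> PDE_SHIFT


if __name__ == "__main__":
    stack_top = TEXT_BASE - 4  # first word of a stack sitting just below .text
    print(page_table_index(TEXT_BASE), page_table_index(stack_top))
    print(f"room under .text in the same page table: {TEXT_BASE - 0x08000000:#x}")
```

Both addresses fall under the same page directory entry, so a stack of up to 0x48000 bytes (288KB) placed just below the text would not need a page table of its own.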

I have always wondered why the stack ended up in the upper portion of the address space. Is there any technical reason, or is it only historical? Does anybody know?

Reorganizing the address space

Posted Jul 1, 2004 15:14 UTC (Thu) by mwh (guest, #582) [Link] (2 responses)

I really thought the layout of the address space had more to do with /usr/lib/crt0.o and ELF headers than the kernel. Am I hopelessly mistaken?

Reorganizing the address space

Posted Jul 1, 2004 20:28 UTC (Thu) by riel (subscriber, #3142) [Link]

When you call mmap(2) without specifying a preferred memory address, then the kernel's defaults kick in.

Since most mmap()s do not specify any address, the kernel's defaults determine the amount of contiguous virtual memory available to pretty much every application. This is what's fixed by Ingo's patch.

Effect of ELF headers on layout

Posted Jul 2, 2004 1:50 UTC (Fri) by giraffedata (guest, #1954) [Link]

While the ELF headers don't affect where the program's mmaps and mallocs get memory, they do determine where the program itself gets loaded. That means all the stuff that's in the ELF file - the text section, data section, etc.

Note that the layout change here doesn't affect where the program gets loaded.

Reorganizing the address space

Posted Jul 1, 2004 17:19 UTC (Thu) by ngmr (guest, #4393) [Link] (1 responses)

Hmm, I wonder what stack size is reserved for a process' stack size resource limit of 'unlimited' (I assume it is picked up from the 'ulimit -s' setting).

From observation, I believe that malloc() will suballocate memory from the mmap() region if it needs / wants to, anyway.

So the distinction between the "heap" and "mmap" sections is less sharp than presented, I believe. Only "statically" allocated memory (e.g. the data for static and global variables in C) is allocated "exclusively" from the "heap" section.

The general run of applications don't particularly care how memory is laid out, and therefore shouldn't be adversely affected by any change.

However, the applications that this change is probably trying to address (the "memory hogs" referred to above) are probably precisely those that take special measures to try to achieve the best results from the existing memory layout.

They are also, therefore, most liable to (arguably) "legitimate" breakage when the layout scheme is changed, and most impacted if the ramifications of any such change are not fully thought through.

Reorganizing the address space

Posted Jul 3, 2004 5:46 UTC (Sat) by Ross (guest, #4065) [Link]

It's sbrk(2) vs. mmap(2). Calling malloc(3) usually results in a call
to sbrk(2) if there is not enough free space, but sometimes, in some
implementations, it can result in a call to mmap(2).

Reorganizing the address space

Posted Jul 9, 2004 20:23 UTC (Fri) by huaz (guest, #10168) [Link]

Is there still a gap between the stack and the mmap area? I'd rather there were, as it would catch stack overflows.

Reorganizing the address space

Posted Aug 24, 2005 13:08 UTC (Wed) by jasnevo (guest, #32041) [Link]

Great explanation, thank you.

Just one thing makes me curious, but first a few words on my situation. I've developed a highly specific, thread-safe allocator for a CAD kernel that does not use the OS malloc facility. Originally designed for Win32, this allocator uses the Virtual* functions for allocation. As another comment here stated, those functions are very straightforward. On my way to Linux, one question pains me: do I have to care about sbrk() at all, or may I use the mmap() functions exclusively (and, by the way, gain POSIX conformance)? More precisely, by giving a start address to mmap(), can I force mmap() to go below the sbrk() threshold? The second figure seems to suggest that this is possible...

Further information is gratefully appreciated.

Christian.