VM_GROWSDOWN

Posted Aug 20, 2010 10:35 UTC (Fri) by helge.bahmann (subscriber, #56804)
Parent article: An ancient kernel hole is closed

Could anyone remind me why "VM_GROWSDOWN" (and the resulting bloody hack of a movable guard page) is actually needed anymore? Address space is cheap, after all.

Sure, there may be apps that rely on the present address space layout, so I'd say define a new ELF flag and just pre-allocate the stack address space with a proper static guard page instead of this mess...

VM_GROWSDOWN

Posted Aug 20, 2010 17:05 UTC (Fri) by njs (subscriber, #40338) [Link] (3 responses)

For 32-bit apps, address space is not at all cheap.

VM_GROWSDOWN

Posted Aug 20, 2010 19:26 UTC (Fri) by helge.bahmann (subscriber, #56804) [Link] (2 responses)

The stack is typically about 8MB and therefore <1% of the total address space (assuming a 32-bit kernel). If your app cannot tolerate that, you are in worse trouble.
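As a rough check of that figure, here is a small sketch of the arithmetic. The 8 MiB stack comes from the comment; the 4 GiB total and the 3 GiB user portion are assumptions about a typical 32-bit x86 setup, and `stack_share` is only an illustrative name.

```python
MIB = 2**20
GIB = 2**30

def stack_share(stack_bytes, address_space_bytes):
    """Fraction of the address space taken by a fully reserved stack."""
    return stack_bytes / address_space_bytes

# 8 MiB stack against the full 32-bit space and an assumed 3 GiB user split.
for label, space in (("4 GiB total", 4 * GIB), ("3 GiB user", 3 * GIB)):
    print(f"{label}: {stack_share(8 * MIB, space):.2%}")
```

Both cases come out well under 1%.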

VM_GROWSDOWN

Posted Aug 21, 2010 4:28 UTC (Sat) by chad.netzer (subscriber, #4257) [Link] (1 responses)

Threaded processes can conceivably allocate a lot of (mostly unused?) stack space, and the kernel allows processes to specify a much bigger limit for those wacky applications that need it. Also, mmap'ing applications can put a lot of pressure on the 32-bit address mapping. So the address space may not be as cheap as you envision.

As the article mentions, and spender helpfully emphasizes, the Delalleau paper gives a good graphical overview of the situation.

http://cansecwest.com/core05/memory_vulns_delalleau.pdf

VM_GROWSDOWN

Posted Aug 21, 2010 11:58 UTC (Sat) by helge.bahmann (subscriber, #56804) [Link]

The stacks for threads are not allocated with VM_GROWSDOWN [*], so you already pay the "price" for a fully reserved address space there. VM_GROWSDOWN apparently only affects the main thread, so the additional 8MB of address space is really spent only once (and if the admin likes fine-tuning, s/he can always rlimit this further down).

[*] I don't see how GROWSDOWN would make sense for thread stacks: to provide any meaningful growth potential for them, you would have to thoughtfully sprinkle them throughout the address space and carefully dance around these locations with other mappings.
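One way to look at the main thread's stack on a running Linux system is /proc/self/maps, where it appears as a mapping labelled [stack]. A minimal sketch, assuming that label and the usual "start-end perms ..." line format; `find_stack_mapping` is a hypothetical helper, not an existing API.

```python
def find_stack_mapping(maps_text):
    """Return (start, end) of the mapping labelled [stack], or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[stack]":
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end
    return None

if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            found = find_stack_mapping(f.read())
    except OSError:
        found = None  # not on Linux, or /proc is unavailable
    if found:
        start, end = found
        print(f"main stack VMA: {start:#x}-{end:#x} ({(end - start) // 1024} KiB)")
```

On a process that has not used much stack, the reported size would be expected to sit well below the rlimit, since the VMA only grows as the stack is touched.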

VM_GROWSDOWN

Posted Aug 22, 2010 22:55 UTC (Sun) by Blaisorblade (guest, #25465) [Link] (5 responses)

What you suggest is not sufficient. What would happen with existing vulnerable applications? You cannot remove VM_GROWSDOWN for them.
So we have some sort of mess. Reading the description, I think what got in is messier than Arcangeli's proposal. But I digress.

However, since VM_GROWSDOWN seems to be used just for the stack of the main thread (I think it wouldn't be possible to do otherwise for reasons already discussed), ditching it would mostly make sense. _Except_ if you have a single-threaded application (and there are lots) which needs a big stack. So now each application has to decide whether to switch to the new layout.
For myself, I would enable the new layout by default, but I know somebody is going to complain; unfortunately, backward compatibility makes such changes hard.
Just my 2 cents.

VM_GROWSDOWN

Posted Aug 23, 2010 6:33 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link] (4 responses)

You are right that this is not a short-term solution: vulnerable apps would stay vulnerable until they were recompiled to have their ELF flags changed to take advantage of a pre-allocated stack. The implemented "solution", however, just trades a "code injection" vulnerability for a "denial of service" vulnerability. While this is an improvement, it should IMHO not be the final answer.

I am not sure single-threaded apps with large stack requirements are the problematic case here -- they are already bounded by the stack size rlimit, so the kernel could make an initial reservation of exactly the specified rlimit to keep them happy. That should be doable, and even resizing the VMA when the app changes its rlimit should be possible (with the added bonus of the kernel immediately detecting that resizing failed due to a collision with other mappings). More likely the problem cases are apps that do "fancy things" wrt their memory mappings, but short of trying it to see what breaks, there is probably no way to discover which these are :)
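To make the reservation idea concrete, here is a toy model of it, not kernel code. It assumes a 4096-byte page, a stack that ends at a given top address, and a list of other mappings given as (start, end) pairs; `StackReservation` and `reserve_stack` are hypothetical names.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size

@dataclass(frozen=True)
class StackReservation:
    guard_start: int  # lowest address: one fixed, inaccessible guard page
    stack_start: int  # lowest usable stack address
    stack_end: int    # one past the highest stack address (the stack top)

def reserve_stack(top, limit, mappings=()):
    """Reserve `limit` bytes below `top`, plus a static guard page under them."""
    size = -(-limit // PAGE) * PAGE  # round the limit up to whole pages
    stack_start = top - size
    guard_start = stack_start - PAGE
    if guard_start < 0:
        raise MemoryError("reservation does not fit below the stack top")
    for start, end in mappings:
        # Half-open ranges overlap when each one starts before the other ends.
        if start < top and end > guard_start:
            raise MemoryError(f"collides with mapping {start:#x}-{end:#x}")
    return StackReservation(guard_start, stack_start, top)
```

Calling `reserve_stack` again with a new limit models the resize case: a collision with an existing mapping is reported at the moment the limit changes rather than at some later stack fault.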

VM_GROWSDOWN

Posted Aug 23, 2010 12:44 UTC (Mon) by spender (guest, #23067) [Link] (3 responses)

What about RLIM_INFINITY?

on a 64bit OS, the max stack size is larger than the possible address space
on a 64bit OS with a 32bit userland app, the max stack size is larger than the possible address space

(these are both bugs still waiting to be fixed even though I've already published http://grsecurity.net/~spender/64bit_dos.c)

on a 32bit OS, the only limitation is on the initial arg/env stack, limited to 1GB (it should be the same with the 64bit OS and 32bit userland app above, but it's not)

you sure you want to do that reservation? ;)

-Brad

VM_GROWSDOWN

Posted Aug 23, 2010 13:13 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

Sure, but there's already differing behavior depending on whether the stack size is limited or not.

If the stack size is limited, mmap starts allocating below the stack rlimit (the stack is at the top of memory) and moves down until it hits the heap at the beginning of the address space. Then it'll start filling in holes in other places (such as between the end of the actual stack and the stack rlimit size).

If the stack size is not limited, mmap starts allocating partway between the heap and stack, and moves up until it hits the stack. Then it starts filling in holes (such as below the starting address, above the heap).

It seems to me that it'd be fairly sane, in the first case, to also disable the VM_GROWSDOWN behavior and just allocate a stack of the RLIMIT size immediately. But that *would* mean that you lose RLIMIT_STACK worth of address space which could otherwise have been used for mmap'ing, which might be a problem in some cases.
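The two layouts described in this comment can be sketched as a simplified model. The constants are assumptions loosely modelled on 32-bit x86 Linux (3 GiB of user address space, a stack gap clamped between 128 MiB and five sixths of the address space, an upward search starting at one third), and address randomization is ignored; `pick_mmap_layout` is a hypothetical name.

```python
import resource

TASK_SIZE = 0xC0000000        # assumed 3 GiB user address space
MIN_GAP = 128 * 2**20         # assumed lower bound on the gap left for the stack
MAX_GAP = TASK_SIZE // 6 * 5  # assumed upper bound on that gap

def pick_mmap_layout(stack_limit):
    """Return (search direction, mmap base) for a given soft stack rlimit."""
    if stack_limit == resource.RLIM_INFINITY:
        # Unlimited stack: search upward from partway into the address space.
        return ("bottom-up", TASK_SIZE // 3)
    gap = min(max(stack_limit, MIN_GAP), MAX_GAP)
    # Limited stack: search downward, starting just below the reserved gap.
    return ("top-down", TASK_SIZE - gap)

if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    direction, base = pick_mmap_layout(soft)
    print(f"{direction} from {base:#x}")
```

In this model, pre-allocating the stack in the limited case would consume the gap that is already being kept clear of new mappings, at least until the hole-filling stage is reached.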

VM_GROWSDOWN

Posted Aug 23, 2010 17:51 UTC (Mon) by PaXTeam (guest, #24616) [Link]

when talking about getting rid of VM_GROWSDOWN, it seems that people forget that it does not only expand the stack as needed, but it can also detect a kind of userland bug where the stack expansion request is beyond a certain architecture dependent limit (just look at the callers of expand_stack in the arch specific page fault handler and the checks before that). so statically allocating the initial task's stack range would let those bugs go undetected in the future. now admittedly this is a rare bug class (IIRC, gcc 2.96 had such a code generation bug) but it still means that there'll be a userland visible change when you get rid of VM_GROWSDOWN.

VM_GROWSDOWN

Posted Aug 23, 2010 17:35 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link]

I'm not sure there are that many applications that rely on "unlimited stack" meaning "allowed to fill the entire address space", but that's why I would not change the default behavior and would pick a new ELF flag instead (and for anyone needing ridiculously large stacks, split stacks are IMHO the better long-term answer, see http://gcc.gnu.org/wiki/SplitStacks).

There is certainly the practical question of what it means to run a process with stacksize == RLIM_INFINITY when the stack VMA is supposed to be fully expanded -- I'd say pick some random really large value like 512M, just enough to get sysvinit/upstart/systemd/whatever running, demand that sane limits be set afterwards, and let admins really suffer if they do not.
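A minimal sketch of that fallback policy, seen from user space: the 512M figure is the one suggested above, while `reservation_size` is a hypothetical helper and says nothing about how a kernel would actually implement it.

```python
import resource

FALLBACK = 512 * 2**20  # the "really large value" suggested above

def reservation_size(soft_limit, fallback=FALLBACK):
    """Size to pre-allocate for the stack under the proposed policy."""
    if soft_limit == resource.RLIM_INFINITY:
        return fallback  # an unlimited stack cannot be fully reserved
    return soft_limit

if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    print(f"would reserve {reservation_size(soft) // 2**20} MiB")
```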

In any case, apparently nothing breaks with my distribution's default 8MB stack rlimit, so I would expect that gradually converting the whole system over to use pre-allocated stack VMAs would not hit too many obstacles.