An ancient kernel hole is closed

By Jake Edge
August 18, 2010

A longstanding bug in the Linux kernel—quite possibly since the first 2.6 release in 2003—has been fixed by a recent patch, but the nearly two-month delay between the report and the fix is raising some eyebrows. It is a local privilege escalation flaw that can be triggered by malicious X clients forcing the server to overrun its stack.

The problem was discovered by Rafal Wojtczuk of Invisible Things Lab (ITL) while working on Qubes OS, ITL's virtualization-based, security-focused operating system. ITL's CEO Joanna Rutkowska describes the flaw on the company's blog and Wojtczuk released a paper [PDF] on August 17 with lots more details. In that paper, he notes that he reported the problem to the X.org security team on June 17, and by June 20 the team had determined that it should be fixed in the kernel. But it took until August 13 before that actually happened.

In addition, the description in the patch isn't terribly forthcoming about the security implications of the bug. That is in keeping with Linus Torvalds's policy of disclosing security bugs via code, but not in the commit message, because he feels that may help "script kiddies" easily exploit the flaw. There have been endless arguments about that policy on linux-kernel, here at LWN, and elsewhere, but Torvalds is quite adamant about his stance. While some are calling it a "silent" security fix—and to some extent it is—it really should not come as much of a surprise.

The bug is not in the X server, though the fact that it runs as root on most distributions makes the privilege escalation possible. Because Linux does not separate process stack and heap pages, overrunning a stack page into an adjacent heap page is possible. That means that a sufficiently deep stack (from a recursive call for example) could end up using memory in the heap. A program that can write to that heap page (e.g. an X client) could then manipulate the return address of one of the calls to jump to a place of its choosing. That means that the client can cause the server to run code of its choosing—arbitrary code execution—which can be leveraged to gain root privileges.

Evidently, this kind of exploit has been known for five years or more as Wojtczuk's paper points to a presentation [PDF] by Gaël Delalleau at CanSecWest in 2005 describing the problem, and pointing out that Linux was vulnerable to it. Unfortunately it would seem that the information didn't reach the kernel security team until it was rediscovered recently.

The X server has some other attributes that make it an ideal candidate to exploit the kernel vulnerability. Most servers run with the MIT shared memory extension (MIT-SHM), which allows clients to share memory with the server to exchange image data. An attacker can cause the X server to almost completely exhaust its address space by creating many shared memory segments to share with the server. On 64-bit systems, the attacker must first get the server to allocate roughly 36,000 32Kx32K pixmaps, before creating the shared memory, to further reduce the address space. One of the shared memory segments will get attached by the server in the "proper" position with respect to the server's stack.
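As a rough sanity check on the pixmap figure, the arithmetic can be sketched as follows. The 4 bytes per pixel and the 47-bit user address space are assumptions that appear in neither the article nor the quoted figure; the server may well store pixmaps differently.

```python
# Back-of-the-envelope estimate. BYTES_PER_PIXEL and USER_ADDRESS_BITS
# are assumptions, not numbers taken from the article.
PIXMAP_SIDE = 32 * 1024     # "32Kx32K" pixmaps
PIXMAP_COUNT = 36_000       # "roughly 36,000"
BYTES_PER_PIXEL = 4         # assumed 32 bits per pixel
USER_ADDRESS_BITS = 47      # assumed x86-64 user address space

GIB = 1 << 30
TIB = 1 << 40


def pixmap_bytes(side=PIXMAP_SIDE, bytes_per_pixel=BYTES_PER_PIXEL):
    """Memory needed for one square pixmap."""
    return side * side * bytes_per_pixel


def total_tib(count=PIXMAP_COUNT):
    """Memory needed for all pixmaps, in TiB."""
    return count * pixmap_bytes() / TIB


if __name__ == "__main__":
    print(f"one pixmap:         {pixmap_bytes() / GIB:.0f} GiB")
    print(f"all pixmaps:        {total_tib():.1f} TiB")
    print(f"user address space: {(1 << USER_ADDRESS_BITS) / TIB:.0f} TiB")
```

Under those assumptions the pixmaps add up to an amount on the order of the whole user address space, which fits the stated goal of leaving the server almost no free room.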

Once that is done, the client then causes the X server to make a recursive function call. By looking through the shared memory segments for non-zero data, the client can figure out which of the segments is located adjacent to the stack. At that point, it spawns another process that continuously overwrites that segment with the attack payload and triggers the recursion again. When the recursion unwinds, it will hit the exploit code and jump off to do the attacker's bidding—as root.

It is possible that other root processes or setuid programs are vulnerable to the kernel flaw, and X servers with MIT-SHM disabled may be as well. All of those cases are, as yet, hypothetical, and are likely to be much harder to exploit.

X.org hacker Keith Packard described how the fix progressed within the X team. He said that they tried several fixes in the X server, including using resource limits to reduce the address space allowed to the server and limiting recursion depth while ensuring adequate stack depth. None of those were deemed complete fixes for the problem, though.
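The resource-limit idea can be illustrated with a minimal sketch. It is not the change the X team tried; it only shows the mechanism, using Python's standard `resource` module and the `RLIMIT_AS` limit. The helper name `cap_address_space` and the 64 TiB value in the usage line are made up for illustration.

```python
import resource


def cap_address_space(limit_bytes):
    """Lower this process's soft RLIMIT_AS to at most limit_bytes.

    The soft limit is never raised, and it never exceeds the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = limit_bytes
    if soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, soft)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_AS)


if __name__ == "__main__":
    # Hypothetical value: cap the address space at 64 TiB.
    print(cap_address_space(1 << 46))
```

Once the limit is reached, further mappings fail instead of filling the address space. That may be why it was judged incomplete: the stack can still grow toward whatever mapping already sits below it.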

Andrea Arcangeli and Nick Piggin worked on a fix on the kernel side, but it was not accepted by Torvalds because it "violated some internal VM rules", Packard said. As the deadline for disclosure neared—after being extended from its original August 1 date—Torvalds implemented his own solution which fixed the problem. Overall, Packard was pleased with the response:

The various security teams worked well together in coming up with proposed solutions, although the process was a bit slower than I would have liked. The kernel patch proposed by Linus was tested by Peter Hutterer within a few hours to verify that it prevented the specific attack written by Rafal.

It should also be noted that Torvalds's original fix had a bug, which he has since fixed. The new patch, along with a fix for a user-space-visible change to the /proc/<pid>/maps file, is out for stable kernel review at the time of this writing. So, a full, correct fix for the problem is not yet available except for those running development kernels or patching the fix in on their own.

All of the "fancy security mechanisms" in Linux were not able to stop this particular exploit, Rutkowska said. She also pointed out that the "sandbox -X" SELinux compartmentalization would not stop this exploit. While it isn't a direct remote exploit, it only takes one vulnerable X client (web browser, PDF viewer, etc.) to turn it into something that is remotely exploitable. Given the number of vulnerable kernels out there, it could certainly be a bigger problem in the future.

The most unfortunate aspect of the bug is the length of time it took to fix. Not just the two months between its discovery and fix, but also the five years since Delalleau's presentation. We need to get better at paying attention to publicly accessible security reports and fixing the problems they describe. One has to wonder how many attackers took note of the CanSecWest presentation and have been using that knowledge for ill. There have been no reports of widespread exploitation—that would likely have been noticed—but smaller, targeted attacks may well have taken advantage of the flaw.


stack management

Posted Aug 18, 2010 22:35 UTC (Wed) by pflugstad (subscriber, #224) [Link] (2 responses)

So, I guess I don't understand how the kernel is involved with managing the stack for user space processes? It obviously is, but what is the mechanism for it?

Hmmm... the process starts with some stack at the top of memory, which it can push/pop data from (calling to and returning from function calls). If the process pushes more data onto the stack than is actually allocated for it (i.e. it goes over the page boundary), this will fault to the kernel, which automatically allocates more memory for it? Does that sound about right?

The bug is that if the stack runs into some already allocated heap pages, then no fault occurs (since the memory is already allocated), and so now you can write to the heap area and effectively control the stack and return addresses on it.

So, Linus' fix is to add a guard page at the bottom of the currently allocated stack? And this is causing some problems with some applications that probably pay way more attention than they should to the stack allocation.
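The mechanism described above can be sketched as a toy model. This is not the kernel's code: the class, its method names, and the addresses in the usage lines are invented for illustration, and it only models page-by-page stack growth toward a single mapping below the stack.

```python
PAGE = 4096


class ToyAddressSpace:
    """Toy model: one grows-down stack and one mapping somewhere below it."""

    def __init__(self, stack_low, neighbour_high, guard_pages):
        self.stack_low = stack_low            # lowest mapped stack address
        self.neighbour_high = neighbour_high  # end (exclusive) of mapping below
        self.guard = guard_pages * PAGE       # gap the kernel insists on keeping

    def touch(self, addr):
        """Report what happens when the stack pointer reaches addr."""
        if addr >= self.stack_low:
            return "stack"
        if addr < self.neighbour_high:
            # Already mapped, so no fault: the write lands in the neighbour.
            return "neighbour"
        # Unmapped hole below the stack: the kernel sees a page fault.
        new_low = addr - addr % PAGE
        if new_low - self.guard < self.neighbour_high:
            return "fault: refused"
        self.stack_low = new_low
        return "stack grown"


if __name__ == "__main__":
    for guard_pages in (0, 1):
        space = ToyAddressSpace(0x70000000, 0x6FFFE000, guard_pages)
        steps = [space.touch(a) for a in (0x6FFFF008, 0x6FFFE008, 0x6FFFDFF8)]
        print(guard_pages, steps)
```

Without a guard page the stack grows until it touches the mapping below, and the next push silently writes into that mapping. With one guard page the growth that would make the two adjacent is refused, so the process faults instead.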

Seems like this would be a REALLY old problem, that up until recently was effectively impossible to exploit (due to the sizes of memory and the inability to get a remote process to actually allocate that much memory).

Separately, how are other Unixen not vulnerable to the exact same problem? If not, how do they address the problem?

stack management

Posted Aug 18, 2010 23:14 UTC (Wed) by cesarb (subscriber, #6266) [Link]

> And this is causing some problems with some applications that probably pay way more attention than they should to the stack allocation.

If I read the discussion correctly, said application does not pay attention to the stack allocation in particular. Instead, it parses the information about all of its own mappings from /proc and mlock()s each one of them, instead of just using mlockall(). When the mlock() hit the guard page, it extended the stack mapping down by one page, which then confused said application when it later parsed the mapping information from /proc again (since the mappings were now different from what it had read before).

The fix was just to hide the extra guard page from /proc.
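To make the confusion concrete, here is a small sketch that parses two snapshots in the /proc/<pid>/maps text format and reports how far the start of the `[stack]` mapping moved between them. The function names and the sample addresses are hypothetical; this is not the affected application's code.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], name))
    return regions


def stack_region(regions):
    """Return the region named [stack], or None if there is none."""
    for region in regions:
        if region[3] == "[stack]":
            return region
    return None


def stack_start_shift(before_text, after_text):
    """Bytes by which the stack's start address moved between snapshots."""
    before = stack_region(parse_maps(before_text))
    after = stack_region(parse_maps(after_text))
    if before is None or after is None:
        return None
    return after[0] - before[0]


if __name__ == "__main__":
    with open("/proc/self/maps") as maps:
        print(stack_region(parse_maps(maps.read())))
```

A program that caches the first snapshot and later sees the stack start one page lower (a shift of -4096 with 4 KiB pages) finds a mapping it did not expect. That is the user-space-visible change that hiding the guard page removes.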

stack management

Posted Aug 21, 2010 4:32 UTC (Sat) by chad.netzer (subscriber, #4257) [Link]

The article mentions the Delalleau paper, and spender points out that it discusses a lot of the issues you raise, and has informative graphs. So check it out.

http://cansecwest.com/core05/memory_vulns_delalleau.pdf

An ancient kernel hole is closed

Posted Aug 18, 2010 22:52 UTC (Wed) by msmeissn (subscriber, #13641) [Link]

Actually, the first fix for this was submitted 6 years ago by Andrea Arcangeli:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-09/7...

I tried to remember, and also read through the comments, why it wasn't accepted then (probably due to ugliness), but did not dig deeply enough into the history to find out why.

An ancient kernel hole is closed

Posted Aug 18, 2010 23:01 UTC (Wed) by paravoid (subscriber, #32869) [Link] (5 responses)

Brad Spender recognized the security implications of the commit in a comment, here on LWN.

http://lwn.net/Articles/400141/

The comments in his exploit code are also interesting; "curious what actual vuln was involved that they were trying to silently fix [...] I smell privesc...mumblings of X server/recursion". Great catch!

An ancient kernel hole is closed

Posted Aug 19, 2010 0:41 UTC (Thu) by clugstj (subscriber, #4020) [Link] (4 responses)

Brad, how many accounts do you have on LWN?

An ancient kernel hole is closed

Posted Aug 19, 2010 12:13 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] (1 responses)

paravoid is Faidon Liambotis.

An ancient kernel hole is closed

Posted Aug 19, 2010 12:20 UTC (Thu) by spender (guest, #23067) [Link]

Don't let facts get in the way of his conspiracy! ;) It's very difficult for guys like him to accept me being right time and again when it comes to these issues.

-Brad

An ancient kernel hole is closed

Posted Aug 19, 2010 15:57 UTC (Thu) by zooko (guest, #2589) [Link] (1 responses)

Joanna Rutkowska also congratulated Brad Spender in her blog post on the subject, and she is sort of a god among security researchers (in my humble and under-informed opinion).

http://theinvisiblethings.blogspot.com/2010/08/skeletons-...

An ancient kernel hole is closed

Posted Aug 20, 2010 1:15 UTC (Fri) by drag (guest, #31333) [Link]

Yes Yes. Spender tends to back up his arguments with facts, which is a win in my book.

Linux was never really designed with security in mind. Its priorities lay in practical uses and performance. It's the conscious application of effort and improvements that improve the security of Linux and not comparisons to Windows. :)

An ancient kernel hole is closed

Posted Aug 18, 2010 23:07 UTC (Wed) by einstein (subscriber, #2052) [Link] (7 responses)

All the more reason for making the X server run as non-root.

An ancient kernel hole is closed

Posted Aug 18, 2010 23:12 UTC (Wed) by arjan (subscriber, #36785) [Link] (4 responses)

Various distributions (MeeGo at least) already do this....

ask your own distro why they don't do this yet I suppose...

An ancient kernel hole is closed

Posted Aug 18, 2010 23:32 UTC (Wed) by cesarb (subscriber, #6266) [Link]

Probably because of legacy drivers which do not use kernel modesetting, or to be able to use X with kernel modesetting disabled (for the drivers which can run either with or without kernel modesetting).

I wonder which restrictions xserver_t has on selinux. If it is restricted enough, it is possible that, even if you can inject code on Xorg running as root, you cannot do much without having to first do DMA tricks to break out of it.

It might be an interesting exercise to make Xorg drop even more permissions (by changing for instance to a xserver_kms_t which cannot touch the hardware) when kernel modesetting is enabled (while keeping the ability to run without kernel modesetting by simply not dropping the extra permissions).

An ancient kernel hole is closed

Posted Aug 19, 2010 0:12 UTC (Thu) by HelloWorld (guest, #56129) [Link] (1 responses)

As far as I know, rootless X requires kernel mode setting, which causes all kinds of breakage on my system at least (e.g. suspend-to-ram doesn't work any longer, xvideo breaks, 3D performance is abysmal).

An ancient kernel hole is closed

Posted Aug 19, 2010 8:45 UTC (Thu) by epa (subscriber, #39769) [Link]

For a moment there I thought 'rootless X' must refer to running the X server without a root window - as commonly done with X servers such as Xming on Microsoft Windows. But you meant 'running the X server as a non-root user'.

An ancient kernel hole is closed

Posted Aug 19, 2010 22:26 UTC (Thu) by nix (subscriber, #2304) [Link]

The lack of any way to revoke() other users of the input devices, I understand.

An ancient kernel hole is closed

Posted Aug 20, 2010 0:11 UTC (Fri) by cmccabe (guest, #60281) [Link] (1 responses)

> All the more reason for making the X server run as non-root.

OpenBSD uses privilege separation in its port of X.org.

I wonder if, with kernel modesetting, a totally non-root X.org will ever be possible. Or an selinux-sandboxed one, for that matter.

An ancient kernel hole is closed

Posted Aug 20, 2010 1:30 UTC (Fri) by drag (guest, #31333) [Link]

Well as long as you're using decent open source video drivers then your X Server can run as just a regular application _right_now_, rather than some monster that needs to fiddle with bits on your PCI bus like is traditionally needed.

It's certainly and absolutely possible.

But it probably breaks most closed source drivers so development is going to continue to be painfully slow.

There are probably a lot of problems with it, like was mentioned above with input devices, but it's absolutely possible that at some time we can have a non-root-privileged X. And people do have it working in a more-or-less fashion.

Xorg flaw

Posted Aug 18, 2010 23:35 UTC (Wed) by avik (guest, #704) [Link] (17 responses)

Isn't there still a flaw in Xorg? With the patched kernel an X client can still trigger recursion and crash the server.

What kind of server code allows unbounded recursion to be triggered by a client?

Xorg flaw

Posted Aug 19, 2010 0:43 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (8 responses)

That was my first response as well. This sounds like a stack-overflow bug in Xorg, not a kernel bug at all. Since when have guard pages been considered a robust security feature?

Xorg flaw

Posted Aug 19, 2010 1:01 UTC (Thu) by airlied (subscriber, #9104) [Link] (7 responses)

and that's why we are thankful you aren't doing security development.

Xorg flaw

Posted Aug 19, 2010 1:07 UTC (Thu) by avik (guest, #704) [Link]

In addition to being thankful, can you also be helpful and explain? With the guard page, isn't the attack transformed from an exploit to a DoS?

Sure, the guard page helps mitigate the consequences, but as far as I can see a vulnerability still exists.

Xorg flaw

Posted Aug 19, 2010 1:08 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (5 responses)

Care to explain, instead of posting snarky remarks?

You can't exploit this without making Xorg overflow its stack. Even with the changes to the kernel that will cause Xorg to crash. Ergo, it is a bug in Xorg. It's nice that the kernel offers a way to catch stack overflows, but the responsibility lies with Xorg not to do that in the first place. This is no more a kernel security bug than an internal Xorg buffer overflow or the like, which can lead to exactly the same sort of privilege elevation.

If you're the kind to rely on guard pages for security rather than avoiding stack overflow by design, I suppose we can be glad you don't do security development.

Xorg flaw

Posted Aug 19, 2010 5:35 UTC (Thu) by smurf (subscriber, #17840) [Link] (4 responses)

How exactly do you prevent stack overflows? Call a check_for_stack_limit() function from everywhere, which checks how big the stack is vs. how big it should be? (It would need to scan /proc/self/maps to figure out where the last shared memory segment happened to be placed.)
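For illustration, the kind of user-space check described above might look like the following sketch. It measures the unmapped gap between the `[stack]` mapping and the mapping directly below it, relying on /proc/<pid>/maps listing mappings in ascending address order. The name `gap_below_stack` is invented; no such helper exists in X or in any library.

```python
def gap_below_stack(maps_text):
    """Bytes of unmapped space between [stack] and the mapping below it.

    Returns None if there is no [stack] line or nothing is mapped below it.
    """
    prev_end = None
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if fields[-1] == "[stack]":
            return None if prev_end is None else start - prev_end
        prev_end = end
    return None


if __name__ == "__main__":
    with open("/proc/self/maps") as maps:
        print(gap_below_stack(maps.read()))
```

Reading and parsing a file on every deep call is costly compared with a page fault the hardware delivers for free, which is the point being made about fixing this once in the kernel.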

"The application is responsible" is a cop-out because there are zillions of programs out there, but only one kernel. Therefore, fixing the problem once (in the kernel), with a guard page (no need for expensive user-mode checks), is the right solution.

Of course, X should not recursively overrun its stack. It's (probably) still a bug in the X server. So?

"Security by forcing the programmer to write correct code" does not work. As a further example of this principle, witness the large number of PHP-based web sites with SQL injection holes.

Xorg flaw

Posted Aug 19, 2010 6:20 UTC (Thu) by avik (guest, #704) [Link] (2 responses)

What's the algorithm that requires unbounded (or user-bounded) recursion in X? Is it impossible to write it in a way that uses an external allocation rather than the stack?

Cf. quicksort.

The kernel should provide a guard page to protect against unknown flaws, but known flaws should be corrected.

recursion

Posted Aug 19, 2010 16:33 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (1 responses)

AFAIU It is always possible to transform a recursive algorithm into an equivalent iterative one which stores intermediate state on the heap.

In some cases the recursive algorithm will be clearer, which makes maintenance easier (and reduces the chance of security-relevant bugs). In some cases the iterative algorithm will be faster (particularly if your programming language or compiler sucks).
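A minimal sketch of that transformation, using a made-up task (measuring how deeply a list of lists is nested) rather than anything from the X server:

```python
def depth_recursive(node):
    """Nesting depth of a list-of-lists tree, using the call stack."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth_recursive(child) for child in node), default=0)


def depth_iterative(node):
    """Same result, but pending work lives in a heap-allocated list."""
    deepest = 0
    pending = [(node, 0)]
    while pending:
        current, depth = pending.pop()
        if isinstance(current, list):
            depth += 1
            deepest = max(deepest, depth)
            pending.extend((child, depth) for child in current)
    return deepest


def nested(levels):
    """Build [[[...]]] with the given number of levels, without recursion."""
    node = []
    for _ in range(levels - 1):
        node = [node]
    return node


if __name__ == "__main__":
    sample = [1, [2, [3]], []]
    print(depth_recursive(sample), depth_iterative(sample))
    print(depth_iterative(nested(100_000)))
```

The iterative version handles input nested far deeper than CPython's default recursion limit would normally allow, because its memory use is bounded by the heap rather than the stack. The recursive version is shorter and arguably easier to read, which is the trade-off described above.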

recursion

Posted Aug 22, 2010 3:31 UTC (Sun) by jeremiah (subscriber, #1221) [Link]

>AFAIU It is always possible to transform a recursive algorithm into an equivalent iterative one which stores intermediate state on the heap.<

you don't write much XSLT do you...;)

Xorg flaw

Posted Aug 19, 2010 15:27 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> How exactly do you prevent stack overflows?

This is not exactly a new problem. There is plenty of software out there (e.g. real-time embedded systems; the Linux kernel itself) which manages not to crash or misbehave when faced with a fixed-size stack and no special VM protection. The tools are simple:

1. Do not permit unbounded stack recursion.
2. Static analysis - know your worst-case stack requirements.
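As a hedged sketch of the first rule, a server-side traversal can carry its depth with it and refuse input that nests too deeply. The window representation (a dict with a `children` list), the function name and the limit of 64 are all hypothetical.

```python
class DepthLimitExceeded(Exception):
    """Raised when input nests deeper than the server is willing to follow."""


# Hypothetical limit. In practice it would come from the worst-case frame
# size found by static analysis: max depth * frame size must fit the stack.
MAX_DEPTH = 64


def count_windows(window, depth=0, max_depth=MAX_DEPTH):
    """Count a window and all its descendants, refusing excessive nesting."""
    if depth > max_depth:
        raise DepthLimitExceeded(f"nesting deeper than {max_depth}")
    return 1 + sum(
        count_windows(child, depth + 1, max_depth)
        for child in window.get("children", [])
    )


if __name__ == "__main__":
    tree = {"children": [{"children": [{}]}, {}]}
    print(count_windows(tree))
```

With the depth bounded, worst-case stack use becomes a number that can be computed ahead of time, and a client that exceeds the bound gets an error instead of driving the server's stack into whatever lies below it.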

There may be "zillions" of application programs, but most of them don't run as root and simultaneously share memory with untrusted clients. As a privileged server process, Xorg should be designed to be more secure than most, since *any* code-execution vulnerability in Xorg is a (potentially remote) privilege-escalation vulnerability.

> Of course, X should not recursively overrun its stack.... So?

So it's not a kernel bug. It may be easier in this case to block one known exploit vector by changing the VM behavior of the kernel, and I'm not arguing against the patch, but it's not the kernel's job to prevent you from mapping untrusted memory right below your stack, or from overflowing said stack.

Xorg flaw

Posted Aug 19, 2010 1:14 UTC (Thu) by xtifr (guest, #143) [Link] (7 responses)

If you can figure out a way to implement MIT-SHM without this flaw and without crippling performance, I'm sure the developers at Xorg would be happy to accept your patch. Note that the article specifically mentions that Xorg tried and failed to find a complete fix on their side.

Xorg flaw

Posted Aug 19, 2010 1:40 UTC (Thu) by avik (guest, #704) [Link] (5 responses)

From what I've read it appears the recursion flaw was unrelated to MIT-SHM:

"3. Allocate windows arranged so that when X processes them, some function F is called recursively. Trigger F recursion."

Looks like any X client can crash the server, with or without a patched kernel.

Xorg flaw

Posted Aug 19, 2010 3:21 UTC (Thu) by xtifr (guest, #143) [Link] (1 responses)

Fair enough, but it still says that Xorg tried and failed to find an adequate fix. If the kernel fix prevents privilege escalation, then what we're left with is merely a potentially annoying bug. Sure, it would be nice to have the bug fixed, but until someone comes up with a fix that works, it's going to be hard to fix it! (The first rule of Tautology Club is the first rule of Tautology Club.)

In any case, runaway memory use already puts your processes in the whimsical hands of the OOM-killer.

Xorg flaw

Posted Aug 19, 2010 17:32 UTC (Thu) by iabervon (subscriber, #722) [Link]

I think that there are actually three things the client has to do: get the server to allocate enough server-side, non-shared resources to use up most of the address space and force the remainder somewhere useful; get a shared memory segment so that the client will be able to change an area of the server's address space; and get the server to overflow the stack into the shared memory segment.

The shared memory aspect is not really a flaw to avoid; the flaws to be fixed on the userspace side are really that the server will go overboard allocating resources for clients, rather than applying some limits to protect itself, and that the server's stack can grow into the heap. At some point, the server should refuse to do what the clients are asking in order to protect itself from overloading (which is hard); the kernel should do better at preventing overloading from leading to unexpected aliasing (which they did). The MIT-SHM aspect just makes the exploit comprehensible.

I don't doubt that a sufficiently clever request could get the server to overflow the stack into the area where the response to the request will be written and write a chosen response into a spot that aliases a return address on the stack, causing the server to return to effectively calling system() on a chunk of an image provided by the client.

Xorg flaw

Posted Aug 19, 2010 14:33 UTC (Thu) by NAR (subscriber, #1313) [Link] (2 responses)

> Looks like any X client can crash the server, with or without a patched kernel.

If I understood the problem correctly (which is far from certain), the client can ask the server to allocate memory in the server's address space. Consequently the X server can run out of memory and the OOM killer can kill it. This seems to be a feature, not a bug (i.e. the whole X server was designed this way). By the way, the X server uses the most memory on my system currently (according to top) and as far as I know, most of the memory is allocated on behalf of clients.

Anyway, nowadays most X clients run locally and if a malicious attacker already controls a client locally, even if it doesn't find any local root holes (which I'm sure there are plenty of), he can delete all of the user's files, send e-mails in the user's name, etc.

Xorg flaw

Posted Aug 21, 2010 9:12 UTC (Sat) by niner (subscriber, #26151) [Link] (1 responses)

"send e-mails in the user's name, etc."

Just because this is one of my favourite misconceptions floating around: nothing at all prevents anyone from sending e-mails in any user's name. Same as you can write any name as sender on an envelope of bad old snail mail. The only thing proving the identity of the sender is in both cases a signature. The electronic version even more so than your easy to fake hand writing. And of course, such a signature should not lie around on your computer unprotected...

Xorg flaw

Posted Aug 23, 2010 23:42 UTC (Mon) by mgedmin (subscriber, #34497) [Link]

The bad thing that malicious programs can do is send emails using the user's bandwidth (and their IP, to avoid spam blacklists).

Xorg flaw

Posted Aug 19, 2010 15:28 UTC (Thu) by HelloWorld (guest, #56129) [Link]

If it can't be done without crippling performance, they'll have to cripple it. Reliability is just more important.

An ancient kernel hole is closed

Posted Aug 19, 2010 16:19 UTC (Thu) by cde (guest, #46554) [Link] (17 responses)

Quite unbelievable that we had to wait 5+ years for this bug to be fixed -- kinda shows Linus' priorities in the development of the kernel. Not that other competing operating systems do a better job.

An ancient kernel hole is closed

Posted Aug 19, 2010 18:41 UTC (Thu) by chad.netzer (subscriber, #4257) [Link] (16 responses)

From the article: "but the nearly two-month delay between the report and the fix is raising some eyebrows."

Then you: "Quite unbelievable that we had to wait 5+ years for this bug to be fixed-- kinda shows Linus' priorities"

You don't need to slander. This sense of entitlement saying that "we" had to "wait" 5+ years for a fix from Linus is bogus. It implies that he was sitting on his hands over this issue for 5 years, uninterested in fixing it. Instead, he and others were working their asses off to ensure that *your* crappy POS hardware device, etc. worked well with Linux, while trying to address any (and I think all) security issues that were reported in the meantime. Unless you claim that the issue was known to Linus for 5 years, I think you should ask how to better ensure security bug discoverers report more quickly to bug fixers, and lay off the BS.

An ancient kernel hole is closed

Posted Aug 19, 2010 19:27 UTC (Thu) by avik (guest, #704) [Link] (15 responses)

A fix in fact was submitted in 2004.

An ancient kernel hole is closed

Posted Aug 19, 2010 20:06 UTC (Thu) by chad.netzer (subscriber, #4257) [Link] (14 responses)

And note how very different it is from Linus's patches, which *only* touch mm/memory.c, and none of the arch specific files.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...

vs.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-09/7...

So the issue is still, if AA had the right idea, but the wrong approach back in 2004, then how could the proper discussion to address this have been started back then? Once a compelling case was presented, Linus seems to have put effort into implementing an elegant fix, so how does that mean *his* priorities are screwed up? Was the security community nagging LKML about this issue that whole time? (if they were, I'll gladly retract my claims that this is BS).

An ancient kernel hole is closed

Posted Aug 20, 2010 8:30 UTC (Fri) by PaXTeam (guest, #24616) [Link] (13 responses)

> And note how very different it is from Linus's patches

you mean the one that he had to fix like 3 times?

> which *only* touch mm/memory.c

right, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-... (and let's not get started how bloody crap this whole 'solution' is, just look at hacks like the ones this commit introduces and high-five for not using an easily changeable define for the gap size)

> and none of the arch specific files.

with due respect to Andrea's efforts, there was really no need to touch arch specific code, one can get away with patching expand_downwards, get_unmapped_area and mprotect or so these days.

> So the issue is still, if AA had the right idea, but the wrong approach back in 2004

it's still the right idea and approach (the userland managed guard page brought up in the original discussion is easy to handle).

> Once a compelling case was presented,

that happened in 2004-2005 already (not saying people weren't aware of the problem many years before that though).

> Linus seems to have put effort into implementing an elegant fix

it's a butt-ugly hack he couldn't even get right the first time (where was the lkml discussion before committing it btw? oh right, there wasn't).

> so how does that mean *his* priorities are screwed up

5 years of ignoring the problem (not that it was only him) implies 'screwed up' in my book. certain big commercial companies get grilled for spending a fraction of that time on a security fix.

> Was the security community nagging LKML about this issue that whole time?

ask your security@kernel.org and/or vendor-sec contacts for their 2005 archives and you'll see the discussion take place then.

PS: there's some irony in mm/nommu.c still having an unused heap_stack_gap variable.

An ancient kernel hole is closed

Posted Aug 20, 2010 9:50 UTC (Fri) by chad.netzer (subscriber, #4257) [Link] (12 responses)

> you mean the one that he had to fix like 3 times?

How irrelevant. I suppose you just give up after the first try? They touch completely different sets of files, which context allows you to understand is the point. AA's patch was more intrusive.

> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...

"404 - Unknown commit object"

>with due respect to Andrea's efforts, there was really no need to touch arch
>specific code

So you admit the patch shouldn't have been accepted.

> it's still the right idea and approach

You just admitted that it was the wrong approach, since it touched arch files it didn't need to, etc.

Which vendor or distro included AA's 2004 patch, since you now claim it is "the right approach"? If you've got a (working) link to a vendor commit, from a long time ago, you will begin to have a better argument.

> [a compelling case was presented] in 2004-2005 already

You've previously claimed that Linus and the other kernel developers are "not qualified" to make judgements about security patches "due to lack of expertise".

http://lwn.net/Articles/373896/

So, what have the security "experts" come up with that is better than Linus's solution, and why not 5 years ago when they discussed it? Where is the non-butt-ugly-hack fix that you advocate?

An ancient kernel hole is closed

Posted Aug 20, 2010 11:56 UTC (Fri) by PaXTeam (guest, #24616) [Link] (11 responses)

> How irrelevant.

it'd be relevant if you hadn't carefully omitted some bits of my response ;). you see, those bugs could have been found and fixed before committing the patch had they been discussed publicly. there's some irony in that it was you in the first place who was asking about "the security community nagging LKML about this issue that whole time" and now you find it irrelevant that discussion of this long sought-after fix missed LKML.

> They touch completely different sets of files

so you admit that your claim of "Linus's patches, which *only* touch mm/memory.c" is flat out false.

> AA's patch was more intrusive.

how do you measure intrusiveness? but besides that, whether something is more or less intrusive is of secondary importance when it comes to correctness (and not to mention that ugly hack about lying about vma boundaries).

> "404 - Unknown commit object"

oops, a stray 'v' slipped into the url, here's the correct one: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-... .

> So you admit the patch shouldn't have been accepted.

not at all. what often happens is that a problem gets fixed and later refined, even simplified. just look at the evolution of rmap, there were quite a few rounds of complexity added only to be removed later. it also happens sometimes that a problem gets fixed on a specific arch only then gets generalized (and simplified) for all/most archs. the need for evolution is not a reason to prevent code from entering the tree.

> You just admitted that it was the wrong approach, since it touched arch files it didn't need to, etc.

Chad, i think you should stop commenting on issues you clearly have no idea about. 'approach' here means solving the heap/stack gap problem by either stopping stack growth at the automatic vma expansion level or detecting stack expansion attempts at the lower guard page access level. it's an implementation detail where you add the detection logic in either case and can of course always be discussed/refined/etc (something that didn't really happen for either approach).

> Which vendor or distro included AA's 2004 patch, since you now claim it is "the right approach"? If you've got a (working) link to a vendor commit, from a long time ago, you will begin to have a better argument.

it's irrelevant to correctness who included it but it was present in -mm of the time and also SuSE, IIRC, but you'll have to do the digging yourself i'm afraid.

> You've previously claimed that [...]

How irrelevant. (tm)

seriously, do you understand the difference between being able to *recognize* a security issue in *existing* code and *writing* *new* kernel code to solve problems? by the look of it, you still don't quite get it.

> So, what have the security "experts" come up with that is better than Linus's solution, and why not 5 years ago when they discussed it? Where is the non-butt-ugly-hack fix that you advocate?

now that you (hopefully) understand what 'approach' means in this context, i believe you can answer these questions yourself.

An ancient kernel hole is closed

Posted Aug 20, 2010 17:47 UTC (Fri) by chad.netzer (subscriber, #4257) [Link] (10 responses)

> it'd be relevant if you hadn't carefully omitted some bits of my response ;).

Your response was "you mean the one that he had to fix like 3 times?" It didn't seem worth repeating; now you forced me to...

> you see, those bugs could have been found and fixed before committing the patch had they been discussed publicly.

Oh, so instead of being a snarky response to *my* comment that you quoted immediately prior, about comparing the two approaches, it was an unrelated response to a *completely* different issue; I should have guessed you'd bait-n-switch on me. Why'd you quote me if your response wasn't the issue you were replying to?

As I said, it was an irrelevant response.

> so you admit that your claim of "Linus's patches, which *only* touch mm/memory.c" is flat out false.

Hold on, your link was broken, and I now learned that the web engine doesn't automatically match prefixes. A "git log d7824370e" would have worked had I been near my repo at the time.

Ohhhh, so *now* you mean that a follow up patch, which is not actually the security fix, also touches 2 more files, and *that* is what I should have mentioned? The patch which has *no equivalent* in the 2004 AA patch, so it wasn't necessary to point out in my comparison (thus I didn't link to it)? Pathetic. That patch is explicitly mentioned in the article, you only needed to have said that (ie, I had already seen it).

[SNIP]

> Chad, i think you should stop commenting on issues you clearly have no idea about.

Oh, yes sir.

Understanding the security issue isn't hard; the article explains it nicely. But it isn't the guard page (or gap) itself that is the issue of discussion here (clearly those involved see that as a solution), it is the means of implementing that solution in code in a way that could make its way upstream in a decent timeframe.

> it's irrelevant to correctness who included it but it was present in -mm of the time and also SuSE, IIRC, but you'll have to do the digging yourself i'm afraid.

So you claim the 2004 patch *was* accepted and included by a vendor (I'll accept that claim on faith). THAT is the whole point. If there was an implementation not only proposed, but also accepted and put in use (ie. tested), then that is (finally) evidence there was a fix in use in the field, that wasn't filtering up into Linus's tree. That is clearly a problem, I concede that. Why couldn't you just say that?

> How irrelevant. (tm)

You are unoriginal.

> by the look of it, you still don't quite get it.

And condescending. Since it was *you* who mentioned the butt-uglyness of the implementation of the accepted solution, it is perfectly reasonable to assume you have issues with the coding details of the guard page/gap solution (not the guard/gap concept itself), which *again* was what my response was specifically addressing. So now you re-define what I meant by 'approach'; that's dishonest.

> now that you (hopefully) understand what 'approach' means in this context

The 'approach' in this discussion was NEVER about the issue of whether the guard page/gap concept itself is the right fix. So why are you mentioning that now, and presuming it is *me* who doesn't understand? You wanna claim that the security expert's job is to "recognize" issues in existing code, and not "write new kernel code" to fix it, then stop yammering about the coding details of the solution UNLESS you are claiming that it is insecure? Is it?

Your ego based, toxic form of communication is a thread killer, and it wasn't worth initially responding to. I thought you'd shed some light. If you are at all representative of how insanely frustrating it must be for implementers to get security feedback on their code, it helps explain why there can be such a long disconnect. I guess that is somewhat enlightening in itself.

An ancient kernel hole is closed

Posted Aug 20, 2010 18:40 UTC (Fri) by sbishop (guest, #33061) [Link] (2 responses)

> Your ego based, toxic form of communication is a thread killer, and it wasn't worth initially responding to. I thought you'd shed some light.

Yeah, it's funny given the Latin meaning of "pax".

I remember when PaXTeam and spender first showed up here and we thought they were the same person. The difference seems to be that spender can be worth engaging. PaXTeam seems to be a symbolic link to /dev/vitriol.

An ancient kernel hole is closed

Posted Aug 20, 2010 20:30 UTC (Fri) by chad.netzer (subscriber, #4257) [Link] (1 responses)

Check out their comments on Joanna Rutkowska's blog posting (if you like watching this kind of train wreck).

http://theinvisiblethings.blogspot.com/2010/08/skeletons-...

An ancient kernel hole is closed

Posted Aug 21, 2010 14:31 UTC (Sat) by PaXTeam (guest, #24616) [Link]

hey, i got an even better one you can chew on: http://lists.immunitysec.com/pipermail/dailydave/2010-Aug... ;)

An ancient kernel hole is closed

Posted Aug 20, 2010 20:32 UTC (Fri) by zakalwe2 (guest, #50472) [Link] (1 responses)

PaXTeam is a gentleman and a scholar. Every time I have had an issue with PaX on a new kernel release he has been extremely helpful and a fountain of knowledge. Your fanboy defence of Linus was worthy of attack by someone who has worked tirelessly to improve the real security of Linux and got nothing but grief for his efforts. It is only because of spender and the PaX Team that the upstream kernel developers' stubborn, cavalier attitude towards security is being eroded and the real security of your systems is being improved.

An ancient kernel hole is closed

Posted Aug 20, 2010 21:13 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

Whatever. Give them credit for their work where it is due (absolutely). But people who hijack threads with twisted misrepresentations of what is being discussed are ruinous to the value of this site.

My "defense" of Linus against what seems an unwarranted accusation is justified since no one has adequately explained why the 2004 patch didn't get reworked into an acceptable solution that then got promoted upstream. It now seems less of a technical issue, and more of a communication issue, and I'm seeing *ample* evidence to suggest what kind of factors may have led to that unfortunate situation.

On a side note, to both 'cde' and 'avik', I appreciate your input. In particular 'cde', with whom I took issue for stating this was a problem of "priority". This further convinces me it isn't a matter of priorities, but personalities, and "we" may indeed have been denied an earlier fix solely because of it. Thankfully, the right personalities belatedly addressed this particular issue.

An ancient kernel hole is closed

Posted Aug 20, 2010 21:12 UTC (Fri) by spender (guest, #23067) [Link] (2 responses)

Here's a question: before entering into a discussion with the PaX Team, did you bother to do any research of your own? Did you, for instance, read Gael Delalleau's 2005 presentation? Did you specifically read slide 24 and onward? Did you bother to read any of the recent news articles that mentioned that SuSE has had the fix since SuSE Linux Enterprise 9 (released in 2004)? Had you bothered to create the following test application for instance and see how it happily accesses over the stack gap (using gcc 4.3.2 here but it applies to every other gcc version)?

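#include <stdio.h>
#include <string.h>

int main(void)
{
    /* two page-sized locals make the stack frame larger than 8 KiB */
    char buf[4096];
    char buf2[4096];

    strcpy(buf2, "hello");

    printf("%s\n", buf2);

    return 0;
}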

You'll notice the beginning of main() gets compiled by gcc to:
lea ecx, [esp+0x4]
and esp, 0xfffffff0
push dword ptr [ecx-0x4]
push ebp
mov ebp, esp
push ecx
sub esp, 0x2014 <--- look here
mov dword ptr [esp+0x8], 0x6 <--- and now here
mov dword ptr [esp+0x4], 0x80484f0

See, if you had done any research, you would have known about this behavior and known why a single hardcoded guard page isn't acceptable in certain contexts for security. You'd know that Windows and MSVC don't have these problems. You would also have known about the additional hacks Linus added specifically to account for an incompatibility with an LVM app (after the stable kernels were already released and his buggy patch was pushed out without community review, causing oopses on some machines in addition).

For all of these reasons you would have known why the PaX Team objected to the patch itself and the way it was created and could have engaged in a reasonable discussion, yet with no knowledge and no intention of obtaining any on your own (you decided to take it "on faith" that Andrea's patch was used by SuSE) you chose to argue.

Why is it that people like you choose to engage in heated arguments with people who *have* done their research when it's evident that you've done absolutely none? How about taking responsibility for your own actions and behavior?

-Brad

An ancient kernel hole is closed

Posted Aug 20, 2010 21:39 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

Since you're the expert, is the current kernel fix adequate, or not? If not, what can be done to fix it?

I'm not the one claiming to be the security coding expert, but I *AM* now claiming that some of these experts are apparently an enormous pain to have to deal with.

> you decided to take it "on faith" that Andrea's patch was used by SuSE

I said that specifically *because* that was what PaXTeam claimed and *I* was giving him credit for probably being correct! I was attempting to be cordial on that point. And now you take umbrage, and try to hang me for it...

I'm done. You've successfully shouted down another inquisitor. Thanks for all your efforts with improving Linux security (sincerely; I can see you have enormous talent), but I don't need to put up with your distortions of my meaning, intent, and words.

An ancient kernel hole is closed

Posted Sep 3, 2010 0:20 UTC (Fri) by nix (subscriber, #2304) [Link]

> You would also have known about the additional hacks Linus added specifically to account for an incompatibility with an LVM app

Well, the LVM app was doing something sufficiently bizarre that if you'd asked me beforehand, I'd have said of course nobody would do anything like that. (I mean, parsing /proc/self/maps and mlock()ing everything you see? Why not mlockall()? Why would it fail for the guard page yet not for the vdso or vsyscall pages? Well, now we know.)

So, Linus didn't test every single app in the entire world before releasing a security fix. He didn't even test every ramification of every app on his machine. That's simply terrible. He should be publicly whipped.

An ancient kernel hole is closed

Posted Aug 21, 2010 14:29 UTC (Sat) by PaXTeam (guest, #24616) [Link] (1 responses)

> Your response was "you mean the one that he had to fix like 3 times?"

i know what my response was but it wasn't the only thing i was referring to (piece of advice, if i may: when you read some text next time, try to see the forest for the trees ;). if you parse your own statements in context in this thread, you'd have realized that you'd previously asked about the security fix being discussed on LKML:

> Was the security community nagging LKML about this issue that whole
> time? (if they were, I'll gladly retract my claims that this is BS).

now my comment you called irrelevant was merely hinting at the fact that this rush of fixes on top of fixes (which made it into the -stable trees after a week only btw, what's your take on that?) could have been avoided had the discussion actually taken place in public on LKML. i see that you still haven't connected the dots though, but i was actually saying that yes, you are right to raise the issue: what you were asking about (public discussion) would actually have been important (something i explained in my previous response too, apparently to no avail). man, i'm beginning to wonder if i'll ever be able to please you! ;)

> Oh, so instead of being a snarky response to *my* comment that you
> quoted immediately prior, about comparing the two approaches, it was an
> unrelated response to a *completely* different issue;

comparing two approaches involves more than the raw code itself, the design/submission/discussion/refinement/etc processes are all part of it. you know, it's called software development. and yes, the two approaches went through quite a different development process.

> Hold on, your link was broken, and I now learned that the web engine
> doesn't automatically match prefixes.

it does, you just have to supply it with a valid prefix (a 'v' isn't part of it).

> Ohhhh, so *now* you mean that a follow up patch, which is not actually
> the security fix, also touches 2 more files, and *that* is what I should
> have mentioned?

what you call a follow up patch (btw, did you see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-... and http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-... ?) would actually have been part of the actual security fix had the problems with it been recognized originally. you know, as it normally happens in the public discussion/review process that kernel developers use for, well, development.

> The patch which has *no equivalent* in the 2004 AA patch, so it wasn't
> necessary to point out in my comparison (thus I didn't link to it)?

it has no equivalent because his approach didn't need such crap workarounds. speaking of comparisons, do you realize that comparing things involves pointing out both similarities and differences? unless of course you want to have a skewed one.

> Pathetic.

looks like i must have touched on a nerve ;).

> That patch is explicitly mentioned in the article, you only needed to
> have said that (ie, I had already seen it).

are you implying that you read everything else explicitly mentioned in the article, including Gaël's paper? ;)

> But it isn't the guard page (or gap) itself that is the issue of
> discussion here (clearly those involved see that as a solution),

yes, that is exactly the issue here. you either prevent the stack VMA from growing next to another VMA for good, or you allow that growth but prevent actual use of the stack guard page (ideally, that'd be 'pages', but let's not digress). clearly, Linus didn't like the first approach for some reason we'll probably never learn, so he implemented something else instead.

> it is the means of implementing that solution in code in a way that
> could make its way upstream in a decent timeframe.

implementation details are just that, details. first comes overall design and obviously you don't get to write code until there's agreement on the 'right approach'. now as far as we know, such discussion never took place in public for Linus's solution, it just magically happened (and got fixed like 3 times, and broke all -stable releases in one fell swoop, for a week). call it what you want, i find *that* pathetic.

> So you claim the 2004 patch *was* accepted and included by a vendor (I'll accept that claim on faith).

no need to, you can just dig out the SuSE kernel trees of the time (and ever since, i hear) and see for yourself. or you can just take *their* claims on faith: http://support.novell.com/security/cve/CVE-2010-2240.html .

> That is clearly a problem, I concede that. Why couldn't you just say that?

uhm, say what exactly?

> You are unoriginal.

well, what can i say, it was your term, not mine, so suit yourself ;).

> And condescending. Since it was *you* who mentioned the butt-uglyness of
> the implementation of the accepted solution, it is perfectly reasonable
> to assume you have issues with the coding details of the guard page/gap
> solution (not the guard/gap concept itself), which *again* was what my
> response was specifically addressing. So now you re-define what I meant
> by 'approach'; that's dishonest.

Chad, before indulging in more name calling, you might pay attention to what i've been trying to explain to you already: there is *no* guard/gap concept per se. these are *two* *different* concepts. you either have an enforced address space gap *or* you don't but have a guard page instead at the boundary. can you understand this?

> The 'approach' in this discussion was NEVER about the issue of whether
> the guard page/gap concept itself is the right fix. [...]

since these are two different concepts, of course the discussion was always about which one was better.

> So why are you mentioning that now, and presuming it is *me* who doesn't understand?

because you obviously didn't understand the conceptual (let alone implementation) difference between the two. heck, you apparently believed that they were the *same*.

> You wanna claim that the security expert's job is to "recognize" issues
> in existing code, and not "write new kernel code" to fix it, then stop
> yammering about the coding details of the solution UNLESS you are
> claiming that it is insecure? Is it?

i tried to parse this a few times but still can't figure out what you were trying to say here, but back to your original statement:

> You've previously claimed that Linus and the other kernel developers
> are "not qualified" to make judgements about security patches "due to
> lack of expertise".

instead i said this:

> [..]they're simply not qualified to make such judgement calls due to
> lack of expertise.

now what was the judgement call to make in question? quoting dlang:

> the kernel developers and stable team have decided not to try and judge
> which patches are security fixes and which are merely bugfixes.

so what i said was about deciding whether any given patch (newly written code) has security implications or not (i.e., it fixes a security problem, and then let's not get started about recognizing patches that introduce said problems...). i never once mentioned 'security patches' in there, something you were trying to (mis)attribute to me. think about it, if a patch is known to be a 'security patch' (assuming you mean a patch that fixes a known security issue such as this heap/stack gap issue) then there's no need to judge anything, it is a security fix and that's it (mind you, as Jake pointed it out as well, Linus conveniently 'forgot' to mention this little fact in the commit).

so what security experts do is recognize security issues in code (among others), whereas kernel developers write that code (including security fixes, that then get analyzed by security experts, ad infinitum). it's two very different mindsets, although there're some people who are good at both.

> Your ego based, toxic form of communication is a thread killer, and it
> wasn't worth initially responding to. I thought you'd shed some light.

i explained a few times already what you don't understand, but then i guess there's only so much one can do ;). when you get over yourself, you might realize that if you stop acting like an RVI, you won't be treated as one.

An ancient kernel hole is closed

Posted Aug 21, 2010 20:39 UTC (Sat) by chad.netzer (subscriber, #4257) [Link]

This 'fisking' style of back and forth responses to specific lines of text doesn't seem to be helping.

Basically, there is a response to a specific point with a new point (or implied new point), without enough context to show how it's related, and there's just a pointless adversarial back and forth.

One of your recurring points in this thread is there was no public discussion of this new patch (or series of patches, really). Which is a fine point to raise, and worth a thread of its own (as it's mentioned in the LWN article). I don't disagree with the criticism of that point. But working it in as a response to an unrelated (IMO) one line quote, is not useful. Rather than a discussion on a good new, and related issue, it leads to a confused argument.

Note, I gave you credit (I felt) for answering my question about which vendors had already guarded against this (SuSE), and I conceded that that was damning evidence about the development and upstreaming process, and a *core* point to make. So, thanks for that.

I'm done with the arguing. I learned a few things, I hope it wasn't a waste of space for the site. However, I am not continuing it.

An ancient kernel hole is closed

Posted Aug 19, 2010 17:00 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

Isn't xorg cross-platform?

So does this mean every OS but Linux has been protecting against this kind of abuse, or that the press only talks about the most visible OS?

VM_GROWSDOWN

Posted Aug 20, 2010 10:35 UTC (Fri) by helge.bahmann (subscriber, #56804) [Link] (10 responses)

Could anyone remind me why "VM_GROWSDOWN" (and the resulting bloody hack of a movable guard page) is actually needed anymore? Address space is cheap, after all.

Sure there may be apps that rely on the present specific address space layout, so I'd say define a new elf flag and just pre-allocate the stack address space with a proper static guard page instead of this mess...

VM_GROWSDOWN

Posted Aug 20, 2010 17:05 UTC (Fri) by njs (subscriber, #40338) [Link] (3 responses)

For 32-bit apps, address space is not at all cheap.

VM_GROWSDOWN

Posted Aug 20, 2010 19:26 UTC (Fri) by helge.bahmann (subscriber, #56804) [Link] (2 responses)

This is typically about 8MB and therefore <1% of the total address space (assuming 32-bit kernel). If your app cannot tolerate that, you are in worse trouble.

VM_GROWSDOWN

Posted Aug 21, 2010 4:28 UTC (Sat) by chad.netzer (subscriber, #4257) [Link] (1 responses)

Threaded processes can conceivably allocate a lot of (mostly unused?) stack space, and the kernel allows processes to specify a much bigger limit for those wacky applications that need it. Also, mmap'ing applications can put a lot of pressure on the 32-bit address mapping. So the address space may not be as cheap as you envision.

As the article mentions, and spender helpfully emphasizes, the Delalleau paper gives a good graphical overview of the situation.

http://cansecwest.com/core05/memory_vulns_delalleau.pdf

VM_GROWSDOWN

Posted Aug 21, 2010 11:58 UTC (Sat) by helge.bahmann (subscriber, #56804) [Link]

The stacks for threads are not allocated with VM_GROWSDOWN [*], so you already pay the "price" for a fully reserved address space there. VM_GROWSDOWN apparently only affects the main thread, so the expenditure of an additional 8MB of address space is really only paid once (and if the admin likes fine-tuning, s/he can always rlimit this further down).

[*] I don't see how GROWSDOWN would make sense for thread stacks: to provide any meaningful growth potential for them you would have to thoughtfully sprinkle them throughout the address space and carefully dance around these locations with other mappings.

VM_GROWSDOWN

Posted Aug 22, 2010 22:55 UTC (Sun) by Blaisorblade (guest, #25465) [Link] (5 responses)

What you suggest is not sufficient. What would happen with existing vulnerable applications? You cannot remove VM_GROWSDOWN for them.
So we have some sort of mess. Reading the description, I think what got in is messier than Arcangeli's proposal. But I digress.

However, since VM_GROWSDOWN seems to be used just for the stack of the main thread (I think it wouldn't be possible to do otherwise for reasons already discussed), ditching it out would mostly make sense. _Except_ if you have a single-threaded application (and there are lots) which needs a big stack. So now each application has to decide whether to switch to the new layout.
For myself, I would enable the new layout by default, but I know somebody is going to complain; unfortunately, backward compatibility makes such changes hard.
Just my 2 cents.

VM_GROWSDOWN

Posted Aug 23, 2010 6:33 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link] (4 responses)

You are right that this is not a short-term solution: vulnerable apps would stay vulnerable this way until they were recompiled to have their elf flags changed to take advantage of a pre-allocated stack. The implemented "solution" however just trades a "code injection" vulnerability for a "denial of service" vulnerability. While this is an improvement, it should IMHO therefore not be the final answer.

I am not sure single-threaded apps with large stack requirements are the problematic case here -- they are already now bounded by the stack size rlimit, so the kernel could make an initial reservation of exactly the specified rlimit to keep them happy, which should be doable and even resizing the VMA in case the app changes its rlimit should be possible (with the added bonus of the kernel immediately detecting that resizing failed due to collision with other mappings). More likely the problem cases are apps that do "fancy things" wrt their memory mappings, but short of trying it to see what breaks there is probably no way to discover which these are :)

VM_GROWSDOWN

Posted Aug 23, 2010 12:44 UTC (Mon) by spender (guest, #23067) [Link] (3 responses)

What about RLIM_INFINITY?

on a 64bit OS, the max stack size is larger than the possible address space
on a 64bit OS with a 32bit userland app, the max stack size is larger than the possible address space

(these are both bugs still waiting to be fixed even though I've already published http://grsecurity.net/~spender/64bit_dos.c)

on a 32bit OS, the only limitation is on the initial arg/env stack, limited to 1GB (it should be the same with the 64bit OS and 32bit userland app above, but it's not)

you sure you want to do that reservation? ;)

-Brad

VM_GROWSDOWN

Posted Aug 23, 2010 13:13 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

Sure, but there's already differing behavior depending on whether the stack size is limited or not.

If the stacksize is limited, mmap starts allocating below the stack rlimit (the stack is at the top of memory) and moves down until it hits the heap at the beginning of the memoryspace. Then it'll start filling in holes in other places (such as between the end of the actual stack and the stack rlimit size).

If stacksize is not limited, mmap starts allocating partway between the heap and stack, and moves up until it hits the stack. And then starts filling in holes (such as below the begin address above the heap).

It seems to me that it'd be fairly sane, in the first case, to also disable the VM_GROWSDOWN behavior and just allocate a stack of the RLIMIT size immediately. But that *would* mean that you lose RLIMIT_STACK amount of memory in your VM space which could've otherwise been used for mmap'ing, which might be a problem in some cases.

VM_GROWSDOWN

Posted Aug 23, 2010 17:51 UTC (Mon) by PaXTeam (guest, #24616) [Link]

when talking about getting rid of VM_GROWSDOWN, it seems that people forget that it does not only expand the stack as needed, but it can also detect a kind of userland bug where the stack expansion request is beyond a certain architecture dependent limit (just look at the callers of expand_stack in the arch specific page fault handler and the checks before that). so statically allocating the initial task's stack range would let those bugs go undetected in the future. now admittedly this is a rare bug class (IIRC, gcc 2.96 had such a code generation bug) but it still means that there'll be a userland visible change when you get rid of VM_GROWSDOWN.

VM_GROWSDOWN

Posted Aug 23, 2010 17:35 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link]

I'm not sure there are that many applications that rely on "unlimited stack" meaning "allow to fill the entire address-space", but that's why I would not change the default behavior and pick a new elf flag instead (and for anyone needing ridiculously large stacks, split stacks are IMHO the better long-term answer, see http://gcc.gnu.org/wiki/SplitStacks).

There is certainly the practical question of what it means to run a process with stacksize == RLIM_INFINITY when the stack vma is supposed to be fully expanded -- I'd say pick some random really large value like 512M, just enough to get sysvinit/upstart/systemd/whatever running, demand that sane limits be set afterwards and have admins suffer really if they do not.

In any case, apparently nothing breaks with my distribution's default 8MB stack rlimit, so I would expect that gradually converting the whole system over to use pre-allocated stack VMAs would not hit too many obstacles.