Rethinking the Stack Clash fix
Stack Clash is generally seen as a user-space problem: a combination of large on-stack allocations and the lack of stack probing opens up opportunities for an attacker to jump over the guard page at the end of the stack. Fixing those problems (and deploying the fixes widely) will take some time; meanwhile, it was thought, systems could be protected by replacing the kernel-provided guard page with a 1MB (or larger) guard region that, hopefully, cannot be jumped over.
That guard region should be invisible to most programs, but it has created a surprising number of problems for some applications. A number of those issues have been worked around, but one has proved difficult to fix; unfortunately, that one is LibreOffice, which can crash on 32-bit systems when running Java components. The issue is a bit complicated but, in short, Java wants to enable code execution in memory located immediately below the stack, which runs afoul of the guard region. This, as Linus Torvalds noted, is a problem.
There is also the lingering fear that other, not-yet-discovered regressions lurk in user-space applications, regressions that might not surface until somebody does a kernel upgrade months or years from now. So, perhaps, adding a large guard region to work around a user-space problem is not the best answer.
On most systems, the resource limit (rlimit) mechanism restricts the stack to a maximum of 8MB of memory. The hard limit tends to be much larger, though, so any unprivileged process can raise the effective stack limit to a larger value — as high as "infinite". The posted exploits for Stack Clash work by raising the stack rlimit, then running a setuid program with a vast array of arguments to fill that huge stack. If that program can be prodded into overflowing the stack without hitting the guard page, it may be possible to overwrite heap data beyond the stack and, from there, take over the program and make use of its privileges.
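The stack limit is read and set with getrlimit() and setrlimit(). The C sketch below shows the first step the posted exploits rely on: an unprivileged process may raise its soft RLIMIT_STACK value as far as the hard limit, which may be RLIM_INFINITY. Any program it then runs inherits the new limit. The helper name raise_stack_soft_limit() is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft stack limit to its hard
 * limit (possibly RLIM_INFINITY). No privilege is required, because the
 * soft limit is never pushed past the hard limit. Programs started later
 * with exec() inherit the new value. Returns 0 on success, -1 on error. */
int raise_stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_STACK, &rl);
}
```

A caller would invoke this and then exec() the target program. The kernel does not object, since only the soft limit changes and it stays within the hard limit.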
There may be other ways to get a setuid program to exhaust its stack, but the command-line arguments method is easy and readily under the control of an attacker. (Note: the overflow is relatively easy; a successful exploit is harder). That suggests that the bulk of the Stack Clash exploits can be headed off by preventing the execution of setuid programs with a stack that's both (1) close to the heap area and (2) nearly full at the outset.
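Argument lists are convenient for an attacker because their size is entirely under the caller's control. The sketch below uses a hypothetical helper, arg_stack_bytes(), to approximate how much stack the new program's arguments occupy. It adds up every argument and environment string (including terminators) plus the two NULL-terminated pointer arrays. It ignores the auxiliary vector, alignment padding, and the copy of the executable's file name, so treat the result as an estimate rather than the kernel's exact accounting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: approximate the stack space exec() needs for the
 * argument and environment strings plus their NULL-terminated pointer
 * arrays. The auxiliary vector, alignment padding and the copy of the
 * executable's file name are ignored, so the real figure is a bit larger. */
size_t arg_stack_bytes(char *const argv[], char *const envp[])
{
    size_t bytes = 0;
    size_t pointers = 2;              /* the NULLs ending argv and envp */

    for (; *argv != NULL; argv++, pointers++)
        bytes += strlen(*argv) + 1;   /* string plus its NUL terminator */
    for (; *envp != NULL; envp++, pointers++)
        bytes += strlen(*envp) + 1;
    return bytes + pointers * sizeof(char *);
}
```

The caller of exec() chooses every byte of argv, so this number can be pushed as high as the kernel's limits allow.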
Kees Cook took a stab at that problem with this patch attacking the first of those two points. The idea was that, when a process is about to run a setuid program, the stack limit should be reset to a reasonable value; the value Cook chose was whatever the init process has. This patch would not prevent stack exhaustion (indeed, it might cause it if there are setuid programs needing a huge stack), but it would keep the stack from growing large enough to impinge on a heap area.
That patch didn't get far, though, since Torvalds disliked it. One of his complaints was that special-casing setuid programs would be likely to lead to bugs or inadequate protection, since the relevant code would see relatively little use. So Cook's next attempt took a different tack: it places an upper bound on the amount of stack memory that can be occupied by a program's arguments at exec() time. In particular, that limit is capped at 75% of the default 8MB stack limit, or 6MB, regardless of how high the current stack limit has been raised. This patch has been merged for 4.13; it's not clear whether this change will find its way into the stable updates for earlier kernel releases.
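The arithmetic behind the new cap can be sketched as follows. Both helpers are hypothetical models, not kernel or glibc functions. kernel_arg_limit() assumes the longstanding rule that argument strings may use a quarter of the stack rlimit, now clamped to 6MB. glibc_arg_max_estimate() mirrors the sysconf(_SC_ARG_MAX) code quoted in the comments below: a quarter of the stack rlimit, with a floor at the legacy value. That value is assumed here to be 131072 bytes.

```c
#include <sys/resource.h>

#define DEFAULT_STACK_LIMIT ((rlim_t)8 * 1024 * 1024)  /* 8MB default */
#define ARG_CAP (DEFAULT_STACK_LIMIT / 4 * 3)          /* 75%: 6MB */
#define LEGACY_ARG_MAX ((rlim_t)131072)                /* assumed glibc floor */

/* Hypothetical model of the kernel's exec()-time check after the change:
 * argument strings may use a quarter of the current stack rlimit, but
 * never more than 6MB, however large the rlimit is. */
rlim_t kernel_arg_limit(rlim_t stack_cur)
{
    rlim_t quarter = stack_cur / 4;

    return quarter < ARG_CAP ? quarter : ARG_CAP;
}

/* Hypothetical model of glibc's sysconf(_SC_ARG_MAX) estimate: a quarter
 * of the stack rlimit with a 128KB floor, and no upper cap. */
rlim_t glibc_arg_max_estimate(rlim_t stack_cur)
{
    rlim_t quarter = stack_cur / 4;

    return quarter > LEGACY_ARG_MAX ? quarter : LEGACY_ARG_MAX;
}
```

With the default 8MB limit, both models give 2MB, and they still agree at 24MB. Above that, the glibc estimate keeps growing while the kernel stops at 6MB. A program that sizes its command lines from sysconf() can therefore be told it has more room than exec() will actually allow.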
Limiting stack use by arguments should suffice to block a lot of attacks but, as it turns out, there's still a desire to enforce a limit on the size of the stack for setuid programs. One reason for that might be fears of some sort of pathological behavior that could be exploited to force a setuid program to overflow even a huge and mostly empty stack. But it also turns out that, if the stack rlimit is set to "infinity", the kernel will change the layout of a program's memory areas. A large stack limit suggests that the stack itself is likely to be large, so the kernel maps other memory areas low in the address space to preserve room for the stack to grow into. If, instead, the stack is limited, the kernel will map those areas at higher locations. As a result, the stack rlimit gives an attacker a bit of control over how the target program's memory is laid out, which is not a desirable thing to allow.
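The layout decision described above can be modeled in a few lines. This is a simplified sketch loosely based on the x86 mmap code of this era, and the helper names are hypothetical. It ignores address-space randomization, the ADDR_COMPAT_LAYOUT personality flag, and the legacy_va_layout sysctl. In the model, an unlimited stack selects the legacy bottom-up layout. Otherwise, mmap() areas are placed below the stack, leaving a gap sized by the rlimit and clamped between 128MB and five-sixths of the address space.

```c
#include <stdbool.h>
#include <sys/resource.h>

#define MIN_GAP ((rlim_t)128 * 1024 * 1024)   /* 128MB minimum stack gap */

/* Simplified model: an unlimited stack rlimit selects the legacy
 * bottom-up layout, with mmap() areas starting low in the address space. */
bool uses_legacy_layout(rlim_t stack_cur)
{
    return stack_cur == RLIM_INFINITY;
}

/* Top-down layout: mmap() areas start just below a gap reserved for the
 * stack. The gap follows the stack rlimit, clamped to [128MB, 5/6 of
 * the address space]; randomization is left out of this model. */
rlim_t topdown_mmap_base(rlim_t task_size, rlim_t stack_cur)
{
    rlim_t gap = stack_cur;
    rlim_t max_gap = task_size / 6 * 5;

    if (gap < MIN_GAP)
        gap = MIN_GAP;
    if (gap > max_gap)
        gap = max_gap;
    return task_size - gap;
}
```

On a 3GB 32-bit address space, an 8MB limit puts the mmap() base at 0xB8000000, while a 1GB limit pushes it down to 2GB. Either way, the value chosen for the rlimit moves the target's memory around.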
That is the motivation for this patch series, which applies a maximum 8MB stack limit to setuid programs. These patches, posted on July 10, have not yet been merged; applying this limit required a surprising number of changes to the core exec*() system-call code, so more than the usual amount of review is warranted. There would appear to be general agreement on the goal, though, so this change seems likely to find its way into the mainline eventually. There has been some talk of allowing a larger stack via an annotation in the binary file, but that has not been implemented and may never be without a demonstrated need.
At this point, nobody has said whether these changes will be enough to allow the removal of the larger guard region from the stack. Returning to the previous layout semantics would ease a lot of worries about regressions that might turn up months or years in the future, though, so it's not hard to see why the idea has appeal. It would seem that at least some of the kernel's internal memory-layout policies have become a part of the user-space ABI, so they need to be preserved if possible.
| Index entries for this article | |
|---|---|
| Kernel | Development model/User-space ABI |
| Kernel | Security/Vulnerabilities |
| Security | Linux kernel |
Rethinking the Stack Clash fix
Posted Jul 13, 2017 21:49 UTC (Thu)
by nix (subscriber, #2304)
Um...how come nobody has noted that this new patch totally breaks the userspace API promises?
Arbitrary limits for security reasons are one thing. Tiny ones on huge systems are another.
Rethinking the Stack Clash fix
Posted Jul 13, 2017 23:28 UTC (Thu)
by foom (subscriber, #14868)
There is an API for getting the max argument size, sysconf(_SC_ARG_MAX). Documented in man sysconf:

> ARG_MAX - _SC_ARG_MAX
> The maximum length of the arguments to the exec(3) family of functions. Must not be less than _POSIX_ARG_MAX (4096).

Implemented in glibc as:

    case _SC_ARG_MAX:
      /* Use getrlimit to get the stack limit. */
      if (__getrlimit (RLIMIT_STACK, &rlimit) == 0)
        return MAX (legacy_ARG_MAX, rlimit.rlim_cur / 4);
      return legacy_ARG_MAX;

With this kernel patch, that is now a complete lie, and any program depending on it in order to decide whether the arguments can be passed on the command line (vs., say, in a response file) is now broken. The clang compiler is one such program I know of, but I'm sure there are others...
WTF?
Rethinking the Stack Clash fix
Posted Jul 14, 2017 4:49 UTC (Fri)
by areilly (subscriber, #87829)
I'm not sure how well an suid limit like this would interact with something like xargs, which presumably knows similar things about allowable argument sizes. It would seem to require xargs to be aware of the setuid nature of its client program.
Rethinking the Stack Clash fix
Posted Jul 14, 2017 11:20 UTC (Fri)
by foom (subscriber, #14868)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 12:23 UTC (Fri)
by joey (guest, #328)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 12:39 UTC (Fri)
by jchaxby (subscriber, #63942)
And I'm pretty sure there are other perfectly legitimate uses, as nix suggests, of very large arg sizes.
"Argument list too long"
Posted Jul 27, 2017 16:46 UTC (Thu)
by geuder (subscriber, #62854)
I have not tweaked ulimit in any way; the stack size is 8192.
Luckily "find ... -exec ... {} +" worked without the error. It was just a bad old habit to use xargs in this case, where -exec + was possible, too. But there might be some more legitimate use cases of "xargs -0".
Unluckily, that means that I have not dug any deeper into whether it was really a Stack Clash fix that introduced the "argument list too long" problem. It would be quite a weird coincidence if it weren't.
"Argument list too long"
Posted Jul 27, 2017 18:11 UTC (Thu)
by mbunkus (subscriber, #87248)
Parallel execution, for example: `… | xargs -0 --max-args=1 --max-procs=$(getconf _NPROCESSORS_ONLN) the-program-to-parallelize`
Rethinking the Stack Clash fix
Posted Jul 14, 2017 15:06 UTC (Fri)
by corbet (editor, #1)
Perhaps, but remember that, unless you've tweaked the stack limit yourself, the limit was (and remains) 2MB. Are there really "massive" numbers of users who have changed the stack limit for the purposes of enabling a much larger args array?
Rethinking the Stack Clash fix
Posted Jul 15, 2017 0:46 UTC (Sat)
by foom (subscriber, #14868)
But, once you have increased it for *any* purpose, this kernel change will break programs attempting to choose the right number of args to pass on the command line via calling sysconf. You don't need to be actively requiring or even desiring a large args array...
Rethinking the Stack Clash fix
Posted Jul 17, 2017 14:21 UTC (Mon)
by NightMonkey (subscriber, #23051)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 18:03 UTC (Fri)
by Frogging101 (guest, #113180)
I would do it but I don't want to butt in without really knowing what I'm talking about.
Rethinking the Stack Clash fix
Posted Jul 13, 2017 21:58 UTC (Thu)
by josh (subscriber, #17465)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 0:45 UTC (Fri)
by eru (subscriber, #2753)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 1:06 UTC (Fri)
by josh (subscriber, #17465)
But you also can't count on all of userspace using a reasonable compiler.
Rethinking the Stack Clash fix
Posted Jul 14, 2017 7:59 UTC (Fri)
by mjw (subscriber, #16740)

> This series introduces -fstack-check=clash which is a variant of -fstack-check designed to prevent "jumping the stack" as seen in the stack-clash exploits.

Rethinking the Stack Clash fix
Posted Jul 14, 2017 21:51 UTC (Fri)
by eru (subscriber, #2753)
Rethinking the Stack Clash fix
Posted Jul 14, 2017 22:16 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
