Ripples from Stack Clash

By Jonathan Corbet
June 28, 2017

In one sense, the Stack Clash vulnerability that was announced on June 19 has not had a huge impact: thus far, at least, there have been few (if any) stories of active exploits in the wild. At other levels, though, this would appear to be an important vulnerability, in that it has raised a number of questions about how the community handles security issues and what can be expected in the future. The indications, unfortunately, are not all positive.

A quick review for those who are not familiar with this vulnerability may be in order. A process's address space is divided into several regions, two of which are the stack and the heap areas. The stack contains short-lived data tied to the running program's call chain; it is normally placed at a high address and it grows automatically (toward lower addresses) if the program accesses memory below the stack's current lower boundary. The heap, instead, contains longer-lived memory and grows upward. The kernel, as of 2010, places a guard page below the stack in an attempt to prevent the stack from growing into the heap area.

The "Stack Clash" researchers showed that it is possible to jump over this guard page with a bit of care. The result is that programs that could be fooled into using a lot of stack space could be made to overwrite heap data, leading to their compromise; setuid programs are of particular concern in this scenario. The fix that has been adopted is to turn the guard page into a guard region of 1MB; that, it is hoped, is too much to be readily jumped over.
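The guard-page jump described above can be illustrated with a toy address calculation. This is a minimal sketch, not kernel code: the function name `lands_in_guard`, the example address, and the assumption that the stack pointer sits exactly at the stack's lower boundary before the new frame is entered are all made up for illustration. It models a single downward move of the stack pointer followed by one write at the new location.

```python
PAGE = 4096
MB = 1024 * 1024

def lands_in_guard(stack_low, guard_size, frame_size):
    """Return True if the first write of a new frame hits the guard region.

    stack_low  -- lowest mapped stack address; the guard sits just below it
    guard_size -- size of the guard region in bytes
    frame_size -- how far the stack pointer moves down in one step
    """
    new_sp = stack_low - frame_size
    guard_bottom = stack_low - guard_size   # lowest address inside the guard
    return guard_bottom <= new_sp < stack_low

STACK_LOW = 0x7FFD_0000_0000   # hypothetical address, for illustration only

# A 64KB frame steps right over a single guard page and is never caught...
print(lands_in_guard(STACK_LOW, PAGE, 64 * 1024))    # False
# ...but the same frame faults inside a 1MB guard region.
print(lands_in_guard(STACK_LOW, MB, 64 * 1024))      # True
# A frame larger than the guard region can still skip it.
print(lands_in_guard(STACK_LOW, MB, 2 * MB))         # False
```

Under these assumptions, a larger gap does not make the jump impossible; it only requires a program that can be pushed into moving its stack pointer by more than 1MB without touching the memory in between.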

Not a new problem

There are certain developers working in the security area who are quite fond of saying "I told you so". In this case, it would appear that they really told us so. An early attempt to deal with this problem can be found in this 2004 patch from Andrea Arcangeli, which imposed a gap of a configurable size between the stack and the heap. Despite being carried in SUSE's kernels for some time, this patch never found its way into the mainline.

In 2010, an X server exploit took advantage of the lack of isolation between the stack and the heap, forcing a bit of action in the kernel community; the result was a patch from Linus Torvalds adding a single guard page (with no configurability) at the bottom of the stack. It blocked the X exploit, and many people (and LWN) proclaimed the problem to be solved. Or, at least, it seemed solved once the various bugs introduced by the initial fix were dealt with.

In the comments to the above-linked LWN article (two days after it was published), Brad Spengler and "PaX Team" claimed that a single-page gap was insufficient. More recently, Spengler posted a blog entry in his classic style on how they told us about this problem but it never got fixed because nobody else knows what they are doing. The thing they did not do, but could have done if they were truly concerned about the security of the Linux kernel, was to post a patch fixing the problem properly.

Of course, nobody else posted such a patch either; the community can only blame itself for not having fixed this problem. Perhaps LWN shares part of that blame for presenting the problem as being fixed when it was not; if so, we can only apologize and try to do better in the future. But we might argue that the real problem is a lack of people who are focused on the security of the kernel itself. There are few developers indeed whose job requires them to, for example, examine and address stack-overrun threats. Ensuring that this problem was properly fixed was not anybody's job, so nobody did it.

The corporate world supports Linux kernel development heavily, but there are ghetto areas that, seemingly, every company sees as being somebody else's problem; security is one of those. The situation has improved a little in recent times, but the core problem remains.

Meanwhile, one might well ask: has the stack problem truly been fixed this time? One might answer with a guarded "yes" — once the various problems caused by the new patch are fixed, at least; a 1MB gap is likely to be difficult for an attacker to jump over. But it is hard to be sure, anymore.

Embargoes

Alexander "Solar Designer" Peslyak is the manager of both the open oss-security and the closed "distros" list; the latter is used for the discussion of vulnerabilities that have not yet been publicly disclosed. The normal policy for that list is that a vulnerability disclosed there can only be kept under embargo for a period of two weeks; that limit is intended to combat the common tendency for companies to want to keep problems secret for as long as possible while they prepare fixes.

As documented by Peslyak, the disclosure of Stack Clash did not follow that policy. The list was first notified of a problem on May 3, with the details disclosed on May 17. The initial disclosure date of May 30 was pushed back by Qualys until the actual disclosure date of June 19. Peslyak made it clear that he thought the embargo went on for too long, and that the experience would not be repeated in the future.
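The dates above can be checked against the two-week policy with a little date arithmetic. This is only a sketch: the year 2017 is inferred from the article's date, and the text does not say whether the two-week clock starts at the first notification or at the disclosure of details, so both intervals are shown.

```python
from datetime import date

POLICY_DAYS = 14                 # the list's normal embargo limit

notified = date(2017, 5, 3)      # list first told of a problem
details = date(2017, 5, 17)      # details disclosed to the list
planned = date(2017, 5, 30)      # initial public disclosure date
actual = date(2017, 6, 19)       # actual public disclosure date

print((planned - details).days)   # 13
print((actual - details).days)    # 33
print((actual - notified).days)   # 47
```

Counted from the disclosure of details, the original plan fit within the policy, while the actual embargo ran to more than twice the allowed length.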

The biggest problem with the extended embargo, perhaps, was that it kept the discussion out of the public view for too long. The sheer volume on the (encrypted) distros list was, evidently, painful to deal with after a while. But the delay also kept eyes off the proposed fix, with the result that the patches merged by the disclosure date contained a number of bugs. The urge to merge fixes as quickly as possible is not really a function of embargo periods, but long embargoes fairly clearly delay serious review of those fixes. Given the lack of known zero-day exploits, it may well have been better to disclose the problem earlier and work on the fixes in the open.

That is especially true since, according to Qualys, the reason for the embargo extension was that the fixes were not ready. The longer embargo clearly did not result in readiness. There was a kernel patch of sorts, but the user-space side of the equation is in worse shape. A goal like "recompile all userland code with GCC's -fstack-check option" was never going to happen in a short period anyway, even if -fstack-check were well suited to this application — which it currently is not.

There is a related issue in that OpenBSD broke the embargo by publicly committing a patch to add a 1MB stack guard on May 18 — one day after the private disclosure of the problem. This has raised a number of questions, including whether OpenBSD (which is not a member of the distros list) should be included in embargoed disclosures in the future. But perhaps the most interesting point to make is that, despite this early disclosure, all hell stubbornly refused to break loose in its aftermath. Peslyak noted that:

This matter was discussed, and some folks were unhappy about OpenBSD's action, but in the end it was decided that since, as you correctly say, the underlying issue was already publicly known, OpenBSD's commits don't change things much.

As was noted above, the "underlying issue" has been known for many years. A security-oriented system abruptly making a change in this area should be a red flag for those who follow commit streams in the hope of finding vulnerabilities. But there appears to be no evidence that this disclosure — or the other leaks that apparently took place during the long embargo — led to exploits being developed before the other systems were ready. So, again, it's not clear that the lengthy embargo helped the situation.

Offensive CVE assignment

Another, possibly discouraging, outcome from this whole episode was a demonstration of the use of CVE numbers as a commercial weapon. It arguably started with this tweet from Kurt Seifried, reading: "CVE-2017-1000377 Oh you thought running GRsecurity PAX was going to save you?". CVE-2017-1000377, filed by Seifried, states that the grsecurity/PaX patch set also suffers from the Stack Clash vulnerability — a claim which its developers dispute. Seifried has not said whether he carried out these actions as part of his security work at Red Hat, but Spengler, at least, clearly sees a connection there.

Seifried's reasoning appears to be based on this text from the Qualys advisory sent to the oss-security list:

In 2010, grsecurity/PaX introduced a configurable stack guard-page: its size can be modified through /proc/sys/vm/heap_stack_gap and is 64KB by default (unlike the hard-coded 4KB stack guard-page in the vanilla kernel). Unfortunately, a 64KB stack guard-page is not large enough, and can be jumped over with ld.so or gettext().

The advisory is worth reading in its entirety. It describes an exploit against sudo under grsecurity, but that exploit depended on a second vulnerability and disabling some grsecurity protections. With those protections enabled, Qualys says, a successful exploit could take thousands of years.

It is thus not entirely surprising that Spengler strongly denied that the CVE number was valid; in his unique fashion, he made it clear that he believes the whole thing was commercially motivated; "this taints the CVE process", he said. Seifried defended the CVE as "legitimate" but suggested that he was getting tired of the whole show and might give up on it.

Meanwhile Spengler, not to be outdone, filed for a pile of CVE numbers against the mainline kernel, to the befuddlement of Andy Lutomirski, the author of much of the relevant code. Spengler made it appear that this was a retaliatory act and suggested that Lutomirski talk to Seifried about cleaning things up. "I am certain he will treat a member of upstream Linux the same as I've been treated, as he is a very professional and equitable person."

The CVE mechanism was created as a way to make it easier to track and talk about specific vulnerabilities. Some have questioned its value, but there does seem to be a real use for a unique identifier for each problem. If, however, the CVE assignment mechanism becomes a factory for mudballs to be thrown at the competition, it is likely to lose whatever value it currently has. One can only hope that the community will realize that turning the CVE database into a cesspool of fake news will do no good for anybody and desist from this kind of activity.

In conclusion

Our community's procedures for dealing with security issues have been developed over decades and, in many ways, they have served us well over that time. But they are also showing some signs of serious strain. The lack of investment in the proactive identification and fixing of security issues before they become an emergency has hurt us a number of times and will continue to do so. The embargo processes we have developed are clearly not ideal and could use improvement — if we only knew what form that improvement would take.

It is becoming increasingly apparent to the world as a whole that our industry's security is not at the level it needs to be. Hopefully, that will create some commercial incentives to improve the situation. But it also creates incentives to attack others rather than fixing things at home. That is going to lead to some increasingly ugly behavior; let us just hope that our community can figure out a way to solve problems without engaging in divisive and destructive tactics. Our efforts are much better placed in making Linux more secure for all users than in trying to take other approaches down a notch.


Ripples from Stack Clash

Posted Jun 28, 2017 16:31 UTC (Wed) by sorokin (guest, #88478) [Link] (5 responses)

What I find disturbing in the story is the fact that programs on windows had been immune to this kind of vulnerability for ages. If you have a function with more than 4 kbytes of local variables, MSVC injects the code that touches all pages allocated for the variables. Windows requires that stack pages must be touched sequentially. Your program will be killed if it tries skipping a single page of the stack.

Ripples from Stack Clash

Posted Jun 28, 2017 23:11 UTC (Wed) by cesarb (subscriber, #6266) [Link] (4 responses)

I doubt it was an intentional security feature on part of the Windows designers; it was probably a side effect of the design of its virtual memory subsystem, which as far as I could find doesn't have the kind of automatically growing virtual memory area that Linux has. Instead, it allows one to mark a page as a "guard page", where any attempt to access will cause an exception (in userspace; Windows has exceptions even in C), and automatically remove the "guard page" mark. The exception handler has to allocate another page and mark it as the new "guard page".

Since there's only one "guard page" on the stack, and you have to touch it or the stack won't grow, compilers for Windows have to call a function to touch the guard page whenever the stack frame for a function is larger than one page. Since on Linux you don't have to touch the guard page for the stack to grow, compilers for Linux haven't implemented it.
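The guard-page growth scheme described in this comment can be sketched as a toy model. It is not the Windows API or any compiler's actual probe routine; the class `GuardedStack`, the function `enter_frame`, and the addresses are hypothetical names chosen for illustration. The model keeps one guard page directly below the committed stack: touching it grows the stack by a page, and touching anything below it is a fault.

```python
PAGE = 4096

class GuardedStack:
    """Toy stack with a single guard page just below the committed area."""

    def __init__(self, committed_low, page=PAGE):
        self.page = page
        self.committed_low = committed_low   # lowest committed address

    @property
    def guard_page(self):
        return self.committed_low - self.page

    def touch(self, addr):
        if addr >= self.committed_low:
            return "ok"
        if addr >= self.guard_page:
            # The guard page becomes committed; a new guard sits below it.
            self.committed_low = self.guard_page
            return "grown"
        return "fault"                       # skipped past the guard page

def enter_frame(stack, sp, frame_size, probe):
    """Move sp down by frame_size, optionally touching each page on the way."""
    new_sp = sp - frame_size
    if probe:
        addr = sp - stack.page
        while addr > new_sp:
            if stack.touch(addr) == "fault":
                return "fault"
            addr -= stack.page
    return stack.touch(new_sp)

BASE = 0x100000                              # hypothetical stack bottom
print(enter_frame(GuardedStack(BASE), BASE, 3 * PAGE, probe=False))  # fault
print(enter_frame(GuardedStack(BASE), BASE, 3 * PAGE, probe=True))   # grown
```

In this model a three-page frame entered without probing is killed, while the probed version grows the stack one page at a time, which is the sequential-touch behavior the comments above attribute to Windows compilers.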

This is not the first time that different design choices happen to prevent a vulnerability. For instance, Linux (X11 actually) was not vulnerable to Shatter attacks, due to not sending function pointers through the same message loop used for inter-application messages.

Ripples from Stack Clash

Posted Jun 29, 2017 7:05 UTC (Thu) by vegard (subscriber, #52330) [Link] (3 responses)

> any attempt to access will cause an exception (in userspace; Windows has exceptions even in C)

Are you referring to page fault exceptions, which is a hardware feature different from C++ exceptions? I'm pretty sure it's the same on Linux, and you can even handle those by handling SIGSEGV and/or SIGBUS.

> I doubt it was an intentional security feature on part of the Windows designers; it was probably a side effect of the design of its virtual memory subsystem,

I don't know, I think one-page-at-a-time expansion is a more sane design from the outset. I mean, for sure they didn't have the current Linux userspace exploits in mind, but I don't find it unthinkable that there was at the very least a vaguely security-related concern behind their design.

I don't think we should assume by default that Windows got something right only because of a "lucky design choice"; that's a bit disingenuous.

Ripples from Stack Clash

Posted Jun 29, 2017 9:39 UTC (Thu) by farnz (subscriber, #17727) [Link]

Windows has had a userspace exception mechanism in all Win32 versions. The Windows kernel won't grow stacks automatically; instead, if you want a stack to grow, you set up a guard page, which will trigger a Guard Page Violation exception when touched (instead of the normal Page Fault exception), and then automatically put the page you set up when you configured the guard page into place.

This lets you map in a new guard page, ready for the next step of growth.

Ripples from Stack Clash

Posted Jun 29, 2017 13:55 UTC (Thu) by felixfix (subscriber, #242) [Link] (1 responses)

"one-page-at-a-time expansion is a more sane design"

What if the local var which expands the stack is a multi-page array which is populated randomly?

Ripples from Stack Clash

Posted Jun 29, 2017 22:22 UTC (Thu) by kmeyer (subscriber, #50720) [Link]

The whole thing gets probed. Why, do you think that particular case is sane and should be optimized for?

Ripples from Stack Clash

Posted Jun 28, 2017 16:34 UTC (Wed) by fbacchella (subscriber, #60898) [Link] (1 responses)

And you forget that the distribution's patch was not of the utmost quality: http://www.openwall.com/lists/oss-security/2017/06/22/6 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1699772 And for my own use case: ~$ (ulimit -s 4112 ; uname) Linux ~$ (ulimit -s 4111 ; uname) -bash: /usr/bin/uname: Argument list too long

Ripples from Stack Clash

Posted Jun 28, 2017 16:35 UTC (Wed) by fbacchella (subscriber, #60898) [Link]

In a more readable format:

~$ (ulimit -s 4112 ; uname)
Linux
~$ (ulimit -s 4111 ; uname)
-bash: /usr/bin/uname: Argument list too long

Ripples from Stack Clash

Posted Jun 28, 2017 17:39 UTC (Wed) by jhoblitt (subscriber, #77733) [Link] (7 responses)

Would having the stack in a separate memory segment avoid this entire class of vulnerabilities? If so, we just need to replace all modern hardware....

Ripples from Stack Clash

Posted Jun 28, 2017 21:27 UTC (Wed) by flussence (guest, #85566) [Link] (6 responses)

Current PC CPUs are, for the time being, still capable of doing real segmentation faults in hardware. It could be done if everyone in the world could be convinced the x86-64 ABI is a bad idea. (Maybe we could improve x32 with memory isolation to make it actually worth the hassle to use?)

It seems weird that we have hardware and software mechanisms all over the place to keep the other layers of an OS from trampling each other, but the stack and heap are allowed to even though nothing good can possibly come of it.

Ripples from Stack Clash

Posted Jun 29, 2017 1:03 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

Eh, I had no idea. I suppose one wouldn't have to completely throw out the x86_64 ABI -- couldn't a tweaked ABI live in parallel with slightly different ELF metadata?

Ripples from Stack Clash

Posted Jun 29, 2017 9:06 UTC (Thu) by jem (subscriber, #24231) [Link] (3 responses)

I'm genuinely curious how you intend to solve this problem with an ABI change. Memory pages on the x86-64 can be write protected, and they can be marked non-executable, but to my knowledge there is no separation at the hardware level between "stack" pages and "non-stack" data pages.

> It seems weird that we have hardware and software mechanisms all over the place to keep the other layers of an OS from trampling each other, but the stack and heap are allowed to even though nothing good can possibly come of it.

The difference here is that we are not talking about layers of an OS, but memory areas that are internal to a single process. Also, it's a bit inaccurate to talk about "the stack", since there are typically lots of separate stacks, one for each thread.

Ripples from Stack Clash

Posted Jul 2, 2017 3:07 UTC (Sun) by immibis (subscriber, #105511) [Link] (2 responses)

With segmentation, you can have separation between stack and non-stack data pages - such that even if you have an address that points to the heap, if you try to use that address to access the stack, you get a segfault. (A *literal* segfault, not one of those pagefaults that we now call segfaults for historical reasons)

Ripples from Stack Clash

Posted Jul 2, 2017 3:39 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

You can do this with the current architecture. Just use bit 62 of the address to indicate heap/stack and set the mappings accordingly.

Ripples from Stack Clash

Posted Jul 2, 2017 17:04 UTC (Sun) by dtlin (subscriber, #36537) [Link]

You shouldn't be able to change the segment selector through pointer arithmetic. In a flat address space, you have to somehow check that any pointer offset (that could conceivably be controlled by user input) doesn't cause the pointer to change that bit...

Ripples from Stack Clash

Posted Jun 30, 2017 6:48 UTC (Fri) by itvirta (guest, #49997) [Link]

I thought long mode (64-bit mode) only supports flat segments, that contain the whole memory area
from 0 to 2^64. Not just because segmentation isn't used in the ABI, but just that the hardware doesn't support it.
Even if i386-style segmentation would be available, it would require pointers to have the segment id with them
everywhere, which makes them unnecessarily longer (80-bit pointers? Might be somewhat awkward to handle because
of alignment issues etc.)

Now, hypothetically, reserving one bit out of 64 for a stack/heap indicator would give separation without using that much
space, but we would require dedicated arithmetic instructions for pointers so that the usual arithmetic wouldn't be able to
change the indicator bit... Either that, or just go back to having a large unreserved area between the two areas, which kinda
seems like what the fix we have for the current issue does.
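The concern about ordinary arithmetic changing an indicator bit can be shown with plain integers. This is a sketch under stated assumptions: bit 62 set means "stack" and clear means "heap" (the thread does not fix a convention), the addresses are hypothetical, and the helper names are invented for illustration. Real x86-64 canonical-address rules are ignored.

```python
TAG_BIT = 1 << 62            # assumed convention: set = stack, clear = heap
MASK64 = (1 << 64) - 1       # emulate 64-bit wraparound

def is_stack(ptr):
    return bool(ptr & TAG_BIT)

def add_offset(ptr, offset):
    """Ordinary 64-bit pointer arithmetic, with no knowledge of the tag bit."""
    return (ptr + offset) & MASK64

def checked_add(ptr, offset):
    """Pointer arithmetic that refuses to change the tag bit."""
    result = add_offset(ptr, offset)
    if (result ^ ptr) & TAG_BIT:
        raise OverflowError("pointer arithmetic crossed the stack/heap tag bit")
    return result

heap_ptr = 0x3FFF_FFFF_FFFF_FFF0      # hypothetical heap address near the boundary
moved = add_offset(heap_ptr, 0x20)    # a small offset carries into bit 62

print(is_stack(heap_ptr))   # False
print(is_stack(moved))      # True
```

With unchecked arithmetic a heap pointer silently becomes a "stack" pointer; `checked_add` stands in for the dedicated pointer instructions, or per-operation checks, that the comments above say a flat address space would need.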

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 19:00 UTC (Wed) by ms-tg (subscriber, #89231) [Link] (12 responses)

Hi Jonathan,

Am I the only one that is increasingly finding it hard to see how Brad Spengler / PaX / GrSecurity / Open Source Security / whomever is or was in the wrong on any of this?

Sure, it's true that their scorched-earth style of discourse is/was not helping their points of view to be heard. But seeing how all this has played out, it doesn't look like the mainstream kernel community, Linus included, came off a whole lot better, does it?

Has anyone proposed the following modest commercial model: Someone, let's say Red Hat, pays a sizable annual contract for the services of PaX. In return, PaX works with someone at Red Hat whom they essentially teach, full-time, all the details of all their threat models, vulnerabilities, and protective measures, as understood and implemented by PaX? I would imagine that such a person would then have a fairly good opportunity, in their full-time role as such, to craft these ideas into kernel patches which are both (a) acceptable upstream, (b) signed-off by PaX as fully providing all protection claimed, and (c) unambiguously citing PaX as the source of the ideas behind them?

In other words, could it become as simple as: pay PaX to sit down and work with someone full-time to upstream their ideas, and also pay that other person to act as the full-time intellectual and implementation intermediary between the two difficult cultures of PaX and upstream Linux?

Just a thought...

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 19:51 UTC (Wed) by riel (subscriber, #3142) [Link] (4 responses)

Kees Cook and others are working to upstream the good stuff from the linux-hardened and PaX/grsecurity trees. The maintainer of the linux-hardened tree is helping out with this, too.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 20:09 UTC (Wed) by ms-tg (subscriber, #89231) [Link] (2 responses)

Yes, but...

Isn't the "missing part", that they are making patch sets that end up acceptable for upstream approval, but not necessarily approved by PaX as being complete and fully understanding the problem, and accounting for every detail that is changed from the original solution and why that doesn't violate any assumptions about the threat model?

In other words, isn't the problem that there's no formal role assigned here for PaX itself? And wouldn't they be willing to play that role if
(a) they were paid for it, probably by Red Hat
(b) their job was to approve the interpretation of the patches, or clearly explain what was still not right
(c) they were clearly credited for originating the understanding of the vulnerability and/or hardening solution?

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 20:15 UTC (Wed) by andresfreund (subscriber, #69562) [Link]

That sounds like the most likely outcome of that would be zero progress. grsecurity has a commercial interest in *not* getting this stuff into core. Several people associated with grsec have a communication style clearly aimed at not resulting in actual cooperation. They don't approve of working on things incrementally. Why would it be a good idea to give them veto powers - and that's pretty much what you're suggesting.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 20:27 UTC (Wed) by pizza (subscriber, #46) [Link]

> Isn't the "missing part", that they are making patch sets that end up acceptable for upstream approval, but not necessarily approved by PaX as being complete and fully understanding the problem [...]

You're missing a crucial point -- In many ways, there are fundamental conflicts between the needs of upstream and the needs of a hardened patchset. Implications include API changes, performance regressions, maintainability, and so forth. I'm not saying that either side is inherently _wrong_; just that there are legitimate areas of disagreement that could easily result in incomplete bits being mainlined.

Another thing that the core kernel maintainers have repeatedly (and IMO correctly) prioritized is a maintainer's ability to play well with others. None of them exist in a vacuum.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 29, 2017 11:28 UTC (Thu) by paulj (subscriber, #341) [Link]

That's been done in a way that doesn't benefit spender, PaXteam, etc., at all, which has made the situation worse. It's not really surprising that paying one set of people to work on other people's code, leaving those original authors out in the cold, would make that other set of people even more antagonistic.

Yes, I know there's lots of people factor issues here, but... there is clear value in their security work regardless, and the above situation is highly sub-optimal (for users esp.).

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 20:22 UTC (Wed) by pizza (subscriber, #46) [Link]

> Has anyone proposed the following modest commercial model: Someone, let's say Red Hat, pays a sizable annual contract for the services of PaX.

As the saying goes, it takes two to tango.

It's my understanding that Pax/etc has refused to pursue this of their own accord, categorically rejected others trying to arrange this for them, and has set fire to just about every bridge in the process.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 28, 2017 20:28 UTC (Wed) by josh (subscriber, #17465) [Link] (3 responses)

It makes a huge amount of sense to pay people to do this work, which is what's being done. However, if the goal is to make the upstream kernel secure, then rather than selecting people with a history of not collaborating with upstream and intentionally antagonizing everyone around them, it makes more sense to select people demonstrably capable of doing so.

Take a look at the tantrum-level spitefulness on the CVE-related mail and ask yourself if that's who you'd want to employ. There's a reason that the community has carefully weaned itself away from dependencies on people like Joerg Schilling and Ulrich Drepper, in favor of people who work well with the community.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 29, 2017 9:55 UTC (Thu) by nhippi (subscriber, #34640) [Link] (2 responses)

When KSPP started to materialize, it would have made sense for the Linux Foundation to ask Brad to do it. Even if they didn't, Brad should have seen the writing on the wall and proposed to do it himself. We don't know why exactly Brad is not getting the KSPP money, and instead some other people are paid to upstream Brad's code. But chances are "not being nice to your potential future customers" played a big part in it.

Linus is not much better - which is especially bad because some people think the success of Linux is because of his style and LKML culture - rather than despite of it.

KSPP

Posted Jun 29, 2017 13:44 UTC (Thu) by corbet (editor, #1) [Link]

It is worth remembering that KSPP is not a Linux Foundation project, and there is no "KSPP money" slush fund that the LF could have directed differently.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jul 6, 2017 17:18 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Linus is not much better - which is especially bad because some people think the success of Linux is because of his style and LKML culture - rather than despite of it.

I think you'll find Linus is a *lot* better.

Yes he can have a potty-mouth. But he rarely uses it, and mostly when the other guy has been demonstrably stupid. Also the other guy is usually well-known to Linus.

There's a big difference between a bully picking on everyone, and a playground scrap between friends. Linus is respected for his technical judgement, and his ability to keep out of things while dropping technical bombs into other peoples' conversations. There's two things - being able to tell someone politely that their code is wrong because ..., and also to be able to tell someone that their code "does not feel right". Linus has that ability in spades, while unfortunately Pax et al don't have those people skills ...

Cheers,
Wol

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jun 30, 2017 17:13 UTC (Fri) by raven667 (subscriber, #5198) [Link] (1 responses)

> let's say Red Hat, pays a sizable annual contract for the services of PaX. In return, PaX works

I don't think you are in any position to negotiate working conditions for GrSecurity developers, they are capable of doing that themselves, and you aren't the first person to propose an arrangement like this, if GrSecurity developers really wanted to work this way this would be a solved problem. Since you are starting with a fantasy assumption the rest of your idea doesn't really matter.

Increasingly hard to see how Brad Spengler is/was in the wrong

Posted Jul 2, 2017 15:34 UTC (Sun) by jospoortvliet (guest, #33164) [Link]

So the PaX et al. team was offered money to upstream their work and was uninterested? That explains their aggression towards the KSPP efforts, I guess...

Ripples from Stack Clash

Posted Jun 28, 2017 20:24 UTC (Wed) by josh (subscriber, #17465) [Link] (3 responses)

This detailed follow-up, connecting all the myriad pieces of this story, and providing detailed supporting evidence to show how the various parties involved have acted, is exactly the kind of thing I come to LWN for. I've described LWN in the past as "the next best thing to reading LKML", but really, in this case, it's collating information from a dozen places. Thanks for the quality investigative reporting, here.

Ripples from Stack Clash

Posted Jun 30, 2017 11:33 UTC (Fri) by sml (guest, #75391) [Link]

Yep, my thoughts exactly. Thanks!

Ripples from Stack Clash

Posted Jul 1, 2017 6:03 UTC (Sat) by marcH (subscriber, #57642) [Link] (1 responses)

> "the next best thing to reading LKML",

That's when assuming infinite time, otherwise LWN comes first.

Ripples from Stack Clash

Posted Jul 2, 2017 5:27 UTC (Sun) by josh (subscriber, #17465) [Link]

Well, yeah. The next best thing to reading *all* of LKML, and every other development mailing list out there.

Ripples from Stack Clash

Posted Jun 28, 2017 21:46 UTC (Wed) by Trou.fr (subscriber, #26289) [Link] (9 responses)

> Of course, nobody else posted such a patch either; the community can only
> blame itself for not having fixed this problem. Perhaps LWN shares part of
> that blame for presenting the problem as being fixed when it was not; if so,
> we can only apologize and try to do better in the future. But we might argue
> that the real problem is a lack of people who are focused on the security of
> the kernel itself. There are few developers indeed whose job requires them
> to, for example, examine and address stack-overrun threats. Ensuring that
> this problem was properly fixed was not anybody's job, so nobody did it.

While I'm usually a fan of LWN articles, the recent amount of self-delusion
about security in the Linux kernel has been really annoying.

Maybe you could have reflected on the fact that Linus has been repeatedly
insulting people trying to improve Linux's security for years, which certainly
is a deterrent to any contribution on the subject.

And there _are_ people who care about the Linux kernel security: they use grsec
patches.

The KSPP has been trying to improve the situation, but as ms-tg put it in another comment: hiring Brad and PaXTeam to actually port grsecurity to the mainline would have been the most efficient (at least in a technical point of view) way. Just remind yourself that (at least until last year) Brad was working on grsec _on his spare time_.

Ripples from Stack Clash

Posted Jun 28, 2017 22:38 UTC (Wed) by nix (subscriber, #2304) [Link] (8 responses)

> The KSPP has been trying to improve the situation, but as ms-tg put it in another comment: hiring Brad and PaXTeam to actually port grsecurity to the mainline would have been the most efficient (at least from a technical point of view) way. Just remind yourself that (at least until last year) Brad was working on grsec _in his spare time_.

Apparently this offer was made. Brad refused (though why is unclear amid all the conspiracy-theorizing). Frankly it seems unlikely this could have ended well: Brad isn't going to be happy until everything he does goes straight into the kernel without review or question, and that's just not how development of anything works. The first code review would lead to a titanic explosion and probably a rapid firing. (Heck, I suspect one wouldn't have to wait that long: the first security bug after he was hired, even in code Brad had nothing to do with, would lead to a mass of snideness, an escalating flamewar...)

Employment requires a degree of treating your colleagues like human beings and considering that their concerns may have value and are not exclusively motivated by hatred and malice. I suspect there is a reason grsecurity is off working on its own...

Ripples from Stack Clash

Posted Jun 28, 2017 22:51 UTC (Wed) by PaXTeam (guest, #24616) [Link] (7 responses)

> Apparently this offer was made.

no, no such offer was made. the best Kees could offer at the time (about 2 years ago) was to talk to the CII (he said google's decision to compete with us instead of cooperation was made above his pay grade) and they didn't answer my question whether they'd be willing to fund the necessary hours to get this work done. but yeah, don't let the facts stop you from your conspiracy theorizing ;).

Ripples from Stack Clash

Posted Jun 28, 2017 23:59 UTC (Wed) by pizza (subscriber, #46) [Link] (6 responses)

> no, no such offer was made. the best Kees could offer at the time (about 2 years ago) was to talk to the CII (he said google's decision to compete with us instead of cooperation was made above his pay grade) and they didn't answer my question whether they'd be willing to fund the necessary hours to get this work done. but yeah, don't let the facts stop you from your conspiracy theorizing ;).

Oh, please.

Spender has explicitly stated, here on LWN (and undoubtedly elsewhere) that he would _never_ accept funding from any entity associated with Linux Foundation, including the CII -- in response to the CII asking him to write a proposal in order to get the funding ball rolling.

That sort of response is a good example of what is known as a "self-inflicted career limiting move".

Ripples from Stack Clash

Posted Jun 29, 2017 0:15 UTC (Thu) by PaXTeam (guest, #24616) [Link]

> [...] the CII asking him to write a proposal in order to get the funding ball rolling.

the CII did what? that never happened, you must have misunderstood something. i was the one who asked on cii-discuss in august 2015 how this whole thing could work (before investing my free time into writing a proposal) and got no real response, spender was never part of that discussion.

Ripples from Stack Clash

Posted Jun 29, 2017 11:35 UTC (Thu) by paulj (subscriber, #341) [Link] (4 responses)

spender has stated that, but it's never been clear if he stated that _because_ he had been so antagonised/insulted by being left out in the cold, with other people being paid (presumably well) to work on code he wrote.

It is obvious that that would induce a degree of bitterness and even hatred. The chain of cause and effect is not clear though.

Further, regardless of prior events, it would still be good to try to fix this situation, and find some way to navigate around the social and corporate politics so that spender, et al., can earn a living from their security work. That work has clear value, given that there are others being paid to take it and upstream it.

Ripples from Stack Clash

Posted Jun 29, 2017 13:32 UTC (Thu) by ms-tg (subscriber, #89231) [Link] (3 responses)

> Further, regardless of prior events, it would still be good to try to fix this situation, and find some way to navigate around the social and corporate politics so that spender, et al., can earn a living from their security work. That work has clear value, given that there are others being paid to take it and upstream it.

I am seeing this the same way. So, here's a second modest proposal, one sketch of an approach that could possibly help fix the situation:

1. Problem: Linus's belittling and antagonism for years

Solution: Linus writes a formal, personal apology. Quoting all of his derogatory comments that were later proved false (in a concise way), and saying, "mea culpa", please accept our community apology and come back into the fold and work with us in good faith.

2. Problem: No clear paths forward.

Solution: Ask (don't tell, don't make assumptions) for jointly-designed Next Steps.

In a public forum, Linus in tandem with someone at Red Hat and someone at Linux Foundation or other funding-source formally write a short public letter making clear that they are open to paying, over a number of years, for Brad and Pax Team's time to work diligently with the community to upstream the source. And *ask them* for a suggestion of how that relationship might work. Listen to what they say -- see if there are solutions that meet all interests.

Perhaps there's some reason why this sort of common sense approach cannot work -- would love to know more?

Ripples from Stack Clash

Posted Jun 29, 2017 23:25 UTC (Thu) by flussence (guest, #85566) [Link] (1 responses)

>1. Problem: Linus's belittling and antagonism for years

>Solution: Linus writes a formal, personal apology. Quoting all of his derogatory comments that were later proved false (in a concise way), and saying, "mea culpa", please accept our community apology and come back into the fold and work with us in good faith.

And once Linus has apologised for being Finnish, grsecurity can apologise for being American.

Ripples from Stack Clash

Posted Jul 3, 2017 18:33 UTC (Mon) by BenHutchings (subscriber, #37955) [Link]

I don't see what nationality has to do with this. (Also, Linus is a naturalised American.)

Ripples from Stack Clash

Posted Jun 30, 2017 9:44 UTC (Fri) by paulj (subscriber, #341) [Link]

I think it's fair to say there's blame on both sides, and both sides feel the unreasonableness of the other makes it impossible to work with them. As a result, I don't think any proposed solution that involves requiring one party to unilaterally accept all blame and apologise to the other will work. Working /around/ assignation of blame and first starting to (re-)build some kind of working relationship (via intermediaries perhaps), while avoiding getting into blame and who is wrong for what, is usually step 1 in conflict resolution.

Finding a good intermediary, and finding the resources, would be the first steps, I'd imagine.

Ripples from Stack Clash

Posted Jun 29, 2017 10:53 UTC (Thu) by SLi (subscriber, #53131) [Link] (3 responses)

What's going on with the mail archive links? It seems to me that links to mails in LWN articles are broken maybe 20% of the time, long term. Is it just that there is no non-broken mail archive service?

Mail archives

Posted Jun 29, 2017 13:38 UTC (Thu) by corbet (editor, #1) [Link] (2 responses)

There hasn't been a reliable mail archive, especially one that we could link to in an automated way, since gmane went away. Sometimes I think we're just going to have to make our own.

Mail archives

Posted Jun 29, 2017 14:49 UTC (Thu) by pjones (subscriber, #31722) [Link]

I would totally contribute to a kickstarter for funding this :)

Mail archives

Posted Jul 1, 2017 6:12 UTC (Sat) by marcH (subscriber, #57642) [Link]

Since the recent and unfortunate gmane events I saw myself using more and more the services from this small and promising startup:

https://groups.google.com/forum/#!msg/ciekawe-papierki/Ox...

> Sometimes I think we're just going to have to make our own.

Errr... of which subset of the whole universe of lists?

This isn't fixed until the compilers are

Posted Jun 29, 2017 17:05 UTC (Thu) by ebiederm (subscriber, #35028) [Link] (5 responses)

Let's be clear: no amount of kernel effort alone will ever fix this. This is not fixed until the compilers are updated and user space is rebuilt.

This isn't fixed until the compilers are

Posted Jun 29, 2017 18:33 UTC (Thu) by cesarb (subscriber, #6266) [Link]

For arbitrary userspace, yes, but if you know how much stack space your userspace can jump over in the worst case, and it's on the initial thread, having a stack gap bigger than that amount can be a fix. For instance, if your worst function allocates a 256K buffer in the stack, having a 260K or bigger stack gap should be enough to prevent it from being exploited.
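To make the arithmetic concrete, here is a rough sketch of that reasoning in Python. It is a deliberately simplified model, not how the kernel implements anything: it assumes the stack pointer starts at the very bottom of the mapped stack (the worst case) and that the first write after the allocation lands at the new, lowest stack address. The function name, the example address, and the return labels are all made up for illustration.

```python
KIB = 1024

def classify_access(stack_low: int, gap: int, sp: int, alloc: int) -> str:
    """Classify the first write after moving the stack pointer down by `alloc`.

    stack_low: lowest currently mapped stack address (hypothetical value)
    gap:       size of the guard gap, covering [stack_low - gap, stack_low)
    sp:        stack pointer before the allocation
    alloc:     bytes allocated in one step, with no intermediate accesses
    """
    target = sp - alloc
    if target >= stack_low:
        return "stack"    # still inside the mapped stack
    if target >= stack_low - gap:
        return "gap"      # lands in the guard gap, so the kernel gets to intervene
    return "jumped"       # lands below the gap, possibly in another mapping

# Worst case: the stack pointer sits at the bottom of the mapped stack.
stack_low = 0x7FFD00000000  # arbitrary example address
print(classify_access(stack_low, 260 * KIB, stack_low, 256 * KIB))
print(classify_access(stack_low, 4 * KIB, stack_low, 256 * KIB))
```

Under these assumptions, a 256K allocation cannot clear a 260K gap, while it sails over a single 4K guard page.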

This isn't fixed until the compilers are

Posted Jul 2, 2017 0:57 UTC (Sun) by areilly (subscriber, #87829) [Link] (3 responses)

Yes, this is not really an OS issue, although I don't really understand how OSes on 64-bit systems can't help by arranging for swapping death to occur long before the "clash."

This is an application security bug issue, just like every other form of security violation via type violation bug. Everyone has written a program that crashes (eventually) by doing infinite recursion. It's not difficult or clever, it's coding an algorithm bug.

None of the examples in the original Stack Clash report are system bugs; they're user-input validation bugs in software that has security implications, and -duh- appropriately unexpected input can tickle a bug that has bad results.

The contiguously-allocated stack is just a convenient data structure that's used for the very good reason that it's efficient. It is just a data structure that's used. Use can include misuse. Some languages and systems trade this efficiency for other structures and other trade-offs: some allocate stack-frames on the heap as a traditional linked structure, because that supports a very large number of threads in a small address space, at the cost of more expensive allocation.
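As a toy illustration of the "frames on the heap as a linked structure" alternative, here is a sketch in Python. The `Frame` class and `sum_to` function are invented for this example and do not mirror any particular language runtime; the point is only that each activation record is an individually allocated object linked to its caller, so call depth is bounded by available memory rather than by a contiguous stack region.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    n: int                     # argument of this activation
    parent: Optional["Frame"]  # link to the caller's frame (None for the outermost call)

def sum_to(n: int) -> int:
    """Compute 0 + 1 + ... + n "recursively", using heap-allocated frames."""
    # Call phase: allocate one frame per nested call, linked to its caller.
    frame: Optional[Frame] = Frame(n, None)
    while frame.n > 0:
        frame = Frame(frame.n - 1, frame)
    # Return phase: unwind through the parent links, accumulating the result.
    result = 0
    while frame is not None:
        result += frame.n
        frame = frame.parent
    return result

# Far deeper than Python's default recursion limit would allow with real calls.
print(sum_to(100_000))
```

The cost named in the comment is visible here too: every "call" pays for an object allocation instead of a simple stack-pointer adjustment.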

The way this issue is being discussed makes it sound as though Red Hat or Linus or the FreeBSD foundation are responsible for input-validation bugs in Exim. That's simply not the case.

This isn't fixed until the compilers are

Posted Jul 3, 2017 7:29 UTC (Mon) by dlang (guest, #313) [Link] (2 responses)

you can never count on the compilers 'fixing' a problem like this.

If there is a way to crash/corrupt your machine from a sequence of instructions, attackers will use asm() to insert those specific instructions, the fact that the rest of your machine was compiled with a compiler that does something to avoid this bug isn't going to help you.

This isn't fixed until the compilers are

Posted Jul 3, 2017 8:06 UTC (Mon) by Jandar (subscriber, #85683) [Link]

If an attacker can write the executed code with asm(), he/she is already past the defense this fix should create.

This isn't fixed until the compilers are

Posted Jul 4, 2017 3:15 UTC (Tue) by Aaron1011 (guest, #115128) [Link]

> If there is a way to crash/corrupt your machine from a sequence of instructions, attackers will use asm() to insert those specific instructions

Stack Clash isn't a vulnerability in the kernel - it's a vulnerability that allows an attacker to gain control over a process that they wouldn't normally be able to (e.g. an sgid/suid program like 'sudo'). The kernel change simply makes it less likely for this kind of vulnerability to be exploited (though stack probing is needed to truly fix the issue).
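For readers unfamiliar with stack probing, here is a small simulation of the idea in Python. It is only a model, with hypothetical names and addresses: instead of moving the stack pointer down in one step, the compiled code touches one address per page on the way down, so any allocation that crosses the bottom of the stack must touch the guard gap first, provided the gap is at least one page.

```python
from typing import Optional

PAGE = 4096

def probed_alloc(stack_low: int, gap: int, sp: int, alloc: int,
                 page: int = PAGE) -> Optional[int]:
    """Move sp down by `alloc`, touching one address per page on the way.

    Returns the address of the first probe that lands in the guard gap
    [stack_low - gap, stack_low), or None if the allocation never leaves
    the mapped stack.
    """
    gap_low = stack_low - gap
    new_sp = sp - alloc
    addr = sp
    while addr > new_sp:
        addr = max(addr - page, new_sp)  # step down at most one page
        if gap_low <= addr < stack_low:
            return addr                  # the kernel would get to intervene here
    return None

stack_low = 0x70000000  # arbitrary example address
# A 1MB allocation with only 100 bytes of stack left: the very first probe
# lands in a one-page gap instead of skipping over it.
hit = probed_alloc(stack_low, PAGE, stack_low + 100, 1024 * 1024)
print(hex(hit) if hit is not None else None)
```

In this model even a single guard page is enough, because no probe can be more than one page below the previous one; without probing, the same allocation would place its first write roughly 1MB below the stack.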