Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 12:56 UTC (Thu) by a13xp0p0v (guest, #118926)
Parent article: Trying to get STACKLEAK into the kernel

Thanks for the article, Jake!
You've done a very good job since grsecurity doesn't complain about crediting them right.

Let me correct this:

> In January 2018, he was alerted to the page-table isolation (PTI) patches, so he rebased on top
> of those and, once Meltdown and Spectre were announced, changed STACKLEAK to deal with
> return trampolines (retpolines).

STACKLEAK has nothing to do with Spectre and retpolines.

PTI patches introduced the trampoline stack on x86_64. Kernel switches to
this stack from a thread stack just before returning to the userspace. However
there are cases when we return to the userspace directly from the thread stack,
without switching to this trampoline stack.

So during rebasing I adjusted the stack erasing. It detects which stack is used and:
- if it is called from the trampoline stack, erasing goes up to the thread stack top,
- if it is called from the thread stack, erasing goes up to the stack pointer.

----

Now let me give the technical details according to Brad Spengler's comments in twitter.

First, I should say that having any technical feedback from him about my patches
is a VERY rare event. So I'm glad that he posted it, let's talk about that.

> 'Some false quotes from LWN's regurgibloid article on STACKLEAK: "some assertions
> in the stack tracking and alloca() checks were wrong and have been corrected"

1. The original assertion in track_stack() for detecting stack exhaustion
looks like that:
+ if (unlikely((sp & ~(THREAD_SIZE - 1)) < (THREAD_SIZE / 16)))
+ BUG();

After a careful look you can see that this check will never work because of erroneous '~'.

But if we remove this '~', we get into another trouble: when kernel stack is exhausted
and this check is hit, we get into recursive BUG(), since the functions which handle
BUG() are instrumented and call track_stack() themselves.

As a result, this recursive BUG() handling hits the guard page below the thread stack or
corrupts the neighbor memory (if CONFIG_VMAP_STACK or similar feature is disabled).

That's why I say that the original assertion in stack tracking was wrong.

2. Now about assertion in check_alloca().
In v4 I also fixed the surplus and erroneous code for calculating stack_left in check_alloca()
on x86_64. That code repeats the work which is already done in get_stack_info() and it
misses the fact that different exception stacks on x86_64 have different size.
https://www.openwall.com/lists/kernel-hardening/2017/10/0...

> "STACKLEAK was missing places where the stack needed erasing"

I added missing erase_kstack() call at ret_from_fork() for x86_32.

Anyway, if Brad will prove that I'm wrong, I'll be only happy to learn it.
By the way, he never thanked me for the STACKLEAK fixes which I shared with him.

> In fact, the current upstream-proposed STACKLEAK is weaker in a number
> of areas where it matters

I absolutely agree with Brad. Linus and Ingo made me do several changes
which I'm not happy about. It's the price for our compromise. In fact, Linus
doesn't want STACKLEAK at all, but I fight for it.

> (It's also slower for reasons that serve no security purpose at all,

Yes, technically that's true. However, I didn't see the noticeable difference during
performance tests. Original stack erasing in assembly language and my stack erasing
in C show the same numbers.

> and their manual VLA removal has resulted in slower/buggier code in general
> -- what's faster, a simple check inserted by the compiler to make sure a VLA use
> is safe, or a whole kmalloc/kfree in a function?)'

I don't have a strong opinion on that.
Please see Kees' talk at Linux Security Summit, which gives more details:
https://www.youtube.com/watch?v=XfNt6MsLj0E

Anyway, I don't like dropping check_alloca() forced by Linus.
Let's imagine that all VLAs are removed from the mainline kernel.
But how about VLAs in non-upstream code? STACKLEAK with check_alloca()
could protect it from Stack Clash!

----

Let's see what will happen with my patches in the next merge window.
The current v15 fits Linus' requirements.
I wish Kees Cook the best of luck to negotiate with Linus.

Trying to get STACKLEAK into the kernel

Posted Sep 13, 2018 14:02 UTC (Thu) by jake (editor, #205) [Link]

> STACKLEAK has nothing to do with Spectre and retpolines.

ah, sorry about that ... my brain badly wanted to turn 'return trampoline' into 'retpoline' for some reason ... i have adjusted the article ...

thanks,

jake

Trying to get STACKLEAK into the kernel

Posted Sep 14, 2018 14:48 UTC (Fri) by a13xp0p0v (guest, #118926) [Link] (2 responses)

Ok, Brad Spengler has replied with a very detailed post about STACKLEAK:
https://grsecurity.net/~spender/stackleak_response.txt

Cool, I appreciate that! Normally such discussions happen at the Linux Kernel
Mailing List (LKML), but in our case I have to reply here at LWN.

I also see that Brad is desperately trying to be always right. Actually he can relax:
he is genius, he is always right, I'm not protesting :)

I just do my work: I'm pushing this feature into the mainline.

Let me reply Brad.

============

> RE: the assertion in track_stack, the flaw in it (the added ~) was found by Tycho Andersen,
> not Alex (though it's not claimed directly, it's implied as Alex is taking credit for the STACKLEAK
> upstreaming work). This was properly credited below:
> commit 12927d314b2763dd791ef11e56c42184fba4d3f8
> Author: Brad Spengler <spender@grsecurity.net>
> Date: Tue Aug 15 07:11:47 2017 -0400
>
> Fix 32bit stackleak stack_left test present in grsec only, as spotted
> by Tycho Andersen

Oh, Brad, come on! Tycho is my good friend, he supports my upstreaming efforts very
much, I always praise his work. Yes, he was the first who spotted your mistake with '~':
https://www.openwall.com/lists/kernel-hardening/2017/08/15/1

And then later I've investigated the recursive BUG() trouble caused by this check and finally
dropped it in v6:
https://www.openwall.com/lists/kernel-hardening/2017/12/0...

> This test however doesn't have to do with PAX_STACKLEAK as mentioned there -- you can look
> at any PaX patch and see that the test doesn't exist. What the check in grsecurity (tried) to do was
> piggy-back off being called in useful places throughout the entire kernel and in the lack of
> KSTACKOVERFLOW wanted to avoid a recursion-based stack overflow from being able to cleanly
> overwrite its intended target. In fact, the same day I made the above change I added an #ifndef
> to make this explicit:
> commit 16e1332faabc9f270fde9787ddb23e95cb2aad9c
> Author: Brad Spengler <spender@grsecurity.net>
> Date: Tue Aug 15 07:16:30 2017 -0400
>
> Make 32bit stack_left check depend on !KSTACKOVERFLOW to improve performance a bit
>
> So it is correct that there was a bug in the check I added that caused it to be a no-op, but it's
> not part of the STACKLEAK defense and I don't believe we ever advertised that particular added check.

Ok, I see. I know that it is your code, not PaX Team's.

But I didn't know that it is NOT a part of STACKLEAK feature. That's because all we have
is a single giant grsecurity patch and NOT the git history which you quote here.
N.B. I don't blame you.

> A further claim was "STACKLEAK was missing PLACES where the stack needed erasing" (emphasis mine).
> Alex identified one location in ret_from_fork which in our 4.9 patch was missing instrumentation
> for x86_32. This instrumentation wasn't missing in the original STACKLEAK code, nor in our stable
> patches for 3.2 or 3.14. Alex is free to verify this as we have done. It was introduced during
> some upstream churn in the entry code via the following commit:
> commit 39e8701f33d65c7f51d749a5d12a1379065e0926
> Author: Andy Lutomirski <luto@kernel.org>
> Date: Mon Oct 5 17:48:13 2015 -0700
>
> x86/entry/32: Open-code return tracking from fork and kthreads
>
> This open-coding changed ret_from_fork from following a path that would perform the stack clearing
> to one that would not, and since we didn't have any comment-based guards in place there, it slipped
> our notice. As mentioned, this only affected i386, and would be rendered benign by having
> RANDKSTACK enabled (as it is by default in autoconfig) which would clear the full stack on entry to
> the following syscall (as the lowest_stack field is set to the end of the stack for the new
> process). Further, the newly-created process' stack would already be cleared in the presence of
> CONFIG_DEBUG_STACK_USAGE or in any kernel with commit e01e80634ecdde1dd113ac43b3adad21b47f3957
> "fork: unconditionally clear stack on fork". Further, it is likely (possibly guaranteed, I'd have
> to confirm this) that the presence of PAX_MEMORY_SANITIZE (which would be auto-enabled in every
> instance where STACKLEAK was auto-enabled) would ensure the newly-allocated stack would be cleared
> with the SANITIZE poison value.

Ok, so it's evil Andy Lutomirski who has broken your patch :)
Ok, it's not security relevant, that is good.

> There was no other location identified by Alex (and I went ahead and confirmed with v14 that no
> other location has been identified), so the claim that "places" (a plural) were missing
> instrumentation is false.

Oh, I'm sorry for 'PLACES', it's definitely my fault...
In v6 I added two missing erase_kstack() calls:
https://www.openwall.com/lists/kernel-hardening/2017/12/0...
But later one of them turned out to be surplus (kudos to Dmitry V. Levin).

So let me sum up:
- it is not your mistake, it is evil upstream which has broken your patch :)
- there was only ONE erase_kstack() missing, which I fixed.

> Regarding errors in the alloca() checking, Alex's claims there are false. get_stack_info didn't
> exist when STACKLEAK was first written, but when it was introduced we did convert to using it.
> We don't needlessly duplicate functionality of get_stack_info, we only have some additional code
> for correctly computing the amount of stack space left, and our checks there are correct.

Hm... Let's compare the code. That's funny to do it here and not at LKML.

Here is my version (ouch, lwn breaks the identation):

+void __used stackleak_check_alloca(unsigned long size)
+{
+ unsigned long sp = (unsigned long)&sp;
+ struct stack_info stack_info = {0};
+ unsigned long visit_mask = 0;
+ unsigned long stack_left;
+
+ BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
+
+ stack_left = sp - (unsigned long)stack_info.begin;
+
+ if (size >= stack_left) {
+ /*
+ * Kernel stack depth overflow is detected, let's report that.
+ * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
+ * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
+ * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
+ * panic() in a similar situation, so let's do the same if that
+ * option is on. Otherwise just use BUG() and hope for the best.
+ */
+#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
+ panic("alloca() over the kernel stack boundary\n");
+#else
+ BUG();
+#endif
+ }
+}

And here is grsecurity code for x86_64 from the last public patch (there is a
separate implementation for x86_32):

+void __used pax_check_alloca(unsigned long size)
+{
+ struct stack_info stack_info = {0};
+ unsigned long visit_mask = 0;
+ unsigned long sp = (unsigned long)&sp;
+ unsigned long stack_left;
+
+ BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
+
+ switch (stack_info.type) {
+ case STACK_TYPE_TASK:
+ stack_left = sp & (THREAD_SIZE - 1);
+ break;
+
+ case STACK_TYPE_IRQ:
+ stack_left = sp & (IRQ_STACK_SIZE - 1);
+ break;
+
+ case STACK_TYPE_EXCEPTION ... STACK_TYPE_EXCEPTION_LAST:
+ stack_left = sp & (EXCEPTION_STKSZ - 1);
+ break;
+
+ case STACK_TYPE_SOFTIRQ:
+ default:
+ BUG();
+ }
+
+ BUG_ON(stack_left < 256 || size >= stack_left - 256);
+}

First, I think there is no reason for this 'switch', since get_stack_info() calculates
the stack size itself, so we can simply do:
+ stack_left = sp - (unsigned long)stack_info.begin;

Moreover, I have previously stated that different exception stacks have different
size at x86_64. All of them are 4K, except the debug stack which is 8K:
static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
[0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
[DEBUG_STACK - 1] = DEBUG_STKSZ
};

Maybe it's not critical for alloca check... Anyway, I prefer to rely on get_stack_info()
to follow "Don't Repeat Yourself" rule. And if it changes, we don't have to patch
check_alloca().

Now the second difference: grsecurity code uses this '256' magic value in BUG_ON().

Mark Rutland and I had a long discussion about it in this thread:
http://openwall.com/lists/kernel-hardening/2018/05/11/12

I think that this '256' is useless here, since we don't know how much of stack space
the BUG_ON() handling consumes. So it can overflow these 256 bytes and corrupt the
neighbour memory.

And here is our solution:

+ if (size >= stack_left) {
+ /*
+ * Kernel stack depth overflow is detected, let's report that.
+ * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
+ * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
+ * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
+ * panic() in a similar situation, so let's do the same if that
+ * option is on. Otherwise just use BUG() and hope for the best.
+ */
+#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
+ panic("alloca() over the kernel stack boundary\n");
+#else
+ BUG();
+#endif

At the same time I see one tricky aspect in my code.
We don't know in which order the compiler puts the local variables on the stack.
So calculating the stack pointer with this:
+ unsigned long sp = (unsigned long)&sp;
can make alloca size check incorrect (but 256 magic value mitigates that).

If one day I will come up with the "check_alloca() add-on" to my v15, I will use
'current_stack_pointer' instead to avoid that aspect.

> If Alex would like us to explain to him how his change there is incorrect and our checks are correct,

Brad, you perfectly know all my arguments, I've already posted them at LKML,
and I always put you in CC. My development process is completely open.

> I'd be happy to explain it in full provided he agree to donate $1000 to a charity of my choosing.
> Now that I've stated he's wrong, he's able to either figure out the reason on his own and correct
> his statement publicly, or if he's so certain he's correct, has nothing to lose by entering
> into this challenge. In the case that we're wrong (not possible as we re-confirmed it prior to
> writing this), I'll be happy to admit defeat and donate $1000 to charity, providing full proof to
> the public and correcting this statement.

Huh. Organizing such a bet to donate money to charity?
We do it on a regular basis without such bets. I'm sure, Grsecurity is a company with
social responsibility, which regularly donates to charity, doesn't it?

Anyway, let me thank you once again for all the information that you've already shared with us.
All my arguments and patches are open, feel free to use them for your version of STACKLEAK.

By the way, you keep silence about my fixes in the gcc plugin... Didn't you apply them?

> So to sum up:
> Yes there was a bug in an added check in grsecurity that depended on STACKLEAK being enabled, but
> which wasn't advertised and wasn't part of the STACKLEAK defense. This was found by Tycho
> Andersen, not Alex, and credited properly in our changelogs.

You are absolutely right. I should only add that your check also causes the recursive BUG()
which corrupts the memory below the stack bottom or hits the guard page.

> Yes, in some newer versions of grsecurity (after the commit mentioned above) we were missing
> explicit STACKLEAK clearing in returning from fork on i386 in a newly-forked process. Due to the
> other factors mentioned above, this likely had 0 real-life impact.

I absolutely agree. I've fixed this SINGLE flaw.

> No, our alloca() tests aren't wrong and don't needlessly duplicate code. We have made a public
> offer to donate $1000 to charity if we're wrong on this point (with us offering to provide all the
> details to easily determine the truth of the statement) provided that Alex agrees to the same
> terms, as we won't do the KSPP's work for free.

Sorry Brad, I don't like this idea. I'm not going to donate to charity because of such a bet.
I've already described all my technical arguments above.
But I'm also not going to force you "do the KSPP's work for free".

I sincerely thank you for such an interesting discussion!
Maybe later I will post the link or digest to LKML, since discussing Linux kernel patches
in twitter-lwn-something doesn't work well.

Best regards,
Alexander

Trying to get STACKLEAK into the kernel

Posted Sep 17, 2018 15:31 UTC (Mon) by shemminger (subscriber, #5739) [Link]

This is great technical feedback. And it is why security fixes should be done in the open on mailing lists instead of flame wars and twitter storms.

Trying to get STACKLEAK into the kernel

Posted May 27, 2019 15:27 UTC (Mon) by a13xp0p0v (guest, #118926) [Link]

[resurrecting an old thread]

> Moreover, I have previously stated that different exception stacks have different
> size at x86_64. All of them are 4K, except the debug stack which is 8K:
> static unsigned long exception_stack_sizes[N_EXCEPTION_STACKS] = {
> [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
> [DEBUG_STACK - 1] = DEBUG_STKSZ
> };

This particular statement was wrong - my mistake!
Brad has revealed (thanks to him) why calculating the stack size from grsecurity was correct:
"The debug IST stack is actually two separate debug stacks to handle #DB recursion"
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.g...
Great!

I will *not* make a patch for the upstream kernel since alloca() checking didn't get to the mainline -
all VLA (Variable Length Arrays) should be removed instead.

Best regards,
Alexander