
A GCC -fstack-protector vulnerability on arm64

The GCC stack-protector feature detects stack-based buffer overruns by putting a canary value on the stack and noticing if that value is changed. It turns out, though, that dynamically allocated local variables (such as variable-length arrays and space obtained with alloca()) are placed beyond the canary, so overflows of those variables will not be detected. As a result, arm64 binaries built with vulnerable versions of GCC are not as protected as they should be and need to be rebuilt.

Dynamic allocations are just as susceptible to overflows as other locals. In fact, they're arguably more susceptible because they're almost always arrays, whereas fixed locals are often integers, pointers, or other types to which variable-length data is never written. GCC's own heuristics for when to use a stack guard reflect this.

Kees Cook, meanwhile, has pointed out that the kernel no longer uses variable-length arrays, so kernel builds should not be affected by this vulnerability.




Posted Sep 12, 2023 22:21 UTC (Tue) by dvdeug (guest, #10998) [Link] (16 responses)

Kees Cook also suggests that people shouldn't use VLAs in C. That seems excessive; being able to toss stuff on the stack and have it automatically deallocated helps compensate for C's weakness in handling memory. Possibly not appropriate in the kernel or similar systems, but most systems don't have such a limited amount of stack.


Posted Sep 13, 2023 1:06 UTC (Wed) by hmh (subscriber, #3838) [Link]

You might want to check out the "cleanup()" variable attribute that exists in gcc and clang, if you don't know about it already.


Posted Sep 13, 2023 5:58 UTC (Wed) by Vipketsh (guest, #134480) [Link] (13 responses)

VLAs have a hidden problem: the size of the stack is limited, and thus VLAs are limited in size too. If the biggest VLA you can have doesn't hit stack limits, then you may as well just do a static allocation (i.e., a constant-sized array) of its maximum size on the stack instead. If the VLA's size is bigger than your stack size, or unbounded, then your code has a bug and the fix is to remove the VLA.

In conclusion: there is no case where VLAs are an advantage.


Posted Sep 13, 2023 7:59 UTC (Wed) by PengZheng (subscriber, #108006) [Link] (2 responses)

> In conclusion: there is no case where VLAs are an advantage.

Large stack allocations are not advisable, especially for embedded systems with swap turned off.
https://stackoverflow.com/questions/14389525/linux-stack-...

If a VLA could be as large as the stack limit, it should not exist in the first place.

Suppose that we have a function-call hierarchy A->B->C, and that each function uses a VLA whose size depends on various factors.
If they use constant-sized arrays rather than VLAs as you suggest, then the stack usage is *unconditionally* at least as large as the sum of the sizes of those constant-sized arrays.


Posted Sep 13, 2023 15:01 UTC (Wed) by geofft (subscriber, #59789) [Link] (1 response)

I think this is actually an argument against VLAs! In the scenario you describe, how do you guard against each of A, B, and C using VLAs that are all their largest possible size? There is no benefit in converting a program that unconditionally overruns the stack to one that conditionally overruns the stack. The goal is to unconditionally not overrun the stack.

(Even if the program is processing trusted input - e.g., it's part of something like 'make' whose purpose is running code anyway, and so there cannot really be security vulnerabilities - there still isn't a point in having it conditionally overrun the stack and crash. Just detect when the inputs are too big and conditionally throw an error at the beginning of the program. The effect for the end user is no worse, and probably a bit better really.)

This scenario only makes sense, I think, if you can somehow guarantee that when A makes a large VLA, B and C definitely will not, etc. But I'm having trouble thinking of how you'd end up with code like that. Most of the time, if you are processing large input in one function and call another, that second function is going to also process large input too, or at best process data of constant size. It isn't going to get smaller.

Maybe your logic sometimes does lots of work in B, and sometimes lots of work in C instead, but only in one or the other? But you can solve that by just creating a stack array in B (or A) and passing a pointer to it down to C, instead of doing another allocation in C. Pointers to stack variables remain valid as long you're somewhere deeper on the stack.


Posted Sep 16, 2023 7:02 UTC (Sat) by ssmith32 (subscriber, #72404) [Link]

Your approach ensures the program always uses the worst-case amount of memory.

For most programs, having a very rare, badly performing worst case is better than having an always occurring worst case that isn't quite as bad, but is still a bit worse than the common case in the "rarely very bad" scenario.

See: the usage of quicksort (O(n^2) worst case) vs. mergesort (O(n log n)).

Since quicksort is *usually* faster, it often is the better choice, despite having a much, much worse worst case.

In fact, if one always just allocated the worst case statically, there'd be no point to heap memory whatsoever - just allocate for the worst case, for everything.

>There is no benefit in converting a program that unconditionally overruns the stack to one that conditionally overruns the stack.

Yes, there is: if the condition is very rare, it's far better to have a program that only rarely overruns the stack than one that always overruns it. The only benefit of a program that always runs out of memory is if you're selling memory (or trying to convince someone who needs 99.9999% uptime that the worst case will crash on the given hardware, i.e., as a test program).

In fact, the space of programs where rarely crashing is not preferable to always crashing is rather small indeed, I would imagine.


Posted Sep 13, 2023 8:47 UTC (Wed) by ballombe (subscriber, #9523) [Link]

Furthermore, the default stack size is 8MB, which was huge 20 years ago but nowadays is really small for data storage.

(An even longer time ago, before Linux existed, I tried to copy the screen's 32KB frame buffer onto the C stack. It did not end well.)

Stacks are a quite useful data structure, but one should use a separate stack, with a much larger hard limit, for data.


Posted Sep 13, 2023 19:16 UTC (Wed) by epa (subscriber, #39769) [Link] (2 responses)

In user space what is the reason for limiting stack size? I would expect it by default to grow and allocate more memory as needed.

Sure, you might want to set a limit, to catch infinite recursion bugs or unexpectedly high stack memory usage, but that should be something you opt into with ‘ulimit’ or similar.

And if the stack really cannot exceed a few megabytes, the C compiler ought to be capable of allocating variable length arrays somewhere else (and freeing them as the stack unwinds). The classical C alloc() was effectively a stack, as you had to free memory in the reverse order of allocating it.

In kernel space I totally get why the stack size has to be strictly controlled.


Posted Sep 13, 2023 20:06 UTC (Wed) by cmm (guest, #81305) [Link]

> In user space what is the reason for limiting stack size?

In a multi-threaded process, stack space of any thread apart from the main one is limited because the stack has to be mmapped at a particular virtual address upon thread creation and cannot move.


Posted Sep 13, 2023 20:40 UTC (Wed) by excors (subscriber, #95769) [Link]

> And if the stack really cannot exceed a few megabytes, the C compiler ought to be capable of allocating variable length arrays somewhere else (and freeing them as the stack unwinds)

I guessed that might cause problems with setjmp/longjmp, which will restore the stack pointer (effectively undoing any stack allocations) but won't know how to deallocate any automatic heap allocations. But it turns out the C standard already says that any VLAs allocated between the setjmp and longjmp may be leaked; apparently the original Cray Research implementation of VLAs used the heap as a fallback when it ran out of stack space (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n317.pdf). So it's okay to use the heap and just tell programmers not to combine VLAs and longjmp.


Posted Sep 17, 2023 4:06 UTC (Sun) by ianmcc (subscriber, #88379) [Link] (5 responses)

"Dynamic memory has a hidden problem: the size of memory is limited and thus dynamic arrays are limited in size too. If the biggest array you can have doesn't hit memory limits then you may as well just do a static allocation (i.e. constant-sized array) of its max size instead. If the array size is bigger than your memory size or unbounded, then your code has a bug and the fix is to remove the dynamic array. In conclusion: there is no case where dynamic arrays are an advantage."


Posted Sep 17, 2023 10:31 UTC (Sun) by excors (subscriber, #95769) [Link]

I think the significant difference is that heap limits are system-wide, while stack limits are per-thread and much smaller. If you're developing for a multitasking environment, then overallocating heap will harm the user's ability to run other tasks, so you should try to minimise your allocations. But overallocating stack will have negligible impact on the system (unless you have thousands of threads, but you shouldn't do that anyway).

If you're developing for a single-application embedded environment, then I think it often *is* a good idea to calculate your worst-case heap usage and statically allocate that, so you can be sure the application will meet its specification and won't crash from resource exhaustion when given a valid input. Limited dynamicity can be done with statically-sized pool allocators, where higher-level code allocates a whole complex data structure and allocation failure can either be prevented (e.g. by verifying the resource requirements of a request before accepting it, or applying backpressure to a message queue before you get overloaded, etc) or handled gracefully (unwinding the operation and returning a meaningful error to the user), in contrast to a global heap which might fail in any of your many thousands of low-level std::vector/etc operations where it's practically impossible to recover except by crashing and restarting the whole application.


Posted Sep 18, 2023 6:11 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

A typical userspace stack is, what, 8Mb? And a kernel stack is 16kb (yes, with a "k").

And a typical system now has gigabytes of RAM.


Posted Sep 18, 2023 8:31 UTC (Mon) by geert (subscriber, #98403) [Link]

But with "B"s, not "b"s. And actually "Ki" instead of "k" ;-)


Posted Sep 18, 2023 9:13 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

> And a kernel stack is 16kb (yes, with a "k").

Hmm. So 4 pages on typical builds. Is the stack still this small even with something like 64k-sized pages?


Posted Sep 18, 2023 9:57 UTC (Mon) by geert (subscriber, #98403) [Link]

From arch/powerpc/Kconfig:

config THREAD_SHIFT
	int "Thread shift" if EXPERT
	range 13 15
	default "15" if PPC_256K_PAGES
	default "14" if PPC64
	default "13"
	help
	  Used to define the stack size. The default is almost always what you
	  want. Only change this if you know what you are doing.


Posted Sep 13, 2023 7:40 UTC (Wed) by linusw (subscriber, #40300) [Link]

Scoped guards as introduced recently in <linux/cleanup.h> will provide this using compiler extensions, enjoy.


Posted Sep 14, 2023 9:48 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (1 responses)

Ada, already in 1981, had variable-sized arrays that could not only be allocated on the stack but also be returned to the caller. Moreover, stack-overflow errors were properly detected and reported as exceptions. Compilers typically implemented that using a second stack.

I always wondered why C couldn't just use something similar. Instead, the standard came up with a solution with no possibility of checking for stack overflow, making VLAs unusable in practice.


Posted Sep 14, 2023 13:02 UTC (Thu) by khim (subscriber, #9252) [Link]

> I always wondered why C cannot just use something similar?

The fact that you could do something doesn't mean that you should.

> Compilers typically implemented that using a second stack.

At this point it's just better to stop pretending that you are using the stack: use the heap where you know you are doing so, and don't have the compiler hide these gotchas from you.

> Ada already in 1981 had variable-sized arrays that not only could be allocated on the stack but also returned to the caller.

And back then it was the only sensible way to avoid use of the heap. Since Ada was unable to use the heap safely for decades, it was a valuable tool. But Ada eventually took ideas from Rust, and now that's possible. At that point, pretending that you are using the stack when you are actually using the heap stopped being a good idea.


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds