
Cook: Security things in Linux v5.9

Kees Cook has posted a long list of security-related improvements that made it into the 5.9 kernel release. "Sasha Levin, Andy Lutomirski, Chang S. Bae, Andi Kleen, Tony Luck, Thomas Gleixner, and others landed the long-awaited FSGSBASE series. This provides task switching performance improvements while keeping the kernel safe from modules accidentally (or maliciously) trying to use the features directly (which exposed an unprivileged direct kernel access hole)."


Zero initialization

Posted Apr 7, 2021 10:32 UTC (Wed) by epa (subscriber, #39769) [Link] (10 responses)

That zero initialization fix looks really useful. Why oh why can't the C standard make this the default, and for those who are convinced that leaving an int uninitialized conveys some kind of performance benefit, create an explicit syntax 'int x = uninitialized;'?

Zero initialization

Posted Apr 9, 2021 18:47 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

If you were going to change the C language definition anyway, introducing a new keyword ("uninitialized"), then it seems more sensible to have the language definition outlaw uninitialized access entirely, like Rust.

I think most people who want zero initialization want it for the same "Do What I Mean" reason they think unsafe NULL checks [a NULL check after the pointer has been dereferenced may be elided by optimising compilers because the language definition says that can't happen, so no need to do it] should work - they're sure their program is correct, so the fact it doesn't actually work must be somebody else's fault, not theirs. This is wrong, every time I write nonsense the compiler should give me a diagnostic saying "That's nonsense", not emit some program that almost inevitably isn't what I intended. My favourite thing about the period of my career when I wrote a lot of Java was that most of the times when I wrote code that's nonsense it didn't compile. Hooray. That doesn't happen enough in C.

Zero initialization

Posted Apr 9, 2021 19:33 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> Why oh why can't the C standard make this the default

My first thought is "probably because it breaks an even bigger pile of C89 code, some of which is probably functioning perfectly, than any of the previous breaking changes did".

Zero initialization

Posted Apr 10, 2021 6:16 UTC (Sat) by milesrout (subscriber, #126894) [Link] (7 responses)

Accessing uninitialised variables is wrong. It's a bug. If they don't exist, how is a tool that detects bugs meant to know the difference between forgetting to initialise a variable and intentionally accessing a variable that was intentionally left to be automatically zero-initialised by the compiler?

I'd much rather have a compiler warning telling me that the compiler couldn't guarantee that every code path leading to a particular line of code initialised the variable - or some other tool other than the compiler. Then I could either initialise the variable to something meaningful (which for a pointer or integer may indeed be zero, but for many structs would not be!) or I could somehow assert that in fact I know something the compiler doesn't and the compiler can either use that information to figure out the variable is initialised or, at worst, the compiler could just let me assert (maybe with a pragma?) that the variable is actually always initialised at that point.

Zero initialization

Posted Apr 10, 2021 12:38 UTC (Sat) by excors (subscriber, #95769) [Link] (6 responses)

> Accessing uninitialised variables is wrong. It's a bug.

In C you can access a static variable without explicitly initialising it, and that is well-defined behaviour (it's automatically initialised to zero before program startup). Is that wrong and a bug? Why should it be different for stack variables that aren't explicitly initialised?

I can't see any good reason for the difference, in terms of helping programmers write correct code. I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack) and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences), and then C/C++ kept the same behaviour mainly because it's what C/C++ programmers were already used to, not because it's actually a good design choice.

> I'd much rather have a compiler warning telling me that the compiler couldn't guarantee that every code path leading to a particular line of code initialised the variable

That sounds like the existing -Wuninitialized flag, though that has the challenges mentioned in https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings . It can never be perfect, even in the simple case where a variable is initialised in an "if (function_that_always_returns_true())" block, because the compiler doesn't always have visibility into that function (it might be in a separately-compiled file/library). But it's hard to even do a decent job - if the compiler doesn't do e.g. constant propagation and dead code elimination before the warnings then the programmer might get thousands of false positives in code that's obviously never going to be executed, which will make them unhappy and they'll probably just remove the warning flag; but if the compiler does do some optimisation then it's tricky to keep track of the uninitialisedness correctly.

Despite the compiler developers' best efforts, evidently the warnings aren't accurate enough for the kernel to rely on, since it still wants the automatic initialisation as a fallback.

Zero initialization

Posted Apr 10, 2021 16:14 UTC (Sat) by hummassa (subscriber, #307) [Link] (3 responses)

> In C you can access a static variable without explicitly initialising it, and that is well-defined behaviour (it's automatically initialised to zero before program startup). Is that wrong and a bug? Why should it be different for stack variables that aren't explicitly initialised?

A static variable is initialized even if it doesn't have an explicit initializer. A stack variable isn't. You can say "ooh, it's not orthogonal" (it's not). But it's not what milesrout was referring to.

> I can't see any good reason for the difference, in terms of helping programmers write correct code.

This was not one of the prioritized goals of the C (and/or the C++) language.

> I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack)

The current reason is still performance. Just because you have a 1990s supercomputer-level CPU/RAM/storage with ample global connectivity in your pocket doesn't mean every C program will run in similar conditions (or C++, if you remember that Ingenuity is running C++ on Mars right now).

> and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences)

Now you are just being facetious. People might, even when they have ample understanding of the security consequences, opt for performance. Especially if the alternative is rendering a project unviable.

> and then C/C++ kept the same behaviour mainly because it's what C/C++ programmers were already used to, not because it's actually a good design choice.

C kept the same behaviour for the reasons I stated above.

C++ actually has the same reasons, plus "upgrade path from C" and "one should not pay for what one does not use."

Zero initialization

Posted Apr 10, 2021 18:58 UTC (Sat) by excors (subscriber, #95769) [Link] (2 responses)

> A static variable is initialized even if it doesn't have a explicit initializer. A stack variable isn't.

That's currently true, but the thread was discussing a hypothetical change to C where stack variables would be automatically initialised in the same way, and it sounded like milesrout thought code that relied on the automatic initialisation would be "wrong" even in that new language where its behaviour is well-defined, so I was wondering why it'd be any wronger than existing C code that relies on the automatic initialisation of statics (which seems to be widely accepted as a reasonable and safe thing to do).

> The current reason is still performance.

Modern compilers are pretty good at optimising, so most of the zero initialisations will be eliminated when the compiler realises they're guaranteed to be overwritten later. Microsoft did that for the Windows kernel (limited to POD types, not arrays or C++ classes) and says the performance regression was "noise-level for most tests", with potential for more compiler optimisation to let them remove the POD limitation. (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)

Compilers were much less sophisticated when C was originally designed, so the tradeoffs were different then.

In Microsoft's version, and in the original suggestion in this thread, there are still ways to opt out in code that's particularly performance sensitive.

> Just because you have a 1990s supercomputer-level CPU/RAM/storage with ample global connectivity in your pocket doesn't mean every C program will run in similar conditions (or C++, if you remember Ingenuity is running C++ on Mars right now)

Ingenuity uses Snapdragon 801, i.e. a quad-core 2.5GHz CPU with 2GB RAM (plus the Hexagon DSP that runs most of the flight code), so it's not the best example of a resource-constrained device.

Zero initialization

Posted Apr 10, 2021 20:10 UTC (Sat) by hummassa (subscriber, #307) [Link]

> I was wondering why it'd be any wronger than existing C code that relies on the automatic initialisation of statics

Ok

> Compilers were much less sophisticated when C was originally designed, so the tradeoffs were different then.

I can concede this point, with the caveat that there are microcontroller (and even microcomputing) platforms FAR less powerful than the Snapdragon (think 8 or 16-bit processors, with RAM as low as 4 Kbytes), and the compilers for such platforms are not always on par with the advances of gcc/clang/msvc.

Zero initialization

Posted Apr 11, 2021 4:23 UTC (Sun) by milesrout (subscriber, #126894) [Link]

>it sounded like milesrout thought code that relied on the automatic initialisation would be "wrong" even in that new language where its behaviour is well-defined, so I was wondering why it'd be any wronger than existing C code that relies on the automatic initialisation of statics (which seems to be widely accepted as a reasonable and safe thing to do).

Of course code that relies on the automatic initialisation wouldn't be wrong. The problem is that wrong code that fails to initialise a variable has no way of giving warnings, because the compiler or static analysis tool has no way to detect that 'zero' is an invalid or unwanted value for that variable in that bit of code.

If I write 'struct foo f;' and then a code path fails to initialise f somewhere, at present the compiler can at least attempt to warn me that I've failed to do so. If it's implicitly zero-initialised then the compiler has no way to know whether:

1. I intended to not initialise it, because I'm relying on automatic zero-initialisation of variables, OR
2. I forgot to initialise it, but it's okay because zero is what I would have initialised it to anyway, OR
3. I forgot to initialise it, and it being zero means there's a gaping security hole in my code.

My concerns have nothing to do with performance.

Zero initialization

Posted Apr 11, 2021 4:18 UTC (Sun) by milesrout (subscriber, #126894) [Link] (1 responses)

>In C you can access a static variable without explicitly initialising it, and that is well-defined behaviour (it's automatically initialised to zero before program startup). Is that wrong and a bug? Why should it be different for stack variables that aren't explicitly initialised?

Sometimes it is a bug and sometimes it is not and the language gives you no way to tell whether or not it is. I consider that a bad thing. Static variables should never have been implicitly initialised, but it's obviously far too late to change that behaviour now so it's an irrelevant consideration really.

>I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack)

Indeed the original reason was performance.

>and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences),

There were no security consequences for undefined behaviour. The idea that *all* undefined behaviour is inherently a massive security hole that lets the compiler do literally any arbitrary thing is a relatively new invention, based on a peculiar interpretation of the C and C++ standards by compiler developers keen to improve their scores on microbenchmarks. The contents of uninitialised variables being undefined was never meant to mean that the compiler could simply assume that code paths that didn't initialise those variables would never happen before other code that used those variables.

Today your compiler might see 'int x; if (condition1) { x = 1; } else if (condition2) { x = 2; } else { } printf("%d\n", x);' and feel that it has the right to assume (condition1 || condition2) and never even check the second condition. The standard was written as a generalisation over what existing implementations did. The actually intended behaviour was that if you run that code it will print 1, or it will print 2, or it will print some random garbage from the stack. That might be a security hole, of course, if the function executed immediately before this one had, say, a secret key on the stack in that position. But, to use a different example, it was never intended to give compilers carte blanche to do things like eliding explicit safety null checks in code because some other code that happened to be inlined nearby incorrectly assumed the same pointer could be dereferenced.

>That sounds like the existing -Wuninitialized flag, though that has the challenges mentioned in https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings . It can never be perfect, even in the simple case where a variable is initialised in an "if (function_that_always_returns_true())" block, because the compiler doesn't always have visibility into that function (it might be in a separately-compiled file/library). But it's hard to even do a decent job - if the compiler doesn't do e.g. constant propagation and dead code elimination before the warnings then the programmer might get thousands of false positives in code that's obviously never going to be executed, which will make them unhappy and they'll probably just remove the warning flag; but if the compiler does do some optimisation then it's tricky to keep track of the uninitialisedness correctly.

Yes, it has challenges, as does any kind of static code analysis. Nonetheless, it immediately becomes completely useless when 'uninitialised' ceases to be a category of variable entirely, being replaced with 'implicitly initialised to zero'.

Zero initialization

Posted Apr 11, 2021 6:59 UTC (Sun) by dtlin (subscriber, #36537) [Link]

https://en.wikipedia.org/wiki/Tagged_architecture

Reading uninitialized memory was always undefined in C to allow for these architectures; that's not a modern invention.


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds