Zero initialization
Posted Apr 10, 2021 12:38 UTC (Sat) by excors (subscriber, #95769)
In reply to: Zero initialization by milesrout
Parent article: Cook: Security things in Linux v5.9
In C you can access a static variable without explicitly initialising it, and that is well-defined behaviour (it's automatically initialised to zero before program startup). Is that wrong and a bug? Why should it be different for stack variables that aren't explicitly initialised?
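For concreteness, a minimal sketch of that asymmetry (the names are just illustrative; the semantics are standard C):

    #include <stdio.h>

    static int counter;           /* static storage: guaranteed to start at 0   */

    void show(void)
    {
        int scratch;              /* automatic storage: indeterminate value     */
        printf("%d\n", counter);  /* well-defined: prints 0 before any write    */
        printf("%d\n", scratch);  /* undefined behaviour: may print anything    */
    }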
I can't see any good reason for the difference, in terms of helping programmers write correct code. I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack) and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences), and then C/C++ kept the same behaviour mainly because it's what C/C++ programmers were already used to, not because it's actually a good design choice.
> I'd much rather have a compiler warning telling me that the compiler couldn't guarantee that every code path leading to a particular line of code initialised the variable
That sounds like the existing -Wuninitialized flag, though that has the challenges mentioned in https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings . It can never be perfect, even in the simple case where a variable is initialised in an "if (function_that_always_returns_true())" block, because the compiler doesn't always have visibility into that function (it might be in a separately-compiled file/library). But it's hard to even do a decent job - if the compiler doesn't do e.g. constant propagation and dead code elimination before the warnings then the programmer might get thousands of false positives in code that's obviously never going to be executed, which will make them unhappy and they'll probably just remove the warning flag; but if the compiler does do some optimisation then it's tricky to keep track of the uninitialisedness correctly.
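A rough sketch of that separate-compilation case (always_true() is a made-up function assumed to live in another translation unit):

    extern int always_true(void);   /* defined in some other file; always returns 1 */

    int demo(void)
    {
        int x;
        if (always_true())
            x = 42;
        return x;   /* never actually read uninitialised, but the compiler can't
                       prove that, so -Wmaybe-uninitialized may still fire here   */
    }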
Despite the compiler developers' best efforts, evidently the warnings aren't accurate enough for the kernel to rely on, since it still wants the automatic initialisation as a fallback.
Posted Apr 10, 2021 16:14 UTC (Sat) by hummassa (subscriber, #307)
A static variable is initialized even if it doesn't have an explicit initializer. A stack variable isn't. You can say "ooh, it's not orthogonal" (it's not). But that's not what milesrout was referring to.
> I can't see any good reason for the difference, in terms of helping programmers write correct code.
This was not one of the prioritized goals of the C (and/or the C++) language.
> I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack)
The current reason is still performance. Just because you have a 1990s supercomputer-level CPU/RAM/storage with ample global connectivity in your pocket doesn't mean every C program will run in similar conditions (or every C++ program, for that matter; remember that Ingenuity is running C++ on Mars right now).
> and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences)
Now you are just being facetious. People might, even when they have ample understanding of the security consequences, opt for performance, especially if the alternative is making a project unviable.
> and then C/C++ kept the same behaviour mainly because it's what C/C++ programmers were already used to, not because it's actually a good design choice.
C kept the same behaviour for the reasons I stated above.
C++ actually has the same reasons, plus "upgrade path from C" and "one should not pay for what one does not use."
Posted Apr 10, 2021 18:58 UTC (Sat) by excors (subscriber, #95769)
That's currently true, but the thread was discussing a hypothetical change to C where stack variables would be automatically initialised in the same way. It sounded like milesrout thought code that relied on the automatic initialisation would be "wrong" even in that new language, where its behaviour is well-defined, so I was wondering why it would be any more wrong than existing C code that relies on the automatic initialisation of statics (which seems to be widely accepted as a reasonable and safe thing to do).
> The current reason is still performance.
Modern compilers are pretty good at optimising, so most of the zero initialisations will be eliminated when the compiler realises they're guaranteed to be overwritten later. Microsoft did that for the Windows kernel (limited to POD types, not arrays or C++ classes) and says the performance regression was "noise-level for most tests", with potential for more compiler optimisation to let them remove the POD limitation. (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)
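As a rough sketch of why the cost is often that low: dead-store elimination removes the forced zero whenever the compiler can prove the variable is overwritten before any read (the function and variable names here are just illustrative):

    int parse_flags(int raw)
    {
        int flags = 0;       /* the forced (or compiler-inserted) zero init      */
        flags = raw & 0xff;  /* unconditionally overwritten before any read, so
                                the store of 0 above is dead and can be removed  */
        return flags;
    }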
Compilers were much less sophisticated when C was originally designed, so the tradeoffs were different then.
In Microsoft's version, and in the original suggestion in this thread, there are still ways to opt out in code that's particularly performance sensitive.
> Just because you have a 1990s supercomputer-level CPU/RAM/storage with ample global connectivity in your pocket doesn't mean every C program will run in similar conditions (or C++, if you remember Ingenuity is running C++ on Mars right now)
Ingenuity uses a Snapdragon 801, i.e. a quad-core 2.5GHz CPU with 2GB RAM (plus the Hexagon DSP that runs most of the flight code), so it's not the best example of a resource-constrained device.
Posted Apr 10, 2021 20:10 UTC (Sat) by hummassa (subscriber, #307)
Ok
> Compilers were much less sophisticated when C was originally designed, so the tradeoffs were different then.
I can concede this point, with the caveat that there are microcontroller (and even microcomputing) platforms FAR less powerful than the Snapdragon (think 8- or 16-bit processors, with RAM as low as 4 Kbytes), and the compilers for such platforms are not always on par with the advances of gcc/clang/msvc.
Posted Apr 11, 2021 4:23 UTC (Sun) by milesrout (subscriber, #126894)
Of course code that relies on the automatic initialisation wouldn't be wrong. The problem is that wrong code that fails to initialise a variable can no longer trigger a warning, because the compiler or static analysis tool has no way to detect that 'zero' is an invalid or unwanted value for that variable in that bit of code.
If I write 'struct foo f;' and then a code path fails to initialise f somewhere, at present the compiler can at least attempt to warn me that I've failed to do so. If it's implicitly zero-initialised then the compiler has no way to know whether:
1. I intended to not initialise it, because I'm relying on automatic zero-initialisation of variables, OR
2. I forgot to initialise it, but it's okay because zero is what I would have initialised it to anyway, OR
3. I forgot to initialise it, and it being zero means there's a gaping security hole in my code.
My concerns have nothing to do with performance.
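A sketch of that ambiguity, assuming the hypothetical implicit-zero rule (struct foo and use() are stand-ins purely for illustration):

    #include <stddef.h>

    struct foo { int fd; size_t len; };

    void use(struct foo f);    /* stand-in consumer, just for illustration */

    void handle(void)
    {
        struct foo f;          /* implicitly all-zero under the proposed rule */
        /* ... some code path forgets to fill in f ... */
        use(f);                /* compiles and runs silently: is the zero
                                  intentional (1), harmless (2), or a gaping
                                  security hole (3)? No tool can tell.        */
    }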
Posted Apr 11, 2021 4:18 UTC (Sun) by milesrout (subscriber, #126894)
Sometimes it is a bug and sometimes it is not and the language gives you no way to tell whether or not it is. I consider that a bad thing. Static variables should never have been implicitly initialised, but it's obviously far too late to change that behaviour now so it's an irrelevant consideration really.
>I guess the original reason was performance (zeroing .bss is much cheaper than zeroing every stack)
Indeed the original reason was performance.
>and a lack of interest in minimising undefined behaviour (because of a lack of understanding of the security consequences),
There were no security consequences for undefined behaviour. The concept that *all* undefined behaviour is inherently a massive security hole that lets the compiler do literally any arbitrary thing is a relatively new invention, based on a peculiar interpretation of the C and C++ standards by compiler developers keen to improve their scores on microbenchmarks.

The contents of uninitialised variables being undefined was never meant to mean that the compiler could simply assume that code paths that didn't initialise those variables would never happen before other code that used those variables. Today your compiler might see 'int x; if (condition1) { x = 1; } else if (condition2) { x = 2; } else { } printf("%d\n", x);' and feel that it has the right to assume (condition1 || condition2) and never even check the second condition.

The standard was written as a generalisation over what existing implementations did. The actually intended behaviour is that if you run that code it will print 1, or it will print 2, or it will print some random garbage on the stack. That might be a security hole, of course, if the function executed immediately before this one had, say, a secret key on the stack in that position. But to use a different example, it was never intended to give compilers carte blanche to do things like eliding explicit null checks written for safety, because some other code that happened to be inlined nearby incorrectly assumed the same pointer could be dereferenced.
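For reference, the same snippet laid out as a compilable function (condition1 and condition2 turned into parameters purely so it stands alone):

    #include <stdio.h>

    void example(int condition1, int condition2)
    {
        int x;
        if (condition1) {
            x = 1;
        } else if (condition2) {
            x = 2;
        } else {
            /* x is left uninitialised on this path */
        }
        /* Reading x is undefined behaviour on that third path, so an
           aggressive optimiser may assume (condition1 || condition2)
           always holds and restructure the branches accordingly.     */
        printf("%d\n", x);
    }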
>That sounds like the existing -Wuninitialized flag, though that has the challenges mentioned in https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings . It can never be perfect, even in the simple case where a variable is initialised in an "if (function_that_always_returns_true())" block, because the compiler doesn't always have visibility into that function (it might be in a separately-compiled file/library). But it's hard to even do a decent job - if the compiler doesn't do e.g. constant propagation and dead code elimination before the warnings then the programmer might get thousands of false positives in code that's obviously never going to be executed, which will make them unhappy and they'll probably just remove the warning flag; but if the compiler does do some optimisation then it's tricky to keep track of the uninitialisedness correctly.
Yes, it has challenges, as does any kind of static code analysis. Nonetheless it immediately becomes completely useless when 'uninitialised' ceases to be a category of variable entirely, being replaced with 'implicitly initialised to zero'.
Posted Apr 11, 2021 6:59 UTC (Sun) by dtlin (subscriber, #36537)
Reading uninitialized memory was always undefined in C to allow for these architectures; that's not a modern invention.