The Linux Foundation's "security mobilization plan"
Posted Jun 2, 2022 12:41 UTC (Thu) by Wol (subscriber, #4433)
In reply to: The Linux Foundation's "security mobilization plan" by mathstuf
Parent article: The Linux Foundation's "security mobilization plan"
At the end of the day, UB is a *BUG* waiting to bite people, and when statements like "if I detect an access to *freed_ptr, it's UB, so I can format the hard drive if I like" are literally true, it shows the C spec up as the farce it really is.
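The *freed_ptr case in miniature (a contrived sketch, not from any real codebase):

    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        if (!p)
            return 1;
        *p = 7;
        free(p);
        return *p;   /* use-after-free: the spec places no bounds on
                        what the program may do from here on */
    }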
C was (and is still seen as) a glorified assembler. It should not be deleting safety checks because "this isn't supposed to happen" when the programmer has determined it can - and does - happen.
I'm quite happy with the C spec offloading responsibility - 9-bit bytes and all that. I'm quite happy with the spec saying "this shouldn't be possible - here be dragons". The problem is when compiler writers spot UB in the spec, whether by accident or by looking for it, and then CHANGE the behaviour of the compiler.
If the C spec is changed to say "there is no such thing as UB; it is now, as a minimum, Implementation-Defined, even if the IDB is simply 'Don't do it, it's not valid C'", then it will stop C being a quicksand of nasty surprises every time the compiler is upgraded.
Otherwise we will soon end up (quite possibly a lot quicker than people think possible) in a world where the core of the Linux kernel is written in Rust and the only C left is legacy drivers.
Cheers,
Wol
Posted Jun 2, 2022 14:04 UTC (Thu) by nybble41 (subscriber, #55106) [Link]
The problem is the third case: the behavior *may* be deterministic or non-deterministic but the compiler doesn't have enough information to be sure. It may depend on other parts of the program, such as separately-compiled library code, or on runtime inputs. It's not necessarily unreasonable to say that every program (every translation unit) must be provably well-defined regardless of the input it receives or what any other part of the program might do—some languages do take that approach—but the result would not look much like C, and likely would not be suitable for system-level programming due to dependencies on platform-specific behavior the compiler knows nothing about. Most, if not all, currently valid C programs would fail to compile under those rules.
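Concretely (a made-up sketch):

    /* Whether this is well-defined depends entirely on the caller, which
     * may live in a separately compiled translation unit or depend on
     * runtime input the compiler never sees. */
    int first_element(const int *arr) {
        return arr[0];   /* fine for a valid array; UB for NULL or freed memory */
    }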
When the behavior of a piece of code is known to *always* be undefined the compiler generally produces a warning, which you can escalate into an error (with -Werror) if you so choose.
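For example (the exact warning text varies by compiler):

    int *dangling(void) {
        int x = 42;
        return &x;   /* gcc: "function returns address of local variable"
                        (-Wreturn-local-addr), escalatable with -Werror */
    }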
Posted Jun 2, 2022 14:36 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (5 responses)
Note that no compiler (that anyone wants to use) will actually insert such code on its own. But if you have code in your program that does that, there is nothing saying it won't be called as some consequence of the UB (say it flips a bit in a function pointer or jump table). I mean, I guess you could make a C compiler that is absolutely paranoid and actually believes that *freed_ptr can happen at any time, and therefore injects pointer-verification code around all pointer uses, but I suspect you would then lose the "glorified assembly" property because of all this invisible sanity checking.
The rules for UB are there because they are assumptions that need to hold to reasonably contemplate what code does in any given instance. If race conditions had defined behaviour, every read would be a hazard, because any thread could write to the location behind your back (meaning you would need to load from memory on every access instead of hoisting the load out of the loop). If freed pointers were usable in any way, every new allocation would have to be considered potentially aliased (again pessimizing access optimizations). If integer overflow were defined, you could not promote small integers to bigger ones where the wider instructions are faster, because you would need to produce one specific answer instead of "whatever the hardware does". Or you would have to always promote, even where the smaller code would have offset the slower instruction by fitting in the instruction cache.
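To make the first and last of those concrete (my own toy sketches, not anyone's real code):

    /* Data races being UB lets the compiler hoist this load; otherwise
     * every iteration would need a fresh read from memory. */
    void spin_wait(const int *stop) {
        while (!*stop)
            ;   /* may legally become: if (!*stop) for (;;); */
    }

    /* Signed overflow being UB lets i be computed in a full-width
     * register for addressing, with no per-iteration sign fixup. */
    void zero_all(float *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = 0.0f;
    }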
Saying that these behaviours are UB, and that if they happen all bets are off, allows you to reason locally about some code. What you want may be possible, but I don't think it looks like what you think it would look like. I mean, I guess you could just say C "translates to the hardware", but that means you have zero optimizations available, because you can no longer say "this code acts as-if this other code": an addition inside a loop, for example, has to be done on every iteration instead of being factored out and done with a single multiply.
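The as-if transformation I mean, as a made-up sketch:

    long repeated_add(long step, long n) {
        long total = 0;
        for (long i = 0; i < n; i++)
            total += step;          /* the addition inside the loop */
        return total;
    }

    /* ...which the compiler may rewrite as-if it were: */
    long repeated_add_folded(long step, long n) {
        return step * n;            /* one multiply, no loop */
    }

Under the as-if rule the two are interchangeable because no conforming program can tell them apart; a C that simply "translates to the hardware" would have to emit the loop as written.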
Posted Jun 2, 2022 18:06 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (4 responses)
This is an interesting one, because whether that's true is a function of exactly what calculation you're trying to perform, what the machine's behaviour on overflow is, and what results you're trying to extract from it.
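For instance (my examples, not anyone else's code):

    /* Wraparound is exactly what you want here: */
    unsigned hash_step(unsigned h, unsigned c) {
        return h * 31u + c;     /* unsigned arithmetic wraps by definition */
    }

    /* ...and exactly what you don't want here: */
    int midpoint(int lo, int hi) {
        return (lo + hi) / 2;   /* lo + hi can overflow: UB, and the wrong
                                   answer even on hardware that wraps */
    }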
Posted Jun 3, 2022 14:46 UTC (Fri) by Wol (subscriber, #4433) [Link] (3 responses)
Thing is, most UB has an obvious outcome. Most UB exists because different compilers did different, apparently sensible, things. So where we can, let's codify what each compiler does, even if that leaves us with conflicting behaviour between compilers.
And if there isn't a sensible, codifiable behaviour (like accessing *freed_ptr), we simply call it out as something that - if it happens - is a bug.
In practice, pretty much everything that happens now is "whatever x86_64 does" anyway, so let's codify that as the default. If you're compiling for a ones-complement CPU, you know that the hardware is going to behave "strangely" (to modern eyes, at least), and any compiler for that architecture should default to ones-complement behaviour.
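Signed overflow is the obvious example (a sketch):

    int next(int x) {
        /* On x86_64, ADD wraps INT_MAX to INT_MIN; ISO C calls the same
         * operation undefined. Codifying the default would mean blessing
         * the two's-complement wrap the hardware performs anyway. */
        return x + 1;
    }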
If you're shifting code between architectures, you expect behaviour to change. You should not get nasty surprises when the only thing that's changed is the compiler version.
Cheers,
Wol
Posted Jun 3, 2022 19:17 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)
Much to the amusement of C developers fielding questions from others using different compilers, I'm sure. At least today we can say "C doesn't let anyone do that", even if different compilers do different things in response to the code; there's a common base to code against (the C abstract machine). But with this proposal, one could say "GCC-C works here, use a better compiler".
I'm also sure all of the embedded developers will enjoy having x86-isms forced upon them for their resource-constrained devices instead of considering that maybe *their* behavior should be preferred given that beefy x86 machines tend to have the horsepower to do something special.
Posted Jun 4, 2022 10:22 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 response)
So we have switches that say "I want the hardware-defined behaviour, not the C default behaviour". Which I said should be the case several posts ago!
As others have said, the problem with UB is that compilers - GCC IN PARTICULAR - take advantage of UB to change their behaviour without warning.
And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR and PROVIDE SWITCHES IF BEHAVIOUR CHANGES is going to force x86-isms onto embedded developers. Surely it's going to HELP, because they WON'T have x86-isms stealthily thrust upon them!
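For what it's worth, some switches of exactly that kind already exist in gcc (a partial list, plus my own sketch of what one of them changes):

    /* -fwrapv                          signed overflow wraps two's-complement
     * -fno-strict-aliasing             disable type-based alias analysis
     * -fno-delete-null-pointer-checks  keep null checks after a dereference
     */
    int overflowed(int x) {
        return x > x + 1;   /* default: folded to 0 because overflow "cannot
                               happen"; with -fwrapv: true iff x == INT_MAX */
    }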
Cheers,
Wol
Posted Jun 6, 2022 14:33 UTC (Mon) by mathstuf (subscriber, #69389) [Link]
I'm all for better warning emission, but I think the number of false positives would be overwhelming (unless you truly never use macros).
> And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR
You're talking about emergent behavior here. No one consciously said "let's trash user code". People wrote optimization transformations that are valid given code that follows C's rules. When putting these together, you can end up making mush of code that wasn't valid to begin with. What optimization rule should have this detection attached to it?
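The classic illustration (my sketch, not anyone's real code):

    int use(int *p) {
        int v = *p;         /* one pass infers p != NULL from this line,
                               since dereferencing NULL would be UB */
        if (p == NULL)      /* a later pass deletes the branch as
                               provably dead */
            return -1;
        return v;
    }

Each transformation is defensible on its own; it's their composition that throws away the programmer's safety check, and there's no single pass you could bolt the warning onto.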
I mean, I guess you could just use `-O0`, since that seems to be what you want anyway: take the code I wrote, as I wrote it, and output instructions for the CPU. But I suspect you'll be sad at how much work *you'll* now have to do to get better-performing code (not that it matters in a lot of cases, but when it does, it does).
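To give a feel for the gap (my toy example): at `-O0` the loop below runs n times exactly as written, while at `-O2` gcc typically replaces it with the closed form n*(n-1)/2.

    long triangle(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i;       /* -O0: an add per iteration; -O2: no loop at all */
        return sum;
    }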