
The Linux Foundation's "security mobilization plan"

Posted Jun 2, 2022 12:41 UTC (Thu) by Wol (subscriber, #4433)
In reply to: The Linux Foundation's "security mobilization plan" by mathstuf
Parent article: The Linux Foundation's "security mobilization plan"

Let's rephrase that then. "The behaviour is non-deterministic". In which case, you are either writing a statistical program, in which case the compiler should know that non-deterministic behaviour is acceptable, OR the compiler has detected a BUG, in which case it should fail to produce an executable! After all, isn't that pretty much what ALL compilers apart from C do?

At the end of the day, UB is a *BUG* waiting to bite people, and when statements like "if I detect an access to *freed_ptr it's UB so I can format the hard drive if I like" are literally true, it shows up the C spec as the farce it really is.

C was (and is still seen as) a glorified assembler. It should not be deleting safety checks because "this isn't supposed to happen" when the programmer has determined it can - and does - happen.

I'm quite happy with the C spec offloading responsibility - 6-bit bytes and all that. I'm quite happy with the spec saying "this shouldn't be possible - here be dragons". The problem is when compiler writers spot UB in the spec whether by accident or looking for it, and then CHANGE the behaviour of the compiler.

If the C spec is changed to say "there is no such thing as UB, it is now as a minimum Implementation-Defined, even if the IDB is simply "Don't do it, it's not valid C"", then it will stop C being a quicksand of nasty surprises every time the compiler is upgraded.

Otherwise we will soon end up (and quite possibly a lot quicker than people think possible) where the core of the Linux kernel is written in Rust, and the only C left will be legacy drivers.

Cheers,
Wol



The Linux Foundation's "security mobilization plan"

Posted Jun 2, 2022 14:04 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> "The behaviour is non-deterministic". In which case, you are either writing a statistical program in which case the compiler should know that non-deterministic behaviour is acceptable, OR the compiler has detected a BUG in which case it should fail to produce an executable!

The problem is the third case: the behavior *may* be deterministic or non-deterministic but the compiler doesn't have enough information to be sure. It may depend on other parts of the program, such as separately-compiled library code, or on runtime inputs. It's not necessarily unreasonable to say that every program (every translation unit) must be provably well-defined regardless of the input it receives or what any other part of the program might do—some languages do take that approach—but the result would not look much like C, and likely would not be suitable for system-level programming due to dependencies on platform-specific behavior the compiler knows nothing about. Most, if not all, currently valid C programs would fail to compile under those rules.

When the behavior of a piece of code is known to *always* be undefined the compiler generally produces a warning, which you can escalate into an error (with -Werror) if you so choose.
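For instance, a constant division by zero is undefined on every execution path, and GCC diagnoses it by default; adding -Werror turns that diagnostic into a hard failure (a sketch assuming a GCC-style driver; the file name is illustrative):

```shell
# A program whose behavior is undefined on every run:
cat > ub.c <<'EOF'
int main(void) { return 1/0; }
EOF

# Plain compile succeeds, but emits a diagnostic:
gcc -c ub.c -o ub.o          # warning: division by zero

# With -Werror the warning becomes an error and compilation fails:
gcc -Werror -c ub.c -o ub2.o # error: division by zero
```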

The Linux Foundation's "security mobilization plan"

Posted Jun 2, 2022 14:36 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (5 responses)

> when statements like "if I detect an access to *freed_ptr it's UB so I can format the hard drive if I like" are literally true, it shows up the C spec as the farce it really is.

Note that no compiler (anyone wants to use) will actually insert such code on its own. But, if you have code in your program that does that, there is nothing saying that it won't be called by some consequence of the UB (say it flips a bit in a function pointer or jump table). I mean, I guess you could make a C compiler that is absolutely paranoid and actually believes that *freed_ptr can happen at any time and therefore inject pointer verification code around all pointer uses, but I suspect that you then lose the "glorified assembly" property because you have all this invisible sanity checking.

The rules for UB are there because they are assumptions that need to be there to reasonably contemplate what code does in any instance. If race conditions exist, every read is a hazard because any thread could write to it behind your back (meaning that you need to load from memory on every access instead of hoisting outside the loop). If free'd pointers are usable in any way, any new allocation has to be considered potentially aliased (again, pessimizing access optimizations). If integer overflow is defined, you cannot promote small integers to bigger ones if the instructions are faster because you need to get some specific answer instead of "whatever the hardware does". Or you need to always promote even if the smaller code would have offset the slower instruction because it all fits in the instruction cache now.

Saying that these behaviors are UB and that, if they happen, all bets are off allows you to reason locally about some code. What you want may be possible, but I don't think it looks like what you think it would look like. I mean, I guess you could just say C "translates to the hardware", but that means that you have zero optimizations available because you cannot say "this code acts as-if this other code" because that addition is inside the loop, so it needs to be done on every iteration instead of factored out and done with a simple multiply.

The Linux Foundation's "security mobilization plan"

Posted Jun 2, 2022 18:06 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (4 responses)

> If integer overflow is defined, you cannot promote small integers to bigger ones if the instructions are faster because you need to get some specific answer instead of "whatever the hardware does".

This is an interesting one, because whether that's true is a function of exactly what calculation you're trying to perform, what the machine's behaviour on overflow is, and what results you're trying to extract from it.

The Linux Foundation's "security mobilization plan"

Posted Jun 3, 2022 14:46 UTC (Fri) by Wol (subscriber, #4433) [Link] (3 responses)

I guess this is where the C spec should say "x86_64 follows IEEEwotsit rules, therefore when compiling for x86_64 that's what you get".

Thing is, most UB has an obvious outcome. Most UB was because different compilers did different, apparently sensible, things. So where we can, let's codify what it does, even if we have conflicting behaviour between compilers.

And if there isn't a sensible, codifiable behaviour (like accessing *freed_ptr), we simply call it out as something that - if it happens - is a bug.

In practice, pretty much everything that happens is "whatever x86_64 does" now, anyway, so let's codify that as the default. If you're compiling for a ones-complement CPU, you know that the hardware is going to behave "strangely" (to modern eyes, at least), and any compiler for that architecture should default to ones-complement.

If you're shifting code between architectures, you expect behaviour to change. You should not get nasty surprises when the only thing that's changed is the compiler version.

Cheers,
Wol

The Linux Foundation's "security mobilization plan"

Posted Jun 3, 2022 19:17 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

> So where we can, let's codify what it does, even if we have conflicting behaviour between compilers.

Much to the amusement of C developers fielding questions from others using different compilers I'm sure. At least today we can say "C doesn't let anyone do that" even if different compilers do different things in response to the code; there's a common base to code against (the C abstract machine). But now with this proposal, one could say "GCC-C works here, use a better compiler".

I'm also sure all of the embedded developers will enjoy having x86-isms forced upon them for their resource-constrained devices instead of considering that maybe *their* behavior should be preferred given that beefy x86 machines tend to have the horsepower to do something special.

The Linux Foundation's "security mobilization plan"

Posted Jun 4, 2022 10:22 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

> I'm also sure all of the embedded developers will enjoy having x86-isms forced upon them for their resource-constrained devices instead of considering that maybe *their* behavior should be preferred given that beefy x86 machines tend to have the horsepower to do something special.

So we have switches that say "I want the hardware-defined behaviour, not the C default behaviour". Which I said should be the case several posts ago!

As others have said, the problem with UB is that compilers - GCC IN PARTICULAR - take advantage of UB to change their behaviour without warning.

And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR and PROVIDE SWITCHES IF BEHAVIOUR CHANGES is going to force x86isms onto embedded developers? Surely it's going to HELP because they WON'T have x86isms stealth-thrust on them!

Cheers,
Wol

The Linux Foundation's "security mobilization plan"

Posted Jun 6, 2022 14:33 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> they take advantage of UB to change the compiler behaviour without warning.

I'm all for better warning emission, but I think the number of false positives would be overwhelming (unless you truly never use macros).

> And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR

You're talking about emergent behavior here. No one consciously said "let's trash user code". People wrote optimization transformations that are valid given code that follows C's rules. When putting these together, you can end up making mush of code that wasn't valid to begin with. What optimization rule should have this detection attached to it?

I mean, I guess you could just use `-O0` since that seems to be what you want anyways: take the code I wrote, as I wrote it, and output instructions for the CPU. But I suspect you'll be sad at how much work *you'll* now have to do to get better performing code (not that it matters in a lot of cases, but when it does, it does).


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds