The Linux Foundation's "security mobilization plan"
Posted May 26, 2022 11:51 UTC (Thu) by wtarreau (subscriber, #51152)
In reply to: The Linux Foundation's "security mobilization plan" by nickodell
Parent article: The Linux Foundation's "security mobilization plan"
I'm writing MORE bugs since gcc emits tons of bogus warnings, because shutting them down requires tricks that are anything but obvious and that are easier to get wrong than the original code itself. Worse, open-source development is collaborative, and everyone gets inspired by others. As such, the tricks used to work around a compiler's stubbornness are not understood but are happily copied by those who face the same issue. These, too, can result in quite a few vulnerabilities.
The real problem of C right now (and the Rust guys don't deny it) is the vast amount of UB. Just defining a UB-less variant of the standard would be quickly adopted: it would allow us to get rid of many ugly tricks, would let compilers emit much more accurate warnings, and would eliminate many bugs.
And this is much cheaper than inventing new languages or porting 30 million lines of code to $language_of_the_year.
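For a concrete illustration of the kind of trap UB sets, here is the classic overflow check that a compiler is entitled to delete (a minimal sketch; the function name is mine):

int will_overflow(int x)
{
    /* Signed overflow is UB, so the compiler may assume x + 1 never
     * wraps and fold this whole test to a constant 0 at -O2. */
    return x + 1 < x;
}

In a UB-less dialect where signed arithmetic simply wraps, this test would be required to work as written.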
Posted May 26, 2022 12:20 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link] (2 responses)
There have been several attempts at C variants or coding standards to try to ease the transition (MISRA C, Cyclone, etc.), and while at least some have seen limited adoption, I don't think they are sufficient. Otherwise, why hasn't it been done already in several decades?
Posted May 26, 2022 18:21 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
What wtarreau seems to propose is a C variant where all UB is transmuted into some combination of "the behaviour of this construct is implementation defined" and "this construct is forbidden". I imagine a conservative version of that looking like:
Signed integer overflow? Well, it can trap, or it can wrap, or it can saturate, and it can refuse to compile code that provably causes signed integer overflow, but the implementation must say what it does. Caveats like "unless someone has altered the CPU control registers behind the implementation's back" are taken as read.
Assigning the value of a double of great magnitude to a float? Well, it can trap, or store Inf, or store a qNaN, or store an sNaN, or do whatever the OS has configured the CPU to do, and it can refuse to compile code that provably does it, but the implementation has to say what it does. Caveats like "unless someone has altered the CPU control registers behind the implementation's back" are taken as read.
Read or write through a null pointer? Well, it can trap, or it can return the contents of the address corresponding to the bit pattern used for the stored representation of NULL, or whatever, but the implementation must say what it does. ("What the hardware and OS decide to do when you dereference the address represented by the all-zeroes bit pattern" is an acceptable answer.)
memcpy() between overlapping regions? It can copy forward, or backward, or from the middle out, or from the outside in, and it can refuse to compile code that provably does it, but whatever the result, the implementation has to document what it does.
Invoking a function without having a prototype in scope is forbidden.
Defining a function without a prototype is forbidden.
If two pointers have fully identical bit patterns, they refer to the same object (see the sketch after this list).
etc.
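To illustrate that last rule, here is a sketch of the well-known provenance corner case it would settle (variable names are mine):

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2;
    int *p = &a + 1;  /* one past the end of a: a valid pointer value */
    int *q = &b;
    /* p and q may happen to have identical bit patterns, yet writing
     * through p is undefined today; under the rule above they would
     * refer to the same object whenever they compare equal. */
    if (p == q) {
        *p = 10;
        printf("%d\n", b);  /* provenance-based optimizers may print 2 */
    }
    return 0;
}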
Posted May 28, 2022 8:59 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link]
First, MISRA has places where it requires certain constructions that the authors believe (in some cases with good justification from studies showing this works; in others it seems like just a weird style preference) are safer or encourage safer programming.
For example, MISRA insists you use C's switch statement to write exhaustive and irrefutable matches: you must write break after each case clause, you must provide a default case, and these are to be presented in a particular order.
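Roughly, a switch written to that shape looks like this (a sketch of the style, not a verbatim MISRA rule; the enumerators and function names are mine):

switch (day)
{
case MONDAY:
    open_shop();
    break;
case SUNDAY:
    stay_home();
    break;
/* ...every other enumerator handled explicitly... */
default:                /* required even if all enumerators are covered */
    handle_unexpected();
    break;
}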
Second, it has a bunch of unenforceable rules about engineering practice. For example, your code should be documented and tested. C doesn't actually provide you with any help doing this, but MISRA does at least tell you that you should do it.
I'm sure MISRA or anything like it would make wtarreau miserable while providing very little benefit.
Posted May 26, 2022 20:00 UTC (Thu)
by roc (subscriber, #30627)
[Link] (22 responses)
Logic errors introduced by working around gcc warnings don't predict similar errors will occur in a language like Rust. Of course patching up a broken language like C requires tradeoffs that are unnecessary in languages that are better designed from the start.
Posted May 27, 2022 11:39 UTC (Fri)
by wtarreau (subscriber, #51152)
[Link] (21 responses)
That's exactly why I think it must not be done via a committee nor by seeking a consensus, but it must be done exactly the same way the problems were introduced in the first place: purely centralized and imposed. Most of the choices would either follow what any relevant hardware does (e.g. integer overflows just wrap) or what costs the least (e.g. when deciding whether a shift by nbits gives zero or is a no-op, just compare the output code for both cases on x86 and ARM and pick the cheaper one). Integer promotion is a total mess (how can an unsigned char be promoted to a signed int?) and will need to be made more intuitive (probably by respecting signedness first), etc. As long as the choices follow what some implementations already do, only already-broken code would require adaptation, and that broken code is currently getting sabotaged by gcc and clang anyway.
The difficulty would be to get projects like gcc or clang to accept a patch implementing such a standard variant, mostly because it would ruin their efforts to eliminate non-perfectly-compliant code for the sole purpose of showing higher numbers than the competitor :-(
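The shift case above, as a sketch (either choice would do, as long as it is written down):

unsigned int shift32(unsigned int x)
{
    /* UB in ISO C when the count equals the width of the type: x86's
     * SHL masks the count to 5 bits, so this often acts as a no-op,
     * while 32-bit ARM's LSL yields 0. A UB-less dialect would pick
     * whichever is cheaper and document it. */
    return x << 32;
}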
Posted May 27, 2022 12:59 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
IMO, the better path would be to get compilers to make noise when they do things that rely on UB to justify the transformation (I mean, you're looking to change the behavior in these cases anyways…). If this means that every `if (ptr)` check in a macro makes noise because it is expanded after a `*ptr` usage in a function, then so be it…maybe you should use fewer macros ;) .
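A sketch of that macro situation (names are mine): once the pointer has been dereferenced, the compiler may treat the expanded null check as dead code:

#include <stdio.h>

#define LOG_IF_VALID(p) do { if (p) printf("%d\n", *(p)); } while (0)

int first(int *arr)
{
    int head = *arr;    /* dereference: the compiler may now assume arr != NULL */
    LOG_IF_VALID(arr);  /* ...so the expanded if (arr) check can be deleted */
    return head;
}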
Posted May 27, 2022 23:35 UTC (Fri)
by roc (subscriber, #30627)
[Link]
Again, people have been talking about this problem for a long time. No-one has done anything about it. I think that's because it's incredibly hard and has a low likelihood of success. If you think it's not hard, it may be up to you to do something about it.
Posted May 27, 2022 23:38 UTC (Fri)
by roc (subscriber, #30627)
[Link] (13 responses)
Posted May 28, 2022 9:57 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
You could define the language as if there's really just one huge fixed memory space and the allocator assigns you pieces of that space, but you can ignore that whenever you like and directly index into the space. The allocator probably has to be defined in a pretty suboptimal way for this to work, and obviously your runtime performance is not good, but hey, no more Undefined Behaviour.
For example, in this model use-after-free does what such programmers "expect": if the allocator didn't give this memory to anybody else yet, it just works. Buffer overflows likewise: if the vector of 32-bit integers is right next to the output string, overflowing it just scribbles on the string. Everything has this defined (but often horrible) behaviour in exchange for giving up performance.
All the cases where an object can have an invalid representation need resolving (because now it can be accessed via a pointer of the wrong type); again, I think the way to do this is to sacrifice performance. For example, we probably want to just carve off some pointers as invalid, and we can insert an arithmetic check at every pointer dereference to accomplish this. For enumerated types we explicitly define that all the invalid representations are fine; this sort of C programmer always thought of enumerated types as just integers anyway, so they'd probably be astonished it matters. Why shouldn't day_of_week be 46 sometimes instead of WEDNESDAY?
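A sketch of what that inserted check might look like (heap_base and heap_end are hypothetical runtime bookkeeping, not any real allocator's API):

#include <stdint.h>
#include <stdlib.h>

extern uintptr_t heap_base, heap_end;  /* hypothetical bounds of the one big space */

static inline void check_deref(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    /* Conceptually emitted by the compiler before every dereference:
     * any pointer outside the defined space traps instead of being UB. */
    if (a < heap_base || a >= heap_end)
        abort();
}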
Posted May 28, 2022 21:33 UTC (Sat)
by roc (subscriber, #30627)
[Link]
Posted May 30, 2022 21:56 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (10 responses)
That again is easy to define. That memory does not belong to the program. Therefore it's a bug, and the consequences of executing faulty code are undefineable, not undefined.
The whole point of getting rid of UB is to enable the programmer to reason about the code, and draw reasonable inferences therefrom. A bug is asking a program to perform an action which cannot be reasoned about. As soon as you define that, your UB becomes "undefineable behaviour", and outside the scope of a valid program.
You need to take the attitude that "if it CAN be defined, it MUST be defined. If it CAN'T be defined, then any program attempting it is invalid".
Cheers,
Wol
Posted May 31, 2022 13:08 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (8 responses)
Could you please be specific about what you mean here? What is the difference? What should the compiler do if it detects "undefineable" behavior at compile time? If the compiler sees this, you have an invalid program (per your words), so what's wrong with the compiler saying "assume that doesn't happen" (because anything can follow from such a situation) and then optimizing based on it? At that point…what's the difference?
Posted Jun 2, 2022 12:41 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (7 responses)
At the end of the day, UB is a *BUG* waiting to bite people, and when statements like "if I detect an access to *freed_ptr it's UB so I can format the hard drive if I like" are literally true, it shows up the C spec as the farce it really is.
C was (and is still seen as) a glorified assembler. It should not be deleting safety checks because "this isn't supposed to happen" when the programmer has determined it can - and does - happen.
I'm quite happy with the C spec offloading responsibility - 6-bit bytes and all that. I'm quite happy with the spec saying "this shouldn't be possible - here be dragons". The problem is when compiler writers spot UB in the spec whether by accident or looking for it, and then CHANGE the behaviour of the compiler.
If the C spec is changed to say "there is no such thing as UB; it is now at minimum implementation-defined, even if the definition is simply 'don't do it, it's not valid C'", then it will stop C being a quicksand of nasty surprises every time the compiler is upgraded.
Otherwise we will soon end up (quite possibly a lot quicker than people think) in a world where the core of the Linux kernel is written in Rust, and the only C left is legacy drivers.
Cheers,
Wol
Posted Jun 2, 2022 14:04 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
The problem is the third case: the behavior *may* be deterministic or non-deterministic but the compiler doesn't have enough information to be sure. It may depend on other parts of the program, such as separately-compiled library code, or on runtime inputs. It's not necessarily unreasonable to say that every program (every translation unit) must be provably well-defined regardless of the input it receives or what any other part of the program might do—some languages do take that approach—but the result would not look much like C, and likely would not be suitable for system-level programming due to dependencies on platform-specific behavior the compiler knows nothing about. Most, if not all, currently valid C programs would fail to compile under those rules.
When the behavior of a piece of code is known to *always* be undefined the compiler generally produces a warning, which you can escalate into an error (with -Werror) if you so choose.
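For instance, with GCC (a minimal sketch; the exact diagnostics vary by version and flags):

int always_ub(void)
{
    int *p = 0;
    /* gcc -O2 -Wnull-dereference (or -Werror=null-dereference to make
     * it fatal) flags this always-undefined path. */
    return *p;
}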
Posted Jun 2, 2022 14:36 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (5 responses)
Note that no compiler (that anyone wants to use) will actually insert such code on its own. But if you have code in your program that does that, there is nothing saying it won't be called as some consequence of the UB (say it flips a bit in a function pointer or a jump table). I mean, I guess you could make a C compiler that is absolutely paranoid and actually believes that *freed_ptr can happen at any time, and therefore injects pointer-verification code around all pointer uses, but I suspect you would then lose the "glorified assembly" property because of all this invisible sanity checking.
The rules for UB are there because they are assumptions that need to be there to reasonably contemplate what code does in any instance. If race conditions exist, every read is a hazard because any thread could write to it behind your back (meaning that you need to load from memory on every access instead of hoisting outside the loop). If free'd pointers are usable in any way, any new allocation has to be considered potentially aliased (again, pessimizing access optimizations). If integer overflow is defined, you cannot promote small integers to bigger ones if the instructions are faster because you need to get some specific answer instead of "whatever the hardware does". Or you need to always promote even if the smaller code would have offset the slower instruction because it all fits in the instruction cache now.
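The race-condition point in code form (a sketch; the function is mine):

void wait_for(int *flag)
{
    /* Because unsynchronized concurrent access is UB, the compiler may
     * load *flag once and hoist it out of the loop, turning this into
     * if (*flag) for (;;);. With defined races it would have to reload
     * on every iteration, as if flag were volatile. */
    while (*flag)
        ;
}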
Saying that these behaviors are UB and that, if they happen, all bets are off allows you to reason locally about some code. What you want may be possible, but I don't think it looks like what you think it would look like. I mean, I guess you could just say C "translates to the hardware", but that means that you have zero optimizations available because you cannot say "this code acts as-if this other code" because that addition is inside the loop, so it needs to be done on every iteration instead of factored out and done with a simple multiply.
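And the integer-overflow point, in a loop whose optimization leans on overflow being UB (a sketch; the function is mine):

void scale(float *a, int n)
{
    /* Signed-overflow UB lets the compiler assume i never wraps, so it
     * can prove the trip count is exactly n + 1 and vectorize freely;
     * with wrapping semantics, n == INT_MAX would loop forever and the
     * transformation would be invalid. */
    for (int i = 0; i <= n; i++)
        a[i] *= 2.0f;
}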
Posted Jun 2, 2022 18:06 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (4 responses)
This is an interesting one, because whether that's true is a function of exactly what calculation you're trying to perform, what the machine's behaviour on overflow is, and what results you're trying to extract from it.
Posted Jun 3, 2022 14:46 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (3 responses)
Thing is, most UB has an obvious outcome. Most UB was because different compilers did different, apparently sensible, things. So where we can, let's codify what it does, even if we have conflicting behaviour between compilers.
And if there isn't a sensible, codifiable behaviour (like accessing *freed_ptr), we simply call it out as something that, if it happens, is a bug.
In practice, pretty much everything that happens is "whatever x86_64 does" anyway, so let's codify that as the default. If you're compiling for a ones-complement CPU, you know that the hardware is going to behave "strangely" (to modern eyes, at least), and any compiler for that architecture should default to ones-complement behaviour.
If you're shifting code between architectures, you expect behaviour to change. You should not get nasty surprises when the only thing that's changed is the compiler version.
Cheers,
Wol
Posted Jun 3, 2022 19:17 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Much to the amusement of C developers fielding questions from others using different compilers I'm sure. At least today we can say "C doesn't let anyone do that" even if different compilers do different things in response to the code; there's a common base to code against (the C abstract machine). But now with this proposal, one could say "GCC-C works here, use a better compiler".
I'm also sure all of the embedded developers will enjoy having x86-isms forced upon them for their resource-constrained devices instead of considering that maybe *their* behavior should be preferred given that beefy x86 machines tend to have the horsepower to do something special.
Posted Jun 4, 2022 10:22 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
So we have switches that say "I want the hardware-defined behaviour, not the C default behaviour". Which I said should be the case several posts ago!
As others have said, the problem with UB is that compilers - GCC IN PARTICULAR - take advantage of it to change their behaviour without warning.
And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR and PROVIDE SWITCHES IF BEHAVIOUR CHANGES is going to force x86isms onto embedded developers? Surely it's going to HELP because they WON'T have x86isms stealth-thrust on them!
Cheers,
Wol
Posted Jun 6, 2022 14:33 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
I'm all for better warning emission, but I think the number of false positives would be overwhelming (unless you truly never use macros).
> And I completely fail to see why requiring compiler developers to DOCUMENT CURRENT BEHAVIOUR
You're talking about emergent behavior here. No one consciously said "let's trash user code". People wrote optimization transformations that are valid given code that follows C's rules. When putting these together, you can end up making mush of code that wasn't valid to begin with. What optimization rule should have this detection attached to it?
I mean, I guess you could just use `-O0` since that seems to be what you want anyways: take the code I wrote, as I wrote it, and output instructions for the CPU. But I suspect you'll be sad at how much work *you'll* now have to do to get better performing code (not that it matters in a lot of cases, but when it does, it does).
Posted May 31, 2022 19:15 UTC (Tue)
by roc (subscriber, #30627)
[Link]
Claiming to have eliminated undefined behavior by defining some of it and choosing another name for the rest would be rather misleading.
Posted May 30, 2022 21:44 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (4 responses)
> That's exactly why I think it must not be done via a committee nor by seeking a consensus, but it must be done exactly the same way the problems were introduced in the first place: purely centralized and imposed.
Or you simply require that each compiler must declare what it does, and provide a switch to enable other options, possibly with a majority vote of the committee saying which options must be provided.
That way, UB becomes "hardware defined" or "OS defined" or "implementation defined", but UNdefined becomes forbidden by the standard. And if a new version of the standard defines the "majority behaviour" as the new standard (or even if it doesn't), then ALL compilers MUST provide a switch to enable old behaviour whenever a compiler changes its default.
Cheers,
Wol
Posted May 31, 2022 13:06 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
> And if a new version of the standard defines the "majority behaviour" as the new standard (or even if it doesn't), then ALL compilers MUST provide a switch to enable old behaviour whenever a compiler changes its default.
Do you know how many C compilers were found by the committee some time ago? It's in the hundreds. How do you propose to get a majority of these to declare any meaningful behavior of `*freed_ptr` other than "good luck" without some (probably serious) performance impairments?
Posted Jun 2, 2022 12:19 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
Well, I think the following definition would work pretty well ... "accessing *freed_ptr accesses memory that no longer belongs to the program, with potentially disastrous consequences". Isn't that good enough? It's pretty accurate!
Likewise "the result of accessing *null_ptr either behaves as defined by the hardware/OS, or must be defined by the compiler".
If you take the attitude "it's too big a problem so I'm not going to try" then of course you're going to fail. All *I'M* asking is to take the attitude "let's make a start and see how far we get". And as a first step, simply saying "there will be no such thing as undefined behaviour in the C spec, all behaviour MUST be DOCUMENTED", will get us a long way. Even if the C spec says "whatever the hardware does", at least we have a reference document so the programmers know what to *expect*. (And if C *is* a glorified assembler, as many people think of it, then "whatever the hardware does" is indeed a good, and valid, definition).
And while *freed_ptr is an easy example of "you shouldn't do this", there are plenty of less obvious ones. At least if the C spec says "you must document what you do, and provide flags if you change it", that is going to rein in pretty much all the compiler writers doing stupid things. And then if the compiler users start saying "what the brass monkeys do you think you're doing!!!" we might start getting compilers that actually stop shooting their users in the feet.
Cheers,
Wol
Posted Jun 2, 2022 13:53 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
int *const p = (int*)malloc(sizeof(int));
free(p);
int *const q = (int*)malloc(sizeof(int));
*q = 5;
*p = 7; /* use after free */
if (*q != 5) launch_missiles();
Say the second malloc() happens to return the same address as the first, which is perfectly valid since that memory was freed. The memory _does_ belong to the program, but now it's part of a different object. The consequences of writing to it cannot be defined by the hardware or the compiler as they depend on the rest of the program, implementation details of the C library, and the particular circumstances present at runtime (e.g. other threads allocating or freeing memory).
I'll grant you "potentially disastrous consequences"—but that's just another way of saying "undefined behavior".
> Likewise "the result of accessing *null_ptr either behaves as defined by the hardware/OS, or must be defined by the compiler".
This is a slightly more reasonable request, since it can be accomplished portably by inserting checks for null pointers at every point where a potentially-null pointer might be dereferenced. Of course this comes at a considerable cost in performance in situations where there is no reliable hardware trap for dereferencing a null pointer, such as most systems without MMUs. If you're doing away with the strict aliasing rules as well then you'll need even more checks since previously valid pointers could be changed to null pointers as a side effect of seemingly unrelated memory accesses.
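A sketch of what "compiler-defined" null dereference could look like on such a target (the check placement is conceptual):

int load(int *p)
{
    /* Inserted by the compiler at each dereference on an MMU-less
     * target: a null pointer traps deterministically instead of
     * silently reading address zero. */
    if (!p)
        __builtin_trap();  /* GCC/Clang intrinsic: deterministic abort */
    return *p;
}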
Posted Jun 16, 2022 15:02 UTC (Thu)
by nix (subscriber, #2304)
[Link]
You don't seem to realise what "undefined behaviour" means. Undefined behaviour is simply any behaviour that has no definition within the standard. The set of such behaviours is, of course, almost infinite: almost every C implementation has at least some things that go beyond the C standard. Any call to a library function not defined within the standard is undefined behaviour, so as soon as you have separate compilation you are doing a lot of stuff that (from the perspective of a compiler running over some other translation unit) is not defined (which is why it has to assume that e.g. arbitrary things can happen to memory across function calls). It is quite possible that such functions are not written in C at all.
POSIX is similarly undefined: the C standard does not define it (though POSIX is more or less a superset of it and includes the entire C standard by reference; that kind of relationship is certainly not true of *everything* that impacts systems running C code, and is in fact rather rare).
In places, the standard does specifically call out that some behaviours are not defined in addition to simply not defining them, but even if you changed all those places to explicitly nail down a definition, there would still be countless places *with no corresponding text in the standard* where some element of the behaviour of a system running C code was not defined in the standard (as you'd expect when there was no text there). A good thing too, or C wouldn't be much use anywhere.
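The separate-compilation point in miniature (names are mine):

extern void mystery(void);  /* defined in some other translation unit */

int g;

int f(void)
{
    g = 1;
    mystery();   /* the compiler cannot see into this call, so it must
                  * assume g (and any other reachable memory) may change */
    return g;    /* hence g is reloaded rather than constant-folded to 1 */
}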
Posted May 29, 2022 23:36 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link]