
Who's afraid of a big bad optimizing compiler?

Posted Jul 25, 2019 14:13 UTC (Thu) by jerojasro (guest, #98169)
Parent article: Who's afraid of a big bad optimizing compiler?

All of these caveats seem to me a huge, and needless, burden on C programmers.

I know, the first thing the article says is: "the C standard grants the compiler the right to make some assumptions that end up causing these weird/non-obvious things if you don't guard against them". But I must ask: why can't the compiler grow an optimization mode that does the right thing (and thus avoids all of the issues documented here) in the presence of global variables and concurrent execution? That situation (globals + concurrency) is so common that solving it in the compiler would benefit most, if not all, C users, in both kernel and user space.

Are there any reasons for not fixing this issue (once, in the compiler) other than the amount of work involved (which I guess must be non-trivial) and this new optimization mode not being standards-compliant?

(I'm not attempting to troll anybody; it just seems to me that what I propose is an obvious solution, and since it's not being used, I'd like to know what obvious thing I'm missing that prevents us from using it)


Who's afraid of a big bad optimizing compiler?

Posted Jul 25, 2019 18:09 UTC (Thu) by excors (subscriber, #95769)

I expect one of the main reasons is that it's extremely difficult to define "the right thing". Firstly because different people will have different ideas of what's right; and secondly because concurrency and memory models are inherently complex and subtle topics, so it's always difficult to reason about them precisely. See how modern C++ tries to specify it with terminology like "is sequenced before", "carries a dependency to", "inter-thread happens before", etc, all with precise meanings that are almost impossible for a normal human to remember. Java tried to specify a memory model from the start, but got it wrong, and it took a decade to understand the mistakes and fix them.

Probably the other main issue is performance, particularly with concurrency: CPUs often require explicit memory barriers if you want guaranteed ordering, and if the compiler added memory barriers around every memory access just in case you were using the same memory in another thread, it would be pretty slow. Even if it were only a tiny performance regression, many C programmers care a lot about performance (especially microbenchmark performance) and won't be happy with a new compiler that's measurably slower, and users won't be happy if their applications run slower after updating their kernel. They'd rather have a dangerous but fast compiler, and rely on the programmer being smart enough to avoid the dangerous cases.
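
A rough sketch of that trade-off, using C11 atomics (the function names here are invented for illustration): a sequentially consistent access forces the compiler and CPU to preserve ordering around it, which is roughly what blanket barriers on every access would cost, while a relaxed access is still atomic but carries no ordering barrier.

```c
#include <stdatomic.h>

static atomic_long counter;

/* seq_cst: the compiler must emit full ordering (e.g. a locked
 * read-modify-write on x86) - the cost it would pay on *every* access
 * if it had to assume all memory might be shared. */
long add_seq_cst(long n)
{
    return atomic_fetch_add_explicit(&counter, n, memory_order_seq_cst) + n;
}

/* relaxed: atomic, but no ordering barrier - much cheaper on weakly
 * ordered CPUs such as ARM. */
long add_relaxed(long n)
{
    return atomic_fetch_add_explicit(&counter, n, memory_order_relaxed) + n;
}
```

Both are correct in isolation; the difference is how much reordering freedom the compiler and CPU retain, which is exactly what a "barriers everywhere" mode would give up.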

Example, read once may or may not be the "right thing".

Posted Apr 9, 2020 5:45 UTC (Thu) by gmatht (guest, #58961)

We even have an example in the article of different code making incompatible assumptions. Here the code assumes the variable need_to_stop will be read many times.

while (!need_to_stop) /* BUGGY!!! */
    do_something_quickly();

The following code instead assumes that global_ptr won't change; that could be ensured by reading global_ptr only once.

if (global_ptr != NULL &&
    global_ptr < high_address)
        do_low(global_ptr);

In general it might be hard to determine which of these two contradictory assumptions the code is making.
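
For the second example, the usual fix is to read the pointer exactly once into a local and only use the local afterwards; in the kernel this is typically done with READ_ONCE(). A minimal user-space sketch of that idea, with READ_ONCE() approximated by a volatile cast and the surrounding pieces (do_low(), high_address, the recording variable) stubbed out purely for illustration:

```c
#include <stddef.h>

/* Approximation of the kernel's READ_ONCE(): forces exactly one load
 * by accessing the object through a volatile-qualified lvalue. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static int low_buffer[4];
static int *global_ptr = low_buffer;
static int *high_address = low_buffer + 4;

static int last_low;  /* records what do_low() saw, for the demo */

static void do_low(int *p)
{
    last_low = (p == low_buffer);
}

void check_and_do_low(void)
{
    int *p = READ_ONCE(global_ptr);  /* one load; only use p below */

    if (p != NULL && p < high_address)
        do_low(p);
}
```

Because the compiler may not re-read global_ptr, the NULL check, the bounds check, and the call all see the same value even if another thread changes global_ptr concurrently.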

Example, read once may or may not be the "right thing".

Posted Apr 9, 2020 8:30 UTC (Thu) by geert (subscriber, #98403)

Isn't this the reason for the existence of the "volatile" keyword, and why need_to_stop should be annotated with it?

Example, read once may or may not be the "right thing".

Posted Apr 10, 2020 17:22 UTC (Fri) by zlynx (guest, #2285)

The "volatile" keyword is almost completely useless. It's good for reading memory-mapped hardware, and for signal handlers; and not even for signal handlers if you're using multiple threads.

In any threaded situation you want an atomic access, which might be implemented using volatile, but requires more than that, such as memory-barrier operations.
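
A hedged sketch of what that looks like for the need_to_stop flag from the example above, using C11 atomics instead of volatile (request_stop_and_join() is an invented harness, and this assumes POSIX threads are available):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool need_to_stop;
static unsigned long iterations;  /* touched only by the worker */

static void do_something_quickly(void) { iterations++; }

static void *worker(void *arg)
{
    (void)arg;
    /* An atomic load may not be hoisted out of the loop, so the store
     * from the other thread is guaranteed to eventually be observed;
     * acquire/release also order any data published before the stop. */
    while (!atomic_load_explicit(&need_to_stop, memory_order_acquire))
        do_something_quickly();
    return NULL;
}

int request_stop_and_join(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    atomic_store_explicit(&need_to_stop, true, memory_order_release);
    pthread_join(t, NULL);
    return 0;
}
```

Unlike volatile, this both prevents the load from being optimized away and supplies the inter-thread ordering guarantees volatile never provided.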

Who's afraid of a big bad optimizing compiler?

Posted Jul 25, 2019 19:01 UTC (Thu) by farnz (subscriber, #17727)

The short answer is performance; each of the eight transforms in the article permits the generated code to be optimized into much, much faster code, as long as the original C code did what the programmer intended it to do.

That statement ("as long as the original C code did what the programmer intended it to do") is the source of all the pain. The C standard specifies an abstract C machine that directly executes C code, and the job of a compiler is to translate C code into machine code that has the same semantics as C running on the abstract C machine. However, most C programmers are ignorant of the abstract C machine (not all, but most, including me); instead, they rely on either "my compiler turns it into machine code that does what I want" or "the obvious translation to my preferred machine code does what I want". Each of these interpretations of "what my C means" leads to its own set of problems:

  1. "My compiler turns it into machine code that does what I want" ignores compiler bugs, portability issues, optimizations that produce reasonable results for your test vectors but not for inputs you're not testing, etc. In other words, writing a compiler that complies with this level of expectation means being bug-for-bug compatible with all old versions; if that's what you want/need, why are you updating your compiler to begin with? Even new CPU support can break this level of expectation.
  2. "The obvious translation to my preferred machine code does what I want" is more insidious; the issue is that such programmers read their C code as if it were implemented as assembly macros that generate the machine code they want. The problem here is that C is not a macro assembler - it's a full-blown portable programming language - and while their assumptions probably hold for a specific compiler version and optimization settings on a single processor model, they fall over on other processor models and with other optimizations that are allowed by the C language.

In short, it's hard, because of C's legacy as "basically one step above a macro assembler for the PDP-11".

Who's afraid of a big bad optimizing compiler?

Posted Jul 26, 2019 15:08 UTC (Fri) by jerojasro (guest, #98169)

Hmm, I was thinking that the compiler only had to look for "global variables" (things syntactically declared as global), and add barriers/avoid reordering/etc.

But after reading both of your comments, I realized (I hope correctly) that globals are *both* things declared as global *and also* anything allocated on the heap. And that last part is what makes automating all of these checks/guards such a performance hit, and what makes it worth bothering the programmer with handling the unsafe situations manually.

Man I'm glad I just concatenate strings for a living.

Who's afraid of a big bad optimizing compiler?

Posted Jul 26, 2019 15:32 UTC (Fri) by farnz (subscriber, #17727)

It's a bit worse than that - things in C are global in the sense of "need concurrency-awareness baked in" if they are syntactically global, or if there is ever a pointer to the thing. That last case covers all heap allocations, and any stack allocations whose address you take with &, and in turn means that you would have to add barriers etc. to all heap allocations plus some stack allocations.
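
A small illustration of the "ever a pointer to the thing" rule (all names here are invented; publish() stands in for any code the compiler cannot see into, such as another thread or translation unit):

```c
static int *escaped;  /* stands in for "some code the compiler can't see" */

static void publish(int *p) { escaped = p; }

int demo(void)
{
    int reg_only = 1;  /* address never taken: can live entirely in a
                        * register, so no barrier could ever be needed */
    int shared = 2;

    publish(&shared);  /* address escapes: a conservative compiler would
                        * now have to treat every access to "shared" as
                        * carefully as an access to a global */
    *escaped += reg_only;
    return shared;     /* the write through "escaped" must be visible here */
}
```

The proposed "safe mode" would thus have to pessimize every address-taken local and every heap object, not just syntactic globals, which is where the performance cost explodes.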

And, of course, this is only going to help code that does not execute correctly on the C machine, since such code ignores the semantics of the C machine. We don't want to strengthen the semantic guarantees of the C machine, since that results in the output code running slower (it needs more barriers, as more of the "internal" effects become externally visible). So it only actually helps developers who don't understand what they're telling the computer to do - while this may be a common state, it's not necessarily something to encourage.

Who's afraid of a big bad optimizing compiler?

Posted Jul 26, 2019 18:18 UTC (Fri) by andresfreund (subscriber, #69562)

> I know, the first thing the article says is: "the C standard grants the compiler the right to make some assumptions that end up causing these weird/non-obvious things if you don't guard against them". But I must ask: why can't the compiler grow an optimization mode that does the right thing (and thus avoids all of the issues documented here) in the presence of global variables and concurrent execution? That situation (globals + concurrency) is so common that solving it in the compiler would benefit most, if not all, C users, in both kernel and user space.

I'd say that C11/C++11 made a large step in that direction by having a formalized memory model and built-in atomics. Before that, there really was no way to get correctly working (even though formally undefined) concurrent programs without relying on compiler implementation details.

It does, however, require you to actually use the relevant interfaces.

I don't quite see how you'd incrementally get to a language that doesn't have any of these issues without making it close to impossible to ever incrementally move applications towards that hypothetical version of C. I mean, there's basically no language that lets you use shared memory and doesn't require escape hatches from its safety mechanisms to implement fast concurrent data structures (e.g. Rust needing to drop to unsafe for core pieces). And the languages that get closest require enough of a different approach that it's hard to imagine C moving towards them.

That's not to say that C/C++ haven't progressed towards allowing one to at least opt into safety. The C11/C++11 memory model and the atomics APIs are a huge step, but it happened at the very least 10 years too late (while some of the formalisms were developed somewhat more recently, there ought at least to have been some progress before then). And there are plenty of other issues for which no proper mechanisms are provided (e.g. signed integer overflow handling, mentioned in nearby comments).

Who's afraid of a big bad optimizing compiler?

Posted Jul 29, 2019 16:44 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624)

I would say 30 years too late rather than just 10, but yes. There was a surprisingly large amount of concurrent C code written during that 30 years prior to C11. It is only natural for people to want to discount that code as "broken" so it can be ignored, but some of that code is rather important. The problem is that we do not have a reasonable way to identify the problem areas, even for the small fraction of that code for which source code is publicly available.

So there are great opportunities for innovations in the area of locating old concurrent C code in need of an upgrade! :-)


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds