LWN: Comments on "Who's afraid of a big bad optimizing compiler?" https://lwn.net/Articles/793253/ This is a special feed containing comments posted to the individual LWN article titled "Who's afraid of a big bad optimizing compiler?". en-us Sat, 30 Aug 2025 14:10:13 +0000 Sat, 30 Aug 2025 14:10:13 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/820109/ https://lwn.net/Articles/820109/ farnz <p>Couple of reasons: <ol> <li>volatile is not necessarily defined clearly enough on all architectures to be what you want. In the event that it's not sufficient on a given architecture, you can make ACCESS_ONCE do the right thing. I don't believe this applies to the kernel, but it's something to be aware of. <li>Even on architectures where volatile does do the right thing, it causes pessimisation of all accesses to that object, not just those that need ACCESS_ONCE semantics. </ol> <p>A different coding style wouldn't need to worry about the second case, as it would cast away volatile instead of using ACCESS_ONCE for cases that don't need ACCESS_ONCE semantics, but for better or worse, the Linux coding style doesn't work that way. Sun, 10 May 2020 21:16:37 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/820104/ https://lwn.net/Articles/820104/ rep_movsd <div class="FormattedComment"> Forgive me for being so dense, but why is there an ACCESS_ONCE macro rather than declaring objects that are accessed by multiple threads as volatile?<br> <p> </div> Sun, 10 May 2020 16:39:35 +0000 Example, read once may or may not be the "right thing". https://lwn.net/Articles/817248/ https://lwn.net/Articles/817248/ zlynx <div class="FormattedComment"> The "volatile" keyword is almost completely useless. It's good for reading memory-mapped hardware and signal handlers. 
And not even signal handlers if you're using multiple threads.<br> <p> In any thread situation you want an atomic access. Which might be implemented using volatile, but requires more than that such as memory barrier operations.<br> </div> Fri, 10 Apr 2020 17:22:13 +0000 Example, read once may or may not be the "right thing". https://lwn.net/Articles/817097/ https://lwn.net/Articles/817097/ geert <div class="FormattedComment"> Isn't this the reason for the existence of the "volatile" keyword, and why need_to_stop should be annotated with it?<br> </div> Thu, 09 Apr 2020 08:30:50 +0000 Example, read once may or may not be the "right thing". https://lwn.net/Articles/817096/ https://lwn.net/Articles/817096/ gmatht We even have an example in the article of different code making incompatible assumptions. Here the code assumes the variable <code>need_to_stop</code> will be read many times. <p/> <pre><code>1 while (!need_to_stop) /* BUGGY!!! */ 2 do_something_quickly();</code></pre><p/> The following code is instead assuming that <code>global_ptr</code> won't change. This could be ensured by only reading <code>global_ptr</code> once. <pre><code>2 if (global_ptr != NULL &amp;&amp; 3 global_ptr &lt; high_address) 4 do_low(global_ptr);</code></pre><p/> In general it might be hard to determine which of these two contradictory assumptions the code is making. Thu, 09 Apr 2020 05:45:29 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/803161/ https://lwn.net/Articles/803161/ plugwash <div class="FormattedComment"> Just curious, why did you "not want to rely on casting to unsigned"?<br> </div> Fri, 25 Oct 2019 15:14:33 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/796467/ https://lwn.net/Articles/796467/ PaulMcKenney It will be interesting to see <a href="https://www.linuxplumbersconf.org/event/4/sessions/38/#all">Writing A Kernel Driver in Rust</a> at the upcoming <a href="https://www.linuxplumbersconf.org/">Linux Plumbers Conference</a>. From what I have seen, things like split counters, sequence locks, and base RCU (as opposed to RCU use cases like RcuCell) have been carefully avoided in earlier Rust-language Linux-kernel drivers. Maybe they will take them head-on in this talk. Either way, looking forward to it! <p>I agree that encapsulating difficult-to-get-right things into libraries is a most excellent strategy where it applies, but there are often things that don't fit well into a safe API. Again, I find a number of Rust's approaches to be of interest, but I do not believe that Rust is quite there yet. As I have said in other forums, my current feeling is that current Rust is to the eventual low-level concurrency language as DEC Alpha is to modern microprocessors. Fri, 16 Aug 2019 15:06:39 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796459/ https://lwn.net/Articles/796459/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; I imagine there are cases where it's infeasible to provide a guaranteed-Safe interface to an interesting bit of Unsafe code</font><br> <p> I suspect things like RcuCell are fine because you can still teach Rust about them through an API. For my keyutils library[1], it's a bit harder because, while there's no pointer juggling, the lifetimes of things are tied to kernel objects outside of the Rust code's control. Luckily synchronization is handled by the kernel as well, so I don't have to worry about *that*, but it does mean that your "owned" data is not really such, since they're just handles. 
It does mean that pretty much every function needs to return a Result&lt;&gt; because basically any operation can end up with an `ENOKEY` result if something else ends up deleting it out from underneath you.<br> <p> [1] <a href="https://github.com/mathstuf/rust-keyutils">https://github.com/mathstuf/rust-keyutils</a><br> </div> Fri, 16 Aug 2019 14:10:19 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796456/ https://lwn.net/Articles/796456/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; I spend longer getting code to actually compile than I did with C11 or C++17, but it then works (both when tested and in production) more often than my C11 or C++17 did.</font><br> <p> A note is that the amount of *thinking* is decreased overall. Rust front-loads it on the compilation phase (with a decent amount of hand-holding) while C likes to back-load it on the debugging phase with a possibly unbounded timesink (where I, at times, feel like gdb is watching me like the Cheshire cat). The number of times our Rust stuff has fallen over (mostly "unresponsive"[1] or hook deserialization errors[2]) is less than a dozen. But, an error (not a crash!) in the deserializer is much easier to diagnose than "someone threaded a `NULL` *here*‽" questions I find myself asking in C/C++.<br> <p> [1] Someone pushed a branch as a backport that caused the robot to go checking ~1000 commits with ~30 checks each. The thinking here was actually about how to detect that we should avoid checking so much work (since the branch is obviously too new to backport). It was churning through it as best it could, but it was still taking ~20 minutes to do the work.<br> [2] GitLab (and GitHub) webhook docs are so anemic around the types they are using for stuff, so it's sometimes surprising that a field is nullable. I can't wait for GraphQL to become more widespread :) .<br> </div> Fri, 16 Aug 2019 14:06:17 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/796398/ https://lwn.net/Articles/796398/ farnz <p>In terms of evidence for unsafety not growing, there's <a href="https://www.redox-os.org/">Redox OS</a>, which has so far been successful at encapsulating unsafe code (at least in terms of apparent safety - it'd take a formal audit to be confident that there are no erroneous safe wrappers). Similarly, Servo (and the parts backported to Firefox) show that it's also practical to keep unsafe to a small area when working in a big project like a web browser. <p>My practical experience of Rust is that I have yet to encounter a piece of code which has to be unsafe and cannot provide a safe interface to the rest of your project; I have encountered several places where <tt>unsafe</tt> is used because it's easier to violate UB guarantees than to design a good interface. As an example, I've written several FFIs to internal interfaces where the C code has a "flags tell you which pointers in this structure are safe to access" design, and I've ended up creating a new datastructure that has nested Rust structs and makes use of <a href="https://doc.rust-lang.org/std/option/enum.Option.html">Option</a> to identify which sub-structs are valid right now. <p>Personally, I see "unsafe" as a review marker; it tells you that in this block of code, the programmer is making use of features that could lead to UB if abused (the Rust compiler warns about unnecessary use of unsafe), and that you need to be confident that they are upholding the "no UB via Safe code" guarantee that the compiler expects. It's by no means a perfect system, and as Actix shows, it can be abused, but it does show you where you need to hold the code author to a higher standard than normal. 
<p>FWIW, having spent the last 2 years working in Rust, I think it's a decent advance on C; I spend longer getting code to actually compile than I did with C11 or C++17, but it then works (both when tested and in production) more often than my C11 or C++17 did. Fri, 16 Aug 2019 13:24:13 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796396/ https://lwn.net/Articles/796396/ excors <div class="FormattedComment"> <font class="QuotedText">&gt; safe Rust contains no way to trigger the dreaded Undefined Behaviour</font><br> <p> It may be more accurate to say it can't *directly* trigger Undefined Behaviour. It can still indirectly trigger UB if it calls into imperfectly-implemented Unsafe code with an interface that's marked as Safe.<br> <p> I expect it takes a lot of discipline and skill to write non-trivial Unsafe modules that guarantee it's absolutely impossible to trigger UB through their Safe interface. <a href="https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html">https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html</a> gives the example of BTreeMap with a Safe-but-incorrect Ord, which is the kind of subtlety that seems likely to trip up many programmers writing their own Unsafe code in pursuit of performance. And there's the situation described in <a href="https://dev.to/deciduously/the-trials-and-tribulations-of-actix-web-and-the-oss-community-53ee">https://dev.to/deciduously/the-trials-and-tribulations-of...</a> where a popular Rust framework had "an understanding gap regarding what unsafe means in Rust", leading to much unhappiness.<br> <p> (It still seems better than C/C++ where you have to apply discipline and skill to *all* code, if you don't want your application to crash all the time. Rust tells you where that attention needs to be focused. 
But you still need good programmers and good review processes to avoid safety problems sneaking in.)<br> <p> Based on zero practical experience of Rust, I imagine there are cases where it's infeasible to provide a guaranteed-Safe interface to an interesting bit of Unsafe code (like particularly clever synchronisation primitives, though RcuCell may be a counterexample). You can give it an Unsafe interface and make it the caller's responsibility to guarantee safety, because it's easier to make guarantees at a higher level, but the tradeoff is you've now got more code running unsafely. Rust seems to rely on the hypothesis that, in practice, unsafe code can be kept small and isolated, so a large majority of the codebase gets the benefits of guaranteed safety - is there much evidence for or against that yet? (particularly when writing kernel-like code that has to deal with a lot of fundamentally unsafe hardware)<br> </div> Fri, 16 Aug 2019 12:54:33 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796389/ https://lwn.net/Articles/796389/ farnz <p>I wonder if there's a human language barrier here; Rust uses "unsafe" code to mean "the human in front of the computer is responsible for proving absence of data races and invalid pointer dereferences". The <a href="https://doc.rust-lang.org/nomicon/index.html">Nomicon</a> describes what, exactly, <tt>unsafe</tt> opens up to you - basically, safe Rust contains no way to trigger the dreaded Undefined Behaviour, while unsafe Rust <a href="https://doc.rust-lang.org/nomicon/what-unsafe-does.html">provides for 6 ways to trigger UB</a> <em>if</em> you misuse language features. The idea is that you carefully encapsulate the risk of UB into well-validated and tested code, and provide a safe Rust interface on top that is guaranteed to not have UB no matter how it's used. 
<p>For example, the <a href="https://doc.rust-lang.org/src/std/sync/rwlock.rs.html">implementation of a <tt>RwLock</tt></a> is full of <tt>unsafe</tt> code. It has to be to implement a concurrency primitive - the underlying platform is a C ABI, which is implicitly unsafe in Rust, as the compiler cannot verify that your C is free of UB. However, as a user of <tt>RwLock</tt>s, I don't need to care about the unsafety of <tt>RwLock</tt> internals - I can trust that the humans who've reviewed and tested that code ensured that safety guarantees are met. <p>I would expect the same to apply to advanced synchronization - you have to use <tt>unsafe</tt> to implement it, because you're doing things the compiler cannot verify are defined behaviour, but the resulting primitives are safe because there's no way to misuse them. There is an implementation of <a href="https://crates.io/crates/rcu_cell">RCU in Rust</a> that shows how this works - internally, it's unsafe; for starters, it uses the bottom 2 bits of a pointer as tag values, and it keeps raw pointers around, converting them back and forth to references (effectively safe pointers), but the external interface is entirely safe Rust. Fri, 16 Aug 2019 10:10:44 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796358/ https://lwn.net/Articles/796358/ PaulMcKenney <div class="FormattedComment"> I will feel better about Rust's concurrency model when I start hearing something other than "unsafe" in response to queries about implementing advanced synchronization techniques. Don't get me wrong, there is much to like about Rust's notion of ownership, but Rust appears to also need the notion of existence.<br> </div> Thu, 15 Aug 2019 19:19:00 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/796357/ https://lwn.net/Articles/796357/ PaulMcKenney <div class="FormattedComment"> There were some concerns about specific machines that have since proved groundless, so the atomic_thread_fence() wording has (quite) recently been upgraded.<br> </div> Thu, 15 Aug 2019 19:15:40 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796356/ https://lwn.net/Articles/796356/ PaulMcKenney But both C and C++ do now allow shared variables to be designated as such, for example, using C _Atomic and C++ atomic&lt;T&gt;. Although these are <a href="https://lwn.net/Articles/691128/">not</a> <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0422r0.html">perfect</a>, they can be quite helpful. Especially given that they allow easy use of lighter-weight and more-scalable synchronization mechanisms than MVars and TVars appear to enable. <p>The problem is that these C and C++ features were not introduced into either standard until 2011, some decades after both languages were first used to write concurrent code. So there is quite a bit of production code that does not mark shared variables, because there was no way to do so at the time that code was written. <p>So again, this is a golden opportunity for people who can come up with ways of locating vulnerable concurrent code in large production concurrent C and C++ code bases! Thu, 15 Aug 2019 19:11:48 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/796095/ https://lwn.net/Articles/796095/ jezuch <div class="FormattedComment"> <font class="QuotedText">&gt; A language which gets imperative programming with shared mutable variables right, ironically, is the functional language Haskell.</font><br> <p> And, I would guess, Rust. Which is not surprising because this was an explicit design goal.<br> </div> Tue, 13 Aug 2019 15:17:18 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/795875/ https://lwn.net/Articles/795875/ dcoutts <div class="FormattedComment"> It would seem that the fundamental problem here is that C does not distinguish between private variables and variables that may be shared with other threads.<br> <p> You have this combination<br> * C semantics doesn't talk about shared variables<br> * almost all variables are private<br> * C compilers want to optimise everything<br> * assuming variables are shared would destroy many optimisations<br> <p> Which ends up as a mess when you do have shared variables. If they were explicitly identified then the compiler could do the right things with them (given the machine's memory model) and since there's so few of them it would not affect optimisations in general.<br> <p> A language which gets imperative programming with shared mutable variables right, ironically, is the functional language Haskell. It has two types of shared mutable variables, MVars which are like a mutable variable protected by a lock, and TVars which are part of the Software Transactional Memory concurrency feature. Of course these also are library APIs, that are able to enforce safe access (reading the MVar and taking the lock are inseparable, and TVars can only be read inside STM transactions).<br> <p> But even in a low level language where you don't try to enforce safe concurrency, simply distinguishing shared variables from other variables seems like it would go a long way to resolving this confusion and the C compiler's difficulties.<br> </div> Sat, 10 Aug 2019 13:28:38 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794854/ https://lwn.net/Articles/794854/ PaulMcKenney <div class="FormattedComment"> I would say 30 years too late rather than just 10, but yes. There was a surprisingly large amount of concurrent C code written during that 30 years prior to C11. 
It is only natural for people to want to discount that code as "broken" so it can be ignored, but some of that code is rather important. The problem is that we do not have a reasonable way to identify the problem areas, even for the small fraction of that code for which source code is publicly available.<br> <p> So there are great opportunities for innovations in the area of locating old concurrent C code in need of an upgrade! :-)<br> </div> Mon, 29 Jul 2019 16:44:43 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794748/ https://lwn.net/Articles/794748/ Wol <div class="FormattedComment"> Do you mean the compiler optimised away the tests for overflow, on the basis that overflow couldn't happen? Crazy! :-)<br> <p> Surely a simple way for the standards committee would simply be to define a bunch of pragmas that said "this implementation-defined feature is required", and say that the compiler must terminate with a fatal error if it's not supported.<br> <p> Okay, that then means that the poor programmer needs to add a bunch of requirements at the start of his program, but it also means that if he attempts to port to a different architecture that doesn't support his assumptions, the compiler will refuse to compile. Much better than everything appearing to work and then falling over in a heap in production.<br> <p> Cheers,<br> Wol<br> </div> Sat, 27 Jul 2019 09:36:30 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/794721/ https://lwn.net/Articles/794721/ andresfreund <div class="FormattedComment"> <font class="QuotedText">&gt; I know, the first thing the article says is: "the C standard grants the compiler the right to make some assumptions that end up causing these weird/non-obvious things if you don't guard against them", but I must ask: why can't the compiler grow an optimization mode that does the right thing (and thus avoid all of the issues documented here) in the presence of global variables and concurrent execution? that situation (globals+concurrence) is so common, that solving it in the compiler would benefit most, if not all, C users, in both kernel and user space.</font><br> <p> I'd say that C11/C++11 made a large step in that direction, by having a formalized memory model, and builtin atomics. Before that there really was no way to not rely on compiler implementation details to get correctly working (even though formally undefined) concurrent programs.<br> <p> It does, however, require you to actually use the relevant interfaces.<br> <p> I don't quite see how you'd incrementally get to a language that doesn't have any of these issues, without making it close to impossible to ever incrementally move applications towards that hypothetical version of C. I mean there's basically no language that allows the use of shared memory and doesn't require escape hatches from its safety mechanisms to implement fast concurrent datastructures (e.g. rust needing to go to unsafe for core pieces). And the languages that get closest require enough of a different approach that it's hard to imagine C going towards it.<br> <p> That's not to say that C/C++ have progressed sufficiently towards allowing one to at least opt into safety. 
The C11/C++11 memory model and the atomics APIs are a huge step, but it's happened at the very least 10 years too late (while some of the formalisms were developed somewhat more recently, there ought to at least have been some progress before then). And there are plenty of other issues where no proper ways are provided (e.g. signed integer overflow handling, mentioned in nearby comments).<br> </div> Fri, 26 Jul 2019 18:18:02 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794720/ https://lwn.net/Articles/794720/ andresfreund <div class="FormattedComment"> I agree. Although I still don't understand why the C11/C++11 memory model chose to provide barriers not tied to variables in such an oddly defined way. IIRC - and it's been a while since I tried to parse the standard's language - you can't really use them to incrementally upgrade from compiler-specific barriers, without also converting all other memory to go through C11/C++11 atomics.<br> </div> Fri, 26 Jul 2019 18:11:12 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794719/ https://lwn.net/Articles/794719/ andresfreund <div class="FormattedComment"> <font class="QuotedText">&gt; Spent a week recently re-implementing __builtin_add_overflow (et al) on Windows to work around the compiler eliding someone's if checks. It's... complicated to get right. I found two satisfactory solutions:</font><br> <p> Indeed. I did the same for postgres, and it's quite hard to get it right *and* performant. Especially when you want to support 64bit integers and you do not want to rely on casting to unsigned. While I think I got it right at a high level, I'm far from certain that I avoided all possible subtle breakages. IIRC we have three different implementations of e.g. signed 64bit integer multiplication (__builtin_mul_overflow, cast to 128 bit integers, pre-multiplication check for all dangers). 
I really hate having to do that, when it's imo the language's job to provide something reasonable.<br> <p> </div> Fri, 26 Jul 2019 18:03:54 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794717/ https://lwn.net/Articles/794717/ andresfreund <div class="FormattedComment"> <font class="QuotedText">&gt; E.g. for any shared data structure, there's probably some single-threaded initialisation code that sets it up before it's exposed to other threads. If the structure was declared with volatile/atomic fields, the compiler may add barriers in the initialisation code that the programmer knows are unnecessary. So the programmer might choose to explicitly use READ_ONCE/WRITE_ONCE, improving performance but increasing the risk of missing a case where it's actually required. Which way is "better" depends on how you weigh performance vs correctness.</font><br> <p> More important than initialization are probably all the accesses inside sections of code holding a lock, where you likely do not want unnecessary repeat accesses to memory, just because a variable is referenced twice. Because acquiring a spinlock/mutex/whatever commonly acts as a barrier of some sorts (depending on the implementation somewhere between a full memory barrier, acquire/release barrier, compiler barrier and nothing), it is *often* unnecessary to use READ_ONCE / WRITE_ONCE from within. Although if there are other accesses working without that lock, or if you have more fine-grained locking schemes (say exclusive, exclusive write, read), it's possibly still necessary to use them even within a locked section.<br> <p> There's also the fact that volatile on datastructures itself often ends up requiring annoying casts to get rid of it, which then makes the code more fragile too (lest you use some other type of smart macro that keeps the type the same except for removing the volatile).<br> </div> Fri, 26 Jul 2019 17:57:27 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/794713/ https://lwn.net/Articles/794713/ excors <div class="FormattedComment"> I think that's related to the answer to Quick Quiz 8. Not all accesses to a shared variable require this protection, and the protection is not free. If the variable is declared as volatile or atomic then every access will be effectively READ_ONCE/WRITE_ONCE, which prevents a sufficiently clever programmer from only paying the performance cost of READ_ONCE/WRITE_ONCE when it is strictly necessary.<br> <p> E.g. for any shared data structure, there's probably some single-threaded initialisation code that sets it up before it's exposed to other threads. If the structure was declared with volatile/atomic fields, the compiler may add barriers in the initialisation code that the programmer knows are unnecessary. So the programmer might choose to explicitly use READ_ONCE/WRITE_ONCE, improving performance but increasing the risk of missing a case where it's actually required. Which way is "better" depends on how you weigh performance vs correctness.<br> </div> Fri, 26 Jul 2019 17:18:01 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794709/ https://lwn.net/Articles/794709/ farnz <p>It's a bit worse than that - things in C are global in the sense of "need concurrency-awareness baked-in" if they are syntactically global, or if there is ever a pointer to the thing. That last covers all heap allocations, and any stack allocations whose address you take with <tt>&amp;</tt>, and in turn means that you have to add barriers etc to all heap allocations plus some stack allocations. <p>And, of course, this is only going to help code that does not execute correctly on the C machine, as it ignores the semantics of the C machine. We don't want to strengthen the semantic guarantees of the C machine, since that results in the output code running slower (it needs more barriers, as more of the "internal" effects are now externally visible). 
So it only actually helps developers who don't actually understand what they're telling the computer to do - while this may be a common state, it's not necessarily something to encourage. Fri, 26 Jul 2019 15:32:28 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794708/ https://lwn.net/Articles/794708/ jerojasro <div class="FormattedComment"> hmm, I was thinking that the compiler only had to look for "global variables" (things syntactically declared as global), and add barriers/avoid reordering/etc.<br> <p> but after reading both of your comments, I realized (hope I'm correct), that globals are *both*: things declared as global, *and also* anything allocated in the heap. And that last part is what makes automating all of these checks/guards such a performance hit, and worth bothering the programmer with manually handling the unsafe situations.<br> <p> Man I'm glad I just concatenate strings for a living.<br> </div> Fri, 26 Jul 2019 15:08:44 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794674/ https://lwn.net/Articles/794674/ topimiettinen <div class="FormattedComment"> The article mentions that 'volatile' and atomic variables are exempt from the optimizations. If the variables are accessed concurrently, wouldn't it be better to use 'volatile' or atomic variables then instead of using these macros? If use of 'volatile' or atomic variables pessimizes the code too much, couldn't that be improved instead (also for the benefit of non-kernel code)? Maybe the effects of 'volatile' could be split into a number of more fine-grained GCC attributes which could be used in place of 'volatile', so the accesses obey desired rules?<br> <p> -Topi<br> </div> Fri, 26 Jul 2019 13:09:54 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794670/ https://lwn.net/Articles/794670/ ksandstr <div class="FormattedComment"> <font class="QuotedText">&gt;[...] 
the C11/C++11 memory model where objects which are accessed concurrently are to be appropriately marked at the source level.</font><br> <p> I'd like to point out that it's not sufficient to tag values subject to concurrent write, but rather all consumers of those values should be marked by use of explicitly atomic primitives. Such as those in stdatomic.h, which will presumably stay consistent[0] despite e.g. future ultra-LTO exposing damage previously hidden by an intermodule boundary. This is because the semantics of concurrently-written data are generally different from those of data in a single-threaded or "classic C" program where it isn't intended that changes to shared state are visible outside of the current thread or the reverse, so it's necessary to indicate non-classic usage to the reader at both the point of interaction and the point of declaration.<br> <p> <font class="QuotedText">&gt;The right thing, particularly for user space code, is to learn the C11/C++11 memory model and use those facilities.</font><br> <p> The C11 memory model can also be studied as first the "classic C" of single-threaded programs, where the compiler may yield an unrecognizably bizarre but correct result for reasons in the article; and by then laying over a set of diffs for "concurrently interacting C", which is just like the old one, but allows source to indicate that certain parts expect to publish or consume and should appear accordingly from out-of-thread[1]. 
Most importantly the latter allows execution of the former unmodified when it's known that concurrent interactions won't occur, such as between callsites of POSIX-style mutexes where synchronization is hidden behind a function call boundary, or indicated to a deeply LTO compiler by syscall or atomic primitive.<br> <p> This is in contrast to ideas about forcing correct concurrent interactions through manipulation of the compiler output (sans memory-clobbering asm volatile), which seem like a sediment of bubblegum fixes compared to C11 atomics, or to even the C99 idea of the standard language as semantics that constrain the compiler over a naïve 1:1-ish mapping between statement and machine code.<br> <p> [0] unimplementable ordering semantics aside; how come nobody smelled the cr^Wsufficiently smart compiler from a mile away? (and how come I used a born-defunct option in real code, thinking it'd one day do something besides introduce subtle breakage?)<br> [1] e.g. in a shared memory segment<br> </div> Fri, 26 Jul 2019 10:53:02 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794625/ https://lwn.net/Articles/794625/ farnz <p>The short answer is performance; each of the eight transforms in the article permits the generated code to be optimized into much, much faster code, as long as the original C code did what the programmer intended it to do. <p>That statement ("as long as the original C code did what the programmer intended it to do") is the source of all the pain. The C standard specifies an abstract C machine that directly executes C code, and the job of a compiler is to translate C code into a machine code that has the same semantics as C running on the abstract C machine. 
However, most C programmers are ignorant of the abstract C machine (not all, but most, including me); instead, they either rely on "my compiler turns it into machine code that does what I want", or "the obvious translation to my preferred machine code does what I want". Each of these interpretations of "what my C means" leads to its own set of problems: <ol> <li>"My compiler turns it into machine code that does what I want" ignores compiler bugs, portability issues, optimizations that produce reasonable results for your test vectors but not for inputs you aren't testing, and so on. In other words, writing a compiler that complies with this level of expectation means being bug-for-bug compatible with all old versions; if that's what you want/need, why are you updating your compiler to begin with? Even new CPU support can break this level. <li>"The obvious translation to my preferred machine code does what I want" is more insidious; the issue is that they're reading their C code as if it were implemented as assembly macros that generate the machine code they want. The problem here is that C is not a macro assembler - it's a full-blown portable programming language - and while their assumptions probably hold true for a specific compiler version and optimization settings on a single processor model, they fall over on other processor models and with other optimizations that are allowed by the C language. </ol> <p>In short, it's hard, because of C's legacy as "basically one step above a macro assembler for the PDP-11". Thu, 25 Jul 2019 19:01:43 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794632/ https://lwn.net/Articles/794632/ excors <div class="FormattedComment"> I expect one of the main reasons is that it's extremely difficult to define "the right thing". 
Firstly because different people will have different ideas of what's right; and secondly because concurrency and memory models are inherently complex and subtle topics, so it's always difficult to reason about them precisely. See how modern C++ tries to specify it with terminology like "is sequenced before", "carries a dependency to", "inter-thread happens before", etc, all with precise meanings that are almost impossible for a normal human to remember. Java tried to specify a memory model from the start, but got it wrong, and it took a decade to understand the mistakes and fix them.<br> <p> Probably the other main issue is performance. Particularly with concurrency where CPUs often require explicit memory barriers if you want guaranteed ordering, and if the compiler added memory barriers around every memory access just in case you were using the same memory in another thread, it would be pretty slow. Even if it was only a tiny performance regression, many C programmers care a lot about performance (especially microbenchmark performance) and won't be happy with a new compiler that's measurably slower, and users won't be happy if their application runs slower after updating their kernel. They'd rather have a dangerous but fast compiler, and rely on the programmer being smart enough to avoid those dangerous cases.<br> </div> Thu, 25 Jul 2019 18:09:09 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/794599/ https://lwn.net/Articles/794599/ jerojasro <div class="FormattedComment"> all of these caveats seem to me such a huge, and needless, burden on C programmers.<br> <p> I know, the first thing the article says is: "the C standard grants the compiler the right to make some assumptions that end up causing these weird/non-obvious things if you don't guard against them", but I must ask: why can't the compiler grow an optimization mode that does the right thing (and thus avoid all of the issues documented here) in the presence of global variables and concurrent execution? that situation (globals+concurrency) is so common that solving it in the compiler would benefit most, if not all, C users, in both kernel and user space.<br> <p> are there any reasons for not fixing this issue (once) in the compiler, other than the amount of work involved (which I guess must be ... not trivial, of course), and this new optimization mode not being standards-compliant?<br> <p> (I'm not attempting to troll anybody; it just seems to me that what I propose is an obvious solution, and since it's not being used, I'd like to know what obvious thing I'm missing that prevents us from using it)<br> </div> Thu, 25 Jul 2019 14:13:42 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794302/ https://lwn.net/Articles/794302/ rweikusat2 <div class="FormattedComment"> The C standard uses an abstract machine to define the semantics of the language. But real code is certainly not written to run on "abstract machines" and the C standard also contains the nice statement that "a conforming program is one acceptable to a conforming implementation". That's not the same as a strictly conforming program (another term from the C standard) which would restrict itself to the functionality defined in this document.<br> </div> Mon, 22 Jul 2019 12:06:33 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/794253/ https://lwn.net/Articles/794253/ PaulMcKenney <div class="FormattedComment"> I was using ones-complement systems in the 1980s, and there were ones-/twos-complement debates at that time, but fair enough on your point about the introduction dates.<br> <p> If I remember correctly, the usual way to do unsigned arithmetic on ones-complement-only systems was to force the sign bit clear after each addition and subtraction operation. And the ones-complement CDC 6600 systems used floating point to do integer multiplication and division, which required tricks normally used in multiple-precision arithmetic to get a 60-bit unsigned result out of floating-point operations with 48-bit mantissas. One reason that this worked was that there was a separate instruction for normalizing floating-point numbers, as well as another instruction that produced the lower 48 bits of the 96-bit product of two 48-bit mantissas.<br> <p> Fun times! ;-)<br> </div> Sun, 21 Jul 2019 11:11:31 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794221/ https://lwn.net/Articles/794221/ anton <blockquote>You could say that you would prefer that the compiler _never_ make optimization choices based on a deduction of "undefined behavior means that this can never happen", but it turns out that this deduction has been useful in quite a lot of ways (only one of which is making small loops fast) and people in practice turn out to object strenuously to significant performance regressions from their compiler.</blockquote> You are insinuating that not performing such "optimizations" produces significant performance regressions. Funnily, for all their talk about the performance advantages of such "optimizations", the advocates of "optimizations" hardly ever produce numbers (and the one time I have seen numbers, they were from an unnamed source for another compiler). 
One would expect that, if there were significant performance benefits from "optimizations", their advocates would parade them up and down. <p>If there is no significant performance benefit from "optimizations", there is also no need to keep any "knowledge" around that is derived from the assumption that undefined behaviour does not occur. <p>In any case, the better approach would be to give optimization hints to programmers. That requires changes in just a few places instead of "sanitizing" the whole program all the time, and the consequences of not doing it are far less severe. E.g., if the use of wraparound int for a local variable would lead to sign-extension operations in hot code, the compiler could suggest changing that local variable to, say, (wraparound) long. Sat, 20 Jul 2019 13:31:35 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794220/ https://lwn.net/Articles/794220/ anton <blockquote>So from the viewpoint of someone in 1990, requiring wrap on signed integer overflow would have required substantial clairvoyance. </blockquote> Not at all. No significant new architecture with something other than twos-complement representation for signed integers was introduced after 1970. They sure saw during C standardization that the future was twos-complement, but they wanted to support the (then) present, which included descendants of architectures from the 1960s. <p>That being said, even on these architectures implementing twos-complement would not be that expensive. They already support unsigned integers with wraparound (or C would not have standardized that). For +, -, and *, the same operations can be used for twos-complement arithmetic; inequality comparisons and possibly division would become a little more expensive, though. Sat, 20 Jul 2019 13:07:48 +0000 Who's afraid of a big bad optimizing compiler? 
https://lwn.net/Articles/794151/ https://lwn.net/Articles/794151/ pbonzini <div class="FormattedComment"> RHEL6 has newer versions of GCC via the Developer Toolset add-on.<br> </div> Thu, 18 Jul 2019 21:38:13 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794148/ https://lwn.net/Articles/794148/ law@redhat.com <div class="FormattedComment"> Red Hat has made newer compilers available to its customers free of charge via the developer toolset (DTS) for several years now. So if you want newer compilers on RHEL 6, that's trivially possible. The current version is DTS 8 which includes gcc-8.<br> </div> Thu, 18 Jul 2019 21:19:17 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794136/ https://lwn.net/Articles/794136/ k8to <div class="FormattedComment"> Indeed, I think RHEL6 is still the majority in my experience, but I don't think it's the majority of compile-environments.<br> <p> The point is still a concern, of course.<br> </div> Thu, 18 Jul 2019 19:33:44 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794128/ https://lwn.net/Articles/794128/ mpr22 <div class="FormattedComment"> In the particular case of signed integer overflow, of course, compilers already provide two "sensible C" dialects (gcc calls them -fwrapv and -ftrapv).<br> </div> Thu, 18 Jul 2019 17:48:03 +0000 Who's afraid of a big bad optimizing compiler? https://lwn.net/Articles/794063/ https://lwn.net/Articles/794063/ mathstuf <div class="FormattedComment"> <font class="QuotedText">&gt; Since we don't know what hardware will do on signed integer overflow</font><br> <p> You're writing C for a C abstract machine. That machine has undefined behavior for signed integer overflow. Undefined behavior in C means "all bets are off". If this is not what you want, don't use C, convince compilers to provide a "sensible C" dialect, or change C via the standards committee.<br> </div> Thu, 18 Jul 2019 09:01:02 +0000