Who's afraid of a big bad optimizing compiler?

Posted Jul 17, 2019 20:14 UTC (Wed) by law@redhat.com (guest, #31677)
Parent article: Who's afraid of a big bad optimizing compiler?

Sadly, most user space code should be looking at the C11/C++11 memory model where objects which are accessed concurrently are to be appropriately marked at the source level. That marking in turn drives what the compiler is allowed to do with those memory references and should avoid the multitude of problems referenced by this article.

Twiddling optimization flags, or worse yet, writing source that is specific to a set of transformations the compiler does or does not do today, is a losing proposition. Such techniques may work today and may work tomorrow, but they really amount to writing code for a particular compiler implementation, and they could well fail in the future as compilers continue to evolve and optimize more aggressively.

The right thing, particularly for user space code, is to learn the C11/C++11 memory model and use those facilities. Do this and you're light years ahead in writing code that works today, tomorrow and into the future, regardless of what the compiler developers do.

I realize the kernel is special and kernel developers are going to do their own thing, but what they do should not be held up as an example of "the right way" in the general case.
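
As a minimal sketch of what "appropriately marked at the source level" can look like in C11, assuming POSIX threads are available: the shared counter below is declared with an atomic type, so every access to it is an atomic operation rather than an ordinary load or store that the compiler may treat as thread-private. The names hits, worker and count_hits are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { NTHREADS = 4, NITERS = 100000 };

/* Marked at the source level: every access to `hits` is atomic. */
static atomic_long hits;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        /* Relaxed ordering is enough for a pure counter. */
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Hypothetical helper: run the workers, return the total, or -1 on error. */
long count_hits(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    atomic_store(&hits, 0);
    while (started < NTHREADS &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;

    /* Join whatever was started, even if a later create failed. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == NTHREADS ? atomic_load(&hits) : -1;
}
```

With four workers doing 100000 increments each, count_hits() should return 400000. The same loop on a plain long would be a data race, which C11 leaves undefined.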

Posted Jul 18, 2019 4:31 UTC (Thu) by wtarreau (subscriber, #51152)

This could be fine when developing new software that doesn't care much about deployed systems. But for existing software, and for software that has to run on servers, this is not yet an option. C11 support first appeared in gcc 4.7, which is fairly recent by server-world standards (RHEL6 ships 4.4 and is still not that rare in the field).

Posted Jul 18, 2019 19:33 UTC (Thu) by k8to (guest, #15413)

Indeed, in my experience RHEL6 is still the majority of deployments, but I don't think it's the majority of compile environments.

The point is still a concern, of course.

Posted Jul 18, 2019 21:19 UTC (Thu) by law@redhat.com (guest, #31677)

Red Hat has made newer compilers available to its customers free of charge via the developer toolset (DTS) for several years now. So if you want newer compilers on RHEL 6, that's trivially possible. The current version is DTS 8 which includes gcc-8.

Posted Jul 18, 2019 21:38 UTC (Thu) by pbonzini (subscriber, #60935)

RHEL6 has newer versions of GCC via the Developer Toolset add-on.

Posted Jul 26, 2019 10:53 UTC (Fri) by ksandstr (guest, #60862)

>[...] the C11/C++11 memory model where objects which are accessed concurrently are to be appropriately marked at the source level.

I'd like to point out that it's not sufficient to tag the values subject to concurrent writes; every consumer of those values should also be marked through the use of explicitly atomic primitives, such as those in stdatomic.h, which will presumably stay consistent[0] even if e.g. a future ultra-LTO exposes damage previously hidden by an intermodule boundary. This is because the semantics of concurrently written data generally differ from those of data in a single-threaded or "classic C" program, where changes to shared state are not intended to be visible outside the current thread (or the reverse). It is therefore necessary to indicate non-classic usage to the reader both at the point of interaction and at the declaration.
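
To illustrate marking the consumer as well as the producer, here is a small sketch, again assuming POSIX threads. The plain variable payload is published through the atomic flag ready, and both the store and the load of the flag use explicit stdatomic.h operations, so the intent is visible at each point of interaction. All identifiers are made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data, published via `ready` */
static atomic_int ready;   /* declared atomic: the point of declaration */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;  /* ordinary store, ordered before the release below */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Explicit acquire load: the point of interaction on the reader side. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the producer has published */
    *out = payload;  /* safe: happens after the producer's store */
    return NULL;
}

/* Hypothetical helper: returns the value the consumer saw, or -1 on error. */
int publish_and_consume(void)
{
    pthread_t p, c;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&c, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&p, NULL, producer, NULL) != 0) {
        /* Release the spinning consumer before giving up. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        pthread_join(c, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The release store pairs with the acquire load, so once the consumer sees ready set it is guaranteed to read the published payload, and publish_and_consume() returns 42.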

>The right thing, particularly for user space code, is to learn the C11/C++11 memory model and use those facilities.

The C11 memory model can also be studied in two steps: first the "classic C" of single-threaded programs, where the compiler may yield an unrecognizably bizarre but correct result for the reasons given in the article; and then a set of diffs laid over it for "concurrently interacting C", which is the same as the old but lets the source indicate that certain parts expect to publish or consume, and should appear accordingly when viewed from outside the thread[1]. Most importantly, the latter allows the former to execute unmodified when it's known that concurrent interactions won't occur, such as between call sites of POSIX-style mutexes, where synchronization is hidden behind a function-call boundary or indicated to a deeply LTO compiler by a syscall or atomic primitive.

This is in contrast to ideas about forcing correct concurrent interactions through manipulation of the compiler output (sans memory-clobbering asm volatile), which seem like a sediment of bubblegum fixes compared to C11 atomics, or even to the C99 idea of the standard language as semantics that constrain the compiler, rather than a naïve 1:1-ish mapping between statement and machine code.

[0] unimplementable ordering semantics aside; how come nobody smelled the cr^Wsufficiently smart compiler from a mile away? (and how come I used a born-defunct option in real code, thinking it'd one day do something besides introduce subtle breakage?)
[1] e.g. in a shared memory segment

Posted Jul 26, 2019 18:11 UTC (Fri) by andresfreund (subscriber, #69562)

I agree. Although I still don't understand why the C11/C++11 memory model chose to provide barriers not tied to variables in such an oddly defined way. IIRC - and it's been a while since I tried to parse the standard's language - you can't really use them to incrementally upgrade from compiler-specific barriers, without also converting all other memory to go through C11/C++11 atomics.

Posted Aug 15, 2019 19:15 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)

There were some concerns about specific machines that have since proved groundless, so the atomic_thread_fence() wording has (quite) recently been upgraded.
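
For reference, a hedged sketch of the fence-based form, assuming POSIX threads and hypothetical names throughout. As C11 specifies it, a release fence synchronizes with an acquire fence only by way of atomic operations on some object, so the flag is still an atomic_int accessed with relaxed operations; only the ordering has moved from the accesses into the fences.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;          /* plain object, ordered only by the fences */
static atomic_int flag;   /* still atomic, so the two fences can pair up */

static void *fence_writer(void *arg)
{
    (void)arg;
    data = 7;
    atomic_thread_fence(memory_order_release);   /* orders the store above */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *fence_reader(void *arg)
{
    int *out = arg;
    while (!atomic_load_explicit(&flag, memory_order_relaxed))
        ;  /* spin until the flag is set */
    atomic_thread_fence(memory_order_acquire);   /* orders the load below */
    *out = data;
    return NULL;
}

/* Hypothetical helper: returns the value the reader saw, or -1 on error. */
int fence_handoff(void)
{
    pthread_t w, r;
    int seen = -1;

    data = 0;
    atomic_store(&flag, 0);

    if (pthread_create(&r, NULL, fence_reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, fence_writer, NULL) != 0) {
        atomic_store(&flag, 1);  /* let the reader exit its loop */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

If flag were a plain int guarded only by the fences, the accesses to it would themselves be a data race, which is one way to see why an incremental conversion from compiler-specific barriers is awkward.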