Posted Aug 2, 2012 17:08 UTC (Thu) by PaulMcKenney (subscriber, #9624)
In reply to: ACCESS_ONCE() by tvld
Parent article: ACCESS_ONCE()

I believe that the features in C11 will prove quite helpful, though opinions do vary rather widely (disclaimer: I was involved in the C11 process). But please keep in mind that there has been concurrent code written in C for some decades: Sequent started in 1983, which was almost three decades ago, and Sequent was probably not the first to write parallel code in C. Linux added SMP support in the mid-1990s, which is well over a decade ago.

So what did we parallel programmers do during the decades before C11? Initially, we relied on the fact that early C compilers did not do aggressive optimizations, though register reloads did sometimes trip us up. More recently, we have relied on non-standard extensions (e.g., the barrier() macro) and volatile casts. Those of us living through that time (and I am a relative newcomer, starting shared-memory parallel programming only in 1990) have been shifting our coding practices as the compiler optimizations become more aggressive.

And I am very sorry to report that I really have seen compiler writers joyfully discuss how new optimizations will break existing code. In their defense, yes, the code that they were anticipating breaking was relying on undefined behavior. Then again, until production-quality C11 compilers become widely available, all multithreaded C programs that allow concurrent read-write access to any given variable will continue to rely on undefined behavior.



Posted Aug 2, 2012 19:20 UTC (Thu) by tvld (guest, #59052) [Link]

Yes we've relied on assumptions about compiler implementations. And that sure works, but it isn't ideal, and people should be aware of that. With memory models (that specify multi-threaded executions) we can at least reduce that to assuming that the compiler implements the model correctly. Compiler and application writers need to have a common understanding of the model, but at least we have formalizations of the model.
> Those of us living through that time (and I am a relative newcomer, starting shared-memory parallel programming only in 1990) have been shifting our coding practices as the compiler optimizations become more aggressive.
Considering the example in the article, hoisting a load out of a loop is not something that I'd call an aggressive optimization.
> And I am very sorry to report that I really have seen compiler writers joyfully discuss how new optimizations will break existing code. In their defense, yes, the code that they were anticipating breaking was relying on undefined behavior.
And I wouldn't have been joyful about that either. But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all. IMHO, the point here shouldn't be kernel vs. compilers or such, but pointing out the trade-offs between compiler optimizations of single-threaded pieces of code vs. synchronizing code, why we have to make this distinction, how we can draw the line, what memory models do or can't do, etc. Shouldn't we be rather discussing how we can get compilers to more quickly reach production-quality support of memory models, and how to ensure and test that? Or starting from the C/C++ memory model, whether there are limitations of it that are bad for the kernel, and whether compilers could offer variations that would be better (e.g., with additional memory orders)?


Posted Aug 2, 2012 21:29 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Yep, having the compiler understand concurrency is a good thing, no argument from me. If you wish to argue that point, you will need to argue it with someone else. ;-)

And indeed, there are any number of optimizations that are profoundly unaggressive by 2012 standards. However, the fewer registers the target system has (and yes, I have used single-register machines), the less memory the compiler has available to it (I have used systems with 4Kx12bits of core, and far smaller systems were available), and the smaller the ratio of memory latency to CPU clock period (1-to-1 on many systems 30 years ago), the less likely the compiler will do certain optimizations. All of these system attributes have changed dramatically over the past few decades, which in turn has dramatically changed the types of optimizations that compilers commonly carry out.

I, too, favor looking for solutions. But few people are going to look for a solution until they believe that there is a problem. The fact that you already realize that there is a problem is a credit to you, but it is unwise to assume that others understand (or even care about) that problem. And then there are, of course, the likely disagreements over the exact nature of the problem, to say nothing of disagreements over the set of permissible solutions.


Posted Aug 3, 2012 11:29 UTC (Fri) by tvld (guest, #59052) [Link]

I agree that the underlying problem might not be easy to see. But that is exactly a reason not to wrap it in jokes and other things that can distract. Also, in this case, the funny bit doesn't even hint at the problem, so it won't help to explain it.

And I've too often seen people take such (supposedly) funny comments literally (compilers are evil! they don't read my mind!) to really enjoy them. And yes, I realize that this is just anecdotal evidence :)


Posted Aug 3, 2012 16:53 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

I cannot say that I found your concerns convincing, and your further commentary isn't helping to convince me.

So rather than continue that sterile debate, let me ask a question about the examples in the article. In C11/C++11, would it be sufficient to make the ->owner field atomic with memory_order_relaxed accesses, or would the volatile cast still be necessary to prevent the compiler from doing those optimizations?


Posted Aug 3, 2012 20:59 UTC (Fri) by tvld (guest, #59052) [Link]

That's a question about forward progress, which isn't specified in much detail in at least C++11.

On the one hand, mo_relaxed loads can read from any write in the visible sequence of side effects. So this would allow hoisting the load. The same holds for mo_acquire loads, actually, I believe.

On the other hand, there's C++11 1.10.2 and 1.10.25. You could interpret those as saying that the atomics in the abstract machine would eventually load the most recent value (in modification order). Assuming that the standard's intent is that this is an additional constraint on reads-from, then compilers wouldn't be allowed to hoist the load out of loops. I didn't see an equivalent of 1.10.2 / 1.10.25 in C11. (You said you've been involved in C11; is there any, or if not, why not?)

Either way, I bet you've thought this through before. So what's your detailed answer to your question? What's your suggestion for how to specify progress guarantees in more detail in the standards?


Posted Aug 3, 2012 22:13 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

C++11's 1.10.24 says that any thread may be assumed to eventually either terminate, invoke a library I/O function, access or modify a volatile object, or perform a synchronization or atomic operation. This was a late add, and it replaced some less-well-defined language talking about loop termination. I could read this as saying that the compiler is not allowed to hoist atomic operations out of infinite loops, in other words, that load combining is allowed, but the implementation is only allowed to combine a finite number of atomic loads. How would you interpret it?

I believe that this wording will be going into C11 as well, but will find out in October. Not so sure about 1.10.2 -- C gets to support a wider variety of environments than C++, so is sometimes less able to make guarantees.

How about the second issue raised in the original article? If an atomic variable is loaded into a temporary variable using a memory_order_relaxed load, is the compiler allowed to silently re-load from that same atomic variable?

My approach would be to continue using things like ACCESS_ONCE() until such time as all the compiler people I know of told me that it was not necessary, and with consistent rationales for why it was not necessary. By the way, this is one of the reasons I resisted the recent attempt to get rid of the "volatile" specifier for atomics -- the other being the need to interact with interrupt handlers (in the kernel) and signal handlers (in user space).


Posted Aug 6, 2012 12:05 UTC (Mon) by tvld (guest, #59052) [Link]

I don't interpret 1.10.24 the same way. To me, it just assures the compiler that there cannot be an infinite loop without any synchronization or side effects in it. The only purpose I see for that is similar to the note in 1.10.24 -- removal of empty loops (e.g., after stuff has been hoisted out of the loop).
I don't see how it would restrict reads-from inside of infinite loops. Even if the relaxed loads would have to stay in the loop, they'd still be allowed to read from an "old" write. (That's why I mentioned 1.10.25, which could mean that they eventually should pick up a recent value, forever; in that case, the load would also have to stay in the loop).
Why are you reading 1.10.24 differently? What's the detailed reasoning?

The second issue, silently reloading from memory for a mo_relaxed load, would not be allowed, I think. The abstract machine wouldn't do it. It is allowed under as-if for nonatomic accesses (with no synchronization in-between etc.) because we can assume data-race-freedom and thus no concurrent modification by other threads.


Posted Aug 6, 2012 16:58 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Suppose that a loop depends on a relaxed atomic load. Suppose that the compiler unrolls the loop by (say) a factor of two. I don't see anything in 1.10.24 that tells me that the compiler is prohibited from actually performing the relaxed atomic load only once per iteration of the unrolled loop, that is to say, only once per two iterations of the loop as written in the source code. Such an optimization could be a problem in some situations.

Don't get me wrong, I would be happy to learn that I am being overly paranoid, and that something actually forces the compiler to actually perform each and every relaxed atomic load, but I have heard too many compiler writers discussing optimizations that are not consistent with this less-paranoid view of this matter.

Your argument that silent reloading from memory is not allowed for a memory_order_relaxed load does sound more convincing. If I consistently hear the same thing from enough compiler writers over a sufficiently long period of time, I will start recommending ACCESS_ONCE() only to prevent the compiler from optimizing atomic loads out, not to prevent the compiler from reloading from atomics.


Posted Aug 2, 2012 22:54 UTC (Thu) by nix (subscriber, #2304) [Link]

> But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all.
Quite. How dare Paul use self-deprecating humour on a mailing list! It should be forbidden in case it offends people too thick to recognise it as humour.

(Thanks, but no thanks. Humour is a balm, not a menace. And we do often have an evil gleam in our eyes when discussing optimizations, so it wasn't even untrue.)


Posted Aug 3, 2012 1:26 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

Besides which, they tell me that I have had an evil gleam in my eyes a time or two as well. ;-)


Posted Aug 3, 2012 2:40 UTC (Fri) by corbet (editor, #1) [Link]

Experts agree that the gleam must have been evil indeed around tree RCU time... :)


Posted Aug 3, 2012 4:53 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

And if the gleam was evil around tree RCU time, it must have been positively satanic around the time of the first mainlined preemptible RCU...


Posted Aug 4, 2012 12:04 UTC (Sat) by nix (subscriber, #2304) [Link]

I wasn't sure whether to thank the Prince of Darkness or controlled substances for that one. I'm glad to see it was the former: cavalier use of controlled substances is, of course, illegal.


Posted Aug 4, 2012 21:45 UTC (Sat) by PaulMcKenney (subscriber, #9624) [Link]

;-) ;-) ;-)


Posted Aug 3, 2012 7:53 UTC (Fri) by dvdeug (subscriber, #10998) [Link]

Then again, as mentioned below in the comments section, there is a guaranteed reliable way of doing this in C; it's declaring the variable volatile. This little trick may be within the C standard, but I'm pretty sure the people adding volatile to C never expected "(*(volatile typeof(x) *)&(x))". Some programmers use every trick to optimize their code, then expect the compiler to maximally optimize their code while understanding all of their little tricks.


Posted Aug 4, 2012 12:14 UTC (Sat) by nix (subscriber, #2304) [Link]

What matters with volatile (and any other language construct, really, but volatile in particular) isn't what the people adding it to the Standard expected. It's whether the Standard states, explicitly or by implication, that it will work; what implementations do; and what the people writing those implementations expect.

Like const and restrict, volatile has been intended to be applied to pointers from the very start (that's always been its intended use: all the cv-quals are of limited use applied to non-pointer types). So saying that casting a normal pointer to volatile was not expected is unsupported by the facts. Of course they didn't expect tricks involving typeof(), because typeof() isn't in the Standard! So what matters is surely whether it works, and can be expected to continue to work, in compilers implementing typeof(). The answer to the first is, yes, it works, and if it stops working, well, the kernel hackers and compiler hackers do talk to each other, and it'll either be made to work again or another technique will be implemented to do the same thing (in which case ACCESS_ONCE() just becomes a compiler-version-specific macro).

But this doesn't strike me as something too terribly likely to break in future compiler versions. The precise semantics of volatile are distinctly underspecified, to be sure, but C11 atomic operations are specifically declared as taking objects of volatile-qualified type, and volatile's meaning as 'minimize optimizations applied to things manipulating anything of volatile type, do not duplicate, elide, move, fold, spindle or mutilate' is of long standing.


Posted Feb 11, 2015 22:00 UTC (Wed) by gmaxwell (guest, #30048) [Link]

Typeof here isn't important to what's being done. Absent typeof, you'd just have ACCESS_INT_ONCE() (or use the volatile cast directly instead of via a macro); the use of typeof just makes the code tidier and allows a macro that expresses the programmer's intent to other programmers.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds