ACCESS_ONCE()
Posted Aug 2, 2012 19:20 UTC (Thu)
by tvld (guest, #59052)
In reply to: ACCESS_ONCE() by PaulMcKenney
Parent article: ACCESS_ONCE()
Yes, we've relied on assumptions about compiler implementations. And that sure works, but it isn't ideal, and people should be aware of that. With memory models (that specify multi-threaded executions) we can at least reduce that to assuming that the compiler implements the model correctly. Compiler and application writers need to have a common understanding of the model, but at least we have formalizations of the model.
Those of us living through that time (and I am a relative newcomer, starting shared-memory parallel programming only in 1990) have been shifting our coding practices as the compiler optimizations become more aggressive.
Considering the example in the article, hoisting a load out of a loop is not something that I'd call an aggressive optimization.
And I am very sorry to report that I really have seen compiler writers joyfully discuss how new optimizations will break existing code. In their defense, yes, the code that they were anticipating breaking was relying on undefined behavior.
And I wouldn't have been joyful about that either. But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all. IMHO, the point here shouldn't be kernel vs. compilers or such, but pointing out the trade-offs between compiler optimizations of single-threaded pieces of code vs. synchronizing code, why we have to make this distinction, how we can draw the line, what memory models do or can't do, etc.
Shouldn't we be rather discussing how we can get compilers to more quickly reach production-quality support of memory models, and how to ensure and test that? Or starting from the C/C++ memory model, whether there are limitations of it that are bad for the kernel, and whether compilers could offer variations that would be better (e.g., with additional memory orders)?
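For readers without the parent article at hand, the load hoisting mentioned above can be sketched as follows. This is a minimal illustration, not the kernel's code: `struct demo_lock`, `spin_plain()`, `spin_access_once()`, and the `max_spins` bound are hypothetical names added so the sketch terminates on its own, and the macro uses the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Volatile-cast macro in the style discussed in the article
 * (assumes the GCC/Clang __typeof__ extension). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a lock with an owner field. */
struct demo_lock {
    void *owner;   /* NULL when nobody holds the lock */
};

/*
 * Plain load: nothing in the loop writes lock->owner, so a compiler
 * reasoning about single-threaded code may load it once before the
 * loop and test the cached value on every iteration.
 */
unsigned long spin_plain(struct demo_lock *lock, unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(lock != NULL);
    while (lock->owner != NULL && spins < max_spins)
        spins++;
    return spins;
}

/*
 * Volatile access: each evaluation of ACCESS_ONCE() is a volatile read,
 * so the load has to be performed on every iteration.
 */
unsigned long spin_access_once(struct demo_lock *lock, unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(lock != NULL);
    while (ACCESS_ONCE(lock->owner) != NULL && spins < max_spins)
        spins++;
    return spins;
}
```

In a single-threaded run both functions return the same count; the difference only matters when another thread (or an interrupt handler) changes `owner` while the loop is spinning.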
Posted Aug 2, 2012 21:29 UTC (Thu)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (6 responses)
And indeed, there are any number of optimizations that are profoundly unaggressive by 2012 standards. However, the fewer registers the target system has (and yes, I have used single-register machines), the less memory the compiler has available to it (I have used systems with 4Kx12bits of core, and far smaller systems were available), and the smaller the ratio of memory latency to CPU clock period (1-to-1 on many systems 30 years ago), the less likely the compiler will do certain optimizations. All of these system attributes have changed dramatically over the past few decades, which in turn has dramatically changed the types of optimizations that compilers commonly carry out.
I, too, favor looking for solutions. But few people are going to look for a solution until they believe that there is a problem. The fact that you already realize that there is a problem is a credit to you, but it is unwise to assume that others understand (or even care about) that problem. And then there are, of course, the likely disagreements over the exact nature of the problem, to say nothing of disagreements over the set of permissible solutions.
Posted Aug 3, 2012 11:29 UTC (Fri)
by tvld (guest, #59052)
[Link] (5 responses)
And I've seen people take such (supposedly) funny comments literally (compilers are evil! they don't read my mind!) too often to really like such comments. And yes, I realize that this is just anecdotal evidence :)
Posted Aug 3, 2012 16:53 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (4 responses)
So rather than continue that sterile debate, let me ask a question on the examples in the article. In C11/C++11, would it be sufficient to make the ->owner field be atomic with memory_order_relaxed accesses, or would the volatile cast still be necessary to prevent the compiler from doing those optimizations?
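To make the question concrete, here is a minimal C11 sketch of the alternative being asked about: an atomic owner field read with memory_order_relaxed instead of a volatile cast. The names `struct demo_task`, `struct demo_mutex`, `spin_on_owner_relaxed()`, and the `max_spins` bound are hypothetical and exist only for illustration. Whether a compiler may still hoist this load out of the loop is exactly the open point of the question, so the sketch shows only the form of the code, not an answer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical owner type and lock, for illustration only. */
struct demo_task {
    int id;
};

struct demo_mutex {
    _Atomic(struct demo_task *) owner;   /* NULL when unowned */
};

/*
 * Spin while the lock is still owned by `expected`, reading the owner
 * field with a relaxed atomic load and no volatile cast.
 * Returns the number of iterations spent spinning (capped at max_spins).
 */
unsigned long spin_on_owner_relaxed(struct demo_mutex *m,
                                    struct demo_task *expected,
                                    unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(m != NULL);
    while (atomic_load_explicit(&m->owner, memory_order_relaxed) == expected
           && spins < max_spins)
        spins++;
    return spins;
}
```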
Posted Aug 3, 2012 20:59 UTC (Fri)
by tvld (guest, #59052)
[Link] (3 responses)
On the one hand, mo_relaxed loads can read from any write in the visible sequence of side effects. So this would allow the load to be hoisted. The same holds for mo_acquire loads, I believe.
On the other hand, there's C++11 1.10.2 and 1.10.25. You could interpret those as saying that the atomics in the abstract machine would eventually load the most recent value (in modification order). Assuming that the standard's intent is that this is an additional constraint on reads-from, then compilers wouldn't be allowed to hoist the load out of loops. I didn't see an equivalent of 1.10.2 / 1.10.25 in C11. (You said you've been involved in C11; is there any, or if not, why not?)
Either way, I bet you've thought this through before. So what's your detailed answer to your question? What's your suggestion for how to specify progress guarantees in more detail in the standards?
Posted Aug 3, 2012 22:13 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (2 responses)
I believe that this wording will be going into C11 as well, but will find out in October. Not so sure about 1.10.2 -- C gets to support a wider variety of environments than C++, so is sometimes less able to make guarantees.
How about the second issue raised in the original article? If an atomic variable is loaded into a temporary variable using a memory_order_relaxed load, is the compiler allowed to silently re-load from that same atomic variable?
My approach would be to continue using things like ACCESS_ONCE() until such time as all the compiler people I know of told me that it was not necessary, and with consistent rationales for why it was not necessary. By the way, this is one of the reasons I resisted the recent attempt to get rid of the "volatile" specifier for atomics -- the other being the need to interact with interrupt handlers (in the kernel) and signal handlers (in user space).
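The signal-handler case mentioned above can be sketched in user space as follows. This is an illustration under assumptions: `demo_signal_seen`, `demo_handler()`, and `demo_wait_for_signal()` are hypothetical names, and the sketch assumes `atomic_int` is lock-free on the target, which C11 requires before an atomic object may be touched from a signal handler.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

/*
 * Flag shared between a signal handler and the interrupted code.
 * It is both volatile and atomic, the combination discussed above.
 * Static storage gives it a valid zero initial state.
 */
static volatile atomic_int demo_signal_seen;

static void demo_handler(int signo)
{
    (void)signo;
    atomic_store_explicit(&demo_signal_seen, 1, memory_order_relaxed);
}

/*
 * Install the handler, raise the signal in this thread, and wait for
 * the flag. raise() runs the handler before it returns, so the loop
 * exits at once here; it stands in for code that polls a flag set
 * asynchronously. Returns the final value of the flag.
 */
int demo_wait_for_signal(int signo)
{
    void (*previous)(int) = signal(signo, demo_handler);

    assert(previous != SIG_ERR);
    raise(signo);
    while (atomic_load_explicit(&demo_signal_seen, memory_order_relaxed) == 0)
        ;   /* poll the flag */
    signal(signo, previous);   /* restore the earlier disposition */
    return atomic_load_explicit(&demo_signal_seen, memory_order_relaxed);
}
```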
Posted Aug 6, 2012 12:05 UTC (Mon)
by tvld (guest, #59052)
[Link] (1 responses)
The second issue, silently reloading from memory for a mo_relaxed load, would not be allowed, I think. The abstract machine wouldn't do it. It is allowed under as-if for nonatomic accesses (with no synchronization in-between etc.) because we can assume data-race-freedom and thus no concurrent modification by other threads.
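The reload question can be illustrated with a bounds check followed by a use of the checked value. This is a sketch with hypothetical names (`demo_table`, `lookup_plain()`, `lookup_relaxed()`); the comments restate the argument made above and do not describe what any particular compiler does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 4

static const int demo_table[DEMO_TABLE_SIZE] = { 10, 20, 30, 40 };

/*
 * Plain access: under the as-if rule and the assumption of
 * data-race freedom, the compiler may read *shared_index again
 * instead of keeping `i` in a register, so with a racing writer the
 * value used as an index need not be the value that was checked.
 */
int lookup_plain(int *shared_index)
{
    assert(shared_index != NULL);
    int i = *shared_index;

    if (i < 0 || i >= DEMO_TABLE_SIZE)
        return -1;
    return demo_table[i];
}

/*
 * Relaxed atomic load: the abstract machine performs one load, and the
 * argument above is that the compiler may not invent a second one, so
 * the checked value and the value used as an index are the same.
 */
int lookup_relaxed(atomic_int *shared_index)
{
    assert(shared_index != NULL);
    int i = atomic_load_explicit(shared_index, memory_order_relaxed);

    if (i < 0 || i >= DEMO_TABLE_SIZE)
        return -1;
    return demo_table[i];
}
```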
Posted Aug 6, 2012 16:58 UTC (Mon)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
Don't get me wrong, I would be happy to learn that I am being overly paranoid, and that something really does force the compiler to perform each and every relaxed atomic load, but I have heard too many compiler writers discussing optimizations that are not consistent with this less-paranoid view of the matter.
Your argument against silent reloading from memory for a memory_order_relaxed load does sound more convincing. If I consistently hear the same thing from enough compiler writers over a sufficiently long period of time, I will start recommending ACCESS_ONCE() only to prevent the compiler from optimizing atomic loads out, not to prevent the compiler from reloading from atomics.
Posted Aug 2, 2012 22:54 UTC (Thu)
by nix (subscriber, #2304)
[Link] (5 responses)
(Thanks, but no thanks. Humour is a balm, not a menace. And we do often have an evil gleam in our eyes when discussing optimizations, so it wasn't even untrue.)
Posted Aug 3, 2012 1:26 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (4 responses)
Posted Aug 3, 2012 2:40 UTC (Fri)
by corbet (editor, #1)
[Link] (3 responses)
Posted Aug 3, 2012 4:53 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (2 responses)
Posted Aug 4, 2012 12:04 UTC (Sat)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Aug 4, 2012 21:45 UTC (Sat)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
I don't see how it would restrict reads-from inside of infinite loops. Even if the relaxed loads would have to stay in the loop, they'd still be allowed to read from an "old" write. (That's why I mentioned 1.10.25, which could mean that they eventually should pick up a recent value, forever; in that case, the load would also have to stay in the loop).
Why are you reading 1.10.24 differently? What's the detailed reasoning?
But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all.
Quite. How dare Paul use self-deprecating humour on a mailing list! It should be forbidden in case it offends people too thick to recognise it as humour.
Experts agree that gleam must have been evil indeed around tree RCU time...:)
And if the gleam was evil around tree RCU time, it must have been positively satanic around the time of the first mainlined preemptible RCU...