ACCESS_ONCE()

Posted Aug 2, 2012 19:20 UTC (Thu) by tvld (guest, #59052)
In reply to: ACCESS_ONCE() by PaulMcKenney
Parent article: ACCESS_ONCE()

Yes, we've relied on assumptions about compiler implementations. That certainly works, but it isn't ideal, and people should be aware of that. With memory models (which specify multi-threaded executions) we can at least reduce that to assuming that the compiler implements the model correctly. Compiler and application writers need to have a common understanding of the model, but at least we have formalizations of the model.

> Those of us living through that time (and I am a relative newcomer, starting shared-memory parallel programming only in 1990) have been shifting our coding practices as the compiler optimizations become more aggressive.

Considering the example in the article, hoisting a load out of a loop is not something that I'd call an aggressive optimization.
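For readers without the article at hand, here is a minimal sketch of the hoisting in question. The struct and function names are hypothetical stand-ins rather than kernel code, the `max_spins` bound is an assumption added only so that the sketch terminates when run single-threaded, and `spin_hoisted()` is a hand-written picture of what a compiler might produce, not actual compiler output. The macro follows the kernel-style definition and relies on the GNU `__typeof__` extension.

```c
#include <stddef.h>

/* Kernel-style definition; relies on the GNU __typeof__ extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct task { int on_cpu; };            /* hypothetical stand-in */
struct lock { struct task *owner; };    /* hypothetical stand-in */

/* Plain load: nothing in the loop writes lock->owner, so a compiler
 * may treat the load as loop-invariant. */
int spin_plain(struct lock *lock, int max_spins)
{
    int spins = 0;

    while (lock->owner != NULL && spins < max_spins)
        spins++;
    return spins;
}

/* The hoisted form, written out by hand: one load before the loop,
 * so a later store by another thread would never be noticed. */
int spin_hoisted(struct lock *lock, int max_spins)
{
    struct task *owner = lock->owner;
    int spins = 0;

    while (owner != NULL && spins < max_spins)
        spins++;
    return spins;
}

/* The volatile cast asks for a fresh load on every evaluation. */
int spin_access_once(struct lock *lock, int max_spins)
{
    int spins = 0;

    while (ACCESS_ONCE(lock->owner) != NULL && spins < max_spins)
        spins++;
    return spins;
}
```

Run single-threaded, all three functions return the same counts; the difference only matters once another thread can change `lock->owner` while the loop is running.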

> And I am very sorry to report that I really have seen compiler writers joyfully discuss how new optimizations will break existing code. In their defense, yes, the code that they were anticipating breaking was relying on undefined behavior.

And I wouldn't have been joyful about that either. But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all. IMHO, the point here shouldn't be kernel vs. compilers or such, but pointing out the trade-offs between compiler optimizations of single-threaded pieces of code vs. synchronizing code, why we have to make this distinction, how we can draw the line, what memory models can or can't do, etc. Shouldn't we rather be discussing how we can get compilers to reach production-quality support of memory models more quickly, and how to ensure and test that? Or, starting from the C/C++ memory model, whether it has limitations that are bad for the kernel, and whether compilers could offer variations that would be better (e.g., with additional memory orders)?

ACCESS_ONCE()

Posted Aug 2, 2012 21:29 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (6 responses)

Yep, having the compiler understand concurrency is a good thing, no argument from me. If you wish to argue that point, you will need to argue it with someone else. ;-)

And indeed, there are any number of optimizations that are profoundly unaggressive by 2012 standards. However, the fewer registers the target system has (and yes, I have used single-register machines), the less memory the compiler has available to it (I have used systems with 4Kx12bits of core, and far smaller systems were available), and the smaller the ratio of memory latency to CPU clock period (1-to-1 on many systems 30 years ago), the less likely the compiler will do certain optimizations. All of these system attributes have changed dramatically over the past few decades, which in turn has dramatically changed the types of optimizations that compilers commonly carry out.

I, too, favor looking for solutions. But few people are going to look for a solution until they believe that there is a problem. The fact that you already realize that there is a problem is a credit to you, but it is unwise to assume that others understand (or even care about) that problem. And then there are, of course, the likely disagreements over the exact nature of the problem, to say nothing of the set of permissible solutions.

ACCESS_ONCE()

Posted Aug 3, 2012 11:29 UTC (Fri) by tvld (guest, #59052) [Link] (5 responses)

I agree that the underlying problem might not be easy to see. But exactly this is a reason for not wrapping it in jokes and other things that can distract. Also, in this case here, the funny bit doesn't even hint at the problem, so it won't help to explain.

And I've too often seen people actually understand such (supposedly) funny comments literally (compilers are evil! they don't read my mind!) to really like such comments. And yes, I realize that this is just anecdotal evidence :)

ACCESS_ONCE()

Posted Aug 3, 2012 16:53 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

I cannot say that I found your concerns convincing, and your further commentary isn't helping to convince me.

So rather than continue that sterile debate, let me ask a question on the examples in the article. In C11/C++11, would it be sufficient to make the ->owner field be atomic with memory_order_relaxed accesses, or would the volatile cast still be necessary to prevent the compiler from doing those optimizations?
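To make the two alternatives in that question concrete, here is a rough C11 sketch. The struct and function names are hypothetical, and the `max_spins` bound is an assumption added only so that the code terminates when run single-threaded. Whether the first form is enough to keep the load inside the loop is exactly what the question asks, so the sketch takes no position on it.

```c
#include <stdatomic.h>
#include <stddef.h>

struct task { int on_cpu; };                      /* hypothetical stand-in */
struct alock { _Atomic(struct task *) owner; };   /* hypothetical stand-in */

/* Alternative 1: atomic field, relaxed load, no volatile anywhere. */
int spin_relaxed(struct alock *lock, int max_spins)
{
    int spins = 0;

    while (atomic_load_explicit(&lock->owner, memory_order_relaxed) != NULL
           && spins < max_spins)
        spins++;
    return spins;
}

/* Alternative 2: the same relaxed load through a volatile-qualified
 * pointer, which the C11 atomic generic functions accept. */
int spin_relaxed_volatile(struct alock *lock, int max_spins)
{
    volatile _Atomic(struct task *) *owner_p = &lock->owner;
    int spins = 0;

    while (atomic_load_explicit(owner_p, memory_order_relaxed) != NULL
           && spins < max_spins)
        spins++;
    return spins;
}
```

The second form keeps the volatile semantics of the ACCESS_ONCE() cast while still using the C11 atomic interface.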

ACCESS_ONCE()

Posted Aug 3, 2012 20:59 UTC (Fri) by tvld (guest, #59052) [Link] (3 responses)

That's a question about forward progress, which isn't specified in much detail in at least C++11.

On the one hand, mo_relaxed loads can read from any write in the visible sequence of side effects. So this would allow the load to be hoisted. The same holds for mo_acquire loads, I believe.

On the other hand, there's C++11 1.10.2 and 1.10.25. You could interpret those as saying that the atomics in the abstract machine would eventually load the most recent value (in modification order). Assuming that the standard's intent is that this is an additional constraint on reads-from, then compilers wouldn't be allowed to hoist the load out of loops. I didn't see an equivalent of 1.10.2 / 1.10.25 in C11. (You said you've been involved in C11; is there any, or if not, why not?)

Either way, I bet you've thought this through before. So what's your detailed answer to your question? What's your suggestion for how to specify progress guarantees in more detail in the standards?

ACCESS_ONCE()

Posted Aug 3, 2012 22:13 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

C++11's 1.10.24 says that any thread may be assumed to eventually either terminate, invoke a library I/O function, access or modify a volatile object, or perform a synchronization or atomic operation. This was a late add, and it replaced some less-well-defined language talking about loop termination. I could read this as saying that the compiler is not allowed to hoist atomic operations out of infinite loops, in other words, that load combining is allowed, but the implementation is only allowed to combine a finite number of atomic loads. How would you interpret it?

I believe that this wording will be going into C11 as well, but will find out in October. Not so sure about 1.10.2 -- C gets to support a wider variety of environments than C++, so is sometimes less able to make guarantees.

How about the second issue raised in the original article? If an atomic variable is loaded into a temporary variable using a memory_order_relaxed load, is the compiler allowed to silently re-load from that same atomic variable?

My approach would be to continue using things like ACCESS_ONCE() until such time as all the compiler people I know of told me that it was not necessary, and with consistent rationales for why it was not necessary. By the way, this is one of the reasons I resisted the recent attempt to get rid of the "volatile" specifier for atomics -- the other being the need to interact with interrupt handlers (in the kernel) and signal handlers (in user space).

ACCESS_ONCE()

Posted Aug 6, 2012 12:05 UTC (Mon) by tvld (guest, #59052) [Link] (1 responses)

I don't interpret 1.10.24 the same way. To me, it just assures the compiler that there cannot be an infinite loop without any synchronization or side effects in it. The only purpose I see for that is similar to the note in 1.10.24 -- removal of empty loops (e.g., after stuff has been hoisted out of the loop).
I don't see how it would restrict reads-from inside of infinite loops. Even if the relaxed loads would have to stay in the loop, they'd still be allowed to read from an "old" write. (That's why I mentioned 1.10.25, which could mean that they eventually should pick up a recent value, forever; in that case, the load would also have to stay in the loop).
Why are you reading 1.10.24 differently? What's the detailed reasoning?

The second issue, silently reloading from memory for a mo_relaxed load, would not be allowed, I think. The abstract machine wouldn't do it. It is allowed under as-if for nonatomic accesses (with no synchronization in-between etc.) because we can assume data-race-freedom and thus no concurrent modification by other threads.
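The second issue can be pictured with a short sketch; the type and function names here are hypothetical. In the plain version the compiler may, under the as-if rule, fetch `lock->owner` again instead of keeping `owner` in a register, because it is entitled to assume that no other thread modifies it concurrently. In the relaxed-atomic version the source performs exactly one load, and the reading argued for above is that the compiler may not add a second one.

```c
#include <stdatomic.h>
#include <stddef.h>

struct task { int prio; };                        /* hypothetical stand-in */
struct lock { struct task *owner; };              /* plain field */
struct alock { _Atomic(struct task *) owner; };   /* atomic field */

/* Plain load into a temporary. A silent reload between the NULL check
 * and the dereference could observe NULL if another thread raced. */
int owner_prio_plain(struct lock *lock)
{
    struct task *owner = lock->owner;

    if (owner == NULL)
        return -1;
    return owner->prio;
}

/* One relaxed atomic load in the source; both uses below refer to the
 * value that this single load returned. */
int owner_prio_relaxed(struct alock *lock)
{
    struct task *owner =
        atomic_load_explicit(&lock->owner, memory_order_relaxed);

    if (owner == NULL)
        return -1;
    return owner->prio;
}
```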

ACCESS_ONCE()

Posted Aug 6, 2012 16:58 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Suppose that a loop depends on a relaxed atomic load. Suppose that the compiler unrolls the loop by (say) a factor of two. I don't see anything in 1.10.24 that tells me that the compiler is prohibited from actually performing the relaxed atomic load only once per iteration of the unrolled loop, that is to say, only once per two iterations of the loop as written in the source code. Such an optimization could be a problem in some situations.
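A hand-written picture of that scenario, as a sketch only: the function names are hypothetical, the load counter exists purely for illustration, and `poll_unrolled_by_two()` is not the output of any particular compiler. Whether a conforming compiler may actually transform the first function into something like the second is the open question here.

```c
#include <stdatomic.h>

/* Loop as written in the source: one relaxed load per iteration.
 * Returns the number of loads performed. */
int poll_as_written(atomic_int *stop, int max_iters)
{
    int loads = 0;

    for (int i = 0; i < max_iters; i++) {
        loads++;
        if (atomic_load_explicit(stop, memory_order_relaxed))
            break;
        /* body of iteration i would go here */
    }
    return loads;
}

/* The feared transformation: unrolled by two, with a single load
 * serving both source-level iterations. */
int poll_unrolled_by_two(atomic_int *stop, int max_iters)
{
    int loads = 0;

    for (int i = 0; i < max_iters; i += 2) {
        loads++;
        if (atomic_load_explicit(stop, memory_order_relaxed))
            break;
        /* body of iteration i would go here */
        if (i + 1 >= max_iters)
            break;
        /* Iteration i + 1: the source calls for a fresh load here,
         * but the value loaded above (known to be zero) is reused. */
        /* body of iteration i + 1 would go here */
    }
    return loads;
}
```

With `stop` never set and `max_iters` equal to 8, the first function performs eight loads and the second only four, so a store to `stop` could go unnoticed for one extra iteration.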

Don't get me wrong, I would be happy to learn that I am being overly paranoid, and that something actually forces the compiler to actually perform each and every relaxed atomic load, but I have heard too many compiler writers discussing optimizations that are not consistent with this less-paranoid view of this matter.

Your argument that silent reloading from memory would not be allowed for a memory_order_relaxed load does sound more convincing. If I consistently hear the same thing from enough compiler writers over a sufficiently long period of time, I will start recommending ACCESS_ONCE() only to prevent the compiler from optimizing atomic loads out, not to prevent the compiler from reloading from atomics.

ACCESS_ONCE()

Posted Aug 2, 2012 22:54 UTC (Thu) by nix (subscriber, #2304) [Link] (5 responses)

> But repeating stereotypes and anecdotal "evidence" about this or that group of people doesn't help us at all.

Quite. How dare Paul use self-deprecating humour on a mailing list! It should be forbidden in case it offends people too thick to recognise it as humour.

(Thanks, but no thanks. Humour is a balm, not a menace. And we do often have an evil gleam in our eyes when discussing optimizations, so it wasn't even untrue.)

ACCESS_ONCE()

Posted Aug 3, 2012 1:26 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

Besides which, they tell me that I have had an evil gleam in my eyes a time or two as well. ;-)

ACCESS_ONCE()

Posted Aug 3, 2012 2:40 UTC (Fri) by corbet (editor, #1) [Link] (3 responses)

Experts agree that gleam must have been evil indeed around tree RCU time...:)

ACCESS_ONCE()

Posted Aug 3, 2012 4:53 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

And if the gleam was evil around tree RCU time, it must have been positively satanic around the time of the first mainlined preemptible RCU...

ACCESS_ONCE()

Posted Aug 4, 2012 12:04 UTC (Sat) by nix (subscriber, #2304) [Link] (1 responses)

I wasn't sure whether to thank the Prince of Darkness or controlled substances for that one. I'm glad to see it was the former: cavalier use of controlled substances is, of course, illegal.

ACCESS_ONCE()

Posted Aug 4, 2012 21:45 UTC (Sat) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

;-) ;-) ;-)