User: Password:
Subscribe / Log in / New account



Posted Aug 3, 2012 16:53 UTC (Fri) by PaulMcKenney (subscriber, #9624)
In reply to: ACCESS_ONCE() by tvld
Parent article: ACCESS_ONCE()

I cannot say that I found your concerns convincing, and your further commentary isn't helping to convince me.

So rather that continue that sterile debate, let me ask a question on the examples in the article. In C11/C++11, would it be sufficient to make the ->owner field be atomic with memory_order_relaxed accesses, or would the volatile cast still be necessary to prevent the compiler from doing those optimizations?

(Log in to post comments)


Posted Aug 3, 2012 20:59 UTC (Fri) by tvld (guest, #59052) [Link]

That's a question about forward progress, which isn't specified in much detail in at least C++11.

On the one hand, mo_relaxed loads can read from any write in the visible sequence of side effects. So this would allow to hoist the load. Same for mo_acquire loads actually, I believe.

On the other hand, there's C++11 1.10.2 and 1.10.25. You could interpret those as saying that the atomics in the abstract machine would eventually load the most recent value (in modification order). Assuming that the standard's intent is that this is an additional constraint on reads-from, then compilers wouldn't be allowed to hoist the load out of loops. I didn't see an equivalent of 1.10.2 / 1.10.25 in C11. (You said you've been involved in C11; is there any, or if not, why not?)

Either way, I bet you've thought this through before. So what's your detailed answer to your question? What's your suggestion for how to specify progress guarantees in more detail in the standards?


Posted Aug 3, 2012 22:13 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

C++11's 1.10.24 says that any thread may be assumed to eventually either terminate, invoke a library I/O function, access or modify a volatile object, or perform a synchronization or atomic operation. This was a late add, and it replaced some less-well-defined language talking about loop termination. I could read this as saying that the compiler is not allowed to hoist atomic operations out of infinite loops, in other words, that load combining is allowed, but the implementation is only allowed to combine a finite number of atomic loads. How would you interpret it?

I believe that this wording will be going into C11 as well, but will find out in October. Not so sure about 1.10.2 -- C gets to support a wider variety of environments than C++, so is sometimes less able to make guarantees.

How about the second issue raised in the original article? If an atomic variable is loaded into a temporary variable using a memory_order_relaxed load, is the compiler allowed to silently re-load from that same atomic variable?

My approach would be to continue using things like ACCESS_ONCE() until such time as all the compiler people I know of told me that it was not necessary, and with consistent rationales for why it was not necessary. By the way, this is one of the reasons I resisted the recent attempt to get rid of the "volatile" specifier for atomics -- the other being the need to interact with interrupt handlers (in the kernel) and signal handlers (in user space).


Posted Aug 6, 2012 12:05 UTC (Mon) by tvld (guest, #59052) [Link]

I don't interpret 1.10.24 the same way. To me, it just ensures the compiler that there cannot be an infinite loop without any synchronization or side-effects in it. The only purpose of that that I see is similar to the note in 1.10.24 -- removal of empty loops (e.g., after stuff has been hoisted out of the loop).
I don't see how it would restrict reads-from inside of infinite loops. Even if the relaxed loads would have to stay in the loop, they'd still be allowed to read from an "old" write. (That's why I mentioned 1.10.25, which could mean that they eventually should pick up a recent value, forever; in that case, the load would also have to stay in the loop).
Why are you reading 1.10.24 differently? What's the detailed reasoning?

The second issue, silently reloading from memory for a mo_relaxed load, would not be allowed, I think. The abstract machine wouldn't do it. It is allowed under as-if for nonatomic accesses (with no synchronization in-between etc.) because we can assume data-race-freedom and thus no concurrent modification by other threads.


Posted Aug 6, 2012 16:58 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Suppose that a loop depends on a relaxed atomic load. Suppose that the compiler unrolls the loop by (say) a factor of two. I don't see anything in 1.10.24 that tells me that the compiler is prohibited from actually performing the relaxed atomic load only once per iteration of the unrolled loop, that is to say, only once per two iterations of the loop as written in the source code. Such an optimization could be a problem in some situations.

Don't get me wrong, I would be happy to learn that I am being overly paranoid, and that something actually forces the compiler to actually perform each and every relaxed atomic load, but I have heard too many compiler writers discussing optimizations that are not consistent with this less-paranoid view of this matter.

Your argument that silent reloading from memory for a memory_order_relaxed load does sound more convincing. If I consistently hear the same thing from enough compiler writers over a sufficiently long period of time, I will start recommending ACCESS_ONCE() only to prevent the compiler from optimizing atomic loads out, not to prevent the compiler from reloading from atomics.

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds