Protecting control dependencies with volatile_if()

Posted Jun 22, 2021 3:38 UTC (Tue) by itsmycpu (guest, #139639)
In reply to: Protecting control dependencies with volatile_if() by itsmycpu
Parent article: Protecting control dependencies with volatile_if()

A follow-up question for ARM would be:

What if the load of R6 appears in both the 'then' path and the 'else' path of the BEQ conditional branch, instead of after the conditional paths join?

Then there would be two load instructions, each dependent on the conditional. Would that mean that loading R6 is now ordered, or is the CPU allowed to internally join the two paths and remove the dependency? Is ARM's behavior defined in this case? I think it needs to be, and the straightforward definition would be that R6 is then ordered. (And that the compiler could use this for optimizations.)

Protecting control dependencies with volatile_if()

Posted Jun 22, 2021 9:47 UTC (Tue) by farnz (subscriber, #17727)

First, thank you for reading my previous comment as intended, not as written - I did make a lot of mistakes trying to put together a case where ARM's ordering is weaker than store-acquire, but you caught my intentions.

Yes, if the load of R6 appears in both paths, then it's an ordered load, due to the data dependency affecting both cases, and thus ARM can't reorder in their memory model; in theory, the compiler can exploit this to get R6 ordered even though the load of R6 is unconditional, while still allowing future loads to be reordered before the load of R6.
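To make the two shapes under discussion concrete, here is a minimal C sketch. The names `flag`, `data`, `load_after_join` and `load_in_both_paths` are hypothetical and exist only for this illustration. Note that a C compiler is free to turn either shape into the other, so the source form alone does not decide which instruction sequence the CPU ends up seeing.

```c
#include <stdatomic.h>

/* Hypothetical shared variables, named only for this sketch. */
_Atomic int flag;
_Atomic int data;

/* Shape 1: the load of data happens after the two paths have joined. */
int load_after_join(void)
{
    int bias = 0;

    if (atomic_load_explicit(&flag, memory_order_relaxed))
        bias = 1;
    /* Reached regardless of which path was taken. */
    return atomic_load_explicit(&data, memory_order_relaxed) + bias;
}

/* Shape 2: the same load is duplicated into both paths of the branch. */
int load_in_both_paths(void)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed))
        return atomic_load_explicit(&data, memory_order_relaxed) + 1;
    else
        return atomic_load_explicit(&data, memory_order_relaxed);
}
```

Both functions compute the same result; they differ only in where the load of `data` sits relative to the branch on `flag`.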

The point of the ARM behaviour here is to give you a guarantee that code like the following C-ish example does what you intended without forcing full ordering between the two threads:


const int global42 = 42;
int global_int;
int *global_ptr;
int main(void) {
    global_ptr = &global42;
    full_memory_barrier();
    start_threads(thread0, thread1);
    wait_for_threads();
}

void thread0() {
    global_int = 123;
    /* The atomic object is global_ptr itself, so pass its address. */
    atomic_store_explicit(&global_ptr, &global_int, memory_order_release);
}

void thread1() {
    int *i = global_ptr;
    int j = *i;
    if (i == &global_int) {
        assert(j == 123);
    } else {
        assert(j == 42);
    }
}

Because the store to global_ptr is a release store, any thread which observes that store also observes the store to global_int. thread1 can thus use the value of i to determine which value it loaded.

On Alpha's memory model (which is as weak as you can get), it is permissible for i to point at global42, but for the load to read 123, or vice-versa. ARM disallows this particular case.

Another way of looking at it would be that you can fix thread1 to work correctly on any standards-compliant C compiler by making it:


void thread1() {
    /* The atomic object is global_ptr itself, so pass its address. */
    int *i = atomic_load_explicit(&global_ptr, memory_order_consume);
    int j = *i;
    if (i == &global_int) {
        assert(j == 123);
    } else {
        assert(j == 42);
    }
}

And on Alpha, the atomic_load_explicit needs to put appropriate memory ordering instructions into the instruction stream for thread1; on ARM, those ordering instructions are implicit in ordinary loads.
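For anyone who wants to compile the idea, a self-contained variant might look roughly like the sketch below. It assumes POSIX threads in place of the unspecified thread helpers, and `writer`, `reader` and `run_publish_once` are names invented for this sketch. The pointer is declared `_Atomic` and points to `const int` so that it can legally refer to `global42`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

const int global42 = 42;
int global_int;
/* Atomic pointer, initially pointing at the constant. */
_Atomic(const int *) global_ptr = &global42;

/* Plays the role of thread0: write the data, then publish the pointer. */
void *writer(void *arg)
{
    (void)arg;
    global_int = 123;
    atomic_store_explicit(&global_ptr, &global_int, memory_order_release);
    return NULL;
}

/* Plays the role of thread1: the dereference depends on the consume load. */
void *reader(void *arg)
{
    (void)arg;
    const int *i = atomic_load_explicit(&global_ptr, memory_order_consume);
    int j = *i;

    if (i == &global_int)
        assert(j == 123);
    else
        assert(j == 42);
    return NULL;
}

/* Hypothetical driver: runs one writer and one reader; returns 0 on success. */
int run_publish_once(void)
{
    pthread_t w, r;

    /* Reset shared state before any thread exists. */
    global_int = 0;
    atomic_store(&global_ptr, &global42);

    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, NULL) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}
```

Calling `run_publish_once()` in a loop from `main` exercises both outcomes over time, since the reader may run either before or after the writer publishes the pointer. Depending on the toolchain, the program may need to be built with `-pthread`.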

Protecting control dependencies with volatile_if()

Posted Jun 22, 2021 21:03 UTC (Tue) by itsmycpu (guest, #139639)

> The point of the ARM behaviour here ....

Makes sense, and gives a good idea of ARM (and Alpha) in this regard.