Differences between C11 and Linux atomics

Posted Oct 16, 2024 16:38 UTC (Wed) by pbonzini (subscriber, #60935)
Parent article: Using LKMM atomics in Rust

> checking that the compiled machine code respects the LKMM, regardless of what the source languages are. If we can have LKMM atomics in Rust, we should just use them, he said.

There are only a few important differences in code generation between Linux and C11 atomics (the backend being the same for both C and Rust).

One is that relaxed atomics differ from READ_ONCE/WRITE_ONCE in terms of optimizability. More optimizations are permitted on C atomics than on LKMM atomics, though this generally doesn't matter. Anyhow, some examples can be found at https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
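
To make the contrast concrete, here is a minimal userspace sketch in C. The helper read_once_int() is a hypothetical stand-in for the kernel's READ_ONCE() restricted to int, and the variable and function names are invented for illustration. The point is only what the language rules allow; whether a particular compiler actually merges the relaxed loads is a separate question.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for READ_ONCE() on an int: a volatile access,
 * which the compiler must perform exactly once per call. */
static inline int read_once_int(const int *p)
{
    return *(const volatile int *)p;
}

static atomic_int c11_counter;     /* accessed with C11 relaxed atomics */
static int lkmm_style_counter;     /* accessed with volatile loads */

/* Two adjacent relaxed loads: the C11 model allows a compiler to fuse
 * them into a single load, although it is not obliged to. */
int sum_twice_c11(void)
{
    int first = atomic_load_explicit(&c11_counter, memory_order_relaxed);
    int second = atomic_load_explicit(&c11_counter, memory_order_relaxed);
    return first + second;
}

/* Two volatile loads: both accesses have to be emitted. */
int sum_twice_volatile(void)
{
    int first = read_once_int(&lkmm_style_counter);
    int second = read_once_int(&lkmm_style_counter);
    return first + second;
}
```

Comparing the generated assembly for the two functions is one way to see whether a given compiler takes advantage of the extra freedom.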

Another is that C compilers do not implement the Consume memory ordering, and treat it as Acquire. This matters for optimizing RCU, but treating Consume as Acquire is conservative, so it still respects the LKMM.

The more important one boils down to the fact that seq_cst *fences* are different from seq_cst *atomics* in the C memory model. The latter only form a total order among themselves, while the former (just like the familiar smp_mb() in Linux) order all memory accesses. It escapes me why the committee used the same name for two very different things, but it can cause surprising effects especially on ARM.

On ARM the "acquire" and "release" operations are slightly different from the usual semantics. Release (e.g. STLR) is normal, but acquire (LDAR) will not just order before subsequent operations: it will also order after previous release stores. This was done so that seq_cst operations didn't need an expensive memory barrier and could just wait for previous seq_cst stores or read-modify-write operations. Unfortunately, this optimizes for a memory ordering that is just as harmful as volatile and for very similar reasons, in that it hides the synchronizes-with relationship that the programmer wanted.

Anyhow, the result is that on Arm a cmpxchg in the C memory model does not have a trailing full seq_cst fence, which is there in the Linux kernel implementation of atomics. This has an effect if you need to implement something like

     r1 = cmpxchg(&a, 0, 1);
     // smp_mb() not needed here
     r2 = READ_ONCE(b);
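
For comparison, a rough C11 rendering of the same pattern might look like the sketch below. It assumes, following the description above, that a seq_cst compare-exchange alone does not order the later load the way the kernel's cmpxchg() does, so an explicit seq_cst fence is added. The function name cmpxchg_then_read() and the out-parameter are invented for illustration.

```c
#include <stdatomic.h>

static atomic_int a;
static atomic_int b;

/* Try to change a from 0 to 1, then read b.
 * *old receives the previous value of a, mirroring what the kernel's
 * cmpxchg() returns. The return value is the observed value of b. */
int cmpxchg_then_read(int *old)
{
    int expected = 0;

    atomic_compare_exchange_strong_explicit(&a, &expected, 1,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    /* On failure, expected now holds the current value of a;
     * on success it is still 0, which was the old value. */
    *old = expected;

    /* Explicit full fence: the C11 counterpart of the smp_mb() that
     * the kernel snippet does not need. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&b, memory_order_relaxed);
}
```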

Another issue with the C and Rust memory models is the underspecification of the compiler (aka signal) fence. It is said to establish ordering between a thread and a signal handler executed on the same thread, by suppressing reordering of the instructions by the compiler. The standard, however, does not say whether this also establishes ordering 1) with other threads executed on the same processor (e.g. via pinning or on a uniprocessor system) and 2) with other processors as long as a memory barrier instruction is executed (as is the case with Linux's membarrier system call).
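
The one case the standard does cover, a thread and a signal handler running on that same thread, can be sketched as follows. The names payload, ready, seen and publish_and_signal() are invented for illustration, and raise() is used so that the handler runs synchronously; this is only a sketch of the specified use, and says nothing about the two unanswered cases.

```c
#include <signal.h>
#include <stdatomic.h>

static int payload;        /* plain data published to the handler */
static atomic_int ready;   /* flag the handler checks */
static atomic_int seen;    /* value the handler observed */

static void handler(int sig)
{
    (void)sig;
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* Compiler-only fence: keeps the payload read after the flag read. */
        atomic_signal_fence(memory_order_acquire);
        atomic_store_explicit(&seen, payload, memory_order_relaxed);
    }
}

/* Publish a value, deliver a signal to this thread, and return what the
 * handler saw. */
int publish_and_signal(int value)
{
    signal(SIGINT, handler);

    payload = value;
    /* Compiler-only fence: keeps the payload write before the flag write. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);

    raise(SIGINT);   /* does not return until the handler has returned */
    return atomic_load_explicit(&seen, memory_order_relaxed);
}
```

Note that atomic_signal_fence() emits no barrier instruction; it only constrains the compiler, which is why the cross-thread and cross-processor cases are open questions.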

Differences between C11 and Linux atomics

Posted Oct 16, 2024 21:08 UTC (Wed) by foom (subscriber, #14868)

BTW, AArch64 compilers these days will use the LDAPR instruction (instead of LDAR) for a C11 load-acquire operation, if the target supports it. The only difference between the two instructions is that LDAPR does not have an ordering constraint with STLR.

AFAIK this did not require any other changes (e.g. cmpxchg did not change, and still uses ldaxrb+stlxrb); it's just a simple relaxation. That is, by using LDAPR, the compiler has stopped requiring the hardware to enforce a constraint which the language's memory model did not require it to enforce.