Load/Store tearing in this day and age?
Posted Jul 16, 2019 22:11 UTC (Tue) by pr1268 (guest, #24648)
Parent article: Who's afraid of a big bad optimizing compiler?
I don't understand how or why, in this day and age, load and store tearing still occur. At least when loading or storing an n-bit value on an n-bit architecture. Isn't that what the "beauty" of larger-bit architectures is for?
> For example, the compiler could, in theory, compile the load from global_ptr on line 1 of the following code as a series of one-byte loads.
70 years of electronic computers, 64-bit architectures are the norm now, and we're still putzing around one byte at a time? Ugh.
IMO loads and stores should be atomic at the assembly level (if not at the metal) for an n-bit value on an n-bit architecture. Just my $0x02.
Thank you to all the contributors for this article. Very enlightening (if not depressing).
Posted Jul 16, 2019 22:58 UTC (Tue) by excors (subscriber, #95769) [Link] (2 responses)
Posted Jul 17, 2019 6:58 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]
Posted Jul 18, 2019 4:21 UTC (Thu) by ncm (guest, #165) [Link]
On x86 they devote another few million transistors for each case to help avoid what would be a kernel trap. All the extra transistors strengthen Intel's (and, lately, AMD's and Samsung's) competitive position vs cheaper hardware. It's not free, because that enables a monopoly or oligopoly that may then extract rent or (worse) limit your choices.
In the US, only the former is ever considered actionable harm, despite the law recognizing both. It is artificially difficult to demonstrate the latter, where logically it should instead be assumed.
So, exercising care with alignment affords you more choice of hardware that can run your code fast enough and safely, and may also save money, power, and heat, because those millions of transistors burn power.
Posted Jul 17, 2019 9:32 UTC (Wed) by comex (subscriber, #71521) [Link]
> IMO loads and stores should be atomic at the assembly level (if not at the metal) for an n-bit value on an n-bit architecture. Just my $0x02.

They typically *are* guaranteed to be atomic at the assembly level, but only if the pointer is properly aligned.

You can see this more easily if you use C11 atomics, which are more explicit about what guarantees they provide. For example, this source code:

    #include <stdatomic.h>
    int load(_Atomic int *ptr) {
        return atomic_load_explicit(ptr, memory_order_relaxed);
    }

compiles to this assembly (GCC targeting x86-64):

    load:
        mov eax, DWORD PTR [rdi]
        ret

All C11 atomic loads/stores are guaranteed to be, well, atomic, but GCC has decided to emit a plain mov instruction. In fact this is valid, because x86 has an architectural guarantee that regular movs (64-bit and smaller) to and from aligned pointers are atomic. (And GCC is allowed to assume that `ptr` is aligned, because using it is undefined behavior if not.) Most common architectures work the same.

On the other hand, many uses of atomics want a stronger memory ordering, e.g. memory_order_acquire. On x86, this *still* just generates a plain mov instruction, because x86 has very strong ordering guarantees for all accesses. But on other architectures it tends to require additional synchronization or a different instruction.
