What applications benefit from that single-threaded optimization?
Posted Aug 24, 2024 0:45 UTC (Sat) by roc (subscriber, #30627)
In reply to: What applications benefit from that single-threaded optimization? by pbonzini
Parent article: A review of file descriptor memory safety in the kernel
I suppose dezgeg answered this --- `make` :-(.

Posted Aug 24, 2024 0:59 UTC (Sat) by roc (subscriber, #30627) [Link] (3 responses)

Maybe it's time for someone to wrap gcc in a service that forks threads or processes after startup.
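[For illustration, here is a minimal sketch of the "fork after startup" pattern being suggested. Every name in it is a hypothetical stand-in rather than a real gcc wrapper; the point is only that the expensive initialization is paid once, while each forked child remains a single-threaded process.]

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the expensive one-time startup work
   (parsing headers, loading plugins, and so on). */
static void expensive_startup(void)
{
    sleep(1);
}

/* Stand-in for a single unit of work, e.g. one compilation. */
static void handle_one_job(int job)
{
    printf("child %d handled job %d\n", (int)getpid(), job);
}

int main(void)
{
    expensive_startup();            /* paid once, not per job */

    for (int job = 0; job < 4; job++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* The child inherits the warmed-up state for free and
               stays single-threaded, so per-process fast paths (such
               as the fdtable refcount elision under discussion) can
               still apply. */
            handle_one_job(job);
            _exit(0);
        }
        waitpid(pid, NULL, 0);      /* serialize for simplicity */
    }
    return 0;
}
```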
Posted Aug 24, 2024 10:07 UTC (Sat) by intelfx (subscriber, #130118) [Link] (2 responses)
So as I understood it, the context here is "why not drop the refcounting-elision optimization entirely and save some complexity".
Posted Aug 24, 2024 11:21 UTC (Sat) by roc (subscriber, #30627) [Link] (1 response)
Generally I assume cache-hitting relaxed atomic operations are very cheap as long as you don't have a data dependency on the result, and for refcounts, you don't. When decrementing there is a conditional branch, but it should be well predicted. Maybe I'm wrong... if so, I'd like to know more.
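[To make the cost model concrete, here is a minimal C11 sketch of the refcounting pattern under discussion. The struct and function names are hypothetical. One hedge: the textbook version shown here uses a release decrement plus an acquire fence before freeing, while the increments can indeed be relaxed; the final branch is the one expected to be well predicted.]

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
    atomic_uint refcount;
    /* ... payload ... */
};

static void obj_get(struct obj *o)
{
    /* Taking a reference publishes nothing, so relaxed ordering
       is sufficient for the increment. */
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

static void obj_put(struct obj *o)
{
    /* The decrement uses release so that our writes to the object
       happen-before the free; the zero check is the conditional
       branch mentioned above. */
    if (atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        free(o);
    }
}

int main(void)
{
    struct obj *o = malloc(sizeof *o);
    atomic_init(&o->refcount, 1);
    obj_get(o);   /* second reference: 1 -> 2 */
    obj_put(o);   /* drop one: 2 -> 1, no free */
    obj_put(o);   /* last reference: 1 -> 0, frees */
    return 0;
}
```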
Posted Aug 27, 2024 17:50 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
As I understand it, relaxed atomics have to obey the following rules:

* Don't tear the result if it's bigger than a word (or if the architecture is otherwise susceptible to tearing values of this size).
* If there would be a data race involving this value, don't produce UB; instead, just pick an arbitrary order at runtime.
* In cases where there would be a data race, the arbitrary order is per-variable and does not need to be consistent with anything else, including relaxed atomic operations on other variables.
* Because arbitrary ordering is permitted, cyclic data dependencies between multiple variables can create temporal paradoxes according to the spec (and, to a limited extent, on some real architectures), but it's still not UB for some weird reason. Reading through the literature, I'm having a hard time understanding why anyone would want to write code with such dependencies using relaxed atomics in the first place. But we don't care, because we're not doing that anyway.
* An atomic read-modify-write (including atomic increment/decrement) is still a single atomic operation, not three operations, and it still has to produce coherent results when used consistently across all call sites. Ordering is relaxed, but atomicity is not.

So, here's the thing. This is (or at least should be) very cheap if all atomic operations are simple reads and writes. You just emit basic loads and stores, perhaps decorate the IR with some very lax optimization barriers, and don't think about synchronization at all. The architecture will almost certainly do something that is permitted by the existing rules, even without any flushes, fences, interlocking, or other nonsense. But I tend to expect that an atomic read-modify-write (as seen in atomic reference counting) cannot assume away the ordering problem so easily. If two cores both have cached copies of the variable and both concurrently try to do an atomic increment, the variable must ultimately increase by two, not one, and that implies a certain amount of synchronization between the cores. They cannot both emit a simple (non-atomic) load/modify/store sequence and expect everything to be fine and dandy.
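[To make that last distinction concrete, here is a small, self-contained C11 sketch with hypothetical names. The relaxed fetch-add imposes no ordering, yet atomicity guarantees that no increment is lost, so the final count is always exactly two million; replacing it with a plain non-atomic `counter++` would be exactly the racy load/modify/store sequence described above, and UB.]

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical shared counter. */
static atomic_uint counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        /* Relaxed ordering: no fences, no synchronizes-with edges.
           Atomicity still holds, so concurrent increments cannot
           collide and lose updates. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Always prints 2000000: the cores must synchronize enough
       (e.g. via cache-line ownership) for every increment to land. */
    printf("%u\n", atomic_load_explicit(&counter, memory_order_relaxed));
    return 0;
}
```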