LWN: Comments on "Lockless algorithms for mere mortals"

Lockless algorithms for mere mortals

alison — Mon, 15 Mar 2021 16:01:35 +0000

> It's not just Newtonian physics, it's all macroscopic physics. In relativistic space-time, about the only thing preserved is causation [0].

Sadly, Bell's Inequality shows that the combination of quantum mechanics and relativity spells trouble for causality, too. See, for example,

https://phys.org/news/2017-07-probability-quantum-world-l...

"By performing an essentially loophole-free Bell test, they have shown that two atoms separated by a distance of a quarter of a mile share correlations that should be impossible under the hypothesis of local realism, and are most likely explained by quantum entanglement."

The failure of "local realism" here means that even Reality has only "eventual consistency." Senior, respected, widely published physicists believe this and have done so for decades. Quantum entanglement is harder to reason about than memory ordering and consistency. Einstein's famous comment that "God does not play dice" was in response to discussions of this topic.

Lockless algorithms for mere mortals

immibis — Tue, 13 Oct 2020 15:28:03 +0000

I'll try to give an intuitive explanation for why erasing data must cost energy. Premise: we know that the universe is reversible. If state A goes to state B, then state reverse(B) must go to state reverse(A).

Therefore you can't have states A and B which both go to C, because a universe in state reverse(C) wouldn't know whether to go to reverse(A) or reverse(B). Information cannot be deleted.

So how do computers delete information, then? Well, the universe is full of states we don't care about, so we just cheat and move the information to one of those. For example, writing 1 releases energy from a different point in the chip than writing 0, which causes a different vibration in the silicon lattice, which transfers to the plastic packaging, which causes an air molecule to bounce off the chip differently.

That means if there are 1 zillion different ways the air molecules would've bounced (if the chip was turned off), now there are 2 zillion, because all the bounces after 0 is written are different from all the bounces after 1 is written. Obviously air can bounce a lot of ways - but the problem is that all of the 1-zillion input states map to 1-zillion output states already. There's no way to get more output states than input states without giving the molecule more energy - which allows it to be in states it couldn't be in with its previous amount of energy. Therefore the chip must have transferred a bit of energy to the air molecule. There is no other way it could work.

All of this applies *on average*, by the way.

Or here's a different explanation: If you consider a computer that's known to be in 1 of 1000 states, and the air in the room that's in 1 of 1 zillion states, there are 1000 zillion states in total (the cartesian product). If the computer is in 1 of 500 states in the next timestep, there still must be 1000 zillion states for the computer+air system, because the universe is reversible. Therefore the air must be in 1 of 2 zillion states. This doesn't apply to reversible computers, because reversible computers *don't* decrease their number of states in any time step.
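The counting argument above can be written out as a short worked equation. This is only a sketch: Z stands for the comment's "1 zillion" air states, Z' for the number of air states afterwards, and the final two lines are the standard Landauer bound, which the comment does not state explicitly. Halving the computer's states from 1000 to 500 corresponds to erasing one bit.

```latex
\begin{align*}
N_{\text{before}} &= 1000 \cdot Z, \qquad N_{\text{after}} = 500 \cdot Z' \\
N_{\text{after}} = N_{\text{before}} &\implies Z' = 2Z \\
\Delta S_{\text{air}} &= k_B \ln Z' - k_B \ln Z = k_B \ln 2 \\
Q &\ge T \, \Delta S_{\text{air}} = k_B T \ln 2 \approx 2.9 \times 10^{-21}\ \text{J at } T = 300\ \text{K}
\end{align*}
```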

Lockless algorithms for mere mortals

Cyberax — Mon, 07 Sep 2020 20:07:22 +0000

> if you attempt to simulate the structure and evolution of the large-scale universe using MOND or similar theories, the results look very very different from what we observe
This is actually questionable.

Moreover, it has not been conclusively proven that galactic rotation curves can't be explained by general relativity alone. Exact solutions of the Einstein field equations are too simplistic for that, and computer simulation is far too complicated for something like a galaxy. There are people working on this, but it is a very unglamorous area of research.

Lockless algorithms for mere mortals

nix — Mon, 07 Sep 2020 19:19:40 +0000

> might not the gravitational law ALSO change with mass-energy density, so that gravity is stronger in regions of low mass density like interstellar and intergalactic space?

This is the basis of Mordehai Milgrom's MOND (and a number of other similar things, some of which, unlike MOND, are modifications of relativity rather than Newton). They're not widely accepted because while they get galaxies' rotational velocities right (it would be surprising if they didn't), unfortunately, if you attempt to simulate the structure and evolution of the large-scale universe using MOND or similar theories, the results look very very different from what we observe. The people involved in MOND are major figures (not least Jacob Bekenstein, who nobody could claim doesn't know his stuff where gravity is concerned), but MOND remains... problematic.

(But it has nothing to do with Boyle's law or with slowing of anything: MOND isn't even relativistic, let alone quantized, and doesn't mention gravitons at all. Its adjustments to Newton are ad-hoc to make the rotation curves of galaxies come out right.)

> Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?

Massless particles (including photons) travel at c, always: don't think of it as the speed of light, think of it as the speed of propagation of cause and effect, and massless particles max this out. In a medium, the apparent speed of light waves in particular (but no other massless particles) is reduced by coupling to the electromagnetic fields of charged particles in the medium (largely electrons): but this is a group effect on the wave as a whole, and the photons that make up the light are still moving at c!

I suppose the same class of effect in theory could apply to changes in gravity, since gravitational waves should I believe couple to all masses they pass in the same way and slow in regions with higher mass density, but gravity is such a weak force that the effect would be incredibly tiny: it would almost certainly be as unobservable as the graviton itself even if you were hanging out next to a black hole. (So far, nobody has thought up a way to produce a graviton detector which isn't so massive it collapses into a black hole. Gravity is *ridiculously* weak.)

Lockless algorithms for mere mortals

Wol — Mon, 07 Sep 2020 16:46:53 +0000

> Using a massless ultra-low-energy particle to explain away the majority of the mass-energy density in the universe seems to me to be putting the cart so far before the horse that it's in a different solar system.

Except that's not what I'm doing. Sorry if I've mangled my physics, but ?Boyle's Ideal Gas Law only applies to a hot gas where the atoms/molecules bounce off each other. As the gas cools, the Gas Law changes.

I'm saying that, just like with velocities where at slow speeds v1+v2 gives us the obvious answer but c+c gives us the extremely unintuitive answer of c, might not the gravitational law ALSO change with mass-energy density, so that gravity is stronger in regions of low mass density like interstellar and intergalactic space? After all, isn't that one possible explanation for what we observe?

Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?

Cheers,
Wol

Lockless algorithms for mere mortals

nix — Mon, 07 Sep 2020 12:38:49 +0000

Quite. Another way of putting it is that if gravitons exist, gravity is transmitted by *virtual* gravitons. Virtual particles are not real, thus cannot form a Bose-Einstein condensate. (And if real gravitons exist, which is debatable, they are going to be massless, only appear when masses *move* (just like photons in the electromagnetic field only appear when charged particles move), and incredibly low-energy: the best upper bound on the graviton's Compton wavelength is over a light year! Using a massless ultra-low-energy particle to explain away the majority of the mass-energy density in the universe seems to me to be putting the cart so far before the horse that it's in a different solar system. And while a BEC of gravitons is theoretically possible -- a BEC of photons has been produced, after all -- it seems massively unlikely to ever happen in the real universe.)

Also, since both move at lightspeed, it makes about as much sense to say that gravitons can "get colder" as it does to say that light can "get colder".

Lockless algorithms for mere mortals

nix — Mon, 07 Sep 2020 12:27:37 +0000

> That's how black holes evaporate as the Universe gets colder ...

For the record, this is completely wrong. Black hole evaporation is a consequence of the differing appearance of the weave of spacetime to observers very close to, versus far from, the event horizon: a local version of the Unruh effect, as it were (or, rather, the Unruh effect is the same thing as Hawking evaporation applied to accelerating observers rather than observers in a gravitational field). It depends only on the mass of the hole and its event horizon radius. It happens even when the universe is hot: it happens even when the hole is still in the middle of its exploding progenitor star. It's just that until the universe is very old and cold (or the hole is exceptionally small, thus with a high Hawking temperature), the hole will gain more mass through absorbing intercepted microwave background radiation than it loses through Hawking radiation. But it's still losing mass through this effect all the time, nonetheless.

The process has nothing to do with the sign of gravitational potential energy at all.

(Yes, the Unruh effect is named after the same Bill Unruh who used to hang out on uk.comp.os.linux.)

Lockless algorithms for mere mortals

nix — Sun, 06 Sep 2020 17:43:28 +0000

> [2] Actually, it could well be the Maxwell daemon thing again. Sitting there, actively monitoring the room so you know when all the gas molecules are on one side of the room takes an external energy source.

This was figured out a decade or so ago. You don't need the cost of monitoring: even if that's zero, the mere fact that the demon has to make a decision about whether to allow a given molecule through or not is enough to ensure that the entropy of the system (demon + box) always increases, given that the demon's memory capacity is finite such that it eventually has to erase the states of some of its memory (increasing its entropy) in order to make more decisions about whether to let molecules through.

Lockless algorithms for mere mortals

briangordon — Wed, 12 Aug 2020 18:49:16 +0000

It depends on what you're doing with it. If you're implementing a JVM I'm sure the technical details of the spec aren't easy to grapple with, but the practical implications are reasonably intuitive for developers. That's for isolated snippets of code though. In a complex system with lots of moving parts, it can still become ferociously difficult to maintain your invariants, so you usually want to work with abstractions that wrap the low-level primitives to make things as dead simple as possible. For example, the actor model has your code running in single-threaded actors that can only communicate by sending messages to each other - there's concurrent code under the hood, but you don't have to think about it.

Lockless algorithms for mere mortals

farnz — Mon, 03 Aug 2020 10:12:40 +0000

I would say that machine-checkability is necessary but not sufficient. If the rules are so complex to encode that a computer can't verify that you're getting them right, then they are also too complex for a human to get right, and they are certainly too complex for a human code reviewer to reliably check.

That said, this is a minimum requirement, as you can have rules that a computer can reliably verify, but that humans get wrong - see DEC Alpha ordering for the classic example.

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 17:27:37 +0000

No, it is not at all easy! Today's systems are quite complex, and there are variations from one system to other (ostensibly identical) systems. And variations in a single system over time.

There is a lot of existing code to do such measurement, but on the other hand creating your own can be quite instructive.

Lockless algorithms for mere mortals

itsmycpu — Sun, 02 Aug 2020 17:05:26 +0000

Currently working on a way to consistently precision-test execution times of short code fragments, as a side project...not as easy as it may sound. ;-)

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 16:56:19 +0000

Well, if the use of memory_order_relaxed was obvious and trivial, Hans and I probably would not have written P2055R0. :-)

It turns out that in C, all the atomic operations are already volatile, including atomic_load_explicit(), which is then pretty close to READ_ONCE().

In contrast, in C++, atomic_load_explicit() can take either a volatile or a non-volatile pointer, which means that use of C++ atomic_load_explicit() on non-volatile pointers (the common case) will be subject to the transformation called out in N4444. This working paper is for C++ rather than for C, in case you were wondering why it leaves out the C-language case.

And yes, JF and I are already calling for something like READ_ONCE() and WRITE_ONCE() in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p...
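To make the READ_ONCE()/WRITE_ONCE() comparison concrete, here is a minimal sketch in C. The MY_READ_ONCE and MY_WRITE_ONCE macros are hypothetical stand-ins loosely modeled on the Linux kernel's, not its actual definitions, and they assume the GCC/Clang `__typeof__` extension. The variable and function names are also invented for illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for READ_ONCE()/WRITE_ONCE(): the volatile cast
 * asks the compiler to perform the access as written instead of fusing,
 * repeating, or eliding it. __typeof__ is a GCC/Clang extension. */
#define MY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int plain_word;          /* ordinary variable, accessed via the macros */
static atomic_int shared_word;  /* C11 atomic, accessed via <stdatomic.h> */

int read_plain_once(void)
{
    return MY_READ_ONCE(plain_word);
}

void write_plain_once(int v)
{
    MY_WRITE_ONCE(plain_word, v);
}

/* Relaxed atomic load: no ordering against other accesses is requested,
 * only atomicity of this one load. */
int read_atomic_relaxed(void)
{
    return atomic_load_explicit(&shared_word, memory_order_relaxed);
}
```

A loop such as `while ((tmp = read_plain_once()) != 0)` then performs a fresh load on every iteration.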

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 16:45:21 +0000

Try it and measure the results!

Of course, the results will likely vary not only across CPU families, but also within CPU families.

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 16:43:30 +0000

Hans and I were expecting people to refer to the cited sections of the working paper "N2153: A simple and efficient memory model for weakly-ordered architectures", which covers this in detail, including step 3. But yes, that expectation might not apply to people unfamiliar with the committee. Plus N2153 was written before the C11 atomic API had been finalized, so some mapping is required to understand it.

I have therefore expanded this section to make step 3 explicit. Thank you for pointing this out.

There will be a P2055R1 at some point, but in the meantime you can access the LaTeX at https://github.com/paulmckrcu/WG21-relaxedguide.git.

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 15:39:28 +0000

My fond hope is that things like the Linux-kernel memory model can shorten that time in at least some cases. ;-)

Lockless algorithms for mere mortals

PaulMcKenney — Sun, 02 Aug 2020 15:35:27 +0000

No two ways about it, analogies with relativistic and quantum physics are way more cool than analogies against Newtonian physics. ;-)

Lockless algorithms for mere mortals

jezuch — Sun, 02 Aug 2020 07:02:32 +0000

It was probably very naive at the beginning, like a lot of Java. But I think the interesting thing is that, at least as I understand it, it's specified in very different terms than the C model, which uses esoteric jargon, while the Java model tried to make it at least understandable for mere mortals. Not sure how well it succeeds - usually you don't have to go into this level of detail unless you want to do something unusual, but at that point you're clearly very clever, so... ;)

Lockless algorithms for mere mortals

anton — Sat, 01 Aug 2020 17:23:44 +0000

Machine-checkability does seem important.

Or one might consider such a requirement (as well as the article) to be an indication that the concepts are too difficult to use. In other cases (e.g., wrt. page colouring), Linus Torvalds has written that he targets hardware that performs well without such complications, and that other hardware deserves what it gets. I think that is a sensible approach for memory ordering as well. Have a model that's easy to understand and performs ok on nice hardware (not sure if nice hardware already exists, but weak consistency certainly does not look nice to me); then hardware designers have an incentive to make their hardware nicer.

It seems to me that all this difficult-to-program memory ordering is there because multi-processors originally come from the supercomputing area, where hardware is still more expensive than software, and hardware designers can get away with making hardware that's hard to program correctly and efficiently.

Lockless algorithms for mere mortals

anton — Sat, 01 Aug 2020 16:35:51 +0000

All the data centers are at rest wrt each other, and wrt the events they observe (Earth rotation may be an issue, but not at the ms resolution they work with). Differences in the gravity field are minuscule, but even if they were significant, they would only change the time scale, not the order of events; and scale can be corrected by scaling the measured time to correct for this effect.

Lockless algorithms for mere mortals

itsmycpu — Sat, 01 Aug 2020 14:32:06 +0000

To complete my thoughts about reference counting above:

If the reference counter is incremented relaxed, (for common use cases) it implies that read/write access is synchronized separately. So I'd expect that decrementing can be relaxed as well if the thread that encounters a zero reference count still has read/write access (otherwise it should be enough to use a simple load-acquire on the synchronization variable).

However the (separate) synchronization of read/write access, which can't be relaxed, probably makes the gain look small in comparison, even on those platforms where it is larger. Which makes me wonder how large it is.

Lockless algorithms for mere mortals

itsmycpu — Sat, 01 Aug 2020 12:47:34 +0000

Something I noticed in your article http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p...
In 2.3, it describes an atomic_thread_fence with acquire:

> The atomic_thread_fence() function can be used to order multiple sets of accesses, for example, by replacing a series of acquire loads with relaxed loads followed by an atomic_thread_fence(memory_order_acquire)

In its shortness, this sentence suggests the sequence:
1: A series of relaxed loads
2: An acquire fence

However reading https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence suggests:

1: Reading a single synchronization variable (probably atomic relaxed, as in the example)
2: An acquire fence
3: A series of relaxed or non-atomic loads.

If you don't mind me pointing it out. Probably more like a typo.
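The three-step sequence described above (one relaxed load of a synchronization variable, an acquire fence, then the data loads) might look like the following in C11. This is a sketch under stated assumptions: the names `ready`, `payload_a`, `payload_b`, `publish` and `try_consume` are invented for illustration, and the writer side uses a release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload_a, payload_b;  /* plain, non-atomic data */
static atomic_int ready;          /* the single synchronization variable */

/* Writer: fill in the data, then publish it with a release store. */
void publish(int a, int b)
{
    payload_a = a;
    payload_b = b;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: returns 1 and fills *a and *b if the data has been published,
 * otherwise returns 0 and leaves them untouched. */
int try_consume(int *a, int *b)
{
    assert(a != NULL && b != NULL);

    /* 1: relaxed load of the synchronization variable */
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;

    /* 2: acquire fence, ordering the load above before the loads below */
    atomic_thread_fence(memory_order_acquire);

    /* 3: plain loads of the published data */
    *a = payload_a;
    *b = payload_b;
    return 1;
}
```

The fence pairs with the writer's release store only when the relaxed load actually observed the stored value, which is why the early return comes before the fence.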

Lockless algorithms for mere mortals

pbonzini — Sat, 01 Aug 2020 09:25:54 +0000

Paul, you don't know how much it means to me that you confirm that my understanding makes sense! It only took 10 years. :-)

Thank you very much!

Lockless algorithms for mere mortals

itsmycpu — Sat, 01 Aug 2020 04:24:26 +0000

Partial answer: In my most critical use case, I am using the reference counter also as a synchronization point, so I need acquire/release there. However in many cases making the reference count atomic will perhaps simply be a way to avoid the need to get exclusive write access.

For example if thread A already increases the count before passing the pointer, in advance of decreasing its own reference, of course relaxed is sufficient. If thread B is the one to increase the count, I'd think there needs to be a direct or indirect synchronization point between B's increment and A's decrement. Otherwise each increment is atomic, but one can't be sure which one happens first. However that synchronization point doesn't have to be the counter itself. If it isn't, then the question arises if decrementing can't be relaxed as well. :) Especially if the count reaching zero can be taken as an indication that all read/write access has already been given up.

So yes, I think that's a valid example for relaxed. Had to think about it, though. :-)

Here http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n...
you wrote that a compiler is allowed to replace

    while (tmp = atomic_load_explicit(a, memory_order_relaxed))
        do_something_with(tmp);

with

    while (tmp = atomic_load_explicit(a, memory_order_relaxed)) {
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
    }

If that is still the case according to current C/C++ definitions, why don't you argue that the definition of memory_order_relaxed be refined to READ_ONCE and WRITE_ONCE? Or are you already?

Lockless algorithms for mere mortals

ras — Sat, 01 Aug 2020 04:11:22 +0000

> In other words, you can just as easily argue that the present macro state is more likely to have evolved from past macro states which had more microstates.

Actually no, you can't. In the definition of entropy the number of microstates is fixed and you are measuring how many of those microstate arrangements look like a given macro state.

That aside, no one is disputing there is a thermodynamic arrow of time. The argument being made is that all transitions between microstates are reversible, lossless in terms of information and equally likely in both directions. It's possible for that to be true for micro states and yet for there to be a preferred direction for the evolution of macro states. It's not only possible, it's what happens.

It's not only what happens, there is no mystery as to why it happens. For example, take a go board covered with black and white stones and assume we are near blind. If all black stones are on one side of the board and white on the other we can see that, so black/white is a macro state, but the rest of the possible microstate arrangements just look like grey to us, so grey is another macro state. Black/white corresponds to a few micro states where the black and white stones are poorly mixed. I think there are 361! / 181! possible board arrangements or about 10^431. Most of those states will look just grey - a random mixture of black and white. If the pieces are moving randomly, all of the microstates will be equally likely. This means the odds of seeing black/white instead of grey look to be about 1 in 10^245 if you assume 10% can be out of place. [0] If your go board starts in the black / white macro state, then is allowed to evolve through random 100% completely reversible symmetric swaps you will see it change to grey, and stay that way.

That is the thermodynamic arrow of time. Yet it arose from a completely reversible process. Microstates and macro states are both reasonable ways of looking at the system. When we say time is reversible, we are talking about the former.

[0] Caveat: I am deriving the formulas for these probabilities in my head, as I write this. They are almost certainly wrong.
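As a rough cross-check of the count above: assuming a full board with 181 black and 180 white stones (an assumption, since the comment does not fix the split), the number of distinct arrangements is a binomial coefficient rather than 361!/181!. Stirling's approximation gives its size:

```latex
\[
W = \binom{361}{181} = \frac{361!}{181!\,180!}
  \approx \frac{2^{361}}{\sqrt{\pi \cdot 361 / 2}}
  \approx 2 \times 10^{107}
\]
```

The qualitative argument is unchanged by the smaller number: the well-sorted black/white macro state still accounts for a vanishingly small fraction of the microstates.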

Lockless algorithms for mere mortals

HenrikH — Sat, 01 Aug 2020 02:26:47 +0000

Two full mb()s for inserting a single item in a linked list will probably be slower than just using a mutex for the whole operation. As stated above, going lock-less is only of interest if you are after the maximum possible performance, and calling mb() at each step is not that.
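For contrast, here is a minimal sketch of publishing a list node with a single release store, paired with an acquire load on the reader side, instead of full barriers. It is not necessarily the code being discussed. It assumes a single writer and nodes that are never removed, and the names `node`, `head`, `push_front` and `sum_list` are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;   /* written once, before the node is published */
};

static _Atomic(struct node *) head;

/* Single writer assumed: initialize the node completely, then make it
 * reachable with one release store to head. */
void push_front(struct node *n, int value)
{
    assert(n != NULL);
    n->value = value;
    n->next = atomic_load_explicit(&head, memory_order_relaxed);
    atomic_store_explicit(&head, n, memory_order_release);
}

/* Reader: the acquire load of head orders the later plain reads of each
 * node's fields after the writer's initialization of those fields. */
int sum_list(void)
{
    int sum = 0;
    struct node *p = atomic_load_explicit(&head, memory_order_acquire);

    for (; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

With multiple writers, the store to head would need to become a compare-and-swap loop, and safe node removal needs considerably more machinery.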

Lockless algorithms for mere mortals

ras — Sat, 01 Aug 2020 02:09:19 +0000

Defending myself, my point wasn't that you couldn't explain what happens in terms of Newtonian physics. It is that personally I find it a pretty unsatisfying explanation. In Newtonian physics time and information are a given, and causality is an assumption - usually just an unstated assumption. If you are going to rely on unstated beliefs to derive something you may as well say God did it. In fact God is better in some ways - it at least makes the limits of your understanding plain.

But, Newtonian physics is an emergent property of other sets of rules. In those rules time, information, and causality are fundamental building blocks. Information in particular goes pretty deep - for example as others have stated here the definition of thermodynamic entropy is actually not about all possible arrangements of particles because there is no way to tell if you have swapped two identical particles, so such rearrangements can't be counted. How on earth does the Universe even know I can't distinguish between them - it's not as if it's looking through my eyes. But it does know, because if I don't discount the identical looking states when I calculate the energy I can extract from a thermodynamic inequality, I get the wrong answer. It goes even deeper in quantum mechanics. The different outcomes when you do and don't know something become positively bizarre - plainly obvious interference patterns disappear just because I know something. [0]

It was partly observations like these, which are utterly inexplicable by Newtonian physics, that led to the development of quantum mechanics and, at the other end of the scale, relativity. If you are going to start making analogies between computer programmers' struggles with what can and can't be known and the order in which things must happen (aka causality) and what physics has to say on the matter, you are far better off starting with physics theories that feature those things front and centre, rather than one that ignores them completely.

[0] To this day I've yet to find an answer as to why these bizarre quantum mechanical effects happen, so this new physics is not _that_ satisfying. Instead "all" the physicists have done is build mathematical models of our universe that assume it happens, and those models are insanely accurate. So there is no doubt you must take the relationship between time and information, and causality, into account. But the mechanism driving that relationship appears to be a total mystery. At least that's my understanding, but I am just a fascinated bystander.

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 22:47:00 +0000

There are many use cases for which memory_order_relaxed is reliable and useful, but there are dragons as well. Part of the problem is that C and C++ do not respect dependencies (with a very few exceptions), so the dragons are much more difficult than they are for things like READ_ONCE() and WRITE_ONCE() in the Linux kernel.

On the reference count increment, the trick is that although the increment can be relaxed, the act of passing it somewhere else must be properly ordered. This proper ordering is often supplied by something else, though, such as thread-creation primitives. Whether this is worthwhile can depend on the CPU.
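A minimal sketch of the pattern described here, assuming a hypothetical `struct object` with an embedded counter. The increment is relaxed. The decrement below uses the conservative release-plus-acquire-fence form, not the fully relaxed decrement speculated about elsewhere in the thread. Handing the pointer to another thread is assumed to be ordered by something else, such as a thread-creation primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct object {            /* hypothetical reference-counted object */
    atomic_int refs;
    int data;
};

void obj_init(struct object *o, int data)
{
    atomic_init(&o->refs, 1);   /* the creator holds the first reference */
    o->data = data;
}

/* The caller already holds a reference, so the count cannot reach zero
 * concurrently and a relaxed increment is enough. */
void obj_get(struct object *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Returns true if the caller dropped the last reference and may now
 * tear the object down. */
bool obj_put(struct object *o)
{
    /* release: this thread's earlier accesses to *o are ordered before
     * the decrement */
    int old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release);

    assert(old > 0);
    if (old != 1)
        return false;

    /* acquire: the teardown sees every other thread's earlier accesses */
    atomic_thread_fence(memory_order_acquire);
    return true;
}
```

Whether the relaxed increment buys anything measurable over a stronger ordering depends on the CPU, as noted above.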

Lockless algorithms for mere mortals

itsmycpu — Fri, 31 Jul 2020 22:09:42 +0000

The PDF is an interesting read (though it will take me a few days to go through it), thank you for the reference. Also Dekker synchronization so far escaped my attention, at least by name, I will be looking into it.

My first impression regarding the described use cases of memory_order_relaxed is that it is worse than I thought. Not because I would doubt that there are algorithms that make good use of it (like seqLock), but because it seems really difficult to navigate the potential problems in most cases.

For example the use in reference counting increments. Even if decrements are non-relaxed, it isn't immediately obvious to me that increments can be relaxed. Although I am well aware that in the handover of a reference from thread A to thread B there needs to be a moment where an original reference is still in place, and that the atomic value itself is sequential for RMW operations like increment and decrement (which are already expensive enough), I would not be sure that thread B necessarily needs to execute the necessary barriers after incrementing the reference relaxed, and before starting to load values from the referenced object. This would seem to mean that the compiler (or the hardware) is free to move the effective load time to before the reference increment becomes effective. So the logic that prevents this is probably very tricky (at least from my point of view, unless I'm simply missing some principle that provides for verification to become much easier). My guess would be that the necessary barriers might be involved in assuring thread A that it can release its reference. But I'm not aware of a theoretical construct that would guarantee this expectation to always apply, in general, for unknown use cases.

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 21:27:48 +0000

Music to my ears! And please don't keep your views secret!!! It is always better when I am not the only one defending non-memory_order_seq_cst atomics. :-)

Lockless algorithms for mere mortals

itsmycpu — Fri, 31 Jul 2020 21:11:20 +0000

On x86, memory_order_seq_cst is not of practical interest because of the performance impact of the mfence instruction required in the implementation of the store operation.

Maybe it's a thing for college beginner class. ;)

Lockless algorithms for mere mortals

NYKevin — Fri, 31 Jul 2020 21:06:43 +0000

> Entropy increases because systems tend towards the macrostates which have the most microstates (again, purely because of statistics).

The problem with this argument is that it is time-symmetric. In other words, you can just as easily argue that the present macrostate is more likely to have evolved from past macrostates which had more microstates. Observation tells us that the time-reversed argument is empirically wrong (because we observe entropy increasing as a function of time), which is a bit of a problem because it appears to be symmetric with the (empirically correct) non-reversed argument. The only (obvious) way to break this symmetry is to assert a boundary condition of low entropy at or near the beginning of the universe. This boundary condition, IMHO, is the real mystery: Why did the early universe have such low entropy? I don't think we have a working answer to that question (yet).

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 18:02:50 +0000

Please feel free to send a patch implementing your reordering suggestions.

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 18:01:15 +0000

We recently had to defend memory_order_relaxed in the committee. Here is Hans Boehm's and my A Relaxed Guide to memory_order_relaxed [PDF] which was the basis of that defense.

The “Dekker synchronization” refers to the core of this locking algorithm, which as Paolo said is set up so that if two threads each do a store to one variable followed by a load from the other, at least one of the threads sees the other's store. This is used heavily in the Linux kernel, for example, to correctly resolve wait/wakeup races.
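The store-then-load pattern can be sketched in C11 as follows. The names `x`, `y`, `thread0` and `thread1` are invented for illustration, and each function is meant to run in its own thread. With memory_order_seq_cst on all four accesses, the outcome where both functions return 0 is forbidden. With weaker orderings it is allowed.

```c
#include <stdatomic.h>

static atomic_int x, y;   /* both start at zero */

/* Store to x, then load from y. */
int thread0(void)
{
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_seq_cst);
}

/* Store to y, then load from x. */
int thread1(void)
{
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_seq_cst);
}
```

In a wait/wakeup setting, one side typically plays the waiter (announce that it is about to sleep, then check the condition) and the other the waker (set the condition, then check for a sleeper). The guarantee that at least one side sees the other's store is what prevents a lost wakeup.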

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 17:55:12 +0000

Having per-subsystem documentation of the memory-ordering use cases of particular importance to that subsystem makes a lot of sense to me! Many of the five you call out are heavily used, but even given common documentation of a given use case, many subsystems might still want to cover special cases or how that use case interacts with the rest of the subsystem.

Naming is fun. Yet another common name for Dekker synchronization is store buffering. We probably need to list the common names for each use case in the LKMM documentation.

Good point on pthread_create() and pthread_join(). The same applies to things like mod_timer() and the resulting handler function.

Your point about C11 memory_order_seq_cst does me good, given how difficult it was for me and a few others to get the committee to accept that the standard should include anything other than sequential consistency. ;-)

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 17:14:17 +0000

The complexity of memory ordering can be fully explained with normal Newtonian physics. For example, you can make good analogies between memory misordering and misordering of messages carried by sound waves.

Simple propagation delay is more than enough to make all these apparent reorderings happen.

But it is of course often more fun to think in terms of more trendy views of physics. ;-)

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 16:56:48 +0000

One of the motivations for LKMM was the increasing difficulty of dealing with memory-barriers.txt. In fact, LKMM is intended to be an automated replacement for large portions of memory-barriers.txt.

Lockless algorithms for mere mortals

PaulMcKenney — Fri, 31 Jul 2020 15:30:14 +0000

Thank you all for the bug report, and work has started on improving the documentation.

I would never have believed this two years ago, but it just might be possible to get to a point where someone reading only the tools/memory-model documentation might have a fighting chance of being able to make good use of LKMM. Here is hoping, anyway...

Lockless algorithms for mere mortals

joib — Fri, 31 Jul 2020 11:01:47 +0000

I'm saying we don't know. The standard model in particle physics encompasses every known interaction, except gravity. We don't have a functioning theory of quantum gravity. That doesn't mean that gravity isn't quantised, but it could alternatively mean that gravity is fundamentally different from the other interactions (say, describing instead the curvature of space-time like in GR). We just don't know yet.

Gravitational waves neither prove nor disprove that gravity is quantised, just as a plethora of electromagnetic wave phenomena neither prove nor disprove that the electromagnetic field is quantised.

And to be pedantic, a photon is a quantised excitation in the electromagnetic field (not the field itself). An electron is a quantised excitation in the electron-positron field. Sure, it would be nice and symmetrical if there analogously would be a "graviton". Unfortunately nature doesn't much care for our notions of beauty and symmetry, and so far has resisted our efforts to crack this particular nut.

Lockless algorithms for mere mortals

Wol — Fri, 31 Jul 2020 08:47:10 +0000

Are you saying that gravity is the only known wave without an associated particle?

A photon is an electro-magnetic wave. An electron is a wave (not sure what in :-). Symmetry says surely a gravitational wave has a graviton?

Cheers,
Wol