Lockless algorithms for mere mortals

Posted Jul 28, 2020 22:45 UTC (Tue) by excors (subscriber, #95769)
In reply to: Lockless algorithms for mere mortals by warrax
Parent article: Lockless algorithms for mere mortals

Machine-checkability does seem important. Recently I wanted to implement the widely-used Chase-Lev lock-free work-stealing queue in C++, but the original paper describes it with Java and completely ignores memory barriers. I found another paper with an optimised version using C11 atomic memory ordering, which spends five pages describing a proof of correctness, so it sounded like the authors knew what they were talking about and I implemented that. (The "optimised" part is valuable - their benchmarks show that using relaxed memory barriers where it's safe can double performance compared to using sequentially-consistent barriers everywhere.)

Then later I found a third paper about a tool that checks C/C++ atomic code against a specification, which mentions their tool found a bug in that other paper's C11 implementation. I think I ended up with the correct code, but it's really very tricky even when trying to follow supposed experts on a pretty simple algorithm, and it would be very nice if those checking tools were in more widespread use.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 14:41 UTC (Wed) by bgamari (subscriber, #102670) [Link] (1 responses)

> Then later I found a third paper about a tool that checks C/C++ atomic code against a specification, which mentions their tool found a bug in that other paper's C11 implementation.

Do you have a reference to this paper? Indeed I have used the C11 paper's Chase-Lev implementation in the past so I'm very likely missing a barrier.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 15:14 UTC (Wed) by excors (subscriber, #95769) [Link]

There's a corrected version of the buggy paper with C11 atomics: Correct and efficient work-stealing for weak memory models. The bug in the original published version was found by CDSChecker and CDSSpec, and their source repository has the original and bugfixed versions for comparison.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 18:18 UTC (Wed) by jezuch (subscriber, #52988) [Link] (3 responses)

> but the original paper describes it with Java and completely ignores memory barriers.

Note that Java has a formal memory model (that's how they can do anything performant without horrible memory unsafety), so it's likely that these memory barriers were not ignored, just... assumed. (But yeah, they should have at least mentioned which features of the memory model they rely on.)

Lockless algorithms for mere mortals

Posted Jul 31, 2020 5:51 UTC (Fri) by ncm (guest, #165) [Link] (2 responses)

Originally Java's memory model was unimplementable on Power and ARM. They had to doctor it, and it requires more attention than it once did. It is very possible that not everybody has caught onto the actual requirements. There is probably a great deal of Java, and C and C++ and everything else that actually works reliably only on x86. If there.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 7:02 UTC (Sun) by jezuch (subscriber, #52988) [Link] (1 responses)

It was probably very naive at the beginning, like a lot of Java. But I think the interesting thing is that, at least as I understand it, it's specified in very different terms than the C model, which uses esoteric jargon, while the Java model tried to make it at least understandable for mere mortals. Not sure how well it succeeds - usually you don't have to go into this level of detail unless you want to do something unusual, but at that point you're clearly very clever, so... ;)

Lockless algorithms for mere mortals

Posted Aug 12, 2020 18:49 UTC (Wed) by briangordon (guest, #135428) [Link]

It depends on what you're doing with it. If you're implementing a JVM I'm sure the technical details of the spec aren't easy to grapple with, but the practical implications are reasonably intuitive for developers. That's for isolated snippets of code though. In a complex system with lots of moving parts, it can still become ferociously difficult to maintain your invariants, so you usually want to work with abstractions that wrap the low-level primitives to make things as dead simple as possible. For example, the actor model has your code running in single-threaded actors that can only communicate by sending messages to each other - there's concurrent code under the hood, but you don't have to think about it.

Lockless algorithms for mere mortals

Posted Aug 1, 2020 17:23 UTC (Sat) by anton (subscriber, #25547) [Link] (1 responses)

Machine-checkability does seem important.

Or one might consider such a requirement (as well as the article) to be an indication that the concepts are too difficult to use. In other cases (e.g., wrt. page colouring), Linus Torvalds has written that he targets hardware that performs well without such complications, and that other hardware deserves what it gets. I think that is a sensible approach for memory ordering as well. Have a model that's easy to understand and performs ok on nice hardware (not sure if nice hardware already exists, but weak consistency certainly does not look nice to me); then hardware designers have an incentive to make their hardware nicer.

It seems to me that all this difficult-to-program memory ordering is there because multi-processors originally come from the supercomputing area, where hardware is still more expensive than software, and hardware designers can get away with making hardware that's hard to program correctly and efficiently.

Lockless algorithms for mere mortals

Posted Aug 3, 2020 10:12 UTC (Mon) by farnz (subscriber, #17727) [Link]

I would say that machine-checkability is necessary but not sufficient. If the rules are so complex to encode that a computer can't verify that you're getting them right, then they are also too complex for a human to get right, too, and they are certainly too complex for a human code reviewer to reliably check.

That said, this is a minimum requirement, as you can have rules that a computer can reliably verify, but that humans get wrong - see DEC Alpha ordering for the classic example.