LWN: Comments on "Lockless patterns: full memory barriers" https://lwn.net/Articles/847481/
This is a special feed containing comments posted to the individual LWN article titled "Lockless patterns: full memory barriers".
Mon, 03 Nov 2025 22:00:33 +0000

Lockless patterns: full memory barriers https://lwn.net/Articles/852875/ pbonzini
<div class="FormattedComment"> It's not intuitive at all, but only one memory barrier matters in each of the two cases. But both are needed (separately) to ensure that x = 0 &amp;&amp; y = 0 is not possible.<br> </div>
Fri, 16 Apr 2021 07:01:23 +0000

https://lwn.net/Articles/849345/ firolwn
<div class="FormattedComment"> After reading the 'Multicopy atomicity' section of the kernel's Documentation/memory-barriers.txt, I realize that I was wrong and that no additional smp_mb() is needed.<br> --<br> Firo<br> </div>
Mon, 15 Mar 2021 14:35:53 +0000

https://lwn.net/Articles/849227/ firolwn
<div class="FormattedComment"> Hi Paolo, great article. I think maybe you forgot to add 'smp_mb();' to the two diagrams around the line starting with 'Due to transitivity'.<br> --<br> Firo<br> </div>
Fri, 12 Mar 2021 17:03:25 +0000

https://lwn.net/Articles/848883/ PaulMcKenney
<div class="FormattedComment"> Or I haven't yet had a chance to read it thoroughly, your choice.
;-)<br> </div>
Tue, 09 Mar 2021 20:57:34 +0000

https://lwn.net/Articles/848880/ pbonzini
<div class="FormattedComment"> If "that's all you have to say about that" (as Forrest Gump would put it), then I guess you didn't find any mistakes, yay!<br> </div>
Tue, 09 Mar 2021 20:31:23 +0000

https://lwn.net/Articles/848764/ jcm
<div class="FormattedComment"> Thing is, you don’t even need to flush the sucker, just track age and ensure that they’ve all passed through. You can blow right through every barrier without cost, provided you are willing to track enough cache lines in the process. I have been doing a lot of thinking lately about renaming SPRs and speculating through serializing instructions too. There’s no reason you couldn’t (provided you tracked everything, were willing to pay the cost, and could also precisely unwind the state to prevent side-channel crumbs, which might force you to add e.g. a side buffer). This has been dancing in my head for the past week solid.<br> </div>
Tue, 09 Mar 2021 05:00:41 +0000

https://lwn.net/Articles/848757/ PaulMcKenney
<div class="FormattedComment"> Nice gentle introduction to the reads-from relation! ("~~~~~~~~") ;-)<br> </div>
Tue, 09 Mar 2021 01:41:25 +0000

https://lwn.net/Articles/848735/ jcm
<div class="FormattedComment"> Yeah, I did afterwards. I agree, thanks :)<br> </div>
Mon, 08 Mar 2021 20:56:41 +0000

https://lwn.net/Articles/848612/ pbonzini
<div class="FormattedComment"> Read the paragraph again, it mentions both scenarios.
:)<br> </div>
Sun, 07 Mar 2021 20:56:55 +0000

https://lwn.net/Articles/848604/ jcm
<div class="FormattedComment"> Btw, in the SB example, it's not that the store buffer is forwarding locally, "which would return zero"; it's that each CPU is writing into its local store buffer while reading the other variable from the initial state. The store buffer lets each store become visible to other processors later than it becomes visible to the local one.<br> </div>
Sun, 07 Mar 2021 16:10:35 +0000

https://lwn.net/Articles/848603/ jcm
<div class="FormattedComment"> x86 processors maintain the illusion of TSO ordering through the use of a MOB (Memory Ordering Buffer), which not only tracks the cache lines for ownership/invalidation but also replays at retirement if necessary in order to maintain ordering. So, for example, a load might be performed twice in order to ensure it retires correctly.<br> </div>
Sun, 07 Mar 2021 16:02:25 +0000

https://lwn.net/Articles/848545/ pbonzini
<div class="FormattedComment"> Yep, I even mentioned (with a little bit of simplification) what Intel does for TSO. To some extent it should be possible to do the same for full barriers, by delaying invalidate messages and keeping them buffered until the store buffer has been flushed. After all, in most cases there is no race, and therefore the effect of the barrier is kind of invisible.<br> </div>
Sat, 06 Mar 2021 17:44:20 +0000

https://lwn.net/Articles/848541/ jcm
<div class="FormattedComment"> The mental rabbit hole is separating the observed order from reality.
Reality might involve some combination of invalidation-queue and store/load-buffer tracking, but it might involve something very different :)<br> </div>
Sat, 06 Mar 2021 15:28:07 +0000

https://lwn.net/Articles/848540/ jcm
<div class="FormattedComment"> One of the things that has been keeping me up at night recently is thinking about speculating through barriers. You do it by tracking cache ownership/state transitions as you go through, and rolling back as needed. The easiest way to think about it is probably as something close to the apparatus needed for transactions. And there are a few papers in this area. So anyway, the point is that draining various buffers isn’t the only way to pull this off. Think of how Intel does their MOB for TSO by tracking updates and replaying too.<br> </div>
Sat, 06 Mar 2021 15:25:07 +0000

https://lwn.net/Articles/848456/ pbonzini
<div class="FormattedComment"> Yes, it's only about the observed order, and in different ways depending on whether the tricks come from the compiler or the processor.<br>
<p> The stroke of genius of C++11 compared to e.g. the Java memory model was to treat the compiler+processor combo as weak memory ordering even if the underlying architecture is TSO.
I don't think any compiler makes much use of the leeway that it's given, but it does make for a nice and consistent model at the language level.<br>
<p> I do prefer the LKMM and its clear foundation on the behavior of hardware (which I tried to convey in this article) to C++11's slightly too handwavy "just use sequential consistency unless you need something else".<br> </div>
Fri, 05 Mar 2021 19:06:47 +0000

https://lwn.net/Articles/848454/ jcm
<div class="FormattedComment"> A key thing to remember with memory barriers is that they are only about the observed order of memory operations. An implementation is actually free to do very different things with those barriers (including speculating right through them, which you can do as long as there is no circumstance under which someone can observe incorrect ordering).<br> </div>
Fri, 05 Mar 2021 18:50:36 +0000
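
For readers following the store-buffering (SB) discussion in this thread, here is a minimal sketch of the pattern as a toy Python model. It is not how real hardware or the LKMM works: the function name `sb_outcomes`, the state layout, and the choice to model a full barrier as "wait until the local store buffer has drained" are all assumptions made for illustration. As the comments above point out, draining is only one way to implement a barrier; only the observable outcomes matter. The model explores every interleaving of two CPUs that each store 1 to their own variable and then load the other one.

```python
def sb_outcomes(use_barrier):
    """Return every (r0, r1) pair reachable in a toy store-buffer model.

    CPU 0 runs: x = 1; [mb]; r0 = y
    CPU 1 runs: y = 1; [mb]; r1 = x
    """
    X, Y = 0, 1  # indices into the shared-memory tuple

    def program(mine, other):
        ops = [("store", mine)]
        if use_barrier:
            ops.append(("mb", None))
        ops.append(("load", other))
        return ops

    progs = [program(X, Y), program(Y, X)]

    def put(tup, idx, value):
        # Return a copy of tup with position idx replaced by value.
        return tup[:idx] + (value,) + tup[idx + 1:]

    # State: program counters, per-CPU store buffers (FIFO of variables
    # waiting to be written with 1), shared memory, result registers.
    start = ((0, 0), ((), ()), (0, 0), (None, None))
    seen, stack, outcomes = set(), [start], set()

    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state

        if all(pcs[c] == len(progs[c]) for c in (0, 1)):
            outcomes.add(regs)  # both CPUs finished; record the loads
            continue

        for cpu in (0, 1):
            # The oldest buffered store may drain to memory at any time.
            if bufs[cpu]:
                var = bufs[cpu][0]
                stack.append((pcs, put(bufs, cpu, bufs[cpu][1:]),
                              put(mem, var, 1), regs))

            if pcs[cpu] == len(progs[cpu]):
                continue
            op, var = progs[cpu][pcs[cpu]]
            next_pcs = put(pcs, cpu, pcs[cpu] + 1)
            if op == "store":
                # Stores go into the local buffer, not straight to memory.
                stack.append((next_pcs,
                              put(bufs, cpu, bufs[cpu] + (var,)), mem, regs))
            elif op == "load":
                # Forward from the local buffer if possible, else read memory.
                value = 1 if var in bufs[cpu] else mem[var]
                stack.append((next_pcs, bufs, mem, put(regs, cpu, value)))
            elif op == "mb" and not bufs[cpu]:
                # Toy barrier: may only complete once the buffer is empty.
                stack.append((next_pcs, bufs, mem, regs))

    return outcomes


print(sorted(sb_outcomes(use_barrier=False)))  # (0, 0) is reachable
print(sorted(sb_outcomes(use_barrier=True)))   # (0, 0) is not reachable
```

In this model neither CPU ever loads the variable it stored, so local forwarding never triggers; the both-zero outcome comes purely from each store sitting in its buffer while the other CPU reads the initial value from memory.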