An instruction-level BPF memory model

Posted Jun 6, 2024 13:53 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)
In reply to: An instruction-level BPF memory model by foom
Parent article: An instruction-level BPF memory model

That most certainly would be a novel claim. ;-)

But the question is instead: "In ARMv8's hazard-ordered-before (HOB), does the fact that R2 cannot read a value older than that read by R1 impose additional ordering consequences?"

The answer for PowerPC is "no". So to the extent that HOB does impose additional ordering consequences, the BPF memory model needs to avoid imposing similar ordering consequences. This is not a surprise, given that ARMv8 is other-multicopy atomic and PowerPC is non-multicopy atomic. So we should expect that there are constraints in the ARMv8 memory model that do not apply to the BPF memory model. This is all OK, because it is OK for hardware to provide stronger ordering than required, with x86 being the usual poster child for this extra strength.

For more detail, please see slides 59-77 of the presentation at https://drive.google.com/file/d/1zLkpBVL1chDfyqZATOyKb_1Q...

Perhaps one of your questions is addressed by slide 64 and later. The initial scenario involves only one variable, but the added scaffolding required to test for additional ordering does add another variable. Does that help?

An instruction-level BPF memory model

Posted Jun 6, 2024 16:48 UTC (Thu) by comex (subscriber, #71521) [Link] (2 responses)

So Daroc indeed described this incorrectly in the article.

Which is quite understandable! The property actually being tested is subtle enough that even with the benefit of reading the slides repeatedly and then reviewing the ARM architecture reference manual, I still don't really understand it :)

An instruction-level BPF memory model

Posted Jun 6, 2024 17:06 UTC (Thu) by daroc (editor, #160859) [Link] (1 responses)

Thank you for catching that; I had to read through Paul's slides again myself to figure out what was going on.

I think that I have made a correct correction to the article.

An instruction-level BPF memory model

Posted Jun 6, 2024 17:10 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Looks good to me!

An instruction-level BPF memory model

Posted Jun 6, 2024 18:43 UTC (Thu) by Alan.Stern (subscriber, #12437) [Link] (3 responses)

The situation is very confusing; a more detailed explanation may help.

For one thing, consider the description of Hazard Ordered Before (HOB) on slide 60 of your talk. The slide isn't entirely clear; does it require that R1 is coherence-before W2, or does it require that R1 executes before R3? The first meaning (which seems to be the one you're using) follows directly from the read-read coherence rule, so it is a consequence of per-variable sequential consistency. Since you undoubtedly want the BPF memory model to include this, there's no need to consider HOB. And to be crystal clear, PowerPC does not allow an earlier read to observe a later value than a later read of the same variable in the same thread.

Adding to the confusion is the fact that, despite its title, the litmus test on slide 69 does not in fact test for HOB. That is, the condition in the "exists" clause isn't 0:r3=1 /\ 0:r5=0. The whole added business about y and the memory barriers is irrelevant for HOB. (Compare it to the Linux kernel code fragment on slide 62.) If you wanted to discuss the R pattern in slides 63-68, you shouldn't have gotten it all mixed up with HOB. (That could well be what confused Daroc.) It probably would have been better to avoid both topics and consider other-multicopy atomicity instead.

Another thing is worth pointing out, relating to slide 57. Why doesn't the LKMM include Dependency Ordered Before? Not because it knows the full execution a priori -- that's equally true for the hardware memory models. The real reason goes back to the early days of development of the LKMM. Originally we had two different versions, a strong and a weak model. The strong model _did_ include Dependency Ordered Before. But we ended up dropping the strong model; the current LKMM is a descendant of our weak model.

An instruction-level BPF memory model

Posted Jun 7, 2024 15:40 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

No argument whatsoever with the confusing part! ;-)

Agreed, HOB has to be more than coherence-before. And the point of those slides was in fact to come up with a test for some ordering stronger than coherence-before. Slide 69's attempt does show a difference between ARMv8 and PPC using R, but you are quite right if you are claiming that I did not prove either: (1) That this difference was in fact a consequence of ARMv8 HOB, or (2) That there is not some better test for the effects of HOB beyond coherence-before. I would of course be very happy for you to show me a better litmus test demonstrating the beyond-coherence-before effects of HOB! I am with you in suspecting that other-multicopy atomicity has something to do with it, but I have not yet proved this to myself. (Nor have I yet asked the ARM folks because experience indicates that if I haven't beat my head pretty hard against the question, I won't understand their answer.)

I am not so sure that I agree with your suggested change to the "exists" clause, given that DOB requires that R3 be coherence-before E2. What am I missing here?

You are quite correct on the historical sequence of events leading to our deciding to leave DOB out of LKMM. I was instead giving a post-facto rationale for that decision. I do not see how current hardware with current restrictions on speculation could leave out DOB without undue violence to the rest of its memory model, which might well be a failure of imagination on my part. So what are your thoughts on how reasonable hardware could leave out DOB?

An instruction-level BPF memory model

Posted Jun 7, 2024 18:58 UTC (Fri) by Alan.Stern (subscriber, #12437) [Link] (1 responses)

Responding to your later points first...

> I am not so sure that I agree with your suggested change to the "exists" clause, given that DOB requires that R3 be coherence-before E2. What am I missing here?

Here we are discussing a litmus test (on slide 69) for HOB, not DOB, so I'll assume your "DOB" above is a typo.

Yes, HOB requires that R3 is coherence-before E2. Hence the "exists" clause should test for the value of R3 (i.e., 0:r5) being the original value of y, namely 0. The conclusion of HOB is that R1 must also be ordered before E2, in other words, that the value of R1 (i.e., 0:r3) must also be the original value of y, namely 0. (Or to put it another way, HOB says that while R1 might be executed later than R3, out of program order, it can't be executed so _much_ later that it observes the value written by E2.) Therefore to test for a violation of HOB, the "exists" clause has to ask whether 0:r3 is different from 0 and 0:r5 is equal to 0. That's what my suggested change does. (Notice that none of the executions listed on slide 70 violate HOB in this way.)

> I do not see how current hardware with current restrictions on speculation could leave out DOB without undue violence to the rest of its memory model, which might well be a failure of imagination on my part. So what are your thoughts on how reasonable hardware could leave out DOB?

I agree with your point of view; DOB is a practical necessity. Another thing it is needed for, which the slides don't mention, is precise exceptions. Consider the Linux-kernel litmus test on slide 41; if the value of gp happened to be NULL so that the write to r1[i].value caused an addressing exception, the exception wouldn't be precise if the write to x (later in the program) had already been executed.

Nevertheless, even though I expect DOB will be enforced by all hardware that runs the Linux kernel, there doesn't seem to be any good reason for including it in the memory model. As far as I know, nobody has suggested any use cases for it or shown that any existing code in the kernel relies on it.

> Slide 69's attempt does show a difference between ARMv8 and PPC using R,

Indeed it does. And I would claim that this difference is _entirely_ due to R and has no connection with HOB at all. Despite what the slide says.

> I would of course be very happy for you to show me a better litmus test demonstrating the beyond-coherence-before effects of HOB!

I don't know what effects you're talking about. As far as I can tell, the only effect of HOB is related to coherence ordering; it says that the value read by R1 has to be earlier in the coherence order than the value written by E2.

An instruction-level BPF memory model

Posted Jun 11, 2024 17:08 UTC (Tue) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Yes, I should have said "HOB", apologies for my confusion.

On the "exists" clause, no memory model that enforces single-object sequential consistency can have 0:r3=1 and 0:r5=0. I was therefore looking for some consequence of HOB beyond single-object sequential consistency, hence my "exists" clause matching the HOB defintion.

A simpler approach, which I should have thought of to begin with, is to disable HOB in the Aarch64 memory model, for example, like this:

let haz-ob =
  (* Exp-haz-ob | *) TLBI-ob | IC-ob

(If you try this in a clone of the herd7tools repo, don't forget to “make install” so that your herd7 command knows about the change. And don't forget to do so again after undoing this act of vandalism!)

With this change, running herd7 on catalogue/aarch64/tests/CoRR.litmus says Sometimes, which supports your analysis of HOB, and invalidates mine. So, like DOB, HOB is not optional from a hardware viewpoint.

I agree that LKMM should continue to ignore DOB, at least until someone comes up with a good use case. But given that leaving out HOB causes the ARMv8 memory model to say Sometimes for CoRR.litmus, HOB is not optional.

Thank you for keeping me honest!