Pointer comparisons ignore provenance

Posted Oct 16, 2024 17:31 UTC (Wed) by Tobu (subscriber, #24111)
In reply to: C++ already gives us a tool for pointer zapping by NYKevin
Parent article: Zapping pointers out of thin air

See this comment by Ralf Jung working out the options: your preferred option, had LLVM implemented it, would have been worked around by casting the pointers to usize before doing the comparison. The final decision is that pointer comparison ignores provenance.

There are two consistent options for pointer comparisons in Rust (given that they have to be safe):

1. Only addresses are compared: the result is true if and only if the addresses are equal.
2. Provenance may be taken into account:
   - distinct addresses always compare unequal;
   - equal address + equal provenance always compares equal;
   - equal address + different provenance may non-deterministically compare either way.
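To make option (1) concrete, here is a minimal Rust sketch. It assumes, as Rust's aliasing models do, that a pointer derived from a sub-slice only carries permission to access that sub-slice, so the two pointers below share an address but not provenance. The helper name `halves_compare_equal` is hypothetical.

```rust
/// Hypothetical helper: builds two pointers with the same address but
/// (under the assumption above) different provenance, then compares them
/// both as raw pointers and as plain integers.
fn halves_compare_equal() -> (bool, bool) {
    let arr = [0u8; 8];
    // Split one array into two non-overlapping, contiguous sub-slices.
    let (left, right) = arr.split_at(4);

    // One past the end of `left`: derived from `left`, so it may only be
    // used for `left`'s bytes, and as a one-past-the-end pointer it must
    // not be dereferenced at all.
    let end_of_left: *const u8 = left.as_ptr().wrapping_add(left.len());
    // Start of `right`: derived from `right`, so it may access `right`'s bytes.
    let start_of_right: *const u8 = right.as_ptr();

    // Raw-pointer `==`: under option (1) only the addresses matter.
    let by_pointer = end_of_left == start_of_right;
    // The same question asked of the bare addresses.
    let by_address = (end_of_left as usize) == (start_of_right as usize);
    (by_pointer, by_address)
}

fn main() {
    let (by_pointer, by_address) = halves_compare_equal();
    // `split_at` yields contiguous halves, so the addresses coincide and
    // both comparisons report equality.
    assert!(by_pointer && by_address);
    println!("pointer ==: {by_pointer}, address ==: {by_address}");
}
```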

Pointer comparisons ignore provenance

Posted Oct 16, 2024 18:40 UTC (Wed) by tialaramex (subscriber, #21167) (3 responses)

Also notice that if I implement (1) but you insist you want (2), I can just claim that (2) is what I'm offering.

You can't prove me wrong: the only difference in (2) is that a different answer is possible but never required, so I can tell you that you just got unlucky, and that's why it looks as though I implemented (1).

So the rational thing to do is implement (1), which is at least explainable.
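The argument above can be checked mechanically in a toy model. This is an illustrative sketch, not how any compiler represents provenance: `Ptr`, `eq_option1`, `allowed_by_option2`, and `option1_refines_option2` are hypothetical names, and provenance is reduced to a small integer tag. Option (2) is written as a specification of which answers are permitted, and the sketch confirms that an address-only comparison never returns an answer that specification forbids.

```rust
/// Toy model (hypothetical): a pointer is an address plus an abstract
/// provenance tag.
#[derive(Clone, Copy, Debug)]
struct Ptr {
    addr: usize,
    prov: u32,
}

/// Option (1): compare addresses only.
fn eq_option1(p: Ptr, q: Ptr) -> bool {
    p.addr == q.addr
}

/// Option (2) as a specification: may an implementation of option (2)
/// return `result` for `p == q`?
fn allowed_by_option2(p: Ptr, q: Ptr, result: bool) -> bool {
    if p.addr != q.addr {
        !result // distinct addresses always compare unequal
    } else if p.prov == q.prov {
        result // equal address + equal provenance always compares equal
    } else {
        true // equal address + different provenance: either answer is allowed
    }
}

/// Tries every pairing of a small set of pointers: does option (1) ever
/// give an answer that option (2) forbids?
fn option1_refines_option2() -> bool {
    let mut ptrs = Vec::new();
    for addr in 0..3 {
        for prov in 0..3 {
            ptrs.push(Ptr { addr, prov });
        }
    }
    ptrs.iter()
        .all(|&p| ptrs.iter().all(|&q| allowed_by_option2(p, q, eq_option1(p, q))))
}

fn main() {
    assert!(option1_refines_option2());
    println!("every option (1) answer is a permitted option (2) answer");
}
```

This checks only the boolean result of the comparison itself; it does not model any further guarantees a program might derive from that result.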

Pointer comparisons ignore provenance

Posted Oct 19, 2024 19:41 UTC (Sat) by NYKevin (subscriber, #129325) (2 responses)

(1) is not semantically equivalent to (2). Under (2), if you tell me that pointers x and y are equal, then I am entitled to assume that they have the same provenance, and so I can use either pointer to access the same object. Furthermore, (2) extends to atomic compare-and-swap, meaning that you are required to ensure that the whole pointer-zapping problem described in the article does not arise in the first place (or at least that the resulting binary behaves as if it does not arise).

(1) provides neither of those guarantees, so it is not equivalent to (2) no matter how lucky or unlucky I might get.
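The distinction being drawn here is about what a `true` answer lets you conclude, not which answers are permitted. A toy-model sketch (hypothetical names throughout, provenance reduced to an integer tag, and the compare-and-swap aspect not modeled) checks whether equality implies matching provenance under an address-only comparison and under a comparison that guarantees it.

```rust
/// Toy model (hypothetical): a pointer is an address plus a provenance tag.
#[derive(Clone, Copy, Debug)]
struct Ptr {
    addr: usize,
    prov: u32,
}

/// Address-only comparison, as in option (1).
fn eq_addr_only(p: Ptr, q: Ptr) -> bool {
    p.addr == q.addr
}

/// A comparison under which `true` guarantees matching provenance,
/// matching the reading of (2) in the comment above.
fn eq_addr_and_prov(p: Ptr, q: Ptr) -> bool {
    p.addr == q.addr && p.prov == q.prov
}

/// Over the given pointers, does `eq(p, q) == true` always imply that
/// `p` and `q` share provenance (so either could be used for access)?
fn equality_implies_same_provenance(eq: fn(Ptr, Ptr) -> bool, ptrs: &[Ptr]) -> bool {
    ptrs.iter()
        .all(|&p| ptrs.iter().all(|&q| !eq(p, q) || p.prov == q.prov))
}

fn sample() -> Vec<Ptr> {
    vec![
        Ptr { addr: 0x1000, prov: 1 }, // e.g. one past the end of object 1
        Ptr { addr: 0x1000, prov: 2 }, // start of object 2, same address
        Ptr { addr: 0x2000, prov: 2 }, // elsewhere inside object 2
    ]
}

fn main() {
    let ptrs = sample();
    // Address-only equality can say "equal" for pointers with different provenance.
    assert!(!equality_implies_same_provenance(eq_addr_only, &ptrs));
    // The provenance-aware comparison never does.
    assert!(equality_implies_same_provenance(eq_addr_and_prov, &ptrs));
}
```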

Pointer comparisons ignore provenance

Posted Oct 19, 2024 19:46 UTC (Sat) by NYKevin (subscriber, #129325) (1 responses)

NB: This is assuming you're talking about my (1) and (2), and not Tobu's, because that reading seemed obviously wrong to me. Tobu's (1) requires that comparing the pointers directly produces the same result as casting both to usize and comparing those, whereas (2) permits the comparison to non-deterministically return false when the provenance does not match, so (2) allows optimizations that (1) forbids. Compilers are certainly not going to leave optimizations on the table if whatever standard we end up adopting permits them.

Pointer comparisons ignore provenance

Posted Oct 20, 2024 8:43 UTC (Sun) by tialaramex (subscriber, #21167)

Unsurprisingly (surely even to you?), when I respond to Tobu, I am referring to what Tobu wrote.

If you want your numbering to refer exclusively to a specific thing you once wrote somewhere, you're going to need much bigger numbers; I recommend 128-bit UUIDs for this work. However, since we're talking among people, I would suggest that naming things, though it has some undesirable quirks, is preferable. That's why the "worst case" provenance model offered in Rust and proposed as the default for C is referred to as "PNVI-ae-udi" rather than some hard-to-distinguish UUID, and when I typo'd that name earlier, everybody still knew what I meant.

As to the substance: useless "optimisations" have to be painstakingly eliminated, so there's no reason to ask for them when you know you won't use them. Yes, it is likely that LLVM will decide it can "optimise" ptrA == ptrB in some way that isn't helpful, and if it does, Rust will just emit the IR for ptrA.addr() == ptrB.addr() instead, so that it gets the required semantics. Today it doesn't matter because (as I wrote) there's an LLVM bug, and as a result LLVM can believe that for two machine integers A and B, A != B but A - B = 0. That is nonsense, and the LLVM devs know it's nonsense, but it's a consequence of this sort of "optimisation".
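At the source level, the fallback described above amounts to comparing addresses explicitly rather than pointers. This is a minimal sketch assuming thin pointers (wide pointers would also carry metadata); `addr_eq_thin` and `agrees_with_pointer_eq` are hypothetical names. It uses `as usize`; on Rust 1.84 and later, `p.addr() == q.addr()` expresses the same comparison through the strict-provenance API, and `std::ptr::addr_eq` also compares addresses while ignoring metadata.

```rust
/// Hypothetical helper: compares two thin raw pointers by address only.
/// Casting to `usize` yields plain integers, so the comparison itself no
/// longer involves provenance.
fn addr_eq_thin<T>(p: *const T, q: *const T) -> bool {
    (p as usize) == (q as usize)
}

/// Checks, over every pair of pointers into one array (including the
/// one-past-the-end pointer), that raw-pointer `==` and the explicit
/// address comparison give the same answer.
fn agrees_with_pointer_eq() -> bool {
    let data = [10u32, 20, 30, 40];
    let base = data.as_ptr();
    // Offsets 0..=len, so the one-past-the-end pointer is included.
    let ptrs: Vec<*const u32> = (0..=data.len()).map(|i| base.wrapping_add(i)).collect();
    ptrs.iter()
        .all(|&p| ptrs.iter().all(|&q| (p == q) == addr_eq_thin(p, q)))
}

fn main() {
    assert!(agrees_with_pointer_eq());
}
```

Because Rust already defines raw-pointer `==` as an address comparison, the two spellings should always agree; the explicit form only matters as a way to keep a backend from applying provenance-based reasoning to the comparison.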