|
|
Subscribe / Log in / New account

Progress on defeating lifetime-end pointer zapping

[LWN subscriber-only content]

Welcome to LWN.net

The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net!

By Daroc Alden
October 7, 2025

Kangrejos

Paul McKenney gave a remote presentation at Kangrejos 2025 following up on the talk he gave last year about the lifetime-end-pointer-zapping problem: certain common patterns for multithreaded code are technically undefined behavior, and changes to the C and C++ specifications will be needed to correct that. Those changes could also impact code that uses unsafe Rust, such as the kernel's Rust bindings. Progress on the problem has been slow, but McKenney believes that a solution is near at hand.

He began by noting that the obvious way to write an atomic last-in-first-out (LIFO) stack as a linked list in C or C++ invites undefined behavior. Specifically, it can end up creating a pointer that has a valid bit pattern, but an invalid provenance. Imagine that a thread wants to push an item (A) onto a stack; it reads the pointer to the current top of the stack (B), stores that into A's next field, and then uses an atomic compare-and-swap instruction to store the pointer to A as the new top item only if the top-of-stack pointer still points to B.

So far, so good. Now suppose that a second thread concurrently pops an item off of the stack, frees it, allocates a new item (C), and then pushes it to the stack. If the new allocation has a different address, then the first thread's compare-and-swap operation will fail, and it will know to retry. But what if the memory allocator, seeing that a piece of memory of the right size has just been freed, gives the same memory back to use for the new item? In that case, the first thread's compare-and-swap operation will succeed, because the pointer to B and the pointer to C are bitwise identical. As far as the actual CPU is concerned, nothing has gone wrong. But according to the C abstract machine, even though the pointer to C and the pointer to B have the same bits, they have different provenance information. This makes the dangling pointer to B (which has been freed) a "zombie pointer", any use of which is undefined behavior. Currently, compilers aren't taking advantage of this particular piece of undefined behavior, but McKenney wants to get out ahead of the problem by agreeing on a solution ahead of time.

Importantly, it's not really possible to change all of the code that works this way, because there are so many open-coded atomic LIFO stacks or equivalent pieces of code spread throughout almost every nontrivial multithreaded project, McKenney said. The first implementation of this kind of stack is unknown, but it probably dates to the 1960s. There was at least one implementation in the 1970s that referred to the technique as being generally known.

Last year, Davis Herring's "angelic provenance" proposal looked as though it would help, but the C++ committee found examples of some code where angelic provenance would "invalidate some really important optimizations". Herring's proposal would add a new rule requiring that when an integer is cast to a pointer (or, if McKenney's separate proposal is adopted, when a pointer is loaded from an atomic type), if there is any choice of pointer provenance that the compiler could make that will prevent the program from having undefined behavior, the compiler has to pick that option.

Technically, the requirement is that there is no happens-before relationship between the angelic choice and the object's creation. So if the object were being created in a separate thread, and the creation might occur after the choice, but the threads are not synchronized, so the compiler cannot prove that will be the case, the compiler will still be required to consider that object's provenance.

The modified rule would only require the compiler to consider the provenances of objects that already exist at the time of the operation (but see the sidebar for more information). This is both simpler for compiler authors to implement, and avoids problems with obtaining a pointer to an object before it is actually allocated. McKenney's additional proposal would make angelic provenance work for LIFO stacks by treating any loads through an atomic type as though the loaded pointers were just converted from an integer, so the angelic provenance rule would apply and the obvious way to implement an atomic LIFO stack would be defined behavior ... almost.

In C and C++, any operation with an invalid pointer isn't guaranteed to produce a sensible result. So, while reading a pointer value from an atomic variable for the LIFO stack no longer causes a problem (with the above proposals), writing the pointer value in the first place could be optimized out. So the final piece needed for the existing LIFO stacks to work would be a requirement that when writing an invalid pointer, the actual bits of the pointer are written to the destination, even if the provenance information is no longer usable.

All together, there is "not all that much" work to get all these changes through the committee. "There's less heartburn on current proposals than there has been in the past."

Ryhl raised a question that had come up when considering implementing analogous proposals in Rust — wouldn't the new rules prevent angelic choices from being reordered relative to "demonic" choices? Specifically, there are certain operations, such as memory allocation, where the compiler is allowed to assume that the operation goes as badly as possible for the program being optimized. Therefore, if the program is incorrect in those cases, it's incorrect overall. While Rust doesn't have the same undefined behavior rules as C, programmers writing unsafe Rust still need to uphold various invariants, and so might run into this. She shared this example Rust code from Ralf Jung demonstrating the problem:

    fn nondet() -> bool {
      let b = Box::new(0);
      ptr::from_ref(&*b).to_addr() % 3 == 0
    }

    let a1 = 0; let a2 = 1;
    let a1addr = ptr::from_ref(&a1).to_exposed_addr();
    let a2addr = ptr::from_ref(&a2).to_exposed_addr();
    let b: bool = nondet(); // demonic non-det choice
    let x = ptr::from_exposed_addr(a1addr); // picks provenance of a1 or a2
    let y = if b { x } else { x.with_addr(a2addr) };
    let _val = *y;

In this example, nondet() is a demonic choice (because the compiler knows that a newly allocated heap item could be located anywhere, so for the purposes of determining whether the function might be malformed, it can assume that the location is whatever would cause undefined behavior) and ptr::from_exposed_addr() would be an angelic choice under the proposed rules around integer-to-pointer conversions. The difference between the "exposed" functions and their unexposed variants is an artifact of Rust's experimental exposed-provenance model — when casting an integer to a pointer, only "exposed" provenances can be considered to become the provenance of the pointer. Taken together, this means that the compiler would be required to pick whichever provenance for x (out of a1addr and a2addr) makes the program valid, given its assumptions about b. Ryhl's point was that currently, reordering the assignment to x above the call to nondet() is permitted, since x doesn't depend directly on the value. But that optimization would be invalid under the proposed rules, because then the demonic choice could always pick whatever value makes the answer picked by the angelic choice invalid, and the program would break.

The conservative solution would be to require the compiler to avoid reordering angelic and demonic choices relative to each other, but that could prevent local optimizations such as moving assignment to a variable out of a loop. It's unknown what the practical performance impacts or knock-on effects of a requirement like that would be.

McKenney initially thought the example was broken as-is, until Gary Guo clarified that in Rust, one can have a pointer with an address that points outside the memory of its associated provenance, as long as it is not dereferenced. This is helpful because it allows for valid pointer arithmetic between more objects. That's not how it works in C or C++, so McKenney hadn't considered the problem. But once it was explained, he was thoroughly enthusiastic about the choice to allow pointers to be used in this way.

He complimented the Rust folks for allowing arbitrary pointer arithmetic, which is work that he has put off trying to get through the C++ committee. "I haven't thought about that, because I wanted to only fight one dinosaur in C++ at a time." He acknowledged that the example raised further problems, though, and commented that it would be exciting future work. Hopefully, by Kangrejos next year, he will have a satisfying answer. If the current proposals do get through the C and C++ committees, those languages will likely run into an analogous problem if they ever attempt to loosen the rules around pointer arithmetic.




to post comments

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 7, 2025 17:09 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (10 responses)

Reading through the example given in the article, it seems as if you could specify a rule such as the following:

> Whenever a compare-and-swap operation succeeds on a pointer-typed glvalue (or place expression in Rust terms), any pointer whose provenance matches that of the value compared against (i.e. the "old value" argument to the compare/swap) shall, as a side effect, be changed to one of the following provenances:
>
> 1. The provenance of the value compared against (the "old value" argument).
> 2. The provenance of the value overwritten (the glvalue operand).
>
> If only one of the above options would result in the program having defined behavior, the compiler must choose that option. This may cause different pointers to inherit different provenances from the same compare-and-swap operation, if necessary. If neither choice of provenance would give the program defined behavior, then the behavior is undefined.

I assume there must be a broader and more general case of this problem that I'm failing to understand, or else this wouldn't be a whole discussion. But I must admit, I'm having a hard time visualizing how you get a multithreaded provenance problem that *doesn't* revolve around compare-and-swap or a similar atomic operation. If you use pervasive locking, then provenance should move between threads just fine, and if you don't use pervasive locking, then atomics are already mandatory to avoid data race UB.

(Preempting the obvious reply: To the best of my understanding, this is not the same as "angelic" provenance, because that rule allows/forces the compiler to pull a provenance from anywhere in the program, not just the two specific pointer values that are at issue in the CAS operation.)

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 7, 2025 18:34 UTC (Tue) by daroc (editor, #160859) [Link]

Hmm. That's a good question!

Bearing in mind that I am not an expert, there are existing atomic non-compare-and-swap ways to synchronize between threads, in the form of relaxed writes/reads. In fact, imagine the following scenario involving an implementation of hazard pointers:

Thread 1 reads a pointer to A from the top of the linked list, and is then immediately preempted. Thread 2 pops from the list, frees A, allocates B in the same location, and pushes B. Now the pointer that thread 1 holds is a 'zombie' with the wrong provenance.

Now thread 1 writes the zombie pointer into its hazard-pointer list, instructing other threads not to free the referent of the pointer. Then it does a relaxed read of the top of the linked list, and checks that the list hasn't changed.

Now thread 1 thinks that it has a valid pointer and A can't be freed. But actually it has a pointer to B with the wrong provenance. The other threads won't free B, because there is a pointer to it in thread 1's hazard table, so thread 1 really is safe to operate on B. But if the compiler notices that the provenance is wrong, it could say this is undefined behavior. At no point was a compare-and-swap operation involved, but (I believe) if thread 1 hadn't been preempted at that exact moment, the whole scenario would be legal.

It's possible that I've missed some detail — provenance and concurrency are tricky — but I suspect that is the kind of more complex scenario that needs full angelic provenance to handle.

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 7, 2025 19:06 UTC (Tue) by comex (subscriber, #71521) [Link] (8 responses)

> To the best of my understanding, this is not the same as "angelic" provenance

"Angelic" just means that the abstract machine must pick whichever option results in defined behavior. So your proposal is a type of angelic provenance, but you're correct that it's weaker than P2434R4, the proposal linked in the article with the text "angelic provenance".

Abstractly, the reason your proposal is difficult is that under operational semantics, a pointer value *just is* an address and a provenance. It doesn't make sense for the provenance of an existing value to change any more than it would make sense for its address to change. A variable can change from one pointer value to a different pointer value, but that would require mutating it.

You might object: C's traditional lifetime-end pointer zap semantics do cause existing pointer values to change, including even the addresses! But those semantics are problematic and not really implemented faithfully.

You might object: who cares about operational semantics, it's just a model. True, but that model is becoming the basis for, e.g., LLVM IR semantics, in the ongoing, very slow attempt to put them onto firmer theoretical ground and prevent known miscompilations related to provenance. See e.g. https://bugs.llvm.org/show_bug.cgi?id=34548, the bug report about LLVM incorrectly optimizing `inttoptr(ptrtoint(x))` to `x`, which is still an issue 8 years later.

Let's see if I can come up with a concrete example. Given this code:

struct Foo { virtual void bar(); };

extern void do_something_with(Foo *);

void test(Foo *foo) {
foo->bar();
do_something_with(foo);
foo->bar();
}

Naively, for each of the two virtual calls, you have to load the vtable, then load the function pointer from the vtable. But Clang with `-fstrict-vtable-pointers` will optimize `test` to only load the function pointer once, then reuse it for the second call. This is because an object's vtable can't change during its lifetime.

But what if `do_something_with` frees `foo` and creates a new object that happens to end up at the same address? Then the new object might have a different vtable. The only reason the optimization is still valid is that using the `foo` pointer to access that new object would be undefined behavior, due to `foo`'s provenance being incompatible with the new object. The compiler can't see the body of `do_something_with`, so it has to rely on the fact that *any* attempt to free and reallocate the object would *necessarily* result in incompatible provenance.

However, under your rule, `do_something_with` could do a compare-and-swap operation to change the provenance, and the updated provenance would propagate back through all copies of the pointer, including `foo`. So the optimization would no longer be valid. Or if you wanted to keep it valid, you would have to define a rule for *why* the updated provenance doesn't propagate all the way back to `foo`, which might in turn cause problems for inlining or whatever.

To be fair, the value of this vtable optimization seems uncertain; I imagine the reason Clang keeps it behind a flag (for now) is that it breaks some real codebases. There are probably better examples involving different optimizations. I can think of some other ideas but they tend to involve code that's less realistic.

(Sidenote: Although I'm focusing on the case of freeing and reallocating, there is also the case of calling the destructor and then placement-new-ing a different object at the same pointer. If you do that, then you're supposed to call `std::launder` on the pointer, new in C++17. This is a form of provenance too, but one I'm less familiar with since it has no Rust equivalent.)

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 7, 2025 23:28 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

> A variable can change from one pointer value to a different pointer value, but that would require mutating it.

Well, there's the rub: In order for the original pattern to work, we have to do one of two functionally-equivalent things:

1. Mutate the provenance on the pointer we read, and all pointers derived from it, at the moment we do the CAS.
2. Say that the CAS retroactively determines the pointer's provenance.

I do not see the point in maintaining a distinction between those two operations, but I suppose some compiler engineers might find a notional time machine easier to swallow than notional spooky action at a distance. Either way, you don't know the pointer's final provenance until after the CAS.

> However, under your rule, `do_something_with` could do a compare-and-swap operation to change the provenance, and the updated provenance would propagate back through all copies of the pointer, including `foo`. So the optimization would no longer be valid. Or if you wanted to keep it valid, you would have to define a rule for *why* the updated provenance doesn't propagate all the way back to `foo`, which might in turn cause problems for inlining or whatever.

I think the way to make this work is to introduce a relationship between the CAS and the prior (atomic) read, so that we only alter the provenances of pointers derived from said read (or do magical time travel and give it the appropriate provenance from the moment it is read, despite not knowing that provenance until after the CAS succeeds). Then your example is not a problem, because the pointer foo is not derived from whatever nonsense is going on inside of do_something_with().

There is another problem, which is that the pointer might be atomically overwritten more than once, so really we should consider N distinct possible provenances. But in most cases, I would tend to imagine that only the first or the last provenances are likely to work anyway, so maybe this is unnecessary.

> there is also the case of calling the destructor and then placement-new-ing a different object at the same pointer.

Sure, but as you say, that is already protected by std::launder (or should be).

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 0:21 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Yes, the backwards-in-time provenance propagation to any related pointer would not be well received. ;-)

A few years ago, we were pushing on a provenance barrier that would have the effect of laundering a pointer and any pointer reachable from it at that point in time. This got some positive interest, but eventually was rejected. There have been many earlier proposals, which can be found here: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/.... And thank you all for your interest in this topic!

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 6:28 UTC (Wed) by mb (subscriber, #50428) [Link] (4 responses)

>But what if `do_something_with` frees `foo` and creates a new object that happens to end up at the same address?

I'm just wondering:
Would this thing be UB and therefore no problem in Rust, because the self reference to foo will be alife and therefore the deallocation is guaranteed to not be done?

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 12:55 UTC (Wed) by daroc (editor, #160859) [Link] (3 responses)

In Rust, if the function takes ownership of its parameter, the calling function can't use it afterward. Conversely, if the function takes its parameter by reference, it can't free the object. So yes, this problem is ruled out at compile time.

... unless your type has a mutable interior cell that stores a trait object. In that case the function could swap out the trait object for a different one with a different vtable, and the calling code would have to re-load the pointer to the vtable. But the compiler can tell whether that's a possibility by inspecting the types involved.

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 18:19 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (2 responses)

> ... unless your type has a mutable interior cell that stores a trait object. In that case the function could swap out the trait object for a different one with a different vtable, and the calling code would have to re-load the pointer to the vtable. But the compiler can tell whether that's a possibility by inspecting the types involved.

I don't think that works.

&dyn Trait, and everything derived from it (such as &UnsafeCell<dyn Trait>) is a fat pointer. It has one pointer to the object, and a separate pointer to a statically allocated vtable. This vtable exists per type, not per instance, so you can't overwrite it, and swapping it for a different vtable is considered a mutation of the pointer, not a mutation of the pointee.

The other, more prosaic issue with this is that two different trait objects (with the same trait bound) are not necessarily the same size, so swapping them might not even be possible (the larger object will not fit into the smaller object's allocation).

Of course, if you have UnsafeCell<&dyn Trait>, then you can mutate the pointer rather than the pointee, but that's not specific to trait objects (and it seems rather obvious to me that you can't do this optimization if the pointer could be mutated out from under you).

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 18:34 UTC (Wed) by daroc (editor, #160859) [Link] (1 responses)

You're right; sorry, I was unclear. I meant that if you had a type like this:

    Arc<Mutex<&dyn Foo>>

... then when you lock the Mutex and call .foo_method() on it, the Deref implementation is going to have to go get the vtable pointer from the &dyn Foo; if you then pass the mutex guard to another function and/or unlock and relock it, before calling .foo_method() again, the compiler will have to fetch the vtable pointer from inside the structure again, instead of re-using the cached value.

I think. I'm fairly certain that that's correct, and I think it's analogous to the mentioned C++ example, but I might be missing a nuance.

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 21:33 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

That is correct as far as I'm aware, but the critical point is that you are describing a mutation of the pointer to point at a different trait object altogether, whereas C++ placement new has the effect of destroying the pointee and constructing a new one in its place, without touching the pointer at all.

Another way of thinking about it is that, if we ignore the Mutex (and const correctness) for the sake of simplicity, Arc<&dyn Foo> is morally equivalent to std::shared_ptr<Foo*> (where Foo has a vtable), not to be confused with the more usual std::shared_ptr<Foo>. The mutation you are describing is equivalent to replacing the Foo* with an entirely different Foo* that happens to point at an entirely different Foo instance. The mutex is needed to do this properly, of course, but it just adds another layer of indirection without really changing anything.

You might wonder what happens if we instead have Arc<Mutex<dyn Foo>>, which is of course an entirely legitimate type to construct. That allows you to get as far as obtaining &mut dyn Foo, but you can't get any further from there. You would need to call something like std::mem::swap() to replace it with a different Foo. But you can't do that, because dyn Foo is unsized and std::mem::swap() has an implied T: Sized bound, so the compiler will reject the call. All other (safe) functions or methods for doing this will either have a similar bound (usually implied), or will take T by value (which can't accept an unsized type), because there is no reasonable way to guarantee that the new Foo is small enough to fit into the old Foo's allocation.

Provenance is a hard problem, so I'm probably missing something here

Posted Oct 8, 2025 12:42 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Responding to this specific example:

struct Foo { virtual void bar(); };

extern void do_something_with(Foo *);

void test(Foo *foo) {
foo->bar();
do_something_with(foo);
foo->bar();
}

Here the conversion of "foo" to pointer has to have happened before the possible free() in do_something_with(), which means that under Davis Herring's proposed semantics, the compiler is within it rights to assume that "foo" points to the same object throughout. So the vtable-pointer optimization is still allowed under these proposed semantics.

No longer experimental

Posted Oct 8, 2025 15:01 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Aria's feature is no longer an experiment. Rust adopted the provenance rules Aria had proposed, as amended.

There are details to square away, particularly if you've got scary pointer aliasing problems Rust currently doesn't say what happens and so the only safe option may be to forbid scary aliasing in your pointer code, whereas some day that'll get nailed down enough to specify exactly what's not permitted (it might well still mean you can't do whatever it is you're thinking of doing but it depends). But the overall provenance is just an accepted Rust API today.

That's why the link about "Experimental" provenance doesn't say Experimental at the far end and the APIs it discusses are ordinary stable APIs in Rust 1.84 (which is from January). Yes that word "Experimental" does occur on the page linked, because Rust is still adding (and sometimes removing) new experimental stuff, that's how these experiments were first available, but none of the work we're discussing here is labelled experimental now.

Rust code

Posted Oct 8, 2025 15:41 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

The Rust code snippet listed in this article doesn't compile because it's from a 2023 discussion about a then-proposed feature rather than what actually landed. So I've cleaned this up to actually compile with the actual Rust shipping API. Any mistakes resulting from my translation are of course entirely my fault, whereas the insight intended is Ralf's or maybe Mario's

LWN isn't great for pasting swathes of code, so here's a Compiler Explorer link:

https://rust.godbolt.org/z/1WqW5MhqP

Notably while the word "unsafe" never appears in the original snippet, to actually do this you do need this keyword once, specifically to dereference a pointer, and so if this is wrong (I'm genuinely not sure) that's where Rust points the finger, this code is wrong because we (unsafely) dereferenced something that was not, in fact, a valid pointer and the unsafe keyword signified our responsibility to ensure it was valid.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds