Canceling asynchronous Rust
Asynchronous Rust code has what Rain Paharia calls a "universal cancellation protocol", meaning that any asynchronous code can be interrupted in the same way. They claim that this is both a useful feature when used deliberately, and a source of errors when done by accident. They presented about this problem at RustConf 2025, offering a handful of techniques to avoid introducing bugs into asynchronous Rust code.
Paharia started their talk (slides) with a question for the Rust programmers in the audience. Suppose that one wanted to use the tokio library's asynchronous queues (called channels) to read from a channel in a loop until the channel is closed, raising an error if the channel is ever empty for too long. Is this code the correct way to do it?
    loop {
        match timeout(Duration::from_secs(5), rx.recv()).await {
            Ok(Ok(msg)) => process(msg),
            Ok(Err(_)) => return,
            Err(_) => println!("No messages for 5 seconds"),
        }
    }
As is typical of code shown on slides, this example is somewhat terse. In this case, timeout() returns an Err value if it hits the time limit, or an Ok value wrapping the return value from whatever operation is being subjected to a time limit — in this case, rx.recv(). That returns either an error (when the channel is closed unexpectedly) or an Ok value containing the message read from the channel. In a real application, the programmer would probably want to handle the channel closing more explicitly; but as far as example code goes, it demonstrates the basic idea of wrapping a read from a channel in a timeout.
The members of the audience were hesitant to answer, because anytime a speaker asks you if code is correct, it seems like they're about to point out a subtle problem. But, after a moment of tension, Paharia revealed that yes, this is completely correct code. What about the corresponding code for the other end of the channel? They put up another slide, and asked whether anyone thought this was the right way to write to a channel:
    loop {
        let msg = next_message();
        match timeout(Duration::from_secs(5), tx.send(msg)).await {
            Ok(Ok(_)) => println!("Sent successfully"),
            Ok(Err(_)) => return,
            Err(_) => println!("No space in queue for 5 seconds"),
        }
    }
This time, people were a bit more confident: no, that code has a bug. If the send times out (taking the Err(_) branch) or the channel is closed (the Ok(Err(_)) branch), the code drops msg, potentially losing data. This is a problem, Paharia said: because of the way asynchronous Rust is currently designed, it's far too easy to make a mistake like this by accident.
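The data loss can be reproduced without tokio. The following is a minimal sketch using only the standard library; Message, send_never_ready() and message_survives_cancellation() are hypothetical names invented for illustration, and std::future::pending() stands in for a send on a channel that never has space. It is not how tokio implements send(), only a demonstration that a value moved into a Future is destroyed along with that Future.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical message type that records when it is destroyed.
struct Message {
    dropped: Rc<Cell<bool>>,
}
impl Drop for Message {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Stand-in for `tx.send(msg)` on a full channel: it takes ownership of
// the message and then waits forever for space.
async fn send_never_ready(msg: Message) {
    std::future::pending::<()>().await;
    drop(msg);
}

// Returns true if the message is still alive after the send is canceled.
fn message_survives_cancellation() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let msg = Message { dropped: Rc::clone(&dropped) };

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    {
        let mut fut = pin!(send_never_ready(msg));
        // The "channel" is full, so the send cannot complete yet.
        assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Pending));
        // Leaving this scope drops the future, which is what a timeout does.
    }
    !dropped.get()
}
```

Calling message_survives_cancellation() returns false: the caller gave up ownership of msg when it created the Future, so canceling the Future is the only thing needed to lose the message.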
They wanted to emphasize that this wasn't to claim that asynchronous Rust is bad. "I love async!" They gave a talk at RustConf 2023 about how it's a great fit for writing correct POSIX signal-handling code, even. At their job at Oxide, they use asynchronous Rust throughout the company's software stack. Asynchronous Rust is great, so they've written a lot of it, which is exactly why they've run into so many problems around canceling asynchronous tasks, and why they want to share ways to minimize those pitfalls.
![Rain Paharia](https://static.lwn.net/images/2025/rain-paharia-rustconf-small.png)
Asynchronous Rust is built around the Future trait, which represents a computation that can be polled and suspended. When the programmer writes an asynchronous function, the compiler turns it into an enumeration representing the possible states of a state machine. That enumeration implements Future, which provides a common interface for driving state machines forward. async functions make it ergonomic to stick those little state machines together into bigger state machines using await.
Paharia likes this design, but says that it can come as a surprise to programmers moving to Rust from JavaScript or Go, where the equivalent language mechanism is implemented with green threads. In those languages, creating the equivalent of a Future (called a promise) starts executing the code right away in a separate thread, and the promise is just a handle for retrieving the eventual result. In Rust, creating a Future doesn't do anything — it just creates the state machine in memory. Nothing actually happens until the programmer calls the .poll() method.
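The laziness described here can be observed directly. This is a small sketch using only the standard library; set_flag() and observe_laziness() are hypothetical names, and the do-nothing waker exists only so that the Future can be polled by hand instead of through a runtime.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical async operation with a visible side effect.
async fn set_flag(flag: &Cell<bool>) {
    flag.set(true);
}

// Returns (flag after creating the future, flag after polling it once).
fn observe_laziness() -> (bool, bool) {
    let flag = Cell::new(false);

    // Creating the future only builds the state machine in memory.
    let fut = set_flag(&flag);
    let after_create = flag.get();

    // The body runs only when something polls the future.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(())));
    let after_poll = flag.get();

    (after_create, after_poll)
}
```

observe_laziness() returns (false, true): the side effect is absent after the call to set_flag() and appears only once the Future has been polled.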
Rust's design here is driven by the needs of embedded software, they explained. In the embedded world, there are no threads and no runtime, so it pretty much has to work like this. The fact that asynchronous tasks in Rust are just normal data is also what makes it easy to cancel them. If one wants to cancel an asynchronous function, one can just stop calling .poll() on it. There are lots of ways to do that. Paharia gave some examples of real bugs where they had written code that accidentally dropped (freed) a Future:
    // An operation that is initially implemented in synchronous code
    some_operation();

    // It returns a Result, which the linter warns about ignoring
    // But in this case we don't care, so silence the linter warning:
    let _ = some_operation();

    // Later, it gets refactored to be written asynchronously
    let _ = some_async_operation();
    // ... now the Future is being created and dropped without being run
At that point, they wanted to define two terms: "cancel safety" and "cancel correctness". Cancel safety refers to whether a Future can be safely dropped or not. The reader example above can be safely dropped — it will just stop reading from the queue. The writer example cannot — dropping it can also drop some data. The former is "cancel-safe" and the latter is not. Tokio's documentation has a section on whether their asynchronous API functions are cancel-safe or not, so programmers can usually figure this out by just looking at the local definition of a function.
Cancel correctness, on the other hand, is a global property of the whole program. A program is cancel-correct if it doesn't have any bugs related to Future cancellation. A cancel-unsafe function will not always lead to correctness problems. For example, if one is using some kind of acknowledgment and redelivery mechanism, it might not actually be a bug to drop a message like the example does.
Three things have to be true for there to be a cancel-correctness bug, Paharia said. A cancel-unsafe Future must exist, get canceled at some point, and that cancellation must violate a system property. Most approaches to eliminating cancel-correctness bugs work by tackling one of those three pillars. Unfortunately, no single technique is a complete solution in current Rust. They went on to share a handful of techniques that can all help to eliminate cancel-correctness bugs.
The first technique is to break up potentially dangerous operations into smaller pieces. The writer example from above could use the .reserve() method on a tokio queue to obtain access to a guaranteed slot for the message, before it actually sends it. This code won't drop msg if a timeout occurs:
    loop {
        let msg = next_message();
        loop {
            match timeout(Duration::from_secs(5), tx.reserve()).await {
                Ok(Ok(permit)) => {
                    permit.send(msg);
                    break;
                }
                Ok(Err(_)) => return,
                Err(_) => println!("No space for 5 seconds"),
            }
        }
    }
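The reason the reserve-then-send split helps is that the cancelable step never owns the message. Below is a sketch of that idea with a hand-written, single-threaded queue; Queue, Reserve, Permit and send_with_reserve() are hypothetical types and functions written for this illustration and are not tokio's implementation, and the Future is polled by hand rather than by a runtime.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical bounded queue standing in for a tokio channel.
struct Queue {
    items: RefCell<VecDeque<String>>,
    reserved: Cell<usize>,
    capacity: usize,
}

// Future returned by `reserve()`. It owns no message, so dropping it loses nothing.
struct Reserve<'a> {
    queue: &'a Queue,
}

// A guaranteed slot. Sending through it is synchronous and cannot be canceled.
struct Permit<'a> {
    queue: &'a Queue,
}

impl Queue {
    fn reserve(&self) -> Reserve<'_> {
        Reserve { queue: self }
    }
}

impl<'a> Future for Reserve<'a> {
    type Output = Permit<'a>;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        let q = self.queue;
        if q.items.borrow().len() + q.reserved.get() < q.capacity {
            q.reserved.set(q.reserved.get() + 1);
            Poll::Ready(Permit { queue: q })
        } else {
            // A real channel would store the waker; this sketch is polled by hand.
            Poll::Pending
        }
    }
}

impl Permit<'_> {
    fn send(self, msg: String) {
        self.queue.items.borrow_mut().push_back(msg);
        // `self` is dropped here, which releases the reservation.
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.queue.reserved.set(self.queue.reserved.get() - 1);
    }
}

// Cancels one reservation attempt, then retries and sends the same message.
fn send_with_reserve() -> Vec<String> {
    let queue = Queue {
        items: RefCell::new(VecDeque::from(["old".to_string()])),
        reserved: Cell::new(0),
        capacity: 1,
    };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let msg = String::from("new");

    {
        // First attempt: the queue is full, so the reservation is pending.
        let mut attempt = pin!(queue.reserve());
        assert!(attempt.as_mut().poll(&mut cx).is_pending());
        // Dropping `attempt` at the end of this scope models the timeout firing.
    }
    // `msg` was never moved into the canceled future, so it is still available.

    queue.items.borrow_mut().pop_front(); // a consumer frees a slot

    let mut attempt = pin!(queue.reserve());
    match attempt.as_mut().poll(&mut cx) {
        Poll::Ready(permit) => permit.send(msg),
        Poll::Pending => unreachable!("a slot was just freed"),
    }

    let contents: Vec<String> = queue.items.borrow().iter().cloned().collect();
    contents
}
```

In this sketch the first reservation attempt is canceled while msg is still owned by the caller, so the retry can send the same message; send_with_reserve() returns a queue containing only "new".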
That works, but it still takes care to get right. Another technique would be to use APIs that record partial progress, so that the programmer can know what data has been processed and what has not. For example, tokio's AsyncWriteExt::write_all() method is not cancel-safe, but there is an alternative AsyncWriteExt::write_all_buf() that advances a cursor over the buffer as bytes are written, so the programmer can see where a write was interrupted and recover.
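The cursor idea can be sketched without tokio. In the following standard-library-only example, write_chunk(), write_all_tracked(), YieldOnce and interrupted_write() are hypothetical names; the sink is a plain Vec<u8> that is assumed to accept at most four bytes per write, and the caller's slice acts as the cursor. It illustrates the technique and is not a copy of tokio's write_all_buf().

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Future that is pending once, modelling a sink that is briefly not ready.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Hypothetical sink: waits until "ready", then accepts at most 4 bytes.
// The write and the returned count happen with no await in between.
async fn write_chunk(sink: &mut Vec<u8>, data: &[u8]) -> usize {
    YieldOnce(false).await;
    let n = data.len().min(4);
    sink.extend_from_slice(&data[..n]);
    n
}

// Writes everything, advancing the caller's slice after every chunk so
// that progress stays visible even if this future is dropped.
async fn write_all_tracked(sink: &mut Vec<u8>, remaining: &mut &[u8]) {
    while !remaining.is_empty() {
        let current: &[u8] = *remaining;
        let n = write_chunk(sink, current).await;
        *remaining = &current[n..];
    }
}

// Cancels a write partway through; returns (bytes written, bytes left).
fn interrupted_write() -> (Vec<u8>, Vec<u8>) {
    let mut sink = Vec::new();
    let data = b"hello world".to_vec();
    let mut remaining: &[u8] = &data;

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    {
        let mut fut = pin!(write_all_tracked(&mut sink, &mut remaining));
        // First poll: the sink is not ready yet.
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // Second poll: one chunk is written, then the sink is busy again.
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // Dropping the future here cancels the rest of the write.
    }
    (sink, remaining.to_vec())
}
```

After the cancellation, the sink holds "hell" and the cursor points at "o world", so the caller knows exactly where to resume.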
A final technique is to use threads to emulate the approach from Go and JavaScript. Tokio has a spawn() function that takes a Future and hands it to the runtime as a separate task; the runtime's worker threads then run the Future to completion, even if the code that spawned it is later canceled.
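The same effect can be approximated with plain operating-system threads. The sketch below uses only the standard library: block_on() is a minimal hand-written executor, and spawn_to_completion() and deliver_despite_dropped_handle() are hypothetical names. It is a stand-in for the idea behind tokio's spawn(), not its implementation.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{mpsc, Arc};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal executor: poll until ready, parking the thread between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical stand-in for a runtime's spawn(): the future moves to
// another thread, which drives it to completion.
fn spawn_to_completion<F>(fut: F) -> thread::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut))
}

// The critical work still happens even though the caller discards its handle.
fn deliver_despite_dropped_handle() -> String {
    let (tx, rx) = mpsc::channel();
    let handle = spawn_to_completion(async move {
        tx.send(String::from("critical message")).unwrap();
    });
    // Dropping the handle detaches the thread; it does not cancel the work.
    drop(handle);
    rx.recv().unwrap()
}
```

Because the Future is owned by the spawned thread rather than by the caller, nothing the caller drops can cancel it; the price is that the Future and its output must be Send and 'static.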
"This sucks!" Paharia summarized. All of these techniques can help, but none of them is a full, systemic solution. They went so far as to call this the "least Rusty part of Rust" because the language doesn't have any mechanisms to prevent this kind of bug, at least not yet.
There is hope for the future. There are three different proposals for how Rust could rule out cancel-correctness bugs: Async drop, which would let asynchronous code run a cleanup function on cancellation; Unforgettable types, which would require all Future values to be used; and the closely related Undroppable types.
In the future, when one of those proposals has been decided on and adopted, this entire class of bugs could go away. Until then, Paharia recommends finding ways to make asynchronous code cancel-safe, designing APIs that make it easy to do the right thing, and using techniques like spawned threads to ensure that critical Future values can't be dropped or forgotten by accident. There wasn't time for everything they wanted to cover in the talk, so more detail is available in the document they wrote for Oxide on the topic.
Rust syntax is only getting worse
Posted Sep 24, 2025 16:12 UTC (Wed) by sturmflut (subscriber, #38256) (5 responses)

Rust syntax is only getting worse
Posted Sep 24, 2025 16:31 UTC (Wed) by mb (subscriber, #50428)
Many languages support this.
Rust doesn't force you to nest patterns like in the above code, though.
But it's pretty common, because it actually is quite readable, once you understand pattern matching.

Rust syntax is only getting worse
Posted Sep 24, 2025 16:56 UTC (Wed) by geofft (subscriber, #59789)
It's not really worse than, say, a double pointer in C, where the outer pointer can be NULL or point to some inner pointer, which itself can be NULL. You have to check for both of those, and they can mean different things. (Though you can counter that defending syntax by saying "what about double pointers in C" is not much of a defense.)
The real thing here is that it would be nice to give names to the two variants, maybe even just at one level, like

    loop {
        match timeout(Duration::from_secs(5), rx.recv()).await {
            Happened(Ok(msg)) => process(msg),
            Happened(Err(_)) => return,
            Timeout(_) => println!("No messages for 5 seconds"),
        }
    }

but the tradeoff is that there's a good bit of functionality that works with the pre-existing Result type (Ok/Error) that people are used to using, and coming up with a new type for this purpose loses all of that.
Maybe it'd be nice to have the ability to declare an alias for a type that has new names for the variants but is really just the same type, or something. I think there's also some work in progress to abstract the concept of Result into traits which might end up effectively doing this. That page also reminds me that there is an existing ControlFlow type that is effectively equivalent to Result but might be more suitable here (or might not, I'm not sure!).
These particular lines of code would also read better if it were matching on a single level of encapsulation, something like

    loop {
        match timeout(Duration::from_secs(5), rx.recv()).await {
            Ok(msg) => process(msg),
            Err(_) => return,
            Timeout(_) => println!("No messages for 5 seconds"),
        }
    }

but now you're inventing a brand new type with three variants that has none of the existing behavior of the existing types (e.g., how do you compose it with a normal Result? what happens if you use the question-mark syntax on it?), and you're also constraining that second argument to timeout to be an async function that returns a Result, so you have those two variants. The advantage of the current design is that timeout wraps an async function that returns anything. Losing that flexibility would make async Rust code both harder to read and harder to write.

Rust syntax is only getting worse
Posted Sep 24, 2025 17:43 UTC (Wed) by Altan (subscriber, #153331)

Rust syntax is only getting worse
Posted Sep 24, 2025 20:28 UTC (Wed) by jbills (subscriber, #161176)

Rust syntax is only getting worse
Posted Sep 24, 2025 22:18 UTC (Wed) by sunshowers (guest, #170655)
I think the Ok(Ok(())) stuff is quite readable once you're used to it -- nested pattern matching is something to be used sparingly, but is reasonable to avoid added nesting levels (a match statement adds 2 nesting levels). It looks pretty reasonable with syntax highlighting.

Discarding return value, and ignoring return type
Posted Sep 24, 2025 18:02 UTC (Wed) by epa (subscriber, #39769) (7 responses)
Maybe the problem is that Future doesn't get special treatment in the type system. The await method returns the underlying value, but you don't have to await. Perhaps there should be a special syntax for "do not await" -- which obviously just returns the Future itself -- and a linter or compiler warning could ensure that you have called one or the other. If the type is known at compile time to be a Future of some kind, the let _ syntax would give a compile-time warning, unless you explicitly call .do_not_await.

Discarding return value, and ignoring return type
Posted Sep 24, 2025 18:45 UTC (Wed) by daroc (editor, #160859)
That is somewhat similar to what the Undroppable Types proposal would do, only not specific to Futures. With that proposal, any type marked as being Undroppable would be a compiler error to discard, including via let _ = .... The only ways to get rid of it would be by consuming it (in this case, doing .await), handing it off to another function, or by destructuring it.

Discarding return value, and ignoring return type
Posted Sep 24, 2025 20:20 UTC (Wed) by duelafn (subscriber, #52974)
That particular situation has a (disabled by default) lint: clippy::let_underscore_untyped.

    let _ = some_operation();

Gets a warning "consider adding a type annotation" which goes away if you do this:

    let _: Result<_,_> = some_operation();

Which would catch the async problem. I go one step further and make sure the Ok result type is also specified (Result<u64,_>), but the lint does not enforce that.

Discarding return value, and ignoring return type
Posted Sep 24, 2025 21:25 UTC (Wed) by NYKevin (subscriber, #129325) (2 responses)
We already have a lint warning for dropping something that shouldn't be dropped, namely #[must_use], which is present on Futures. Clippy will warn you if you write an expression statement that implicitly drops a Future (and you can configure this lint to be deny-by-default if desired, so that it produces a hard compile error instead of a warning). The let _ syntax is the idiomatic way to silence that lint, so you're effectively proposing a second "no, I really mean it" syntax, to doubly silence the warning after it has already been silenced. I'm not sure why that would be any more effective than the existing lint is in isolation. There is, perhaps, some room for marginal improvement, since #[must_use] is quite liberal about what counts as a "use," and a Future-specific warning could be more exacting. But this seems to be well into diminishing returns relative to the increased language complexity.
If we think that #[must_use] is not sufficient, then I must concur with daroc: The logical next step is undroppable types, not another layer of "no, I really mean it" syntax. Undroppable types let us make hard assertions that you *must* dispose of an instance in the appropriate way.
We might also consider unforgettable (a/k/a unleakable) types, but I think it is much less common to accidentally leak an object than to accidentally drop an object (you either have to construct a refcounted cycle, or use an interface that "obviously" leaks, such as std::mem::forget or Box::leak). Those are only needed if you want to prohibit leaking objects as a safety invariant. In other words, you need them for cases where leaking an object could produce UB. That's pretty unusual, and the only obvious example I can think of is the original (now known to be unsound) API for std::thread::scope. But the new API does not need unleakables, and it was a fairly minor change, so this is not a great motivating use case for them.

Discarding return value, and ignoring return type
Posted Sep 25, 2025 4:52 UTC (Thu) by mb (subscriber, #50428)

    async fn a() -> Result<x, y> {
        ...
    }
    async fn b() {
        // Ignore the Result of a().
        let _ = a(); // What I wrote
        let _ = a().await; // What I meant
    }

It would be good if the first thing caused a warning by default. It's a common mistake.

Discarding return value, and ignoring return type
Posted Sep 25, 2025 6:08 UTC (Thu) by epa (subscriber, #39769)

Discarding return value, and ignoring return type
Posted Sep 25, 2025 8:11 UTC (Thu) by proski (subscriber, #104) (1 response)

Discarding return value, and ignoring return type
Posted Sep 25, 2025 8:57 UTC (Thu) by taladar (subscriber, #68407)

Funny parallel with Python
Posted Sep 25, 2025 9:23 UTC (Thu) by kleptog (subscriber, #1183)
We ran into something similar when migrating old Tornado code to the new asyncio library in Python. When Python did async code via generators/coroutines, the semantics of yield were that generators are run until the first yield, whereas async methods don't run at all until polled, like Rust. This almost never matters, but it turned out there were a handful of places that expected some of the side-effects to happen straight away, and they broke.

Losing Data? Async specific?
Posted Sep 25, 2025 9:32 UTC (Thu) by Fowl (subscriber, #65667) (1 response)
Perhaps I'm missing some context here, but I don't quite understand the "losing data" part.
If sending a message times out or is cancelled, isn't it expected that it is "lost" without some retry or other logic? How is this an async specific problem? Excuse my guesstimated Rust syntax for a blocking version of the same thing:

    loop {
        let msg = next_message();
        match tx.send_blocking_with_timeout(msg, Duration::from_secs(5)) {
            Ok(_) => println!("Sent successfully"),
            Err(Timeout) => println!("No space in queue for 5 seconds"),
            Err(_) => return,
        }
    }

Is the bug that only for timeout errors we print a message and continue, whereas other errors return to the caller?

Losing Data? Async specific?
Posted Sep 25, 2025 9:50 UTC (Thu) by farnz (subscriber, #17727)
The root of the problem is that cancellation can happen unintentionally, and there isn't an exact blocking equivalent. You write code that looks "normal", but because you drop a future without polling it to completion (either via .await, or via more complex means), things get cancelled when you didn't expect them to be cancelled.
It's the equivalent of terminating a process via kill, only being done at scope ends; if your code is written to cope with that ("cancel-safe" for Rust futures), nothing bad happens. If your code has done something like "take a work item from a distributed work queue", and expects to push that work item back onto the queue if it can't finish it, you get surprising behaviour.