
READ_ONCE(), WRITE_ONCE(), but not for Rust

By Jonathan Corbet
January 9, 2026
The READ_ONCE() and WRITE_ONCE() macros are heavily used within the kernel; there are nearly 8,000 call sites for READ_ONCE(). They are key to the implementation of many lockless algorithms and can be necessary for some types of device-memory access. So one might think that, as the amount of Rust code in the kernel increases, there would be a place for Rust versions of these macros as well. The truth of the matter, though, is that the Rust community seems to want to take a different approach to concurrent data access.

An understanding of READ_ONCE() and WRITE_ONCE() is important for kernel developers who will be dealing with any sort of concurrent access to data. So, naturally, they are almost entirely absent from the kernel's documentation. A description of sorts can be found at the top of include/asm-generic/rwonce.h:

Prevent the compiler from merging or refetching reads or writes. The compiler is also forbidden from reordering successive instances of READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some particular ordering. One way to make the compiler aware of ordering is to put the two invocations of READ_ONCE or WRITE_ONCE in different C statements.

In other words, a READ_ONCE() call will force the compiler to read from the indicated location exactly one time, with no optimization tricks that would cause the read to be either elided or repeated; WRITE_ONCE() will force a write under those terms. They will also ensure that the access is atomic; if one task reads a location with READ_ONCE() while another is writing that location, the read will return the value as it existed either before or after the write, but not some random combination of the two. These macros, other than as described above, impose no ordering constraints on the compiler or the CPU, making them different from macros like smp_load_acquire(), which have stronger ordering requirements.

The READ_ONCE() and WRITE_ONCE() macros were added for the 3.18 release in 2014. WRITE_ONCE() was initially called ASSIGN_ONCE(), but that name was changed during the 3.19 development cycle.

On the last day of 2025, Alice Ryhl posted a patch series adding implementations of READ_ONCE() and WRITE_ONCE() for Rust. There are places in the code, she said, where volatile reads could be replaced with these calls, once they were available; among other changes, the series changed access to the struct file f_flags field to use READ_ONCE(). The implementation of these macros involves a bunch of Rust macro magic, but in the end they come down to calls to the Rust read_volatile() and write_volatile() functions.

Some of the other kernel Rust developers objected to this change, though. Gary Guo said that he would rather not expose READ_ONCE() and WRITE_ONCE(), and suggested using relaxed operations from the kernel's Rust Atomic module instead. Boqun Feng expanded on the objection:

The problem of READ_ONCE() and WRITE_ONCE() is that the semantics is complicated. Sometimes they are used for atomicity, sometimes they are used for preventing data race. So yes, we are using LKMM [the Linux kernel memory model] in Rust as well, but whenever possible, we need to clarify the intention of the API, using Atomic::from_ptr().load(Relaxed) helps on that front.

IMO, READ_ONCE()/WRITE_ONCE() is like a "band aid" solution to a few problems, having it would prevent us from developing a more clear view for concurrent programming.

In other words, using the Atomic crate allows developers to specify more precisely which guarantees an operation needs, making the expectations (and requirements) of the code more clear. This point of view would appear to have won out, and Ryhl has stopped pushing for this addition to the kernel's Rust code — for now, at least.

There are a couple of interesting implications from this outcome, should it hold. The first of those is that, as Rust code reaches more deeply into the core kernel, its code for concurrent access to shared data will look significantly different from the equivalent C code, even though the code on both sides may be working with the same data. Understanding lockless data access is challenging enough when dealing with one API; developers may now have to understand two APIs, which will not make the task easier.

Meanwhile, this discussion is drawing some attention to code on the C side as well. As Feng pointed out, there is still C code in the kernel that assumes a plain write will be atomic in many situations, even though the C standard explicitly says otherwise. Peter Zijlstra answered that all such code should be updated to use WRITE_ONCE() properly. Simply finding that code may be a challenge (though KCSAN can help); updating it all may take a while. The conversation also identified a place in the (C) high-resolution-timer code that is missing a needed READ_ONCE() call. This is another example of the Rust work leading to improvements in the C code.

In past discussions on the design of Rust abstractions, there has been resistance to the creation of Rust interfaces that look substantially different from their C counterparts; see this 2024 article, for example. If the Rust developers come up with a better design for an interface, the thinking went, the C side should be improved to match this new design. If one accepts the idea that the Rust approach to READ_ONCE() and WRITE_ONCE() is better than the original, then one might conclude that a similar process should be followed here. Changing thousands of low-level concurrency primitives to specify more precise semantics would not be a task for the faint of heart, though. This may end up being a case where code in the two languages just does things differently.


read/write volatile

Posted Jan 9, 2026 17:19 UTC (Fri) by mb (subscriber, #50428)

> in the end they come down to calls to the Rust read_volatile() and write_volatile() functions.

I experimented with read/write volatile on AVR, where it would be greatly beneficial to use them for certain inter-thread (interrupt) communication.
Just in the same way every AVR C program uses volatile "atomic" variables to do this.
https://github.com/mbuesch/avr-atomic

However, I came to the conclusion that using read/write volatile for inter-thread (interrupt) communication is unsound in Rust.

AVR hardware is not capable of doing a non-atomic read/write for byte-sized objects. So the hardware is fine for all of the relevant cases.
But the read/write volatile documentation was pretty clear to me that the Rust virtual machine considers concurrent volatile accesses unsound and is free to optimize it to bits.

https://doc.rust-lang.org/std/ptr/fn.read_volatile.html

> whether an operation is volatile has no bearing whatsoever on questions involving concurrent accesses from multiple threads
> Volatile accesses behave exactly like non-atomic accesses in that regard.

> This access is still not considered atomic, and as such it cannot be used for inter-thread synchronization.

My implementation worked fine and the generated assembly code was perfect.
However, I changed it to a less optimal inline asm implementation just because I think the Rust documentation considers concurrent read/write volatile without additional synchronization to be unsound.

(Yes, I know there is AtomicXX and no it's not efficient for reasons... And yes, I should fix the compiler's atomic intrinsics instead of working around the problem... I know :)

read/write volatile

Posted Jan 9, 2026 19:20 UTC (Fri) by josh (subscriber, #17465)

> Yes, I know there is AtomicXX and no it's not efficient for reasons

Relaxed atomics are effectively compiler-barrier atomics, and *shouldn't* have any runtime overhead. Are you encountering cases where there's more inefficiency than that?

read/write volatile

Posted Jan 9, 2026 19:54 UTC (Fri) by mb (subscriber, #50428)

Yes. Should not.
As I said, it's an LLVM problem on AVR. Which is not a stable tier.
Atomics always use heavy syncing on AVR in the current compiler.

But that was not my point.

My point was that I think volatile accesses are not sound in Rust for inter-thread communication.

read/write volatile

Posted Jan 9, 2026 22:10 UTC (Fri) by josh (subscriber, #17465)

> As I said, it's an LLVM problem on AVR. Which is not a stable tier.
> Atomic always use heavy syncing on AVR in the current compiler.

Ah, got it, thank you. Hopefully that can be fixed.

> My point was that I think volatile accesses are not sound in Rust for inter-thread communication.

Right, I believe that's correct.

read/write volatile

Posted Jan 10, 2026 9:33 UTC (Sat) by plugwash (subscriber, #29694)

> Relaxed atomics are effectively compiler-barrier atomics, and *shouldn't* have any runtime overhead.

The situation is a little more subtle than that.

Relaxed atomics on a given memory location, must behave as-if they had a well-defined order (though this order may differ from operations on other memory locations, unless fences are used), and this must apply to the whole set of atomic operations. You may only be using load and store on a particular location, but the compiler doesn't know that. Other code might be performing other atomic operations on that location.

My understanding is that this effectively means that if you implement read-modify-write operations by using a global lock, you must also implement plain write operations using that same global lock.

read/write volatile

Posted Jan 10, 2026 20:13 UTC (Sat) by comex (subscriber, #71521)

That’s correct, but not actually applicable to Rust, because Rust doesn’t allow atomics to be implemented with a global lock, unlike C++ and C. Instead, on targets that don’t natively support atomics, Rust just makes the atomic APIs unavailable. It’s one of those tradeoffs where Rust is willing to accept slightly less portability in exchange for a nicer programming model.

read/write volatile

Posted Jan 11, 2026 17:49 UTC (Sun) by garyguo (subscriber, #173367)

Note that the kernel unconditionally provides 64-bit atomic operations to all arches, and on arches that don't support native atomics on 64-bit integers, they are implemented using locks. This means that using `READ_ONCE()` on a u64 for atomic ops is incorrect (it needs to be an `atomic64_read()`).

`READ_ONCE()` on a u64 might still be useful if you just want to read the value in a data-race-free way and you don't care about atomicity (i.e. allow the read to tear). However, this is yet another reason I don't want people to just use `READ_ONCE()` for atomic ops on the Rust side -- it's just not intuitive which semantics is desired.

read/write volatile

Posted Jan 10, 2026 20:44 UTC (Sat) by NYKevin (subscriber, #129325)

> Relaxed atomics on a given memory location, must behave as-if they had a well-defined order (though this order may differ from operations on other memory locations, unless fences are used), and this must apply to the whole set of atomic operations. You may only be using load and store on a particular location, but the compiler doesn't know that. Other code might be performing other atomic operations on that location.

That is true but misleading. Your parenthetical negates all of the guarantees that are actually expensive to implement, at least on x86.

Noting for the record: A fence must be on the same thread as the relaxed atomic in order to restrict it, and there are several other requirements as well. I refer the curious reader to https://en.cppreference.com/w/cpp/atomic/atomic_thread_fe... and related documentation for more information.

> My understanding is that this effectively means that if you implement read-modify-write operations by using a global lock, you must also implement plain write operations using that same global lock.

If you take a global lock, then there are two different memory locations in play (lock and payload), so your parenthetical above already tells us that the lock is ineffective (at protecting against un-fenced relaxed atomics on either the lock or the payload).

Or perhaps I have misunderstood what you mean by "implement read-modify-write operations by using a global lock." I would normally understand a "read-modify-write operation" to be a hardware instruction (or sequence of instructions), which is not our problem to "implement" in the first place. If you mean "emulate," then the problem we run into is that emulators do not emulate the C abstract machine. They emulate some real hardware like x86, or virtual hardware like the JVM. Those platforms have their own, more specific memory models than the C abstract machine, and the compiler backend must necessarily take advantage of those memory models to emit correct assembly/machine code. So our emulator is not permitted to stop at just taking locks for relaxed atomics - it doesn't necessarily know which stores or loads originated as relaxed atomics in the first place, and therefore may have to take locks for all loads and stores whatsoever. Of course, it would be preferable to implement these operations lock-free if it is possible to do so.

read/write volatile

Posted Jan 10, 2026 21:12 UTC (Sat) by willmo (subscriber, #82093)

I think it’s a third meaning of “implement”, as implied by comex’s adjacent comment: when the target hardware doesn’t natively support the desired C/C++ atomic operation (e.g. it lacks even an appropriate CAS to implement a RMW), the compiler must compile it to use a global lock. This is certainly not applicable to common data types on modern x86, ARM, etc.

> If you take a global lock, then there are two different memory locations in play (lock and payload), so your parenthetical above already tells us that the lock is ineffective (at protecting against against un-fenced relaxed atomics on either the lock or the payload).

In this case, the compiler would need to compile un-fenced relaxed atomics so that they take the global lock. That’s what plugwash meant.

read/write volatile

Posted Jan 12, 2026 16:36 UTC (Mon) by NYKevin (subscriber, #129325)

Well, sure, if the platform leaves you up the river with no atomics, then there's only so much you can do. You're effectively in the business of emulating a (virtual) platform with atomics (the C abstract machine) on a (physical) platform that doesn't provide them. And as I said, emulation is slow.

overly strict semantics

Posted Jan 9, 2026 17:42 UTC (Fri) by bertschingert (subscriber, #160729)

Does anyone have a sense of how frequently READ/WRITE_ONCE() are used as a needlessly strict substitute for relaxed atomic reads/writes, versus situations where the additional strictness is actually required?

I wasn't around when they were implemented, so I'm speculating here, but I get the sense that READ/WRITE_ONCE() were implemented as a volatile cast not because volatility gives the optimal or desired semantics in most situations, but because that was the best tool available prior to the C11 atomics model.

While it may be prohibitively difficult, it does seem like changing the C side to use relaxed atomics (when correct) would be the right thing to do. But I don't really know how many uses actually require the additional "volatility" guarantee provided by READ/WRITE_ONCE().

overly strict semantics

Posted Jan 10, 2026 1:39 UTC (Sat) by wahern (subscriber, #37304)

IIUC the issue is that at least on some architectures atomic interfaces provided by both the compiler and ISA are unnecessarily costly in some situations, at least in the estimation of those who wrote and use rwonce.h. The comments in rwonce.h suggest that they're not guaranteeing cross-CPU atomicity, and relying on real-world behavior wrt asynchronous operations (e.g. across interrupts) on the same CPU that no compiler or memory model provides explicit support for today.

Also, C11 atomics are not the origin point for atomic intrinsics[1] or a meaningful memory model in either GCC or Linux. They are not the final or even 100% comprehensive model, either. I think the push for a more formal memory model in C, C++, and the compilers gives a false impression that such a thing was completely non-existent beforehand and that things are satisfactory today.

[1] GCC had at least two sets of intrinsics before supporting C11 atomics, and of course projects like the kernel had their own set that work just as well today as they did before the latest set of builtins.

overly strict semantics

Posted Jan 10, 2026 8:54 UTC (Sat) by koverstreet (✭ supporter ✭, #4296)

Hang on, at the ISA level there is no notion of an "atomic" load or store, there's just loads and stores. Atomic - like the lock prefix on x86 - only makes sense for operations that are doing multiple things within the same instruction: load, increment, add - atomic increment.

The "atomicity" guarantees that READ_ONCE() and WRITE_ONCE() provide only come in at the compiler level. The compiler will coalesce loads and stores or emit multiple loads as a substitute for spilling registers without some notion of atomicity at the language level.

The "unnecessarily costly" part of READ_ONCE() and WRITE_ONCE() is that they don't distinguish between atomicity and ordering - they also specify strict ordering, but only to the compiler, not the hardware (they don't emit memory barriers).

Rust's atomic load/store really are just better, because they separate out ordering from atomicity and make ordering explicit. And instead of sprinkling around separate memory barrier calls, which may or may not be commented, they're attached to the operation that needs them - which is good for readability.

overly strict semantics

Posted Jan 10, 2026 17:45 UTC (Sat) by excors (subscriber, #95769)

> at the ISA level there is no notion of an "atomic" load or store, there's just loads and stores.

I'm not certain what you mean by that. E.g. the ARM ARM defines "single-copy atomicity" which is important even in single-processor code: if an interrupt occurs during a STP (Store Pair) instruction, whose operation is defined as a single assignment to memory, the interrupt handler may observe the first half of memory was updated and the second half wasn't, because STP is treated as two separate atomic writes. (The STP instruction will be restarted after the interrupt returns, so it'll complete eventually). So I think the ISA does define the notion of atomic loads and stores, even before getting to the more complex operations.

(GCC will happily use STP for an int64_t assignment, making it non-atomic, unless you add 'volatile' and then it'll use a single 64-bit STR (which is single-copy atomic).)

overly strict semantics

Posted Jan 10, 2026 20:18 UTC (Sat) by comex (subscriber, #71521)

Strictly speaking, misaligned loads are not atomic on x86, and SIMD loads may or may not be.

overly strict semantics

Posted Jan 10, 2026 16:55 UTC (Sat) by joib (subscriber, #8541)

> Also, C11 atomics is not the origin point for atomic intrinsics[1] or a meaningful memory model in either GCC or Linux. It's not the final or even 100% comprehensive model, either. I think the push for a more formal memory model in C, C++, and the compilers gives a false impression such a thing was completely non-existent beforehand and that things are satisfactory today.

I wonder, if the C++/C11 memory models and atomics were to be developed today, how different would they look, considering the amount of knowledge the world has gained since then and now?

Certainly there were parts of the C/C++11 models that were, ahem, less than successful, like the consume memory ordering, but otherwise, would there be a place for doing it substantially better and different in general?

overly strict semantics

Posted Jan 11, 2026 20:33 UTC (Sun) by pbonzini (subscriber, #60935)

One change I would make is to remove seq_cst stores and memory operations; in their place I would rather have operations that behave as if they were enclosed by seq_cst fences on both sides, like for example Linux's atomic_add_return. The difference is that an RMW seq_cst operation can be reordered after a subsequent relaxed load, but that's not the case for LKMM and atomic_add_return. So this would actually make semantics *stricter*, not looser.

I don't have high hopes that this would be accepted now, but maybe it would be since "almost nobody will need anything but sequential consistent variables" has been shown wrong.

The other thing that still hasn't been fully formalized is out-of-thin-air values. Everybody agrees that they won't happen but strictly speaking they aren't prohibited, or weren't last time I checked.

overly strict semantics

Posted Jan 12, 2026 6:06 UTC (Mon) by riking (subscriber, #95706)

The special snowflake global ordering of only seqcst operations (but not seqcst fences) has got to go for sure, and I agree that "operations fused with seqcst fences" would be better.

overly strict semantics

Posted Jan 15, 2026 0:18 UTC (Thu) by milesrout (subscriber, #126894)

The worst part of the design is _Atomic/std::atomic. Atomic operations are atomic *operations*; the operations are atomic. There is nothing inherently atomic or non-atomic about the operands themselves. The operator overloading is also a plain bad idea.

overly strict semantics

Posted Jan 15, 2026 15:40 UTC (Thu) by bertschingert (subscriber, #160729)

The GCC atomic intrinsics seem to get this right. I'm not sure if there's a portable way to do atomic operations on regular int types, though.

OTOH, what I like about the Rust (and C/C++11?) atomics is that the type system prevents accidentally introducing data races because you can't do a non-atomic load/store to an atomic type -- at least without unsafe code. Given that the article mentions there are cases in C where READ_ONCE() and WRITE_ONCE() should have been used, but weren't, this seems to be a real risk.

overly strict semantics

Posted Jan 10, 2026 15:42 UTC (Sat) by bjackman (subscriber, #109548)

I think one of the most common usecases for {READ,WRITE}_ONCE is where you have concurrency without parallelism. E.g. when sharing CPU-local data between a task and an IRQ.

IIUC C11's relaxed ordering is too weak for that, but any of the other C11 orderings are likely to be unnecessarily strict, i.e. they might force the use of special (costly) CPU instructions where normal reads and writes are already safe enough.

overly strict semantics

Posted Jan 11, 2026 17:57 UTC (Sun) by garyguo (subscriber, #173367)

I think in C11 ordering, a relaxed op is too weak, but a relaxed op + an atomic signal fence (which is usually just a compiler barrier) is sufficient. Alternatively, a volatile relaxed op should also be sufficient.


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds