|
|
Log in / Subscribe / Register

McKenney: So You Want to Rust the Linux Kernel?

Paul McKenney has started a blog series on Rust for the Linux kernel. He has posted six of a planned 11 articles, though several are labeled as "under construction".
This series focuses mostly on use cases and opportunities, rather than on any non-trivial solutions. Please note that I am not in any way attempting to dictate or limit Rust's level of ambition. I am instead noting the memory-model consequences of a few potential levels of ambition, ranging from "portions of a few drivers", "a few drivers", "some core code" and up to and including "the entire kernel". Greater levels of ambition will require greater willingness to accommodate a wider variety of LKMM [Linux-kernel memory model] requirements.

[...] These blog posts will therefore present approaches ranging upwards from trivial workarounds. But be warned that some of the high-quality approaches require profound reworking of compiler backends that have thus far failed to spark joy in the hearts of compiler writers. In addition, Rust enjoys considerable use outside of the Linux kernel, for example, as something into which to rewrite inefficient Python scripts. (A megawatt here, a megawatt there, and pretty soon you are talking about real power consumption!) Therefore, there are probably sharp limits beyond which the core Rust developers are unwilling to go.



to post comments

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 6:47 UTC (Mon) by scientes (guest, #83068) [Link] (10 responses)

Maybe instead of a memory model, they should have actual IPC, like channels in Go (which rust has)? That way the APIs can be manageable and possible to secure (and drivers could even run in user-space, but that is something entirely differn't).

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 7:51 UTC (Mon) by LtWorf (subscriber, #124958) [Link] (6 responses)

Isn't that what makes microkernels slow?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 9:10 UTC (Mon) by ehiggs (guest, #90713) [Link] (1 responses)

If I understand correctly the alleged slowness in microkernels comes from interprocess communication. However, I think the memory model and what scientes is referring to is std::sync::mpsc[1] or crossbeam::channel[2] which are in-process channel primitives to synchronize memory.

That said, I think the functionality of these primitives depends on having a stable memory model to build on, so I'm not sure hand waving it away saying 'message passing' works here.

[1] https://doc.rust-lang.org/std/sync/mpsc/
[2] https://docs.rs/crossbeam/0.7.3/crossbeam/channel/index.html

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 11:40 UTC (Mon) by ehiggs (guest, #90713) [Link]

Oh jeeze, I reread scientes post and they explicitly say IPC. How do I delete pre-coffee posts?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 7, 2021 6:50 UTC (Thu) by ncm (guest, #165) [Link] (3 responses)

Some have said that what made microkernels slow was moral depravity. I don't know any way to prove them wrong.

But, is MacOSix slow?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 7, 2021 9:43 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (2 responses)

AFAIK it's not a microkernel, so how it performs doesn't really matter.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 9, 2021 4:43 UTC (Sat) by ncm (guest, #165) [Link] (1 responses)

Last I heard, MacOSix was made out of a Mach microkernel running a FreeBSD-based personality as a process under the Mach microkernel.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 9, 2021 6:37 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

It started as a microkernel, but ended up pretty much like NT with lots of stuff inside the kernel (filesystems, graphics, network, etc.)

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 9:37 UTC (Mon) by ibukanov (subscriber, #3942) [Link] (2 responses)

The channels in Go or communicating serial processes (CSP) as they were called in the original paper by Tony Hoare are problematic for performance. Ada initially relied exclusively on them, but quickly gained mutexes when it was realised in eighties that the hope that the compiler could optimise CSP patterns was way too optimistic. Almost 40 years later situation has not changed much. Even in Go if one wants to max the performance one better use mutexes or other low-level synchronisation methods. So channels are just not suitable for kernel.

And even when performance is not critical channels at least as implemented in Go have numerous problems. Proper cancelation is hard. Priority message delivery is not supported. From anecdotal experience even in Go using single untyped message queue per goroutine modelling what Erlang leads to more maintainable code than using multiple typed channels and select.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 5, 2021 15:16 UTC (Tue) by sytoka (guest, #38525) [Link] (1 responses)

And concerning Erlang, it is also a message-based communication model and the Erlang language has been used by Ericson routers for a long time?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 5, 2021 19:52 UTC (Tue) by ibukanov (subscriber, #3942) [Link]

Message passing as implement in Erlang is very different from Go CSP. Each Erlang thread has single message queue associated with it. There is no notion of multiple channel or select statements. Erlang supports message priorities. This architecture is similar to message passing API in various OS and runtimes. They are known to work. CSP on other hand has been known to be problematic for ages.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 7:41 UTC (Mon) by dvdeug (subscriber, #10998) [Link] (1 responses)

It seems like a lot of hand-grenade juggling is going on here. Take "At the assembly language level on many weakly ordered architectures, a conditional branch acts as a very weak, very cheap, but very useful memory-barrier instruction. It orders any load whose return value feeds into the condition codes before all stores that execute after the branch instruction completes, whether the branch is taken or not. ARMv8 also has a conditional-move instruction (CSEL) that provides similar ordering."

Making C's "if" into a memory-barrier statement is the opposite of being explicit. It's almost certainly pessimizing some part of the code if the compiler has to treat every "if" statement as a barrier to certain optimizations. No matter what Rust does, wouldn't it better to actually tell the C compiler what you're doing there?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 9:00 UTC (Mon) by pbonzini (subscriber, #60935) [Link]

Yes, and discussions on this very topic are in progress. See https://lwn.net/Articles/860037/ for a summary as of last June.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 4, 2021 15:28 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

At this point, they are all supposed to be marked "under construction". I just now fixed the ones missing this disclaimer, so thank you for calling this to my attention!

volatile

Posted Oct 4, 2021 18:50 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (5 responses)

So, Rust actually ships, today, what you'd actually want, a pair of generic intrinsics for volatile reads and writes. Provide a suitably aligned pointer and you can read, or write, any implemented data type. Rust promises this read (or write) doesn't tear and won't get re-ordered, duplicated, optimised out etc. since performing the read (or write) has side effects invisible to the compiler (except that ZSTs can be elided because, fancy as MMIO is, reading or writing *nothing* does not have side effects). On most systems, of course, the intrinsic evaluates to a single machine code instruction.

This is roughly what Paul and co. have asked for in P1382R1 for C++ which makes sense because, as I said, it's what you'd actually want.

But what jumps out at me is that although Rust provides this intrinsic, which is exactly what you'd want here, Paul apparently proposes that what Rust ought to do instead is wrap the Linux kernel features that try to coerce a C compiler into doing what you want despite the C language standard not promising anything useful. Why? Is it because Paul didn't know these intrinsics already exist?

volatile

Posted Oct 4, 2021 19:52 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

It is because some architectures' implementations of Linux-kernel READ_ONCE() and WRITE_ONCE() are required to do more than just a volatile access, and in some cases this "more" is controlled by a Linux-kernel Kconfig option. Yes, you could duplicate this logic in Rust code, but it might be simpler to just rely on the existing LInux-kernel code, especially to begin with.

Developer's and maintainer's choice, though! ;-)

volatile

Posted Oct 7, 2021 10:45 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (3 responses)

Can you say a bit more, or a bit more concretely, than "some architectures" and "some cases" where this cheap-looking feature might actually be arbitrarily more complicated and need wrapping?

volatile

Posted Oct 7, 2021 16:06 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

DEC Alpha needs a full memory-barrier instruction after the load emitted by READ_ONCE().

Itanium needs a READ_ONCE() to emit an acquire-load instruction and a WRITE_ONCE() to emit a store-release instruction, but the C compiler takes care of that for us. Plus it is not obvious that Rust would ever need to care about this particular case.

ARMv8 needs a READ_ONCE() to emit an acquire-load instruction, but only in kernel builds that enable LTO, that is, builds that have CONFIG_LTO or similar enabled. (This choice stems from concerns that LTO breaks address, data, and/or control dependencies.)

So not massively complex, but something that needs to be kept track of.

volatile

Posted Oct 8, 2021 19:24 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

So, if I'm not misunderstanding you need acquire-release semantics (you don't need to ask for them on x86 because x86 always has acquire-release semantics), that's what all of these cases seem to be doing.

Because x86 doesn't care, I don't actually know and it wouldn't surprise me if the existing volatile intrinsics already have acquire-release on every platform. But even if not it does still feel like this makes more sense as a (new) intrinsic than trying to call into C every time.

volatile

Posted Oct 8, 2021 20:09 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Why not take a look at the actual Rust documentation and implementation to see what the Rust volatile intrinsics actually guarantee across architectures?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 6, 2021 21:52 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (8 responses)

And there is now a prototype of all 11 articles. Thoughts?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 7, 2021 6:36 UTC (Thu) by calumapplepie (guest, #143655) [Link] (1 responses)

- Compiler writers hate ADDRESS/DATA dependencies
The fourth solution option is "Wait for compiler backends to learn about control dependencies". it looks like you partially messed up the copy-paste: the rest of the option makes sense, but not that bit.

Similarly, the last sentence advises that info about control dependencies can be found in a chapter about data dependencies.

That's all I got. Can't say I made sense of the whole thing, and I'm falling asleep (not because its boring, but because its 2:30 am), but I (at least in theory) know a bit more about memory. I just hope it doesn't get overrwritten...

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 7, 2021 16:00 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Good eyes, especially considering local time when you read this! Thank you, and it should now be fixed.

Copy-pasta for the loss! ;-)

And if this is your first exposure to these concepts, it might take a few reads. This is not intuitive stuff!

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 8, 2021 4:05 UTC (Fri) by neilbrown (subscriber, #359) [Link] (5 responses)

Part 7 suggests that RCU "vaguely resembles a reader-writer lock". I don't find that to be a helpful analogy, and have come to dislike the name "rcu_read_lock()" (preferring rcu_get()/rcu_put())
The only thing that RCU and rwlocks have in common is a refcount, and that is where I would build an analogy.

RCU provides an effective reference-count on *all* objects accessible through __rcu pointers, combined with a mechanism to get notifications when all previously claimed references have been dropped. It is similar to kref_get/kref_put with the counter being global and the 'release' function passed to kref_put() being asynchronous.

But maybe this analogy isn't SO much better that it can overcome a natural resistance to change.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 8, 2021 17:11 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (4 responses)

You are quite right that the analogy between reader-writer locking and RCU is quite rough. After all, at its core, RCU is nothing more nor less than a way of waiting on things to finish, and even then, only those things that have already started are guaranteed to be waited on. The things to be waited on are delimited by rcu_read_lock() and rcu_read_unlock() and friends, and synchronize_rcu() and call_rcu() and friends do the waiting, synchronously and asynchronously, respectively. To your point, this is almost nothing like a reader-writer lock, reference count, garbage collector, multi-version concurrency-control system, type-safe memory system, or any of the many other use cases to which RCU has been so successfully applied.

Unfortunately, when I tell someone about "a way of waiting for already-started things to finish", it is a very rare someone who bridges the chasm between this semantic and their particular use case. In fact, the most common response is of the form "but what on Earth can I possibly do with that???", albeit typically considerably less polite.

So yes, the analogy with reader-writer locking is quite rough. For example, RCU readers are not blocked by updaters, only very specific portions of RCU updaters are blocked by readers, and in some cases there is nothing delimiting the extent of an RCU updater.

But the analogy between reference counts and RCU is also quite rough. RCU readers cannot acquire a reference to a specific object, but instead acquire a "bulk reference" to each and every RCU-protected object that is still reachable by readers. RCU updaters have no way to determine if a given object is referenced by readers. RCU readers have no way to determine that their reference is the last one. RCU readers unconditionally acquire their bulk reference, whereas in many reference-counting implementations, an attempt to acquire a reference to a specific object might in fact fail. Yes, the CONFIG_PREEMPT=y RCU implementation's current->rcu_read_lock_nesting can be thought of as a per-task reference counter, but the CONFIG_PREEMPT=n RCU implementation has no such counter.

So why do I usually lead with the reader-writer locking analogy when describing RCU?

Because (1) RCU is most frequently used as an alternative to reader-writer locking, (2) Almost everyone is familiar with reader-writer locking, and (3) As a rough rule of thumb, those who have both RCU and hazard pointers at their disposal should first try using RCU to replace reader-writer locking and hazard pointers to replace reference counts. (And yes, the analogy between hazard pointers and reference counting is also rather rough: Hazard pointers cannot easily determine whether anything holds a reference to a given object and also cannot reasonably check that their reference is the last one for a given object.)

Does that help?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 11, 2021 0:19 UTC (Mon) by neilbrown (subscriber, #359) [Link] (3 responses)

Hi Paul,
yes that does help - particularly in providing me with specific points to disagree with - or at least to discuss.

While use of a refcount does provide the opportunity to read the counter value (kref_read()) it is my understanding that the code has no business doing that. The value could change between the moment when it is read and the moment when it is used, making it unreliable at best. However Linux does have a surprisingly large number of calls to kref_read(). Many are in debugging messages, which would be expected to be racy. I haven't analysed the others - probably many are justified. In any case I don't think this ability is core to refcounts in general, just useful in some particular refcount patterns.

While RCU cannot fail to claim the bulk-reference, rcu_dereference() can still return NULL. This can be roughly equivalent to kref_get_unless_zero() failing.

While the CONFIG_PREEMPT=n RCU implementation has no explicit reference counter, it seems likely to me that there is some implicit counter. At any moment there is some set of events which need to happen before a grace period can end. I think they are "a CPU calls schedule() or returns to user-space". The size of this set is the implicit ref-count.

While I accept that replacing reader/writer locking may be a common path to RCU, I don't think that at all justifies presenting such a flawed analogy. Describing RCU as "a way of waiting for already-started things to finish" is equally unjustified, though for completely different reasons. I personally prefer "exploiting deferred destruction" as the description, but even that misses the core idea.

The core idea of any RCU-solution is to use write-side atomics and read-side barriers. They are what replaces the various sorts of spinlocks. The use of rcu_read_lock() provides a context where these barriers and atomics can be used safely - particularly when combined with pointer dereference. I guess you could argue that this parallels a spinlock which provides a context where normal loads and stores can be used safely. Whether that parallel is enough to justify the analogy .... I'm not sure.

I was surprised by the strong contrast you suggested between RCU and hazard-pointers. They are both deferred-destruction mechanisms which seems to me to have similar use cases. In fact your perfbook suggests it may be possible for an implementation to switch between them based on dynamic assessment of performance needs. This points to substantial similarity.
I guess the distinction you are highlighting is that RCU can only hold its bulk-reference for a bounded time (or a least, there are costs that are at least linear in hold time). Hazard-pointers can protect a single pointer for an arbitrary time for comparatively little cost. This allows hazard pointers to *replace* refcounts in a way that RCU cannot.

This suggests I could clarify my analogy to say that "RCU provides an effective TRANSIENT reference-count on *all* objects...".
Or something like that.

The heart of my argument is that a spinlock typically prevents an object from changing, while a refcount prevents it from being destroyed. Surely RCU is more like the latter than the former.

Behind these thoughts is the observation that I read from time to time that RCU is difficult. Similarly that memory-barriers.txt is confusing. These comments remind me of the famous "you are not expected to understand this" in the Unix Edition 6 scheduler. It has always bothered me.
All computing is difficult, but we build abstraction and tools to make it manageable. "While" loops are better than goto's. Recursion is rendered transparent by invariants. Resource management can be greatly simplified by ownership tracking.
Telling people that something is hard, or that it does not need to be understood, is self-defeating as some of them might believe you and not even try.

If RCU is perceived as difficult then we (collectively) would be well served by efforts to find the weak points in our presentation of it and to strengthen them (and you are typically at the forefront of this). I genuinely think that "Like a lock" is one of those weak points.
For myself, I've found that thinking about RCU as a transient refcount on everything (which makes use of atomics safe) made it a lot easier to think about. Similarly the introduction of "read_acquire" and "store_release" terminology makes it much easier to think about memory barriers than "read-memory-barrier" and "write-memory-barrier" ever did.

Maybe the refcount analogy wouldn't work as well for others, and maybe we should avoid analogies altogether. Certainly "a lock that doesn't provide exclusion" doesn't seem like a winning description.

"RCU supports find-grained concurrency control using atomics and memory barriers and particularly allows pointers to be dereferenced without holding any lock" ??
(Some might say "supports lockless programming" but I hate that term as there still are locks - the CPU/memory controller manage them internally)

Thanks.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 11, 2021 17:15 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

Hello, Neil, and thank you for your thoughts!

This reminds me of a queuing-theory class I took some decades back, though more than 50 years after the passing of Agner Karup Erlang, who laid the foundations for modern queuing theory. The professor explained one aspect of queuing theory (I forget which, sorry!). Then explained it again. And again. At which point, I raised my hand and asked why we were going over this so many times. The professor complimented me on understanding that aspect so quickly and then asked me to be quiet while he continued getting it across to the remainder of the class. I eventually figured out that he was explaining that aspect of queuing theory from different viewpoints, so it could be understood by different mindsets. I just happened to be lucky in that he happened to pick my mindset first.

I thought nothing of this incident at the time, but have had many occasions to recall it since Jack and I invented RCU.

And this is because of the wide gap between RCU's base-level semantics and typical use cases. And the base semantic really is "waiting for pre-existing things to finish", as you can see from the LKMM formalization of both RCU and SRCU. And because of this wide gap, I am not at all surprised at your discomfort with this description. It is a good place to start only for those few people who like building up from the fundamentals (not me!), and is in many cases worse than useless for the many people who prefer thinking in terms of tools to take on particular use cases.

To your later "deferred destruction" point, Maged Michael and I group Hazard Pointers and RCU under the "deferred reclamation" heading, along with reference counting. But not all RCU use cases involve reclamation. Plus this loses much of the connection to replacements for reader-writer locking. (The reason we don't say "deferred destruction" has to do with C++ destructors, in case you were wondering.)

So it is necessary to present multiple views of RCU, just as it was necessary for that professor to present multiple views of that aspect of queuing theory, whatever it might have been.

And your reference-coiunting view is a most excellent view, which is why it has its own section in perfbook. More than one, in fact: 9.5.4.3 "RCU is a Restricted Reference-Counting Mechanism" and 9.5.4.4 "RCU is a Bulk Reference-Counting Mechanism". But to your point, it does not seem to be well represented in Documentation/RCU. Would you like to add it? (Not as a replacement for the reader-writer-locking analogy, but as an additional view.)

Yes, RCU and hazard pointers overlap quite a bit. See Tables 9.6 and 9.7 in perfbook for Maged's and my view of the advantages and drawbacks of each. The point I am trying to make is that if your code already uses reference counts and already handles reference-count acquisition failure, the software-engineering disadvantages of hazard pointers don't matter so much to you. Yes, you might simplify your code by switching to RCU, but there might not be so much incentive to do so. Similarly, if your code already uses reader-writer locking, you will have kept your read-side critical sections reasonably short, so the memory-footprint disadvantages of RCU don't matter so much to you. But it is all just software, and thus can all be changed. In theory, anyway. ;-)

A common RCU use case resembling reader-writer locking avoids changing data (as opposed to list pointers). This use case has readers seeing constant data, just as they would with classic reader-writer locking (but not with some of the more esoteric use cases that cause some people to not unreasonably object to the name "reader-writer locking"). And to your point, one reason why the C++ committee was interested in both RCU and hazard pointers was that they saw them as ways of taming many classes of lockless algorithms. And the C++ community is one reason why I shy away from describing RCU in terms of reference counting. Once someone in that community dives down the smart-pointer rabbit hole, it is extremely difficult for them to understand RCU. And almost impossible to haul them back out of that hole.

To people claiming that RCU is difficult, I go back further in time, to a late 1970s lecture by the late Edsger Dijkstra that I had the privilege of attending. He claimed that "while" loops were difficult and that the typical programmer could not be trusted to write one correctly. This claim resonated with much of the audience (though not with my then-young and arrogant self), but probably would not resonate today. So part of the perception of RCU's difficulty is cultural, and the culture is changing. Me, I have had quite a number of people tell me that they had heard that RCU was really difficult, and were surprised to find that not only was it reasonably easy, but that it simplified things (deadlock avoidance being the source of simplification that is usually cited).

So from what I can see, the trick is to get a given developer pointed at the view of RCU that best matches the use case at hand. Thoughts on attacking that problem?

And again, can I interest you in adding an "RCU as reference counter" .rst file to Documentation/RCU?

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 12, 2021 22:09 UTC (Tue) by neilbrown (subscriber, #359) [Link] (1 responses)

Hi Paul,
thanks for sharing your experience and wisdom!

> to a late 1970s lecture by the late Edsger Dijkstra that I had the privilege of attending

...envy...

> And again, can I interest you in adding an "RCU as reference counter" .rst file to Documentation/RCU?

Ahhh... the gentle art of delegation. :-) Stay tuned.

McKenney: So You Want to Rust the Linux Kernel?

Posted Oct 13, 2021 0:38 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

I very much look forward to seeing what you come up with!


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds