Resources for learning Rust for kernel development

By Daroc Alden
September 23, 2024

Kangrejos 2024

Dirk Behme led a second session, back-to-back with his session on error handling at Kangrejos 2024, discussing providing better guidance for users of the kernel's Rust abstractions. Just after that, Carlos Bilbao and Miguel Ojeda had their own time slot dedicated to collecting resources that could be of use to someone trying to come up to speed on kernel development in Rust. The attendees provided a lot of guidance in both sessions, and discussed what they could do to make things easier for people coming from non-Rust backgrounds.

He opened the session by noting that "most of you are special" — that the attendees were, by and large, already knowledgeable about Rust. They have written drivers, seen that abstractions were missing, and written the abstractions as well. So nearly everyone in the room was an expert, who knew all of the details of how Rust works in the Linux kernel. Behme isn't a computer-science person, though. His background is in electrical engineering.

He put up a picture of Linux Device Drivers, 3rd edition, asking: does there also need to be a book about Rust kernel abstractions? Rust is said to have a steep learning curve — and Rust-for-Linux goes even further, since it involves writing low-level code in a particular style and the kernel is always under heavy development.

To illustrate his point, Behme put up some examples of beginners asking about writing kernel Rust. One person was having trouble writing a module. Alice Ryhl had replied to them that the abstraction they were using had changed its API, and explained how to adapt their module. This isn't an uncommon problem — others have also reported needing time to adapt, he said. Behme himself took some time to figure out the devicetree abstraction — about a week. He said that this wasn't a complaint, just an example of how learning the necessary prerequisites can be hard, and how the project could have better learning materials.

Andreas Hindborg said that when an abstraction goes into a kernel tree, the requirement is for there to be a user of that abstraction — so there should be an example right there in the tree. In practice, he said, the abstractions that do go in also tend to have good examples in the documentation. So the project certainly intends for the type of learning material Behme was asking for to exist.

Miguel Ojeda pointed out that there may be books about Linux device drivers, but that it's still early days for Rust in the kernel. It took time for those books to be written, he said. "We were thinking about writing a book," he continued, but it was just too much work right now.

Behme replied that he did think that the project was doing a good job with documentation, but that it was not enough. At his work, he had asked about whether they could start using Rust-for-Linux soon; his manager said no, not for technical reasons, but for social reasons — the learning curve from C to Rust was too steep for most of the engineers at his company.

One audience member asked whether that difficult curve was due to the language, or the Rust-for-Linux project. Behme said that the main concern was Rust, but that the project adds complexity on top of that. He gave the example of looking at some Rust code and seeing that it called spin_lock(), but "why the hell was there no unlock"? (Answer: the spin_lock() Rust abstraction returns a guard object that automatically releases the lock when it is dropped — either explicitly by the programmer, or implicitly at the end of the function.) There are examples for these things, but the underlying reasoning is different from C, and that takes time to learn.

Paul McKenney noted that modern kernel C code actually has similar lock guards now, and that maybe this would make Rust's use of lock-guard objects less counterintuitive. Ryhl wondered whether having translations for common conventions between languages would be helpful.

Hindborg agreed that it takes time to learn a language, and that Rust is fairly difficult to learn as imperative languages go. But "you need to invest time in learning that; it doesn't come for free". Once you have put the time in, there are substantial benefits. He suggested that Behme tell his manager that, noting that Google claimed a 3x productivity increase with Rust. He also said that while Rust-for-Linux does add some additional details on top of plain Rust, that's nothing new for the kernel — C in the kernel is pretty different from C in user space.

Ryhl asked what it takes to teach other developers to write Rust, in the other attendees' experience. She noted that there would be an upcoming talk at RustConf on that topic, actually. Another attendee pointed out that Rust-for-Linux patches go via the mailing lists, just like any kernel patch — so the documentation that justifies or explains a change is often there. Behme asked if it would be possible to get something like a set of release notes for each kernel, talking about what changed.

Hindborg replied that all of the changes are there in Git. Ojeda suggested that when Behme saw a change to an API, he should go look at the corresponding commit.

Greg Kroah-Hartman pointed out that the kernel's C developers don't provide internal kernel-change information — so why should the Rust developers do that? On a related note, he advised against using Linux Device Drivers, because it is now seriously out of date. Ojeda agreed, noting that the changes Behme had highlighted were all to internal Rust APIs — and just like other kernel interfaces, there is no guarantee that they can't change at any time.

Richard Weinberger thought Behme had a good point, however — he noted that most people writing device drivers are electrical engineers, not computer scientists. From that point of view, Rust looks "outlandish and hostile". If you know OCaml and Haskell, Rust looks awesome, he said. The Rust-for-Linux developers should be careful not to assume that kernel hackers who only know C have the same positive impression.

Hindborg replied that he understood Weinberger's point, but that he was himself an electrical engineer who learned C and then Rust. It's not impossible, he said, and you can expect people to learn new tools.

Yes, but you need to give them motivation to do so, Weinberger responded. Benno Lossin said that as a writer of Rust documentation, it's often hard to know what a beginner won't understand. If you're coming at it from the Rust side, the reason that there's no corresponding unlock() in the code is pretty clear. We need to listen to the people coming from the kernel side who have problems in order to improve our documentation, he said. He asked Behme to write down some of the problems he had encountered, so they could turn it into some documentation. Behme agreed.

Lossin also agreed that the changes to the Rust APIs were frustrating, but that they could become better over time — there's a preexisting plan to split up some of the functionality of the kernel crate into smaller crates that should therefore see less frequent changes. He said that there are so many people working on the kernel crate right now that it's hard for anyone to track all of the changes.

Gary Guo thought that Rust would actually shine for developers writing device drivers. It's unrealistic to expect every engineer to understand C's many sharp edges, he said. In Rust, the APIs are not all there yet, but it's possible that they could only ever need to write safe Rust code. So there's actually value in getting less experienced engineers to write Rust — the compiler will help them write fewer bugs.

Simona Vetter said that, in her experience, the average kernel C developer doesn't understand C either. There are, in theory, five people who could write a bug-free driver, and in practice zero. It's practically impossible to write bug-free kernel code, she stated.

Behme replied that, in industry, you have to write drivers. So getting acceptable drivers out of real engineers is a requirement. Vetter thought that Rust could actually be helpful with that — her hope is that new engineers could just type out "random code" in Rust and get a correct driver out, which would never happen in C.

Hindborg thought that was an interesting observation. He predicted that lots of people would be angry if their employer told them to write Rust, because nobody likes being told off by the compiler. But despite that, when the necessary libraries are in place, perhaps we can just never compile a buggy driver.

Ryhl noted that she has seen other people contribute to her driver without adding any abstractions. So in at least one domain, things have gotten to the point that things are mostly stable.

Collecting resources

At that point, it was officially time to go to the next session. But, luckily, the next session was scheduled to be a roundup of different educational materials, with the aim of producing a recommended list for learning Rust in the kernel.

Ojeda asked that people list out the resources they found most helpful while learning rust. For his own part, he found an online book from Brown University's Cognitive Engineering Lab with an interactive borrow-checker helpful.

Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust. Lots of concepts map, he said, but Rust is much better. Adrian Taylor suggested the New Rustacean podcast. He thought that audio was a weird way to learn a programming language, but he liked it in this case. He also suggested a series of articles, "Learn Rust the Dangerous Way", which shows the incremental conversion of a C program to Rust.

Kroah-Hartman said that the Linux Foundation has a free online course for learning Rust. Hindborg said that Google had a free five-day course as well, "Comprehensive Rust".

Lossin gave a more general recommendation — read blogs. There are lots of good posts on advanced topics, he said. He particularly liked Amos Wenger's explanation of Pin. Ojeda suggested the Master's thesis "You Can't Spell Trust Without Rust" by Aria Desires as a good resource for advanced topics as well. She also wrote "Learn Rust With Entirely Too Many Linked Lists", which nobody recommended at the time, but that is also intended as an introduction to Rust for programmers with existing C experience.

With the resources collected, the discussion turned to what to do with them. Bilbao said that the project should make a distinction between people just starting out, and people who have been writing Rust in Linux for some time — they have different needs. He suggested using the Rust-for-Linux web site as a central location for hosting good blog posts, but also thought that it was important that the project be "serious" about ensuring things are well documented.

Lossin noted that there is already a linter rule requiring that all public items (functions and types) be documented. There was a brief discussion of current kernel conventions for documenting C code.

Vetter ended up pointing out one problem with kernel-doc, the tool that checks whether C code is documented. It doesn't complain when there are no comments, but it does complain when there is one comment and some still missing. This makes people not want to add documentation where it doesn't already exist. Rust is ahead, she said, because just requiring that documentation exists, even if people don't put effort into it, makes it easier to improve later. She pointed out that the overlap between people who are good at writing complicated code, and people who are good at writing documentation is often not big — so it's okay to encourage people to collaborate.

In all, there was a clear consensus that the Rust-for-Linux project could make it easier for people to get up to speed with the knowledge necessary to write Rust in the kernel. So the project will continue to encourage good documentation standards, centralize learning resources, and work with other kernel developers who bring up pain points to figure out what else needs to be covered.

[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos. ]

Index entries for this article
Conference	Kangrejos/2024

> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.

Posted Sep 23, 2024 16:25 UTC (Mon) by adobriyan (subscriber, #30858) [Link] (3 responses)

Yes, but I hope "hate it" part is where the joke is :-).

The easiest way to start hating Rust is to implement something with OsString, keep them around (so you can't use String goodies) and pass them to external C string-accepting library (so it must be CString in the end).

It can be quite paternalistic language. Expect culture shock here.

> why the hell was there no unlock

This is so true. In C++ one would write

{
auto _ = std::lock_guard{obj->mutex};
...
}

which hides unlock which takes time to get used to.

Kernel being kernel with 8 spaces per tab doesn't help so this additional indent level may trigger checkpatch.pl alarms.

> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.

Posted Sep 23, 2024 19:07 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> The easiest way to start hating Rust is to implement something with OsString, keep them around (so you can't use String goodies) and pass them to external C string-accepting library (so it must be CString in the end).

The basic reason this is painful is because:

1. You are trying to round-trip possibly-invalid Unicode...
2. ...and (presumably) do some string transformations on it that may not be reasonably applicable to invalid Unicode...
3. ...and then you want to impose a restriction on the output format (no embedded nulls) that was not applied to the input (OsString does not prohibit embedded nulls, since it can be constructed directly from String without checking for nulls).

Of course this is all doable, but if you want it to function correctly, you are going to have to stop and think a little bit about what "function correctly" even means in this context. Frankly, you are either going to have pain up-front or pain later (when it does something subtly wrong), no matter what language you use for this. Rust is a little unusual in that it forces you to have that pain up-front instead of later, but that's arguably the whole point of using Rust.

Anyway, I would also point out that Rust does provide slice::utf8_chunks(), which makes this at least somewhat practical (if a little fiddly). See https://doc.rust-lang.org/stable/std/primitive.slice.html... for example code. Of course, if you're not in UTF-8, that's useless... but I'm not convinced this is even feasible in (most) encodings other than UTF-8 in the first place (UTF-8 is self-synchronizing, so you can "resume" decoding it after getting interrupted by invalid bytes, but most other encodings make no attempt to support that, e.g. if UTF-16 gets offset by one byte, then the whole rest of the string will be parsed incorrectly - and that's the easy one, Shift JIS is even worse in comparison since it reuses some single-byte code units as the second of a two-byte code sequence). You probably could do it in a legacy 8-bit encoding like ISO-8859-*, but meh, at that point you can just iterate one byte at a time anyway.

> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.

Posted Sep 24, 2024 14:48 UTC (Tue) by aragilar (subscriber, #122569) [Link]

https://docs.rs/bstr/latest/bstr/ also makes this easier when you know you're dealing with probably-UTF-8.

> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.

Posted Sep 24, 2024 6:15 UTC (Tue) by hunger (subscriber, #36242) [Link]

That's standard fare for decades now in a ton of programming languages, not just Rust. I find it depressing how disconnected from general language developments Linux Kernel devs seem to be.

I just hope rust won't suffer with useless features, added only to please some old guys in the kernel community.

Locking

Posted Sep 23, 2024 17:35 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (28 responses)

The lack of "unlock" has led to proposals to give Rust's Mutex<T> type an unlock method.

The way you'd do that is your unlock function takes the guard as a parameter. Since Rust has the destructive move semantic, the unlock doesn't need to actually "do" anything, it can return immediately - the guard was moved into the function and then not returned, it's gone - it was dropped, which gives effect to the programmer's intent.

Locking

Posted Sep 23, 2024 18:45 UTC (Mon) by intelfx (subscriber, #130118) [Link] (19 responses)

> The lack of "unlock" has led to proposals to give Rust's Mutex<T> type an unlock method.
>
> The way you'd do that is your unlock function takes the guard as a parameter. Since Rust has the destructive move semantic, the unlock doesn't need to actually "do" anything, it can return immediately - the guard was moved into the function and then not returned, it's gone - it was dropped, which gives effect to the programmer's intent.

And then you will have half of the code using this no-op `.unlock()` and the other half of the code relying on the implicit Drop of the guard in the function scope itself. Sounds like a recipe for "disaster" levels of inconsistency.

You can, of course, mandate one way or the other via a code style document and/or linters. But that would add friction and strike another blow to the "it compiles, ergo it works" ideal.

Locking

Posted Sep 24, 2024 2:56 UTC (Tue) by viro (subscriber, #7872) [Link] (18 responses)

How would the compiler verify that a call you've added in the area under mutex will not lead to the same mutex being grabbed? If the answer is "it won't", you need to be able to see that without its assistance. The lack of explicit unlock makes that much harder to see.

Sure, you don't lock to protect the code, or even the data - you lock to protect the invariants. But code is what you change, so unless one's answer to everything (including debugging) is "throw it all away and rewrite from scratch", you do need to be able to reason about the locking state at given spot in the code...

Locking

Posted Sep 24, 2024 6:58 UTC (Tue) by roc (subscriber, #30627) [Link] (11 responses)

> How would the compiler verify that a call you've added in the area under mutex will not lead to the same mutex being grabbed? If the answer is "it won't", you need to be able to see that without its assistance. The lack of explicit unlock makes that much harder to see.

As noted in this article, the kernel is already using scoped guards in C. E.g.:
https://github.com/torvalds/linux/blob/abf2050f51fdca0fd1...
so it's too late to require explicit "unlock".

Locking

Posted Sep 24, 2024 17:09 UTC (Tue) by viro (subscriber, #7872) [Link] (10 responses)

... and it's a trouble waiting to happen. Look around and you'll find all kinds of bad ideas, most with exact same reason - hard to reason about the program state. With varying unpleasantness - a few goto in a 10-line function could be just fine, but it can easily get much worse.

As those things go, guard() is pretty high on the "easily turned into a landmine" scale. scoped_guard() is saner, but it comes with its own set of headache sources - deeply nested structure can get confusing.

It's not about preserving some kind of purity; there's no such thing in the real world anyway. Bitrot _is_ a fact of life; there's no One True Tool/Style/Language that would prevent it. What matters is how brittle the thing is. Another fact of life is that trying to figure out a bug elsewhere will lead you through a lot of unfamiliar code (possibly - yours, but with detailed memories of the area swapped out of active memory; when there's a dozen of areas you need to get through, the latency of mental context switch can be very painful); you will need to make decisions about the next thing to look into, and do that based on the reasoning about unfamiliar code. It _can't_ be avoided; what matters is how hard will it be. And yes, that includes the need to modify something you've written ten years ago. It happens. And it's all about tradeoffs.

As for the __cleanup-based tools... in moderation it's fine, but blind use can lead to a lot of PITA. In particular, it comes with serious, er, adverse interactions with other tools. We'll see how bad a source of bitrot it will be; for now I'm very cautious about the cases where accidentally delaying the cleanup can cause problems, and locking is firmly in that class.

Locking

Posted Sep 24, 2024 19:44 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> ... and it's a trouble waiting to happen. Look around and you'll find all kinds of bad ideas, most with exact same reason - hard to reason about the program state.

Why?

The reasoning here is simple: everything is locked after the scope guard is taken until the end of the scope. And you can't access the protected data accidentally without taking a lock. You also can't forget to clean up the lock in "goto cleanup". Early returns are also not a problem anymore.

It does reduce the flexibility a bit, but hand-over-hand locking is pretty rare.

Locking

Posted Sep 24, 2024 20:02 UTC (Tue) by daroc (editor, #160859) [Link] (3 responses)

One also can drop the guard explicitly:

    drop(guard);

So I don't think it's less flexible, really. But, as in every discussion about programming languages, it's not really about what's possible so much as what the language makes easy or hard. I think it's a good point that Rust's locking is less explicit! That's certainly true, and it is a tradeoff. I (personally) think that the guarantees the compiler provides are worth it, but that doesn't mean we shouldn't acknowledge that less explicit locking does have downsides.

Forcing explicit destructuring

Posted Sep 24, 2024 20:19 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

There is some thought being put into whether something like undroppable types would be a useful addition to the language. These would allow Rust for Linux to have a scope guard that must be explicitly destroyed (probably via a fn unlock(self) function), and where dropping them without calling that function is a compile-time error.

Forcing explicit destructuring

Posted Sep 24, 2024 20:21 UTC (Tue) by intelfx (subscriber, #130118) [Link]

> There is some thought being put into whether something like undroppable types would be a useful addition to the language

So… true linear types (as discussed right in this comment section)? :-)

Locking

Posted Sep 25, 2024 17:54 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

One can argue that drop(guard) is just a funny way of spelling (the currently nonexistent) guard.unlock().

Locking

Posted Sep 24, 2024 20:49 UTC (Tue) by viro (subscriber, #7872) [Link] (1 responses)

... except that when you have several levels of nesting, finding the end of the right scope 80 lines down from where you've taken the sucker can be an interesting exercise. spin_unlock(...) is a lot more recognizable than one } in the series of 4.

BTW, do *NOT* mix that with goto-based cleanups without a lot of care. Any goto *into* that scope must come from that scope itself, which would be trivial if not for the fact that guard() can be not the first thing within the compound statement. In that case the scope extends from guard() to the end of compound statement. Now, it's very obvious that

if (condition) goto exit_something;
{
some_type var = foo();
....
exit_something:
something(var);
...
}
is a bug - no matter how liberal your interpretation of standard might be, on the path that takes this goto exit_something you have var used uninitialized. Anyone tempted to add such goto would see that this candidate failure exit is not suitable, no matter how similar it might be to what you want. And compiler will catch that if you try.

What's less obvious is that it applies to goto around the guard - there's no visible spin_unlock(...) in the end, but it is implicit and there's exact same kind of bug. Worse, gcc 12 and earlier does not even warn you - it goes ahead and produces broken code. clang does catch it properly, but neither gcc nor eyeball do.

So _if_ you use __cleanup-based cleanups, be very careful about labels in the scope - any goto from outside of scope to that will be trouble. With zero visible indication that such and such failure exit is *NOT* suitable to jump into from before the declaration that carries __cleanup on it. Gets especially nasty when the damn thing sits in the middle of function body, not nested at all. Do that and you've turned the goto-based cleanups you might have there into bugs.

It's not a theoretical concern - I've run into just that several time this cycle. Ended up with doing cross-builds on clang (and excluding the targets not supported by clang - thankfully, build coverage had not suffered in the places I needed to touch), but step into that while debugging something and forget about gcc missing such stuff and you are looking into a really fun debugging session...

In some cases it's a useful tool, but it's _really_ not something that could be used without (apriori non-obvious) care.

Locking

Posted Sep 24, 2024 21:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> ... except that when you have several levels of nesting, finding the end of the right scope 80 lines down from where you've taken the sucker can be an interesting exercise. spin_unlock(...) is a lot more recognizable than one } in the series of 4.

Here's a thought: why does that matter? Is that because the scope is too large? Then it's probably a good idea to extract the locked code into its own scope. In my code, I also try to take the lock at the beginning of a scope, so the entire scope itself becomes a visual indicator of what's locked. The kernel code style with 8-space tabs makes this less practical in C, but Rust is different.

And if you want to re-lock the object, you need to look up for the locks that were just taken, instead of down to see the unlocks. It's a bit different perspective, but it's about the same level of mental overhead.

> What's less obvious is that it applies to goto around the guard - there's no visible spin_unlock(...) in the end, but it is implicit and there's exact same kind of bug.

Yes, the two styles are completely incompatible, and it's better not to mix them at all in one function (perhaps, a checkpatch.pl rule?). Good news is that Rust will not allow this kind of error.

Locking

Posted Sep 24, 2024 20:53 UTC (Tue) by roc (subscriber, #30627) [Link] (2 responses)

The C cleanup-based guards were recently added, in 2023. So the decision was made, by Linus and others, that the advantages outweigh your concerns. I don't see why this needs to be relitigated for Rust.

Locking

Posted Sep 28, 2024 23:12 UTC (Sat) by viro (subscriber, #7872) [Link] (1 responses)

> The C cleanup-based guards were recently added, in 2023. So the decision was made, by Linus and others, that the advantages outweigh your concerns. I don't see why this needs to be relitigated for Rust.

A nice bit of MBAese, that... Opposition between "advantages" on one hand and "your concerns" on another is particularly charming - the former being implicitly objective and unchanging, while the latter - subjective and theoretical in the best case. Use of passive-with-reference-to-management syntax is also impressive. Well done.

Back in the real world, cleanup.h contains tools that are
1) potentially useful
2) experimental and not well-understood
3) demonstrably dangerous in incautious use - and that's demonstrably, not theoretically. Fresh example: https://lore.kernel.org/lkml/20240927-reset-guard-v1-1-29...

The tools in question have nothing to do with Rust, so their relevance in this thread is, indeed, questionable. Who'd dragged them in, I wonder... <checks> https://lwn.net/Articles/991431/ is where that had happened, so it would be yourself, wouldn't it?

Locking

Posted Sep 29, 2024 7:22 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> 2) experimental and not well-understood

Deferred cleanup has been used in C++ and Go for _decades_ and is very well understood. The main rule is: do NOT mix it with goto-based control. It can even be automated via a simple checkpatch.pl rule.

In that case, the of_node_put should also be rewritten into the guard-style cleanup.

Locking

Posted Sep 24, 2024 11:55 UTC (Tue) by daroc (editor, #160859) [Link] (4 responses)

The Rust standard library mutexes won't protect you from this, and neither will the ones designed (so far) in the kernel. But after Kangrejos, I went to RustConf. One of the presentations there was about how to encode lock dependency ordering into the type system so that it can be checked at compile time. So ensuring that there are no deadlocks from cyclic mutexes is possible, and the Rust-for-Linux folks could adopt that approach if they wanted to.

Keep an eye out for an article about that in the coming weeks.

Locking

Posted Sep 24, 2024 17:21 UTC (Tue) by viro (subscriber, #7872) [Link] (3 responses)

Details would be very interesting... Basically, what kind of information do you need to put into declarations and how expressive the mechanism is - can it e.g. deal with "you need to take locks on the tree nodes in the parent-before-child order" kind of rules? That's where the things tend to get tricky...

Locking

Posted Sep 24, 2024 17:31 UTC (Tue) by daroc (editor, #160859) [Link]

The parent-before-child order is actually fairly easy to enforce in Rust without using the technique from the talk (which was focused on imposing a global order). For the tree case, imagine a node looks like this:

    struct Node {
        some_unprotected_data: usize,
        protected_data: Mutex<NodeInner>,
    }
    struct NodeInner {
        more_protected_data: usize,
        children: Vec<Node>,
    }

By putting all the data that should only be accessed with a Mutex held inside an inner structure like that, we can enforce that the program can only have references to it that live as long as the lock is held. So if I had a reference to a Node, I couldn't access the children until I locked the Mutex. And then I could only hold on for references to them until I unlocked the Mutex, at which point the compiler would no longer let me use them.

Locking

Posted Sep 24, 2024 22:03 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

In general, sequencing of operations can be enforced using typestate APIs. The broad idea is that the earlier operation returns a thing (which can be zero-size, i.e. nonexistent at runtime) which is consumed (or borrowed, depending on the desired semantics) by the next operation in the sequence. Technically, lock guards are an example of this pattern, since you cannot access the underlying data without going through the Mutex, and you cannot unlock the Mutex without dropping the guard. So it is impossible to access data without locking the Mutex.

The more difficult problem is preventing deadlock (which has never been an explicit goal of Rust's safety guarantees). Here's a simple way to at least prevent a lock from being recursively acquired:

* Invent a move-only wrapper (i.e. a type that is not Copy or Clone) for &Mutex. Each thread has this wrapper instead of the actual &Mutex, and the &Mutex is private so you can't directly get it out again.
* The wrapper has methods to lock the underlying Mutex as usual, but those methods borrow the wrapper mutably to ensure they cannot be called more than once.
* The wrapper is also !Send (otherwise, one thread could end up owning two of them and break the rules).

But this is unideal for at least two reasons:

1. The whole point of Mutexes is to allow shared access to mutable data. A move-only wrapper is just undoing all of that hard work. Yes, it's per-thread, but that's still painful.
2. It does not solve the broader lock hierarchy problem.

So I thought some more, and came up with a more elaborate idea, which works as follows:

* Each distinct lock in the lock hierarchy gets its own &Mutex wrapper (but those wrappers are Clone and Copy, so you can do all the same things you can do with &Mutex now). You can maybe use macros to make the wrappers less tedious to write.
* Each wrapper impls a generic trait that indicates its relationship with other locks in the hierarchy. For example, impl LockAfter<A> for B{} would mean "you may lock B after you have locked A."
* There's an empty struct (actually it has a PhantomData, but it's "empty" in the sense that it takes up zero bytes and gets constant-folded out of existence) which can only be constructed by these wrappers, or by some private initialization function that is callable from whatever "management" or runtime code is directly responsible for spawning threads. The struct has a generic type parameter which can either be one of the wrapper types, or it can be (). It also has a lifetime parameter, which it may not outlive (as constrained with PhantomData) To avoid having to write "the struct" over and over again, let's just say this struct is called LockHierarchy<'a, T>.
* LockHierarchy is not Clone. You have to pass it by value everywhere (but at runtime, it vanishes, so you can pass it around as much as you want without incurring any overhead).
* LockHierarchy is also !Send, so that no thread can ever own more than one.
* When a thread is created, the caller gives it a LockHierarchy<'static, ()> instance (created using the private initialization function). This construction must be done in the callee thread's startup code and not on the main thread, because it is !Send and can't be transferred, but this is not really much of a problem since the thread-spawning code can simply arrange for the thread to construct this object before it calls into the application code.
* Each wrapper takes a &'b mut LockHierarchy<'a, T> argument to lock(), and returns both a lock guard and a new LockHierarchy<'b, U> instance, where U is the type of the wrapper, T is trait-constrained to come before U in the lock hierarchy (or T = () is allowed unconditionally), 'b is the lifetime parameter of the lock guard, and 'a must outlive 'b. The latter constraint does not need to be spelled out, since it is implied by the validity of the type &'b mut LockHierarchy<'a, T>.
* At any given time, each thread should have exactly one LockHierarchy object (which is not borrowed mutably), that object is required in order to take any locks, and the type of the LockHierarchy object determines which locks you are still allowed to acquire. Each LockHierarchy's lifetime parameter stops it from outliving its associated lock guard.
* A thread can e.g. std::mem::forget() the LockHierarchy, or drop it, but oh well, that just means you can't lock anything. It doesn't cause unsafety or other problems, because LockHierarchy is just a trivial opaque object with no internal machinery.

I *think* this comprehensively prohibits any lock inversions from happening, statically, with zero overhead, provided that all mutexes in the system participate and your hierarchy really is acyclic. I also think you can probably use dyn as an "escape hatch" to check lock ordering at runtime instead of compile time, if it turns out that it's too hard to prove that some code is sound.

But I have not actually tried to implement this, so there might be some problem with it that I cannot see. For example, I'm not sure that Rust's trait solver is smart enough to materialize the whole DAG of an arbitrary lock hierarchy, so you might need to use macros for the hierarchy traits as well as the wrappers. Or you might tell everyone to manually write out the individual hierarchy traits they need, and check for cycles with some sort of linter or static analysis tool.

Locking

Posted Sep 24, 2024 22:34 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

So I looked around a bit more and there are at least two crates which already do something resembling the above:

https://docs.rs/ordered-locks/latest/ordered_locks/
https://docs.rs/lock_ordering/latest/lock_ordering/

So yes, this is a thing you can do. Whether it is entirely feasible for very large and complicated lock hierarchies is still unclear to me.

Locking

Posted Sep 24, 2024 14:01 UTC (Tue) by ehiggs (subscriber, #90713) [Link]

Note: all my experience is in user space, using Rust with access to the std library.

> How would the compiler verify that a call you've added in the area under mutex will not lead to the same mutex being grabbed?

In some cases, tools can find this. For example in this Playground link I lock a mutex twice. The compiler will build and run the executable but deadlocks. If we use Tools->Miri, the issue is found. But I think this is found by running the program in an interpreter rather than detecting it at compile time.

https://play.rust-lang.org/?version=stable&mode=debug...

For more elaborate issues, the language will make the types messier and messier to work with as you venture further out of bounds.

For example if you have a `my_lock = std::sync::Mutex<MyData>` then getting access to `MyData` involves calling `my_lock.lock()` which gives a `MutexGuard<MyData>`. You can't pass a `MutexGuard<MyData>` to a function expecting `&MyData` so you need to try to get the reference to `&*MyData` - a reference to a dereference. Already, this is looking a bit weird and hopefully helps a reviewer. But you also have the MyData reference and pass that around.

While I C you might think 'how can I get access to the pointer and make sure it's locked', in the Rust case if you can find access to the reference then you know it's locked - otherwise it would have still been behind the Mutex wrapper.

Locking

Posted Sep 23, 2024 19:02 UTC (Mon) by atnot (subscriber, #124910) [Link] (6 responses)

This is one reason I'd personally love to see true linear ("must move") types[1] in Rust. Not really for locks, I never have issues with those (although I do understand being scared of the thought as a C developer, especially one that is used to protecting code instead of data[2]). But there's a bunch of scenarios where you need a certain function to be called for correctness and things quickly get hairy when ordering, io, async and extra contextual info gets involved. In those cases you currently just have to resort to a whole lot of dirty hacks like stashing data in the struct just to have it available on drop, using thread locals, using must_use use lints, contorting the whole design to be robust to anything being dropped at any time, or doing things that are technically safe but non-desirable like leaking memory or panicking. There's just a whole lot of things that could be a lot nicer with some sort of "here's a thing, please give it back to me when you're done with it" mechanism.

And as a bonus, people can have their lock functions in a useful way too if they really want.

[1] for the unitiated: affine: at most once (Rust has this), linear: exactly once. I don't know why the math people use those words.
[2] https://blog.ffwll.ch/2022/07/locking-engineering.html

Locking

Posted Sep 23, 2024 19:51 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (3 responses)

> There's just a whole lot of things that could be a lot nicer with some sort of "here's a thing, please give it back to me when you're done with it" mechanism.

You can steal the API from std::thread::scope(), but I'm not sure how applicable that is in all cases. The basic idea is as follows:

* The caller passes in a closure, which takes some opaque scope object by reference.
* The scope object provides methods for the closure to do whatever thing(s) the caller wants to do (in the case of thread::scope(), spawn threads).
* The scope object internally keeps track of whatever thing(s) the closure does, and whatever cleanup operation(s) may be required in response (in the case of thread::scope(), join those threads to ensure they do not outlive the scope).
* Because the closure is called from the callee, the callee always gets control back after the closure returns (or panics, if unwinding is enabled and we use catch_unwind). The callee can ensure that all cleanup code is executed, by referencing the scope object's internal data. Since the closure does not receive ownership of the scope object, it cannot forget it.
* In the case of thread::scope(), the scope object also gives the closure "handles" which it can use to reference the operations (threads) it has performed (spawned). But if the closure forgets these handles, it has no effect on the scope object, because they're just handles - they do not need to be dropped for the scope object to do the appropriate cleanup.
* Lifetime parameters are used to prohibit the closure from smuggling these objects and references out to the caller and trying to carry out further mischief with them.

This is obviously not as flexible as true linear types would be, but it is better than nothing.

Locking

Posted Oct 2, 2024 10:45 UTC (Wed) by taladar (subscriber, #68407) [Link] (2 responses)

Couldn't you just force (assuming code doesn't panic or otherwise never return) the "give it back" semantics by making the type of the closure return the opaque type you pass in that the closure has no way to construct and so it is forced to return the one it got in its own parameter?

Locking

Posted Oct 2, 2024 12:56 UTC (Wed) by excors (subscriber, #95769) [Link] (1 responses)

I think that if you have a closure, there's no need for anything that tricky - you can just pass an exclusive reference to some object into the closure. The closure can't drop the object or leak the reference outside of its scope. The closure can't fail to return except by infinitely looping or blocking (which is relatively easy to debug by looking at the offending thread's call stack). When the closure returns, you know you've got the object back, and you can do any relevant cleanup.

For example, mutexes could be designed like:

let mut m = ScopedMutex::new(42);
m.with_lock(|i: &mut i32| { println!("{}", i); });

where with_lock guarantees it will always lock and unlock correctly around the accesses (unless the closure never returns).

The issue is, it would be nice to get similar guarantees *without* the closure, because closures create a potentially-inconvenient nested code structure and can introduce some extra lifetime challenges. Usually that's fine, but sometimes it's rather annoying. Better to have an API like:

let mut m = Mutex::new(42);
let i: MutexGuard<_> = m.lock().unwrap();
println!("{}", i);
std::mem::forget(i); // this should be an error - it should be required to properly drop i (by letting it go out of scope, or calling i.unlock() which transfers ownership of i, etc)

Currently the forget() is allowed and the mutex will never get unlocked, and some unrelated code may deadlock later (which is much harder to debug). It also means the Mutex might be dropped while it is still locked, so Mutex has to know how to clean up from that unusual state. Prohibiting the forget() (and any other code with similarly forgetful behaviour) will require new Rust language features.

(I think that's an easy trap to fall into when designing APIs: you design a MutexGuard whose lifetime is less than Mutex's lifetime, so you know the MutexGuard cannot be dropped after the Mutex is dropped (which is true, and a very helpful guarantee), and intuitively that means the MutexGuard will be dropped before the Mutex is dropped. And that's almost always true, but it's not guaranteed - the MutexGuard might have been forgotten instead of dropped - so you have to either handle that correctly (which can be difficult), or switch to a closure-based API.)

Abuses of mutable references

Posted Oct 2, 2024 13:44 UTC (Wed) by farnz (subscriber, #17727) [Link]

Note when doing that sort of analysis that you also have to take into account std::mem::swap and friends. An exclusive reference to a type I cannot construct works fine for what you describe, but once you allow a user-chosen type into the closure parameters, you have to consider what happens if I swap it out for another one I have lying around. I think that for the case of a single layer of locking, it's fine, but it becomes more exciting once you consider the case of nested locks, and my brain isn't up to looking for obscure corner cases in that situation.

Locking

Posted Sep 23, 2024 20:45 UTC (Mon) by aszs (subscriber, #50252) [Link] (1 responses)

Not mathematician but went down a googling rabbit hole.

The terms "linear types" and "affine types" comes from substructural logic theory which in turn borrows those terms from linear and affine equations in pure math theory.

The connection is this:

linear equations are of the form 𝑓(𝑥)=𝑎𝑥 and affine are 𝑓(𝑥)=𝑎𝑥+𝑏, where 𝑎 and 𝑏 are arbitrary constants.
So in the logic:

Linear: f(x) = a*x, (x^1 as opposed to x^2...) "use x once"
Affine: f(x) = a*x + b = a*x^1 + b*x^0, "use x once or zero times".

perfectly obvious!

Locking

Posted Sep 24, 2024 6:53 UTC (Tue) by vasvir (subscriber, #92389) [Link]

Thanks - you just saved me a trip to this rabbit hole and back...

Locking

Posted Oct 5, 2024 14:37 UTC (Sat) by slanterns (guest, #173849) [Link]

> The lack of "unlock" has led to proposals to give Rust's Mutex<T> type an unlock method.

https://github.com/rust-lang/rust/issues/81872#issuecomme...

It was once added to the standard library as an experiment, and later withdrawn since the libs-api team thought it's better to just use `drop`.

QOTW

Posted Sep 23, 2024 17:58 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust.

This should be a quote of the week. It's so true.

QOTW

Posted Sep 24, 2024 22:34 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

I actually ended up learning a lot more C++ as a result of learning Rust. But as a result I like it even less now than I did five years ago when I had written no Rust. I do not think I would recommend learning C++, and I particularly cannot recommend the choice (apparently made by several institutions) to teach C++ as first language.

New Tools

Posted Sep 23, 2024 19:08 UTC (Mon) by richard_weinberger (subscriber, #38938) [Link] (10 responses)

I’d like to clarify something that isn't fully addressed in this article. While I completely agree that it's reasonable to expect a developer to learn new tools, I want to emphasize that Rust is more than just another tool. Rust requires adopting a different mental model for programming. Trying to apply familiar idioms from C can lead to significant challenges and frustration when working with Rust.

That being said, I encourage everyone to learn Rust, but keep in mind that it comes with its own set of idioms and approaches that are essential to understand.

New Tools

Posted Sep 24, 2024 5:40 UTC (Tue) by dirklwn (subscriber, #80581) [Link]

> Rust requires adopting a different mental model for programming

Yes, exactly!

And the question we should discuss about how to make it easier for people with the C-kernel mental model to adopt.

New Tools

Posted Sep 24, 2024 7:17 UTC (Tue) by Wol (subscriber, #4433) [Link]

> Trying to apply familiar idioms from C can lead to significant challenges and frustration when working with Rust.

But isn't that true for ANY change in language?

I program C like it's Fortran. Using DataBASIC requires a completely different way of thinking. Scheme, well... and of course there's Forth ... :-)

Even comparing Scots and English :-)

Cheers,
Wol

New Tools

Posted Sep 24, 2024 9:43 UTC (Tue) by LtWorf (subscriber, #124958) [Link] (7 responses)

> perhaps we can just never compile a buggy driver.

I'm not comfortable with the mental model that rust programs have no bugs to be honest.

New Tools

Posted Sep 24, 2024 10:14 UTC (Tue) by Wol (subscriber, #4433) [Link] (6 responses)

The computer is an idiot, it knows not how and when,
In fact the only thing it knows is 1+1 is 10

It's not that Rust programs don't contain bugs. It's that a Rust program is much more likely to do what you (thought you) told it to. The Rust compiler is very aggressive about coming back and saying "I don't understand", whereas the C compiler will just go off and do its best.

Doesn't stop Rust being full of "Do What I Mean Not What I Say" bugs ...

Cheers,
Wol

Localised bugs

Posted Sep 26, 2024 10:00 UTC (Thu) by gmatht (subscriber, #58961) [Link] (5 responses)

As I understand, bugs in safe rust are likely only to affect the module in which the bugs occur (and which call the affected module, etc). For example, if you had an intern who was a bit of a space cadet, you could get them to clean up the gazpacho soup module. Sure, the soup dispenser might start endlessly dispensing "trout a la creme", but an off-by-one error shouldn't cause the reactor to vent radiation into the crew quarters.

Localised bugs

Posted Sep 26, 2024 13:35 UTC (Thu) by pizza (subscriber, #46) [Link] (4 responses)

> but an off-by-one error shouldn't cause the reactor to vent radiation into the crew quarters.

If your reactor core is capable of venting radiation into crew quarters and/or your soup dispenser is connected to the reactor in any way, you have far more serious problems than the language the language used to write their respective control software.

Localised bugs

Posted Sep 26, 2024 14:15 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> If your reactor core is capable of venting radiation into crew quarters and/or your soup dispenser is connected to the reactor in any way, you have far more serious problems than the language the language used to write their respective control software.

Doesn't that describe the modern car, though? :-)

Cheers,
Wol

Localised bugs

Posted Sep 26, 2024 15:11 UTC (Thu) by pizza (subscriber, #46) [Link]

> Doesn't that describe the modern car, though? :-)

As someone whose $dayjob currently revolves around SoCs targeting next-gen Automobiles, yes and no.

Yes in the sense that these systems _may_ share common physical communication buses, but No in the sense that when they do, there are figurative (as well as literal) firewalls designed into the overall system to ensure only suitably blessed messages are acted upon by any given component.

That isn't to say that bugs can't occur [1], just that this class of bug is nearly always due to incorrect/incomplete specifications, typically due to poorly-thought-out scope creep [2], not traits of the language used to implement the specification.

[1] I recall reading that someone was able to trigger brake lockup on some Jeep models via their cellular modems
[2] eg by exposing what was once a completely private and trusted bus to the open internet with no authentication to enable remote start capabilities.

Localised bugs

Posted Sep 26, 2024 16:22 UTC (Thu) by james (subscriber, #1325) [Link]

For what it's worth, I think gmatht was referring to the British sit-com Red Dwarf, where they explored this management failing in depth.

Localised bugs

Posted Oct 14, 2024 9:52 UTC (Mon) by sammythesnake (guest, #17693) [Link]

I think AJ Rimmer's defence council described anyone who would craft such a situation as "a yogurt"...

Discussion results

Posted Sep 24, 2024 5:59 UTC (Tue) by dirklwn (subscriber, #80581) [Link]

There are some results from the discussion, already. A "kernel-docs: Add new section for Rust learning materials" patch [1] adding some of the collected resources to the kernel's Documentation has been posted. And in the Rust for Linux Zulip chat some proposals are shared how a "Linux kernel – From C to Rust for Linux" documentation [2] could look like.

[1] https://lore.kernel.org/rust-for-linux/20240922160411.274...
[2] https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/Documentation.3A.20From.20kernel's.20C.20to.20Rust.20For.20Linux/near/469300705