Locking

Posted Sep 24, 2024 6:58 UTC (Tue) by roc (subscriber, #30627)
In reply to: Locking by viro
Parent article: Resources for learning Rust for kernel development

> How would the compiler verify that a call you've added in the area under mutex will not lead to the same mutex being grabbed? If the answer is "it won't", you need to be able to see that without its assistance. The lack of explicit unlock makes that much harder to see.

As noted in this article, the kernel is already using scoped guards in C. E.g.:
https://github.com/torvalds/linux/blob/abf2050f51fdca0fd1...
so it's too late to require explicit "unlock".

Locking

Posted Sep 24, 2024 17:09 UTC (Tue) by viro (subscriber, #7872) [Link] (10 responses)

... and it's a trouble waiting to happen. Look around and you'll find all kinds of bad ideas, most with exact same reason - hard to reason about the program state. With varying unpleasantness - a few goto in a 10-line function could be just fine, but it can easily get much worse.

As those things go, guard() is pretty high on the "easily turned into a landmine" scale. scoped_guard() is saner, but it comes with its own set of headache sources - deeply nested structure can get confusing.

It's not about preserving some kind of purity; there's no such thing in the real world anyway. Bitrot _is_ a fact of life; there's no One True Tool/Style/Language that would prevent it. What matters is how brittle the thing is. Another fact of life is that trying to figure out a bug elsewhere will lead you through a lot of unfamiliar code (possibly - yours, but with detailed memories of the area swapped out of active memory; when there's a dozen of areas you need to get through, the latency of mental context switch can be very painful); you will need to make decisions about the next thing to look into, and do that based on the reasoning about unfamiliar code. It _can't_ be avoided; what matters is how hard will it be. And yes, that includes the need to modify something you've written ten years ago. It happens. And it's all about tradeoffs.

As for the __cleanup-based tools... in moderation it's fine, but blind use can lead to a lot of PITA. In particular, it comes with serious, er, adverse interactions with other tools. We'll see how bad a source of bitrot it will be; for now I'm very cautious about the cases where accidentally delaying the cleanup can cause problems, and locking is firmly in that class.

Locking

Posted Sep 24, 2024 19:44 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> ... and it's a trouble waiting to happen. Look around and you'll find all kinds of bad ideas, most with exact same reason - hard to reason about the program state.

Why?

The reasoning here is simple: everything is locked after the scope guard is taken until the end of the scope. And you can't access the protected data accidentally without taking a lock. You also can't forget to clean up the lock in "goto cleanup". Early returns are also not a problem anymore.

It does reduce the flexibility a bit, but hand-over-hand locking is pretty rare.

Locking

Posted Sep 24, 2024 20:02 UTC (Tue) by daroc (editor, #160859) [Link] (3 responses)

One also can drop the guard explicitly:

    drop(guard);

So I don't think it's less flexible, really. But, as in every discussion about programming languages, it's not really about what's possible so much as what the language makes easy or hard. I think it's a good point that Rust's locking is less explicit! That's certainly true, and it is a tradeoff. I (personally) think that the guarantees the compiler provides are worth it, but that doesn't mean we shouldn't acknowledge that less explicit locking does have downsides.

Forcing explicit destructuring

Posted Sep 24, 2024 20:19 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

There is some thought being put into whether something like undroppable types would be a useful addition to the language. These would allow Rust for Linux to have a scope guard that must be explicitly destroyed (probably via a fn unlock(self) function), and where dropping them without calling that function is a compile-time error.

Forcing explicit destructuring

Posted Sep 24, 2024 20:21 UTC (Tue) by intelfx (subscriber, #130118) [Link]

> There is some thought being put into whether something like undroppable types would be a useful addition to the language

So… true linear types (as discussed right in this comment section)? :-)

Locking

Posted Sep 25, 2024 17:54 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

One can argue that drop(guard) is just a funny way of spelling (the currently nonexistent) guard.unlock().

Locking

Posted Sep 24, 2024 20:49 UTC (Tue) by viro (subscriber, #7872) [Link] (1 responses)

... except that when you have several levels of nesting, finding the end of the right scope 80 lines down from where you've taken the sucker can be an interesting exercise. spin_unlock(...) is a lot more recognizable than one } in the series of 4.

BTW, do *NOT* mix that with goto-based cleanups without a lot of care. Any goto *into* that scope must come from that scope itself, which would be trivial if not for the fact that guard() can be not the first thing within the compound statement. In that case the scope extends from guard() to the end of compound statement. Now, it's very obvious that

if (condition) goto exit_something;
{
some_type var = foo();
....
exit_something:
something(var);
...
}
is a bug - no matter how liberal your interpretation of standard might be, on the path that takes this goto exit_something you have var used uninitialized. Anyone tempted to add such goto would see that this candidate failure exit is not suitable, no matter how similar it might be to what you want. And compiler will catch that if you try.

What's less obvious is that it applies to goto around the guard - there's no visible spin_unlock(...) in the end, but it is implicit and there's exact same kind of bug. Worse, gcc 12 and earlier does not even warn you - it goes ahead and produces broken code. clang does catch it properly, but neither gcc nor eyeball do.

So _if_ you use __cleanup-based cleanups, be very careful about labels in the scope - any goto from outside of scope to that will be trouble. With zero visible indication that such and such failure exit is *NOT* suitable to jump into from before the declaration that carries __cleanup on it. Gets especially nasty when the damn thing sits in the middle of function body, not nested at all. Do that and you've turned the goto-based cleanups you might have there into bugs.

It's not a theoretical concern - I've run into just that several time this cycle. Ended up with doing cross-builds on clang (and excluding the targets not supported by clang - thankfully, build coverage had not suffered in the places I needed to touch), but step into that while debugging something and forget about gcc missing such stuff and you are looking into a really fun debugging session...

In some cases it's a useful tool, but it's _really_ not something that could be used without (apriori non-obvious) care.

Locking

Posted Sep 24, 2024 21:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> ... except that when you have several levels of nesting, finding the end of the right scope 80 lines down from where you've taken the sucker can be an interesting exercise. spin_unlock(...) is a lot more recognizable than one } in the series of 4.

Here's a thought: why does that matter? Is that because the scope is too large? Then it's probably a good idea to extract the locked code into its own scope. In my code, I also try to take the lock at the beginning of a scope, so the entire scope itself becomes a visual indicator of what's locked. The kernel code style with 8-space tabs makes this less practical in C, but Rust is different.

And if you want to re-lock the object, you need to look up for the locks that were just taken, instead of down to see the unlocks. It's a bit different perspective, but it's about the same level of mental overhead.

> What's less obvious is that it applies to goto around the guard - there's no visible spin_unlock(...) in the end, but it is implicit and there's exact same kind of bug.

Yes, the two styles are completely incompatible, and it's better not to mix them at all in one function (perhaps, a checkpatch.pl rule?). Good news is that Rust will not allow this kind of error.

Locking

Posted Sep 24, 2024 20:53 UTC (Tue) by roc (subscriber, #30627) [Link] (2 responses)

The C cleanup-based guards were recently added, in 2023. So the decision was made, by Linus and others, that the advantages outweigh your concerns. I don't see why this needs to be relitigated for Rust.

Locking

Posted Sep 28, 2024 23:12 UTC (Sat) by viro (subscriber, #7872) [Link] (1 responses)

> The C cleanup-based guards were recently added, in 2023. So the decision was made, by Linus and others, that the advantages outweigh your concerns. I don't see why this needs to be relitigated for Rust.

A nice bit of MBAese, that... Opposition between "advantages" on one hand and "your concerns" on another is particularly charming - the former being implicitly objective and unchanging, while the latter - subjective and theoretical in the best case. Use of passive-with-reference-to-management syntax is also impressive. Well done.

Back in the real world, cleanup.h contains tools that are
1) potentially useful
2) experimental and not well-understood
3) demonstrably dangerous in incautious use - and that's demonstrably, not theoretically. Fresh example: https://lore.kernel.org/lkml/20240927-reset-guard-v1-1-29...

The tools in question have nothing to do with Rust, so their relevance in this thread is, indeed, questionable. Who'd dragged them in, I wonder... <checks> https://lwn.net/Articles/991431/ is where that had happened, so it would be yourself, wouldn't it?

Locking

Posted Sep 29, 2024 7:22 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> 2) experimental and not well-understood

Deferred cleanup has been used in C++ and Go for _decades_ and is very well understood. The main rule is: do NOT mix it with goto-based control. It can even be automated via a simple checkpatch.pl rule.

In that case, the of_node_put should also be rewritten into the guard-style cleanup.