Safety in an unsafe world

By Daroc Alden
November 5, 2024

RustConf 2024

Joshua Liebow-Feeser took to the stage at RustConf to describe the methodology that his team uses to encode arbitrary constraints in the Rust type system when working on the Fuchsia operating system (slides). The technique is not unknown to the Rust community, but Liebow-Feeser did a good job of both explaining the method and making a case for why it should be used more widely.

He began with the motivation for his presentation, which was netstack3, Fuchsia's new networking stack written entirely in Rust. The project started six years ago, and Liebow-Feeser led the project for four years. Networking stacks are "very serious", he said. They are responsible for almost all traffic, often implement dozens of different protocols, and are the first line of defense against attackers. They're also just plain large pieces of software.

Netstack3 encompasses 63 crates and 60 developer-years of code. It contains more code than the top ten crates on crates.io combined. Over the last year, the code has finally become ready to deploy. But deploying it to production requires care — networking code can be hard to test, and the developers have to assume it has bugs. For the past eleven months, they have been running the new networking stack on 60 devices, full time. In that time, Liebow-Feeser said, most code would have been expected to show "mountains of bugs". Netstack3 had only three; he attributed that low number to the team's approach of encoding as many important invariants in the type system as possible.

The method

This isn't a new idea, Liebow-Feeser said. Many people have tried to make it so that buggy programs simply don't compile. But the netstack3 team has a concrete, general framework for approaching this kind of design. He broke the process into three steps: definition, enforcement, and consumption. For definition, the programmer must take something that Rust can reason about (usually types) and attach the desired property to it. This is usually done via documentation — describing that a particular trait represents a particular property, for example. Then the programmer enforces the property by making sure that all of the code that directly deals with the type upholds the relevant invariant.

Critically, the programmer must at some point use field privacy or unsafe markers to ensure that these invariants cannot be broken elsewhere. For example, a structure with all public fields can be constructed from anywhere, so it isn't possible to rely on invariants tied to that type. If the structure includes a private field, it can only be constructed in that specific module, which means the programmer can audit all of the uses. When adding a private field isn't a solution, for some reason, Rust programmers can fall back on marking things unsafe, so that other programmers will hopefully read the documentation before using a structure or trait. Finally, other code can rely on the property as a guarantee, such as by making optimizations that require it.

In a codebase using this method, each guarantee will have a single Rust module that deals with the internals of the property. Then, the rest of the code can rely on the type checker to ensure that the property holds everywhere. For a more concrete understanding of what that looks like in practice, Liebow-Feeser introduced several examples.

The first example he gave was nearly trivial: a binary tree that needs to enforce ordering invariants between nodes. Of course Rust cannot natively check for this. So how might the programmer ensure that an invalid tree is never produced? Following Liebow-Feeser's process, they must first document the requirement. Then, they ensure that the code which deals directly with creating and modifying nodes in the tree respects the invariant. They must also ensure that relevant details are hidden behind private fields, so that code outside the module cannot interfere. Finally, the rest of the code can now rely on the fact that the tree is always in order.

That approach will be familiar to any programmer who has implemented a data structure. But the approach generalizes to more subtle properties, Liebow-Feeser said. His second example was actually from Rust's standard library. Rust promises thread safety, but the language itself actually knows nothing about thread-safety properties. Those are all implemented in the standard library.

One relevant property that the authors of the standard library would like to guarantee is that non-thread-safe closures are never passed to std::thread::spawn(). To do this, they created an unsafe trait (Send) that represents code which is safe to run on another thread. Send is unsafe not because it is inherently dangerous, but just because it represents a property that the compiler cannot check. The trait could then be used as a bound to restrict which functions can be passed to spawn(). Finally, while safe Rust code cannot implement the trait directly, the standard library adds various trait-inference rules and derive macros to let other code implement the trait. (In fact, this is a slight simplification because Send is an auto trait — an unstable feature that instructs the compiler to implement traits automatically for compatible types, although Liebow-Feeser did not mention this in his talk.)

Automatic deadlock prevention

The final example Liebow-Feeser gave was significantly more involved, and was taken directly from the netstack3 code. The networking stack needs to be multithreaded with fine-grained locking, for performance, he explained. Rust's existing libraries can guarantee that the code is thread-safe, but they can't guarantee that it won't deadlock. With so many developers working on such a large codebase, being able to know that the code won't deadlock is an important property that isn't easy to guarantee. Netstack3 has 77 mutexes, across thousands of lines of code. Its predecessor, netstack2 (implemented in Go) had a lot of deadlocks.

For the first step, the developers added names to each mutex, in the form of a generic Id type parameter. The relevant IDs are all zero-sized types, so they don't exist at run time, and have no run-time overhead. Then, they defined two unsafe traits: LockAfter and LockBefore. These represent exactly what they sound like — for a specific mutex to be locked after another mutex, LockAfter must be implemented for the relevant types. They added a derive macro, so that there's little boilerplate needed to add to each ID type, and added blanket trait implementations such that if there is a cycle in the graph of locks, it results in a compile time error because of overlapping trait definitions.

In order to actually make the existence of the locking traits useful, however, the developers also needed to add methods that use them. In this case, they created a new Context type that carries around one of the ID types. The lock() method on their mutex type takes a mutably borrowed context, so the original context cannot be used until the mutex is unlocked. At the same time, it provides a new context with the ID of the lock, which can only be used to lock mutexes with the correct LockAfter implementation.

So, from the perspective of all of the code outside the module that implements this, mutexes can only be locked with an appropriate, un-borrowed context object. The context objects impose a global ordering on how mutexes can be locked, and attempting to add an incorrect LockAfter implementation (one that would permit a cycle) is a compile error. Programmers are free to lock whatever mutexes they can get past the compiler, secure in the knowledge that this can't cause a deadlock. In turn, this makes it easier to justify adding more fine-grained locking to the implementation. At run time, all of the type-level machinery associated with this guarantee has been compiled out, so there is no run-time overhead.

Conclusion

In practice, there are actually some problems with the simplified deadlock-prevention example as presented, Liebow-Feeser said. But generally, this ability to take a property that the language does not know about and "teach" it to Rust, so that now it is enforced at compile time, is why he likes to call Rust an "X-safe" language. It's not just memory-safe or thread-safe, but X-safe for any X that one takes the time to implement in the type system.

He concluded his talk by asking people to try this out for themselves, especially in new domains that nobody has tackled yet. He encouraged people to think of functions that panic or return an Option as a code smell — those are all places where the code could encode the necessary invariants at compile time instead. He called on the audience to "make your APIs exactly match the structure of the problem". At the same time, he cautioned people that the appropriate solution may not be the same every time. For deadlocking in netstack3, they used traits and trait inference rules to ensure that there are no cycles. But not every circumstance will warrant the use of traits; the important thing is to follow the method he laid out.

Liebow-Feeser thinks "this can reshape how we do software engineering". There are lots of domains where correctness is important, and successfully scaling software in those areas will require making it possible for tools to verify that correctness, he said. Whether his prediction is right remains to be seen — but in any case, the method he spoke about seems like a nice framework to unify different techniques for ensuring program safety.

Index entries for this article
Conference	RustConf/2024

Abstract code and higher-order functions instead of loops

Posted Nov 5, 2024 16:09 UTC (Tue) by taladar (subscriber, #68407) [Link]

Another powerful way to limit how your code can fail in a strict language like Rust or Haskell is to make it more abstract.

More abstract code can only use the operations provided by the traits specified so it can be a lot easier to reason about it, e.g. code taking a T and returning a T can only really return the T it was given, it can not contain a literal, unlike code taking a String and returning a String.

The same can be said for higher-order functions like map or filter which are a lot more limited in ways they can fail when compared to a general loop. A map call always returns the same number of elements and every element is derived from its respective input element. Filter can never produce additional elements, only the same number or less.

These two methods can be combined of course by making very abstract higher order functions that have very limited ways they can fail because they know very little about the data they work with and concentrate the code that needs to work with the concrete or a less restricted type in the function passed into the higher-order function.

Direct links

Posted Nov 5, 2024 20:57 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (68 responses)

Some helpful links for the article:

Example of lock ordering (LockBefore):
https://cs.opensource.google/fuchsia/fuchsia/+/main:src/c...

And here's the lock hierarchy definition:
https://cs.opensource.google/fuchsia/fuchsia/+/main:src/c...

This is pretty neat!

Too complex for my small brain

Posted Nov 6, 2024 6:33 UTC (Wed) by mb (subscriber, #50428) [Link] (67 responses)

TBH this code looks pretty much write-only to me.

For me, there is a point where generics turn from something useful into something that has too many dependencies and rules to constantly think about.

Also, the almost complete lack of comments doesn't help it.
Comments are extremely helpful to partially mitigate the complex generics effect.

From what I have seen here, I would very much prefer something like Linux lockdep over this trait/generics based system.

(Context: I develop a lot of code written in Rust).

Too complex for my small brain

Posted Nov 6, 2024 7:03 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (63 responses)

It looks a bit complex, but actually makes sense once you understand the overall logic.

But I agree, netstack3 code needs a LOT more comments, in general.

> From what I have seen here, I would very much prefer something like Linux lockdep over this trait/generics based system.

This does have the advantage of working during the compile time. I'm also not sure that the extra complexity is justified, but on the other hand, it's probably the most dangerous part of the system. Everything else nowadays depends on the network stack, so switching the development to "you not only have to write something, but also _prove_ that it's correct" might make sense. It's an entirely different attitude to development, and we'll see if it pays off.

Too complex for my small brain

Posted Nov 6, 2024 8:11 UTC (Wed) by taladar (subscriber, #68407) [Link] (2 responses)

> something that has too many dependencies and rules to constantly think about.

The thing is, with many problem domains those rules exist anyway and breaking them leads to very real bugs. I'd much rather have a compile error to work through while writing the code than some deadlock to debug in production months later (when I am far less familiar with the code and its intricacies)

Too complex for my small brain

Posted Nov 6, 2024 8:29 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

In practice, lockdep will likely quickly uncover most of the issues, for such well-exercised code as the network stack.

Too complex for my small brain

Posted Nov 7, 2024 5:43 UTC (Thu) by ssmith32 (subscriber, #72404) [Link]

Yes, but removing the "likely" and substituting "all" for "most" in your statement is quite the trick, and a world of difference.

Anti-lock brake system software that "likely" will be available when you need it, isn't quite as reassuring as software that will..

Too complex for my small brain

Posted Nov 6, 2024 15:54 UTC (Wed) by mb (subscriber, #50428) [Link] (59 responses)

I think the advantage of this complex compile time checking over lockdep is not that big.
Lockdep is not a simple runtime check. It is a rules sanity check. The deadlock does not have to actually happen to be found by lockdep. Each lock involved just has to be executed once (in each possible context, which usually is one or two contexts). Even basic coverage testing will achieve this.

Too complex for my small brain

Posted Nov 6, 2024 18:34 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (58 responses)

I think the main advantages could be summarized as follows:

With typestate locks, it is not possible to compile code that can deadlock. You don't have to run tests or exercise the locks, much less have 100% test coverage. This is a marginal benefit in cases where you control all of the code that will interact with the lock (you presumably want to have 100% test coverage anyway), but it's a much bigger deal if you want to expose the lock to foreign code, which might not even know that you have a lockdep-like tool to test with. Since the restriction is encoded into the type system, there is simply no way for foreign code to ignore it - it has to either respect the rules, or explicitly call a manifestly unsound "escape hatch" function (if you provide one).

This is obviously harder if you want to expose the lock to non-Rust foreign code, because then you're probably talking to it through a C-like ABI, and that is not going to support borrow checking or typestates. But a typestate API can provide a mixture of static and runtime checking, and allow you to pick and choose which is appropriate for a given situation. If your traits are object safe, this can be as simple as using dyn, but I tend to assume that most practical typestate traits will not be object safe, so you probably have to write a little bit of boilerplate type erasure logic by hand. Whether the result is more or less convenient than using lockdep is probably going to depend on a lot of specifics (such as whether or not the foreign code is directly acquiring the lock itself, or calling back into first-party code to acquire the lock, since the former probably requires lockdep instrumentation that may not be present).

Too complex for my small brain

Posted Nov 6, 2024 21:27 UTC (Wed) by khim (subscriber, #9252) [Link] (57 responses)

Yes, but why do you need such a complicated system with lots of types and so many movings parts?

One can imagine much simpler way of doing things:

First pass information about currently taken locks to the function. That could be simple priority value attached to the lock.
Wrap all the locks that you deal with in a simple type which takes that information and verifies that you don't have locks talen with wrong priority (and generate the compile-time error if that happens).
Ensure that each time you take the lock you also adjust the information about what you have taken or not.

And voila: no need for the central registry of all the locks in you program, no need to care about how arbitrary locks from different subsystems.

You can even add an “escape hatch” function that allows you to restore information on the call to C and back (unsafe, of course).

Very roughly this would look a tiny bit like this:

struct TakenLockPriority<'a, T: ?Sized, const PRIORITY: usize> {
    phantom: PhantomData<&'a mut T>
}

struct PriorityMutex<'a, T: ?Sized, U, const PRIORITY: usize> {
    previous: PhantomData<&'a mut U>,
    mutex: Mutex<T>,
}

impl<'a, T: ?Sized, U, const PRIORITY: usize> PriorityMutex<'a, T, U, PRIORITY> {
    fn lock<'b, 'c, V, const PREVIOUS_PRIORITY: usize>(
        &self,
        _previous_priority: PhantomData<&'c mut TakenLockPriority<'b, V, PREVIOUS_PRIORITY>>
    ) -> (TakenLockPriority<'c, Self, PRIORITY>,
          MutexGuard<'_, T>) {
        const {
            if PREVIOUS_PRIORITY >= PRIORITY {
                panic!("Improper use of lock is detetected")
            }
        }
        (TakenLockPriority::<'_, _, PRIORITY>{phantom: PhantomData::<&mut _>},
         self.mutex.lock().unwrap())
    }
}

I guess one may need to think about and add some macros to make the code a bit more readable, but the idea is simple: don't try to build hierarchy of lock types, make sure developers themselves may organize these locks that they care about and then only resolve conflicts when they arise.

This sounds much closer to how Linux is developed and how lockdep is used, the only big difference is that checking of invariants is moved from the runtime to compile time.

Why wasn't such an approach taken, but instead centralized system with registration of types (which also all need to have a name!) was used?

Too complex for my small brain

Posted Nov 6, 2024 21:39 UTC (Wed) by mb (subscriber, #50428) [Link] (3 responses)

>no need for the central registry

The priority is from a globally managed list, right? How would you know your priority otherwise?

Too complex for my small brain

Posted Nov 6, 2024 21:52 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

Priority is just a number in that scheme. If one lock have priority 1 and the other have priority 2 then you can only take them in one order (in my example 2 can be taken if 1 is taken or not, but if you already took something with priority 2 then taking another lock with 1 is forbidden).

If locks are never taken together then their priority doesn't matter and if you have more or less important components then you can delegate ranges of priorities to them.

Of course some coordination is needed, it's not a complete chaos, but there are no central repository which knows about all the locks in the system (only compiler knows everything and you only have to act, somehow, when it complains) and I think it's important to keep it like that in large projects like kernel.

P.S. All that machinery if, of course, built on zero-sized types, too, which means it would disappear after compilation. But there are relatively few of them and no complex hierarchies between them.

Too complex for my small brain

Posted Nov 15, 2024 19:56 UTC (Fri) by sammythesnake (guest, #17693) [Link] (1 responses)

Wouldn't that "priority" be essentially the same as the depth into the Directed Acylic Graph of lock dependencies? How would one calculate the priority without that graph?

AFAICS, the best you could hope for is that the DAG could be split into unconnected pieces where locks don't have dependencies, but that's likely to end up being something like one registry per subsystem in the kernel, n'est-ce pas?

Too complex for my small brain

Posted Nov 15, 2024 21:22 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> Wouldn't that "priority" be essentially the same as the depth into the Directed Acylic Graph of lock dependencies? How would one calculate the priority without that graph?

Correct. However, because the graph is static and mostly stable, you can just have a locking hierarchy described in something like a DOT language and code-generate a "header" file with priorities.

You don't need to use the trait resolver machinery for this, so the resulting Rust code is more concise and easier to understand. I'm actually also now convinced that this approach is more practical.

> AFAICS, the best you could hope for is that the DAG could be split into unconnected pieces where locks don't have dependencies, but that's likely to end up being something like one registry per subsystem in the kernel, n'est-ce pas?

Yes. But then how much does this matter in practice?

Too complex for my small brain

Posted Nov 6, 2024 22:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (51 responses)

> struct TakenLockPriority<'a, T: ?Sized, const PRIORITY: usize> {

Ahhh... Smells like good old IRQL from Windows!

The problem with this approach is that you need to have complete lock ordering. The netstack3's approach only requires partial ordering, so it can work across multiple lock hierarchies.

Your code can be extended by making multiple priority types (we called that "colored priority"), but then you also can get into corner cases where you need to express something like "RED 5" needs to be locked before "BLACK 4". It's trivial with netstack3's approach, but not possible with yours.

Too complex for my small brain

Posted Nov 6, 2024 23:12 UTC (Wed) by khim (subscriber, #9252) [Link] (50 responses)

> Your code can be extended by making multiple priority types (we called that "colored priority"), but then you also can get into corner cases where you need to express something like "RED 5" needs to be locked before "BLACK 4".

It's trivial if you can redo either RED or BLACK and add more colors.

> It's trivial with netstack3's approach, but not possible with yours.

Because of orphan rules it's only “trivial” when you have control over either RED or BLACK side. In that case you may as well do aforementioned renumbering.

And, worse, because of the orphan rules you would face serious pain way before that point: if two components RED and BLACK ever interact in some other, BLUE component then, because of orphan rules, netstack3 approach would force you to either make BLUE depend on RED or the other way around.

Which, basically, means that while my approach, indeed, breaks apart, at some point, it happens much farther then in netstack3 approach.

Worst-case scenario, when you have two components that couldn't be touched you may always add additional mapping to reorder numbers in that one const block that compares them.

Also, keep in mind that, unlike Windows IRQL checks all these verifications happen in advance which means that while yes, fixing these may not be a trivial exercise in both cases it happens before you run any tests and before something fails on end-user system.

Too complex for my small brain

Posted Nov 7, 2024 6:13 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (15 responses)

> Because of orphan rules it's only “trivial” when you have control over either RED or BLACK side. In that case you may as well do aforementioned renumbering.

If levels 1, 2, and 3 already exist, I can't just invent 3.5 and shove it in the middle, since usize is an integer.

Yes, we could use f64 instead. But that might not be the best idea, since compile-time floats might not be wholly consistent with runtime floats. Instead, you'd have to do something dumber, like saying that the original hierarchy starts at 1000 and each subsequent level is 1000 after the previous one, and then tell downstreams to use smaller intervals (but still don't go all the way down to 1, just in case a downstream-of-downstream wants to put more numbers in between your numbers). Of course, some mildly popular crate will fail to read the instructions and produce inseparable lock levels, and then we'll have to kludge together a workaround because upstream stopped reading bug reports four years ago.

Or we can skip all this nonsense and just use a partial order with a pair of traits, like sensible people.

Too complex for my small brain

Posted Nov 7, 2024 12:03 UTC (Thu) by khim (subscriber, #9252) [Link] (14 responses)

> If levels 1, 2, and 3 already exist, I can't just invent 3.5 and shove it in the middle, since usize is an integer.

I have only used integer in my hastily written example. Zero-sized type may carry arbitrary payload thus you would, probably, use enum for “global” part and simple number for a local part.

Or you may even just replace that part that does checking with a linker-level tricks that would dump information about licks that is passed around in arguments to the object files (Linux kernel already does many similar tricks).

Then there would be three modes:

No checking production mode: all verification types are the same, compiler doesn't do any checking, but as long as checking mode have the same source for the locking function it would work in the same way.
Checking mode: locks have priorities assigned to them, compiler does all the required checking, but there are lots of identical functions produced that are then merged by linker
Priority-generation mode: relationship between locks is transferred into the object file, special tool pulls it from the compiled code and then priorities are automatically assigned if possible, if not possible then error is detected and tool presents two code paths that could lead to a deadlock.

Last one acts like lockdep, only everything happens compile-time (but it's very hard to pass it through C layer, which is also a significant limitation).

> Or we can skip all this nonsense and just use a partial order with a pair of traits, like sensible people.

How is it sensible? It's slower, it breaks apart much faster and it makes things that are complicated with const approach even more complicated!

It would have worked Ok with C++, with templates that are type-checked after instantiation and where one doesn't need to specify relationships between types upfront, but with Rust's traits your every function have to carry description of what locks it may, potentially, use, including transitive usage and if you change something in that scheme these relations are affecting all the functions in the call graph!

Why do you think they have proudly declare that they have “77 mutexes, across thousands of lines of code”? Because in that structure addition of one new lock type (by splitting one of them) or removal on one type of lock (by merging two locks) is a huge ceremony that virally affects dozens of functions!

Not only you may have trouble with “bad refactorings” that need to shove level 3.5 between 3 and 4, no, now you have an exact same way before, when in const approach you would just effortlessly split priority 30 into two (30 and 35, e.g.) in a traits-based approach you need to refactor everything that ever touches your lock with priority 30 in your whole codebase!

The whole discussion is, frankly, insane: you are pointing to deficiencies in one approach that would be seen after you would add thousands of locks in hundreds of components and advocate another approach which would fall apart under its own weight much faster!

Kinda like: if we would use a ship then after thousand miles we may lose an orientation and after ten thousand miles we would be hopelessly lost, let's go on foot, instead, then we would never be able to walk thousand miles and wouldn't have these problems at all.

How can that logic ever be called “sensible”?

I can understand situation where different approaches have different trade offs and then you pick the one that suits your needs better… but to point out on the deficiency of one approach and then go with the one that have this exact deficiency, only worse… makes no sense to me, sorry.

Too complex for my small brain

Posted Nov 7, 2024 13:42 UTC (Thu) by daroc (editor, #160859) [Link] (13 responses)

Without replying to the majority of your comment, I would like to clarify one aspect of Fuchsia's solution that I failed to explain adequately in the article:

Because of generic parameters, changing the lock hierarchy is not viral to every function affected. A function can take "any Context type that is permitted to lock type X" as a parameter, and such functions don't need to be changed when the hierarchy is changed — unless their caller would no longer be able to lock things of type X without a potential deadlock, which is sort of the whole point of the system.

Too complex for my small brain

Posted Nov 7, 2024 15:28 UTC (Thu) by khim (subscriber, #9252) [Link] (12 responses)

> unless their caller would no longer be able to lock things of type X without a potential deadlock, which is sort of the whole point of the system

But that's precisely the issue and it's where lockdep/C++ templates/Rust const solutions differ from traits-based solutions!

Traits work well if you can easily say whether caller is able to lock things of type X or not… but in complicated systems like a kernel the answer to that question is, very often, not “yes” or “no”, but “it depends”.

Quite typical issue in kernel is question of “is this allocation allowed to sleep”. Because, well, certain memory allocations need to do that (to pull memory from swap) and certain ones couldn't do that (because they are needed to enable the ability to pull memory from swap) and many-many-many parts are shared between these contexts (imagine swap attached over network, for example).

Of course if the answer to the question is caller able to lock things of type X or not is fully dynamic then static solutions couldn't handle anything there.

But with C++ templates it's trivial to move that information to Context type: just make you function accepts arbitrary Context, then use if constexpr to probe it for it's ability to use locks of certain kinds… and done. If you found yourself in a crazy convoluted place where nothing can work reliable – call the static_assert(false); and ask the developer to untangle that mess.

If you go with Rust's const-based solution then it becomes more convoluted, because if const in Rust is more limited that if constexpr in C++, you have to invent certain contortions for the code that would be compiled but would never be executed (like in C++14 world), it's not as pretty as in C++ or Zig, but it's still doable.

But with traits-based solution… I could see no way forward at all. Even if we would ignore complicated cases where we want both to be able to swap out the network stack to SSD and swap out SSD access somewhere code over the network in the same kernel… how can we create two modules which support both combos? One of them would have call from SSD-capable code to network-capable code deep in the bowels of everything, the other would have call from network-capable code to SSD-capable code deep in the bowels of everything… but what about generics definitions of everything in the middle? How would they work to support both capabilities?

Too complex for my small brain

Posted Nov 7, 2024 19:42 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (11 responses)

The trick with traits is to move conditional logic inside of a method on the trait. If the trait can do X, and doing X would help, then the method does X, and otherwise it does Y (e.g. in the default implementation of the method). This is all statically dispatched and monomorphized, so the compiler can inline it if doing so is performant.

Too complex for my small brain

Posted Nov 7, 2024 20:15 UTC (Thu) by khim (subscriber, #9252) [Link] (10 responses)

> The trick with traits is to move conditional logic inside of a method on the trait. If the trait can do X, and doing X would help, then the method does X, and otherwise it does Y (e.g. in the default implementation of the method).

Except that's how can you do things in C++ or Zig. Not in Rust.

In Rust only trait declarations may place restrictions on method. You couldn't add extra restrictions (or relaxations) in the particular implementation.

The way out? I know of only one: move you checks into const block.

Sure, it's not as elegant as in C++ or Zig, it's downright ugly, I would say… but that's the only “way out” that I know.

> This is all statically dispatched and monomorphized, so the compiler can inline it if doing so is performant.

Can we do some practical excercise? So I have something like this:

fn foo000() {
  …
  if lots-of-memory {
    foo001(…, advanced_alloc);
  } else {
    foo001(…, primitive_alloc);
  }
  …
}
fn foo001(alloc: & impl Allocator) {
  …
  foo002(…, alloc);
  …
}
…
fn foo999(alloc: & impl Allocator) {
  …
  // Here I want to use `bar` if allocator have certain properties or `baz` if it's not as advanced
  //if allocator-is-advanced
    bar(…)
  //else
    baz(…)
  //end
  …
}

The important thing here is to make a decision at the very top (in Linux that wouldn't be arbitrary decision, you would inherit from the environment, it may be sleepable state or non-sleepable state, preemprtable/non-preentable, etc) and it's used somewhere at the very bottom without being textually visible in the middle (because conditions may change and even set of conditions may change, but the majority of the code doesn't really care whether you may allocate memory for a hashmap or if you are in context where you need to use inefficient algorithm because you couldn't do allocations there) and, also importantly, it should be detectable, at compile time if I mess up in the foo999 function and call the wrong algorithm.

How does it work with traits? I can see it easily being doable in C++/Zig because their metaprogramming uses duck-typing. I can cobble crazy-but-working solution with some macros and const in Rust… but I have no idea how to do something like that with traits.

Note that I'm not “making up” that need! Rust is literally build up on top of that capability… which standard library liberally uses – but doesn't give it to the end users. It's called specialization and doesn't cover the full set of things that you can do in a C++/Zig, but it's very important to Rust, without some of these tricks (that are not easily implementable with just traits and are barely implementable with const checks) it would be a different language.

That's where Rust heritage shines through: it used something nicely sounding in theory, found out that it doesn't work in practice and after huge fight relented just enough to make it possible to barely squeeze kida-sorta-workable solutions (whereas C++/Zig have full support for pretty usable metaprogramming without that theoretically-nice-yet-practically-very-cumbersome baggage).

If that's the wrong interpretation of what actually happened then why is standard library full of these tricks that are not allowed to be used by mere mortals?

Too complex for my small brain

Posted Nov 7, 2024 23:23 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (9 responses)

What you have missed is that impl in parameter position (your &impl Allocator) desugars into parametric polymorphism. That is, your methods all desugar into the following (but note that the turbofish syntax shown in the comment is not legal unless you actually write out the desugared polymorphic version instead of &impl Allocator):

fn foo001<A: Allocator>(alloc: &A) {
  …
  foo002(…, alloc);  // type inferred as foo002::<A>(..., alloc)
  …
}

As a result, the type of A is known at compile-time, and so it is straightforward enough to do this:

trait Derived: Allocator{  // Ideally this would have a better name, which should be related to the bar_or_baz() method.
    fn bar_or_baz(...){
        baz(...)
    }
}
impl Derived for AdvancedAllocator{
    fn bar_or_baz(...){
        bar(...)
    }
}
impl Derived for SimpleAllocator{}  // Uses default implementation of method
fn foo001(alloc: &impl Derived){ ... } // etc.
fn foo999(alloc: &impl Derived) {
    alloc.bar_or_baz(...);
}

Of course, if you're trying to write code that works for every possible Allocator, then this is more difficult, because the blanket impl of Derived will conflict with the type-specific impl. I'm not entirely sure that can be done in a sound way without using type erasure (dyn), which definitely can solve this problem (for most "reasonable" types) if you're willing to accept the minor overhead of an additional vtable (using something like Any::downcast_ref()), and I suspect the compiler might be able to remove that vtable if it is as simple as this example (if you box up your trait object into &dyn Any as late as possible, it seems like the compiler should be able to infer that the downcast always succeeds or always fails for a given monomorphization, and remove the entire detour through Any altogether). But if you only have two allocators, then I would just write two impls and call it a day. No need to generalize if generality is not required for a given application.

Too complex for my small brain

Posted Nov 7, 2024 23:46 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

> Of course, if you're trying to write code that works for every possible Allocator, then this is more difficult, because the blanket impl of Derived will conflict with the type-specific impl. I'm not entirely sure that can be done in a sound way without using type erasure (dyn),

(Note that it *will* be possible once specialization, or at least min_specialization stabilizes, but that might not be for a while.)

Rust specialization problems with lifetimes

Posted Nov 8, 2024 11:27 UTC (Fri) by farnz (subscriber, #17727) [Link]

Specialization is currently blocked (in all forms) because of lifetimes; currently, Rust requires that lifetimes do not influence runtime behaviour, but it's turning out to be really hard to implement specialization in a way that complies with this.

There's two reasons Rust wants to avoid having lifetimes influence runtime behaviour:

Right now, given a function of the form fn foo<'a, 'b, 'c: 'b + 'a>(ref1: &'a T, ref2: &'b T) -> &'c T, the compiler can assume that there will only be one monomorphization of the function for all possible concrete lifetimes substituting for the type parameters 'a, 'b and 'c. If lifetimes affect runtime behaviour, this becomes untrue, and a lot of work has to go into making sure that the compiler generates a minimal set of implementations (since chances are high that every call to foo will have different lifetime parameters).
There's no current need to determine the correct concrete value for a lifetime parameter; you can treat it as a constraints problem, and confirm that the return value of foo(&one, &two) lives for less time than one and two, without caring how long it actually lives for. If lifetimes affect runtime code, then this stops being true, and a lot more work has to be done to make sure that you've chosen the correct lifetime for the return value of foo(&one, &two).

And as a corollary of the above, if you "simply" ignore the problem, you're able to come up with routes to write a safe function with signature fn foo<'a, T>(foo: &'a T) -> &'static T that simply casts away the lifetime parameter, resulting in unsoundness problems. The standard library handles this by promising not to have specializations that let you write this sort of function.

Unfortunately, this makes safe specialization a hard problem; we need the chosen specialization to be the same regardless of lifetimes, which implies that any lifetime parameters must be unconstrained (since if the lifetime parameters are constrained, we can use a specialization to change runtime behaviour based on lifetimes), but lifetimes can be hidden in generics. For example, impl<T> Foo for (T, T) has a hidden constrained lifetime in it; if T is in fact &'a str, then both elements of the pair have lifetimes that are constrained to be the same modulo variance, but we've just said that to avoid runtime behaviour depending on lifetimes, we need the lifetimes to be unconstrained.

As a consequence, I would not expect specialization to stabilize any time soon; it's not clear how to address this such that you get both clear code (e.g. you should be able to have both impl<T> Foo for T and a specialization impl<T> Foo for (T, T), without needing special syntax to say "T can't hide a lifetime constraint") while also avoiding the compiler's reasoning for choosing a specialization from being obscure - you don't want the specialization on pairs to be ignored for ("foo", "bar"), for example. There is a trick with autoref and macros you can use in some cases to get specialization, but it's not usable in all cases, not least because it doesn't act as a trait bound.

Too complex for my small brain

Posted Nov 8, 2024 3:20 UTC (Fri) by khim (subscriber, #9252) [Link] (6 responses)

> Of course, if you're trying to write code that works for every possible Allocator, then this is more difficult, because the blanket impl of Derived will conflict with the type-specific impl.

In practice that's not the main concern: usually, in such cases, you have fixed (and not large) number of cases that you need to support. Much bigger issues is the fact that with some allocations you just simply couldn't support some cases (but would happily support others) and, even more importantly, the set of operations that you want to handle differently (or, maybe, not handle at all) is the most fluid part of the whole story.

I may need/want special case because Intel decided that saturated operations should be provided for bytes and 16bit words, but not for doublewords or quadwords, or you may special-case some algorithm because it doesn't work with too small precision of fp16 or, in kernel, you may support some operations with a filesystem in one context (because they are needed to support swap file), but not in the other context (because these would require swappin)… and bam:

fn foo001(alloc: & impl Derived) { ... } // etc.
fn foo999(alloc: & impl Derived) {

Suddenly internal details and limitations of my very lowest level functions are plastered all over the whole stack between foo001 and food999.

It's like hated Java checked exceptions, only in reverse! And yes, the whole point of the excercise, the whole issue that I complain about is that same need to plaster all these tiny details in all intermediate steps! I know that impl is a generic in a disguise, thank you very much, my point is that I could't keep internal details of that type hidden from the intermediate layers (without certain tricks, at least).

I couldn't just simply doing small alterations somewhere down below and see if that would be enough to connect on the top, even the tiny, minor changes require massive code rewrites.

How can you sat that “traits are not viral” if any simple tiny change in low-level have to propagate through the whole stack from the bottom to the very top?

> I'm not entirely sure that can be done in a sound way without using type erasure (dyn), which definitely can solve this problem (for most "reasonable" types) if you're willing to accept the minor overhead of an additional vtable (using something like Any::downcast_ref())

Type erasure would defeat the point of the whole exercise: you couldn't have dyn Trait half-implemented. You could have impl Trait half-implemented by calling __this_code_shouldnt_be_used__ function (that would cause link-time error), but because link-time error is happening too late to be supported well by IDEs you need another thing that would stop you… const check works fine.

You just create trait MyCleverType, add hundreds of functions to it (everything you can switch on at the very bottom, essentially), give them the default implementation that calls __this_code_shouldnt_be_used__ and a constant that notifies the user about actual availability of function, then add some macros on top of that… and voila: you have very-very rough and limited approximation of what literally all modern languages (C++, Go, Java, Python, whatever) have “out of the box”.

> But if you only have two allocators, then I would just write two impls and call it a day. No need to generalize if generality is not required for a given application.

Oh, absolutely. Here I agree 100%. You couldn't really support open-ended, unbounded, set of limitations and complicated rules which would determine which combinations are allowed and which are disallowed. Usually it's properties that lead to 2-3-4 choices and not more… but then you also have combinations of these properties – and that either leads to traits which have hundreds of possibilities or, alternatively, you end up with hundreds of traits that are attached to all your types and your code is filling with annotations that are entirely irrelevant on that level of abstraction.

Or, as I have said: you just [ab]use traits and const expressions to provide something that all other modern language give you without any heroic efforts. Sure, only C++ and Zig give you the ability to do compile-time checks, most language make reflection a runtime event and I'm glad that Rust at least gives you enough kludges to create similar system, but still… I couldn't look on all that and just wonder: why? Just, really, why do I have to have all that complexity to combat system that was supposed to “simplify” things?

P.S. I understand perfectly why Rust designers have started with assumption that everything in the world is total and fully additive: dealing with such world of types is easy. But I just hate that when they faced reality and have found out that this imaginary unicorn-style world just simply doesn't work even for standard library… they just added some few opt-out features that made it possible to solve issues they had… and then ignored the fact that the ability to avoid couple of allocations in some cases is not the worst case of non-totality that one may face in software design. Our programs are full of non-total implementations of things and the language that doesn't handle that would just simple be unusable… thankfully, today, after some pushing, Rust have gotten enough kludges to handle that limitation of it's design in not entirely insane fashion… but still I'm not sure if that was good thing to fight for or not: on the other hand the fact that you need to just throught so many hoops to implement non-total things means that people are at least thinking about how they could approach the problem by simply using total functions and not leaving landmines in their design everywhere… on the other hand this made so many things so complicated… ugh.

Too complex for my small brain

Posted Nov 8, 2024 3:55 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (5 responses)

> How can you sat that “traits are not viral” if any simple tiny change in low-level have to propagate through the whole stack from the bottom to the very top?

Well, first, I'm pretty sure I never actually said that, but regardless: The point is not that you make a million different traits for a million different foo_or_bar() methods, and constantly have to figure out which trait you need. The point is that you make *one* trait (and possibly one generic wrapper struct), give it whatever methods it needs to have, and use it everywhere. This is viral, but it's only one type, so it maintains proper encapsulation and the rest of your code has no need to know anything other than "this is the trait we use instead of Allocator."

Too complex for my small brain

Posted Nov 8, 2024 11:54 UTC (Fri) by khim (subscriber, #9252) [Link] (4 responses)

> Well, first, I'm pretty sure I never actually said that.

We are in subthread that discussed this particular point: because of generic parameters, changing the lock hierarchy is not viral to every function affected., thus I assumed that we are discussing precisely the “virality” of different solutions.

But you are right, if you ignore the context of the discussion then you answer to my question is sensible. I would have given the same one if question was asked outside of that context.

But discussing things if you have to always repeat all the minutiae details of all the previous messages and all history of the discussion in every new message is just too tiring, I'm not sure I can do that.

> The point is that you make *one* trait (and possibly one generic wrapper struct), give it whatever methods it needs to have, and use it everywhere.

Yes. That's the point of generics. That's how generic types and functions are supposed to be developed and used, but that's not how locking schemes in Linux are developed and used, that's not what lockdep is enforcing and thats, in the end, what Ted Tso complained about in the, now infamous, episode.

From what I understand Ted demanded precisely that same fluidity that I'm desiring here about, too: splitting locks, merging locks, changing the rules about how different locks are held is just normal part of the Linux development process (you may imagine all these locks and spinlocks as descendants of the infamous big kernel lock that was split into smaller and smaller pieces).

And if the only option available and presented to Ted was “do as we demand or die” (define rules once and never change them, essentially) then I can understand the reaction.

> This is viral, but it's only one type, so it maintains proper encapsulation and the rest of your code has no need to know anything other than "this is the trait we use instead of Allocator."

It's “not viral” and “it's only one type” only if one can produce and fix the rules once and for all and don't want to ever change them.

But kernel development is the exact opposite! Rules are constantly changing and while code still need to follow “the latest and greatest thing” when they change you really don't want to change your whole codebase if you decided that it would be better to take two locks in a different order. You want to change these 2, 3, or 10 functions that actually take these locks and you don't want to reannotate everything when rules changing. Just read about how lockdep works: you annotate your assumptions about how locks work in places where you need them – and lockdep verifies how the whole things works.

And now we want to use this same approach in Rust (remember where he started: I would very much prefer something like Linux lockdep over this trait/generics based system.

I'm not sure it was “virality” that mb complained about or not, but for me that virality in extra-rigid locking enforcement scheme is the biggest turn-off.

Using types with meta-information erased after compilation would have worked perfectly with C++ templates or Zig comptime but it's almost entirely incompatible with Rust's generics.

Almost, but not quite: as I have shown you can create lockdep-style system in Rust (and we are discussing that same exact plan here)… you just have to be creative and use const checking instead of traits-checking.

And now you do the full circle and say: hey, it's all is just simple, you just have to shove you 10+ (20+, 30+) years of experience and do things in the “proper”, Rust-compatible way, they everything would be great.

And that kind of discussion-ignorance logic is used by Rust-in-Linux developers routinely then now I can understand where Ted's outburst comes from.

If someone would have tried to change the way I am doing things and the only justification presented was that it's for the tool that I'm not using and don't plan to use… I would have been pretty incensed, too!

Too complex for my small brain

Posted Nov 8, 2024 17:01 UTC (Fri) by mb (subscriber, #50428) [Link] (3 responses)

>I'm not sure it was “virality” that mb complained about or not

I'm not complaining about any particular detail of this.
And it's also not that I would be unable to understand the code.

I'm only comparing the complexity and thinking about whether it's worth it.

lockdep:
- Completely invisible for the developer using locks.
- There is nothing to do, except for flipping the kconfig to on to enable it.
- In practice it easily quickly finds all bugs. (Just run all lock code path once).

encoding lock dependencies in the type system:
- It's visible everywhere.
- It consumes additional brain capacity while reading/writing the code.
- In this case the compile time check is not worth much, because the lockdep runtime check is so powerful.

lockdep is a lot less intrusive and gets the job done.

I am all for compile time checks. But only where they make sense.
If the benefit is low than what's left is a major pain.

Encoding short lived things in the Rust type system is a good thing and it sometimes makes things easier.
Encoding long lived things or global dependencies in the type system is almost always a major pain.
This should be balanced against the perceived benefits compared to doing a runtime check.

Too complex for my small brain

Posted Nov 8, 2024 18:08 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> Just run all lock code path once

Much easier to day that to do. Here's an example of bug that waited for more than 40 years for someone to learn to reliably trigger it's error path.

How do you know your locks are handled correctly in various error recovery situations? That's where most bugs lurk, after all

> In this case the compile time check is not worth much, because the lockdep runtime check is so powerful.

Isn't that what they told us about KASAN? Yet, somehow, we are exploring Rust that guarantees memory safety using pile of pretty invasive compiler-time annotations, because it turned out to be not feasible to handle these object lifetime issues using runtime checks.

> Encoding long lived things or global dependencies in the type system is almost always a major pain.

Yet we are doing that, anyway – to handle memory issues. What makes them special and why should locks be handled by probabalistic runtime detection tool and not by something deterministic?

> This should be balanced against the perceived benefits compared to doing a runtime check.

Sure, but it's always a balance between invasivity and benefits. I'm not sure the scheme as implemented by netstack3 hits the desired balance, but if you move all the checking to an external tools like I outlined the compile-time lock checking becomes very similar to how you track borrows: you just pass zero-sized context around and compiler does everything else.

Too complex for my small brain

Posted Nov 8, 2024 18:56 UTC (Fri) by mb (subscriber, #50428) [Link] (1 responses)

Sure. I guess we have differing opinions and we will both have to live with that. ;-)

Just keep in mind that I am not arguing against Rust, memory safety or proving the absence of other bugs.
Quite contrary.
The presence of a tool to avoid locking errors is essential.

Just saying to pick the correct tool instead of hammering everything, because the only thing available is a hammer.
We just have different views of what a hammer should be able to do.

Too complex for my small brain

Posted Nov 8, 2024 20:13 UTC (Fri) by khim (subscriber, #9252) [Link]

I think I understand what you are complaining about: for you the need to pass zero-sized context used for compile-time checking couldn't be entirely hidden and ignored (like one can do with lockdep and for me the fact that you may reduce it somewhat radically (e.g. by using special naming convention for the functions that may take locks and them adding procmacro to add the required machinery you may reduce the whole thing to only include something like `#[with_lock]` before `fn` and the use of `Ł_` function prefix) is enough – but the fact that you may miss potential deadlock is a problem for me, but not for you.

I can understand the tradeoffs and I can see why different people would disagree about them.

But I still couldn't understand that scheme where any non-trial change to the locking causes lots of code churn and I still couldn't understand what it brings to the table (except, maybe, for the warm fuzzy feeling that you are using advanced type system in very cleaver way).

Maybe I'm missing something but the fact that if that's actually happening then no one may show my what am I missing makes me sad.

Too complex for my small brain

Posted Nov 7, 2024 17:40 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (33 responses)

> It's trivial if you can redo either RED or BLACK and add more colors.

This means that you have to coordinate hierarchies. In the end, it ends up being simpler to just keep one lock hierarchy. But since you're essentially flattening a tree, level numbers end up constantly churning.

We used that approach for a simulation software back in 2005, and it was really unwieldy. In our case, we used level numbers to "check out" objects into the local memory.

Too complex for my small brain

Posted Nov 8, 2024 12:33 UTC (Fri) by khim (subscriber, #9252) [Link] (32 responses)

> But since you're essentially flattening a tree, level numbers end up constantly churning.

Sure, but if I understand how existing netstack3 approach works then it's actually even worse than that!

True, they don't have numbers to rearrange, but if understand their scheme correctly they their context still carries information about one taken lock (presumably the one with highest used priority) and they generic declaration demands one requirement (presumably the one with lowest needed priority) and then typechecker does the math for all other locks involved.

That way you may guaranteed that your storage subsystem wouldn't care about locks held by network subsystem, etc.

If I'm correct (and I could be wrong, I'm not a netstack3 developer) then in that scheme to slit STORAGE group of locks into REAL_STORAGE and MAYBE_NETWORKED_STORAGE and put NETWORK between them you would need not to change their numbers (tedious but easy to do, especially if they are all in one place) but to change all the places where locks from STORAGE group locks are used as bounds!

As I have said: it's possible that I have misunderstood their scheme completely and they avoid that churn (which is worse than in my scheme!) – but then I want to understand how they avoid it.

> We used that approach for a simulation software back in 2005, and it was really unwieldy.

I tried to use approach that's similar to how my understanding of netstack3 works in C++17 years ago – and that worked perfectly, thanks to duck-typing.

Then I tried to replicate this same scheme in Rust… and it collapsed under it's own weight because for ~2000 types that we had proc-macro script generated more than million trait implementations and compile-time went into hours. We tried to only provide implementations that we actually need for our code but then it meant that any change of the rules meant we have to add something to the big “transition allowance whitelist”. Then I have finally cobbled together out that scheme with __this_code_shouldnt_be_used__ and if const wrapped in a macro… and that one holds, so far.

It's still much uglier then what I had in C++17, but at least it works.

Too complex for my small brain

Posted Nov 8, 2024 16:05 UTC (Fri) by farnz (subscriber, #17727) [Link] (31 responses)

If I understand the implementation correctly, the key is the two blanket implementations.

First, you never manually implement LockBefore. That is only implemented generically as the inverse of LockAfter. However, you always take impl LockBefore<MyLock> as your parameter type.

Second, the impl_lock_after macro that you use to define pairs of locks expands to two implementations of LockAfter. The first is the obvious implementation; if I tell the macro that lock C must be locked after lock B, then the macro generates impl LockAfter<B> for C. The second is a generic implementation of LockAfter for all lock markers that implement LockBefore<B>, which combines with the other two implementations to ensure that anything up to and including Unlocked in the chain of locks implements LockBefore<C>.

This second implementation from the macro also provides the compile time checking that you have a chain of locks, not a cyclic graph; you can reduce any DAG to a chain via a topological sort, at the expense of having some oddities in the graph (basically, where you have a diamond in the DAG, you flatten it by choosing one arm of the diamond as the "first" and the other arm as the "second" lock in order, even though they are technically unordered).

It's thus not quite perfect at compile time; the restriction imposed is slightly weaker than that imposed by the DAG, since you've built a chain of locks, and you've chosen arbitrary orderings for locks that "should" never be taken at the same time, whereas a perfect compile time check would say that those two locks are unordered, and you cannot take both of those locks.

Too complex for my small brain

Posted Nov 8, 2024 16:42 UTC (Fri) by khim (subscriber, #9252) [Link] (30 responses)

Thanks. Pretty clever [ab]use of a type system, but yeah, that's one of way of avoiding the O(N²) combinatorial explosion. Works if you have a DAG, not sure if can be expanded to something more complicated (in my case rules were more complicated: certain operations were allowed only of one of many “enablers” happened before, but there was no need to order these in any particular order).

But that still doesn't answer how they are avoiding “virality” while still solving that RED vs BLACK “colored priority” dilemma.

AFAICS the only way to use these relations is to:

Declare, in the function prototype, the full set of lock types that your function or functions called from your function may ever take
Only declare the lowest priority lock in your function prototype and rely on the fact that higher priority locks would be permitted because they are higher priority

#1 means that any change at the bottom of your call stack becomes “viral” and ripples through the whole call graph.

#2 means that you have “colored priority” dilemma that's even harder to resolve than if you would have just given priorities to your locks and used simple const check, like in my example.

And in both cases typechecking is more expensive and usage is more complicated then when you use const way of correctness checking!

Of course you can combine these two approaches, but I'm not sure this would simplify anything, it really looks to me that it would just create a bigger mess.

IOW: I'm still not sure where exactly all that machinery becomes simpler and easier to use that const way of correctness checking!

If you want to avoid hassle of coordination of priorities and keep all your lock types in one place then you can just pass your LockAfter to a simple procmacro that would assign priorities to locks and if your don't want to think about that in advance you may even design a tool that would help you to assign priorities based on the order that your locks are using in your code without even actually planning how would you use them in advance!

I'm probably missing something there, because when smart people do stupid things… and pick solution that looks, on a surface, strictly worse than some other solution… then there are usually some reason for doing things that way.

Too complex for my small brain

Posted Nov 8, 2024 17:14 UTC (Fri) by farnz (subscriber, #17727) [Link] (29 responses)

You'll need to explain what you mean by "avoiding virality" in a lot more detail; you can't arbitrarily rearrange locks without forcing significant changes, but you can insert new locks into this system without any changes to existing code.

The key is that if I take something that impl LockBefore<TowelLock> + LockBefore<SpongeLock>, I can accept anything that comes before both TowelLock and SpongeLock in the hierarchy as my lock marker. No changes are needed to add new locks to the hierarchy, because if my chain is BathroomLock → ShowerLock → TowelLock → SpongeLock, and I insert a new lock ToothBrushLock between BathroomLock and ShowerLock, then it's still the case that I can pass a LockBefore<BathroomLock> to my existing function that takes impl LockBefore<TowelLock> + LockBefore<SpongeLock>. The thing that becomes "viral" is changing the acceptable order of locks; if I swap BathroomLock and TowelLock, then things that used to be "happy" taking impl LockBefore<TowelLock> may now want a different argument.

Too complex for my small brain

Posted Nov 8, 2024 18:28 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (17 responses)

> The thing that becomes "viral" is changing the acceptable order of locks; if I swap BathroomLock and TowelLock, then things that used to be "happy" taking impl LockBefore<TowelLock> may now want a different argument.

That's literally the whole point of the exercise in the first place. If the argument is no longer acceptable, it's because your lock reordering just created a possible deadlock and now you have to fix it (or undo the reordering).

Too complex for my small brain

Posted Nov 8, 2024 18:39 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

Yep, which is why I'm lost in khim's reasoning around virality. With the compile-time check used, if my lock chain is BathroomLock → ShowerLock → TowelLock → SpongeLock, and I have a function that takes ctx: impl LockBefore<TowelLock>, I can supply it with Unlocked, ShowerLock or BathroomLock as context. Further, if I swap ShowerLock and BathroomLock, or insert a new lock ToothbrushLock after BathroomLock but before ShowerLock, my function is unchanged - indeed, it even automatically acquires the ability to accept ToothbrushLock, because ToothbrushLock is before TowelLock in the chain.

The only virality I can see is when the rearranging of lock orders results in a potential deadlock; at that point, my function might have to change to take ctx: impl LockBefore<TowelLock> + LockBefore<ToothbrushLock> to avoid problems, but that's because if it doesn't do that, there's now a chance of a deadlock.

Too complex for my small brain

Posted Nov 8, 2024 19:53 UTC (Fri) by khim (subscriber, #9252) [Link]

>The only virality I can see is when the rearranging of lock orders results in a potential deadlock; at that point, my function might have to change to take ctx: impl LockBefore<TowelLock> + LockBefore<ToothbrushLock> to avoid problems, but that's because if it doesn't do that, there's now a chance of a deadlock.

I think we have the different meanings of what “chance” means here. My take on that “chance” is very similar to mb's take on this (we differ about feasibility of compile-time detection of the problem, but the core idea is the same): for me “chance” only have a meaning if the code that I actually have for the full program may deadlock.

I don't particularly care about what happens to the code that “shizophrenic monkey” would achieve if it would start randomly shuffling code while ignoring the purpose of the code and would only follow the type traits written in the function declarations as a guidance (I call that “shizophrenic monkey” because normal monkey would just randomly shuffle code without looking on trait declarations and wouldn't be able to produce anything compileable… we need pretty advanced money that would look on the traits in function definitions – yet would ignore semantic of the code).

If code, according to the traits visible in declarations, may deadlock, but compiler wouldn't allow me to actually cause a deadlock… I'm happy. I have an idea about what I'm doing, I have an idea about what I'm calling, I don't need these breadcrumbs to understand what I'm doing. I only need to ensure that compiler (or, in case of mb, runtime checker) would catch me if I would do a mistake that may lead to the problems for the actual user of my program. Spare me that lecture about “potential deadlocks“: if something in my code ensures that they would never become an “actual deadlocks”… then I don't particularly care about who or what would prevent it. Even if they work correctly by accident – I'm still happy.

I only start to care when something in my code start leading to the harmful, deadlockable program, not when there some “code smells” in my code.

I'm sick and tired of code written with “best practices” in mind without any regard to whether these “best practices” are solving any real problems or not.

In fact in my life I have spent more time ripping out “best practices” code that is solving imaginary problems that may never happen in real life then I spent time writing code that solves real problems.

Because almost every time I try to use something in non-trivial way… I find out that it's impossible – but not because something in real world prevents my design from being workable, but simply because someone added crazy arbitrary “best practices” constraints.

Sometimes I rip them out, sometimes I give up and built the usual Babylonian tower with 100 layers that redo things that are already done on the other layers because “best practice” isolates me from results of that work… but it's always the same: people try to “protect me from myself”… and they successfully protect me from my desire to do job quickly and efficiently.

Can we, please stop protecting me from myself? Catch my mistakes, if you can, I would be glad, don't stop me from doing things simply because “you know better”. No, you “don't know better”, otherwise you would have been able to explain how and why my code would break.

Too complex for my small brain

Posted Nov 8, 2024 19:25 UTC (Fri) by khim (subscriber, #9252) [Link] (14 responses)

> If the argument is no longer acceptable, it's because your lock reordering just created a possible deadlock

What is “potential deadlock” and why should I care about “potential deadlock”?

If you means “possible deadlock” then, of course, I would care… and would want to know just how that “possible deadlock” may happen. Which functions should I call in one thread, which functions should I call in another thread… lockdep gives me that info.

> now you have to fix it (or undo the reordering).

I only have to fix it if code that I wrote and compile may actually deadlock.

If that code can be changed by someone and then it would deadlock… then I don't particularly care about that case.

Let that “someone“ think about how and why s/he would change the code (and locking scheme) to avoid deadlocks.

Too complex for my small brain

Posted Nov 8, 2024 23:40 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (13 responses)

It is possible in the sense that the type checker cannot prove that it is impossible. Which, again, is the whole point of the exercise.

But it is also possible in the sense that it is at least somewhat likely, assuming reasonable code. If you accept a &impl LockAfter<Foo>, the obvious implication is that you intend to lock a Foo and then lock the argument (or else you should have accepted a looser trait bound in the first place). Or at least, you intend to do something that might plausibly result in a Foo getting locked before the argument. If you reorder things so that the argument may not be locked after Foo anymore, then the function no longer makes sense and needs to be fixed or rewritten so that it does not attempt to lock the argument after locking a Foo.

The other point I would like to make goes to virality: Type aliases and associated types can allow a significant degree of compile-time indirection that makes the overall strategy much less painful than what you seem to be describing. For example, you might have a function that needs some complicated set of locking rules to be obeyed by its callers. Instead of requiring the callers to pervasively copy and paste the callee's type bounds in their own signatures, the callee may introduce a type alias, the callers can use that type alias where appropriate, and then you only have to update it in one place instead of needing to change every callsite. In cases where we're a trait method, this can be done as an associated type, which has the added benefit of allowing some coupling between the generics on the method or trait and the generics on the type. For obvious reasons, this strategy is more useful for internal/unstable APIs that frequently change, and is less useful for anything that intends to observe SemVer (you can't just change the types whenever you want, if it's supposed to be a stable interface, but it would still save everyone a little bit of typing and cognitive effort, so it is better than doing nothing).

Too complex for my small brain

Posted Nov 9, 2024 0:57 UTC (Sat) by khim (subscriber, #9252) [Link] (12 responses)

> Which, again, is the whole point of the exercise.

The whole point of the excercise was automatic deadlock prevention. If deadlock is prevented (as in: my code that could actually deadlock on the users machine) – I'm happy, if false positive is detected I'm annoyed. It's as simple as that.

Of course we all know that achieving zero false positives, zero false negatives is just simply mathematically impossible, but that doesn't make false positives less annoying.

As advocacy of deplock here shows some people even prefer to have a [small probability] of false negatives – mostly to avoid annoying and disrupting false positives.

> It is possible in the sense that the type checker cannot prove that it is impossible.

If that couldn't lead to actual bug in a program then that's false positive. And false positives are really big deal. They literally make or break tool adoptions. TSAN was the first widely adopted data race detector precisely because it has so few false positives. Some very rare corner cases exists, sure, but, most of the time, if TSAN detects error – you can be sure it's real error. You spend time fixing real bugs instead of making tools happy. That's good trade-off. If tools force you to change code because they report possible errors where none exist… you are spending time chasing windmills… that's annoying and not productive. That's bad trade-off.

Heck, it took decade and half for Rust borrow checker to become intelligent enough not to make people run screaming after interacting with it. Precisely because originally it had too many false positives.

If compiler tells me my program have an issue I want to be sure compiler is right at least 90% of time (99% is better and after 99.9% I would probably stop caring), otherwise I would prefer not to even bother with such useless set of checks.

> If you reorder things so that the argument may not be locked after Foo anymore, then the function no longer makes sense and needs to be fixed or rewritten so that it does not attempt to lock the argument after locking a Foo.

And who would ensure that? AFAICS compiler wouldn't care. I'm not sure if there are clippy lint and even if such lint does exist it may very easily be wrong if I'm taking or not taking Foo depending on what kind of callback I'm accepting.

So we have found another annoying problem with that approach – the one that doesn't even exist in the alternatives! Was it even discussed when traits were chosen as the way to detect potential deadlocks?

> Type aliases and associated types can allow a significant degree of compile-time indirection that makes the overall strategy much less painful than what you seem to be describing.

Sigh. Sure, you can invent many tricks to help someone who is forced to participate in a potato sack race: some kinds of rollers, running poles and many other things… but why the heck removal of the potato sack is not even discussed?

All your suggestiongs to reduce virality and pain related to it… – they all rely on the ordering of locks from “least important” to “more important” – and if you have such ordering then the only objection to the simple, naïve, const-based correctness checker suddenly just disappears.

Can we, please first discuss some tangible benefits that all that complexity provides (the reason to participate in the potato sack race instead of regular race) and only then start talking about mitigation strategies.

Because, in my book, to discuss two approaches they both need to have some benefits!

If one gives you flexibility but also small percentage of false negatives (lockdep) and the other gives you zero false negatives but somewhat annoying syntax (C++ templates or const-based deadlock checking prevention in Rust) then I can understand how one or another may be picked.

But if one approach have strictly more negatives than the other approach then it doesn't even matter if these negatives are large or small: they exist, ergo this is worse approach, ergo it shouldn't be used.

From what you are saying one of the positives that you perceive as “nice to have” is the fact that &impl LockAfter<Foo> acts, in addition to other things… also as a documentation, not as an automatic lock detector.

Fair point. But if that is the advantage over const-based deadlock checking prevention then… was that advantage ever measured or at least discussed?

We may have our locking scheme documentation spread over hundreds of files that we have written instead of keeping it in one place somewhat removed from the rest of the code… well… hmm… an interesting proposal. It would mean that it's easier to see how our locking scheme lock works, but harder to modify… and then we would use type aliases to reduce an easiness of discovering it but to reduce pain with modifying it, too… hmm… is it a good trade-off or not?

Discussion structured in that way would at least make some sense… but I don't see anything like that being discussed in the article that we are talking about, I don't see it being discussed in netstack3 documentation… I don't see it anywhere! It's as if no one even tried to think about if using complicated hierarchy of traits is better than just simple enum with 77 items!

This is what baffles me: the road that more-or-less guarantees that there would be a lot of work and a lot of complexity was chosen seemingly on the whim without ever discussing alternative roads… and now only mitigation strategies are discussed without ever explaining why that road was picked in the first place! Why? If something hurts us shouldn't we first try to remove that something instead of discussing which painkillers work best?

There should be some reason for it! Do we know it or not? Do we have at least an idea about what the reason was?

Too complex for my small brain

Posted Nov 9, 2024 2:58 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Of course we all know that achieving zero false positives, zero false negatives is just simply mathematically impossible, but that doesn't make false positives less annoying.

That's not quite true. You can achieve automatic verification with zero false negatives, although not in the general case.

Too complex for my small brain

Posted Nov 9, 2024 11:53 UTC (Sat) by khim (subscriber, #9252) [Link]

> That's not quite true. You can achieve automatic verification with zero false negatives, although not in the general case.

Not sure what you are talking about. It's trivial to achieve “zero false negatives”: return true; guarantees that. It's also trivial to achieve “zero false positives”: return false; guarantees that. You don't even need to know the question!

Not very useful implementation, but it works, isn't it? Thus saying that it's not possible to guarantee zero negatives would be stupid.

What's not possible is to precisely separate things. You have to have either false positives or false negatives or both.

And since ideal solution is not possible… we have many different solutions with different trade-offs.

Too complex for my small brain

Posted Nov 10, 2024 5:09 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (9 responses)

> Discussion structured in that way would at least make some sense… but I don't see anything like that being discussed in the article that we are talking about, I don't see it being discussed in netstack3 documentation… I don't see it anywhere! It's as if no one even tried to think about if using complicated hierarchy of traits is better than just simple enum with 77 items!

Isn't Fuchsia a Google project? I'm sure there is non-public discussion around the project, so it may well have been discussed internally and this is the result. I mean, I'm all for Google being more transparent about things, but there's a *long* way to go before that happens.

Too complex for my small brain

Posted Nov 10, 2024 13:14 UTC (Sun) by khim (subscriber, #9252) [Link] (8 responses)

The problem with that reasoning is that they documented the great many things… except the most basic question of why this complicated scheme with traits is even needed if much simpler one is obviosly possible.

It's as if someone decided to go from Eilat to Dubai around Africa without any explanation instead of using Suez.

We don't need to know about minutiae details of these talks to know that Houthi were mentioned and was deemed real enough problem to do that.

Similarly here: sure, we don't have transcripts of all the talks Google had inside… but they had to had some reason to do what they did, isn't it?

What could that be? If the documentation was the primary reason then it would have been mentioned, somewhere, I assume.

Too complex for my small brain

Posted Nov 10, 2024 13:30 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (7 responses)

Filing an issue with the Fuchsia team to improve documentation sounds a lot more productive than arguments with other non-Fuchsia participants here, no? Even if we, collectively, did come up with some rationale that satisfied you…what does that matter? And sure, they may not have a reason (e.g., the const pattern you suggest may not have occurred to them). Or they may have weighed the tradeoffs differently and just disagree with you. No matter what the case may be, I don't believe the answers lie in this comment thread.

Too complex for my small brain

Posted Nov 10, 2024 14:18 UTC (Sun) by khim (subscriber, #9252) [Link] (6 responses)

> Filing an issue with the Fuchsia team to improve documentation sounds a lot more productive than arguments with other non-Fuchsia participants here, no?

Depends on your goals, ultimately. If I wanted to improve Fuchsia then sure, it would have been the best way.

But I'm not interested in Fuchsia. Multics, Taligent, Sailfish or FirefoxOS… funny OSes come and fail regularly. Fuchsia looks like one of these so far.

But they do include many interesting ideas – and it's the question of whether I'm missing them, that I'm after.

In particular I use various tricks that help me turn runtime errors into a compile-time errors in C++ – and, naturally, am very interested in using them in Rust, too. Except lack of duck-typing for generics makes them very hard to use.

> Even if we, collectively, did come up with some rationale that satisfied you…what does that matter?

It matters because my goal here is to pick the best solution that would for me, not for Fuchsia. Also to help others to do the same, but that's secondary goal.

That's why I'm not, really, not interested in knowing why Fuchsia as part of Google did that choice. Even if there are some deeply internal reason to pick that approach… why would I care?

> Or they may have weighed the tradeoffs differently and just disagree with you.

To discuss “tradeoffs” we need two approaches with pluses and minuses. We don't have that yet.

Traits don't even work all that well for the documentation purpose because it's way too easy to leave annotations in place when they no longer needed. And then they become pure nuisance.

> And sure, they may not have a reason (e.g., the const pattern you suggest may not have occurred to them).

Could be. I, too, went the traits route when I wanted to implement similar scheme – because I have done that many times in C++ and it works beautifully there. But in Rust… that was a disaster. Viral nature of traits declarations created an insane mess.

I started looking for a solution… and found the const trick. It worked beautifully, made it possible to adopt C++17 style metaprogramming without turning my code into “insane mess” (just into “somewhat ugly mess”) – and thus the idea that someone may just not know it never occured to me in that discussion.

But I guess that's the best explanation that we have, so far.

Too complex for my small brain

Posted Nov 11, 2024 12:14 UTC (Mon) by daroc (editor, #160859) [Link] (5 responses)

Do you happen to have any code demonstrating your approach published? I feel like I have a good intuitive understanding of Fuchsia's approach, but I don't think I understand your approach as well.

Too complex for my small brain

Posted Nov 11, 2024 15:49 UTC (Mon) by khim (subscriber, #9252) [Link] (4 responses)

You mean the code in production? That's not “my” code, just code I helped to develop. And I would need to spend a lot of time what dependencies we are tracking, how and why (it wasn't about lock tracking, rules there were different, that's why simple pair of before and after traits were not enough).

I think it would be better to concentrate on that simplified example that started the whole discussion.

I kinda assumed that it would be trivial to follow yet, apparently, it's not trivial… thus I would try to explain the ingredients that are used there and show how can they be combined together to detect violations of locking ordering.

Zero-sized types, in Rust, may carry compile-time payload. Specifically:
1. They can carry information about some other type
2. They can carry information about lifetime
3. They can carry values (integers or sequences of integers)
To check the correctness of locking we would use all three properties. Specifically:
1. We would be putting information about all currently taken locks in the context (by using 1.1 property, obviously)
2. We would do that by putting zero-sized reference to the previous context: if you take &'a mut T and convert into PhantomData<&'a mut T> then this zero-sized type acts precisely like a real reference… except you couldn't use it to derefence anything (but we don't have anything interesting in the context itself, the only interesting things are in the context type, thus we don't need to dereference anything) – in particular as long as PhantomData<&'a mut T> exists the original context is not accesible (by using 1.2, obviously)
3. And we also would add some constant to know what we have put in the that context, e.g. we may put all locks in one enum and then simply use LockPriority::IPV6LOCK us usize (by using 1.3, obvious)
And with all that placed in our context we only just need to verify that lock that we are taking right now is valid. In Rust 1.79+ that's trivial: you just used const { assert!(…) } or const { if … { panic!("…") } and that's it. That's a bit strange because it looks as if that's a runtime check, but no, that's compiler-time check. So… you have two constants, compare and you would know if lock is Ok to take or not (ask me if you need to backport it to older versions of Rust, there you needed somewhat more complicated dance, but it's also doable).
Note that you have access to all types in the chain here, even if indirect, and could implement arbitrarily complex verification scheme… it's only hindered by the fact that const functions couldn't be part of the trait, so you would probably want to put your machinery in separate module to not fill the namespace with helper functions.

And that's… essentially it. We are done. Just need some syntax sugar to make it more pretty. I guess you would want to add something like LockPriority trait:

pub trait LockPriority : sealed::LockPriority {
    const PRIORITY: usize;
}

impl<'a, T: ?Sized, const P: usize> LockPriority
for TakenLockPriority<'a, T, P> {
    const PRIORITY: usize = P;
}

mod sealed {
    pub trait LockPriority {}
    impl<'a, T: ?Sized, const P: usize> LockPriority
    for super::TakenLockPriority<'a, T, P> {}
}

Would also need some macro because you would be doing things like
let (mut ctx, guard) = mutex.lock(UsePriority(& mut ctx));
pretty often. Or maybe not, maybe it's better to keep that explicit. Attempts to use wrong context would be detected and stopped by the compiler, anyway.

P.S. The while thing relies on the fact that correctness checking for the const expression and detection of errors happens not when you define some generic function, but when it's instantiated. And it's obvious why it have to happen like this: correctness checking of something likes const { assert!(…); } couldn't happen during definition time because what goes into assert!(…); may define on constants that ass passed as arguments. And doing that in runtime wouldn't make any sense: this would mean that your constant value would, suddenly, depend on something in runtime…

Too complex for my small brain

Posted Nov 11, 2024 16:18 UTC (Mon) by daroc (editor, #160859) [Link] (3 responses)

Thank you for the explanation; I see what I was missing. I didn't understand that the assertions in the const block were actually being evaluated at compile time — with that understanding, the scheme makes more sense.

It seems like the big difference between the two schemes is really just about the use of a partial order [1] versus a total order. Fuchsia's scheme also involves carrying around a zero-sized context that gets compared to see whether one element is less than another element at compile time — the difference is that the comparison has to be encoded into traits, because Rust doesn't support using custom types as const generics (yet).

I personally think that there are reasons to prefer a partial order, but that probably a scheme that used const generics to pass around a type with a custom PartialOrd instance would be the best of both worlds.

[1]: https://en.wikipedia.org/wiki/Partially_ordered_set#Parti...

Too complex for my small brain

Posted Nov 11, 2024 17:24 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

> It seems like the big difference between the two schemes is really just about the use of a partial order versus a total order.

That's actually pretty minor issue. const functions can do arbitrary calculations, if you are worried that locks would be accepted in some random arbitrary order while you want the compile-time error (and explicit decision by developer who would declare whether storage or network have higher priority, instead) then this can be done via simple proc-macro which would generate an appropriate table. netstack3 is defining their locks in once place with pile of macros, anyway.

As I have said: I developed this scheme to validate more complicated situation, where nether total nor partial order work. I just used simple number for illustration purposes.

And yes, we had function that accepted two enum numbers and then answered whether they are compatible. Pretty obvious modification.

> the difference is that the comparison has to be encoded into traits

The main difference is when and how it's checked, not how it's encoded. With netstack3 function declares what it accepts in it's signature… and that's it.

But how would it know? Think about function that stores something… if it would send it to USB storage, if it would save it on SSD then there would be different locks, if that file on the NFS server then you would hit yet another situation.

And you need to specify one, single markup that would cover everything. Sure, instead of simple callback you may accept another trait which embeds information about what locks that callback would need… but what if you have two callbacks? What if they could be called in a different order?

Instead of one, simple, rule you start develop elaborated schemes that may describe what you are really trying to achieve… but you are doing in a pretty limited world of traits and generics… which is total: if generic function accepts some type of callback – then it always accepts that type of callback! And if the calling function could't prove at the declaration time that it have the right to call that callback… oh, well… now we need another trait… and if we use two callbacks… we need to decide (and declare) order that we plan to use in the function declaration, too! And if order can be different… well that's another indirection! Traits are multiplying and fill your code with something that it doesn't really need, that's only there to help with lock order checking!

As I have said: this leads to insane mess… and insane compilation times.

Now, constants are validated after instantiation. That means that you simply just don't even need to care about combinations that are not happening in real code. And if you use function with a callback… callback would determine validity of your function automatically. You don't need to create an additional trait, put callback and lock in there and pass that around. Any generic function or generic callback already carries the required data, you don't need to duplicate it in separate, manually written, traits!

> I personally think that there are reasons to prefer a partial order, but that probably a scheme that used const generics to pass around a type with a custom PartialOrd instance would be the best of both worlds.

Sadly that's not possible in Rust because traits couldn't carry const functions and PartialOrd couldn't be used in const context, too.

You may create helper function, though. If your locks are defined in one place then it's trivial and if they are not defined in one place then RED/BLACK scheme can be used (it's more-or-less inveitable with netstack3 approach, too, as was already discussed).

P.S. One mode that's trivial with const-checking and which is, essentially, impossible to implement (without additional const-coding, heh) with netstack3 approach: “just make me happy”. I have 2 (3, 4, 5, N) locks… how can I lock them all without causing the deadlock. How would you implement that? That's powerful capability, but it's not compatible with “just leave this machinery out” compile-speedup enhancement (where you just map everything to () and compiler accepts everything till you would compiler with checking-enabled flag)…

Too complex for my small brain

Posted Nov 11, 2024 18:18 UTC (Mon) by daroc (editor, #160859) [Link] (1 responses)

Ah, I see now. I can't speak for anyone else, but you've managed to convince me.

As for why the Fuchsia team went with a trait-based approach — your comment about when panics in const blocks stabilized raises another theory: netstack3 was started in 2019. I'm not sure exactly when the LockBefore solution was added, but it could have been before the approach you described became possible.

Too complex for my small brain

Posted Nov 11, 2024 19:21 UTC (Mon) by khim (subscriber, #9252) [Link]

> I'm not sure exactly when the LockBefore solution was added, but it could have been before the approach you described became possible.

I think that was possible even in 2019, but it sure looked awfully ugly back then.

Panic in const were always possible (just divide by zero and you get panic at compile time) and, indeed, in Rust 1.51 everything already works. But without const generics all these checks would have to be based on One and Zero types and associated consts in traits and that looks pretty awful. When Rust 1.57 stabilized panics it became easier and with Rust 1.79 the need for strange dance with extra struct and const implementation for it is no longer needed.

I don't remember where I discovered the use of const for invariants checking (I think it was on URLO, but could be reddit, too), but I remember that current version of Rust was 1.5x, which meant that I don't need to do crazy dance with types to encode number, can just simply use const generics, but still needed to use “divide by zero” trick. Discussion about the fact that panic in const would be stabilized “soon”, that I vaguely recall, means that was middle of 2021.

Now suspect that without const generics with the need to encode numbers in One and Zero types and then use pile associated constants for calculations it was just too ugly to really contemplate and one needed a leap of faith to believe that eventually it all would become much simpler and more usable to bet an implementation on that.

But for me, since I knew about that approach essentially from my first attempt to help some of my friends to write their first large Rust program… it just feels much more natural than traits-based checks. Closer to how one would do that in C++17 and thus more… “traditional”, I would say.

I guess is never occurred to me that it may be very obscure technique that is not known to everyone.

Too complex for my small brain

Posted Nov 8, 2024 19:10 UTC (Fri) by khim (subscriber, #9252) [Link] (10 responses)

> You'll need to explain what you mean by "avoiding virality" in a lot more detail; you can't arbitrarily rearrange locks without forcing significant changes

No? Why no? The “gold standard”, lockdep, does that.

Well, sure, if you were always doing storage.lock() after network_socket.lock() and now want to do that in the opposite order, network_socket.lock() after storage.lock() then you need to ensure that you atomically do that swap everywhere.

That's unavoidable and that what we are after, after all. If you would do them in one order in one place and in the opposite order in some other place… then you would hit deadlocks.

But if we want to do some additional “significant” changes on top of that in functions that are not directly touching these locks… that's viral.

Of course different people consider different changes “significant”. In the naïve const way of correctness checking with huge enum that lists all possible lock types when you swap the locks in the code you have to swap two lines in that huge enum, too.

That's a bit more “viral” than some people would like because it would mean that you still have to collect all these locks in one place… but I would say that it's mild “virality”: global list of everything are PITA to maintain but even if you have thousands of lock types (like kernel have) it's still possible to list them all in one enum. That can be avoided with an additional machinery, as I outlined already, but I'm not entirely convinced that developing such machinery and keeping working when toolchains change is better than just having one central list of possible lock types

Now, if these extra changes that you are forced to do are not limited to one, two or ten lines… when they touch other functions that are not even directly touching these locks that are rearranged… then I would say that your solution is too viral for my tastes.

Because now in additional to these few functions that directly touch these locks and, maybe, some externel “registry” of all types involved… I have arbitrary and not easily predictable work that would touch all these functions that shouldn't, actually, even know whether I lock storage.lock() before or after network_socket.lock() and don't even particularly care whether I even need storage and network_socket – they only touch GPU lock, after all, why should they care where GPU assets even come from… as long as an attempt to load GPU assets doesn't try to use GPU to show some messages on the screen everything should just work.

The key is that if I take something that impl LockBefore<lTowelLock> + LockBefore<SpongeLock>, I can accept anything that comes before both TowelLock and SpongeLock in the hierarchy as my lock marker.

Sure, but that means that any function that calls MakeMeClean now also have to know about both TowelLock and SpongeLock – and pass that knowledge to it's owner!

And generic function WashMyDishes<const USE_TOWER_OR_BLOWER: bool>… how do you express it's lock requirements at all?

> No changes are needed to add new locks to the hierarchy

Changes are not needed to the code of functions, but you still need to alter your registry to add new type of lock. Exactly like in a trivial solution with const way of correctness checking with central enum registry of locks.

And because of orphan rules it has to be central registry of locks, because to add lock between two other locks from two different components you would need to alter both of them.

Thus, in the end, we have two schemed where one scheme have a problem (how to attach priorities to locks that are defined in a different components) but the other, presumably good, scheme also have that problem but it also have a different problems that simple, naïve, const way of correctness checking doesn't have.

And I'm told that simple, naïve, scheme is, somehow, worse… Really? How come? Where is the advantage?

I was taught that if something is called “better” then it should have advantages over something something that's called “worse”.

If your “better” scheme have the same disadvantages that “worse” scheme also have plus more… then how the heck it can be called “better”?

I'm certainly missing something in that picture.

Too complex for my small brain

Posted Nov 11, 2024 9:34 UTC (Mon) by taladar (subscriber, #68407) [Link] (3 responses)

> The “gold standard”, lockdep

So the gold standard is something with almost impossible preconditions to work, namely running each code path at least once?

Too complex for my small brain

Posted Nov 11, 2024 11:23 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

Gold standard is something that is perceived to work and also something that is used in a project that produced code used by billions of people with hundred of billions devices.

People, naturally, are very reluctant to switch to something more intrusive unless it can be shown that it's really needed.

Attempt to bring Rust in kernel wouldn't have been possible if not for showing (with fuzzers) how many Rust-preventable bugs are there in kernel. It was adopted not because someone wanted to adopt it but because kernel needed some answer to the endless stream of security holes discovered semi-automatically. Time will tell if Rust will help to mitigate them, but pain was acute enough to try.

Similarly lockdep would stop being “gold standard” not because someone would continue to scream about “impossible preconditions”, but if it would be shown that it's not efficient and produces product of unacceptable quality – or if something that's equally easy to use may produce better results.

That's very tall bar, because deadlocks are usually not a security issue, just a nuisance, thus they have to happen often enough in real use but not often enough in the run with lockdep on a developer workstation for lockdep to stop being “gold standard”.

That's a very tall order thus approach that proposes to replace lockdep with something better have to ensure it would be easier to use. And mechanism used by netstack3… nope, doesn't look like it's easier to use, to me.

If you would propose C++-style approach without extra annotations and associated churn (only possible in Rust with the use of const AFAIK) then it may be argued that it would be comparable to lockdep and more robust. You sometimes need to do manual rearranging of locks in the “lock registry” or build and run code in special mode that would rearrange their priorities for you, but you have to run code in a special mode with lockdep, too. This leaves us with pretty simple and mechanical additions of ctx: impl Context in a few places vs the need to run code with decent enough coverage… I would prefer to add ctx: impl Context, mb would prefer to organize coverage… we may agree to disagree, but at least we are comparing different sorts of banana fruits.

Trait-based enforcement is entirely different beast, more like baobab, than banana fruit, it imposes much more on the developer without any clear benefits, that's why I'm surprised it was picked.

The running hypotheses is that the reason that was done because someone had no idea const checking happens after monomorphisation (like in C++) and/or just mechanically adopted scheme that would have worked great in C++ or Zig, but it insanely unwieldy in Rust.

Too complex for my small brain

Posted Nov 11, 2024 11:52 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

Lockdep does something fundamentally different to this scheme; it monitors use of locks, and alerts at runtime if you have two codepaths that take the same locks in different orders.

It's not the "gold standard" for compile-time lock ordering enforcement, for the simple reason that it's a runtime verifier of lock ordering, not a compile-time check. And all the normal issues with runtime verification kick in - for example, I'm not likely to test on a machine that runs the Broadcom BCM957608 driver, because I can't afford to do so.

Further, the trait based option is an impl Context in practice, where Context is spelt impl LockBefore<FirstLockITake>. That spelling says that you take any concrete context that's good enough to avoid a deadlock, and in practice you'll start with a Unlocked at top level, then just pass around the contexts you get from locking locks. You don't actually get forced to do any refactoring unless you've changed lock ordering in such a way that deadlocks can now happen where they could not happen before.

Too complex for my small brain

Posted Nov 11, 2024 14:52 UTC (Mon) by khim (subscriber, #9252) [Link]

> Lockdep does something fundamentally different to this scheme; it monitors use of locks, and alerts at runtime if you have two codepaths that take the same locks in different orders.

And in my scheme that can be done, too. Just turn TakenLockPriority into actual context with actual reference to previous context – and you can verify things at runtime.

But, more interestingly, you can also add link time and do topological sorting based on that. I was talking about that here.

> Further, the trait based option is an impl Context in practice, where Context is spelt impl LockBefore<FirstLockITake>.

Nope: that's not FirstLockITake but FirstLockSomeoneInOneOfBazillionFunctionsThatIMayEverCallTake.

And determining that lock is not easy, it changes with time, and, ironically enough, that's precisely the contention point that led to the infamous Ted's outburst and Wedson Almeida Filho resignation.

So, please, don't turn something that leads to heated discussions and burn-outs into something trivial, not even worth talking about.

> You don't actually get forced to do any refactoring unless you've changed lock ordering in such a way that deadlocks can now happen where they could not happen before.

IOW: this scheme makes your locking story consistent by preventing precisely and exactly the kinds of refactorings that kernel developers rountinely perform, plan to perform in the future and is precisely something that they need to support. Well… doh.

I guess cutting the legs to ensure that you would never step into danger is a solution… but the one that people rarely find acceptable.

Yet for the locks… that's what is expected? Why?

Too complex for my small brain

Posted Nov 11, 2024 11:32 UTC (Mon) by farnz (subscriber, #17727) [Link] (5 responses)

Sure, but that means that any function that calls MakeMeClean now also have to know about both TowelLock and SpongeLock – and pass that knowledge to it's owner!
And generic function WashMyDishes<const USE_TOWER_OR_BLOWER: bool>… how do you express it's lock requirements at all?

For the first, no it doesn't - it has to know about locks in general, but it's fine for MakeMeClean to only know about BathroomLock, since that comes before both TowelLock and SpongeLock in my sample lock ordering. All that MakeMeClean has to pass along is the knowledge that lock order must be respected. And for your generic function, you have choices; either say that your lock order requirement is a parent lock of both BlowerLock and TowelLock, or have the generic that determines which lock to take also determine which lock order is required:

trait DryingMethod { type Lock; }

struct TowelDry;
impl DryingMethod for TowelDry {
     type Lock = TowelLock;
}

struct BlowerDry;
impl DryingMethod for BlowerDry {
     type Lock = BlowerLock;
}

fn WashMyDishes<Drier: DryingMethod>(ctx: LockBefore<Drier::Lock>)

Again, no virality unless you change lock ordering around. And even if you change lock ordering around, no changes needed unless you've introduced a deadlock into your setup, since things will compile with any lock from higher in the hierarchy, even if you've changed ordering at a lower point in the hierarchy.

Also, you've not presented code for your "simple" scheme; as far as I can find, though, it involves some incredibly subtle and convoluted code in the form of TMP, and is basically impossible for you to explain clearly. I didn't write the Fuschia system, but I can explain how it works; you wrote your TMP solution, and can't explain it to me clearly.

Too complex for my small brain

Posted Nov 11, 2024 14:38 UTC (Mon) by khim (subscriber, #9252) [Link] (4 responses)

> Also, you've not presented code for your "simple" scheme

I did. In the very first message, in fact. Strange that you are discussing everything except actual code. Maybe that was a mistake because discussion quickly degenerated to the questions about how to assign priorities… but apparently traits-based checking doesn't work without some well-defined lock ordering, too, thus it's not an advantage of that method.

Sure, one may need/want to play with some macros to make it more understandable, but I'm not sure Fuschia code wasn't written in 20 minutes, either.

> as far as I can find, though, it involves some incredibly subtle and convoluted code in the form of TMP, and is basically impossible for you to explain clearly

On the contrary: it's so trivial that doesn't even need much explanation. You have generic TakenLockPriority type that includes priority of last taken lock in it's type and reference to all previously taken locks. When you take a lock you specify priority of that process and get new TakenLockPriority that contains phantom, zero-sized reference to previous TakenLockPriority and that priority inside.

And since constructor for a lock has two priorities it just compares them like this to detect rules violations:

        const {
            if PREVIOUS_PRIORITY >= PRIORITY {
                panic!("Improper use of lock is detetected")
            }
        }

And that's it! Like: literally, that's the whole scheme! Just try to play with priorities on Godbolt (again, link was in the very first message).

Maybe you haven't realized that panic! in the code above produced compiler-time error and not runtime error? That's how Rust works. There were talks about adding actual static_assert, but that haven't happened so far. Instead of that you can use assert! or panic! in a const context to generate compile-time error.

> you wrote your TMP solution, and can't explain it to me clearly.

And now you say that I haven't shown you any code. Have you looked on the presented code or not? And as for “can't explain it to me clearly”… That's not fair. I wrote the code because it I prefer to discuss things that way: code doesn't lie, while “clear explanations” might. But if you need an explanation then it can be summed in just three sentences:

You pass zero-sized context everywhere that contains zero-sized unique references to all taken locks and their priorities.
You pass that zero-sized context to the locking functions and that locking functions compares priorities of two locks: last takes and new one. And generates the compiler-time assertion in the way that Rust supports.
Three? There is no step three™.

That's it, what can be easier to explain and understand?

Yes, my solution is also kinda-sorta viral, because you would have that zero-sized ctx: impl LockOrdering everywhere, but because that's one trait implemented for one (albeit genetic) type… essentially you are just marking functions that deal with locks, which is mandatory for any compile-time scheme – and even if you make a mistake and they don't take any locks… this doesn't affect anyone!

Even better: all that compiler checking, of course, takes time – but you may easily turn TakenLockPriority into () and totally disable that compile-time checking during development, most of the time. Just run the full thingie on CI and it would tell you whether there are violations or not.

Too complex for my small brain

Posted Nov 11, 2024 16:04 UTC (Mon) by farnz (subscriber, #17727) [Link] (3 responses)

Ah - I thought you had a simple scheme that worked better than Fuschia's one, not just one that's designed to force much more painful refactoring than Fuschia's, while being less effective at catching bugs. My bad.

The message you link has a solution that's almost the same in practice as Fuschia's one, but that as well as requiring you to order your locks, also requires you to give them priority numbers, and forces refactoring if you change priority numbers, even if the lock ordering is unchanged. It also requires you to refactor your code "virally" in every case where Fuschia's trait-based solution requires refactoring.

I had thought you were referring to a better solution than the Fuschia one that you're so critical of, but never mind.

Too complex for my small brain

Posted Nov 11, 2024 16:24 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

> The message you link has a solution that's almost the same in practice as Fuschia's one, but that as well as requiring you to order your locks, also requires you to give them priority numbers, and forces refactoring if you change priority numbers, even if the lock ordering is unchanged.

Have you ever heard about enums? They solve that issue pretty nicely.

Even just putting all locks in one large enum would be easier than declare that you may never-ever, under penalty of death, swap priorities of two locks.

> It also requires you to refactor your code "virally" in every case where Fuschia's trait-based solution requires refactoring.

Nope. All functions in that approach always, unconditionally, receive ctx: PhantomData<&mut impl LockPriority> (like this), you never mention the actual types of locks that you are taking in the function signature, only when you are taking them.

And, the most important, in practice: you don't need to specify types of locks that are called from callbacks used from your code!

If callback uses networking lock – your function, transparently, protects you from the deadlocks in networking code, if callback uses storage locks – your functions, transparently, protects your from the deadlocks in storage code.

You don't even need to do anything extra for that. No dances with traits, no parallel implementations or associated types… the rule is trivial: if you use locks – you accept ctx: PhantomData<&mut impl LockPriority> and pass it around… that's it.

If you swap priorities of two locks you need to swap them in the code, and in one more place: in the large enum where all locks are declared.

And if locks are separated in groups (like you are advocating for the netstack3 scheme, anyway) then it's enough to keep them in separate enums.

Too complex for my small brain

Posted Nov 11, 2024 16:28 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

Yet again, you're claiming that if I use your solution and tell it that I'm taking lock priority 3, then changing the meaning of "lock priority 3" doesn't force a refactor. This feels like an obvious falsehood.

In Fuschia's solution, you have a partial ordering of locks via LockBefore and LockAfter, and it does not matter if I rearrange as long as the rearrangement does not affect deadlock risk; your solution acts the same as IRQL in Windows NT, and requires me to also arrange that I never accidentally reuse lock priorities between two different locks.

Too complex for my small brain

Posted Nov 11, 2024 17:51 UTC (Mon) by khim (subscriber, #9252) [Link]

> This feels like an obvious falsehood.

Well, yes and no.

> Yet again, you're claiming that if I use your solution and tell it that I'm taking lock priority 3, then changing the meaning of "lock priority 3" doesn't force a refactor.

No, I'm claiming that that PREVIOUS_PRIORITY >= PRIORITY is unimportant implementation detail and you can put arbitrarily complex logic there. Heck, you may even encode all these LockAfter relationships there. Just simply encode all these types and their relationships in sequence of u8 bytes and implement virtual machine that can run arbitrary code in your const expression!

Whether that's feasible is another story, but that's certainly possible. const expressions are Turing-complete in Rust – on purpose.

Practically speaking you can just call the procmacro which can do whatever it wants to prepare comparison function. Including looking the answers in the SQL table on the other server! Again: not advisable, but possible.

Our whole discussion is like these famous clients that spent the entire meeting griping about the graphical appearance of the screen.

Your gripe is not about something that is hard to change, but about something that can be trivially change. It's not even part of my “scheme”, in real code I needed much complicated relations because partial order was not suitable.

> In Fuschia's solution, you have a partial ordering of locks via LockBefore and LockAfter,

Yes. And I couldn't implement anything else.

> and it does not matter if I rearrange as long as the rearrangement does not affect deadlock risk

IOW: netstack3 solution guarantees that my car doesn't ever go over speed limit by removing wheels from it. Sure, car without wheels would never go over speed limit… but how is it useful?

Why I would need automatic deadlock prevention for refactorings that are always correct, by construction? These are trivial.

The refactorings where I need help most are refactorings that may introduce deadlocks if they are not performed correctly – and these are precisely the refactorings which are extremely difficult and painful in netstack3 approach.

I guess making these refactorings, that are already complicated even more complicated… works for deadlock prevension – but approximately as well as removal of wheels from the car helps it to never pass the speed limit.

> your solution acts the same as IRQL in Windows NT, and requires me to also arrange that I never accidentally reuse lock priorities between two different locks.

Which is trivial if you only have 77 locks in one component (like netstack3 have). And I guarantee that this “car without wheels” would never be able to handle enough locks for the trivial solution of putting them all in one enum wouldn't be enough.

Too complex for my small brain

Posted Nov 12, 2024 13:50 UTC (Tue) by paulj (subscriber, #341) [Link]

Makes sense. And not even Rust specific in concept. Transfers to anything with constexprs. You could implement something like this in C even I think. Nice :)

Too complex for my small brain

Posted Nov 6, 2024 14:31 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

> TBH this code looks pretty much write-only to me.

Why? This looks pretty easy to understand to me. A bit ugly, yes, both C++ and Rust syntax related to generics and templates is ugly, but quite simple to understand.

> Also, the almost complete lack of comments doesn't help it.

Type definitions are comments and they can be read by human as well as the compiler.

> Context: I develop a lot of code written in Rust

Context: I haven't wrote a lot of Rust code but at my $DAYJOB work with TMP is my bread and butter.

From where I'm sitting Rust generics look like a very limited tool compared to C++ templates (reflection in C++26 is very nice and even limited reflection in C++17 is much easier to use than pile of Rust macros), but given the fact that literally everything else is easier to use in Rust… I can live with them.

Too complex for my small brain

Posted Nov 6, 2024 18:54 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (1 responses)

Rust generics are limited, in my understanding, because they don't want people to start using TMP to do things that are better accomplished with proc macros (and also for other reasons like "overload resolution is difficult to reason about"). Unfortunately, it is possible to do many (not all) of those things with nested/recursive macro_rules! instead of proc macros, and that's... probably not quite as bad as TMP, but it gives TMP a run for its money.

Definitions for people who don't Rust/C++:

* TMP: Template meta-programming, taking advantage of the C++ template system's overload resolution and backtracking to write arbitrarily complicated compile-time logic.
* Proc macros: Procedural macros, or macros that work by taking a stream of tokens, running it through some normal-ish Rust code at compile time, and feeding the result directly to the compiler.
* macro_rules!: Macros by example, or macro definitions that vaguely resemble C preprocessor macros (but with namespacing, greater flexibility, and stronger guardrails against doing things that don't make sense).
* Nested/recursive macro_rules!: Macro definitions that expand into calls to themselves and/or into additional macro definitions (which are subsequently used by the outer macro to do branching and the like). Usually involves the outer macro matching :tt and passing it along to the inner calls for further dissection.

Too complex for my small brain

Posted Nov 6, 2024 21:42 UTC (Wed) by khim (subscriber, #9252) [Link]

> Rust generics are limited, in my understanding, because they don't want people to start using TMP to do things that are better accomplished with proc macros

They are limited by the desire to get better diagnosis in the case of error. It's the same story as with statically typed language and dynamically typed languages: it's easier to write code in a dynamically typed languages but harder to debug it.

Metaprogramming template language in C++ is dynamically typed with optional static annotations (close to python in spirit if not in syntax).

Metaprogramming generics language in Rust is statically typed because of the best intentions: to make sure error messages would be readable. Unfortunately we all know where this road leads: error messages are kinda-sorta understandable if you can easily express what you need in an additive form, but if you couldn't… ho, boy that's where the fun begins.

Thankfully when we are talking about keeping invariants unbroken you don't really need that machinery and can safely ignore it. Instead of working with types in Rust, you can just go and use constant verifiers (like in my example above). It's much easier to encode “business logic” in const expressions because they don't need to be additive and you have plenty of normal logical operations there.

That's not a substitute for what one can do in C++ (because in C++ you can go back to types after working for a bit with constants), but it's more than enough to handle these “invariant verification” cases where you only need one of two ourcomes: code compiles (and, hopefully, works) or code doesn't compile (and you go and fix it).

That's what tilted the decision for me: 99% of time I need complex metaprogramming simply to reject the bad code, not to actually play type tetris. Verification with const is enough for that.

Uncheckable by the compiler does not imply `unsafe`

Posted Nov 6, 2024 6:35 UTC (Wed) by jbills (subscriber, #161176) [Link] (4 responses)

> Send is unsafe not because it is inherently dangerous, but just because it represents a property that the compiler cannot check.

This is not entirely true. `Send` and `Sync` are unsafe because incorrectly implementing it can allow you to violate a safety invariant, namely the mutable xor shared property. This cannot be checked by the compiler, but there are other uncheckable properties that aren't safety violations. Probably the most common example in Rust would be the `Ord` trait, where the properties that make it not a partial order cannot be guaranteed by the compiler. This in turn means that unsafe code cannot rely on those properties to ensure their safety. In general, traits in Rust can be used to mark unprovable properties, and unsafe traits can be used to mark unprovable safety properties.

This is something that takes newcomers to Rust a little bit to ingest, as there is often a temptation to mark safe functions unsafe in order to add a warning label to it rather than to mark a genuine safety issue.

Uncheckable by the compiler does not imply `unsafe`

Posted Nov 6, 2024 7:31 UTC (Wed) by emk (subscriber, #1128) [Link] (3 responses)

There are interesting philosophical questions about "safe" in kernel space. For example, assume an API allows you to safely talk to a hardware subsystem that can be used to write to arbitrary memory.

A userspace example of this is file I/O to /proc, which potentially allows you to write to the memory of a running process. The file I/O itself is safe, but it can come around through the back door and do unsafe things.

But it's silly to make file I/O as unsafe, even if it can theoretically be used to trigger undefined memory behavior.

So how do you mark functions that communicate with a programmable MMU?

Uncheckable by the compiler does not imply `unsafe`

Posted Nov 6, 2024 8:08 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (2 responses)

In userspace, the process is the default or presumed safety boundary. You can cause all kinds of bizarre misbehavior by talking to the OS, or by having other processes ptrace you, or whatnot, but that is not Rust's problem because those things are beyond the scope of your process, and Rust cannot be reasonably expected to protect you from the entire system.

In kernelspace, it's significantly harder. I would suggest something along these lines:

* If the system is secure boot enabled, there is no boundary. Everything on the system is within the scope of Rust's safety guarantee. This is because, in such a setup, you really are expected to prevent all (kernelspace) UB under all circumstances, no matter what userspace tries to do, so there is no natural point where we can draw a line and say "it is unreasonable to expect Rust to protect against this."
* If the system is not secure boot enabled, then the boundary is whatever subset of userspace has the necessary privileges to subvert the kernel (write to /proc/kcore, load kernel modules, etc.). You don't have to prevent that from happening, so Rust's safety guarantees should not extend to interfaces that are exposed to such privileged userspace processes. But even then, those guarantees do still extend to all other userspace processes, suggesting that there would need to be some kind of type-state API for distinguishing between privileged and unprivileged userspace processes (in the hypothetical where the whole kernel is Rust).

In either case, "safety" specifically means that the kernel must not perform UB. It does not impose any particular requirements on things that happen in userspace. If some userspace process corrupts its heap and crashes, and then the kernel properly cleans up the dead process, then the kernel has not performed UB, so Rust's safety rules have not been violated with respect to the kernel.

As for your MMU, note that Rust already considers all volatile reads and writes to be unsafe, because you can only do them through raw pointers, and raw pointer reads and writes are unsafe. I must admit that I have never written code that talks to an MMU before, but I find it hard to believe you can do it without some degree of raw pointer manipulation (regardless of whether it needs to be volatile or not), which will be unsafe if you want to dereference anything.

Uncheckable by the compiler does not imply `unsafe`

Posted Nov 6, 2024 8:57 UTC (Wed) by znix (subscriber, #159961) [Link] (1 responses)

What makes an MMU unique though is that you can't easily wrap it with a memory-safe API like you can with most other peripherals.

If you're setting up a network adapter for example, you can do what you want to any register (save for those relating to DMA) and it won't break memory safety. But for an MMU, if you map the wrong physical page to the wrong process it can clobber the kernel's memory. I can't really see how you can make any interface smaller than more-or-less the entire virtual memory manager safe.

So you have a function to map an arbitrary page into a process's address space. Calling the function wrong will never cause UB, nor will it cause the kernel itself to corrupt memory - so it's safe? But it could allow the process you're mapping it into to clobber the kernel's memory if you asked it to map the wrong page, so it's unsafe? I think this is the philosophical question emk was asking about.

My suggestion is that this is one of the rare cases where safe vs unsafe isn't a particularly useful distinction, since the rust code itself being memory safe isn't any more important than the overall correctness of the code.

Uncheckable by the compiler does not imply `unsafe`

Posted Nov 6, 2024 12:00 UTC (Wed) by matthias (subscriber, #94967) [Link]

> I can't really see how you can make any interface smaller than more-or-less the entire virtual memory manager safe.

This should be the same reason why it is unsafe to implement an allocator in userspace. If the allocater returns an incorrect address, then all bets are off. And of course, the kernel memory allocator has to be correct. But this is just one component of the virtual memory manager.

> So you have a function to map an arbitrary page into a process's address space.

Why should this function take arbitrary pages. You can have different types for pages allocated for userspace and kernelspace. And even for different purposes in kernelspace. And then it boils down to the "X-safe" feature. Do you encode the difference between user pages and kernel pages in the type system or not?

And of course there is no single correct answer to the question which (safety) properties should be encoded in the type system. There has to be a balance between complexity (of the encoding in the typesystem) and safety. And I think that this balance is much harder to find in kernel space. But also in userspace this is nowhere near trivial. There is the agrred border of unsound (undefined) behavior. But there are definitely people that want to avoid deadlocks or memory leaks also in userspace.

Real Rust type system usefulness

Posted Nov 6, 2024 9:37 UTC (Wed) by dottedmag (subscriber, #18590) [Link] (14 responses)

Thank you. Finally it clicked for me what type systems like one employed by Rust can bring to the code that is hard to express in other languages.

It would be nicer if constraints encoding was first-class language construct and not a side-effect of a type system, but in absence of such a faculty, an expressive type system is a passable substitute.

Real Rust type system usefulness

Posted Nov 6, 2024 12:20 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

> It would be nicer if constraints encoding was first-class language construct and not a side-effect of a type system,

Using the type system has one big advantage. You can encode arbitrary constraints. The constraints are spelled out in the documentation and are enforced by the module that implements the type. The enforcing is done by ensuring that only valid values can be constructed.

If you add first-class features, then you have a syntax for describing constraints. But this syntax limits the nature of constraints that can be encoded. Such a syntax would still be nice for types of constraints that are commonly used. But the type system has to be there to allow to encode everything that is not covered by the syntax for constraint specification.

Rust already allows to specify constraints directly in the language. If I write (bool,bool), then rust enforces that the two bytes of memory may only contain values 0 and 1. Specifying tuples, structs, and enums naturally gives constraints on the represented values. It just happens that until now, nobody has added the feature of defining lock orders as a first-class feature of the language. One could add such a feature, but then the next person comes and wants to encode special properties for MMU handling in the type system ;) So there will always be constraints that one wants to express, but which are not first-class features. Which types of contraints should be first-class features is of course open for debate.

Real Rust type system usefulness

Posted Nov 6, 2024 12:36 UTC (Wed) by daroc (editor, #160859) [Link] (6 responses)

There are some systems that let you write arbitrary pre- and post- conditions for functions. They don't suffer from the problem you're pointing out, because you can write your own custom constraints.

However, the line between systems like this and type systems can actually be a bit blurry. Languages with full dependent types, such as Idris, incorporate those constraints directly into the type system. Other languages (Ada SPARK, I think?) keep them separate.

Ultimately, like many other problems in programming language design, how you integrate features for making programs more correct into the rest of the language is a complex tradeoff between consistency, explainability, and what people expect from the language. Whether integrating these things into Rust's type system is ultimately the best approach for the language or not, I cannot say.

Real Rust type system usefulness

Posted Nov 6, 2024 16:37 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

There's clearly a spectrum between writing a function call which works max(4, 10) and nonsense like maxxx("Lots", 10, 1.6)

Almost any language will reject attempts to call a function which simply doesn't exist, many at compile time.

Fewer but still most will reject a call with the wrong number of parameters

Fewer again, but still a great many, have type constraints, "Lots" is a string, that's probably not OK

And then some will observe finer distinctions, "Lots" is not only a string, it's not even a string like "15" which can be coerced into a number, it's something else. This is rarely something available at compile time and when it is available it's usually magic, not general purpose.

Rust's compiler, faced with a call to std::str::from_utf8([/* Some values */]) for example, will cheerfully look at a constant array you're passing, decide whether it's not actually UTF-8 encoded text, and warn (which can be suppressed if appropriate) that as a result this call always errors. But that's magic, if we make our own type with different rules the same doesn't happen.

Shifting left is valuable. Finding out that C++ operator precedence means your logging doesn't have the desired effect... by reading the useless results in a customer bug report is bad. Finding out when the CI testing the potential new release builds fails six hours after you wrote the code is better but still inconvenient. Finding out five minutes after writing the code that now the unit test cases broke locally is enormously better but not perfect. Finding out because a red squiggly line appeared in the editor when you wrote the logging code is extremely good. Only "It just fucking works" is finally the ideal and unlike the other options that's a language design change and doesn't help with your next bug.

Real Rust type system usefulness

Posted Nov 6, 2024 22:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> There are some systems that let you write arbitrary pre- and post- conditions for functions.

But they typically are enforced only at runtime, with limited compile-time support.

A good example is Eiffel language.

Real Rust type system usefulness

Posted Nov 6, 2024 22:51 UTC (Wed) by daroc (editor, #160859) [Link] (3 responses)

Certainly, there are systems that enforce pre- and post- conditions at run time. But they're not so overwhelmingly common that I would call them "typical". Idris [1], LiquidHaskell [2], and Ada SPARK [3] all check their conditions at compile time.

[1]: https://www.idris-lang.org/
[2]: https://ucsd-progsys.github.io/liquidhaskell/
[3]: https://www.adacore.com/about-spark

Real Rust type system usefulness

Posted Nov 6, 2024 23:06 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

I'm not familiar with Idris and Haskell, but SPARK can't represent this kind of invariants at compile time.

It has something similar ("Type Invariant") that can be checked at compile time, but it's limited with the type itself. It can't express an invariant "there exist a path between this type and the root of the lock hierarchy".

Real Rust type system usefulness

Posted Nov 6, 2024 23:19 UTC (Wed) by daroc (editor, #160859) [Link]

Ah, alright. I haven't used SPARK, so I must have misunderstood. I thought that SPARK could put function preconditions through a SAT solver at compile time.

I have used the other two, though, and can confirm that they operate entirely at compile time.

Real Rust type system usefulness

Posted Nov 11, 2024 16:30 UTC (Mon) by zmower (subscriber, #3005) [Link]

I don't see why you would need more than the type itself as Ada does not have multiple inheritance. There's always a path to the root!

Real Rust type system usefulness

Posted Nov 6, 2024 12:40 UTC (Wed) by LtWorf (subscriber, #124958) [Link]

Ada exists, but I think it's not very used these days.

Real Rust type system usefulness

Posted Nov 6, 2024 16:36 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (4 responses)

There has been talk of first-class pattern types, which would be a significant step in this direction: https://github.com/rust-lang/keyword-generics-initiative/...

I have no idea if it's still being worked on or not, but there's no obvious indication that it has been abandoned.

Pattern Types.

Posted Nov 6, 2024 22:31 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Some day I want Pattern types so that I can make BalancedI8, BalancedI16 and so on which are types exactly like the signed integer types i8, i16 etc. but without their lop-sided most negative value.

Without this value, these types have a niche, meaning Option<BalancedI8> is a single byte, Option<BalancedI32> is only four bytes, and so on. Pattern Types unlock niches for ordinary stable Rust by third parties.

This delivers the same functionality as making these most negative values sentinels, but with the much improved ergonomics of Option. I think there are a lot of variables in software for which -10, +10 and crucially *zero* (which would be forbidden in the similarly sized but stdlib-blessed NonZeroI8 type) are reasonable values but -128 is not.

Today Rust's niches can be made at will by its own standard library (hence NonZeroI8 for example, but also OwnedFd and numerous others) but for us lay folk writing our own software this feature is permanently unstable compiler only magic (you can of course invoke this magic at your own risk, that's what my unstable "nook" crate does, but it's specifically marked as never-to-be-stabilised). The only lay person accessible new niches are from the sum types (enum). If I make an enum with say two hundred variants that contain no data, Rust will use two hundred of the 256 bit patterns for a byte to represent that enum and, it will mark all the remaining 56 patterns as a niche.

This is how the current era of CompactString works. The type has 24 bytes, the last byte is an enum, with 128 values representing ASCII characters, then 64 values representing non-ASCII last bytes of valid 24 byte UTF-8 strings, then lengths zero through to 23 inclusive signifying that the string didn't use this last byte because it was shorter, then one value signifying the strings was too big and lives on the heap instead and finally one more for static text. This leaves a large niche, so Option<CompactString> is a 24 byte data structure which can be None, or Some(string) where the string only needs a heap allocation for text longer than 24 UTF-8 code units. Fancy.

Real Rust type system usefulness

Posted Nov 8, 2024 0:42 UTC (Fri) by aszs (subscriber, #50252) [Link] (2 responses)

From lurking in the rust-lang Zulip it appears that it is progressing but it's a pretty invasive change to the type system so it's blocked on completion of the new Chalk solver and will thus take a while. To tie this into earlier comments, Rust patterns are essentially liquid types like liquid Haskell but with the patterns constrained to set operations to avoid having to require a SAT solver -- there's not much appetite to make Rust compile times NP Complete!

Real Rust type system usefulness

Posted Nov 8, 2024 16:53 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> there's not much appetite to make Rust compile times NP Complete!

Um. Rust compile times are already NP-hard… would promotion to NP-competeness do that much difference?

Real Rust type system usefulness

Posted Nov 11, 2024 17:22 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

If you want to start nitpicking, that blog post does not describe an instance of NP-hardness. It describes an instance of co-NP-hardness (i.e. you can only quickly verify that the compiler correctly checked for pattern refutability if the compilation fails and gives you an error identifying the refutation, so the problem being solved is the complement of SAT rather than SAT itself), which is not provably equivalent to NP-hardness (and is generally believed to be non-equivalent).

But anyway, I think the broader objection to this argument looks roughly like the following:

* It is not typical to match patterns of the size and complexity of the patterns shown in that blog post. They are a pathological edge case that nobody uses in practice.
* Refutable patterns (if let, match with a _ case, etc.) only need to be checked at runtime (and for structural correctness), which is also fairly straightforward unification logic.
* If you want to start allowing refutable pattern matching at compile time, things get significantly harder, because now every type has to carry around information about which patterns it is known to match, and you have to use that information to more precisely infer which other patterns are refutable, or else it's not doing much good (what is the point of knowing that my type matches 1..5 if I can't write separate match arms for 1, 2, 3..5 without a catchall _ arm?). That means you have to match patterns against each other, rather than merely matching objects or (simple/non-pattern) types against patterns. The unification logic gets a lot more complicated in that case, and it is far more plausible that it has an accidental complexity explosion in some edge case that users might really run into.