Insulating layer?

Posted Oct 16, 2024 7:44 UTC (Wed) by taladar (subscriber, #68407)
In reply to: Insulating layer? by paulj
Parent article: On Rust in enterprise kernels

As long as you are the only one working on your code and you don't make mistakes, you might be able to rely on your assumptions. But I think the strength of Rust (and similar strict languages) is that, barring compiler bugs, they can guarantee that all programmers working on your project, including you on a bad day when you do make mistakes, will follow the rules.

Insulating layer?

Posted Oct 17, 2024 9:19 UTC (Thu) by paulj (subscriber, #341)

Yes, I get that. Rust can enforce rules, and that's great.

My point is more that my own system of ad-hoc, non-compiler-enforced rules for managing the lifetimes of objects and their references is simpler than Rust's. In my system I basically try to have just 2 scopes for references - the very local, and then the refcounted. The latter are "safe" from use-after-free issues (I don't usually use refcounting machinery that deals with concurrency, but you could).
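For comparison, here is a minimal Rust sketch of that two-scope discipline, assuming "very local" maps to a plain borrow (`&T`) and "refcounted" maps to `std::rc::Rc<T>` (`std::sync::Arc<T>` would be the thread-safe variant). The `Config`, `Registry` and `describe` names are hypothetical.

```rust
use std::rc::Rc;

// Hypothetical object type shared between several owners.
struct Config {
    name: String,
}

// "Very local" reference: the borrow cannot outlive this call.
fn describe(cfg: &Config) -> usize {
    cfg.name.len()
}

// A long-lived holder keeps refcounted handles instead of borrows.
struct Registry {
    configs: Vec<Rc<Config>>,
}

fn main() {
    let cfg = Rc::new(Config { name: String::from("example") });
    let mut reg = Registry { configs: Vec::new() };
    reg.configs.push(Rc::clone(&cfg)); // refcount is now 2

    // A local borrow taken through the refcounted handle (deref coercion).
    let len = describe(&cfg);
    println!("{} {}", len, Rc::strong_count(&cfg));

    drop(cfg); // the registry keeps the object alive
    println!("{}", Rc::strong_count(&reg.configs[0]));
}
```

Only the `Rc` handles participate in runtime counting; the `&Config` borrow is checked at compile time and costs nothing at runtime.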

Rust goes further than that, and provides for arbitrary lifetime scopes for references to different objects. Great, a lot more powerful. But... it also becomes harder to reason about when you start to make use of that power, it seems to me. I can't keep track of more than a couple of scopes - that's why my ad-hoc safeguards in lesser languages are so simple, and I think it's part of why I struggle to get my head around Rust's error messages when I try to write more interlinked, complex data representations in my own toy test/learn coding attempts. The fact that a lot of code ends up going back to refcounted containers suggests I might not be alone, though I'm not sure.

What I'm saying is that simple programmers like me might be better off with a "safe" (i.e. enforcing the rules) language with a more constrained, simpler, object lifetime management philosophy.

Insulating layer?

Posted Oct 17, 2024 10:36 UTC (Thu) by Wol (subscriber, #4433)

This sounds a bit like alloca?

I wonder. Could you create a "head" object whose lifetime is the same as the function that created it, and then create a whole bunch of "body" objects that share the same lifetime?

These objects can now reference each other with far fewer restrictions, because the compiler knows they share the same lifetime. So there's no problem with the head owning the next link down the line etc., and each link having a reference to the link above, because they'll all pass out of scope together?

Of course that means you can't access that data structure outside the scope of the function that created it, but that would be enough for a lot of purposes?
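Current Rust can already express a limited form of this. If several stack values declared in the same function have no `Drop` impl, they can hold references to each other (even in a cycle) through `Cell`, because the compiler gives them one shared lifetime `'a`. A minimal sketch; the `Node` type, its fields and `link` are hypothetical.

```rust
use std::cell::Cell;

// Every link borrows other links for the same lifetime 'a.
// Node deliberately has no Drop impl: that is what lets the
// borrow checker accept a cycle between stack locals.
struct Node<'a> {
    name: &'static str,
    next: Cell<Option<&'a Node<'a>>>,
    prev: Cell<Option<&'a Node<'a>>>,
}

impl<'a> Node<'a> {
    fn new(name: &'static str) -> Self {
        Node { name, next: Cell::new(None), prev: Cell::new(None) }
    }
}

// Link two nodes in both directions; both must share lifetime 'a.
fn link<'a>(a: &'a Node<'a>, b: &'a Node<'a>) {
    a.next.set(Some(b));
    b.prev.set(Some(a));
}

fn main() {
    // "head" and "body" objects live in this stack frame...
    let head = Node::new("head");
    let body = Node::new("body");
    link(&head, &body);

    println!("{} -> {}", head.name, head.next.get().unwrap().name);
    println!("{} <- {}", body.prev.get().unwrap().name, body.name);
    // ...and all go out of scope together when main returns.
}
```

As the text says, none of these nodes can be returned from the function that created them.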

Cheers,
Wol

Insulating layer?

Posted Oct 17, 2024 11:45 UTC (Thu) by smurf (subscriber, #17840)

> that would be enough for a lot of purposes?

That would also be rather pointless for a lot of other purposes (among them: freeing a bunch of body objects; oops, you suddenly have scoping problems anyway), so what's the point?

Insulating layer?

Posted Oct 17, 2024 12:25 UTC (Thu) by Wol (subscriber, #4433)

> so what's the point?

It feels pretty simple to me - cf. paulj's comment that full-blown Rust just seems to blow his mind.

Does Rust actually have a memory allocator, or does it have a "new"? That was the point of my mention of alloca - as the caller goes out of scope, any memory allocated by alloca goes out of scope as well. You don't free alloca memory.

Could the Rust compiler handle "this memory is now out of scope, just forget about it"? If standard Rust rules say an object cannot own an object with a lifetime longer than itself, just dropping the object can't do any harm, can it?

You're effectively setting up a heap, with its lifetime controlled by "head". So you just drop the entire heap, because every object in the heap has had the heap lifetime constraints imposed on it.

Cheers,
Wol

Insulating layer?

Posted Oct 20, 2024 11:11 UTC (Sun) by ssokolow (guest, #94568)

> Does Rust actually have a memory allocator, or does it have a "new"?

That's the concern of the data type, not the language.

If you want a new, you give the struct at least one private member so it can't be initialized directly, and write a public associated function (i.e. a public class method) which constructs and returns an instance (it's named new by convention only). ...but that doesn't automatically mean heap allocation. It's purely a matter of whether there are "correct by construction" invariants that need to be enforced.
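A minimal sketch of that convention: the private field forces callers outside the module through `new`, which enforces an invariant, and nothing here touches the heap. `Percent` is a hypothetical type.

```rust
mod percent {
    // The field is private to this module, so code outside it
    // cannot write `Percent(250)` directly.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Percent(u8);

    impl Percent {
        // `new` is just an ordinary associated function; the name is a convention.
        pub fn new(value: u8) -> Option<Percent> {
            if value <= 100 {
                Some(Percent(value))
            } else {
                None
            }
        }

        pub fn get(self) -> u8 {
            self.0
        }
    }
}

use percent::Percent;

fn main() {
    let p = Percent::new(42).expect("in range");
    println!("{}", p.get()); // a plain stack value, no heap allocation
    println!("{:?}", Percent::new(250)); // rejected by the constructor
}
```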

If you want to heap-allocate, you either use a type which does it internally like Box<T> (std::unique_ptr in C++) or Vec<T> or you do as they do internally and use the unsafe wrappers around malloc/calloc/realloc/free in std::alloc.
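For illustration, a minimal sketch of the manual route through `std::alloc`: `Layout` describes size and alignment, and the unsafe `alloc`/`dealloc` pair plays the role of `malloc`/`free`. The helper function names are hypothetical; `Box::new` does the same job safely and frees on drop.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Allocate one u64 on the heap by hand, write it, read it back, free it.
fn manual_heap_roundtrip(value: u64) -> u64 {
    let layout = Layout::new::<u64>();
    unsafe {
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let read_back = ptr.read();
        dealloc(ptr as *mut u8, layout); // must pass the same layout
        read_back
    }
}

// The safe equivalent: Box allocates, and frees when it goes out of scope.
fn boxed_roundtrip(value: u64) -> u64 {
    let b = Box::new(value);
    *b
}

fn main() {
    println!("{}", manual_heap_roundtrip(7));
    println!("{}", boxed_roundtrip(7));
}
```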

> Could the Rust compiler handle "this memory is now out of scope, just forget about it"?

It does. If you want a destructor, you impl Drop. If you don't, a stack allocation is simply left alone until the whole frame is popped, and a heap allocation goes away when the owning stack object's Drop runs and frees the memory.
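A minimal sketch of the destructor side: a type opts in by implementing `Drop`, and a heap buffer owned through a `Box` field is freed when its owning stack value goes out of scope. The `Guard` type and the `DROPS` counter are hypothetical, used only to observe when `drop` runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Guard destructors have run (for demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Guard {
    // Heap allocation owned by this stack object.
    payload: Box<[u8; 1024]>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn make_and_drop() {
    let g = Guard { payload: Box::new([0u8; 1024]) };
    println!("payload length {}", g.payload.len());
    // `g` goes out of scope here: Guard::drop runs first,
    // then its fields are dropped, which frees the Box.
}

fn main() {
    make_and_drop();
    {
        let _inner = Guard { payload: Box::new([1u8; 1024]) };
    } // dropped at the end of this block, not at the end of main
    println!("destructors run: {}", DROPS.load(Ordering::SeqCst));
}
```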

Rust's design does a good job of making its memory management appear more sophisticated than it really is. It's really just stack allocation, access control, and destructors (but no actual language-level support for constructors), with RAII design patterns used to implement everything else in library code on top of manual calls to malloc/realloc/free.

The borrow checker plays no role in machine code generation beyond rejecting invalid programs, which is why things like mrustc can exist.

Insulating layer?

Posted Oct 17, 2024 14:07 UTC (Thu) by paulj (subscriber, #341)

Samba has "talloc" which supports having heap objects allocated as a hierarchy. Freeing an object frees all its children in the hierarchy.
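In Rust, plain ownership already gives a hierarchy of this kind: when a parent is dropped, every child it owns is dropped with it. A minimal sketch; the `TreeNode` type and the shared `freed` log are hypothetical, there only to make the destruction order visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Each node owns its children, so the ownership tree mirrors the allocation hierarchy.
struct TreeNode {
    name: String,
    children: Vec<TreeNode>,
    freed: Rc<RefCell<Vec<String>>>, // shared log of destroyed nodes
}

impl TreeNode {
    fn new(name: &str, freed: &Rc<RefCell<Vec<String>>>) -> Self {
        TreeNode { name: name.to_string(), children: Vec::new(), freed: Rc::clone(freed) }
    }
}

impl Drop for TreeNode {
    fn drop(&mut self) {
        // Runs before this node's fields (including its children) are dropped.
        self.freed.borrow_mut().push(self.name.clone());
    }
}

fn build_and_free() -> Vec<String> {
    let freed = Rc::new(RefCell::new(Vec::new()));
    {
        let mut root = TreeNode::new("root", &freed);
        let mut child = TreeNode::new("child", &freed);
        child.children.push(TreeNode::new("grandchild", &freed));
        root.children.push(child);
    } // dropping `root` frees the whole subtree
    freed.take()
}

fn main() {
    println!("{:?}", build_and_free());
}
```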

You could take that kind of tack and allocate stuff on the stack with alloca, sure.

Neither hierarchical allocation nor stack allocation addresses the issue of tracking the validity of references as such. As you're implying, that requires something else: either an ad-hoc system of rules that enforce guarantees, assuming the programmers can hold themselves to applying those rules (and... they will fail to every now and then), or rules in the language, enforced by the compiler.

The question for me is: What is the most programmer friendly system of rules to guarantee safety?

Rust is one example of that, with a very general lifetime typing system (the most general possible?). That generality brings complexity. Yet, many Rust programmes have to step out of that compile-time lifetime-type system and use runtime ref-counting. So then the question is, if the lesson from Rust is that many many programmes simply go to runtime ref-counting for much of their scoping, would it be possible to just have a less general, simpler lifetime-typing system?

E.g., perhaps it isn't necessary to represent lifetime types at all. Perhaps it would be sufficient to have 2 kinds of references - local and refcounted - with well-defined and safe conversion semantics enforced by the language.
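Today's Rust can approximate that two-kind model, assuming `Rc<T>` stands in for the refcounted kind and `&T` for the local kind; the conversions in both directions are safe and compiler-checked, although lifetimes are still there, merely elided in the signatures. The helper names are hypothetical.

```rust
use std::rc::Rc;

// Local -> refcounted: a borrow cannot be "upgraded" in place,
// so the data is copied into a fresh refcounted allocation.
fn to_shared(local: &str) -> Rc<str> {
    Rc::from(local)
}

// Refcounted -> local: borrowing through the handle is free, and the
// compiler ensures the borrow does not outlive the handle it came from.
fn first_word(shared: &Rc<str>) -> &str {
    shared.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello refcounted world");
    let shared = to_shared(&text);
    let extra = Rc::clone(&shared); // a second owner, refcount 2
    println!("{} {}", first_word(&extra), Rc::strong_count(&shared));
}
```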

Insulating layer?

Posted Oct 17, 2024 16:00 UTC (Thu) by khim (subscriber, #9252)

> E.g., perhaps it isn't necessary to represent lifetime types at all. Perhaps it would be sufficient to have 2 kinds of references - local and refcounted - with well-defined and safe conversion semantics enforced by the language.

That's where Rust started, more or less. The language which used this approach, Cyclone, is listed among the many languages that “influenced” Rust.

Only it's not practical: the language you get as a result doesn't resemble C at all! Or, more precisely: the language would look like C, but its APIs would be entirely different. Even the most trivial functions like strchr are designed around the ability to pass a reference to a local variable somewhere. It wouldn't even be possible to pass a buffer that you want filled with values to another function!
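Both patterns need a reference whose validity depends on the caller's data, which is what a lifetime parameter expresses. A minimal Rust sketch of a `strchr`-like search and of a caller-provided buffer being filled by a callee; the function names are hypothetical.

```rust
// Like C's strchr: the result points into `haystack`, so its lifetime is tied to it.
fn find_char<'a>(haystack: &'a str, needle: char) -> Option<&'a str> {
    haystack.find(needle).map(|i| &haystack[i..])
}

// The caller owns the buffer (here it lives on the caller's stack);
// the callee only borrows it mutably for the duration of the call.
fn fill_squares(buf: &mut [u32]) {
    for (i, slot) in buf.iter_mut().enumerate() {
        *slot = (i as u32) * (i as u32);
    }
}

fn main() {
    let line = String::from("key=value");
    if let Some(rest) = find_char(&line, '=') {
        println!("{}", rest); // still borrowing `line`
    }

    let mut local = [0u32; 5]; // a local array passed down to be filled
    fill_squares(&mut local);
    println!("{:?}", local);
}
```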

I'm not entirely sure all the complexity that Rust has (with covariance and contravariance, reborrows and so on) is needed… technically – but it's 100% wanted. Until Rust 2018 introduced non-lexical lifetimes, Rust was famous not for its safety, but for its needless strictness. I think almost every article from that era included a mandatory part telling you how your “fight with the borrow checker” is not in vain, how it enables safety, etc.
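A small example of what non-lexical lifetimes (NLL) changed, offered as a sketch: with purely lexical lifetimes, the borrow `first` counted as live until the end of its enclosing scope, so the later `push` was rejected. Under NLL the borrow ends after its last use, and current compilers accept the function. `push_after_peek` is a hypothetical name.

```rust
// Panics if `v` is empty (indexing v[0]).
fn push_after_peek(mut v: Vec<i32>) -> (i32, Vec<i32>) {
    let first = &v[0]; // shared borrow of `v`
    let seen = *first; // last use of the borrow
    v.push(seen + 1); // mutable use: accepted under NLL
    (seen, v)
}

fn main() {
    let (seen, v) = push_after_peek(vec![10, 20, 30]);
    println!("{} {:?}", seen, v);
}
```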

After the introduction of NLL, reborrows, HRTBs and so on, people stopped complaining about that… and started complaining about the complexity of the whole thing… but it's highly unlikely that people would accept anything less flexible: they want to write code, not fight with a borrow checker!
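For readers unfamiliar with the acronym: a higher-ranked trait bound (HRTB) says a closure must work for every lifetime, not for one particular lifetime chosen by the caller. A minimal sketch; `apply_to_local` is a hypothetical helper.

```rust
// `for<'a>` means: f must accept a borrow of any lifetime and return a
// borrow of that same lifetime. That is needed here because the String
// is created inside this function, so no caller-chosen lifetime could name it.
fn apply_to_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let local = String::from("  trimmed  ");
    f(&local).len()
}

fn main() {
    let n = apply_to_local(|s| s.trim());
    println!("{}", n);
}
```

The shorthand `Fn(&str) -> &str` desugars to the same higher-ranked bound; writing `for<'a>` out just makes it visible.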

> So then the question is, if the lesson from Rust is that many many programmes simply go to runtime ref-counting for much of their scoping, would it be possible to just have a less general, simpler lifetime-typing system?

Sure. Swift does that.

It has quite a significant performance penalty, but is still much faster than many other popular languages.

Insulating layer?

Posted Oct 17, 2024 17:10 UTC (Thu) by Wol (subscriber, #4433)

> Neither hierarchical allocation nor stack allocation addresses the issue of tracking the validity of references as such. As you're implying, that requires something else: either an ad-hoc system of rules that enforce guarantees, assuming the programmers can hold themselves to applying those rules (and... they will fail to every now and then), or rules in the language, enforced by the compiler.

But does it *have* to?

If you have a procedure-level heap, or some other data structure with a guaranteed lifetime, you apply exactly the same borrow-rules but on the heap level. If all references to the heap have the same or shorter lifetime than the heap, when the heap dies so will the references.

So you create a heap for your linked list as early as possible, and you can freely create references WITHIN that heap as much as you like. They'll die with the heap. It's not meant to be an all-singing-all-dancing structure, it's meant to have a couple of simple rules that make it easy to create moderately complex data structures. You probably wouldn't be able to store pointers in it that pointed outside of it, for example. You would have to be careful storing references to items with shorter lifetimes. But if you want to store something with loads of complex *internal* references, it would be fine precisely because of the "everything dies together" rule.

So you'd use the existing rust checking setup, just that it's a lot easier for you as the programmer to keep track of, precisely because "it's an internal self reference, I don't need to care about it".
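A common way to get this "everything dies together" behaviour in today's Rust, without new compiler support, is an arena that owns all nodes in one `Vec` and hands out indices instead of references. The sketch below is an illustration under that assumption (the `ListArena` type and its methods are hypothetical); arena crates also exist that hand out real references instead of indices.

```rust
// Handle into the arena: just an index, meaningful only for this arena.
type NodeId = usize;

struct ListNode {
    value: i32,
    prev: Option<NodeId>,
    next: Option<NodeId>,
}

// All nodes live in one Vec; dropping the arena frees them all at once,
// regardless of how many prev/next links point between them.
struct ListArena {
    nodes: Vec<ListNode>,
    head: Option<NodeId>,
    tail: Option<NodeId>,
}

impl ListArena {
    fn new() -> Self {
        ListArena { nodes: Vec::new(), head: None, tail: None }
    }

    fn push_back(&mut self, value: i32) -> NodeId {
        let id = self.nodes.len();
        self.nodes.push(ListNode { value, prev: self.tail, next: None });
        match self.tail {
            Some(t) => self.nodes[t].next = Some(id),
            None => self.head = Some(id),
        }
        self.tail = Some(id);
        id
    }

    // Walk forward from the head.
    fn values(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(id) = cur {
            out.push(self.nodes[id].value);
            cur = self.nodes[id].next;
        }
        out
    }

    // Walk backward from the tail using the back-links.
    fn values_rev(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut cur = self.tail;
        while let Some(id) = cur {
            out.push(self.nodes[id].value);
            cur = self.nodes[id].prev;
        }
        out
    }
}

fn main() {
    let mut list = ListArena::new();
    for v in 1..=3 {
        list.push_back(v);
    }
    println!("{:?} {:?}", list.values(), list.values_rev());
} // `list` and every node in it are freed here
```

The trade-off is that an index is not checked by the borrow checker: using an index from one arena in another arena compiles fine.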

Cheers,
Wol

Insulating layer?

Posted Oct 17, 2024 17:27 UTC (Thu) by daroc (editor, #160859)

You may be interested in Vale, a programming language that is trying a bunch of new things around lifetime-based memory safety, including compiler support for "regions", which work almost exactly how you describe. The project also has some interesting ideas about tradeoffs between different memory management strategies. I want to write an article about it at some point.

Insulating layer?

Posted Oct 18, 2024 16:33 UTC (Fri) by paulj (subscriber, #341)

That is very interesting. The generational references sound useful for performant weak references, although you cannot use them for very frequently allocated objects (you can't reuse the memory past <generation size>_MAX).
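The generation check can be sketched in a few lines, assuming a slot table where each slot carries a counter bumped on every free, and each handle remembers the generation it was issued with. This is a generic generational-index sketch, not Vale's actual implementation; the names are hypothetical, and the `u8` generation is deliberately tiny so the retire-at-MAX limit is easy to see.

```rust
// Handle = slot index + the generation it was created with.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u8,
}

struct Slot<T> {
    generation: u8,
    value: Option<T>,
}

struct GenArena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // slots that may be reused
}

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { slots: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    // A stale handle (older generation) simply fails the check.
    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation == h.generation { slot.value.as_ref() } else { None }
    }

    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation {
            return None;
        }
        let value = slot.value.take()?;
        if slot.generation < u8::MAX {
            slot.generation += 1;
            self.free.push(h.index);
        }
        // else: the counter is exhausted, so the slot is retired (never reused)
        Some(value)
    }
}

fn main() {
    let mut arena = GenArena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses a's slot with a new generation
    println!("{:?} {:?}", arena.get(a), arena.get(b));
}
```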

Insulating layer?

Posted Oct 18, 2024 14:25 UTC (Fri) by taladar (subscriber, #68407)

That is essentially just an arena allocator. It solves the problem of forgetting to free something but doesn't solve e.g. accessing something that should not be used anymore after a certain operation.
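The "not after a certain operation" case is what Rust's move semantics cover at compile time: a method that takes `self` by value consumes the object, so any later use is rejected. A minimal sketch; `Connection`, `send` and `close` are hypothetical names.

```rust
struct Connection {
    id: u32,
}

impl Connection {
    fn open(id: u32) -> Self {
        Connection { id }
    }

    fn send(&self, msg: &str) -> String {
        format!("conn {} sent {}", self.id, msg)
    }

    // Taking `self` by value consumes the connection: after `close`,
    // the caller no longer has anything to call `send` on.
    fn close(self) -> u32 {
        self.id
    }
}

fn main() {
    let c = Connection::open(7);
    println!("{}", c.send("hello"));
    let closed_id = c.close();
    // c.send("again"); // error[E0382]: borrow of moved value: `c`
    println!("closed {}", closed_id);
}
```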

Insulating layer?

Posted Oct 18, 2024 16:24 UTC (Fri) by Wol (subscriber, #4433)

> but doesn't solve e.g. accessing something that should not be used anymore after a certain operation.

And can't you just apply ordinary Rust rules to that? It's not intended as a way of escaping Rust's rules. It's just meant as a way of enabling the *programmer* to forget about a lot of the rules, on the basis that internal references, pointers and objects will all go invalid at exactly the same time. So if A points to B and B points to A and they're in this structure, you don't worry about cleanup because they both go poof and emit the magic smoke at the same time.

If A contains a pointer to C with a shorter (or longer) lifetime, Rust will need to check that A destroys the pointer as C goes out of scope, or alternatively that the entire heap cannot go out of scope until C is destroyed.

A simple structure for simple(ish) situations, and more complex structures for where simple doesn't work. And if those complex structures are hard to grasp, it helps if you've realised that the simple structures aren't up to the task (and why).

Cheers,
Wol

Rust has arena allocators

Posted Oct 18, 2024 16:35 UTC (Fri) by kleptog (subscriber, #1183)

There appear to be several arena allocators for Rust: https://manishearth.github.io/blog/2021/03/15/arenas-in-r...

The basic idea is that you have a function which allocates an arena and keeps it alive. Within the arena, objects can reference each other as much as they want, including cycles. They can also reference other objects, as long as those live longer than the arena. When your function exits, the arena is cleaned up in one go. It's incompatible with destructors (though there are tricks for that), but otherwise it looks like what you want.

I know them from PostgreSQL, where they have an arena per query, so interrupting the query safely throws away everything. There are plenty of use cases for them.

Insulating layer?

Posted Oct 18, 2024 14:23 UTC (Fri) by taladar (subscriber, #68407)

> many many programmes simply go to runtime ref-counting for much of their scoping

This is not my experience with Rust at all. Refcounting does happen occasionally, but the vast, vast majority of cases don't use refcounting at all in Rust. Usually only a few values are refcounted, even in large programs (logically speaking: some of them can exist in many copies, of course, but I mean the one value that is referred to by the same name in the code and passed along the same code paths).