Insulating layer?

Posted Oct 14, 2024 11:00 UTC (Mon) by paulj (subscriber, #341)
In reply to: Insulating layer? by farnz
Parent article: On Rust in enterprise kernels

I have tried playing with Rust. There's a lot to like. There's some stuff I dislike - the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

In the little I've played with Rust, the lifetime annotations make sense on simpler stuff. But it seems to get very baroque on more complex stuff. And I havn't gotten over a certain "barrier" on how to design the not-simpler kinds of inter-linked relationships that are unavoidable in real-world stuff. E.g. a network protocol, you may have objects and state for "peers", "connections", "TLS contexts", "Streams", "message entity" (plurality of these) with m:n relationships between them. A peer may have multiple connections, messages may have a lifetime related to a /set/ of peers and not just the peer it was received from, etc. You have to use RC for at least some of these - so now you're dealing with /both/ refcounting AND lifetime annotations for the refcounting objects (in other languages you would embed the RC information in the refcounted object, but in Rust the RC object holds the refcounted object), and there are rules for getting data into and out of RCs and cells and what not. And I end up with long screeds of compiler error messages that give me flashbacks to C++ - maybe Rust's are better, but I've been traumatised. The complexity of implementing simple linked lists in Rust, without unsafe, is a bit of a red flag to me.

I think the local stack variable lifetimes possibly could be dealt with in simpler ways, perhaps with more limited annotations, or perhaps none at all but just compiler analysis. It might be more limited than Rust, but be enough to deal with typical needs for automatic data in long-lived programmes.

What I'd like, for long-lived programmes with non-simple relationships in their data model (e.g., many network protocols) is a language that did ref-counting /really/ well - ergonomically, with low programming complexity, and minimal runtime overhead. Cause refcounting is how all non-trivial, long-lived programmes, that can not tolerate scannning GC, end-up managing state.

Maybe I'm just too stupid for Rust. Like I'm too stupid for Haskell. Maybe refcounting is the right answer for my lack of intelligence - but then it seems many other programmers are like me. And it seems maybe even the best Rust programmers too, given how they always end up reaching for (A)RC too.

Insulating layer?

Posted Oct 14, 2024 11:28 UTC (Mon) by mb (subscriber, #50428) [Link]

>the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language
>to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

Rust can do that, too. It's called proc-macro

Challenges learning how to model data structures in Rust

Posted Oct 14, 2024 12:26 UTC (Mon) by farnz (subscriber, #17727) [Link] (10 responses)

One of the challenges I found in moving from C++ to Rust is that in C++, you model ownership in your head; you sort-of know whether a given pointer represents owning something, shared ownership, or simply referring to something someone else owns. Rust doesn't let you play fast-and-loose with ownership like this; the borrow checker means that you have to be extremely clear about who owns what, and when, using T for "this T is embedded here", Box<T> for "this T is owned, but is stored out of line", and Arc/Rc<T> for "ownership of this T is shared".

You can also write a simple linked list in 40 lines of Rust without any unsafe at all - the use of unsafe is because you want a complicated linked list that meets more interesting properties than the simple linked list implementation does.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 9:09 UTC (Thu) by paulj (subscriber, #341) [Link] (9 responses)

I was referring more to "simple" doubly-linked lists on the complexity side.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 9:39 UTC (Thu) by farnz (subscriber, #17727) [Link] (8 responses)

Making a doubly-linked list in safe Rust isn't that much more complicated, although it does bring in the need for Rc, since a doubly-linked list has complex ownership (the previous pointer cannot be owning).

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 10:10 UTC (Thu) by paulj (subscriber, #341) [Link] (2 responses)

Yes, course you can do with with RC. But RC is a "fuck it, we'll do it live!" escape hatch to runtime, out from the compile-time lifetime typing system,

To stay within the typed lifetime-scope of Rust, without unsafe, is somewhere between extremely complex and impossible, isn't it? (I thought impossible, but then I thought I saw a tutorial claiming it was possible once - but I can't find it back right now).

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 14:23 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

The difficult bit is the ownership model if you don't use refcounting; you want each node to be individually mutable, but you also want each node to be shared by both the previous and the next nodes. That's inevitably difficult in a "shared, or mutable, but not both" model as Rust imposes, unless you use refcounting.

Challenges learning how to model data structures in Rust

Posted Oct 18, 2024 14:38 UTC (Fri) by taladar (subscriber, #68407) [Link]

Something would probably be possible with interior mutability but it likely would need locking or some other internal mechanism to prevent mutation from both ways to access it at the same time.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 11:48 UTC (Thu) by excors (subscriber, #95769) [Link] (4 responses)

That looks like a poor example - it doesn't implement any operations beyond "new" and "append", so it never even uses its "prev" pointers. The much more thorough exploration at https://rust-unofficial.github.io/too-many-lists/ calls a similar Rc<RefCell<...>> design "A Bad Safe Deque" and notes "It was a nightmare to implement, leaks implementation details, and doesn't support several fundamental operations". Eventually it gets to "A Production Unsafe Deque", which looks pretty complicated. Singly-linked lists are much easier (though still not trivial), the double-linking is the big problem.

(It also explains:

> Linked lists are terrible data structures. Now of course there's several great use cases for a linked list: [...] But all of these cases are _super rare_ for anyone writing a Rust program. 99% of the time you should just use a Vec (array stack), and 99% of the other 1% of the time you should be using a VecDeque (array deque). These are blatantly superior data structures for most workloads due to less frequent allocation, lower memory overhead, true random access, and cache locality.
>
> Linked lists are as _niche_ and _vague_ of a data structure as a trie. Few would balk at me claiming a trie is a niche structure that your average programmer could happily never learn in an entire productive career -- and yet linked lists have some bizarre celebrity status.

Generally, you probably shouldn't judge a language on the basis of how easily it can implemented linked lists. On the other hand, one of the listed "great use cases" is "You're writing a kernel/embedded thing and want to use an intrusive list", which is probably much more common amongst LWN readers than your average programmer; it's fair to have different priorities when judging languages.

But I think it's still important to consider whether your design uses (doubly-)linked lists because they really are the best data structure for your needs, or whether it's just because linked lists are really easy to implement in C (and an equivalent of Vec is hard) so they're your default choice for any kind of sequential container, and you're trying to apply the same mindset to Rust where they're really hard (but Vec is easy) and they should never be the default choice.)

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 12:22 UTC (Thu) by khim (subscriber, #9252) [Link]

> But I think it's still important to consider whether your design uses (doubly-)linked lists because they really are the best data structure for your needs, or whether it's just because linked lists are really easy to implement in C (and an equivalent of Vec is hard) so they're your default choice for any kind of sequential container, and you're trying to apply the same mindset to Rust where they're really hard (but Vec is easy) and they should never be the default choice.)

Tragedy of lists (and the reason Rust had to wait till 2015 to be viable) lies with the fact that while double-linked lists are very bad data structure today – that wasn't the case half-century ago when Computer Science was established and first courses that teach people how to program were developed.

Back then computers big, but their memories (and thus programs written for these computers) were small, while RAM was fast and CPU slow.

Linked lists really shined in such environment!

Today computers are tiny, but have memory measured in gagabytes, programs are crazy big and CPUs are hudnreds of times faster than RAM.

Today linked lists are almost never the right data structure for anything and even when they actually better than something like Vec the difference is not large.

But people are creatures of habits: after they have learned to deal with linked lists once they tend not to ever return back and use them as long as they could.

Worse yet: while it's true, today, that Vec is the way to go and linked lists are [mostly] useless… there wasn't any point in time when linked lists stopped being useful and Vec started shining. Transition was very slow and gradual: first Vec was awful and linked lists great, then Vec have become a tiny bit less inefficient and thus become better for some tasks, while linked list have become less efficient and thus worse for these some task, etc.

Thus even today, long after linked lists and all tricks associated with them stopped being important in 99% of cases, people try to apply what they know to Rust.

And yes, that inevitably sends us back to one funeral at time story.

Which is sad, because that means that new generation would have to relearn everything again: while linked lists are no longer useful many other lessons that “old beards” learned are still usefull… but if C/C++ would be replaced with Rust (or any other safe language) via one funeral at time way then all these lessons would be forgotten… and thus repeated.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 12:42 UTC (Thu) by smurf (subscriber, #17840) [Link]

Well, the kernel contains a veritable heap of linked lists, so we're stuck with them.

The key isn't to implement a doubly-linked-list in Safe Rust, which is not exactly a workable idea for obvious lifetime reasons, but to hide them behind an appropriate abstraction.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 14:00 UTC (Thu) by paulj (subscriber, #341) [Link] (1 responses)

I agree linked-lists are a terrible data-structure. But that isn't really the point here.

Doubly linked lists are just a well-understood form of data-structure with cycles. Cycles in data-representations in non-trivial programmes are common, even if those programmes (rightly) eschew linked-lists. The linked-list question speaks to the ability to do common kinds of data modelling.

Focusing on the linked-list part misses the point.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 15:57 UTC (Thu) by mb (subscriber, #50428) [Link]

> Focusing on the linked-list part misses the point.

I think that focussing on everything must be implemented in safe Rust is missing the point of Rust.
Yes, many things *can* be implemented in safe Rust, but that doesn't mean they should.

For implementing such basic principles as containers and lists, it's totally fine to use unsafe code.
If you can drop reference counting by implementing it in unsafe code and manually checking the rules, then do it. This is actively encouraged by the Rust community. The only thing that you must do is keep interfaces sound and do all of your internal manual safety checks.

Therefore, I don't really understand why reference cycles or linked lists are discussed here. They are not a problem. Use a safe construct such as Rc or use unsafe. It's fine.

Insulating layer?

Posted Oct 20, 2024 11:15 UTC (Sun) by ssokolow (guest, #94568) [Link]

I have tried playing with Rust. There's a lot to like. There's some stuff I dislike - the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

The declarative macro syntax is an interesting beast because it's somehow so effective at disguising that a macro is just a match statement to simulate function overloading with some Jinja/Liquid/Twig/Moustache/Handlebars/etc.-esque templating inside, disguised by a more Bourne-like choice of sigil.

...maybe because it's really the only place in the language where you have to code in a Haskell-like functional-recursive style instead of an imperative style if you want to do the fancier tricks?

In the little I've played with Rust, the lifetime annotations make sense on simpler stuff. But it seems to get very baroque on more complex stuff.

True. On the plus side, they are still actively working to identify places where they can remove the need for weird tricks. See, for example, the Precise capturing use<..> syntax section in this week's announcement of Rust 1.82.0.

Maybe I'm just too stupid for Rust. Like I'm too stupid for Haskell. Maybe refcounting is the right answer for my lack of intelligence - but then it seems many other programmers are like me. And it seems maybe even the best Rust programmers too, given how they always end up reaching for (A)RC too.

One of Rust's greatest weaknesses has always been that its "make costs explicit" philosophy has created a language that encourages premature optimization, and people in the community concerned with writing learning materials spend an inordinate amount of time encouraging people to write quick and dirty code (eg. String instead of &str, .clone() everywhere, etc.) first, learn the more advanced stuff later, and always profile before optimizing.

(Of course, to be fair, the same thing occurs more subtly at lower levels with people misjudging the cost of a given machine language opcode and how superscalar architecture will affect costs. It just doesn't result in compiler errors.)

There's no shame in reaching for a reference-counted pointer. (I'm no C++ programmer, but I've never seen any signs of this much agonizing over std::shared_ptr.) However, for better or for worse, having people averse to it does seem to contribute to the trend that Rust implementations of things have low and stable memory usage.