Toward a more approachable Rust

Posted Feb 23, 2017 9:52 UTC (Thu) by oever (guest, #987)
Parent article: Toward a more approachable Rust

I've been writing quite a bit of Rust in my spare time since half a year. To me the initial learning curve was not high until I hit issues with the borrow checker.

The borrow checker is challenging to learn, but when you understand it, you have not simply learned to appease the borrow checker, you've learned to write good code. Most of the issues that the borrow checker uncovers is unsafe cost. Code that would have been much harder to do right in C and code which I probably would not have done right in C but which might have worked well on the happy path.

Rust has many things that are easier to learn than in C. Macros are nice and sane. Building and packaging is simple. Writing tests is simple and writing code snippets in documentation is simple. In fact, Rust compiles and checks all code snippets in documentation! It even has benchmarking built into the packaging system, cargo.

Of course there are downsides. Some code is quite verbose. Rust does not have function overloading. It does not implicitly convert integer types. But these are trade-offs needed to achieve speed and safety. You have to tell the compiler exactly what you mean.

The most important aspect of Rust for me is that you can trust your code much better. There are way less hidden gotcha's in code. This makes it easier to read other people's code.

Before Rust, my favorite programming language was Haskell. Rust is less elegant in the mathematical sense, but it makes up for that by being very practical and still very fast and safe.

Toward a more approachable Rust

Posted Feb 24, 2017 8:26 UTC (Fri) by marcH (subscriber, #57642) [Link]

> Before Rust, my favorite programming language was Haskell.

https://air.mozilla.org/introduction-to-rust-for-ocaml-pr...

Toward a more approachable Rust

Posted Mar 2, 2017 8:10 UTC (Thu) by ras (subscriber, #33059) [Link] (8 responses)

> Before Rust, my favorite programming language was Haskell.

Which goes some way to explaining why you didn't find the learning curve high at first.

> To me the initial learning curve was not high until I hit issues with the borrow checker.

It is always is that way, isn't it?

I discovered HackerRank, and taught myself Rust by doing a number of problems there - maybe 20, maybe 50 - I'm not sure, but definitely more than the 4 days ESR spent with it. He's about my age, and he if did manage to master the language in 4 days he's a much, much smarter man than I. Haskell and it's type inference isn't something I've come across in any language I am familiar with, so the learning curve was punishing. Even so I grew to like the language, and fell in head over heels in love with the build system. Towards the end I thought I was pretty good with the borrow checker, but one a nagging doubt remained - those practice programs were all very small.

I decided to test myself by doing some serious data structure work with Rust. I general, I could not do it, at least not without lots of unsafe blocks. If I was to summarise the pain point, it would be because the borrow checkers reasoning starts and ends within the stack, mostly within the current stack frame. Use stuff on the heap and things get very difficult very quickly. Try doing tree insertion without using recursion some time - I don't think it's possible. I ended up wishing Rust had some sort of "const" feature for struct's - fields the compiler could know would never change outside of constructors and destructors. That way the bloody borrow checker extend stack lifetimes to pointers within struct's, and so could derive information about the lifetimes on stuff on the heap too.

Still, I was a relative newbie at Rust, so I went along Servo talk. Servo is Mozilla engine written entirely in Rust. In fact it's the largest Rust project and has been a big driver of the language. Naturally it uses the heap a lot, so I wanted to see how they got around code littered with unsafe's. Sadly answer was they hadn't managed it. They were hoping one the top Rust language guru's would look at the worst library and fix it. Maybe he could, but if it took someone like that a mere mortal like myself had no hope in day to day day usage. I just don't have that sort of time at my disposal.

I didn't see the point of going to all that cognitive effort of figuring out how to appease the borrow checker, when for arguably the most dangerous part of memory - the heap, you can't do it. I gave up at that point.

Toward a more approachable Rust

Posted Mar 3, 2017 1:13 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (7 responses)

I've been working on a Rust codebase for a number of months (it's been deployed to production for the last month or so) and I haven't had to write any unsafe code. Granted, there are *lots* of calls out to the git binary (no, libgit2 doesn't support what we need), but the amount of confidence it gives me when I go in and refactor some functionality out for new features is amazing. No worries about rogue null pointers floating underneath the surface, easy to parallelize a loop without fear, easy access to a test suite, and more all have me sold.

Granted, I have also programmed in Haskell previously, but if anything, learning these languages has improved my code in Python, C++, and others because I can now more easily spot potential problems with pointer manipulation and the like.

There are times I fight the borrow checker, but in the end it has (so far) always been right short of the non-lexical lifetime issue (which has a path to resolution now). It lets me specify in my interface that some structure holds a resource necessary for another (e.g., a temp directory which is deleted when it goes out of scope).

What kinds of problems were you trying to solve that you needed so much unsafe code?

Toward a more approachable Rust

Posted Mar 3, 2017 2:41 UTC (Fri) by ras (subscriber, #33059) [Link] (6 responses)

> What kinds of problems were you trying to solve that you needed so much unsafe code?

I may not have been clear. In all the simple stuff I did there was no unsafe code, and I was never tempted reach for it. It was when I grew suspicious that I was solving only simple "Rust friendly" problems, and actively searched for places that it might fall down that the rot set in.

You don't have to look far to find buckets of unsafe code: have a read of the standard library some time. I often did that to figure out how the a rust expert would attack a problem. It was seeing all the unsafe code in there that first raised my suspicions: how was that different to what I was doing?

I gave you the answer: the problems started to arise when you use the heap, explicitly. In particular you have stuff on the heap referencing other objects on the heap. You can avoid explicit use of the heap a lot of the time (for me it was all of the time when doing simple stuff), by using Rusts inbuilt collections. These collections do use the heap internally of course, and as a consequence if you peak inside of them you will see many unsafe blocks. But a person using those collections to store simple objects won't need to use unsafe's. If you are doubting this try writing your own collection with iterators, without using Rusts collections internally. For extra bonus points don't use recursion.

One common "cheat" I've seen used several time (along with "see, you don't need unsafes in Rust") is to put the objects you want to reference into an array, and then store the index into that array. It's cheating because every index lookup does contain an unsafe block, and that unsafe block costs time (the index bounds check) at run time.

The canonical Rust solution to heap pointers to other heap objects it to wrap everything in RefCell's. Aside from leading to some truly ugly type descriptions [0] it means you are absorbing a runtime hit every time you move between pointers. One of the borrow checkers purposes in life is to allow you to move around data structures without incurring that hit by doing the checks at compile time. It works well only when everything is on the stack.

[0] From stdio.rs: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>

Toward a more approachable Rust

Posted Mar 3, 2017 13:25 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (5 responses)

This happens with any language with safety guarantees. What unsafe let's you do is point out where tricky business is going on and check that code more carefully. Unsafe code still has to abide by the rules, but it is enforced by developers rather than the compiler. Haskell has the same thing. Want a global state? You need unsafePerformIO. It's just how it's done.

One other reason for unsafe is performance because Rust is also not blind to the realities of hardware. This helps it reach actual use rather than things like Haskell where things like laziness and space leaks become an issue.

I suggest you look at some of the lower libraries like Rayon which has (I believe) just a handful of unsafe bits. Everything else can be checked and verified by the compiler. Also check out the improvements to the sorting algorithms which are 90%+ safe code (last I saw it).

The type you gave describes what it is: thread-shared line buffered output that may or may not exist, can be accessed via reentrant calls. What would it be in C++? Just another pointer with a sidecar mutex. Yep, no way to screw that up.

Yes, you need unsafe somewhere because not everything does, internally, abide by the rules. But when you make those blocks and make the rules conform, the code using it can be reasoned about much more easily. Even if it's not perfect, it's still way better than what is available in C, C++, D, or Go. What else would you suggest these days?

Toward a more approachable Rust

Posted Mar 3, 2017 18:48 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> I suggest you look at some of the lower libraries like Rayon which has (I believe) just a handful of unsafe bits.
There's also RustBelt that attempts to prove correctness of the unsafe bits if they are encapsulated correctly.

Toward a more approachable Rust

Posted Mar 4, 2017 3:50 UTC (Sat) by ras (subscriber, #33059) [Link] (3 responses)

> What else would you suggest these days?

I guess what I was looking for is something that I would use instead of Python. I was willing to give up simplicity for that, although it was a bit of a shock to see just how complex things became. "Code flowing off the fingers" is not something I've experienced with Rust. However, I did expect to get more than I actually received in exchange for that complexity.

Perhaps an example would help. Take a doubly linked list implementation. Lets call an element a DListElt. It has a number of operations operations you can perform on it. Lets look at two of them: DListElt.insert() and DListElt.remove(). You can call insert() only when it's not in a list, and remove() only when its in a list. It would be nice to be able to verify this at compile time (thus avoiding the overhead of run time checks, so it would be as fast a C), so for example if you try to call remove() when it's in a list, you get a compile error.

Rust can't do this. What it can do is very limited: it can attach a borrowed state to a type, but this only works for stuff in a stack frame. It's not hard to understand why this stack limitation exists: the compiler needs to know the lifetime of a borrow and it has only two sources of known lifetimes: variables on the stack whose lifetime is the same as the stack frame, statics who live forever. This means the compiler can not reason about anything on the heap. So when confronted with my dinky little DListElt type that only has a grand total of two states which can be represented by a simple boolean (borrow's state is far more complex) that lives on the heap, it can't help. Rust's does have an inbuilt LinkedList of course, but because of this it is burdened with unsafe's and run time checks for most operations.

The frustrating thing for me is it not hard to do. (I've invented some syntax for what follows, hopefully it's meaning is obvious.) Just let the programmer create his own compile time state information, and attach it to a type. Lets invent DListElt:Inuse<bool>. Remove() and insert() then become a casts of sorts. It is defined only for DListElt:Inuse(True).remove(), and returns a DListElt:Inuse(False). Similarly insert() is defined only for DListElt:Inuse(True).insert(), and returns a DListElt:Inuse(False).

Interestingly, a very early version Rust did do something like this but they abandoned the idea in favour of the current borrow syntax. I'm not sure why. I suspect that borrow is all they needed to avoid the bulk of memory corruption issues they saw and they could not come up with generic type system that was powerful enough to implement it. I also suspect most of their examples they looked at were smallish programs, because once you need complex data structures that live on the stack the borrow checkers limitations really bite hard. At least they they bit me hard, and they bit servo too.

So yes, they have come up with a language that occupies some unique niche in the (fast, safe) space. But it's not safe enough to replace Python. It is definitely safer than C / C++, and yes as you say the unsafe blocks show you where to look for the bits that are not. If you do look you will find checks in the unsafe blocks that perform at runtime the checks the compiler could not do at compile time. This is what makes it as safe a Python for simple programs that don't need their own unsafe blocks, but it comes at the cost of making it slower than C and C++.

> I suggest you look at some of the lower libraries like Rayon which has (I believe) just a handful of unsafe bits.

I'd hope not. It just an implementation of thread polling with work stealing. The person who writes the threads (ie, *you*) still has some responsibly for ensuring those threads don't stomp on each other. https://github.com/nikomatsakis/rayon contains this truly weird statement, which to sounds me like he is trying to redefine words to wriggle out of saying just that:

> While there are threadsafe types that offer similar APIs, caution is warranted because, in a threadsafe setting, other threads may "interject" modifications in ways that are not possible in sequential code. While this will never lead to a data race --- that is, you need not fear undefined behavior --- you can certainly still have bugs.

He is talking about multiple threads executing code like this: atomic.set(atomic.get() + 1). It's true only if you use the Rust definition of undefined behaviour: don't reference memory that isn't yours. It sounds like a deliberately knobbled definition to me, but one that happens to line up with what their borrow checker can guarantee, albeit for stack only.

Toward a more approachable Rust

Posted Mar 4, 2017 4:19 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> DListElt.insert() and DListElt.remove(). You can call insert() only when it's not in a list, and remove() only when its in a list.
The problem here is suspicious design - your entities are both elements of a list and objects in their own right.

So if you hold a reference to any entry you can traverse the whole list. To guaranteed its safety for the Rust's borrow checker you need to make sure that there are no other mutating references to it.

And since it looks like you want elements to live their own livetimes, you really need to have runtime locking that tell Rust's borrow checker that it's OK. I'm not sure how typestate (that was in early version of Rust) is going to help you here.

Toward a more approachable Rust

Posted Mar 4, 2017 9:34 UTC (Sat) by ras (subscriber, #33059) [Link]

> So if you hold a reference to any entry you can traverse the whole list.

As a point of order, no you can't. What you call the entry is called struct Node in linked_list.rs. It is private to LinkedList, and besides has no traits that could define a next() method, or even a way to get to the list head. I presume you meant "if you hold a reference to the LinkedList you can traverse it". That's certainly true (there wouldn't be much point to it otherwise). It's done that way so your next statement holds true:

> To guaranteed its safety for the Rust's borrow checker you need to make sure that there are no other mutating references to it.

Another point of order. There is simply no way to inform Rust's borrow checker you have borrows on all the nodes in the list. Well that's not quite true, you can use recursion. But if you restrict yourself to one stack frame you can have only borrow per local variable so borrowing an arbitrary number of objects is impossible. The implementer of LinkedList is constrained by this too, so he must bypass the borrow checker and wrap all references to Node's in unsafe's. So the guaranteed safety you mention has nothing to do with the borrow checker. It stems purely from you trusting the guy who wrote the LinkList code.

In one way none of this matters. The LinkedList author succeeds in creating the illusion a borrow on the LinkList applies to all the elements on it, so the users of the library can blissfully write code without a single unsafe under that assumption. But it does have it's costs. In the current implementation a loop iterating over all nodes must first create the integrator object on the stack (copy 3 words), and then do the same pointer load the C implementation would do. As a consequence the LinkedList has 0 or 1 nodes the C version will use a fraction of the cycles. But perhaps a different implementation could approach the C version.

Unfortunately, it can't hope to have the flexibility of the traditional C implementation. For example, in a C implementation if you follow pointers in the heap to a node on the linked list removing it is a O(1) operation. For that matter, so would operation you mentioned above: "if you hold a reference to any entry you can traverse the whole list". The Rust implementation doesn't provide either of these operations, at all. If it did provide a "delete this element" function it would be O(length_of_list). I don't see how a Rust library could provide an O(1) delete for a linked list. You could do it yourself of course, but then you end up where I and servo did: code littered with unsafe's.

Which is exactly the point I was making when I first responded to to this series of posts. Try to use Rust to implement a largish program with complex data structures, and you will end up with code littered with unsafes because just having a borrow checker is too limited. I don't think it's a huge improvement on C. There are lots of things that are improvements of course: the wonderful build system, the doco, cargo.io, the community, play.rust-lang. It's pretty extraordinary effort really. But the language itself - it's still underdone.

Toward a more approachable Rust

Posted Mar 4, 2017 12:14 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

> "Code flowing off the fingers" is not something I've experienced with Rust. However, I did expect to get more than I actually received in exchange for that complexity.

Yes, once you get down into implementing specialized structures, you'll need some unsafe, but when properly wrapped up, you shouldn't need it. I've certainly had moments where code "flowed off of the fingers" and all I was left with, once it compiled, was minor fixes rather than things like "did you remember to check for that None that can sneak in here this way?" kind of things.

> Lets call an element a DListElt. It has a number of operations operations you can perform on it. Lets look at two of them: DListElt.insert() and DListElt.remove(). You can call insert() only when it's not in a list, and remove() only when its in a list. It would be nice to be able to verify this at compile time (thus avoiding the overhead of run time checks, so it would be as fast a C), so for example if you try to call remove() when it's in a list, you get a compile error.

This is not idiomatic in Rust anyways. Instead, you make node just the data, possibly with some accessor methods. For insert and remove, you instead make new structures which represent the action and can have their own lifetime associated with it. Look at the Entry structure for a map to see an example of this.

> So yes, they have come up with a language that occupies some unique niche in the (fast, safe) space. But it's not safe enough to replace Python. It is definitely safer than C / C++, and yes as you say the unsafe blocks show you where to look for the bits that are not. If you do look you will find checks in the unsafe blocks that perform at runtime the checks the compiler could not do at compile time. This is what makes it as safe a Python for simple programs that don't need their own unsafe blocks, but it comes at the cost of making it slower than C and C++.

Ha! Python has all kinds of unsafeties from its type system that even C++ will catch. Rust has better inference, so things are more ergonomic with the type system than C++ with things like template-derived types and the like.

> I'd hope not. It just an implementation of thread polling with work stealing. The person who writes the threads (ie, *you*) still has some responsibly for ensuring those threads don't stomp on each other.

Anything stronger would (ISTM) require a proof system because you'll always have a way to escape unless you go the Erlang route and just say "nope, can't assume an OS, so you have to use Erlang threads instead of native threads" which isn't really suitable for a language that you'd like to run directly on hardware (unless the Erlang runtime is a suitable OS I suppose).

It seems to me that you're arguing from a "you need unsafe in one place, so the whole thing is invalid" whereas those kinds of things usually go into their own libraries with test suites and should be hiding all of the unsafe blocks from you. You're also expecting magic (proof systems) instead of what Rust does guarantee: write your unsafe code (usually, data structures) carefully and your use of that can be checked and enforced by the compiler with no overhead (that is, no ASan or TSan necessary).