Who's afraid of a big bad optimizing compiler?
Posted Aug 16, 2019 10:10 UTC (Fri)
by farnz (subscriber, #17727)
In reply to: Who's afraid of a big bad optimizing compiler? by PaulMcKenney
Parent article: Who's afraid of a big bad optimizing compiler?
I wonder if there's a human language barrier here; Rust uses "unsafe" code to mean "the human in front of the computer is responsible for proving absence of data races and invalid pointer dereferences". The Nomicon describes what, exactly, unsafe opens up to you - basically, safe Rust contains no way to trigger the dreaded Undefined Behaviour, while unsafe Rust provides 6 ways to trigger UB if you misuse language features. The idea is that you carefully encapsulate the risk of UB into well-validated and tested code, and provide a safe Rust interface on top that is guaranteed not to have UB no matter how it's used.
For example, the implementation of a RwLock is full of unsafe code. It has to be to implement a concurrency primitive - the underlying platform is a C ABI, which is implicitly unsafe in Rust, as the compiler cannot verify that your C is free of UB. However, as a user of RwLocks, I don't need to care about the unsafety of RwLock internals - I can trust that the humans who've reviewed and tested that code ensured that safety guarantees are met.
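As a minimal sketch of what that looks like from the user's side (illustrative only, not taken from the standard library's code): the caller never writes unsafe, and the guard handed back by the lock is the only way to reach the protected data, so "forgot to take the lock" simply doesn't compile.

```rust
use std::sync::RwLock;

fn main() {
    // RwLock's internals are full of unsafe code, but this caller never
    // needs any of it.
    let counter = RwLock::new(0u64);

    {
        // `write()` hands back a guard; the data is only reachable through it.
        let mut guard = counter.write().unwrap();
        *guard += 1;
    } // The guard is dropped here, releasing the write lock automatically.

    // A read lock likewise only grants access while its guard is alive.
    let value = *counter.read().unwrap();
    println!("counter = {}", value);
}
```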
I would expect the same to apply to advanced synchronization - you have to use unsafe to implement it, because you're doing things the compiler cannot verify are defined behaviour, but the resulting primitives are safe because there's no way to misuse them. There is an implementation of RCU in Rust that shows how this works - internally, it's unsafe; for starters, it uses the bottom 2 bits of a pointer as tag values, and it keeps raw pointers around, converting them back and forth to references (effectively safe pointers), but the external interface is entirely safe Rust.
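As a rough sketch of that shape (the type and names below are invented for illustration, not taken from the actual RcuCell code): the pointer tagging and raw-pointer juggling stay inside the type, while the public methods are all safe.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical tagged-pointer cell (not the real RcuCell): the low 2 bits of
// the word carry a tag, the rest is a pointer to a leaked, immutable T.
// Assumes T's alignment is at least 4 so the low bits are free; a real
// implementation would also reclaim the allocation (e.g. via RCU) rather
// than leaking it.
pub struct TaggedPtr<T> {
    word: AtomicUsize,
    _marker: PhantomData<*const T>,
}

const TAG_MASK: usize = 0b11;

impl<T> TaggedPtr<T> {
    pub fn new(value: T, tag: usize) -> Self {
        let ptr = Box::into_raw(Box::new(value)) as usize;
        debug_assert_eq!(ptr & TAG_MASK, 0, "T must be aligned to at least 4");
        TaggedPtr {
            word: AtomicUsize::new(ptr | (tag & TAG_MASK)),
            _marker: PhantomData,
        }
    }

    // Safe accessors: all the raw-pointer handling is hidden in here.
    pub fn tag(&self) -> usize {
        self.word.load(Ordering::Acquire) & TAG_MASK
    }

    pub fn get(&self) -> &T {
        let ptr = (self.word.load(Ordering::Acquire) & !TAG_MASK) as *const T;
        // SAFETY: `ptr` came from Box::into_raw in `new` and is never freed
        // or mutated, so it stays valid for as long as `self` does.
        unsafe { &*ptr }
    }
}

fn main() {
    let cell = TaggedPtr::new(12345u64, 0b01);
    assert_eq!(cell.tag(), 0b01);
    assert_eq!(*cell.get(), 12345);
}
```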
Posted Aug 16, 2019 12:54 UTC (Fri)
by excors (subscriber, #95769)
[Link] (4 responses)
It may be more accurate to say that safe Rust can't *directly* trigger Undefined Behaviour. It can still indirectly trigger UB if it calls into imperfectly-implemented Unsafe code behind an interface that's marked as Safe.
I expect it takes a lot of discipline and skill to write non-trivial Unsafe modules that guarantee it's absolutely impossible to trigger UB through their Safe interface. https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html gives the example of BTreeMap with a Safe-but-incorrect Ord, which is the kind of subtlety that seems likely to trip up many programmers writing their own Unsafe code in pursuit of performance. And there's the situation described in https://dev.to/deciduously/the-trials-and-tribulations-of... where a popular Rust framework had "an understanding gap regarding what unsafe means in Rust", leading to much unhappiness.
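For flavour, here's a contrived sketch of that Ord pitfall (my own example, not the Nomicon's code): the implementation below is entirely Safe Rust, yet it breaks the total-order contract that generic Unsafe code might be tempted to rely on.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(PartialEq, Eq)]
struct Evil(u32);

impl PartialOrd for Evil {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Evil {
    fn cmp(&self, _other: &Self) -> Ordering {
        // Entirely Safe code, but not a total order: every comparison says
        // "less", contradicting the derived PartialEq above.
        Ordering::Less
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Evil(1), "a");
    map.insert(Evil(1), "b");
    // The map's behaviour is now nonsensical (the "same" key appears twice,
    // lookups fail), but std's BTreeMap is written so that even this broken
    // Ord cannot cause undefined behaviour; hand-rolled unsafe code that
    // trusted Ord to be correct might not be so careful.
    println!("len = {}", map.len());
}
```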
(It still seems better than C/C++ where you have to apply discipline and skill to *all* code, if you don't want your application to crash all the time. Rust tells you where that attention needs to be focused. But you still need good programmers and good review processes to avoid safety problems sneaking in.)
Based on zero practical experience of Rust, I imagine there are cases where it's infeasible to provide a guaranteed-Safe interface to an interesting bit of Unsafe code (like particularly clever synchronisation primitives, though RcuCell may be a counterexample). You can give it an Unsafe interface instead and make it the caller's responsibility to guarantee safety - it may be easier to make those guarantees at a higher level - but the tradeoff is that you've now got more code running unsafely. Rust seems to rely on the hypothesis that, in practice, unsafe code can be kept small and isolated, so a large majority of the codebase gets the benefits of guaranteed safety - is there much evidence for or against that yet? (particularly when writing kernel-like code that has to deal with a lot of fundamentally unsafe hardware)
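The "push the responsibility to the caller" option looks roughly like this (a hypothetical sketch with invented names, not code from any project mentioned here): the function is declared unsafe and documents its contract, so every call site has to opt in with an unsafe block of its own.

```rust
/// Copies the element at `index` without a bounds check.
///
/// # Safety
///
/// The caller must guarantee that `index < slice.len()`; otherwise the
/// behaviour is undefined.
pub unsafe fn get_unchecked_copy(slice: &[u32], index: usize) -> u32 {
    // SAFETY: deferred to the caller, as documented in the contract above.
    unsafe { *slice.as_ptr().add(index) }
}

fn main() {
    let data = [10u32, 20, 30];
    // The call site takes on the proof obligation and marks it with `unsafe`.
    let x = unsafe { get_unchecked_copy(&data, 1) };
    assert_eq!(x, 20);
}
```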
Posted Aug 16, 2019 13:24 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (1 responses)
In terms of evidence for unsafety not growing, there's Redox OS, which has so far been successful at encapsulating unsafe code (at least in terms of apparent safety - it'd take a formal audit to be confident that there are no erroneous safe wrappers). Similarly, Servo (and the parts backported to Firefox) show that it's also practical to keep unsafe to a small area when working in a big project like a web browser.
My practical experience of Rust is that I have yet to encounter a piece of code which has to be unsafe and cannot provide a safe interface to the rest of your project; I have encountered several places where unsafe is used because it's easier to violate UB guarantees than to design a good interface. As an example, I've written several FFIs to internal interfaces where the C code has a "flags tell you which pointers in this structure are safe to access" design, and I've ended up creating a new datastructure that has nested Rust structs and makes use of Option to identify which sub-structs are valid right now.
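A made-up illustration of that translation (the C layout and all the field names here are invented, not the internal interfaces in question): instead of mirroring a struct whose flags field says which pointers are currently valid, the wrapper uses Option so that an absent sub-struct simply cannot be touched.

```rust
// Hypothetical C-side layout, shown only in this comment:
//
//   struct event {
//       uint32_t     flags;   /* EVENT_HAS_NET, EVENT_HAS_DISK, ...  */
//       struct net  *net;     /* valid only if EVENT_HAS_NET is set  */
//       struct disk *disk;    /* valid only if EVENT_HAS_DISK is set */
//   };
//
// The Rust-side representation drops the flags entirely: a part that is not
// present is simply `None`, so there is no way to dereference it.

pub struct NetInfo {
    pub bytes: u64,
}

pub struct DiskInfo {
    pub sectors: u64,
}

pub struct Event {
    pub net: Option<NetInfo>,
    pub disk: Option<DiskInfo>,
}

fn total_traffic(event: &Event) -> u64 {
    // The compiler forces the "is this part valid?" check; forgetting it is
    // a type error rather than a stray pointer dereference.
    event.net.as_ref().map(|n| n.bytes).unwrap_or(0)
}

fn main() {
    let e = Event {
        net: Some(NetInfo { bytes: 42 }),
        disk: Some(DiskInfo { sectors: 8 }),
    };
    assert_eq!(total_traffic(&e), 42);
    println!("disk sectors: {:?}", e.disk.as_ref().map(|d| d.sectors));

    // A missing part is just `None`; there is nothing to dereference by mistake.
    let empty = Event { net: None, disk: None };
    assert_eq!(total_traffic(&empty), 0);
}
```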
Personally, I see "unsafe" as a review marker; it tells you that in this block of code, the programmer is making use of features that could lead to UB if abused (the Rust compiler warns about unnecessary use of unsafe), and that you need to be confident that they are upholding the "no UB via Safe code" guarantee that the compiler expects. It's by no means a perfect system, and as Actix shows, it can be abused, but it does show you where you need to hold the code author to a higher standard than normal.
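One concrete way to lean on that marker (a small illustration, not from anything above): a crate or module that is supposed to contain no unsafe at all can say so, and the compiler will then reject any unsafe block that sneaks in, pushing it into a clearly-reviewed corner of the codebase.

```rust
// A crate that is meant to stay free of unsafe can enforce it at compile
// time; any `unsafe` block then has to live in a separately reviewed crate.
#![forbid(unsafe_code)]

fn main() {
    // Uncommenting the next line would be a compile error, not a review nit:
    // let zero: u32 = unsafe { std::mem::zeroed() };
    println!("no unsafe code allowed in this crate");
}
```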
FWIW, having spent the last 2 years working in Rust, I think it's a decent advance on C; I spend longer getting code to actually compile than I did with C11 or C++17, but it then works (both when tested and in production) more often than my C11 or C++17 did.
Posted Aug 16, 2019 14:06 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
A note is that the amount of *thinking* is decreased overall. Rust front-loads it on the compilation phase (with a decent amount of hand-holding) while C likes to back-load it on the debugging phase with a possibly unbounded timesink (where I, at times, feel like gdb is watching me like the Cheshire cat). The number of times our Rust stuff has fallen over (mostly "unresponsive"[1] or hook deserialization errors[2]) is fewer than a dozen. But an error (not a crash!) in the deserializer is much easier to diagnose than the "someone threaded a `NULL` *here*‽" questions I find myself asking in C/C++.
[1] Someone pushed a branch as a backport that caused the robot to go checking ~1000 commits with ~30 checks each. The thinking here was actually about how to detect that we should avoid checking so much work (since the branch is obviously too new to backport). It was churning through it as best it could, but it was still taking ~20 minutes to do the work.
[2] GitLab (and GitHub) webhook docs are so anemic around the types they are using that whether a field is nullable is sometimes surprising. I can't wait for GraphQL to become more widespread :) .
Posted Aug 16, 2019 14:10 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
I suspect things like RcuCell are fine because you can still teach Rust about them through an API. For my keyutils library[1], it's a bit harder because, while there's no pointer juggling, the lifetimes of things are tied to kernel objects outside of the Rust code's control. Luckily synchronization is handled by the kernel as well, so I don't have to worry about *that*, but it does mean that your "owned" data is not really owned, since it's just a handle. It also means that pretty much every function needs to return a Result<> because basically any operation can end up with an `ENOKEY` result if something else deletes the key out from underneath you.
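Roughly what that ends up looking like (a hypothetical sketch with invented names and a stubbed-out syscall, not the real keyutils API): the Rust value is just a thin wrapper around a kernel handle, and every operation on it is fallible.

```rust
use std::io;

// Hypothetical handle type, sketched for illustration rather than taken from
// the real keyutils crate: the Rust value only wraps a kernel key serial
// number, so "owning" a `Key` does not keep the kernel object alive.
pub struct Key {
    serial: i32,
}

const ENOKEY: i32 = 126; // "Required key not available" on Linux

impl Key {
    pub fn read(&self) -> io::Result<Vec<u8>> {
        // A real implementation would call into the kernel here (via a
        // keyctl(2) wrapper); the kernel may already have removed the key,
        // in which case the call fails with ENOKEY.
        keyctl_read_stub(self.serial)
    }
}

// Stand-in for the syscall wrapper, so the sketch is self-contained: it
// pretends the key has already been garbage-collected by the kernel.
fn keyctl_read_stub(_serial: i32) -> io::Result<Vec<u8>> {
    Err(io::Error::from_raw_os_error(ENOKEY))
}

fn main() {
    let key = Key { serial: 42 };
    // Every operation is fallible: the caller has to handle the case where
    // the key vanished out from underneath it.
    match key.read() {
        Ok(payload) => println!("payload: {} bytes", payload.len()),
        Err(e) => eprintln!("key lookup failed: {}", e),
    }
}
```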
Posted Aug 16, 2019 15:06 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
I agree that encapsulating difficult-to-get-right things into libraries is a most excellent strategy where it applies, but there are often things that don't fit well into a safe API. Again, I find a number of Rust's approaches to be of interest, but I do not believe that Rust is quite there yet. As I have said in other forums, my current feeling is that current Rust is to the eventual low-level concurrency language as DEC Alpha is to modern microprocessors.
It will be interesting to see "Writing A Kernel Driver in Rust" at the upcoming Linux Plumbers Conference. From what I have seen, things like split counters, sequence locks, and base RCU (as opposed to RCU use cases like RcuCell) have been carefully avoided in earlier Rust-language Linux-kernel drivers. Maybe they will take them head-on in this talk. Either way, looking forward to it!