A deeper look into the GCC Rust front-end
Herron started by saying that he initially found the project too difficult; the Rust language was simply too volatile to try to develop a compiler for it. So he gave up for a while. He kept getting questions about when the work would be done, though, so he eventually restarted the project. The language has been mostly stable since 2015, so the task has gotten a little easier.
There are a few goals for the gccrs project beyond simply compiling Rust code. The work needs to end up in the GCC mainline once it's ready. It should reuse as much of the GNU toolchain as possible. There is also an effort to make the gccrs code as easy as possible to backport to older versions of GCC. Finally, advanced features like link-time optimization should be supported for Rust code.
The first step toward those goals was to create a parser for the language, then to start implementing Rust's data structures. Then came traits and generics; those features are complex, he said, but they are also at the core of how the language works. Control flow, and especially the match expression, came next; after that was macro expansion. Const generics are in progress now, he said, while work on intrinsics and built-ins is just beginning. No work has been done on borrow checking; it is not needed to generate valid Rust code, so it can come later. Work on running the Rust test suite is also underway.
Another in-progress task is compiling the libcore library. This library not only has a number of important functions, it also defines many of the low-level features of the language. Without it, Herron said, "you can't do much". Current work is targeting an older version of libcore and is "getting there".
A look inside gccrs
One way in which the Rust front-end differs from many others in GCC is in its use of a special abstract syntax tree structure. It is needed to support features like macro expansion and name resolution. This tree is a sort of high-level, internal representation of a Rust program; at that point in the compilation, there is no distinction between functions and methods, and all macros have been expanded. It's used for type checking and error verification; once that's done, it can be translated and handed to the GCC mid-layer.
Cohen took over at this point to talk about macro expansion. Rust macros differ significantly from those supported by C or C++. They have typed arguments, can include both statements and expressions, have visibility modifiers, and more. Rust macros can use both repetition and recursion with results that are, he said, "cool but abstract". They support Kleene operators, and their specification requires follow-set ambiguity restriction which, he said, "is as scary as it sounds".
As a (relatively) simple example, he put up a macro that just computes the sum of its arguments:
macro_rules! add {
    ($e:expr) => { $e };
    ($e:expr, $($es:expr),*) => { $e + add!($($es),*) };
}
Invocation of this macro can be as simple as:
add!(1); // Yields 1
But it can also be more complex:
add!(1, add!(2, 3), five(), b, 2 + 4);
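The slide did not define `five()` or `b`, so the sketch below supplies a hypothetical `five()` helper and passes `b` in as a parameter. It shows the two macro arms at work: the first handles a single expression, and the second peels off one expression and recurses on the rest, using `,` as the repetition separator.

```rust
// The `add!` macro from the talk.
macro_rules! add {
    // Base case: a single expression is returned unchanged.
    ($e:expr) => { $e };
    // Recursive case: add the first expression to the sum of the rest.
    ($e:expr, $($es:expr),*) => { $e + add!($($es),*) };
}

// Hypothetical helper standing in for the `five()` call on the slide.
fn five() -> i32 {
    5
}

// The more complex invocation; `b` is supplied by the caller.
fn complex_sum(b: i32) -> i32 {
    // Expands to 1 + ((2 + 3) + (5 + (b + (2 + 4)))).
    add!(1, add!(2, 3), five(), b, 2 + 4)
}
```

Because each `$es` is captured as a complete `expr` fragment, an argument like `2 + 4` stays grouped as one operand in the expansion rather than being re-parsed into the surrounding sum.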
It gets more complex from there. Rust macros, he said, enable the creation of complex domain-specific languages. It's a nice feature, but it also means that "Rust" is actually several languages in one, and all of them have to be implemented to actually have a Rust compiler.
Herron returned to talk about the type system and why it drove the creation of a separate internal representation. Rust's type system has a number of complex features, not all of which are well documented; he had to spend a fair amount of time digging through the rustc code to figure it all out. First on the list of features is name shadowing, which allows (and even encourages) frequent redeclaration of variables with the same name; shadowing "works for Rust" but wouldn't for many other languages, he said.
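A minimal illustration of shadowing follows; the function names are hypothetical. Each `let` introduces a new variable that hides the previous one, and the new binding may even have a different type, which is why the type checker has to track each binding separately rather than by name.

```rust
// Shadowing across types: the &str binding is replaced by a u16 binding.
fn parse_port(input: &str) -> u16 {
    let port = input.trim();                    // port: &str
    let port: u16 = port.parse().unwrap_or(80); // port: u16, hides the &str
    port
}

// Shadowing inside a block only lasts until the end of that block.
fn shadow_in_block() -> (i32, i32) {
    let x = 1;
    let inner = {
        let x = x * 10; // new `x`, visible only inside this block
        x
    };
    (x, inner) // the outer `x` is visible again here
}
```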
A trickier aspect is type inference; Rust allows the declaration of variables with neither a type nor an initializer, with the expectation that the compiler will eventually figure something out. He put up this sequence of code:
let a;            // No type or initializer
a = 123;          // a is some type of integer
let b: u32 = 3;   // b is a 32-bit integer
let c = a + b;    // 32-bit math all around now
The gccrs internal representation makes this sort of type inference work, he said.
Then, there is the interesting concept of the Rust never type. In Rust it is valid to write code like:
let a = return; // a = !
The return statement does what one would expect, but the statement also has the result of assigning the never type (denoted "!") to the variable a. A more realistic example might be a return statement in one arm of a match expression. In any case, it is then legal to write code like:
let b = a + 1;    // b = ! + integer
let a = 123;      // ! can be coerced to other types
The never type is an unstable feature. Cohen jumped in to say that nobody would ever write code like the above, but that the never type enables interesting things.
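Writing `!` as a type in its own right is what remains unstable, but the coercion described above is already relied on in stable Rust whenever a diverging expression such as `return` or `continue` sits in one arm of a match. A small sketch of that "more realistic" case, with hypothetical function names:

```rust
// `return` has type `!`, which coerces to the u32 the match must produce.
fn first_or_zero(values: &[u32]) -> u32 {
    let first: u32 = match values.first() {
        Some(v) => *v,
        None => return 0, // diverges; never produces a value
    };
    first * 2
}

// `continue` also has type `!`, so this match still has type i64.
fn sum_valid(inputs: &[&str]) -> i64 {
    let mut total = 0;
    for s in inputs {
        let n: i64 = match s.parse::<i64>() {
            Ok(n) => n,
            Err(_) => continue, // skip entries that are not numbers
        };
        total += n;
    }
    total
}
```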
Other challenges mentioned by Herron include automatic dereferencing of struct fields, another undocumented behavior that took some time for him to figure out. Monomorphization also took some work and a fair amount of special-case handling. Cohen mentioned the (somewhat) object-oriented features of Rust and the extra checks they require. Visibility, in particular, is interesting; pub makes an object visible to the entire binary into which it is linked, while pub(crate) limits visibility to the current crate, and pub(super) makes an item visible to the parent module. Managing all of this gets hard, he said. A Rust compiler must also implement unsafe, of course, which disables a lot of the checks that the compiler makes.
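The visibility levels mentioned above can be sketched in a single crate; all module and function names here are hypothetical. The commented-out call in `visible_from_crate_root()` is the kind of access a Rust compiler has to reject, because `pub(super)` stops at the parent module.

```rust
mod outer {
    pub mod inner {
        // Visible wherever the `inner` module itself can be reached.
        pub fn public_fn() -> &'static str { "pub" }

        // Visible anywhere in the current crate, but not to other crates.
        pub(crate) fn crate_fn() -> &'static str { "pub(crate)" }

        // Visible only in the parent module `outer` (and its descendants).
        pub(super) fn parent_fn() -> &'static str { "pub(super)" }

        // Private: visible only within `inner` and its child modules.
        #[allow(dead_code)]
        fn private_fn() -> &'static str { "private" }
    }

    // `outer` may call the pub(super) item because it is `inner`'s parent.
    pub fn call_parent_fn() -> &'static str {
        inner::parent_fn()
    }
}

fn visible_from_crate_root() -> [&'static str; 3] {
    [
        outer::inner::public_fn(),
        outer::inner::crate_fn(),
        outer::call_parent_fn(),
        // outer::inner::parent_fn() would not compile here.
    ]
}
```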
Generating code
Faust talked briefly about challenges at the code-generation stage. The bulk of this work is translating the gccrs internal representation into the tree structures used by the GCC backend. Some structures, like if and loops, are relatively straightforward. Others are not.
He specifically called out the match expression, which he described as "switch statements on a lot of steroids — probably illegal ones". The simple cases can just map to a switch statement, but the whole point of match is that it need not be simple. Matches involving tuples, for example, must try matching a single element at a time, which is something that the GCC internal representation wasn't designed to do. Arm guards (essentially an extra if controlling whether a specific match occurs) also complicate things, since the variables set by the match must be bound before the guard expression can be executed.
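A plain Rust illustration (not gccrs code) of the two complications described above: the first arms test tuple elements one at a time, and the guarded arms need `x` and `y` to be bound before `if x == y` or `if x > y` can run. Neither maps directly onto a C-style switch.

```rust
// Classify a point by matching on a tuple; the function name is hypothetical.
fn classify(pair: (i32, i32)) -> &'static str {
    match pair {
        (0, 0) => "origin",
        (0, _) => "on the y axis",
        (_, 0) => "on the x axis",
        // Guards: `x` and `y` are bound first, then the condition is tested;
        // if it is false, matching falls through to the next arm.
        (x, y) if x == y => "on the diagonal",
        (x, y) if x > y => "below the diagonal",
        _ => "above the diagonal",
    }
}
```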
Gccrs now has a good module for const evaluation; it was derived from the C++ evaluator by a Google Summer of Code student. The rustc developers recently had to update their compiler to fix a const-evaluation bug, but, much to the satisfaction of its developers, gccrs was already handling that case correctly.
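For readers unfamiliar with what the const evaluator has to handle, here is a small, generic example. It is not the bug case mentioned above, which the talk did not detail. Every value below is computed by the compiler, including an array length, so the evaluator must interpret loops and mutable locals inside a `const fn`.

```rust
// A const fn with a loop and mutable locals; callable at compile time.
const fn fib(n: u32) -> u64 {
    let mut a: u64 = 0;
    let mut b: u64 = 1;
    let mut i = 0;
    while i < n {
        let next = a + b;
        a = b;
        b = next;
        i += 1;
    }
    a
}

// Evaluated at compile time; an overflow here would be a compile error.
const FIB_20: u64 = fib(20);

// Array lengths must be compile-time constants, so they go through
// const evaluation as well.
const LEN: usize = fib(10) as usize;
const ZEROS: [u8; LEN] = [0; LEN];
```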
So, Herron continued, when will gccrs be ready? It can mostly compile libcore now, and things work. There are other core libraries, including liballoc, that are yet to be done, but that should be easier, he said. On the other hand, Cohen said, the code that implements procedural macros is going to be harder; it forces the compiler to act as a server, sending tokens to a separate libproc executable. That means implementing a remote procedure call server in the compiler front-end.
Then, Herron said, there is borrow checking, which is an inherent part of the language. Without borrow checking, gccrs will not be a Rust compiler, and it currently does not have one. The plan here is to use Polonius (which is being developed for rustc) and avoid duplicating all of that work.
As a sort of postscript, Herron mentioned that he has been talking with the Rust-for-Linux developers about compiling kernel code. Rust versioning is based on the notion of "editions", which form the core of its compatibility guarantees. But the kernel code cannot rely on such guarantees now due to its use of a large number of unstable features, some of which have "no clear path" toward stabilization. Creating a useful compiler is hard, Herron said, when there is no language standard. The gccrs developers are working toward adding kernel modules to their test cases, but properly supporting kernel development may take some time. At the close of the session, Mark Wielaard asked whether the kernel is alone in its use of unstable features; the answer was that "everybody uses them".
[Thanks to LWN subscribers for supporting my travel to this event.]
Index entries for this article
Conference: GNU Tools Cauldron/2022
Posted Oct 10, 2022 16:17 UTC (Mon)
by Tobu (subscriber, #24111)
Quite a few of the features the gccrs guys found unusual: type inference, the never type, shadowing, destructuring match, match guards are ML or OCaml features. The core is from 1987. It's nice they are being exposed to a new audience of compiler developers, but I can see why many compiler front-ends are ML-based and/or self-hosted: it must feel a little bit masochistic developing and refining the complex parts of a language, which tend to increase compiler complexity as well, without being able to use the new abstractions to your advantage.
Posted Oct 22, 2022 10:25 UTC (Sat)
by ssokolow (guest, #94568)
Do you have a citation for that?
The sources I checked give the early 1970s (Wikipedia chose 1973) for ML and I'm wondering if maybe this is a "much of Rust existed for years before 2013 and 2014 got rid of the Go-esque green threading system and migrated various special pointer types from sigils to standard library constructs and then 2015 brought v1.0" sort of situation.
The early 70s certainly would make it much more contemporary with ALGOL 68 as the progenitor of a family of languages.
Posted Dec 7, 2022 12:16 UTC (Wed)
by Tobu (subscriber, #24111)
I had picked a date for the origins of Caml, and it was old enough to make my point that people should know better by now. But now that you ask, I dug a bit: the original development of ML (as part of the LCF proof assistant) took place in 1973-1978, and I can recognize type inference in this 1978 paper at least.
Algebraic datatypes can be traced back at least to Hope in the 1970s, with named records traceable to Luca Cardelli in his VAX ML.
Match patterns go back to at least Standard ML (early 1980s).
There's a lot of info starting from this page about the history of Standard ML
Posted Oct 10, 2022 16:32 UTC (Mon)
by josh (subscriber, #17465)
Many people throughout the ecosystem experiment with them; that's why they exist, so that people can try them out and see if they work well to solve the problems they're meant to solve. But most people don't use nightly compilers and unstable features. In the 2020 survey, only 28% of Rust users said they use nightly Rust, down from 30% in 2019, and IIRC in 2021 it continued to decline though I don't have the numbers handy. Every time we stabilize key things people want, fewer people have a reason to use nightly.
Posted Oct 10, 2022 16:58 UTC (Mon)
by tialaramex (subscriber, #21167)
However ...
There are things I want which aren't yet possible in stable. I recently wanted to make BalancedI8, which is a type like NonZeroU8 except that instead of zero being stolen for use as the niche, it steals -128 since almost nobody needs that. The resulting type is nicely balanced (its extents are -127 and +127) yet it has a niche so Option<BalancedI8> is the same size as i8 (i.e. one byte). As a bonus its abs() function can't fail since it is balanced.
Turns out I can make BalancedI8... much more easily than I'd initially assumed - but only with the permanently unstable rustc internal "Here is where the niche lives" attributes which are not available in stable Rust (unless you're the standard library). I still intend to make a crate which provides BalancedI8 and similar types which seem useful and can only be constructed this way, but it would make me very glad to have a way to signal, presumably unsafely, that I am doing this specific thing - making an opaque integer type with a niche where one value would fit but I promise never to store that value - in stable Rust.
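For comparison, a different technique works in stable Rust today, at the cost of one XOR per access: store the value with its sign bit flipped inside a NonZeroU8, so that -128 (0x80) becomes the forbidden zero and the existing zero niche does the rest. This is a hypothetical sketch, not the commenter's crate, which relies on the rustc-internal niche attributes instead.

```rust
use std::num::NonZeroU8;

/// Hypothetical stable-Rust BalancedI8: holds -127..=127, never -128.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BalancedI8(NonZeroU8);

impl BalancedI8 {
    /// Returns None for -128; every other i8 is accepted.
    pub fn new(value: i8) -> Option<Self> {
        // Flipping the sign bit maps -128 (0x80) to 0 and every other
        // value to a non-zero byte.
        NonZeroU8::new((value as u8) ^ 0x80).map(BalancedI8)
    }

    /// Recovers the original i8 by undoing the sign-bit flip.
    pub fn get(self) -> i8 {
        (self.0.get() ^ 0x80) as i8
    }

    /// Cannot overflow: |x| of any value in -127..=127 fits in 0..=127.
    pub fn abs(self) -> Self {
        BalancedI8::new(self.get().abs()).expect("abs of -127..=127 fits")
    }
}
```

The attribute-based version the comment describes would store the bits unchanged; the XOR here is the price of staying within stable features.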
Still, in one sense the answer is "everybody uses them" but via the standard library. The standard library is *full* of unstable stuff, some of it will be stabilised in the foreseeable future, some is in the "We'd love to but..." category, and some is deliberately unstable forever because touching it involves compiler internals, today that means the rustc / LLVM teams, some day it will also involve GCC of course.
Lots of my favourite inner workings of Rust are in the middle or latter category. Pattern, TrustedLen, much of Wrapping<T>, all of Saturating<T>, the entirety of Aria's Provenance Experiment.
Posted Oct 10, 2022 17:21 UTC (Mon)
by josh (subscriber, #17465)
I'd love to see that attribute stabilized. The last couple of times we've gotten hung up on all the *future* things we want to support there (e.g. alignment niches), but I think we should *just* support `#[niche(range = "...")]` and allow for the possibility of adding more in the future.
Posted Oct 10, 2022 19:36 UTC (Mon)
by tialaramex (subscriber, #21167)
#[simple_niche=0x80]
That's enough to enable the Option-like niche optimisation guarantee, and thus turn all the cases where a C programmer would use a single sentinel value into type safe Rust. There are a lot of cases where saving a few bytes isn't actually important, choosing unstable to do this would be grossly disproportionate. But philosophically, the difference between "There's barely any overhead" and "The machine code is literally identical but thanks to Rust this is type safe" is huge.
Posted Oct 10, 2022 22:25 UTC (Mon)
by josh (subscriber, #17465)
Posted Oct 11, 2022 0:38 UTC (Tue)
by NYKevin (subscriber, #129325)
Posted Oct 11, 2022 8:16 UTC (Tue)
by farnz (subscriber, #17727)
This feels like the "gotten hung up on all the *future* things we want to support there" comment you made earlier. A single valued niche is not stabilized, is definitely useful, is a subset of things that can reasonably be supported (indeed, it's a subset of things you can support today), and doesn't put as much of a long-term support burden on you as a range, or a more complex way to specify niches.
Perfect as enemy of good, and all that.
Posted Oct 11, 2022 9:37 UTC (Tue)
by josh (subscriber, #17465)
Posted Oct 11, 2022 6:48 UTC (Tue)
by roc (subscriber, #30627)
Posted Oct 11, 2022 17:17 UTC (Tue)
by Gaelan (guest, #145108)
Posted Oct 11, 2022 19:44 UTC (Tue)
by zev (subscriber, #88455)
Posted Oct 11, 2022 21:41 UTC (Tue)
by khim (subscriber, #9252)
Suddenly? No. Eventually? Yes. Examples include -fthis-is-variable, signatures, Java Exceptions or access for scope variables. Yes, these are mostly from the time when C++ wasn't as ossified as it is today, but it happened before, it may happen again.
Posted Dec 7, 2022 13:03 UTC (Wed)
by nix (subscriber, #2304)
I'd say the extensions you cite, and many more, are mostly from the time more or less before GCC 2.7.x, when GCC was happy to accept barely-standardized experiments with piles of more-or-less-obscure places where behaviour is not defined and crashes of the generated code or even the compiler are expected: statement expressions are a notorious case here, which unusually cannot be deprecated because they're so widely used, even though their interaction with (in particular) C++ non-POD types is best described as "hilarity abounds". I don't think requiring new features to be defined precisely enough that their behaviour is clear and they don't explode the compiler by accident is actually a bad thing to do, though obviously in a language as complex as C++ that is a damn hard thing to do.
(And thanks to git, experimental branches on which crashy half-defined experiments like this can be added are easy now, without inflicting their manifold sharp edges on anyone else.)
Posted Oct 10, 2022 18:40 UTC (Mon)
by flussence (guest, #85566)
But if you require official standards and then limit yourself to them, you get a C compiler that can't compile the kernel because it uses -std=gnu11.
Posted Oct 11, 2022 11:15 UTC (Tue)
by LtWorf (subscriber, #124958)
Posted Oct 20, 2022 2:40 UTC (Thu)
by dxin (guest, #136611)
Posted Oct 10, 2022 18:51 UTC (Mon)
by djc (subscriber, #56880)
Posted Oct 28, 2022 13:11 UTC (Fri)
by Xiphoseer (guest, #161855)
Posted Oct 10, 2022 19:09 UTC (Mon)
by mb (subscriber, #50428)
I think this could be misunderstood by people not knowing Rust, yet.
(Yes, one could argue that in safe Rust these additional features are forbidden by a safety check...)
Posted Oct 10, 2022 19:39 UTC (Mon)
by JoeBuck (subscriber, #2330)
See this example from the Rust book.
Posted Oct 10, 2022 19:47 UTC (Mon)
by mb (subscriber, #50428)
Saying that unsafe disables checks is misleading.
It just adds *more* ways to manipulate data:
Saying that the unsafe keyword disables checks leads to newcomers thinking that just adding unsafe disables basic safety guarantees. Which is not the case. If you write code that only accesses safe functions and features in an unsafe block, then all safety checks are still upheld.
Posted Oct 10, 2022 21:54 UTC (Mon)
by JoeBuck (subscriber, #2330)
Posted Oct 11, 2022 8:19 UTC (Tue)
by mb (subscriber, #50428)
And the unsafe keyword doesn't disable the borrow checker.
Posted Oct 11, 2022 16:21 UTC (Tue)
by JoeBuck (subscriber, #2330)
Posted Oct 11, 2022 17:08 UTC (Tue)
by mb (subscriber, #50428)
> It sounds like you are refusing to believe the Rust project's own book.
Ehm, wat?
> Go read it, it explains the issue.
I read it a long time ago.
If you write safe code that doesn't pass a Rust safety check, then merely adding `unsafe` to your code will never result in a running program. It will throw the exactly same errors as before.
Posted Oct 11, 2022 18:02 UTC (Tue)
by steveklabnik (guest, #114343)
https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
> It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block.
Posted Oct 11, 2022 18:26 UTC (Tue)
by mb (subscriber, #50428)
Thanks for the book. It's one of the best technical books that I ever read.
Posted Oct 11, 2022 22:23 UTC (Tue)
by steveklabnik (guest, #114343)
Posted Oct 12, 2022 4:54 UTC (Wed)
by buck (subscriber, #55985)
As long as you are coming out of the woodwork here, let me also pile on:
Yes, a very good book, and one for which I and, I'm sure, many (thousands of) others are extremely grateful and obliged to you and Ms. Nichols. It's also one of the pillars on which the success of the language is founded/building, I have to believe, as I would find it hard to fathom there are many (any?) who have bypassed your book when they set out to do Rust, unless they want to miss out on the most thorough and comprehensive explanation of the language one can get, at its length, and, obviously, at its extreme of affordability and accessibility, and with its didactic rock-solidness and sweep, from people just kicking the tires on Rust to those who want to grasp the language implicitly. It is, in a word, a gateway drug.
Well, maybe I'm overstating some of that, and I'm not exactly a Rust developer myself, so maybe I don't have the read on it exactly right, but I'm guessing it is no exaggeration to say that the reason so many are undeterred by the knock on Rust that it's hard is because your book is right there to give them a pretty thorough understanding of what exactly Rust is, why the hard parts are the way they are and what you gain by way of recompense, and is welcoming and instructive in places where the compiler alone is a little less friendly a learning companion. At least it made me want to go program something up. (Alas, the only thing I've been able to find are the exercises in the Command-line Rust book, which were thoroughly engaging, but no particular itch of my own to scratch.)
Posted Oct 12, 2022 18:50 UTC (Wed)
by mathstuf (subscriber, #69389)
Posted Oct 12, 2022 9:32 UTC (Wed)
by farnz (subscriber, #17727)
The book does not say that the safety checks are "disabled" (your assertion). It says that in an unsafe block, you can use functionality that has weaker checks applied to it than safe Rust does - but if you don't use that extra functionality, you get exactly the same checking as any other Rust code.
And the reason for having some functionality gated behind "unsafe" is also explained; the Rust developers intend that the checks that apply to "safe" Rust guarantee that the code has no Undefined Behaviour, whereas no such guarantee is made for "unsafe" Rust.
It sounds like you need to read the Rust book again, before accusing people of refusing to believe it.
Posted Oct 11, 2022 8:23 UTC (Tue)
by farnz (subscriber, #17727)
The borrow checker is still running in the example code, though, and still says that all the properties the borrow checker enforces are correct in the code given in the example. It's just that you use functionality (ptr::add, slice::from_raw_parts_mut) that isn't available in Safe Rust, and that does things whose properties are not checked by the compiler.
I know this feels a lot like nit-picking, but the thing about Unsafe Rust is not that safety checks in Safe Rust get turned off - it's that when you use unsafe, you get access to extra features of Rust that are not safety checked. You still have all the same safety checks as any other Rust code - but the operations that require unsafe do so because some of the burden of checking safety is pushed onto the user.
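For reference, the example under discussion can be sketched as follows, modeled on the `split_at_mut` listing in the "Unsafe Rust" chapter of the Rust book and specialized to `i32` slices for brevity. The function signature is entirely safe; the `unsafe` block only unlocks the raw-pointer operations `ptr.add` and `slice::from_raw_parts_mut`, while every ordinary reference in the function is still borrow-checked.

```rust
use std::slice;

// Split one mutable slice into two non-overlapping mutable slices.
fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // This runtime check is what makes the unsafe block sound; the compiler
    // cannot prove on its own that the two halves do not overlap.
    assert!(mid <= len);

    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Writing the same function with two `&mut values[..]` reborrows would still be rejected by the borrow checker, with or without the `unsafe` block; that is the distinction being drawn in this thread.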
Posted Oct 22, 2022 10:32 UTC (Sat)
by ssokolow (guest, #94568)
Inside or outside an unsafe block, the invariants of the safe Rust constructs must be upheld.
Posted Oct 11, 2022 12:46 UTC (Tue)
by beagnach (guest, #32987)
But... Why?
Posted Oct 11, 2022 13:12 UTC (Tue)
by pbonzini (subscriber, #60935)
Note however that the link to libproc is incorrect. I am not sure if the authors meant proc_macro (https://doc.rust-lang.org/proc_macro/index.html) or something else.
Posted Oct 11, 2022 14:11 UTC (Tue)
by corbet (editor, #1)
Posted Oct 12, 2022 16:47 UTC (Wed)
by mjw (subscriber, #16740)
Posted Oct 12, 2022 8:40 UTC (Wed)
by MrWim (subscriber, #47432)
The core is from 1987.
It doesn't disable any of the safety checks.
It just adds a handful of additional unsafe features, like dereferencing of raw pointers.
No, it does disable safety checks, locally, to permit the creation of safe abstractions over unsafe code, like splitting a slice into two slices. It's then up to the developer to assure that the safety properties are preserved.
All safety checks are still upheld. (Like borrow checking, for example)
https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#u...
split_at_mut is a safe method. It doesn't require you to use an unsafe block.
But that block doesn't disable any safety check.
That's why I showed you the part which explains that unsafe doesn't disable any check and merely adds a handful of additional unsafe features.
Re: Rust-book kudos
Hmm yes that link is clearly wrong, I should have noticed that. I'm not sure what should be linked, so I've just taken the wrong one out. Apologies for the confusion.
libproc
proc_macro
Compiling Rust for Linux with rustc_codegen_gcc