Progress toward a GCC-based Rust compiler
The gccrs project is an ambitious effort started in 2014 to implement a Rust compiler within the GNU Compiler Collection (GCC). Even though the task is far from complete, progress has been made since LWN's previous coverage, according to reports from the project. Meanwhile, a more mature hybrid approach to Rust code generation with GCC is available in rustc_codegen_gcc.
In 2022, the goal of gccrs was to be included in the GCC 13 release, but this expectation has not been met. The team is currently aiming for inclusion in GCC 14 (likely to be released by mid-2024), judging from its November 2023 monthly report.
On October 13, Arthur Cohen gave a talk titled "The road to compiling the standard library with gccrs" (the video is available) at EuroRust 2023. In his talk, Cohen gave a little bit of general background on gccrs but mainly focused on what work has recently gone into compiling the Rust standard library, and why gccrs cannot do it yet.
Gccrs targets a specific Rust version, 1.49, released at the end of 2020, rather than trying to keep up with the rapidly developing Rust language. This version was chosen because it is the latest version predating support for const generics, which were introduced in 1.50. However, Cohen expressed regret in his talk that the project has not been able to ignore const generics after all, because they are in use in the standard library, even in 1.49. They were "stabilized" for general availability in 1.50, but there is internal standard library usage in earlier versions as well. Const generics have since been fully implemented, however, and this issue is no longer a hindrance.
A lot of care is being put into gccrs not becoming a "superset" of Rust, as Cohen put it. The project wants to make sure that it does not create a special "GNU Rust" language, but is trying instead to replicate the output of rustc — bugs, quirks, and all. Both the Rust and GCC test suites are being used to accomplish this.
The Rust standard library consists of a number of "crates", which is what software packages are called in Rust lingo. Cohen explained that gccrs is working on supporting compilation of the two most important ones: core and alloc. The core crate is the foundation of the standard library, implementing features such as primitive types and macros; alloc deals with heap-memory allocation and various container types.
Currently gccrs is not able to compile these crates because of various shortcomings, such as incorrect behavior in macro-name resolution and incomplete support for decorator macros. The lack of a borrow checker (discussed more below), while not blocking compilation, means that the compiler cannot properly check the safety of the code. An additional hurdle is formed by missing compiler intrinsics in GCC. Rustc uses some intrinsics provided by LLVM that are not supported by GCC, which means the gccrs team needs to spend time implementing them in GCC.
Another talk (slides available) was given by Pierre-Emmanuel Patry at the GNU Tools Cauldron in September 2023. He mainly focused on progress toward inclusion in GCC 14 as well as macros, which seem to be an interrelated issue because the approach to implement procedural macros necessitates changes to the GCC build system. Procedural macros are function-like macros that emit token streams rather than plain source code text like C or C++ macros. They are implemented in a built-in crate called proc_macro. Such macros are notoriously tricky to implement but also powerful; they form the core of features such as #[attribute] and #[derive()] decorators, and can be used to create compile-time evaluated, domain-specific languages.
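To make the decorator syntax concrete, here is a minimal sketch of what such macros look like from the user's side. The `Version` type, the `describe()` and `squared_sum()` helpers, and the `square!` macro are hypothetical names invented for this illustration. Note that the built-in derives used here are provided by the compiler itself, while third-party derives go through the proc_macro crate. The `square!` macro is a declarative macro rather than a procedural one; it is included only to show that Rust macros operate on parsed tokens rather than on raw text.

```rust
// Derive decorators ask the compiler to generate trait implementations
// from the tokens of the item they are attached to.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    major: u32,
    minor: u32,
}

// A declarative macro, shown for contrast with textual C macros: the
// argument is captured as one parsed expression, not as a piece of text.
macro_rules! square {
    ($e:expr) => {
        $e * $e
    };
}

// Uses the derived Debug implementation.
fn describe(v: &Version) -> String {
    format!("{:?}", v)
}

// Expands to (1 + 2) * (1 + 2); a textual C macro without parentheses
// would instead produce 1 + 2 * 1 + 2.
fn squared_sum() -> i32 {
    square!(1 + 2)
}

fn main() {
    let v = Version { major: 1, minor: 49 };
    // Clone and PartialEq also come from the derive attribute.
    assert_eq!(v.clone(), v);
    println!("{}", describe(&v));
    println!("{}", squared_sum());
}
```

A compiler that cannot expand the derive attribute would reject this program at the `struct` definition, which is part of why macro support is a prerequisite for compiling core and alloc.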
In the GNU Cauldron talk, Patry also mentioned that gccrs had more than 800 commits waiting to be upstreamed to GCC.
Taking advantage of the GCC ecosystem
Cohen's EuroRust talk highlighted that one of the major reasons gccrs is being developed is to be able to take advantage of GCC's security plugins. There is a wide range of existing GCC plugins that can aid in debugging, static analysis, or hardening; these work on the GCC intermediate representation. As an example, Cohen mentioned that "C programmers have been forgetting to close their file descriptors for 40 years, [so] there are a lot of plugins to catch that". Gccrs intends to support workflows in which Rust programmers reuse these existing plugins and static analyzers to catch bugs in unsafe code.
Cohen listed a few things that gccrs is already useful for. According to him, the Sega Dreamcast homebrew community uses gccrs to create new games for the Dreamcast gaming console, and GCC plugins can already be used to perform static analysis on unsafe Rust code. The Dreamcast community's interest stems from the fact that rustc's LLVM backend does not support the Hitachi SH-4 architecture of the console, whereas GCC does; even in its incomplete state, gccrs is helpful for this embedded use case.
Additionally, he mentioned that the gccrs effort has revealed some unspecified language features, such as Deref and macro name resolution; in response, the project has been able to contribute additions to the Rust specification. Currently Rust does not have a formal specification, but work is underway to create one, as proposed in RFC 3355. "The gccrs people want to be a part" of that effort, Cohen said.
One more reason for gccrs to exist is Rust for Linux, the initiative to add Rust support to the Linux kernel. Cohen said the Linux kernel is a key motivator for the project because there are a lot of kernel people who would prefer the kernel to be compiled only by the GNU toolchain.
Things under development
Gccrs is still missing a lot of core functionality. Cohen listed several important features, such as async/await, LLVM intrinsics that are absent in GCC, and the format_args!() macro used by output macros such as println!(). The borrow checker, which is a compiler subsystem that enforces the reference rules of the language, is a key Rust feature that gccrs will need to provide. Cohen briefly mentioned that the likely solution is a separate borrow-checker project called Polonius, and said gccrs will most likely have it integrated a few months down the line. Contributor Jakub Dupak has made progress on this in the past few months.
Polonius is a library that implements a borrow checker that is semantically equivalent to the (not quite flawlessly implemented) checker in rustc today, by approaching the computation of reference lifetimes with a radically different algorithm. Polonius aims to one day resolve the shortcomings and corner cases of rustc's current borrow checker. Once it has matured, rustc itself will likely adopt it as well.
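As an illustration of the kind of corner case involved, the sketch below shows a pattern that is commonly cited as sound but rejected by rustc's current borrow checker, and that Polonius is expected to accept. The rejected form is kept in comments so that the example still compiles today; the function names and the choice of `HashMap<u32, String>` are assumptions made for this illustration.

```rust
use std::collections::HashMap;

// The direct form below is commonly cited as rejected by rustc's current
// borrow checker with a "cannot borrow `*map` as mutable more than once"
// error: the borrow taken by get_mut() is treated as live in the None arm
// too, because it may be returned from the Some arm.
//
// fn get_or_default_direct(map: &mut HashMap<u32, String>, key: u32) -> &mut String {
//     match map.get_mut(&key) {
//         Some(value) => value,
//         None => {
//             map.insert(key, String::new());
//             map.get_mut(&key).unwrap()
//         }
//     }
// }

// Workaround accepted today: check for the key first, at the cost of an
// extra lookup, so that no mutable borrow spans the insertion.
fn get_or_default(map: &mut HashMap<u32, String>, key: u32) -> &mut String {
    if !map.contains_key(&key) {
        map.insert(key, String::new());
    }
    map.get_mut(&key).expect("key was inserted above")
}

fn main() {
    let mut map = HashMap::new();
    get_or_default(&mut map, 7).push_str("seven");
    println!("{:?}", map.get(&7));
}
```

In real code the `HashMap` entry API avoids the problem altogether; the workaround is spelled out here only to show where the current checker is more conservative than necessary.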
According to the gccrs monthly report for November 2023, work has begun on the format_args!() macro. This helper macro is responsible for constructing parameters for other string-formatting macros. It involves the Display and Debug traits, and is a necessity for preparing arguments that are later passed to other macros such as format!() and println!(). Without format_args!(), a Rust program cannot create formatted output; this feature is thus necessary before gccrs can compile a "Hello, World" program.
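The relationship between format_args!() and the other formatting macros can be sketched in a few lines. In this minimal example, the `Point` type and the `render()` helper are hypothetical names invented for the illustration; `format_args!()`, `fmt::Arguments`, and `fmt::Write::write_fmt()` are part of the standard library.

```rust
use std::fmt::{self, Write};

// Hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

// Display is the trait consulted for "{}" placeholders.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Turns pre-built formatting arguments into a String, roughly what
// format!() does with the output of format_args!().
fn render(args: fmt::Arguments<'_>) -> String {
    let mut out = String::new();
    out.write_fmt(args).expect("writing to a String does not fail");
    out
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // format_args!() checks the format string at compile time and bundles
    // it with references to the arguments; nothing is formatted until
    // write_fmt() runs. "{:?}" selects the Debug trait instead of Display.
    let s = render(format_args!("p = {}, debug = {:?}", p, "hi"));
    println!("{}", s);
}
```

Even the final `println!()` call goes through format_args!(), so a compiler without the macro cannot build this program, or any other that prints formatted text.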
For a deep dive on format_args!(), see Mara Bos's recent blog post.
rustc_codegen_gcc
There is another GCC-based Rust project, called rustc_codegen_gcc, that is more mature and more limited in scope than gccrs. It is not a full implementation of a Rust compiler from the ground up; instead, it is a code-generation backend that plugs into rustc in place of the default LLVM backend and uses the libgccjit library to drive GCC. This approach performs much of the compilation with rustc and turns to GCC at a later stage. Despite the "JIT" (just in time) in the name of the library, rustc_codegen_gcc is intended for ahead-of-time compilation. Its stated primary goal is to enable Rust code generation on platforms unsupported by LLVM.
As of October 2023, rustc_codegen_gcc can compile Rust for Linux without any additional patches. Over the past year, the project seems to have made good progress on many fronts; for example, it has added support for SIMD (single instruction, multiple data) operations and link-time optimization, the lack of which had earlier been identified as a cause of test failures. Cohen deferred to rustc_codegen_gcc at several points in his EuroRust talk, encouraging attendees to use it instead of gccrs for now. It is, in fact, already upstreamed into the Rust language repository.
Rust for Linux
Currently, the Rust for Linux project provides documentation for using either rustc or rustc_codegen_gcc to build Rust code for the kernel. The kernel also contains documentation for the minimal supported versions of various build tools, including compilers. For rustc, the version is considered an exact match, rather than a minimum. The currently stated supported rustc version is 1.73.0 (released in October 2023), much more recent than the 1.49 targeted by gccrs. Rust for Linux support is also a stated goal for gccrs, but because of this significant discrepancy, it seems to be quite far off.
Gccrs has progressed nicely in the year since we last looked at it: the repository has well over 3,000 commits since January 1, 2023. However, it is not yet in a usable state for almost any practical purpose, since as a complete implementation from the ground up, gccrs is much more ambitious in scope than rustc_codegen_gcc. The latter is already merged to the upstream Rust repository and sees real-world use with Rust for Linux. We are not yet in a world with multiple implementations of a compiler for the Rust language, but it is getting closer.
Index entries for this article | |
---|---|
GuestArticles | Koistinen, Ronja |
Posted Dec 15, 2023 15:41 UTC (Fri)
by Bigos (subscriber, #96807)
[Link] (9 responses)
> Polonius refers to a few things. It is a new formulation of the borrow checker. It is also a specific project that implemented that analysis, based on datalog. Our current plan does not make use of that datalog-based implementation, but uses what we learned implementing it to focus on reimplementing Polonius within rustc.
https://blog.rust-lang.org/inside-rust/2023/10/06/poloniu...
I understand that as meaning they are working on reimplementing Polonius within rustc (not as a separate crate). It will use the original Polonius ideas but work in a different way.
Posted Dec 15, 2023 16:23 UTC (Fri)
by atnot (subscriber, #124910)
[Link] (2 responses)
Posted Dec 16, 2023 13:37 UTC (Sat)
by khim (subscriber, #9252)
[Link] (1 responses)
What “recent development” are you talking about? Polonius is already integrated into rustc and was on the way to becoming default in Rust 2024 only two months ago. What happened in these two months???
Posted Dec 16, 2023 14:21 UTC (Sat)
by atnot (subscriber, #124910)
[Link]
This is a recent development, because for the project's multi-year existence, the general consensus was that polonius-the-implementation was going to be used in rustc.
What has been implemented in rustc is, thus, polonius-the-mathematical-model.
Posted Dec 15, 2023 22:31 UTC (Fri)
by josh (subscriber, #17465)
[Link] (3 responses)
Posted Dec 16, 2023 13:40 UTC (Sat)
by khim (subscriber, #9252)
[Link] (2 responses)
What talks? Where? By whom? Who abandoned Polonius? Nightly Rust still accepts -Zpolonius flag today…
Posted Dec 18, 2023 20:12 UTC (Mon)
by SAI_Peregrinus (subscriber, #151778)
[Link] (1 responses)
Posted Dec 19, 2023 12:58 UTC (Tue)
by ssokolow (guest, #94568)
[Link]
Posted Dec 16, 2023 13:35 UTC (Sat)
by khim (subscriber, #9252)
[Link] (1 responses)
I think you are mixing up the stories of Polonius and Chalk. Attempts to bring Chalk into rustc were, indeed, abandoned, and a new replacement is supposed to be built, more tightly integrated with rustc. Polonius, on the other hand, was integrated long ago. It's not enabled by default, but it is supposed to replace the current borrow checker in Rust 2024.
Posted Dec 16, 2023 20:14 UTC (Sat)
by josh (subscriber, #17465)
[Link]
Posted Dec 17, 2023 0:25 UTC (Sun)
by Phantom_Hoover (subscriber, #167627)
[Link] (14 responses)
Actually I am a bit confused by this part of the article:
> There is a wide range of existing GCC plugins that can aid in debugging, static analysis, or hardening; these work on the GCC intermediate representation.
Wouldn’t the GCC codegen for rustc be able to take advantage of this, given it’s emitting GCC IR?
Posted Dec 17, 2023 1:10 UTC (Sun)
by roc (subscriber, #30627)
[Link] (12 responses)
I think it will also be useful for bootstrapping.
Posted Dec 17, 2023 2:28 UTC (Sun)
by atnot (subscriber, #124910)
[Link] (11 responses)
Posted Dec 18, 2023 1:20 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (10 responses)
Posted Dec 18, 2023 2:13 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Dec 18, 2023 2:13 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (8 responses)
Posted Dec 19, 2023 14:05 UTC (Tue)
by rrolls (subscriber, #151126)
[Link] (7 responses)
Surely the approach of "the final recursive build will discover [any invalid borrows] and fail" isn't 100% safe, because if an invalid borrow exists it's _possible_ (unlikely but possible) that it creates an edge case that allows exactly that invalid borrow through the borrow checker that was just compiled, right?
I'd describe it as 99% safe, since in practice it'd be much more likely that an invalid borrow in the bootstrap compiler would cause some noticeable effect somewhere else and _not_ prevent its own discovery... but I think for 100% safety, you'd have no option but to write the borrow-checker in a lower-level, already-proven language.
Do correct me if I'm wrong, though!
Posted Dec 19, 2023 14:49 UTC (Tue)
by timon (subscriber, #152974)
[Link]
Being borrow-checked is more a property of the input (source code) rather than the output (binary), so I’d say you can reasonably treat it as orthogonal to your bootstrap chain.
The problem of having a not-borrow-checked borrow-checker is similar to the “trusting trust” problem. For countering the trusting trust problem, you can do “diverse double-compiling” [1].
I would propose “borrowing borrow-checkers” as an even better countermeasure here. Just throw your bootstrap compiler Rust source at as many borrow-checkers as you can find, and optimally they should all agree whether your borrows are fine.
Posted Dec 19, 2023 15:17 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (1 responses)
The bootstrap compiler can only be safely used on sources that have been built successfully by a compiler on another platform that never went through a stage without a borrow checker; that way, if there is a bug caused by an invalid borrow (noting that borrow checking is purely a safety net - it does not affect codegen at all), the pre-existing compiler on another platform will have found it.
Posted Dec 19, 2023 16:36 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 19, 2023 16:37 UTC (Tue)
by steveklabnik (guest, #114343)
[Link] (2 responses)
At one point mrustc produced a byte-identical compiler to rustc, I am not sure if that's a one-time thing or if they always do that, though.
Posted Dec 20, 2023 8:20 UTC (Wed)
by rrolls (subscriber, #151126)
[Link] (1 responses)
One more complication, then. Is "safety" _purely_ a property of the source code, or does it depend on the environment it's being compiled in as well? For example, you use rustc to borrow-check your source code on a typical consumer-grade x86-64 linux platform, and now you compile the same code on some esoteric architecture and operating system. Presumably, the environment can affect, for example, how generics get instantiated, which might turn something that was deemed safe on the first platform into something that's not fine at all on the second? Or does Rust insist that code is only deemed safe if every possible instantiation is safe? I suppose you could request cross-compilation in your borrow-check step, which might address this, but it does add some complication.
Posted Dec 20, 2023 13:13 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The part of safety that the borrow checker is responsible for is purely a property of the source code; it confirms (for example) that the shared XOR mutable invariant is maintained by safe Rust.
Anything which can't be machine-checked at the source level as maintaining the safety invariants is supposed to be marked with unsafe, which indicates that the human programmer is responsible for checking that things are safe. This means that anything outside Rust source (such as machine-specific primitives) should be marked as unsafe, and it's on the humans writing a safe wrapper to ensure that the wrapper maintains invariants at all times.
Rust does insist that all instantiations are either safe, or marked appropriately with unsafe; additionally, the caller of an instantiation of a generic that's marked with unsafe must mark their calling block with unsafe to indicate that they're OK with this. You cannot have an unsafe instantiation of a safe generic in safe code - you need the markers to tell Rust that you've thought about this and are going to uphold the language invariants, even if the compiler can't check your working (which is true of platform interfaces, for example, where the compiler can't check the platform behaviour).
Posted Dec 21, 2023 18:55 UTC (Thu)
by dvdeug (guest, #10998)
[Link]
So it's conceivable that a bad bootstrap compiler could break the borrow checker in rustc. It'd be unlikely to carry over to next generations, though.
Posted Dec 17, 2023 19:34 UTC (Sun)
by mcon147 (subscriber, #56569)
[Link]