Compiling Rust with GCC: an update
gccrs
Philip Herron and Arthur Cohen started off with a discussion of gccrs, which is a Rust front-end for the GCC compiler. This project, Herron said, was initially started in 2014, though it subsequently stalled. It was restarted in 2019 and moved slowly until funding for two developers arrived thanks to Open Source Security and Embecosm. Since then, development has been happening much more quickly.
Currently, the developers are targeting Rust 1.49, which is a bit behind the state of the art — it was released at the end of 2020. In some ways the developers are still playing catch-up with even that older version; there are a number of intrinsics missing, for example. In other ways they are trying to get ahead of the game; there has been some work done on const generics that, so far, is really only a parser, "but it's a start". An experimental release of gccrs as part of GCC can be expected in May or June 2023.
Cohen talked specifically about building the Rust for Linux project (the integration of Rust with the Linux kernel). That project is currently targeting Rust 1.62, which is rather more recent than the 1.49 that gccrs is aiming at; there is thus a fair amount of ground yet to cover even once gccrs hits its target. There are not many differences in the language itself, he said, but there are more in the libraries. Even with the official compiler, Rust for Linux has to set the RUSTC_BOOTSTRAP variable to gain access to unstable features; gccrs is trying to implement the ones that are needed for the kernel. Generic associated types are also needed. Eventually, the goal is for gccrs to be able to compile Rust for Linux.
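For readers unfamiliar with generic associated types (GATs), the following is a minimal sketch of what the feature allows; it is not taken from the kernel or from gccrs. The names LendingIterator, WindowsMut, and running_sums are hypothetical. GATs were stabilized in Rust 1.65, so this compiles with a current stable rustc, though it would not have done so with 1.49 or 1.62.

```rust
// Hypothetical "lending iterator": each item may borrow from the iterator
// itself, which the standard Iterator trait cannot express.
trait LendingIterator {
    // The generic associated type: Item is parameterized by a lifetime.
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

// Hands out overlapping mutable windows over a slice, one at a time.
struct WindowsMut<'s> {
    data: &'s mut [u32],
    start: usize,
    width: usize,
}

impl<'s> LendingIterator for WindowsMut<'s> {
    type Item<'a> = &'a mut [u32] where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.start + self.width;
        if end > self.data.len() {
            return None;
        }
        let window = &mut self.data[self.start..end];
        self.start += 1;
        Some(window)
    }
}

// Turns a slice into its prefix sums by walking two-element windows.
fn running_sums(values: &mut [u32]) {
    let mut windows = WindowsMut { data: values, start: 0, width: 2 };
    while let Some(w) = windows.next() {
        w[1] += w[0];
    }
}
```

Each window borrows the iterator mutably, so the borrow checker ensures that only one window is alive at a time.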
One thing he pointed out is that gccrs is making no attempt to implement the same compiler flags that the existing rustc compiler uses. That would be a difficult task and those options are "not GCCish". A separate wrapper is being implemented for the Cargo build system to allow it to invoke gccrs rather than rustc.
An important component for the kernel, and for just about everything else, is the libcore library. It includes fundamental types like Option and Result, without which little can be done. The liballoc library, which implements the Vec and Box types among others, is also needed. Cohen noted that this library has been customized for the kernel, but Rust for Linux developer Miguel Ojeda said that the changes are minimal.
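The split between the two libraries can be made visible by spelling out the full paths of the types involved. This is only an illustrative sketch; the helper functions parse_digit, parse_all, and boxed_max are hypothetical and have nothing to do with the kernel's own code.

```rust
// `core` is always available; `alloc` has to be named explicitly.
extern crate alloc;

// Uses only `core`: Result and Option need no heap allocation.
fn parse_digit(c: char) -> core::result::Result<u32, char> {
    c.to_digit(10).ok_or(c)
}

// Vec lives in `alloc`, because it needs a heap allocator.
fn parse_all(input: &str) -> core::result::Result<alloc::vec::Vec<u32>, char> {
    input.chars().map(parse_digit).collect()
}

// Box also lives in `alloc`; Option comes from `core`.
fn boxed_max(values: &[u32]) -> core::option::Option<alloc::boxed::Box<u32>> {
    values.iter().copied().max().map(alloc::boxed::Box::new)
}
```

In ordinary programs these types are reached through the standard library's prelude, which re-exports them from libcore and liballoc.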
Testing is currently done by compiling various projects with gccrs; these include the Blake3 crypto library and libcore 1.49. The rustc test suite is also being used. Plans are to add building Rust for Linux to the testing regime as well.
What else is missing at this point? Herron said that borrow checking is a big missing feature in current gccrs. Opaque types are not yet implemented. Plus, of course, there are a lot of bugs. Cohen added that the test suite needs work. A lot of the tests are intended to fail, so gccrs "passes" them, but for the wrong reason. He is working on adding the proper use of error codes so that only the right kinds of failures are seen as the correct behavior.
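The problem Cohen described, a test that "passes" because the compiler failed for an unrelated reason, can be sketched as follows. This is a hypothetical checker, not the gccrs test harness; the Verdict type and judge function are invented for illustration, and the codes used in the usage note are written in rustc's style.

```rust
// Possible outcomes for a test that is expected to fail to compile.
#[derive(Debug, PartialEq)]
enum Verdict {
    // The compiler rejected the code for the expected reason.
    Pass,
    // The compiler rejected the code, but for some other reason.
    WrongReason,
    // The compiler accepted code that it should have rejected.
    UnexpectedSuccess,
}

// Compare the error code a test expects against the codes actually emitted.
fn judge(expected_code: &str, emitted_codes: &[&str]) -> Verdict {
    if emitted_codes.is_empty() {
        Verdict::UnexpectedSuccess
    } else if emitted_codes.iter().any(|code| *code == expected_code) {
        Verdict::Pass
    } else {
        Verdict::WrongReason
    }
}
```

A harness that only checks for a nonzero exit status cannot tell Pass from WrongReason. With this sketch, judge("E0308", &["E0425"]) yields WrongReason rather than a spurious pass.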
Future plans include a lot of cross-compiler testing. Eventually it would be good to start testing with Crater, which attempts to compile all of the crates found on crates.io, but that will take longer. With regard to borrow checking, Cohen added, they are not even trying to come up with their own implementation; instead they will be integrating Polonius. This, it is hoped, is a harbinger of more code sharing to be done with the Rust community in the future.
The code repository can be found on GitHub.
rust_codegen_gcc
Antoni Boucher then gave an update on his project, which is called rust_codegen_gcc. The rustc compiler is built on LLVM, he began, but it includes an API that allows the substitution of a different backend for code generation. That API can be used to hook libgccjit into rustc, enabling code generation with GCC. The biggest motivation for this work is to support architectures that LLVM cannot compile for. That is needed for Rust for Linux support across all of the architectures the kernel can be built for, and should be useful for other embedded targets as well.
Over the last year, rust_codegen_gcc has been merged into the rustc repository. It has gained support for global variables and 128-bit integers (though not yet in big-endian format). Support for SIMD operations has improved; it can compile the tests for x86-64 and pass most of them. It is also now possible to bootstrap rustc with rust_codegen_gcc, which is a big milestone — but some programs don't compile yet. Alignment support has improved, packed structs are supported, inline assembly support is getting better, and some function and variable attributes are supported. Boucher has added many intrinsics that GCC lacks and fixed a lot of crashes.
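As background on why byte order matters for 128-bit integers: such a value is typically handled as two 64-bit halves, and which half comes first in memory depends on the target's endianness. The sketch below uses only standard u128 methods; the function names halves and memory_layout are hypothetical, and nothing here reflects how rust_codegen_gcc itself lowers these types.

```rust
// Split a 128-bit value into its (high, low) 64-bit halves.
fn halves(value: u128) -> (u64, u64) {
    ((value >> 64) as u64, value as u64)
}

// The sixteen bytes as they would be laid out in memory on a big-endian
// or little-endian target.
fn memory_layout(value: u128, big_endian: bool) -> [u8; 16] {
    if big_endian {
        value.to_be_bytes()
    } else {
        value.to_le_bytes()
    }
}
```

On a big-endian target the high half occupies the first eight bytes; on a little-endian target the low half does, with its own bytes reversed as well.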
Almost all of the rustc tests pass now, and the situation is getting better; most of the failures are with SIMD or with unimplemented features like link-time optimization (LTO). With regard to SIMD, the necessary additions to libgccjit are done, as is the "vector shuffle" instruction. About 99% of the LLVM SIMD intrinsics and half of the Rust SIMD intrinsics have been implemented. A lot of fixes and improvements (128-bit integers, for example) have gone into GCC from this work.
There is, of course, a lot still to be done. Outstanding tasks include unwinding for panic() support, proper debuginfo, LTO, and big-endian 128-bit integers. Support for the new architectures has to be added to the libraries, and SIMD support needs to be expanded beyond x86. There are function and variable attributes, including inline, that are yet to be implemented. Supporting distribution via rustup is also on the list.
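To illustrate what unwinding support has to deliver, here is a small sketch using the standard library's catch_unwind. It assumes the default panic=unwind setting (with panic=abort the process would simply terminate), and the names SetOnDrop and unwind_demo are hypothetical.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

// Records in a shared flag that its destructor has run.
struct SetOnDrop<'a>(&'a AtomicBool);

impl Drop for SetOnDrop<'_> {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

// Returns (the panic was caught, the destructor ran during unwinding).
fn unwind_demo() -> (bool, bool) {
    let dropped = AtomicBool::new(false);
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = SetOnDrop(&dropped);
        panic!("simulated failure");
    }));
    (result.is_err(), dropped.load(Ordering::SeqCst))
}
```

For both values to be true, the generated code must walk back up the stack after the panic and run the destructor of _guard on the way, which is the machinery a code-generation backend has to emit.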
There are a number of things that could be improved in the rustc API. For example, GCC distinguishes between lvalues (assignment targets) and rvalues, while LLVM just has "values"; that mismatch creates difficulties in places. LLVM has a concept of "landing pads" for exceptions, while GCC uses a try/catch mechanism. The handling of basic blocks is implicit in the API, but needs to be explicit in places. More fundamentally, GCC's API is based on the abstract syntax tree, while LLVM's is based on instructions, leading to confusion at times. LLVM uses the same aggregate operations for structs and arrays, while they are different in GCC.
On the libgccjit side, there is room for improvement in type introspection, including attributes. The time required for code generation and the resulting binary size are both worse than with LLVM. There are optimizations that are missed by libgccjit as well.
Boucher demonstrated the compilation of some basic Rust kernel modules with the GCC-backed compiler. Wedson Almeida Filho asked whether any testing had been done with an architecture that is not supported by LLVM, but that has not yet happened. There will probably be "details to deal with" when that test is done, Boucher said.
There are some potential complications for Rust for Linux. The ABI used by the generated code differs on some platforms. There is also the question of whether backports should be done to support older versions of the compiler. That is complicated by the fact that patches to GCC are needed to make rust_codegen_gcc work now.
Herron and Cohen joined Boucher at the end of the session, where they were asked about their timelines. Herron answered that it will require most of a year for gccrs to get to the point where rust_codegen_gcc is now. When asked about compilation times, Cohen said that benchmarks would not be meaningful now, when the focus is still on just getting something to work. He expects that gccrs is "probably very slow" at this point. Boucher said that rust_codegen_gcc is slower than rustc, but that there are optimizations yet to be done to improve the situation.
[Thanks to LWN subscribers for supporting my travel to this event.]
| Index entries for this article | |
|---|---|
| Conference | Kangrejos/2022 |
