The performance of the Rust compiler
Sparrow Li presented virtually at RustConf 2024 about the current state of and future plans for the Rust compiler's performance. The compiler is relatively slow to compile large programs, although it has been getting better over time. The next big performance improvement to come will be parallelizing the compiler's parsing, type-checking, and related operations, but even after that, the project has several avenues left to explore.
As projects written in Rust get larger, compilation latency has become increasingly important. The Rust compiler's design makes compilation slow, Li said, when compared to languages with implementations suited to fast prototyping, such as Python or Go. The slowness is not gratuitous (the compiler performs many sophisticated analyses and transformations), but it would be easier to develop with Rust if the language could be compiled faster.
To give some idea of how badly Rust performs compared to other languages, Li presented a graph showing the compile time of programs with different numbers of functions across several languages. Rust's line was noticeably steeper. For small numbers of functions, it was no worse than other languages, but compile times increased much more as the Rust programs grew larger. Improving compile times has become an area of focus for the whole community, Li said.
This year, the speed of the compiler has more than doubled. Most of the gain has come from small, incremental changes by a number of developers. Part of the improvement also stems from work on tools like rustc-perf, which shows developers how their pull requests to the Rust compiler affect compile times. The efforts of the parallel-rustc working group (in which Li participates) are now available in nightly builds of the compiler, and provide a 30% performance improvement in aggregate. Those improvements should land in stable Rust by next year.
There are a handful of other big improvements that will be available soon. Profile-guided optimization provides up to a 20% speedup on Linux. BOLT, a post-link optimizer that was the topic of a talk LWN covered recently, can speed things up by 5-10%, and link-time optimization can add another 5%, Li said. The biggest improvement at this point actually comes from switching to the lld linker, however, which can improve end-to-end compilation time by 40%.
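For readers who want to try lld today, one common approach is to route linking through clang in Cargo's configuration file. This is a sketch, not the only way to do it; it assumes an x86-64 Linux host with clang and lld installed, and the target name will differ on other platforms:

```toml
# .cargo/config.toml: have rustc link via clang, which is in turn told
# to use lld. Assumes clang and lld are installed on the system.
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "linker=clang", "-C", "link-arg=-fuse-ld=lld"]
```

The same flags can also be passed for a single build via the `RUSTFLAGS` environment variable.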
The majority of the speedups have not been from big-ticket items, however. "It is the accumulation of small things" that most improves performance, Li said. These include introducing laziness to some areas, improved hashing and hash tables, changing the layout of the types used by the compiler, and more. Unfortunately, code optimizations like these require a good understanding of the compiler, and become rarer over time as people address the lower-hanging fruit.
So, to continue improving performance, at some point the community will need to address the overall design of the compiler, Li said. Parallelization is one promising avenue. Cargo, Rust's build tool, already builds different crates in parallel, but it is often the case that a build will get hung up waiting for one large, commonly-used crate to build, so parallelism inside the compiler is required as well. There is already some limited parallelism in the compiler back-end (the part that performs code generation), but the front-end (which handles parsing and type-checking) is still serial. The close connection between different front-end components makes parallelizing it difficult — but the community would still like to try.
Proposed in 2018 and established in 2019, the parallel-rustc working group has been planning how to tackle the problem of parallelizing the front-end of the compiler for some years. The group finally resolved some of the technical difficulties in 2022, and landed an implementation in the nightly version of the compiler in 2023. The main focus of its work has been the Rust compiler's query system, which is now capable of taking advantage of multiple threads. The main technical challenge to implementing that was a performance loss when running in a single-threaded environment. The solution the working group came up with was to create data structures that can switch between single-threaded and thread-safe implementations at run time. This reduced the single-threaded overhead to only 1-3%, which was judged to be worth it in exchange for the improvement to the multithreaded case.
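The idea of switching representations at run time can be illustrated with a short sketch. This is not the actual rustc implementation (rustc's real types live in its internal data-structure crate and need unsafe code to assert thread-safety for the single-threaded variant); it only shows the dispatch, with the single-threaded path avoiding atomic operations entirely:

```rust
use std::cell::RefCell;
use std::sync::Mutex;

// A lock-like container that picks its representation when created,
// based on a flag known at startup (e.g. the requested thread count).
enum DynLock<T> {
    // Cheap interior mutability, no atomic operations.
    Single(RefCell<T>),
    // A real lock, safe to use from many threads.
    Parallel(Mutex<T>),
}

impl<T> DynLock<T> {
    fn new(value: T, parallel: bool) -> Self {
        if parallel {
            DynLock::Parallel(Mutex::new(value))
        } else {
            DynLock::Single(RefCell::new(value))
        }
    }

    // Run `f` with exclusive access to the contents; only the
    // Parallel variant pays for synchronization.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        match self {
            DynLock::Single(cell) => f(&mut *cell.borrow_mut()),
            DynLock::Parallel(mutex) => f(&mut *mutex.lock().unwrap()),
        }
    }
}

fn main() {
    // Single-threaded mode: RefCell underneath, no locking cost.
    let counter = DynLock::new(0u32, false);
    counter.with(|v| *v += 1);
    println!("counter = {}", counter.with(|v| *v));
}
```

Because the mode is fixed once at startup, every later access is a cheap branch rather than a full synchronization, which is how the working group kept the single-threaded overhead down to 1-3%.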
The query system was not the only component that needed to adapt, however. The working group also needed to change the memory allocator — while the default allocator was thread-safe, having too many threads was causing contention. The group solved this by using separate allocation pools for each thread, but Li warned that this wasn't the only source of cross-thread contention. When the compiler caches the result of a query, that result can be accessed from any thread, so there is still a good deal of cross-thread memory traffic there. The working group was able to partially address that by sharding the cache, but there's still room for improvements.
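Sharding splits one contended lock into many independent ones. The sketch below is a hypothetical illustration of the technique, not rustc's own sharded cache: the key's hash selects one of several independently locked maps, so threads that touch different shards never contend with each other:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const SHARD_COUNT: usize = 16;

// A cache split into independently locked shards to reduce contention.
struct ShardedCache<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedCache<K, V> {
    fn new() -> Self {
        let shards = (0..SHARD_COUNT)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        ShardedCache { shards }
    }

    // Pick the shard responsible for this key from its hash.
    fn shard(&self, key: &K) -> &Mutex<HashMap<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[hasher.finish() as usize % SHARD_COUNT]
    }

    fn insert(&self, key: K, value: V) {
        self.shard(&key).lock().unwrap().insert(key, value);
    }

    fn get(&self, key: &K) -> Option<V> {
        self.shard(key).lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let cache: ShardedCache<String, u32> = ShardedCache::new();
    cache.insert("query-result".to_string(), 42);
    println!("hit: {:?}", cache.get(&"query-result".to_string()));
}
```

Sharding only reduces contention, of course; results still cross threads, which is why Li said there is room left for improvement.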
While performance improves when using up to eight threads in testing, Li said, performance actually goes down between eight and sixteen threads; there are too many inefficiencies from synchronization. The project has considered switching to a work-stealing implementation so that threads do not spend as much time waiting for each other, but that is hard to do without introducing deadlocks. Finally, some front-end components are still single-threaded — particularly macro expansion. At some point, those will need to be parallelized as well.
In all, though, Li thought that Rust's compilation performance has "a shiny future". There are a handful of approaches that still need to be explored, such as improving the linker and using thin link-time optimization, but the community is working hard on implementing these. Rust's incremental compilation could also use some improvement, Li said. C++ does a better job there, because Rust's lexing, parsing, and macro expansion are not yet incremental.
Li finished the talk with a few areas that are not currently being looked into but might be fruitful to investigate: better inter-crate sharing of compilation tasks, cached binary crates, and refactoring the Rust compiler to reduce the amount of necessary global context. These "deserve a lot of deep thought", Li said. Ultimately, while the Rust compiler's performance is worse than that of many other languages' compilers, it is something that the project is aware of and actively working on. Performance is improving and, despite the easy changes becoming harder to find, is likely to continue improving.
| Index entries for this article | |
|---|---|
| Conference | RustConf/2024 |
