The performance of the Rust compiler

By Daroc Alden
October 28, 2024

RustConf 2024

Sparrow Li presented virtually at RustConf 2024 about the current state of and future plans for the Rust compiler's performance. The compiler is relatively slow to compile large programs, although it has been getting better over time. The next big performance improvement to come will be parallelizing the compiler's parsing, type-checking, and related operations, but even after that, the project has several avenues left to explore.

As the projects written using Rust get larger, compilation latency has become increasingly important. The Rust compiler's design causes slow compilation, Li said, when compared to languages with implementations suitable for fast prototyping such as Python or Go. This slowness does have a cause — the compiler performs many sophisticated analyses and transformations — but it would be easier to develop with Rust if the language could be compiled faster.

To give some idea of how badly Rust performs compared to other languages, Li presented a graph showing the compile time of programs with different numbers of functions across several languages. Rust's line was noticeably steeper. For small numbers of functions, it was no worse than other languages, but compile times increased much more as the Rust programs grew larger. Improving compile times has become an area of focus for the whole community, Li said.

This year, the speed of the compiler has more than doubled. Most of this is due to small, incremental changes by a number of developers. Part of this improvement is also from work on tools like rustc-perf, which shows developers how their pull requests to the Rust compiler affect compile times. The efforts of the parallel-rustc working group (in which Li participates) are now available in the nightly language builds, and end up providing a 30% performance improvement in aggregate. Those improvements should land in stable Rust by next year.

There are a handful of other big improvements that will be available soon. Profile-guided optimization provides (up to) a 20% speedup on Linux. BOLT, a post-link optimizer, (which was the topic of a talk that LWN covered recently) can speed things up by 5-10%, and link-time optimization can add another 5%, Li said. The biggest improvement at this point actually comes from switching to the lld linker, however, which can improve end-to-end compilation time by 40%.

The majority of the speedups have not been from big-ticket items, however. "It is the accumulation of small things" that most improves performance, Li said. These include introducing laziness to some areas, improved hashing and hash tables, changing the layout of the types used by the compiler, and more. Unfortunately, code optimizations like these require a good understanding of the compiler, and become rarer over time as people address the lower-hanging fruit.

So, to continue improving performance, at some point the community will need to address the overall design of the compiler, Li said. Parallelization is one promising avenue. Cargo, Rust's build tool, already builds different crates in parallel, but it is often the case that a build will get hung up waiting for one large, commonly-used crate to build, so parallelism inside the compiler is required as well. There is already some limited parallelism in the compiler back-end (the part that performs code generation), but the front-end (which handles parsing and type-checking) is still serial. The close connection between different front-end components makes parallelizing it difficult — but the community would still like to try.

Proposed in 2018 and established in 2019, the parallel-rustc working group has been planning how to tackle the problem of parallelizing the front-end of the compiler for some years. The group finally resolved some of the technical difficulties in 2022, and landed an implementation in the nightly version of the compiler in 2023. The main focus of its work has been the Rust compiler's query system, which is now capable of taking advantage of multiple threads. The main technical challenge to implementing that was a performance loss when running in a single-threaded environment. The solution the working group came up with was to create data structures that can switch between single-threaded and thread-safe implementations at run time. This reduced the single-threaded overhead to only 1-3%, which was judged to be worth it in exchange for the improvement to the multithreaded case.

The query system was not the only component that needed to adapt, however. The working group also needed to change the memory allocator — while the default allocator was thread-safe, having too many threads was causing contention. The group solved this by using separate allocation pools for each thread, but Li warned that this wasn't the only source of cross-thread contention. When the compiler caches the result of a query, that result can be accessed from any thread, so there is still a good deal of cross-thread memory traffic there. The working group was able to partially address that by sharding the cache, but there's still room for improvements.

While performance improves when using up to eight threads in testing, Li said, performance actually goes down between eight and sixteen threads; there are too many inefficiencies from synchronization. The project has considered switching to a work-stealing implementation so that threads do not spend as much time waiting for each other, but that is hard to do without introducing deadlocks. Finally, some front-end components are still single-threaded — particularly macro expansion. At some point, those will need to be parallelized as well.

In all, though, Li thought that Rust's compilation performance has "a shiny future". There are a handful of approaches that still need to be explored, such as improving the linker, using thin link-time optimization, etc., but the community is working hard on implementing these. Rust's incremental compilation could also use some improvement, Li said. C++ does a better job there, because Rust's lexing, parsing, and macro expansion are not yet incremental.

Li finished the talk with a few areas that are not currently being looked into that might be fruitful to investigate: better inter-crate sharing of compilation tasks, cached binary crates, and refactoring the Rust compiler to reduce the amount of necessary global context. These "deserve a lot of deep thought", Li said. Ultimately, while the Rust compiler's performance is worse than many other languages, it's something that the project is aware of and actively working on. Performance is improving, and despite the easy changes becoming harder to find, it is likely to continue improving.

Index entries for this article
Conference	RustConf/2024

Build artefacts aren't cached?

Posted Oct 28, 2024 21:40 UTC (Mon) by Karellen (subscriber, #67644) [Link] (9 responses)

it is often the case that a build will get hung up waiting for one large, commonly-used crate to build

...

Li finished the talk with a few areas [...] that might be fruitful to investigate: [...] cached binary crates

I find it very surprising that this is not already the case.

Build artefacts aren't cached?

Posted Oct 29, 2024 10:03 UTC (Tue) by Wol (subscriber, #4433) [Link] (8 responses)

Okay, I gather the way Rust works makes it hard to create object files, but would it be possible to cache things at the point you cease processing a source file?

Something along the lines of "pull in all include files, run md5 over the resulting artifact, and if there's a cache with that name that's the processed output. If there's no cache, run the 'source to IR' pass, plus any local optimisations, and dump that to cache".

Dunno whether it's practical, or how much it would save, but it would certainly save reprocessing the same source time and again, and if there's a decent bunch of optimisations you can run before you start linking with other files, it would save that too ...

Cheers,
Wol

Build artefacts aren't cached?

Posted Oct 29, 2024 11:46 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

For that, `sccache` works fairly well. It only does source -> object caching, but another intermediate level of caching might be worth it; needs measurements to be sure though.

Build artefacts aren't cached?

Posted Oct 29, 2024 13:10 UTC (Tue) by intelfx (subscriber, #130118) [Link]

> For that, `sccache` works fairly well. It only does source -> object caching

sccache works on crate level, not source file level. That’s exactly the opposite of what GP wanted (and it’s far from perfect, at that).

Build artefacts aren't cached?

Posted Oct 29, 2024 14:31 UTC (Tue) by josh (subscriber, #17465) [Link] (5 responses)

Rust does something like this (but at an even more fine-grained level) when doing incremental compilation. And not just at a file level: it even avoids recompiling functions which haven't changed within files that have.

Build artefacts aren't cached?

Posted Oct 29, 2024 14:59 UTC (Tue) by Wol (subscriber, #4433) [Link] (4 responses)

Is that a well known thing then?

Sounds like that's almost what people are looking for when iterating through the development process.

Cheers,
Wol

Build artefacts aren't cached?

Posted Oct 30, 2024 16:09 UTC (Wed) by josh (subscriber, #17465) [Link] (3 responses)

Incremental is on by default in dev builds (which are the default). They're off by default in release builds, for various reasons. That's something we may want to reconsider.

Build artefacts aren't cached?

Posted Oct 30, 2024 16:31 UTC (Wed) by Wol (subscriber, #4433) [Link]

That actually makes a lot of sense.

So small changes while coding should compile pretty quickly.

Personally, I'd be a lot happier with "incremental=off" for release builds, simply because it gives an aura (maybe false?) of reproducibility. Or rather, it addresses the assumption that dev builds can't be reproducible "because".

Cheers,
Wol

Build artefacts aren't cached?

Posted Oct 30, 2024 16:39 UTC (Wed) by intelfx (subscriber, #130118) [Link] (1 responses)

> They're off by default in release builds, for various reasons. That's something we may want to reconsider.

Is there something that elaborates on these reasons?

Build artefacts aren't cached?

Posted Oct 31, 2024 9:26 UTC (Thu) by taladar (subscriber, #68407) [Link]

I believe the main reason is that some optimizations work better when they can consider the whole instead of just an incremental part.

Are those languages really more suitable for fast prototyping?

Posted Oct 29, 2024 8:51 UTC (Tue) by taladar (subscriber, #68407) [Link] (8 responses)

> The Rust compiler's design causes slow compilation, Li said, when compared to languages with implementations suitable for fast prototyping such as Python or Go.

I question this commonly made statement.

When fast prototyping you usually do not have huge programs.

You also do not know the structure of your program very well so a compiler that checks for you that you don't use your data structures or functions wrong saves a lot of time when compared to only figuring that out when you reach that point in the code at runtime, even if the compiler run is slow compared to other languages.

When prototyping you also want to later be able to read the resulting code to turn it into the actual production implementation, spamming copy&paste code all over the place a la Go error handling or similarly verbose, less expressive languages seems counter-productive to that goal.

And last but not least, you wouldn't want to prototype in a language that is fundamentally very different from one suitable for the final implementation, so the whole idea to prototype in one language and then write the actual implementation in another is questionable at best.

Are those languages really more suitable for fast prototyping?

Posted Oct 29, 2024 10:07 UTC (Tue) by aragilar (subscriber, #122569) [Link] (2 responses)

It naturally depends on what you are prototyping, but as far as I know no-one has written a rust REPL, so being able to get instant feedback (because you just ran the code you're going to run) is much better than waiting some amount of time for the compiler to finish. I've done this numerous times in Python, and if you're willing to install tools like bpython or ipython where you can dump your session to a file, and with a bit of cleanup you have an initial working version.

As for prototyping in one (slow, interpreted) language and then implement the core algorithms (or all of it) in a fast language, this is pretty common in the scientific computing space, as it helps with testing and validation of the fast code.

Are those languages really more suitable for fast prototyping?

Posted Oct 30, 2024 9:15 UTC (Wed) by taladar (subscriber, #68407) [Link] (1 responses)

> as far as I know no-one has written a rust REPL

There are a few projects for that kind of thing but I haven't tried them, just came across them the other day when looking for something unrelated.

https://lib.rs/crates/irust

https://lib.rs/crates/thag_rs

Are those languages really more suitable for fast prototyping?

Posted Oct 31, 2024 9:50 UTC (Thu) by moltonel (subscriber, #45207) [Link]

I sometimes use https://github.com/evcxr/evcxr, it even has a jupyter kernel. Another option is https://play.rust-lang.org/ (which can be run locally as well), it's not a repl but it's great for quick experiments.

Are those languages really more suitable for fast prototyping?

Posted Oct 29, 2024 10:09 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

It matches my experience of working in Rust and Python; I prefer to write in Rust, because I'm rarely genuinely writing a prototype, but rather writing "version 0" of a program, and I will not normally be given time to rewrite in another language later. But if I was genuinely allowed to prototype things most of the time, I'd be writing the prototype to throw away in Python, and writing "version 0" in Rust.

Yesterday, I wrote a short Rust program (10 lines of my code, plus dependencies) to create a PNG representing squared paper. It took about 15 minutes computer time (and about 30 minutes elapsed time), of which about 5 was spent waiting on the compiler to compile my code as the person I was doing this for changed their mind about what the PNG should look like (colour/greyscale, line thickness, square size).

Today, after reading your comment, I redid the same work in Python, including making the same changes; it took under 10 minutes, because changes to my code were near-instant.

Neither program did a significant amount of compute (after all, when prototyping, you don't work on large data sets if you can avoid it), both came up with the same result. The Rust program uses significantly less CPU time when compiled in the debug profile, but neither of them used enough CPU time to make a difference to me as the person running the code. And the Rust program is more robust against errors if I come back round to it - e.g. instead of using a naming convention, as in Python, Rust's type system will prevent me from modifying constants, or assigning to the wrong thing - but I'm very unlikely to ever run this code ever again.

I still used Rust first, because I'm habituated to working in environments where "never run this code again" is a dangerous statement, but if Rust could build faster, it'd also have been the faster language to use. Recompiling my code, there are two things that would speed it up considerably:

Python used precompiled dependencies for PyPNG, and thus didn't cost me 20 seconds just building dependencies.
Python's runtime was about 1 second for each experiment; Rust in dev profile took about 250 ms to determine that no building needed to take place, and about 1 second (according to cargo build) to build and link my program after a change to a constant. On the other hand, Rust takes about 800 ms to run in dev profile, where Python took 1 second - and just for fun, I rebuilt in release profile, which ran in 11 ms, but increased the "change to a constant" time to about 2 seconds.

If Rust could get the wall-clock time for a small change down below 200 ms, it'd be faster than Python for prototyping here on this laptop. And that's not impossible with parallelism; for most of the build time, I have only one CPU thread of the 16 on this laptop saturated; if Rust in dev profile could saturate 5 CPU threads on average, then Rust in dev profile (1 CPU-second) would have a faster edit/compile/run loop than Python does.

And note that I'm mostly ignoring the time it takes to build dependencies here; at least in theory, Rust could eliminate that 20 seconds of highly parallel work by downloading pre-compiled artefacts instead of building from scratch.

Are those languages really more suitable for fast prototyping?

Posted Oct 30, 2024 9:19 UTC (Wed) by taladar (subscriber, #68407) [Link] (1 responses)

Okay, a batch program like that where you run it and it immediately does what you want is about the optimal case for Python though since it minimizes runtime for testing.

If you have some sort of GUI or web app where you need to navigate to your functionality to test each time those few seconds of compilation time are quickly dwarfed by that repetitive walltime-overhead during the test run.

Are those languages really more suitable for fast prototyping?

Posted Oct 30, 2024 13:41 UTC (Wed) by farnz (subscriber, #17727) [Link]

But by the time I'm putting together a GUI that needs testing, and navigation to functionality, Python is still faster because I run the prototype inside pdb or similar, and tweak it "live", instead of having to recompile and navigate again. Once I've finished prototyping, I might have to restart and re-navigate a few times, but that's when I'm going from the prototype to version 0.

Are those languages really more suitable for fast prototyping?

Posted Oct 29, 2024 18:11 UTC (Tue) by Paf (subscriber, #91811) [Link] (1 responses)

I regularly prototype small changes to a large program, so compiler speed is a concern (or at least an annoyance) for me. I think lots of folks who work on large projects are the same?

Are those languages really more suitable for fast prototyping?

Posted Oct 30, 2024 9:21 UTC (Wed) by taladar (subscriber, #68407) [Link]

I don't think changes to existing programs are the type of prototyping people mean when they say they would choose one language or the other for prototyping. That sort of implies that you do not have to stay compatible with an existing code base.

C++ incremental compilation vs Rust’s

Posted Oct 29, 2024 11:54 UTC (Tue) by mmechri (subscriber, #95694) [Link]

> Rust's incremental compilation could also use some improvement, Li said. C++ does a better job there, because Rust's lexing, parsing, and macro expansion are not yet incremental.

I didn’t understand the comparison with C++. What is it that C++ does better than Rust when it comes to incremental compilation?

Why parallelise the parser?

Posted Nov 1, 2024 13:58 UTC (Fri) by anton (subscriber, #25547) [Link] (1 responses)

Does Rust generate huge output from text-macro processing? Or what makes the parsing such a time-consuming operation that parallel processing of that part appears to be a promising way towards smaller elapsed compile times? My expectation is that the compiler spends nearly all of its time in the checking parts and in the LLVM back end.

Why parallelise the parser?

Posted Nov 1, 2024 15:49 UTC (Fri) by atnot (subscriber, #124910) [Link]

My probably outdated memory is that the main point is being able to kick stuff to LLVM sooner. As mentioned the codegen is parallel already, but those threads only get spawned sequentially and all of them have to finish, across all crates, before linking can start. This means that if, say, you get unlucky and a heavy task gets started towards the end, you'll be waiting a while. With a parallel frontend you get to start those tasks earlier, which indeed doesn't affect the bulk of the runtime, but lessens the impact from stagglers. The parser is particularly relevant bottleneck there because you can't really start to divide up the work until you've traced all of the modules and imports and know what you need to compile.