Cranelift code generation comes to Rust

Posted Mar 16, 2024 10:51 UTC (Sat) by HadrienG (subscriber, #126763)
In reply to: Cranelift code generation comes to Rust by willy
Parent article: Cranelift code generation comes to Rust

If I understand the article correctly, the Cranelift design is actually more amenable to parallelization than LLVM's.

In a fixed-pass-order design like LLVM's, there is an inherent sequential dependency chain: each pass must run to completion before the next one can start. Each pass can, in principle, use parallelism internally, but parallelizing tiny workloads that run for milliseconds usually gives disappointing results, because task spawning and synchronization overheads dwarf the benefits of the extra parallelism.
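
A minimal sketch of such a fixed-order pipeline, in Rust. The `Expr` type and the two passes (`fold_constants`, `simplify_identities`) are hypothetical toys, not LLVM code; the point is that `run_pipeline` hands each pass the complete output of the previous one, so the passes form a strict chain.

```rust
// Toy IR (hypothetical): a tiny expression tree, not LLVM's or Cranelift's IR.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(i64),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

use Expr::*;

// Pass 1: evaluate operations whose operands are both constants.
fn fold_constants(e: Expr) -> Expr {
    match e {
        Add(a, b) => match (fold_constants(*a), fold_constants(*b)) {
            (Const(x), Const(y)) => Const(x + y),
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Mul(a, b) => match (fold_constants(*a), fold_constants(*b)) {
            (Const(x), Const(y)) => Const(x * y),
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

// Pass 2: algebraic identities x + 0 -> x and x * 1 -> x.
fn simplify_identities(e: Expr) -> Expr {
    match e {
        Add(a, b) => match (simplify_identities(*a), simplify_identities(*b)) {
            (x, Const(0)) | (Const(0), x) => x,
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Mul(a, b) => match (simplify_identities(*a), simplify_identities(*b)) {
            (x, Const(1)) | (Const(1), x) => x,
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}

type Pass = fn(Expr) -> Expr;

// Fixed pass order: each pass consumes the previous pass's finished output,
// so pass N+1 cannot start until pass N has completed.
fn run_pipeline(ir: Expr, passes: &[Pass]) -> Expr {
    passes.iter().fold(ir, |ir, pass| pass(ir))
}
```

With `x + (0 * 5)` as input, running `[fold_constants, simplify_identities]` yields `x`. Swapping the two passes leaves `x + 0` behind, because the identity pass runs before the folded zero exists, which is another consequence of the strictly sequential design.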

In Cranelift's E-graph-based design, on the other hand, it is in principle possible to repeatedly run all passes in parallel on the current E-graph[1] until a fixed point is reached. This will not use CPU time as efficiently as running them sequentially: each optimization pass will see fewer new E-graph nodes as input on each run, so more pass runs will be needed to reach the fixed point, which increases the total cost of starting and ending passes. But if you are latency-bound (no other compilation unit is being built concurrently), using CPUs inefficiently can be better than not using them at all.
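
A toy model of this round-based scheme, under stated assumptions: a plain set of terms stands in for the E-graph (no equivalence classes, no union-find), each "pass" is a hypothetical rewrite rule applied at the root of every known term, and one scoped thread is spawned per rule per round. This is an illustration of the idea, not code from Cranelift.

```rust
use std::collections::HashSet;
use std::thread;

// Toy terms; a HashSet of these stands in for a real E-graph.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Term {
    Var(&'static str),
    Const(i64),
    Mul(Box<Term>, Box<Term>),
}

// A "pass" here is a rewrite rule applied to the root of one term.
type Rule = fn(&Term) -> Option<Term>;

// a * b -> b * a
fn commute_mul(t: &Term) -> Option<Term> {
    match t {
        Term::Mul(a, b) => Some(Term::Mul(b.clone(), a.clone())),
        _ => None,
    }
}

// a * 1 -> a (only matches a constant 1 on the right)
fn mul_one(t: &Term) -> Option<Term> {
    match t {
        Term::Mul(a, b) if **b == Term::Const(1) => Some((**a).clone()),
        _ => None,
    }
}

/// Runs every rule in parallel on a snapshot of the current terms, merges the
/// results, and repeats until a round adds nothing new (a fixed point).
/// Returns the final term set and the number of rounds taken.
fn saturate(initial: Term, rules: &[Rule]) -> (HashSet<Term>, usize) {
    let mut terms = HashSet::from([initial]);
    let mut rounds = 0;
    loop {
        rounds += 1;
        let snapshot = &terms;
        // thread::scope joins every spawned thread before returning,
        // which plays the role of the global join between rounds.
        let discovered: Vec<Term> = thread::scope(|s| {
            let handles: Vec<_> = rules
                .iter()
                .map(|&rule| s.spawn(move || snapshot.iter().filter_map(rule).collect::<Vec<_>>()))
                .collect();
            handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
        });
        let before = terms.len();
        terms.extend(discovered);
        if terms.len() == before {
            return (terms, rounds);
        }
    }
}
```

Starting from `1 * x` with `[commute_mul, mul_one]`, `mul_one` can only produce `x` in the second round, because the term it needs (`x * 1`) is created by `commute_mul` during the first round; a third round is then needed just to confirm the fixed point. Spawning OS threads on every round is itself the kind of overhead mentioned earlier, so a practical version would presumably reuse a thread pool.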

Assuming this pass parallelization scheme works well, the running time would eventually be bottlenecked by the final step, which selects the fastest representation out of the E-graph. But if that step is parallelizable too (and it is, if it works by assigning each node a score and searching for the lowest score, or by comparing nodes with each other), that's not an issue.
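
A sketch of a parallel "score and take the minimum" selection step. It assumes the candidate representations have already been enumerated as whole expressions and uses a hypothetical cost model (`cost`, with multiplication assumed to cost four times as much as an add or a shift). Real E-graph extraction chooses per equivalence class and is more involved; this only shows that scoring and minimum-finding split cleanly across threads.

```rust
use std::thread;

#[derive(Debug, PartialEq)]
enum Term {
    Var(&'static str),
    Const(i64),
    Add(Box<Term>, Box<Term>),
    Mul(Box<Term>, Box<Term>),
    Shl(Box<Term>, u32),
}

// Hypothetical cost model; real backends use target-specific costs.
fn cost(t: &Term) -> u32 {
    match t {
        Term::Var(_) | Term::Const(_) => 1,
        Term::Add(a, b) => 1 + cost(a) + cost(b),
        Term::Mul(a, b) => 4 + cost(a) + cost(b),
        Term::Shl(a, _) => 1 + cost(a),
    }
}

// Split the candidates into chunks, find each chunk's cheapest member on its
// own thread, then reduce the per-chunk winners to a single minimum.
fn cheapest(candidates: &[Term], threads: usize) -> Option<&Term> {
    if candidates.is_empty() {
        return None;
    }
    let threads = threads.max(1);
    let chunk = (candidates.len() + threads - 1) / threads;
    thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().min_by_key(|t| cost(t))))
            .collect();
        handles
            .into_iter()
            .filter_map(|h| h.join().unwrap())
            .min_by_key(|t| cost(t))
    })
}
```

For the equivalent candidates `x * 2`, `x + x` and `x << 1`, the assumed costs are 6, 3 and 2, so `cheapest` picks `x << 1`.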

Ultimately, assuming sufficient L3 cache capacity, larger builds will probably get better overall performance from coarser-grained parallelism across compilation units. I wonder how well build systems will cope with the mix of parallelism levels that results from building multiple compilation units in parallel while also compiling each individual unit in parallel.

---

[1] Fine-grained E-graph write synchronization can be used to ensure that each pass sees as many E-graph nodes from other passes as possible on startup. This reduces both the number of times each pass needs to run and the number of global joins (waiting for all passes to finish before moving on), at the expense of a more complex synchronization protocol that will slow down individual accesses to the E-graph.
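
A sketch of this footnote's idea under simplifying assumptions: the same kind of toy term set as a stand-in for the E-graph, one thread per rule, and a single `Mutex` plus `Condvar` guarding the shared state. New terms are published as soon as a rule finds them, so other rules can process them without waiting for a global round to end; the cost moves into lock traffic and termination detection. A real implementation would likely use finer-grained or lock-free structures. `Shared`, `worker` and `saturate_concurrently` are hypothetical names.

```rust
use std::collections::HashSet;
use std::sync::{Condvar, Mutex};
use std::thread;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Term {
    Var(&'static str),
    Const(i64),
    Mul(Box<Term>, Box<Term>),
}

type Rule = fn(&Term) -> Option<Term>;

fn commute_mul(t: &Term) -> Option<Term> {
    match t {
        Term::Mul(a, b) => Some(Term::Mul(b.clone(), a.clone())),
        _ => None,
    }
}

fn mul_one(t: &Term) -> Option<Term> {
    match t {
        Term::Mul(a, b) if **b == Term::Const(1) => Some((**a).clone()),
        _ => None,
    }
}

struct Shared {
    terms: Vec<Term>,    // insertion-ordered log of every distinct term
    seen: HashSet<Term>, // dedup index over `terms`
    cursors: Vec<usize>, // per-rule count of terms already processed
    active: usize,       // rules currently working outside the lock
}

fn worker(i: usize, rule: Rule, lock: &Mutex<Shared>, cvar: &Condvar) {
    let mut guard = lock.lock().unwrap();
    loop {
        let len = guard.terms.len();
        let cur = guard.cursors[i];
        if cur < len {
            // Claim the terms this rule has not seen yet, then work unlocked.
            let batch = guard.terms[cur..len].to_vec();
            guard.cursors[i] = len;
            guard.active += 1;
            drop(guard);
            let found: Vec<Term> = batch.iter().filter_map(rule).collect();
            guard = lock.lock().unwrap();
            // Publish new terms immediately so other rules can pick them up.
            for t in found {
                if guard.seen.insert(t.clone()) {
                    guard.terms.push(t);
                }
            }
            guard.active -= 1;
            cvar.notify_all();
        } else if guard.active == 0 && guard.cursors.iter().all(|&c| c == len) {
            // Nobody is working and every rule has seen every term: fixed point.
            cvar.notify_all();
            return;
        } else {
            // Caught up, but another rule may still add terms: wait for it.
            guard = cvar.wait(guard).unwrap();
        }
    }
}

fn saturate_concurrently(initial: Term, rules: &[Rule]) -> HashSet<Term> {
    let shared = (
        Mutex::new(Shared {
            terms: vec![initial.clone()],
            seen: HashSet::from([initial]),
            cursors: vec![0; rules.len()],
            active: 0,
        }),
        Condvar::new(),
    );
    thread::scope(|s| {
        for (i, &rule) in rules.iter().enumerate() {
            let (lock, cvar) = &shared;
            s.spawn(move || worker(i, rule, lock, cvar));
        }
    });
    shared.0.into_inner().unwrap().seen
}
```

Whatever the thread interleaving, every rule eventually processes every term, so the result is the same fixed point as the round-based version; only the number of global joins disappears.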




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds