Yeah, LLVM is sweet

Posted Apr 17, 2010 0:32 UTC (Sat) by daglwn (guest, #65432)
In reply to: Yeah, LLVM is sweet by emk
Parent article: Dispatches from the compiler front

I hack on LLVM for my job and on my own time. Just a few comments.

> LLVM is about the nicest compiler I've ever hacked on.

Agreed.

> LLVM's decision to use a strongly-typed assembly language with just a few
> opcodes is pure genius. It has a bunch of neat consequences:

It does, but to be fair, LLVM wasn't the first compiler to provide this.

> 1) The retention of basic type information throughout the entire
> compilation process makes it much easier to answer all sorts of
> questions, and to verify that the generated intermediate code makes some
> sort of sense.

Yes. LLVM's Verifier pass helps a ton. It's saved us many times.
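To make the point about type information concrete, here is a minimal toy sketch of the kind of check a verifier can run when every value carries a type. It is not LLVM's Verifier and uses none of its API; the `Instr` record, the `verify` function and the tiny opcode set are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    result: str       # name of the value this instruction defines
    op: str           # opcode, e.g. "add" or "icmp" (toy set, an assumption)
    ty: str           # type the instruction expects of its operands
    operands: tuple   # names of previously defined values


def verify(params, body):
    """Return a list of error strings; an empty list means the body type-checks."""
    types = dict(params)  # value name -> type
    errors = []
    for ins in body:
        for name in ins.operands:
            if name not in types:
                errors.append(f"{ins.result}: use of undefined value {name}")
            elif types[name] != ins.ty:
                errors.append(
                    f"{ins.result}: operand {name} is {types[name]}, expected {ins.ty}"
                )
        # Comparisons produce a 1-bit result; arithmetic keeps the operand type.
        types[ins.result] = "i1" if ins.op == "icmp" else ins.ty
    return errors


params = {"a": "i32", "b": "i32", "f": "float"}
good = [
    Instr("t0", "add", "i32", ("a", "b")),
    Instr("t1", "icmp", "i32", ("t0", "a")),
]
bad = [Instr("t0", "add", "i32", ("a", "f"))]  # mixes i32 and float

print(verify(params, good))
print(verify(params, bad))
```

A pass that accidentally builds the `bad` sequence gets caught right after it runs, rather than several stages later when the backend produces wrong code.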

> 2) The decision to use an assembly language (as opposed to an AST or a
> more exotic representation) makes it easy to dump the output from any
> optimization stage and examine it by hand.

Debatable. I've worked on compilers that use very high-level IRs, and it was usually easier to understand what the compiler was doing. One could grasp larger programs much more easily. There are tradeoffs. LLVM uses a low-level representation because the community wants to expose all kinds of machine-level micro-optimizations. IME, the debugging tools surrounding the compiler are at least as important as the IR itself for fixing bugs.

> 3) The decision to use a _single_ assembly language (instead of the huge
> number of intermediate languages which seem to be used by GCC) makes it a
> lot easier for novices to find their way around the code base, and it
> means that you can build up large libraries of helper functions.

This statement simply isn't true. LLVM does not have a single IR. It has at least five now: the Instruction IR, the ScalarEvolution IR, the SelectionDAG/ScheduleDAG IR, the MachineInstr IR and the MCInst IR (from the MC project).

This isn't necessarily a bad thing. Different representations make manipulation easier in certain phases. One can't represent machine instructions with the higher-level Instruction IR. There is some duplication, however. SCEV passes in particular duplicate a lot of logic that other passes operating on the LLVM IR already have.

> 4) The decision to use a _small_ assembly language means that any given
> optimizer only needs to know about a small, fixed set of instructions.

Again, there are tradeoffs. One is that doing anything machine-specific requires intrinsics, and the optimizer doesn't understand those. There are certainly instructions I would like to see added to the IR (a robust vector representation, for example) but it's not critical right now. Instructions have been added over the course of the project. I predict we will see quite a few new ones over the next several years.
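The intrinsic tradeoff can be shown with a toy constant folder. This is a sketch under stated assumptions, not how LLVM's folding actually works: the tuple-based instruction format, the `FOLDERS` table and the `call.hypothetical.popcount` intrinsic name are all made up for illustration.

```python
# Opcodes the toy optimizer understands. Anything else is treated as opaque.
FOLDERS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def fold(instrs):
    """Constant-fold a straight-line list of (dest, op, args) tuples.

    Returns (known, remaining): the constants proven so far and the
    instructions that could not be folded away.
    """
    known = {}
    remaining = []
    for dest, op, args in instrs:
        # Substitute any operand whose value is already a known constant.
        vals = [known.get(a, a) for a in args]
        if op in FOLDERS and all(isinstance(v, int) for v in vals):
            known[dest] = FOLDERS[op](*vals)
        else:
            remaining.append((dest, op, tuple(vals)))
    return known, remaining


prog = [
    ("x", "add", (2, 3)),
    ("y", "mul", ("x", 4)),
    ("z", "call.hypothetical.popcount", ("y",)),  # hypothetical target intrinsic
    ("w", "add", ("z", 1)),
]

known, remaining = fold(prog)
print(known)
print(remaining)
```

The folder reduces `x` and `y` to constants, but the intrinsic call survives even though its argument is a constant, and everything that depends on it (`w`) is stuck as well. Teaching the optimizer about each intrinsic fixes that, at the cost of the "small, fixed set" property.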

> LLVM is a really sweet compiler, and there's some friendly and super-
> productive hackers working on it. I think it has a great future, with or
> without GCC.

100% agreed. Not only is LLVM used in lots of projects, it's been able to spark a renewed interest in compiler technology among students. This is going to be critical as we keep packing more cores onto chips. The era of "free" speedup via higher clocks is over. The compiler is more important than ever.