By Jake Edge
March 21, 2012
The LLVM compiler infrastructure project and
its Clang C front-end have been making
strides over the last few years, to the point where some distributions are
looking into using these tools more widely. We have already seen
efforts to build the Linux kernel with Clang, but members
of the Fedora
and Debian communities (at least) are discussing going beyond that and
building the
entire distribution with an LLVM-based toolchain. While there are obvious
benefits to trying that, it will likely be a ways off—if
ever—before the
benefits outweigh the costs of such a move.
The LLVM project started in 2000, but it was its adoption by Apple in 2005
that spurred much of its growth. The notoriously GPL-averse company has
wanted to move away from using GCC for some time, and the BSD-ish licensed
LLVM is the path that
it has chosen. It is more than just a licensing issue, though, as LLVM
has some technical advantages as well. But it is thought that GCC's move to
GPLv3, with its more explicit patent provisions and anti-Tivo-ization language, has made the GCC-to-LLVM
move that much more important to Apple. In any case, Xcode 4, the most
recent version of the tool set shipped
for building Mac OS X and iOS applications, now includes LLVM rather
than GCC.
That change led "jonathan" to post a query
to the fedora-devel mailing list about the status of using LLVM for
building Linux packages. The query was a bit cryptic, but it spurred an
interesting discussion about where LLVM is, and why (or why not) a
distribution like Fedora might be interested in heading down that path.
Matthew Garrett pointed out that LLVM is
already used by the hfsplus-tools package (which provides utilities for the
HFS+ filesystem) and the Mesa software rasterizer (llvmpipe), but beyond that:
In
terms of it being the general compiler - it needs to work on all the
architectures we care about, it needs to have a level of maintenance in
Fedora at least as good as gcc, it needs to build better code than gcc
and (most importantly) it needs somebody to actually take responsibility
for proving all of that and making the transition happen.
In essence, Garrett is outlining the requirements for any new feature to be
adopted into Fedora. Certainly performance is one of the advantages that
is touted for LLVM; Apple claims that it compiles twice as fast as GCC
while producing faster code. Undoubtedly, there are some programs where
that is the case, but Adam Jackson has done some testing with the X server,
and didn't find any huge performance increases for the LLVM-generated code:
I have actually
tried building xserver with clang and running the standard set of
microbenchmarks. I found one relevant path where the clang build was
~15% faster [1]. Something like 60% of the rest were within ±3%. For
everything else clang was uniformly worse by usually about 5%.
This isn't especially surprising. Both llvm and gcc have a robust set
of high quality optimization passes. Changing compiler is in this sense
little different from changing CFLAGS. [...] The
performance problems in Linux - in software in general - are almost
always algorithmic, and no compiler is going to magically fix broken
algorithms.
(Jackson's footnote says that he should re-run the tests and file a GCC bug.)
In addition to Jackson's tests, Vladimir Makarov has done some benchmarking of GCC
vs. LLVM (see the links in the lower left) that doesn't seem to show any
major performance advantages for LLVM; in fact, GCC looks like it more
than holds its own. As he notes, the compilation speed comparison is often
done with both compilers using the -O2 optimization level, but
it is fairer to compare GCC's -O1 with LLVM's -O2 in
terms of generated code quality. When doing so, the 2x speed increase
touted by Apple disappeared in his tests.
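To make that methodology concrete, here is a minimal sketch of how such a compile-time comparison might be scripted. It is not Makarov's harness: the helper names (compile_command, time_command, compare) are hypothetical, it assumes gcc and clang are both on the PATH, and it times a single compile of a single file, where a real benchmark would repeat the runs and use a much larger body of code.

```python
import subprocess
import sys
import time


def compile_command(compiler, opt_level, source, output):
    """Build the argument list for compiling one file without linking."""
    return [compiler, opt_level, "-c", source, "-o", output]


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare(source):
    """Time GCC at -O1 and -O2 against Clang at -O2 on one C source file.

    Assumes both compilers are installed; the pairing of GCC's -O1 with
    LLVM's -O2 follows the comparison described in the article.
    """
    results = {}
    for compiler, opt_level in [("gcc", "-O1"), ("gcc", "-O2"), ("clang", "-O2")]:
        output = f"{compiler}{opt_level}.o"
        argv = compile_command(compiler, opt_level, source, output)
        results[(compiler, opt_level)] = time_command(argv)
    return results
```

Calling compare() with the path to a C file returns a dictionary of elapsed times keyed by compiler and optimization level; the pair to look at for Makarov's point is GCC at -O1 versus Clang at -O2.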
Debian developer Sylvestre Ledru has run an experiment
rebuilding the Debian archive with LLVM. His focus was not on
benchmarks; rather, he was looking at how easily LLVM could handle the large
diversity of C/C++ code in the archive. He was surprised to find
relatively few problems: "I was expecting many issues and bugs caused
by clang but I have been surprised to notice that most of the issues are
either difference in C standard supported, difference of interpretation or
corner cases."
Of the 15,600+ packages built, nearly 1400 failed
and Ledru documented the kinds of
problems he found. Some of the failures were caused by warnings that
Clang emitted and GCC didn't, which resulted in compilation failure because of
the -Werror flag (which turns all warnings into errors). In addition, he
pointed out that the problems building packages are being fixed rapidly.
The Clang 2.9 release in September 2011 failed on 14.5% of packages,
while the 3.0 release from January only failed on 8.8%.
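As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The package counts below are the rounded numbers quoted above ("15,600+" and "nearly 1400"), so the results are approximate; the function names are ours and have nothing to do with Ledru's tooling.

```python
# Figures quoted in the article; the package counts are rounded there,
# so anything computed from them is approximate.
PACKAGES_BUILT = 15600
FAILED_BUILDS = 1400
RATE_CLANG_2_9 = 14.5   # percent of packages failing with Clang 2.9
RATE_CLANG_3_0 = 8.8    # percent of packages failing with Clang 3.0


def failure_rate(failed, total):
    """Return the failure rate as a percentage."""
    return 100.0 * failed / total


def relative_drop(old_rate, new_rate):
    """Return how far new_rate fell relative to old_rate, in percent."""
    return 100.0 * (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    rate = failure_rate(FAILED_BUILDS, PACKAGES_BUILT)
    drop = relative_drop(RATE_CLANG_2_9, RATE_CLANG_3_0)
    print(f"rate from rounded counts: {rate:.1f}%")
    print(f"relative drop 2.9 -> 3.0: {drop:.1f}%")
```

Run as a script, this gives a failure rate of about 9.0% from the rounded counts, in line with the 8.8% reported for Clang 3.0, and a relative drop of about 39.3% in the failure rate between the two releases.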
Those warnings produced by Clang are actually part of what Ledru was after
with his test. One of the aims of LLVM is to generate better warnings and
diagnostic
information, which will be useful even for packages and distributions that
never intend to use LLVM for production. In the end, Ledru concluded:
My personal opinion is that clang is now stable and good enough to rebuild most of the packages in the Debian archive, even if many of them will need minor tweaks to compile properly.
In the next few years, coupled with better static analysis tools, clang
might replace gcc/g++ as the C/C++ compiler used by default in Linux and
BSD distributions.
Whether that happens remains to be seen, of course, and in any event, it's
not likely to happen overnight. But LLVM is definitely improving, and some
of the BSDs are working on switching permanently (FreeBSD, for example;
OpenBSD still seems most interested in pcc). Not having a BSD-licensed
compiler that can build the kernel and user space has always been somewhat
controversial in BSD-land, so switching to LLVM, which has a non-copyleft
license (the
University
of Illinois/NCSA Open Source License), would be a step in the right
direction.
There are definitely still plenty of hurdles to clear, especially for a
distribution like Debian that supports lots of different architectures.
GCC has a multi-year head start on supporting various CPU architectures, so
LLVM may not be available for all of Debian's needs. Jackson pointed out
some other deficiencies with LLVM, at least from his perspective:
Also, LLVM doesn't support anything newer than DWARF3. I'm not thrilled
about the idea of generating worse code _and_ worse debugging info.
Particularly not if it means switching to a compiler written in a far
worse language, with far less tribal knowledge in the community I have
to interact with.
One thing that the rise of LLVM has done is to provide competition for GCC,
which has certainly been helpful in pushing GCC in new and interesting
directions. As Jackson put it:
It's nicely lit a fire under gcc's ass about
plugins being a thing that we really have needed for the last 20+ years,
dammit. Maybe someday it'll prompt gcc into being usable as a JIT too.
It happens to be the code generation backend for a number of languages,
and it's an okay JIT which is pretty sweet for things like llvmpipe.
While the Linux kernel has been mostly built using LLVM, that effort is still a work in
progress. GCC is still required to build some parts and there are changes
needed to LLVM before that can change. Jeff Garzik said that he has been working on it, but "LLVM still needs several obscure
compiler changes before we can even boot a no-op kernel".
There is little question that experiments and tests with LLVM are a good
thing. Various bugs, in packages and compilers, will be ferreted out and
both GCC and LLVM will improve. Some are concerned that Apple will rule
the project in ways that could be detrimental to other projects (a
la CUPS, which was mentioned by several
fedora-devel posters), but there is little evidence of that occurring. It
is free software, so, if that happens, there will be no barriers to
continuing development. A bigger problem could be patents, which are not
addressed by the LLVM license—some day a contributing patent-holder
could potentially start suing. But that isn't a problem that is confined
to LLVM.
It will be interesting to see how the adoption of LLVM goes. It seems
likely to start with FreeBSD, but Linux distributions may eventually try it
out as well. There will need to be compelling reasons to do so, but with
the progress the compiler suite has made, those reasons may not be all that
long in coming. Until correctly working kernels can be built,
distributions obviously can't switch over completely. But, a complete
switch is not
necessarily a requirement, and there is nothing stopping interested
distributions from building some of their packages with LLVM today.