
Distributions looking at LLVM

By Jake Edge
March 21, 2012

The LLVM compiler infrastructure project and its Clang C front-end have been making strides over the last few years, to the point where some distributions are looking into using these tools more widely. We have already seen efforts to build the Linux kernel with Clang, but members of the Fedora and Debian communities (at least) are discussing going beyond that and building the entire distribution with an LLVM-based toolchain. While there are obvious benefits to trying that, it will likely be a ways off—if ever—before the benefits outweigh the costs of such a move.

The LLVM project started in 2000, but it was its adoption by Apple in 2005 that spurred much of its growth. The notoriously GPL-averse company has wanted to move away from using GCC for some time, and the BSD-ish licensed LLVM is the path that it has chosen. It is more than just a licensing issue, though, as LLVM has some technical advantages as well. But it is thought that GCC's move to GPLv3, with its more explicit patent provisions and anti-Tivo-ization language, has made the GCC to LLVM move that much more important to Apple. In any case, Xcode 4, the most recent version of the tool set shipped for building Mac OS X and iOS applications, now includes LLVM rather than GCC.

That change led "jonathan" to post a query to the fedora-devel mailing list about the status of using LLVM for building Linux packages. The query was a bit cryptic, but it spurred an interesting discussion about where LLVM is, and why (or why not) a distribution like Fedora might be interested in heading down that path. Matthew Garrett pointed out that LLVM is already used by the hfsplus-tools package (which provides utilities for the HFS+ filesystem) and the Mesa software rasterizer (llvmpipe), but beyond that:

In terms of it being the general compiler - it needs to work on all the architectures we care about, it needs to have a level of maintenance in Fedora at least as good as gcc, it needs to build better code than gcc and (most importantly) it needs somebody to actually take responsibility for proving all of that and making the transition happen.

In essence, Garrett is outlining the requirements for any new feature to be adopted into Fedora. Certainly performance is one of the advantages that is touted for LLVM; Apple claims that it compiles twice as fast as GCC while producing faster code. Undoubtedly, there are some programs where that is the case, but Adam Jackson has done some testing with the X server, and didn't find any huge performance increases for the LLVM-generated code:

I have actually tried building xserver with clang and running the standard set of microbenchmarks. I found one relevant path where the clang build was ~15% faster [1]. Something like 60% of the rest were within ±3%. For everything else clang was uniformly worse by usually about 5%.

This isn't especially surprising. Both llvm and gcc have a robust set of high quality optimization passes. Changing compiler is in this sense little different from changing CFLAGS. [...] The performance problems in Linux - in software in general - are almost always algorithmic, and no compiler is going to magically fix broken algorithms.

(Jackson's footnote says that he should re-run the tests and file a GCC bug.)

In addition to Jackson's tests, Vladimir Makarov has done some benchmarking of GCC vs. LLVM (see the links in the lower left) that doesn't seem to show any major performance advantages for LLVM; in fact, GCC looks like it more than holds its own. In addition, as he notes, the compilation speed comparison is often done with both compilers using the -O2 optimization level, but it is fairer to compare GCC's -O1 with LLVM's -O2 in terms of generated code quality. When doing so, the 2x speed increase touted by Apple disappeared in his tests.

Debian developer Sylvestre Ledru has run an experiment rebuilding the Debian archive with LLVM. His focus was not on benchmarks, rather it was looking at how easily LLVM could handle the large diversity of C/C++ code in the archive. He was surprised to find relatively few problems: "I was expecting many issues and bugs caused by clang but I have been surprised to notice that most of the issues are either difference in C standard supported, difference of interpretation or corner cases."

Of the 15,600+ packages built, nearly 1400 failed and Ledru documented the kinds of problems he found. Some of the failures were for things like warnings that Clang emitted that GCC didn't, resulting in compilation failure because of the -Werror flag (turn all warnings into errors). In addition, he pointed out that the problems building packages are being fixed rapidly. The Clang 2.9 release in September 2011 failed on 14.5% of packages, while the 3.0 release from January only failed on 8.8%.

Those warnings produced by Clang are actually part of what Ledru was after with his test. One of the aims of LLVM is to generate better warnings and diagnostic information, which will be useful even for packages and distributions that never intend to use LLVM for production. In the end, Ledru concluded:

My personal opinion is that clang is now stable and good enough to rebuild most of the packages in the Debian archive, even if many of them will need minor tweaks to compile properly. In the next few years, coupled with better static analysis tools, clang might replace gcc/g++ as the C/C++ compiler used by default in Linux and BSD distributions.

Whether that happens remains to be seen, of course, and in any event, it's not likely to happen overnight. But LLVM is definitely improving, and some of the BSDs are working on switching permanently (FreeBSD, for example; OpenBSD still seems most interested in pcc). Not having a BSD-licensed compiler that can build the kernel and user space has always been somewhat controversial in BSD-land, so switching to LLVM, which has a non-copyleft license (the University of Illinois/NCSA Open Source License), would be a step in the right direction.

There are definitely still plenty of hurdles to clear, especially for a distribution like Debian that supports lots of different architectures. GCC has a multi-year head start on supporting various CPU architectures, so LLVM may not be available for all of Debian's needs. Jackson pointed out some other deficiencies with LLVM, at least from his perspective:

Also, LLVM doesn't support anything newer than DWARF3. I'm not thrilled about the idea of generating worse code _and_ worse debugging info. Particularly not if it means switching to a compiler written in a far worse language, with far less tribal knowledge in the community I have to interact with.

One thing that the rise of LLVM has done is to provide competition for GCC, which has certainly been helpful in pushing GCC in new and interesting directions. As Jackson put it:

It's nicely lit a fire under gcc's ass about plugins being a thing that we really have needed for the last 20+ years, dammit. Maybe someday it'll prompt gcc into being usable as a JIT too. It happens to be the code generation backend for a number of languages, and it's an okay JIT which is pretty sweet for things like llvmpipe.

While the Linux kernel has been mostly built using LLVM, it is still a work in progress. GCC is still required to build some parts and there are changes needed to LLVM before that can change. Jeff Garzik said that he has been working on it, but "LLVM still needs several obscure compiler changes before we can even boot a no-op kernel".

There is little question that experiments and tests with LLVM are a good thing. Various bugs, in packages and compilers, will be ferreted out and both GCC and LLVM will improve. Some are concerned that Apple will rule the project in ways that could be detrimental to other projects (a la CUPS, which was mentioned by several fedora-devel posters), but there is little evidence of that occurring. It is free software, so, if that happens, there will be no barriers to continuing development. A bigger problem could be patents, which are not addressed by the LLVM license—some day a contributing patent-holder could potentially start suing. But that isn't a problem that is confined to LLVM.

It will be interesting to see how the adoption of LLVM goes. It seems likely to start with FreeBSD, but Linux distributions may eventually try it out as well. There will need to be compelling reasons to do so, but with the progress the compiler suite has made, those reasons may not be all that long in coming. Until correctly working kernels can be built, distributions obviously can't switch over completely. But a complete switch is not necessarily a requirement, and there is nothing stopping interested distributions from building some of their packages with LLVM today.




Distributions looking at LLVM

Posted Mar 22, 2012 7:55 UTC (Thu) by halla (subscriber, #14185) [Link] (2 responses)

LLVM is also used in OpenGTL (http://opengtl.org) -- which is a project that provides a free and unencumbered version of AmpasCTL, which cannot be used in free software because of its weird license, and of Hydra, which is not open at all.

OpenGTL in turn is used by Krita (but there are also bindings for Gegl) to provide the basic building blocks for the HDR colorspaces, as well as filters and pixel generators. It works really, really well, and makes developing new color models or filters really easy. The OpenCTL and OpenShiva "scripts" are compiled to native code once and then execute just as fast as native C++ filter implementations.

Distributions looking at LLVM

Posted Mar 22, 2012 10:43 UTC (Thu) by danieldk (guest, #27876) [Link] (1 responses)

Cool! Let me ask a naive question: does this also mean that e.g. OpenShiva + LLVM can generate different code on the fly if the user has an appropriate GPU available for computation?

Distributions looking at LLVM

Posted Mar 22, 2012 11:47 UTC (Thu) by halla (subscriber, #14185) [Link]

Not a really naive question -- it's not possible yet, but that's just because the relevant backend hasn't been written. There's no built-in limitation, except that we haven't had time! We sort of feel that this would be a perfect project for an advanced student working on a thesis :-).

Distributions looking at LLVM

Posted Mar 22, 2012 13:13 UTC (Thu) by jwakely (subscriber, #60262) [Link] (3 responses)

Clang doesn't have a C++ stdlib that works on GNU/Linux. I'm sure that could change, but it's unlikely to be a priority for Apple employees, and LLVM's libc++ is developed by and for Apple.

Switching a distro to use clang-with-libstdc++ doesn't remove the dependency on GCC but adds a dependency on a combination that isn't properly supported by the vendor of clang or by libstdc++ upstream. That combination is likely to be fragile without an influx of libstdc++ contributors who care about clang, so any distro making the switch would need to do the work themselves, rather than expecting upstream GCC maintainers to help a non-copyleft compiler steal our lunch.

Distributions looking at LLVM

Posted Mar 29, 2012 12:47 UTC (Thu) by gowen (guest, #23914) [Link] (2 responses)

Absolutely, this was the showstopper for me, last time I tried clang. Any C++ header that (implicitly) included <type_traits> (IIRC) barfed on the fact that in the g++ headers these are implemented using g++'s variadic macro support. That rules out (among other things) a fair few STL containers like <map>.

Distributions looking at LLVM

Posted Mar 29, 2012 20:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

FYI, I'm using clang 3.1 (svn snapshot) just fine with Boost versions 1.44.0 through 1.49.0 (Python, Filesystem, Threads, Graph, and other libraries) and the codebase itself uses quite a few STL containers. I haven't tested 3.0 (older Boosts aren't compiling with GCC 4.7.0, so testing is harder) and 2.8 chokes on Boost.

Distributions looking at LLVM

Posted Mar 30, 2012 12:17 UTC (Fri) by jwakely (subscriber, #60262) [Link]

(I assume you mean variadic templates, not macros.)

Some time ago Fedora shipped a version of clang (2.8 IIRC) that was completely incompatible with the system GCC headers, which was silly - they should have shipped an older libstdc++ (alongside the system one) just for use by clang. Clang 3.0 and later have far fewer issues and work well with all but the very bleeding-edgiest libstdc++ headers.

Distributions looking at LLVM

Posted Mar 22, 2012 13:44 UTC (Thu) by tshow (subscriber, #6411) [Link] (5 responses)

I've been using XCode for a while now (doing iOS versions of our games...), and I wouldn't exactly describe the switch from gcc to llvm as a positive one. Granted, some of the problem is that XCode wasn't very good to start with and gets worse with every version, but some of it is definitely llvm as well.

One of the things I keep hearing about llvm is that the error reporting is supposed to be top notch; it's supposed to be really good at pinpointing the actual error instead of pointing to wherever the compiler finally gave up.

Sadly, in practice, llvm is the worst compiler I've worked with for errors since... well, the early 90s, at least. I've had a typo in a variable name in a single location generate a cascade of hundreds of errors, *none* of which referenced the actual syntax error. Leave out a semicolon by accident and you'll see an error cascade. Put 1,0f instead of 1.0f, error cascade.

Somehow none of these trip up gcc at all.

Debugging was a lot more reliable under gcc as well. Since the llvm switch, two times out of three trying to watch a variable brings XCode down in flames.

Code generation seems no better (our games take the same amount of time to build, roughly, and we've seen no measurable performance difference in the games), and we haven't noticed much else different except for the occasional llvm crash.

Perhaps it's better when it isn't running on OSX, but personally I'm filing llvm under "lots of potential, needs to mature".

Distributions looking at LLVM

Posted Mar 22, 2012 21:34 UTC (Thu) by Yorick (guest, #19241) [Link] (4 responses)

My own experience is quite the opposite - Clang's diagnostics are both more useful and precise, and there are also more warnings about potential mistakes. Perhaps you used an old version?

I maintain a medium-sized proprietary application, a few million lines of C and C++ code. We use GCC and are mainly happy with it, but do build with Clang from time to time, just to see if it catches something that GCC didn't - and it often does. I also sometimes use Clang for particularly messy C++ work because of its clearer diagnostics.

Interestingly, Clang builds our code base slower than GCC. This is most likely because our compile times are dominated by a few very large (>100000 lines) machine-generated C files with very large functions, and apparently Clang doesn't handle this quite as well as GCC. For more reasonably-sized files, Clang is faster.

The quality of the generated code for our purposes (branchy integer code, very little FP, nothing vectorisable) is comparable between the compilers - the difference is usually not significant.

Distributions looking at LLVM

Posted Mar 23, 2012 8:54 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

We use GCC and are mainly happy with it, but do build with Clang from time to time, just to see if it catches something that GCC didn't - and it often does.

This is biased selection. Both GCC and Clang have cases where one compiler produces garbage and the other gives you a nice and clean message, but if you only run Clang when GCC produced garbage then you are missing cases where GCC gives clean messages and Clang blows up.

Interestingly, Clang builds our code base slower than GCC.

Again: YMMV. Often Clang is faster but in our codebase there is a file which GCC compiles in 50 seconds with full optimization while Clang needs 9 minutes - that's a 10x slowdown (MSVC is two times worse than Clang).

This is most likely because our compile times are dominated by a few very large (>100000 lines) machine-generated C files with very large functions, and apparently Clang doesn't handle this quite as well as GCC.

LOL. Our case is similar, too. Clang and GCC produce code of similar speed and size in the end even if one needs 10x more time than the other.

Distributions looking at LLVM

Posted Mar 23, 2012 14:57 UTC (Fri) by Yorick (guest, #19241) [Link] (1 responses)

This is biased selection. Both GCC and Clang have cases where one compiler produces garbage and the other gives you a nice and clean message, but if you only run Clang when GCC produced garbage then you are missing cases where GCC gives clean messages and Clang blows up.

Certainly — swap GCC and Clang, and my statement would have been equally valid. I'm happy that we have not just one but two free compilers of very high quality, that implement most of the same language extensions and even take the same command-line options.

We do run both GCC and Clang with -Wall -Werror, by the way, forcing ourselves to fix even minor complaints from either compiler. This has proven very effective.

Distributions looking at LLVM

Posted Mar 23, 2012 19:33 UTC (Fri) by oak (guest, #2786) [Link]

It's nice to hear that Clang has significantly improved!

I've used GCC v4.4 and Clang v1.1 / LLVM v2.7 in Debian stable to compile largish C programs and with these old versions GCC provides much superior detection of issues (while the errors Clang reports are more readable).

Which versions of GCC and Clang/LLVM were you using?

Btw. I would recommend adding quite a few extra warning options to GCC & Clang as -Wall misses quite a few things that can go wrong. For example: -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Wcast-qual -Wbad-function-cast -Wpointer-arith -Wwrite-strings -Wformat-security -Wshadow.
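To make the point about -Wall concrete, here is a minimal sketch of the kind of mistake that -Wshadow (one of the options listed above, and not part of -Wall in either compiler) is meant to report. The function names and the validation logic are made up for illustration; they do not come from any of the code bases discussed here.

```c
#include <stddef.h>

/* Prototypes, so the file also stays quiet under -Wmissing-prototypes. */
int validate_buggy(const int *v, size_t n);
int validate_fixed(const int *v, size_t n);

/* Hypothetical check: 0 for a positive value, -1 otherwise. */
static int check_positive(int x)
{
    return x > 0 ? 0 : -1;
}

/* Intended to return -1 if any element fails the check.
 * The inner declaration shadows the outer `status`, so the
 * function returns 0 no matter what the loop found.
 * -Wshadow reports the inner declaration. */
int validate_buggy(const int *v, size_t n)
{
    int status = 0;
    for (size_t i = 0; i < n; i++) {
        int status = check_positive(v[i]);  /* shadows the outer status */
        if (status != 0)
            break;                          /* stop at the first bad value */
    }
    return status;                          /* outer variable: always 0 */
}

/* Same logic with a distinct name for the per-element result. */
int validate_fixed(const int *v, size_t n)
{
    int status = 0;
    for (size_t i = 0; i < n; i++) {
        int rc = check_positive(v[i]);
        if (rc != 0) {
            status = rc;
            break;
        }
    }
    return status;
}
```

Building this with -Wshadow added should point at the inner declaration in validate_buggy(); the fixed variant should compile without a shadowing warning.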

Distributions looking at LLVM

Posted Mar 30, 2012 16:35 UTC (Fri) by mlopezibanez (guest, #66088) [Link]

My own experience is quite the opposite - Clang's diagnostics are both more useful and precise, and there are also more warnings about potential mistakes. Perhaps you used an old version?

I would encourage you and everybody else to report bugs for GCC diagnostics that you find to be worse than Clang's. GCC diagnostics have improved a LOT in the last few releases. In the order of hundreds of patches and probably close to a hundred bugs fixed per release.

Fortunately, many issues are quite trivial, and they can be fixed by changing one line. Unfortunately, the entry barrier for submitting a one-line patch is so huge that very few external contributors ever suggest such patches.

Unfortunately x 2, some issues are not so easy to solve and there are some known limitations of GCC diagnostics that make them look worse than Clang's. Overcoming these limitations does not seem to be a priority for GCC maintainers. It is true that despite the bitching in blogs and forums about how awful GCC diagnostics are, the developers don't actually see that many reports about it. But I think if enough people reported the problems that they find, something would eventually be done about it.

Distributions looking at LLVM

Posted Mar 22, 2012 19:49 UTC (Thu) by jhhaller (guest, #56103) [Link] (3 responses)

The worst thing a compiler can do is generate incorrect code.

I recently saw a case where clang generated incorrect code when a novice programmer used something like "for(d=0.0; d < 5.0; d++)", and d had the wrong value in the loop. Now, I would never use a double as a loop index, but I would rather see an error than wrong code. If all that is done to validate a compiler is to be sure it compiles things properly, this kind of problem won't be found until someone tries to use the software with incorrect code generation. That's not to say one wouldn't have the same issue moving from one release of gcc to another, just that any change in compilers can have unexpected consequences.

Distributions looking at LLVM

Posted Mar 22, 2012 21:47 UTC (Thu) by NAR (subscriber, #1313) [Link] (2 responses)

What was the invalid value? 4.999999 instead of 5.0?

Distributions looking at LLVM

Posted Mar 22, 2012 21:53 UTC (Thu) by jhhaller (guest, #56103) [Link] (1 responses)

No, it was one less than it should have been, like it waited to do the ++ part of d++ until after the loop iteration rather than before, or used the wrong register inside the loop. ++d worked fine.
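The original program is not shown, so the following is only a sketch of a regression check, assuming the loop looks like the one quoted above. In the third expression of a for statement the result of d++ or ++d is discarded, so both forms must give the loop body the same values: 0, 1, 2, 3 and 4. The function names are invented for this example.

```c
#include <stddef.h>

size_t run_postfix(double *seen, size_t cap);
size_t run_prefix(double *seen, size_t cap);

/* Runs the loop with d++ and records each value the body sees.
 * Returns the number of iterations. */
size_t run_postfix(double *seen, size_t cap)
{
    size_t n = 0;
    for (double d = 0.0; d < 5.0; d++) {
        if (n < cap)
            seen[n] = d;
        n++;
    }
    return n;
}

/* Same loop with ++d; a correct compiler produces identical results. */
size_t run_prefix(double *seen, size_t cap)
{
    size_t n = 0;
    for (double d = 0.0; d < 5.0; ++d) {
        if (n < cap)
            seen[n] = d;
        n++;
    }
    return n;
}
```

Comparing the two recorded sequences, at every optimization level in use, is a cheap way to catch this class of miscompilation and makes a compact test case for a bug report.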

Distributions looking at LLVM

Posted Mar 23, 2012 14:48 UTC (Fri) by jezuch (subscriber, #52988) [Link]

Looks like a corner case triggered by code no one would ever write. Did you file a bug report? I would also point out that GCC has tons of bugs like this fixed in every minor release, so it's nothing special (yet).

Building code with multiple compilers

Posted Mar 23, 2012 4:56 UTC (Fri) by pflugstad (subscriber, #224) [Link]

I'm firmly of the opinion that you should try and build your code with as many compilers as possible - GCC (old and new - yes, I've got a GCC 2.95 variant around), ICC, MSVC, etc... Each one has its strengths and weaknesses and each warns about different things. At the end of the day, you end up with better code, which should be everyone's goal.

Some people prefer investing in GPL licensed software instead

Posted Mar 23, 2012 15:06 UTC (Fri) by walex (guest, #69836) [Link]

While for some people like the BSD distributions the non-GPL license of LLVM is an advantage, for people like me it is the opposite: non-GPL/non-copyleft licensed software is something I would rather not invest my own time on.

Indeed I would be probably using one of the BSD based distributions if they were GPL licensed, but I use GNU/Linux in large part because it is GPL-licensed, and it makes me unhappy that there is no practical GPL licensed alternative to the X Window System.

BTW there is another compiler suite that is BSD licensed, the British TenDRA suite, which generates ANDF.

Distributions looking at LLVM

Posted Mar 23, 2012 20:30 UTC (Fri) by scientes (guest, #83068) [Link] (3 responses)

> Apple LLVM is fast. It compiles code twice as quickly as GCC, yet produces applications that also run faster.

So Apple is behind the FUD and misinformation.

Distributions looking at LLVM

Posted Mar 23, 2012 21:37 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

Well, kinda. Apple does not lie, the problem with Apple's “facts” is a tiny GPLv3-related wrinkle: they consider GPLv3 so poisonous they don't ever touch GCC 4.3+. And of course when they compare Apple's LLVM they compare it with Apple's GCC. This means all these comparisons are with a five-year-old version of GCC!

Of course applefans don't know (or don't care) about that fact - that's where FUD comes from.

It is conceivable that eventually Clang will win comparison with GCC even in apples-to-apples style - but this battle will not be easy or fast.

Distributions looking at LLVM

Posted Mar 23, 2012 22:06 UTC (Fri) by scientes (guest, #83068) [Link] (1 responses)

> this battle will not be easy or fast.
especially when they are probably running on hardware that wasn't manufactured 5 years ago, and in that case LLVM probably compiles for CPU features and architecture optimizations that the five-year-old GCC 4.2 couldn't possibly know about.

Distributions looking at LLVM

Posted Mar 26, 2012 11:10 UTC (Mon) by jwakely (subscriber, #60262) [Link]

Apple's GCC isn't 5 years old, it is forked from code that is 5 years old, but they've surely made changes in that time, and could have added support for newer processors.

Distributions looking at LLVM

Posted Mar 23, 2012 23:31 UTC (Fri) by PaXTeam (guest, #24616) [Link] (8 responses)

> Jeff Garzik said that he has been working on it, but "LLVM still needs
> several obscure compiler changes before we can even boot a no-op kernel".

i don't know what he's been doing, but i've been compiling linux with clang for almost 2 years now and since clang v3.0+few commits it can be done without any clang/llvm patches, only the linux side needs patching.

Distributions looking at LLVM

Posted Mar 23, 2012 23:43 UTC (Fri) by mstefani (guest, #31644) [Link] (5 responses)

I think he tries to use it with sparse as the front-end.

Distributions looking at LLVM

Posted Mar 24, 2012 8:41 UTC (Sat) by PaXTeam (guest, #24616) [Link] (4 responses)

but that means that the actual code optimization/generation part (llvm) is the same between clang and his sparse based work, so if the former can produce a working kernel (and has been able to do so for quite some time now) then his tool should be able to as well. sure, there're some gcc/gas features that clang/llvm don't have yet but they can be patched around on the linux side (my diff is about 60kB, 60 files changed, 258 insertions(+), 213 deletions(-)).

Distributions looking at LLVM

Posted Mar 24, 2012 18:54 UTC (Sat) by rahulsundaram (subscriber, #21946) [Link] (3 responses)

Instead of speculating, you can just send an email and coordinate.

Distributions looking at LLVM

Posted Mar 24, 2012 21:45 UTC (Sat) by PaXTeam (guest, #24616) [Link] (2 responses)

err, i'm not interested in sparse ;).

Distributions looking at LLVM

Posted Mar 25, 2012 1:09 UTC (Sun) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

You were wondering what he was doing in your first comment in this thread. So I assumed you were interested in the answer.

Distributions looking at LLVM

Posted Mar 25, 2012 15:12 UTC (Sun) by PaXTeam (guest, #24616) [Link]

does rhetorical question mean anything to you? :P obviously if clang/llvm work fine (for me) then the problems must be somewhere else. now that it turned out to be the sparse/llvm interface, i consider the matter settled.

Distributions looking at LLVM

Posted Mar 25, 2012 20:28 UTC (Sun) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

That may represent a difference in emphasis. If your main goal is to get the compile working, you may be willing to patch whatever gets you there with the least effort, whether it's the compiler or the kernel source. But if your main interest is in improving the compiler so it can compile anything that GCC can, you have to stick with a vanilla kernel source and keep patching the compiler until it works.

Distributions looking at LLVM

Posted Mar 25, 2012 22:53 UTC (Sun) by PaXTeam (guest, #24616) [Link]

> But if your main interest is in improving the compiler so it can compile anything that GCC can[...]

clang developers do not want to achieve this, see http://clang.llvm.org/compatibility.html and http://clang.llvm.org/docs/UsersManual.html for some of the details (linux gets bitten by VLAs in structures and nested functions, among others).
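For readers who have not met these two GCC extensions, here is a rough sketch of what "patching around them" can look like in ordinary C. It is not taken from the kernel or from any actual patch set; the names are invented, the GCC-only forms appear only in comments, and the heap allocation in the second half is just one possible replacement (code that must stay on the stack would need a different approach).

```c
#include <stdlib.h>
#include <string.h>

/* GCC-only: a nested function that captures `limit` from its parent.
 *
 *   size_t count_above(const int *v, size_t n, int limit) {
 *       int above(int x) { return x > limit; }
 *       ...
 *   }
 *
 * Portable form: a file-scope helper that receives the captured
 * state as an explicit argument. */
static int above(int x, int limit)
{
    return x > limit;
}

size_t count_above(const int *v, size_t n, int limit);
size_t count_above(const int *v, size_t n, int limit)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)above(v[i], limit);
    return hits;
}

/* GCC-only: a variable-length array inside a structure.
 *
 *   void f(size_t n) { struct { size_t len; char buf[n]; } s; ... }
 *
 * Portable form: a C99 flexible array member, sized when the
 * object is allocated. */
struct packet {
    size_t len;
    char buf[];             /* flexible array member */
};

struct packet *packet_new(const char *data, size_t len);
struct packet *packet_new(const char *data, size_t len)
{
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->buf, data, len);
    return p;               /* caller releases it with free() */
}
```

Both rewrites compile as standard C99, so they build with either compiler.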


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds