By Jonathan Corbet
April 14, 2010
Your editor has recently noticed a string of interesting announcements and
discussions in the GCC and LLVM compiler communities. Here is an attempt
to pull together a look at a few of these discussions, including resistance
to cooperation between the two projects, building an assembler into LLVM,
and more.
The up-and-coming LLVM compiler has irritated some GCC developers for a
while; LLVM apparently comes off as an upstart trying to muscle into
territory which GCC has long owned.
So it's not surprising that occasionally the
relationship between the two projects gets a little frosty.
Consider the case of DragonEgg, a
GCC plugin which replaces the bulk of GCC's optimization and
code-generation system with the LLVM implementation. DragonEgg is clearly
a useful tool for LLVM developers, who can focus on improving the back-end
code while making use of GCC's well-developed front ends.
Jack Howarth recently proposed the addition
of DragonEgg as an official part of the GCC code base. Some developers
welcomed the idea; Basile Starynkevitch, for instance, thought it would make a good example plugin.
But from others came complaints like this:
So, no offense, but the suggestion here is to make this subversive
(for FSF GCC) plugin part of FSF GCC? What is the benefit of this
for GCC? I don't see any. I just see a plugin trying to piggy-back
on the hard work of GCC front-end developers and negating the
efforts of those working on the middle ends and back ends.
It's not clear that this is a majority opinion;
some GCC developers see DragonEgg as an
easy way to try out LLVM code and
compare it against their own. If LLVM comes out on top, GCC developers can
then figure out why or, possibly, just adopt the relevant LLVM code. Those
developers see only benefit in some cooperative competition between the
projects.
Others, though, see the situation as more of a zero-sum game; when viewed
through that lens, cooperation with LLVM would appear to make little
sense. But free software is not a zero-sum game; the more we can learn
from each other, the better off we all are. GCC need not worry about being
displaced by LLVM (or anything else) any time in the near future. Barring
technical issues with the merging of DragonEgg (and none have been
mentioned), accepting the code seems likely to benefit the project in
the end.
In a side discussion, GCC developers wondered why LLVM seems to be more
successful in attracting developers and mindshare in general. One
suggestion was that LLVM has a clear leader who is able to set the
direction of the project, while GCC is more scattered. Others have a
different view; in this context, Ian Lance Taylor's notes are worth a look:
What I do see is that relatively few gcc developers take the time
to reach out to new people and help them become part of the
community. I also see a lot of external patches not reviewed, and
I see a lot of back-and-forth about patches which is simply
confusing and offputting to those trying to contribute. Joining
the gcc community requires a lot of self-motivation, or it takes
being paid enough to get over the obstacles.
There is also the matter of the old code base, the lack of a clean
separation between passes, and, most important, weak internal
documentation.
Some of these issues are being fixed; others will take longer. It seems
clear that attending to these problems is important for the long-term
future of the project.
Lest things look too grim, though, it's worth perusing this
posting from Taras Glek on his success with the GCC "profile-guided
optimization" (PGO) feature. PGO works by building an instrumented binary,
running it on a representative workload to collect profile data, then
rebuilding the program with optimization driven by that profile
information. With Firefox, Taras was able to cut the startup time by one
third and to reduce initial memory use considerably as well. Taras says:
I think the numbers speak for themselves. Isn't it scary how
wasteful binaries are by default? It amazes me that Firefox can
shrug off a significant amount of resource bloat without changing a
single line of code.
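The commands below are not taken from Taras's posting. As a minimal sketch, assuming GCC's -fprofile-generate and -fprofile-use options, the three PGO steps (instrumented build, training run, optimized rebuild) can be tried on a toy C program. The file name pgo_demo.c, the functions count_below() and run_training_workload(), and the PGO_DEMO_MAIN macro are all hypothetical; how much PGO helps depends on how representative the training run is.

```c
/*
 * pgo_demo.c: hypothetical toy workload for trying GCC profile-guided
 * optimization. The three-step workflow (options are GCC's; the file
 * name and workload are invented for illustration):
 *
 *   gcc -O2 -DPGO_DEMO_MAIN -fprofile-generate pgo_demo.c -o pgo_demo
 *   ./pgo_demo      # training run; writes .gcda profile data
 *   gcc -O2 -DPGO_DEMO_MAIN -fprofile-use pgo_demo.c -o pgo_demo
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count the elements of values[] that are below limit. With skewed
 * input, one side of the branch dominates; the training run records
 * which side that is, so the rebuild can favor the hot path. */
size_t count_below(const int *values, size_t n, int limit)
{
    assert(values != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] < limit)
            count++;
    }
    return count;
}

/* Hypothetical training workload: only every 100th element (set to 0)
 * is below the limit; the rest are large. Returns the count found. */
size_t run_training_workload(void)
{
    enum { N = 1000000 };
    static int values[N];
    for (size_t i = 0; i < N; i++)
        values[i] = (i % 100 == 0) ? 0 : 1000;
    return count_below(values, N, 10);
}

#ifdef PGO_DEMO_MAIN
/* Standalone driver, compiled only when PGO_DEMO_MAIN is defined. */
int main(void)
{
    printf("%zu\n", run_training_workload());
    return 0;
}
#endif
```

The driver is guarded by a macro so the functions can be reused without it. A loop this small is unlikely to show anything like the Firefox numbers; PGO pays off mostly in large programs where code layout and inlining decisions matter.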
There's no shortage of interesting, development-oriented tools being
integrated into GCC, and the addition of the plugin architecture can only
result in an acceleration of this process. Things have reached a point
where more projects should probably be looking into the use of these tools
to improve the experience for their users.
Meanwhile, on the LLVM side, the developers have recently unveiled the LLVM
MC project. "MC" stands for "machine code" in this context; in short,
the LLVM developers are trying to integrate the assembler directly into the
compiler. There are a number of reasons for doing this, including
performance (formatting text for a separate assembler and running that
assembler are expensive operations), portability (not all target systems
have an assembler out of the box), and the ability to easily add support
for new processor instructions. Much of this functionality is required
anyway for LLVM's just-in-time compiler features, so it makes sense to just
finish the job.
This work appears to be fairly well advanced, with much of the basic
functionality in place. Chris Lattner says:
If you're interested in this level of the tool chain, this is a
great area to get involved in, because there are lots of small to
mid-sized projects just waiting to be tackled. I believe that the
long term impact of this work is huge: it allows building new
interesting CPU-level tools and it means that we can add new
instructions to one .td file instead of having to add them to the
compiler, the assembler, the disassembler, etc.
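Chris's point about .td files refers to LLVM's TableGen descriptions, which the TableGen tool turns into C++; the sketch below does not model TableGen at all. It is a toy, in C, of the underlying table-driven idea only: one hypothetical instruction table for a made-up ISA feeds both an encoder (the assembler side) and a decoder (the disassembler side). The names insn_table, encode_insn(), and decode_insn() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single source of truth for a made-up 8-bit ISA. */
struct insn_desc {
    const char *mnemonic;
    uint8_t opcode;
};

static const struct insn_desc insn_table[] = {
    { "nop",  0x00 },
    { "inc",  0x01 },
    { "dec",  0x02 },
    { "halt", 0xff },
};

enum { INSN_COUNT = sizeof insn_table / sizeof insn_table[0] };

/* Assembler side: look up a mnemonic and store its opcode.
 * Returns 0 on success, -1 if the mnemonic is unknown. */
int encode_insn(const char *mnemonic, uint8_t *opcode_out)
{
    assert(mnemonic != NULL && opcode_out != NULL);
    for (size_t i = 0; i < INSN_COUNT; i++) {
        if (strcmp(insn_table[i].mnemonic, mnemonic) == 0) {
            *opcode_out = insn_table[i].opcode;
            return 0;
        }
    }
    return -1;
}

/* Disassembler side: map an opcode back to its mnemonic,
 * or return NULL if the opcode is unknown. */
const char *decode_insn(uint8_t opcode)
{
    for (size_t i = 0; i < INSN_COUNT; i++) {
        if (insn_table[i].opcode == opcode)
            return insn_table[i].mnemonic;
    }
    return NULL;
}
```

Adding, say, { "mul", 0x03 } to insn_table would make the new instruction visible to both functions without touching either one.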
In summary: there is currently a lot going on in the area of development
toolchains. Given that all of us - including those who do no development -
depend on those toolchains, this can only be a good thing. Computers can
do a lot to make the task of programming them easier and more robust;
despite the occasional glitch, developers for both GCC and LLVM appear to
be working hard to realize that potential.