GCC unplugged

Posted Nov 20, 2007 7:55 UTC (Tue) by jordanb (subscriber, #45668)
Parent article: GCC unplugged

So I've reread this piece and I'm having a really hard time trying to see the author's
perspective on this.

First, I've yet to see a plugin system that allows plugins to be inserted at "arbitrary"
rather than predetermined points in the system. All plugin systems necessarily will be limited
to giving access to those points where the programmers set up the infrastructure. In some
systems, there are places where lots of flexibility makes obvious sense: device drivers for
kernels, for instance. Where does such flexibility make sense for compilers? For the most
part, it seems to me, if something like static analysis would be improved by new functionality
at a particular point then perhaps that functionality should just be put in the compiler.

So even with a plugin system, I'm having a lot of trouble imagining anything but a relatively
sparse set of plugins at each point, doing things the compiler should probably have built in.
I'm also imagining having to satisfy plugin requirements on top of GCC version and library
requirements to compile something. That doesn't exactly give me warm fuzzies.

There is a point where modularity makes a lot of sense in a compiler: putting language
specific stuff in a modular front-end, hardware specific stuff in a modular back-end, and the
transcendental stuff in a common middle-end, allowing the compiler to be used for arbitrary
languages on arbitrary hardware.

But GCC already does that.

The editor gives evidence that GCC code is poorly organized and poorly documented. But the
developers apparently don't disagree, and their solution isn't to throw another big system on
top to hide the mess, but to clean up and document the code. Is there something I'm missing
somewhere? Because that seems like a staggeringly sensible idea. And frankly, without the
editor's rather unsubstantiated argument for plugins, this article would have been "GCC code
needs cleaned up, developers agree." I bet that wouldn't have been quite so long.

From what I've seen of the GCC build process -- and correct me if I'm wrong -- it is "known to
be challenging" because it's a self-hosting compiler with a rather impressive testing suite.
How is making the build process less "challenging" anything but skimping on testing? Frankly
as a user of GCC, I *like* it that every change has to build itself and get wrung through
the testing suite. Providing a means to avoid that isn't the path towards a better working
compiler.


GCC unplugged

Posted Nov 20, 2007 8:35 UTC (Tue) by JamesErik (subscriber, #17417)

"Where does such flexibility make sense for compilers?"

I respectfully submit that you're not thinking outside the box enough.  An example I know of
that would benefit from a pluggable GCC is GCC-XML (http://www.gccxml.org), which has the
(limited, last I looked) capability of exposing part of the GCC internal representation as
XML.  No, I'm not an XML fanboy, but the wide array of XML tools that can be applied make code
transformations a great deal more **accessible to the average developer**.  I understand the
GCC-XML team had very deliberate push-back from the core GCC devs on integrating their work
precisely because of the "easing proprietary extensions" specter.  As a result, GCC-XML is
indeed a largely duplicate download of plain GCC.

This is just one example, though.  Perhaps you'd want to make better integration with a
debugger, farm out pieces to a co-processor, generate some fancy-schmancy metrics and graphs
for the suits... and on and on.  I don't think either of us can imagine all the possibilities.

All this notwithstanding, sure, an internal cleanup and better documentation would go a long
way towards attracting development energy.  It is, no doubt, "a staggeringly sensible idea,"
but other efforts to make GCC development more approachable are pretty far along
the sensibility curve in my book, too.

GCC unplugged

Posted Nov 20, 2007 14:28 UTC (Tue) by Nelson (subscriber, #21712)

What are some meaningful tasks people are doing with GCC-XML? I always thought it was more about providing information as to the internal state of the compiler, not providing a new IR to perform analysis on and inject back into the compiler. Almost like something an IDE might use. That's a substantially different goal than letting someone inject XML back into GCC. And what are all these XML tools that are supposed to do that well? I have seen a lot of parsers but not a lot of tools that can do the sorts of things you need to do to IR.

The whole plug-in thing is sexy in a way: you always have a tool to solve the problems you didn't think about or don't want to solve. It's astonishingly difficult to do well, though. Eclipse is an example of something built with that in mind, and if you use a good-sized set of plugins, you're always screwed when they roll the version of Eclipse. There is a certain cost you have to put into a project to get it merged in; with a plugin, that cost is lower and the likelihood that you'll abandon the code is higher. I think it's very common for "plugins" to be abandoned. The thing is, a healthy open source project is alive and ongoing: you have access to the code and can change it to fix those unseen problems.

GCC front ends make sense, and they're possible. If you play by the right rules, you can get your front end incorporated into the compiler distribution (D, Modula-3, and Pascal are examples of front ends that didn't do that, and look at where they are now...). As for code generation and optimization, why wouldn't you want that included in the compiler?

The big thing is that if you want to play with GCC you have to be a team player. There are rules to follow; you can't just hack out some code, throw it out and expect the GCC guys to jump on it, clean it up and incorporate it. These discussions always revolve around people following the community rules. Look at Reiserfs 4: same issue. It's not like Linus and company didn't want it in; Hans just wouldn't/couldn't follow the rules.

GCC unplugged

Posted Nov 20, 2007 15:07 UTC (Tue) by dcoutts (subscriber, #5387)

> What are some meaningful tasks people are doing with GCC-XML?

FFI tools for other high level languages. For example c2hs is a tool that reads .h files and generates Haskell code that imports the foreign C functions at the correct type. This guarantees cross-language type safety which is a pretty big deal.

At the moment c2hs has to use its own hand-written C parser that understands all the GNU C extensions. This was not easy to make and it's not easy to maintain. Having a proper intermediate format generated by gcc would allow us to parse GNU C code easily and would provide additional things like correct calculation of sizes of C types etc.

GCC unplugged

Posted Nov 20, 2007 18:11 UTC (Tue) by aleXXX (subscriber, #2742)

> What are some meaningful tasks people are doing with GCC-XML? 

Kitware (the CMake developers) wrote it and I think they use it to 
generate bindings to scripting languages for their libraries.

Alex

GCC unplugged

Posted Nov 20, 2007 10:20 UTC (Tue) by nix (subscriber, #2304)

The passes system would actually make it, hm, `more possible' to add plugins. It's just that
not every significant transform that affects properties of the IR is reflected in properties
visible to the passes system, and it would be really easy for a plugin author to screw up the
data flowing through the translation pipeline in subtle ways.
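
To make that concern concrete, here is a minimal sketch of a pass manager that tracks IR
properties as a bitmask. Every name in it (PROP_CFG, PROP_SSA, struct pass, check_pipeline)
is hypothetical and chosen for illustration; it is not GCC's actual interface, only a toy
model of the idea that each pass declares what it requires, provides, and destroys.

```c
#include <assert.h>

/* Hypothetical property bits; illustrative only, not GCC's own names. */
enum { PROP_CFG = 1u << 0, PROP_SSA = 1u << 1 };

struct pass {
    const char *name;
    unsigned required;   /* properties that must hold before the pass runs */
    unsigned provided;   /* properties the pass establishes */
    unsigned destroyed;  /* properties the pass declares it invalidates */
};

/* Walk the pipeline in order, tracking the declared property set.
   Returns the index of the first pass whose requirements are not met,
   or -1 if every pass's declared requirements are satisfied. */
int check_pipeline(const struct pass *passes, int n, unsigned initial)
{
    unsigned props = initial;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if ((passes[i].required & ~props) != 0)
            return i;
        props = (props | passes[i].provided) & ~passes[i].destroyed;
    }
    return -1;
}
```

A check like this only sees what each pass declares. If a plugin pass breaks SSA form but
leaves its destroyed field at 0, check_pipeline still returns -1 and the damage flows on
unnoticed, which is the kind of subtle screw-up described above.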

At the very least a plugin author should (just as the author of any other GCC change) run
the testsuite after changing anything significant. And that takes hours, much longer than
recompiling the compiler itself, especially if you avoid building/testing libjava. If you're
doing that, you may as well build the compiler too, negating the advantage of plugins.

Plus, a lot of things are still implemented by means of target-specific macros rather than
function pointers; those are not used so much in the front- and middle-end code, but that
doesn't mean they're not used at all. (Converting them all to function pointers is a good ideal
state to aim for, though the speed impact if everything were so converted might be substantial.
Some of those macros are called a
*lot*. We probably need memory-efficient IPA before this is practical, so we can turn them
into functions and then inline them across translation units, although even that would fail if
they had to be replaceable at runtime... right now compiling GCC with --combine runs every
machine I have access to out of memory, so that's not ready for prime time yet. Mind you my
largest box has only 1.5Gb of RAM and some of that is used for other things.)
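
The macro-versus-function-pointer trade-off can be sketched as follows. The names here
(TARGET_WORD_BYTES, struct target_hooks, round_up_macro, round_up_hook) are made up for
illustration and are not taken from GCC; the point is only the difference between a value
fixed when the compiler is built and one reached through an indirect call.

```c
#include <assert.h>

/* Style 1: a target macro, fixed when the compiler itself is built.
   Hypothetical name, for illustration only. */
#define TARGET_WORD_BYTES 4

/* Style 2: a table of function pointers that could be replaced at run time. */
struct target_hooks {
    int (*word_bytes)(void);
};

static int word_bytes_32(void) { return 4; }
static int word_bytes_64(void) { return 8; }

const struct target_hooks hooks_32 = { word_bytes_32 };
const struct target_hooks hooks_64 = { word_bytes_64 };

/* Round size up to a multiple of the word size using the macro.
   The constant is known at compile time, so this can be folded. */
int round_up_macro(int size)
{
    return (size + TARGET_WORD_BYTES - 1) / TARGET_WORD_BYTES * TARGET_WORD_BYTES;
}

/* The same computation through a hook: one indirect call per use,
   which cannot be inlined if the table is chosen at run time. */
int round_up_hook(const struct target_hooks *t, int size)
{
    int w = t->word_bytes();

    assert(w > 0);
    return (size + w - 1) / w * w;
}
```

The hook version is what a plugin could override, since a caller can pass &hooks_64 instead
of &hooks_32 without rebuilding anything. It also pays for an indirect call on every use,
which is where the speed worry for frequently called macros comes from.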


GCC-XML is the largest plus point of all this, really. I've often wanted to be able to suck
data out of GCC's IR, because GCC has already done a lot of analysis of the source for me, and
I'd rather not reproduce all that work when writing little source code auditing and
invariant-checking tools. RMS's concerns about proprietary consumers of that data are
increasingly irrelevant, IMNSHO: there are lots of proprietary systems which do similar things
already, and anyone wanting to use GCC to do it and with the skill to know what they're asking
for has the skill to add an export-bits-of-IR feature to GCC --- which they are of course
allowed to do, under the GPL. So this is really security-by-obscurity on RMS's part, which
depends on the insides of GCC remaining too obscure to easily modify: and obscure and
hard-to-modify internals are exactly what the GCC maintainers *don't* want (or at least I hope
they don't).

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds