
Continuity problems

Posted Mar 22, 2012 17:05 UTC (Thu) by jd (guest, #26381)
Parent article: GCC celebrates 25 years with the 4.7.0 release

The egcs vs. gcc fiasco comes to mind, but IIRC there have been a number of major reworkings. Certainly, there is an unbroken lineage from the original release to the present day - that's indisputable. Equally, it's indisputable that GCC is one of the most popular and powerful compilers out there. By these metrics, the claims are entirely correct.

However, having said that, the modern GCC wouldn't pass the "heraldry test" and there have been more than a few occasions when politics have delayed progress or disrupted true openness. The first of these is really a non-issue unless GCC applies for a coat of arms, but the second is more problematic. As GCC grows and matures, the more politics interferes, the more likely we are to see splintering.

Indeed, rival FLOSS compiler projects are taking off already, suggesting that the splintering has become enough of a problem for other projects to be able to reach critical mass.

Personally, I'd like to see GCC celebrate a 50 year anniversary as the top compiler. Language frontend developers can barely keep up with GCC; they won't be able to keep up with other compilers as well. Maximum language richness means you want as few core engine APIs as possible, where the APIs have everything needed to support the languages out there. GCC can do that and has done for some time, which makes it a great choice.

But the GCC team (and the GLibC team) could do with being less provincial and more open. Those will be key to the next 25 years.



Continuity problems

Posted Mar 22, 2012 17:17 UTC (Thu) by josh (subscriber, #17465) [Link] (17 responses)

I agree entirely. Personally, I think GCC would benefit massively from a better culture of patch review. As far as I can tell, GCC's development model seems designed around having contributors do enough work to justify giving them commit access, and *then* they can actually contribute. GCC doesn't do a good job of handling mailed patches or casual contributors.

On top of that, GCC still uses Subversion for their primary repository, rather than a sensible distributed version control system. A Git mirror exists, but gcc.gnu.org doesn't point to it anywhere prominent. As a result, GCC doesn't support the model of "clone, make a series of changes, and publish your branch somewhere"; only people with commit access do their work on branches. And without a distributed system, people can't easily make a pile of small changes locally (as separate commits) rather than one giant change, nor can they easily keep their work up to date with changes to GCC.

Changing those two things would greatly reduce the pain of attempting to contribute to GCC, and thus encourage a more thriving development community around it.

(That leaves aside the huge roadblock of having to mail in paper copyright assignment forms before contributing non-trivial changes, but that seems unlikely to ever change.)

Continuity problems

Posted Mar 22, 2012 18:51 UTC (Thu) by Lionel_Debroux (subscriber, #30014) [Link] (2 responses)

Another thing that would reduce the pain of contributing to GCC is a code base with a lower entry barrier. Despite the introduction of the plugin architecture in GCC, which already lowered it quite a bit, the GCC code base is still regarded as harder to hack on, less modular, and less versatile than the LLVM/Clang code base.

The rate of progress on Clang has been impressive: self-hosting occurred only two years ago, followed three months later by building Boost without defect macros, and six months later by building Qt. On the day g++ 4.7 is released, clang++ is the only compiler whose C++11 support can be said to rival that of g++ (clang++ doesn't support atomics and forward declarations for enums, but fully supports alignment).

GCC isn't alone in not having switched to DVCS yet: LLVM, and its sub-projects, haven't either... However, getting commit access there is quite easy, and no copyright assignment paperwork is required.

Continuity problems

Posted Mar 23, 2012 3:20 UTC (Fri) by wahern (subscriber, #37304) [Link] (1 responses)

I've delved into both GCC and clang to write patches, albeit simple ones. GCC is definitely arcane, but both are pretty impenetrable initially. You can glance at the clang source code and fool yourself into thinking it's easy to hack, but there's no shortage of things to complain about.

Compiler writing is extremely well trodden ground. It shouldn't be surprising that it's fairly easy to go from 0-60 quickly. But it's a marathon, not a sprint. The true test of clang/LLVM is whether it can weather having successive generations of developers hack on it without turning into something that's impossible to work with. GCC has clearly managed this, despite all the moaning, and despite not being sprinkled with magic C++/OOP fairy dust. The past few years have seen tremendously complex features added, and clang/LLVM isn't keeping pace.

And as far as C++11 support, they look neck-and-neck to me:

http://clang.llvm.org/cxx_status.html
http://gcc.gnu.org/projects/cxx0x.html

Continuity problems

Posted Mar 25, 2012 0:49 UTC (Sun) by nix (subscriber, #2304) [Link]

I'd say that by about the time of the egcs fork GCC was close to impossible to hack on. Cruft piled on cruft. However, starting in the 3.x era a determined effort was made to fix this (still underway: tree-ssa was a massive improvement, but it couldn't have been done without other cleanups), and it is now very much nicer.

Continuity problems

Posted Mar 22, 2012 20:59 UTC (Thu) by james_ulrich (guest, #83666) [Link] (13 responses)

Having lurked around gcc for close to 5 years, it seems to me like the whole patch review culture simply stems from the fact that there is not a single person who cares about the compiler as a whole. Sure, individuals care about their passes (IRA stands out here as being well taken care of), but seldom about what happens outside of them. And when people do do reviews it very much feels like the response is just "I'll just ack it to get you off my back", which obviously doesn't do wonders for quality. Unless a Linus-of-GCC person comes along, I don't see much long-term future for the project.

The recent decision to move the project code base to C++ is also something that I think will actually hurt them badly in the long run. The GCC code base is very hard to read as-is and moving it to a language that is notorious for being hard to read and understand will not make things any better. (I'm well aware that some amazing pieces of code have been written in C++, but it is not a simple fix to the code cleanliness problem)

Continuity problems

Posted Mar 22, 2012 21:18 UTC (Thu) by HelloWorld (guest, #56129) [Link] (8 responses)

> The GCC code base is very hard to read as-is and moving it to a language that is notorious for being hard to read and understand will not make things any better.

The GCC code base is actually a perfect example of things being convoluted because of missing functionality in the C language. C++, when used in a sensible way, is a way to fix this.

Continuity problems

Posted Mar 22, 2012 22:05 UTC (Thu) by james_ulrich (guest, #83666) [Link] (4 responses)

You mean C++ would magically make 2000+ line functions with variable declarations spanning over 50 lines easy to read? I think there are much lower hanging fruit in making the GCC code base readable before throwing C++ at it would be beneficial.

The decision to go with C++ seems (to me, an outside observer) to have been driven firstly by some people (I remember Ian Lance Taylor's name, but there were others pushing) "because I like to code in C++", rather than there being a pressing needed feature that would make the code clearer.

Continuity problems

Posted Mar 22, 2012 23:30 UTC (Thu) by elanthis (guest, #6227) [Link] (1 responses)

> You mean C++ would magically make 2000+ line functions with variable declarations spanning over 50 lines easy to read? I think there are much lower hanging fruit in making the GCC code base readable before throwing C++ at it would be beneficial.

Recompiling existing crappy C code with a C++ compiler does no such thing. It may very well provide the tools to rewrite those functions in a readable, sane way that C cannot easily do.

The one clear winner in C++ is data structures and templates. I cannot stress the importance of that enough.

The second you have to write a data structure that uses nothing but void* elements, or which has to be written as a macro, or which has to be copied-and-pasted for every different element type, you have a serious problem.

GCC is a heavy user of many complex data structures, many of which are written as macros. Compare this to the LLVM/Clang codebase, where such data structures are written once in clean, readable, testable, debuggable C++ code, and reused in many places with an absolute minimum of fuss or danger.

I present you with the following link, which illustrates a number of very useful data structures in LLVM/Clang that are used all over the place, and which either do not exist, exist but are a bitch to use correctly, or which are copy-pastad all over the place in GCC:

http://llvm.org/docs/ProgrammersManual.html#datastructure
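
A minimal sketch of the contrast described above, taken from neither code base: `VoidStack` and `TypedStack` are hypothetical names invented purely for illustration. The first stores untyped `void*` elements in the C style, so every caller has to cast on the way out; the second is a template written once, with the element type checked by the compiler at each use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// C-style container (hypothetical): elements are untyped pointers, so the
// compiler cannot tell whether a caller casts back to the right type.
struct VoidStack {
    std::vector<void*> items;

    void push(void* p) { items.push_back(p); }

    void* pop() {
        assert(!items.empty());
        void* p = items.back();
        items.pop_back();
        return p;
    }
};

// Template container (hypothetical): written once, reused for any element
// type, with the element type checked at compile time.
template <typename T>
class TypedStack {
public:
    void push(const T& value) { items_.push_back(value); }

    T pop() {
        assert(!items_.empty());
        T value = items_.back();
        items_.pop_back();
        return value;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;
};
```

With `TypedStack<std::string> names;`, a call such as `names.push("eax")` followed by `names.pop()` needs no cast, whereas `VoidStack::pop()` returns a `void*` that the caller must `static_cast` to whatever type was pushed, and a wrong cast still compiles.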

Continuity problems

Posted Mar 23, 2012 6:36 UTC (Fri) by james_ulrich (guest, #83666) [Link]

I can see that the structures and constructs used in compilers lend themselves very well to the features of C++.

My point is that the reason GCC is a mess is not because it is written in C. Even with C++, 2000 line functions need to be logically split, and 20 line if() statements with 5 levels deep subexpression nesting also need to be split to make it readable. These, and other, de-facto coding style idiosyncrasies need to be fixed (or at least agreed upon not to write code like that), which is in no way affected by the C/C++ decision.

GCC also has this "property", let's say, that code is never actually re-written, only new methods added in parallel to the old ones. Classic examples are the CC_FLAGS/cc0 thing and best of all reload. Everyone knew it sucked 15 years ago, yet only now are motions made in the form of LRA to replace it (which, BTW, are in no way motivated by using C++). The same can be said for the old register allocator, combine, etc. I somehow doubt that C++ alone would magically motivate anyone to start rewriting these old, convoluted but critical pieces.

Based on past observations my prediction for GCC-in-C++ is that all the old ugly code will simply stay, the style will not really change, but now it will be ugly code mixed with C++ constructs.

Continuity problems

Posted Mar 23, 2012 2:00 UTC (Fri) by HelloWorld (guest, #56129) [Link]

I would have responded to your posting, but elanthis was faster at making my point.

Continuity problems

Posted Mar 23, 2012 3:31 UTC (Fri) by wahern (subscriber, #37304) [Link]

Indeed. If you read the clang source code, instead of having 2000 line functions, you have things implemented with something approximating 2000 single-line functions. Both are impenetrable. Where GCC abuses macros, clang/LLVM abuses classing and casting. (You wouldn't think that possible, but analyze the clang code for a while and you'll see what I mean.)

Continuity problems

Posted Mar 25, 2012 0:56 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)

I'm sorry, this is nonsense. As James Ulrich points out, the convolution in GCC has nothing to do with the implementation language.

Its biggest problem -- still pervasive in the RTL side of things -- was always global, unstated assumptions, often assumptions wired into target machine description files, RTL optimization passes, and reload. Often an RTL optimization pass would assume (or would grow to assume over years, accidentally) some property of md files that was true for all existing md files but not necessarily true, and then reload would come to depend on the form of the RTL emitted by optimization files when that property was true. Some of these properties are much nastier than CC0 and can't be grepped for -- and fixing them requires understanding a lot of targets, and *testing* them.

This is slowly being sorted out as more machinery migrates into the tree-ssa side of things, and as older and cruftier targets are slowly decommissioned. But it's a slow, slow job, and it would be every bit as slow regardless of the implementation language. (This is one reason why reload has been such a monster to dump and replace: it's where all these unstated assumptions go to roost. Break just one of them and you might find yourself with wrong code on a couple of random targets you'd never heard of, and the poor sod who finds this is going to have the devil of a time tracking it down to your change.)

Continuity problems

Posted Mar 25, 2012 8:49 UTC (Sun) by james_ulrich (guest, #83666) [Link] (1 responses)

Reload has all these "assumptions" because it is such a huge convoluted mess. Also, it is a cornerstone of the whole compilation process, so you can't just throw it out. And hence, instead of fixing reload when some bug pops out, all the passes around it are fixed just so that reload doesn't need to be touched.

While I had my share of struggling with reload, by far most of the annoying problems with RTL passes were that lots of them completely ignore some MD feature or other -- it cannot even be said that this is old cruft hanging around: even the web construction pass has issues, which is supposed to serve as an example! This is a review problem.

Continuity problems

Posted Mar 26, 2012 19:32 UTC (Mon) by nix (subscriber, #2304) [Link]

Yeah, but part of the problem with reload being a convoluted mess is that for many years earlier in GCC's history, when some target needed something done, half the time it was done in the md file and half the time it was done by hacking reload so that it did *just* the right thing for some shape of RTL that only that target would produce. And, of course, this was often not documented. Hello unstated assumptions and fragility.

(This is not to say that you are wrong in any way. There are other problems too, not least the 'it works on major targets, so it must be complete' RTL problem you mention...)

Continuity problems

Posted Mar 22, 2012 23:59 UTC (Thu) by slashdot (guest, #22014) [Link] (1 responses)

It's hard to read BECAUSE it is not in C++, obviously.

Although the fundamental reason it's hard to read is because it's not fully a library like LLVM/Clang, so they don't need to write clean reusable code with documented interfaces, and it shows.

The real question is: does it make sense to try to clean up, modularize and "C++ize" gcc?

Or is it simpler and more effective to just stop development on GCC, and move to work on a GPL or LGPL licensed fork of LLVM, porting any good things GCC has that LLVM doesn't?

Continuity problems

Posted Mar 23, 2012 6:59 UTC (Fri) by james_ulrich (guest, #83666) [Link]

Why does everyone pass around using C++ as some magic bullet that fixes all ugliness now and forever? It doesn't and it never will. The only lesson to be learnt from LLVM is that, when *starting from scratch*, a compiler can be well written in C++. Extrapolating that to "GCC's main problem is not being written in C++ and doing so will fix all our problems" is plain idiotic.

Even if you start coding in C++, you still need to think about how to split long functions, ridiculous if() statements and make other general ugliness clearer. Take this (random example, there are much worse ones):

if (REG_P (src) && REG_P (dest)
    && ((REGNO (src) < FIRST_PSEUDO_REGISTER
         && ! fixed_regs[REGNO (src)]
         && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))))
        || (REGNO (dest) < FIRST_PSEUDO_REGISTER
            && ! fixed_regs[REGNO (dest)]
            && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest))))))

How exactly will C++ make this more obvious? Of course it won't.

And, no, GCC not being a library is not its main problem either.
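
One way the condition quoted above might be split, sketched under stated assumptions: `Operand`, `likely_spilled`, `is_spill_prone_hard_reg` and `involves_spill_prone_hard_reg` are hypothetical names, and the types and tables are simplified stand-ins rather than GCC's real `rtx`, `REG_P`, `REGNO`, `fixed_regs` or `CLASS_LIKELY_SPILLED_P`. The split itself is just a named helper function, so it would read the same in plain C, which fits the point that the cleanup does not depend on the language.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; these are NOT GCC's real
// definitions.
constexpr unsigned FIRST_PSEUDO_REGISTER = 16;

struct Operand {
    bool is_reg;     // stand-in for REG_P (x)
    unsigned regno;  // stand-in for REGNO (x)
};

// Hypothetical tables indexed by hard register number. likely_spilled
// stands in for CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (regno)).
bool fixed_regs[FIRST_PSEUDO_REGISTER] = {};
bool likely_spilled[FIRST_PSEUDO_REGISTER] = {};

// The sub-condition that the original repeats for src and dest, named once.
static bool is_spill_prone_hard_reg(const Operand& x) {
    assert(x.is_reg);
    return x.regno < FIRST_PSEUDO_REGISTER
           && !fixed_regs[x.regno]
           && likely_spilled[x.regno];
}

// Same logic as the original condition, expressed over the stand-ins above.
bool involves_spill_prone_hard_reg(const Operand& src, const Operand& dest) {
    return src.is_reg && dest.is_reg
           && (is_spill_prone_hard_reg(src) || is_spill_prone_hard_reg(dest));
}
```

The call site then shrinks to `if (involves_spill_prone_hard_reg (src, dest))`, and the duplicated three-line test exists in only one place.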

Continuity problems

Posted Mar 23, 2012 4:06 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Rewriting stuff in another language is actually a good way to clean up the code.

Which gcc badly needs.

Continuity problems

Posted Mar 23, 2012 13:27 UTC (Fri) by jzbiciak (guest, #5246) [Link]

Allow the cynic in me to make a possibly unfounded comment:

For some of GCC's ugliness, more of the improvement may come from the "rewrite" part than the "in C++" part. The "in C++" part just encourages a more thorough refactoring and rethinking of the problem, than a superficial tweaking-for-less-ugly.

In any case, nothing will fix GNU's ugly indenting standards as long as the language has a C/C++ style syntax. ;-)

"Heraldry test?"

Posted Mar 22, 2012 22:26 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (4 responses)

I'm sorry, I didn't catch the reference. What is the "heraldry test"?

"Heraldry test?"

Posted Mar 23, 2012 2:26 UTC (Fri) by ghane (guest, #1805) [Link] (3 responses)

In this case the current GCC is from the bastard (and disowned) son of the family (EGCS), who took over the coat of arms when the legitimate branch of the family died out, and was blessed by all. http://en.wikipedia.org/wiki/GNU_Compiler_Collection#EGCS...

My grandfather's axe, my father changed the handle, I changed the blade, but it is still my grandfather's axe.

--
Sanjeev Gupta

"Heraldry test?"

Posted Mar 23, 2012 3:40 UTC (Fri) by flewellyn (subscriber, #5047) [Link] (2 responses)

Ahhhh, thank you. I knew the history of GCC, but didn't connect it to that metaphor. Thanks very much.

"Heraldry test?"

Posted Mar 23, 2012 20:43 UTC (Fri) by JoeBuck (subscriber, #2330) [Link] (1 responses)

The use of the metaphor is mistaken. Much of the code in the first egcs release that wasn't in GCC 2.7.x was already checked in to the FSF tree, and merges continued to take place back and forth. Thinking that GCC was somehow a completely new compiler with the same name after the EGCS/GCC remerger is just wrong. Furthermore it was the same people developing the compiler before and after. What really happened was that there was a management shakeup.

"Heraldry test?"

Posted Mar 25, 2012 1:03 UTC (Sun) by nix (subscriber, #2304) [Link]

Quite. It wasn't even 'depose the king, he goes into exile'; kenner is *still* contributing to GCC now and then, and was contributing to egcs even while he was also maintaining trunk GCC.

Continuity problems

Posted Mar 23, 2012 3:59 UTC (Fri) by daglwn (guest, #65432) [Link] (11 responses)

> Equally, it's indisputable that GCC is one of the most popular and
> powerful compilers out there.

Popular? Maybe. Powerful? No way. There are many compilers out there that beat the pants off gcc in performance of generated code and have better standards support.

I'm not knocking the gcc guys. In terms of target support they are leaps and bounds above everyone else. But let's not kid ourselves. The claims about being first to provide an "architecture neutral" (whatever that means) vectorizer and OpenMP implementation are laughable. Other compilers have been doing that for decades.

Again, gcc is a great project but there needs to be a little dose of reality here.

Continuity problems

Posted Mar 23, 2012 8:27 UTC (Fri) by Pawlerson (guest, #74136) [Link] (8 responses)

What compilers do you mean? I hope you didn't mean clang/llvm, which isn't even half as powerful.

Continuity problems

Posted Mar 23, 2012 16:15 UTC (Fri) by daglwn (guest, #65432) [Link] (7 responses)

The Intel compiler. The PGI compiler. The Pathscale compiler. That's just a few.

Continuity problems

Posted Mar 23, 2012 17:56 UTC (Fri) by khim (subscriber, #9252) [Link] (6 responses)

They may be faster (in some cases… it all depends on the code in question very much), but as far as “better standards support”… let's check: Intel compiler - 17 “N/A” out of 39 (vs 4 “N/A” out of 39 for GCC), the PGI compiler and Pathscale are not even mentioned.

These three compilers have much better Fortran standards support, but this is another kettle of fish: Fortran was always quite weak in GCC because GCC is primarily a C and C++ compiler.

Continuity problems

Posted Mar 23, 2012 18:27 UTC (Fri) by stevenb (guest, #11536) [Link] (3 responses)

Do you have something to support the "much better Fortran support" comment? I don't think that comment is justified.

My impression is that GFortran is no worse than those 3 compilers, or any other available Fortran compiler. One popular Fortran benchmark (Polyhedron) supports that impression, see:

http://www.polyhedron.com/pb05-linux-language0html
http://www.polyhedron.com/pb05-linux-diagnose0html

Note they're comparing the latest Intel Fortran compiler (v11) to a 3 year old GFortran (v4.4). GFortran in more recent GCC releases has further improved significantly. (NB, Lahey and Absoft are Open64-based, like PathScale).

Also performance wise GFortran isn't so bad, and still improving. See:

http://www.polyhedron.com/pb05-linux-f90bench_AMD0html (v4.1, poor)
http://polyhedron.com/pb05-lin64-f90bench_SBhtml (v4.4, reasonable)
http://users.physik.fu-berlin.de/~tburnus/gcc-trunk/bench... (v4.7, great)

Continuity problems

Posted Mar 23, 2012 19:51 UTC (Fri) by daglwn (guest, #65432) [Link]

Does gfortran support Co-Array Fortran? I think it may support a limited form (intra-node only). I'm fuzzy on the details.

Certainly gfortran is improving but I wouldn't call it up to the same level of standards support as other Fortran compilers.

Continuity problems

Posted Mar 23, 2012 20:59 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> My impression is that GFortran is no worse than those 3 compilers, or any other available Fortran compiler. One popular Fortran benchmark (Polyhedron) supports that impression, see:
>
> http://www.polyhedron.com/pb05-linux-language0html
> http://www.polyhedron.com/pb05-linux-diagnose0html

Have you actually looked on the links you've provided? Just count “Yes” and “No”.

GFortran is much better than it was just a few years ago but it's still far behind other implementations. Almost as much as C++11 support is ahead of the others.

Continuity problems

Posted Mar 29, 2012 11:34 UTC (Thu) by David.Duffy (guest, #63252) [Link]

FWIW, for my code with Fortran 2003 features (73000 loc stats package), the executable GFortran produces is significantly faster than from the other 5 compilers I use.

Continuity problems

Posted Mar 23, 2012 19:52 UTC (Fri) by daglwn (guest, #65432) [Link] (1 responses)

Fair enough WRT C++-11. I don't know where PGI or Pathscale fits in there. I'll bet they use the EDG frontend and will get support whenever it is ready from EDG.

Continuity problems

Posted Mar 24, 2012 7:13 UTC (Sat) by codestr0m (guest, #75719) [Link]

For the record - PathScale EKOPath 5 will use a clang based FE for C and C++. We may have some C++11 in the initial release, but it's more likely to come in an update.

Continuity problems

Posted Mar 23, 2012 18:43 UTC (Fri) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

> The claims about being first to provide an "architecture neutral" (whatever that means) vectorizer and OpenMP implementation are laughable. Other compilers have been doing that for decades.

They may have been supporting those features, but they haven't been doing it across multiple architectures the way GCC does. That's the point they're trying to make with the "architecture neutral" comment; they do it across architectures and architecture families. That's tremendously important for anyone who hopes to write software that's usable across the huge range of systems supported by something like GNU/Linux.

Continuity problems

Posted Mar 23, 2012 19:49 UTC (Fri) by daglwn (guest, #65432) [Link]

PGI has targeted several systems over the years. So have more traditional vectorizing compilers like IBM's offerings.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds