Continuity problems

Posted Mar 22, 2012 20:59 UTC (Thu) by james_ulrich (guest, #83666)
In reply to: Continuity problems by josh
Parent article: GCC celebrates 25 years with the 4.7.0 release

Having lurked around gcc for close to five years, I get the impression that the whole patch-review culture simply stems from the fact that there is not a single person who cares about the compiler as a whole. Sure, individuals care about their own passes (IRA stands out here as being well taken care of), but seldom about what happens outside them. And when people do do reviews, it very much feels like the response is just "I'll ack it to get you off my back", which obviously doesn't do wonders for quality. Unless a Linus-of-GCC person comes along, I don't see much of a long-term future for the project.

The recent decision to move the project's code base to C++ is also something that I think will actually hurt them badly in the long run. The GCC code base is very hard to read as-is, and moving it to a language that is notorious for being hard to read and understand will not make things any better. (I'm well aware that some amazing pieces of code have been written in C++, but it is not a simple fix for the code-cleanliness problem.)


Continuity problems

Posted Mar 22, 2012 21:18 UTC (Thu) by HelloWorld (guest, #56129) (8 responses)

> The GCC code base is very hard to read as-is and moving it to a language that is notorious for being hard to read and understand will not make things any better.
The GCC code base is actually a perfect example of things being convoluted because of missing functionality in the C language. C++, when used in a sensible way, is a way to fix this.

Continuity problems

Posted Mar 22, 2012 22:05 UTC (Thu) by james_ulrich (guest, #83666) (4 responses)

You mean C++ would magically make 2000+ line functions with variable declarations spanning over 50 lines easy to read? I think there is much lower-hanging fruit to pick in making the GCC code base readable before throwing C++ at it would do any good.

The decision to go with C++ seems (to me, an outside observer) to have been driven first and foremost by some people (I remember Ian Lance Taylor's name, but there were others pushing) wanting it "because I like to code in C++", rather than by a pressing need for a feature that would make the code clearer.

Continuity problems

Posted Mar 22, 2012 23:30 UTC (Thu) by elanthis (guest, #6227) (1 response)

> You mean C++ would magically make 2000+ line functions with variable declarations spanning over 50 lines easy to read? I think there are much lower hanging fruit in making the GCC code base readable before throwing C++ at it would be beneficial.

Recompiling existing crappy C code with a C++ compiler does no such thing. It may very well, however, provide the tools to rewrite those functions in a readable, sane way that C cannot easily achieve.

The one clear winner in C++ is data structures and templates. I cannot stress the importance of that enough.

The second you have to write a data structure that uses nothing but void* elements, or which has to be written as a macro, or which has to be copied-and-pasted for every different element type, you have a serious problem.

GCC is a heavy user of many complex data structures, many of which are written as macros. Compare this to the LLVM/Clang code base, where such data structures are written once in clean, readable, testable, debuggable C++ code and reused in many places with an absolute minimum of fuss or danger.

I present you with the following link, which illustrates a number of very useful data structures in LLVM/Clang that are used all over the place, and which in GCC either do not exist, exist but are a bitch to use correctly, or are copy-pasted all over the place:

http://llvm.org/docs/ProgrammersManual.html#datastructure
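To make the "written as a macro" complaint concrete, here is a minimal, hypothetical sketch of that pattern in C. It is not GCC's actual container code; `DEFINE_VEC`, `int_vec` and `double_vec` are invented names. Each element type needs its own expansion, and the whole implementation lives inside one preprocessor definition, which tends to be harder to read, step through in a debugger, or unit-test than a C++ class template such as `std::vector<T>`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro that stamps out a growable array for element type T.
   Every distinct element type needs its own DEFINE_VEC expansion.  */
#define DEFINE_VEC(T, NAME)                                     \
  typedef struct                                                \
  {                                                             \
    T *data;                                                    \
    size_t len, cap;                                            \
  } NAME;                                                       \
                                                                \
  static int NAME##_push (NAME *v, T elt)                       \
  {                                                             \
    if (v->len == v->cap)                                       \
      {                                                         \
        size_t ncap = v->cap ? 2 * v->cap : 4;                  \
        T *p = realloc (v->data, ncap * sizeof *p);             \
        if (p == NULL)                                          \
          return -1;                                            \
        v->data = p;                                            \
        v->cap = ncap;                                          \
      }                                                         \
    v->data[v->len++] = elt;                                    \
    return 0;                                                   \
  }                                                             \
                                                                \
  static T NAME##_get (const NAME *v, size_t i)                 \
  {                                                             \
    assert (i < v->len);                                        \
    return v->data[i];                                          \
  }                                                             \
                                                                \
  static void NAME##_free (NAME *v)                             \
  {                                                             \
    free (v->data);                                             \
    v->data = NULL;                                             \
    v->len = v->cap = 0;                                        \
  }

/* One expansion per element type.  */
DEFINE_VEC (int, int_vec)
DEFINE_VEC (double, double_vec)
```

Usage would look like `int_vec v = {0}; int_vec_push (&v, 42);`, after which `int_vec_get (&v, 0)` yields 42; the push functions return -1 if allocation fails. A template gives the same per-type specialization, but the compiler, not the preprocessor, does the stamping.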

Continuity problems

Posted Mar 23, 2012 6:36 UTC (Fri) by james_ulrich (guest, #83666)

I can see that the structures and constructs used in compilers lend themselves very well to the features of C++.

My point is that GCC is not a mess because it is written in C. Even with C++, 2000-line functions need to be logically split, and 20-line if() statements with subexpressions nested five levels deep also need to be split to make them readable. These and other de facto coding-style idiosyncrasies need to be fixed (or there should at least be agreement not to write code like that), which is in no way affected by the C/C++ decision.

GCC also has this "property", let's say, that code is never actually rewritten; new methods are only added in parallel to the old ones. Classic examples are the CC_FLAGS/cc0 thing and, best of all, reload. Everyone knew it sucked 15 years ago, yet only now is there a move, in the form of LRA, to replace it (which, BTW, is in no way motivated by using C++). The same can be said for the old register allocator, combine, etc. I somehow doubt that C++ alone would magically motivate anyone to start rewriting these old, convoluted but critical pieces.

Based on past observations, my prediction for GCC-in-C++ is that all the old ugly code will simply stay and the style will not really change; it will just be ugly code mixed with C++ constructs.

Continuity problems

Posted Mar 23, 2012 2:00 UTC (Fri) by HelloWorld (guest, #56129)

I would have responded to your posting, but elanthis was faster at making my point.

Continuity problems

Posted Mar 23, 2012 3:31 UTC (Fri) by wahern (subscriber, #37304)

Indeed. If you read the clang source code, instead of 2000-line functions you find things implemented with something approximating 2000 single-line functions. Both are impenetrable. Where GCC abuses macros, clang/LLVM abuses classes and casting. (You wouldn't think that possible, but analyze the clang code for a while and you'll see what I mean.)

Continuity problems

Posted Mar 25, 2012 0:56 UTC (Sun) by nix (subscriber, #2304) (2 responses)

I'm sorry, this is nonsense. As James Ulrich points out, the convolution in GCC has nothing to do with the implementation language.

Its biggest problem -- still pervasive on the RTL side of things -- was always global, unstated assumptions, often wired into target machine description files, RTL optimization passes, and reload. Often an RTL optimization pass would assume (or would grow to assume over years, accidentally) some property of md files that was true for all existing md files but not necessarily true in general, and then reload would come to depend on the form of the RTL emitted by optimization passes when that property held. Some of these properties are much nastier than CC0 and can't be grepped for -- and fixing them requires understanding a lot of targets, and *testing* them.

This is slowly being sorted out as more machinery migrates into the tree-ssa side of things, and as older and cruftier targets are slowly decommissioned. But it's a slow, slow job, and it would be every bit as slow regardless of the implementation language. (This is one reason why reload has been such a monster to dump and replace: it's where all these unstated assumptions go to roost. Break just one of them and you might find yourself with wrong code on a couple of random targets you'd never heard of, and the poor sod who finds this is going to have the devil of a time tracking it down to your change.)

Continuity problems

Posted Mar 25, 2012 8:49 UTC (Sun) by james_ulrich (guest, #83666) (1 response)

Reload has all these "assumptions" because it is such a huge, convoluted mess. Also, it is a cornerstone of the whole compilation process, so you can't just throw it out. Hence, instead of fixing reload when some bug pops up, all the passes around it are fixed just so that reload doesn't need to be touched.

While I had my share of struggling with reload, by far the most annoying problems with RTL passes were that lots of them completely ignore one MD feature or another. It can't even be said that this is old cruft hanging around: even the web construction pass, which is supposed to serve as an example, has issues! This is a review problem.

Continuity problems

Posted Mar 26, 2012 19:32 UTC (Mon) by nix (subscriber, #2304)

Yeah, but part of the problem with reload being a convoluted mess is that for many years earlier in GCC's history, when some target needed something done, half the time it was done in the md file and half the time it was done by hacking reload so that it did *just* the right thing for some shape of RTL that only that target would produce. And, of course, this was often not documented. Hello unstated assumptions and fragility.

(This is not to say that you are wrong in any way. There are other problems too, not least the 'it works on major targets, it must be complete' RTL problem you mention...)

Continuity problems

Posted Mar 22, 2012 23:59 UTC (Thu) by slashdot (guest, #22014) (1 response)

It's hard to read BECAUSE it is not in C++, obviously.

Although the fundamental reason it's hard to read is that it's not a proper library like LLVM/Clang, so its developers don't need to write clean, reusable code with documented interfaces, and it shows.

The real question is: does it make sense to try to clean up, modularize and "C++ize" gcc?

Or is it simpler and more effective to just stop development on GCC and move to working on a GPL- or LGPL-licensed fork of LLVM, porting over any good things GCC has that LLVM doesn't?

Continuity problems

Posted Mar 23, 2012 6:59 UTC (Fri) by james_ulrich (guest, #83666)

Why does everyone hold up C++ as some magic bullet that fixes all ugliness now and forever? It doesn't and it never will. The only lesson to be learnt from LLVM is that, when *starting from scratch*, a compiler can be well written in C++. Extrapolating that to "GCC's main problem is not being written in C++, and doing so will fix all our problems" is plain idiotic.

Even if you start coding in C++, you still need to think about how to split long functions and ridiculous if() statements, and how to make other general ugliness clearer. Take this (a random example; there are much worse ones):

if (REG_P (src) && REG_P (dest)
    && ((REGNO (src) < FIRST_PSEUDO_REGISTER
         && ! fixed_regs[REGNO (src)]
         && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (src))))
        || (REGNO (dest) < FIRST_PSEUDO_REGISTER
            && ! fixed_regs[REGNO (dest)]
            && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (REGNO (dest))))))

How exactly will C++ make this more obvious? Of course it won't.

And, no, GCC not being a library is not its main problem either.
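The point that the quoted condition has to be split regardless of language can be illustrated in plain C: its two halves are identical apart from the operand, so they can move into a named predicate. This is a minimal, self-contained sketch. The `rtx_def` struct, the register classes, and the definitions of `REG_P`, `REGNO`, `REGNO_REG_CLASS`, `CLASS_LIKELY_SPILLED_P`, `FIRST_PSEUDO_REGISTER` and `fixed_regs` below are hypothetical stand-ins so the example compiles; GCC's real definitions are far richer. `likely_spilled_hard_reg_p` and `move_touches_likely_spilled_reg_p` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals, only to make this compile.  */
enum rtx_code { REG, MEM };
enum reg_class { GENERAL_REGS, SPILLY_REGS };

typedef struct rtx_def
{
  enum rtx_code code;
  unsigned int regno;   /* meaningful only when code == REG */
} *rtx;

#define FIRST_PSEUDO_REGISTER 8   /* pretend target: 8 hard registers */
static const char fixed_regs[FIRST_PSEUDO_REGISTER] = { [7] = 1 };

#define REG_P(X) ((X)->code == REG)
#define REGNO(X) ((X)->regno)
#define REGNO_REG_CLASS(N) \
  (((N) == 0 || (N) == 7) ? SPILLY_REGS : GENERAL_REGS)
#define CLASS_LIKELY_SPILLED_P(C) ((C) == SPILLY_REGS)

/* The refactoring itself.  */

/* True if X, which must be a REG, is a non-fixed hard register whose
   class is likely to be spilled.  */
static bool
likely_spilled_hard_reg_p (rtx x)
{
  assert (REG_P (x));
  unsigned int regno = REGNO (x);
  return (regno < FIRST_PSEUDO_REGISTER
          && ! fixed_regs[regno]
          && CLASS_LIKELY_SPILLED_P (REGNO_REG_CLASS (regno)));
}

/* Same truth value as the original condition on SRC and DEST.  */
static bool
move_touches_likely_spilled_reg_p (rtx src, rtx dest)
{
  return (REG_P (src) && REG_P (dest)
          && (likely_spilled_hard_reg_p (src)
              || likely_spilled_hard_reg_p (dest)));
}
```

Written as a C++ member function the predicate would read much the same; the readability gain comes from naming the repeated test, not from the language.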

Continuity problems

Posted Mar 23, 2012 4:06 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) (1 response)

Rewriting stuff in another language is actually a good way to clean up the code.

Which gcc badly needs.

Continuity problems

Posted Mar 23, 2012 13:27 UTC (Fri) by jzbiciak (guest, #5246)

Allow the cynic in me to make a possibly unfounded comment:

For some of GCC's ugliness, more of the improvement may come from the "rewrite" part than the "in C++" part. The "in C++" part just encourages a more thorough refactoring and rethinking of the problem than a superficial tweaking-for-less-ugly would.

In any case, nothing will fix GNU's ugly indenting standards as long as the language has a C/C++ style syntax. ;-)


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds