
GCC for new contributors

David Malcolm has put together the beginnings of an unofficial guide to GCC for developers who are getting started with the compiler. "I’m a relative newcomer to GCC, so I thought it was worth documenting some of the hurdles I ran into when I started working on GCC, to try to make it easier for others to start hacking on GCC. Hence this guide."




GCC's dialect of C++

Posted Mar 29, 2017 15:08 UTC (Wed) by epa (subscriber, #39769) [Link] (13 responses)

It would be an interesting thought experiment to ask the gcc developers: if they could design an ideal language for implementing gcc in, what would it look like? And if gcc had to then support that language, would this change their decision about what features to include?

GCC's dialect of C++

Posted Mar 29, 2017 20:44 UTC (Wed) by joib (subscriber, #8541) [Link] (12 responses)

Disclaimer: I'm just a volunteer contributor (with very little time to contribute these days, to boot) to one of the frontends.

I think the "ideal" language for implementing compilers would be something with 1) decent performance, 2) memory/type/etc. safety, and 3) high-level features (pattern matching, first-class functions, etc.). Rust, maybe, but OTOH a compiler tends to a) not have real-time requirements and b) have lots of variously interconnected graphs, so I think a language with GC would be a better match than Rust's ownership-based memory management. Haskell is maybe a bit too esoteric. Maybe OCaml, or Swift, or D (no personal experience here)?

That being said, if one looks at the current GCC code, it looks like trying to program Lisp in C (e.g. trees are all the same generic type from the C perspective, with a tag field specifying which type it really is), getting the worst of both worlds. So in that spirit, why not just bite the bullet and write the thing in a proper Lisp then? :)
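For anyone who hasn't seen what "Lisp in C" looks like in practice, here is a minimal sketch of the tagged-node pattern being described. It is illustrative only: the names and fields are invented for the example and are not GCC's actual tree definitions.

    #include <cstdio>

    // Every node is the same type, with a tag ("code") saying what it really is.
    enum node_code { INTEGER_CST, PLUS_EXPR, VAR_DECL };

    struct node {
        node_code code;            // the tag: which kind of node this really is
        union {
            long int_value;        // meaningful only when code == INTEGER_CST
            struct { node *op0; node *op1; } binary;   // PLUS_EXPR operands
            const char *name;      // meaningful only when code == VAR_DECL
        } u;
    };

    // Dispatch on the tag, much as Lisp code dispatches on an object's type.
    long eval(const node *n) {
        switch (n->code) {
        case INTEGER_CST:
            return n->u.int_value;
        case PLUS_EXPR:
            return eval(n->u.binary.op0) + eval(n->u.binary.op1);
        default:
            std::fprintf(stderr, "cannot evaluate node code %d\n", (int)n->code);
            return 0;
        }
    }

Nothing stops code from reading u.int_value out of a PLUS_EXPR node; in a language with algebraic data types and pattern matching, that mistake would be rejected at compile time, which is part of the "worst of both worlds" complaint above.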

GCC's dialect of C++

Posted Mar 29, 2017 21:11 UTC (Wed) by smoogen (subscriber, #97) [Link] (6 responses)

They would then need to create a LISP interpreter, which would then need a compiler :). While languages like Haskell and Rust do have advantages, I think most compiler engineers would prefer a language which has been fully implemented and 'standardized'. Many a compiler engineer has started off looking at the innards of a 'current' compiler and gone "What the heck happened here? We should redo this code in..." where ... is a language that is in vogue or has neat features like 'object orientation', 'type safety', 'garbage collection', etc. They dive in and might even get a proof of concept that does 60% of the language specification. Then, when they start trying to get that next 20%, they find out that the language they ported to isn't really good at some aspect... maybe the standard library isn't complete, maybe the garbage collection goes crazy at the wrong time, or maybe they really needed to do something in a very unsafe way.

The last 20% is where the compiler needs to work on a lot of different computers; many of the architectures haven't been built in 20-40 years, all of them having interesting quirks that are known to induce the eventual hearing of discordant flutes and drums in compiler engineers.

The final 10% (yes, that makes 110%; long-lived compilers are always non-Euclidean in volume) are the various optimizations that can be performed on every one of those architectures... except when they can't be. Did I mention the drums?

GCC's dialect of C++

Posted Mar 30, 2017 9:52 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

FWIW, Haskell is a standardized language and there is at least one full implementation of the latest standard.

GCC's dialect of C++

Posted Mar 30, 2017 15:25 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

OCaml seems to be a popular language for implementing compilers as well.

GCC's dialect of C++

Posted Mar 31, 2017 10:16 UTC (Fri) by joib (subscriber, #8541) [Link]

Well, so is ANSI Common Lisp (1994).

GCC's dialect of C++

Posted Mar 31, 2017 10:56 UTC (Fri) by joib (subscriber, #8541) [Link] (2 responses)

I think the fact that GCC, as well as many other compilers, is implemented in C/C++ is more a testament to the aphorism that with sufficient thrust even pigs can fly than to how awesome C is. There are plenty of examples of high-quality compilers implemented in other languages.

Now, certainly in the real world (as opposed to the "ideal" question I was originally answering) there are considerations like bootstrapping, dogfooding, portability, etc. So yes, for a primarily C or C++ compiler, you can certainly find reasons for implementing it in C or C++.

GCC's dialect of C++

Posted Mar 31, 2017 16:09 UTC (Fri) by smoogen (subscriber, #97) [Link] (1 responses)

I have heard "with sufficient thrust, pigs can fly" said about just about every language's compiler out there... usually by the person who has maintained the compiler for any length of time.

That said, I don't disagree that the bootstrap problem is what keeps gcc written in C/C++ rather than a saner language like LISP. However, the days of natively running LISP machines seem to have become pale shadows; I have never actually used one myself, but was told of their halcyon days when I was getting into computers in the late 1980s.

[Of course I expect someone will start a Kickstarter to make a new Symbolics on a chip... to plug into a Raspberry Pi... but there is a limit to how far I will follow the mad fluters of Azathoth.]

GCC's dialect of C++

Posted Mar 31, 2017 16:50 UTC (Fri) by joib (subscriber, #8541) [Link]

No need to resurrect Lisp machines; there are plenty of Lisp compilers that compile to native code on common platforms. The most popular on Linux is probably SBCL; see their table of supported platforms: http://sbcl.org/platform-table.html

The performance Lisp compilers are able to achieve is pretty amazing considering how dynamic Lisp is. Not as fast as C, of course, but leagues ahead of similarly dynamic languages like Python and Ruby. Something in the range of JavaScript performance in general, depending on the benchmark of course, which is still pretty impressive considering the amount of $$$ poured into JS compilers in the past decade.

GCC's dialect of C++

Posted Mar 29, 2017 21:50 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

You don't want mandatory GC independently-applied for all objects because of its cache effects. GCC moved to GC for almost everything in 3.0. It slowed down, a lot. A lot of migration of things with predictable lifespans back into obstacks and other non-GC storage followed, speeding it up again.

GC is really useful for a lot of parts of a compiler (the cross-pass parts with independently-variable lifespans, mostly). It is not so useful for those bits that live and die in one pass or that have coupled lifespans: a poolized allocator is a better fit there.
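To make the "poolized allocator" point concrete, here is a minimal arena-style sketch. It is illustrative only (not GCC's actual obstack or pool-allocator interface): objects that live and die with a pass are bump-allocated out of large blocks and released all at once when the pass finishes, with no per-object free and no GC.

    #include <cstddef>
    #include <cstdlib>
    #include <new>
    #include <vector>

    // Minimal "obstack-like" arena. Assumes the objects placed in it are
    // trivially destructible and never larger than one block; error handling
    // is omitted to keep the sketch short.
    class arena {
        static constexpr std::size_t block_size = 64 * 1024;
        std::vector<char *> blocks_;
        std::size_t used_ = block_size;   // force a block on first allocation

    public:
        void *allocate(std::size_t n) {
            // Round up so every object is suitably aligned.
            const std::size_t a = alignof(std::max_align_t);
            n = (n + a - 1) & ~(a - 1);
            if (used_ + n > block_size) {
                blocks_.push_back(static_cast<char *>(std::malloc(block_size)));
                used_ = 0;
            }
            void *p = blocks_.back() + used_;
            used_ += n;
            return p;
        }

        ~arena() {                        // one bulk release, no per-object free
            for (char *b : blocks_)
                std::free(b);
        }
    };

    struct insn { int opcode; insn *next; };   // hypothetical pass-local object

    void run_pass() {
        arena pass_arena;                 // lifetime is coupled to the pass
        insn *head = new (pass_arena.allocate(sizeof(insn))) insn{1, nullptr};
        (void)head;
        // ... the pass builds and walks as many pass-local objects as it likes ...
    }   // everything allocated above disappears here, in one sweep

The appeal for pass-local data is that allocation is a pointer bump with good locality and deallocation cost does not depend on the number of objects; the catch is exactly the one noted above, that it only fits objects whose lifetimes really are coupled to the pass.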

Of course, not all language-level GCs are applied to each object independently, but a great many seem to be, or are applied in a way that is not particularly predictable to the developer (hello, Haskell!). Such things are problematic for compilers with architectures similar to GCC's -- though obviously it is *possible* to write compilers in such languages. You just have to get your speed back somewhere else. :)

GCC's dialect of C++

Posted Mar 30, 2017 19:21 UTC (Thu) by smckay (guest, #103253) [Link] (3 responses)

Not that I've used it very much, but Apple's ARC and autorelease pools are the best answer I've seen to "how do we make memory management easier than malloc/free without the latency cost of GC". Rust is safer but I don't think it's easier, although you do get other nice guarantees that I don't think ARC provides. If Apple wasn't so focused on desktop and mobile I think I'd use ObjC or Swift a lot and like them a lot.

GCC's dialect of C++

Posted Mar 31, 2017 8:49 UTC (Fri) by roc (subscriber, #30627) [Link] (2 responses)

Atomic refcounts for everything don't sound great if you're trying to write efficient parallel code.

GCC's dialect of C++

Posted Mar 31, 2017 16:57 UTC (Fri) by smckay (guest, #103253) [Link] (1 responses)

When Apple says ARC they mean automatic refcounting, with the compiler calling retain/release instead of the programmer. It's actually their second attempt at eliminating manual refcounting; first there was a GC mode of the ObjC runtime, which was swiftly deprecated and I don't think even exists anymore. Going by https://clang.llvm.org/docs/AutomaticReferenceCounting.html, atomic ops are used for weak references but not otherwise. Section 6 specifically calls out that using atomic ops pervasively would kill performance.
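As a rough, hypothetical illustration of the cost being debated here (plain C++, nothing to do with Apple's runtime or ARC's actual codegen): copying a std::shared_ptr typically updates an atomic reference count, while the hand-rolled pointer below updates an ordinary integer. The atomic version is safe to share across threads, but every copy pays for an atomic read-modify-write.

    #include <memory>

    // Side-by-side sketch of atomic vs. non-atomic reference counting, the
    // trade-off discussed in this sub-thread. Not ARC, not Apple's runtime.

    struct node {
        int value = 0;
        int refs = 1;                 // plain counter: cheap, single-thread only
    };

    // Minimal intrusive smart pointer with a NON-atomic count (sketch only).
    struct node_ref {
        node *p;
        explicit node_ref(node *n) : p(n) {}
        node_ref(const node_ref &o) : p(o.p) { ++p->refs; }   // ordinary increment
        node_ref &operator=(const node_ref &) = delete;       // kept short on purpose
        ~node_ref() {
            if (--p->refs == 0)       // ordinary decrement
                delete p;
        }
    };

    void copies() {
        // std::shared_ptr's control block is typically updated with atomic
        // operations, so each copy is an atomic read-modify-write: a shared
        // memory write that can bounce cache lines between cores.
        std::shared_ptr<int> sp = std::make_shared<int>(42);
        std::shared_ptr<int> sp_copy = sp;

        // The intrusive version does a plain increment/decrement instead,
        // which is the kind of fast path a thread-confined design can use.
        node_ref nr(new node);
        node_ref nr_copy = nr;
    }   // destructors run here; the last owner of each object deletes it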

GCC's dialect of C++

Posted Apr 6, 2017 1:22 UTC (Thu) by nix (subscriber, #2304) [Link]

AIUI, frequently the refcounts are eliminated entirely -- which is a good thing, because a refcount change is a memory write, and memory writes are comparatively expensive these days (particularly but not only for cross-core-shared things).

