Striking gold in binutils

A new linker is not generally something that arouses much interest outside of the hardcore development community (or even inside it) unless it provides something especially eye-opening. A newly released linker, called gold, has just that kind of feature, though, because it runs up to five times as fast as its competition. For developers who do a lot of compile-link-test cycles, that kind of performance increase can significantly improve their efficiency.

Linking is an integral part of code development, but it can be invisible, as it is often invoked by the compiler. The sidebar accompanying this article is meant for non-developers or those in need of a refresher about linker operation. For those who want to know even more, the author of gold, Ian Lance Taylor, has a twenty-part series about linker internals on his weblog, starting with this entry.

For Linux systems, the GNU Compiler Collection (GCC) has been the workhorse by providing a complete toolchain to build programs in a number of different languages. It uses the ld linker from the binutils collection. With the announcement that gold has been added to binutils, there are now two choices for linking GCC-compiled programs.

A linker overview

For non-developers, a quick overview of the process that turns source code into executable programs may be helpful. Compilers are programs that turn C (or other high-level languages) into object code. Linkers then collect up object code and produce an executable. Usually the linker will not only operate on object code created from a project's source, but will also reference libraries of object code, the C runtime library libc for example. From those objects, the linker creates an executable program that a user can invoke from the command line.

The linker allows program code in one file to refer to a code or data object in another file or library. It arranges that those references are usable at run time by substituting an address for the reference to an object. This "links" the two properly in the executable. Things get more complicated when considering shared libraries, where the library code is shared by multiple concurrent executables, but this gives a rough outline of the basics of linker operation.

The intent is for gold to be a complete drop-in replacement for ld, though it is not quite there yet. It is currently lacking support for some command-line options, and Linux kernels that are linked with it do not boot, but those things will come. It also currently only supports x86 and x86_64 targets, but for many linker jobs, gold seems to be working well. The speed seems to be very enticing to some developers, with Bryan O'Sullivan saying:
When I switched to using gold as the linker, I was at first a little
surprised to find that it actually works at all. This isn't especially
common for a complicated program that's just been committed to a source
tree. Better yet, it's as fast as Ian claims: my app now links in 2.6
seconds, almost 5.4 times faster than with the old binutils linker!
Performance was definitely the goal that Taylor set for gold development. It supports ELF (Executable and Linking Format) objects and runs on UNIX-like operating systems only. Supporting only one object/executable format, along with a fresh start and an explicit performance goal, are some of the reasons that gold outperforms ld. Tom Tromey likes the looks of the code:
I looked through the gold sources a bit. I wish everything in the GNU
toolchain were written this way. It is very clean code, nicely commented,
and easy to follow. It shows pretty clearly, I think, the ways in which C++
can be better than C when it is used well.
Because the implementation is geared for speed, Taylor used techniques that may confuse some. He has some concerns about the maintainability of his implementation:
While I think this is a reasonable approach, I do not yet know how
maintainable it will be over time. State machine implementations can be
difficult for people to understand, and the high-level locking is
vulnerable to low-level errors. I know that one of my characteristic
programming errors is a tendency toward code that is overly complex, which
requires global information to understand in detail. I've tried to avoid it
here, but I won't know whether I succeeded for some time.
Overall, it seems to be getting a nice reception from the community, with O'Sullivan commenting that he is "looking forward to the point where gold entirely supplants the existing binutils linker. I expect that won't take too long, once Mozilla and KDE developers find out about the performance boost." Taylor is already looking past that point to concurrent linking (running the compiler and linker at the same time) as the next big step.

There are two other ongoing projects that are working with the greater GCC ecosystem in interesting ways: quagmire and ggx. Quagmire is an effort to replace the GNU configure and build system (consisting of autoconf, automake, and libtool) with something that depends solely on GNU make. Currently, that system uses various combinations of the shell, m4, and portable makefiles to make the building and installation of programs easy: the famous "./configure; make" command line. The tools were written that way to try to ensure that users did not need to install additional packages to configure and build GNU tools. Quagmire, which has roots in a posting by Taylor, recognizes that GNU make is ubiquitous, so basing a system around that makes a great deal of sense.

The ggx project is Anthony Green's step-by-step procedure to create an entire toolchain that can build programs for a processor architecture that he is creating as a thought experiment. The basic idea is to design the instruction set based on the needs of the compiler, in this case GCC, rather than the needs of the hardware designers. He is using GCC's ability to be retargeted for new architectures, along with its simulation capabilities, to create a CPU that he can write programs for. As of this writing, he has a "hello world" program working, along with large chunks of the GCC test suite passing. Well worth a look.
Your teaser is too conservative Posted Mar 26, 2008 17:37 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] You write "up to five times as fast", but I'm consistently seeing factors above five. For the program I'm currently living with (build, debug, test, change, build, etc), I get a speedup ratio of 5.9 with gold, and that's with some libraries on an NFS mount. With all files on a local disk, it's even faster. Ian, you're a hero.
A bit premature Posted Mar 27, 2008 14:18 UTC (Thu) by clugstj (subscriber, #4020) [Link] Before we declare him a linker god, we need to remember that what he has links x86 ELF (imperfectly). This is a far cry from a replacement for GNU ld which links dozens of processors' code in at least a half dozen different file formats. A 5x speedup is nice, but it's not nearly as amazing when you consider how little of the previous tool's functionality it has.
A bit premature Posted Mar 27, 2008 14:59 UTC (Thu) by pj (subscriber, #4506) [Link] Still, ld should be optimized for the common case, which is x86 ELF, and it's clearly not. Times change, the 'common case' changes, and tools should keep up.
A bit premature Posted Mar 28, 2008 0:17 UTC (Fri) by giraffedata (subscriber, #1954) [Link] Besides, Ian doesn't have to produce an actual replacement for Ld or smite Ld to the ground to be a hero or a linker god. If he can produce a replacement for Ld on x86 and x86_64 with ELF, that's godly all by itself. For a great many people, there is no effective difference between a linker that works on x86 and x86_64 with ELF and a linker that works on dozens of processors and half a dozen file formats.
A bit premature Posted Mar 28, 2008 16:20 UTC (Fri) by landley (subscriber, #6789) [Link]
> This is a far cry from a replacement for GNU ld which links dozens of
> processors' code in at least a half dozen different file formats.
1) Isn't the unix way "do one thing and do it well"?
2) Shouldn't Linux Weekly News be most interested in the tools and formats used by _Linux_? (Is it still interesting to have an ld mode to produce a.out code? I tried to produce a static a.out binary with gcc 4.1.2 and couldn't figure out how in the first hour of trying, and google didn't pull anything up either. How interesting is the ability to produce other binary formats last used by discontinued Hewlett Packard minicomputers from 1986?)
3) The way "binflat" files are created is to make an ELF file, then have a second tool produce a second file from the ELF file. Same for the kernel producing zImage files from the ELF format vmlinux. Much of what GNU ld is doing may not actually be a good idea...
A bit premature Posted Mar 28, 2008 22:43 UTC (Fri) by nix (subscriber, #2304) [Link] GNU ld *does* only one thing: it links. It doesn't actually know much about object file formats (as you doubtless know, that job is left to libbfd). ld's real problem is *age*: its design predates ELF, and it shows. Its design meshes quite well with COFF, but who uses that anymore? (And I use the ihex ld target on a fairly regular basis. Maybe there are other ways to do the same thing, but it works for me...)
Who uses COFF? Posted Apr 5, 2008 14:22 UTC (Sat) by anton (guest, #25547) [Link] Last I looked, Windows and Tru64 Unix (possibly also other proprietary Unices).
A bit premature Posted Mar 30, 2008 21:40 UTC (Sun) by AJWM (subscriber, #15888) [Link]
>> This is a far cry from a replacement for GNU ld which links dozens of
>> processors' code in at least a half dozen different file formats.
> 1) Isn't the unix way "do one thing and do it well"?
Yes, but remember, GNU's Not Unix. While it's often an improvement, it does have some idiosyncrasies that drive me crazy (like the insistence on having man pages that basically say 'see the info document', for example).
A bit premature Posted Mar 31, 2008 0:12 UTC (Mon) by nix (subscriber, #2304) [Link] Er, most GNU software has had better manpages than that for many years (derived automatically from the texinfo, just as the info is).
Striking gold in binutils Posted Mar 26, 2008 17:37 UTC (Wed) by joey (subscriber, #328) [Link] I feel for the people developing software where a 5x linker speedup is valuable. Really... I've watched your code build natively on some slow arm machines where linking took 20 hours. Will this really be a big win for most of us? I hope not. Things that take significant time to link tend to be big messes. Now automake and libtool -- that's slow for all of us..
Striking gold in binutils Posted Mar 26, 2008 17:53 UTC (Wed) by elanthis (subscriber, #6227) [Link] C++ programs tend to generate a huge number of symbols. Things like Firefox and OpenOffice.org are pure hell to build. There's also the issue of simply large programs written in other languages, or collections of programs. Shaving 2 seconds off link time may not seem like much, but when you're compiling several dozen executables in one go, that 2 seconds adds up really quick.
Striking gold in binutils Posted Mar 26, 2008 19:18 UTC (Wed) by mto (subscriber, #24123) [Link] Won't this, in the long run, affect load times for applications? Dynamically linked applications need to be linked twice: once right after compile, and a second time at run time. If gold could eventually replace ld.so and run 5 times faster, the impact is pretty big...
Striking gold in binutils Posted Mar 26, 2008 20:44 UTC (Wed) by nix (subscriber, #2304) [Link] It can't replace ld.so. The dynamic and static linkers have quite different (although related) jobs, and have very different goals as well (e.g. performance is utterly paramount for ld.so but less so for ld). The only system which has even *tried* to merge the two is AIX, and I think even it gave up in the end.
Striking gold in binutils Posted Mar 26, 2008 19:58 UTC (Wed) by nix (subscriber, #2304) [Link] It's not just 'a huge number of symbols'; it's 'a huge number of symbols with very long names differing only in their last few characters'. This also proved to be a worst-case for a lot of dynamic linkers...
Striking gold in binutils Posted Mar 27, 2008 7:17 UTC (Thu) by michaeljt (subscriber, #39183) [Link] Is this really the case? I once tried hacking up ld.so to do the lookup backwards (it is actually possible without doing a strlen for every comparison) and I could see no difference in performance, based on loading OpenOffice with both linkers and enabling the built-in linker profiling. Of course, I may have messed up something else in the process...
ld.so is different beast Posted Mar 27, 2008 9:43 UTC (Thu) by khim (subscriber, #9252) [Link] You can read about what goes on there in Drepper's article. Scroll down to "The GNU-style hash table".
ld.so is different beast Posted Mar 27, 2008 10:16 UTC (Thu) by michaeljt (subscriber, #39183) [Link] That article was the reason I tried it in the first place :)
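For readers who do not follow the link: to the best of my knowledge, the string hash behind the "GNU-style hash table" (DT_GNU_HASH) is the familiar multiply-by-33 hash seeded with 5381. The sketch below shows only that function. `bucket_for` is a hypothetical helper added for illustration, and the real table layout (Bloom filter, bucket and chain arrays) is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// String hash used for GNU-style ELF symbol hash tables: start at 5381 and
// fold in each byte as h = h * 33 + c. The arithmetic wraps modulo 2^32
// because the value is kept in 32 bits.
std::uint32_t gnu_hash(const std::string& name) {
    std::uint32_t h = 5381;
    for (unsigned char c : name) {
        h = h * 33 + c;
    }
    return h;
}

// Hypothetical helper (not part of any real linker API): picks a bucket
// index for a symbol name in a table with `nbuckets` buckets.
std::uint32_t bucket_for(const std::string& name, std::uint32_t nbuckets) {
    assert(nbuckets > 0);  // a table with no buckets cannot hold symbols
    return gnu_hash(name) % nbuckets;
}
```

With this sketch, `gnu_hash("")` is 5381 and `gnu_hash("exit")` comes out as 0x7c967e3f. Note that the hash has to walk the whole name, so long common prefixes cost time here even before any string comparison happens.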
Striking gold in binutils Posted Mar 27, 2008 10:46 UTC (Thu) by nix (subscriber, #2304) [Link] Hm, interesting. I'll try it at some point (probably with part of KDE: OOo takes too damn long to build ;} ) and see if I can make it go slow ;}
Striking gold in binutils Posted Mar 27, 2008 11:05 UTC (Thu) by michaeljt (subscriber, #39183) [Link] No need to rebuild anything to try out a new dynamic linker, methinks...
Striking gold in binutils Posted Mar 28, 2008 21:27 UTC (Fri) by nix (subscriber, #2304) [Link] I need to rebuild it to add back a non DT_GNU_HASH :)
Striking gold in binutils Posted Mar 28, 2008 16:26 UTC (Fri) by landley (subscriber, #6789) [Link] A friend of mine who still bothers with C++ explained to me once how C++ compilers used to use a linker optimization where the name mangling would put the innermost identifiers first, and the outermost identifiers last. That way if you had these two symbols:
class1.class2.class3.class4.member1
class1.class2.class3.class4.member2
By comparing "member1" vs "member2" first, your string match would figure out inequality faster. If you go the other way, your string matches have to go through lots of common namespace for every symbol before coming to the unique parts, and with BigLongMixedCaseNames this can get fairly ridiculous. Now, which way did the Intel Itanium C++ spec specify that name mangling had to occur? The long way that links slowly, of course. And everybody else picked up the Itanium C++ spec because nobody else bothered to write up a standard for this part of the language.
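The effect described in this comment can be illustrated with a toy comparison counter. This is only a sketch under stated assumptions: it uses the dotted example names from the comment rather than real mangled symbols, and `compares_forward`, `compares_backward`, and `check_example` are names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of character comparisons made when scanning two names from the
// front until the first mismatch (or until the shorter name runs out).
std::size_t compares_forward(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        ++n;
        if (a[i] != b[i]) break;
    }
    return n;
}

// Same count, but scanning from the last character toward the first.
std::size_t compares_backward(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        ++n;
        if (a[a.size() - 1 - i] != b[b.size() - 1 - i]) break;
    }
    return n;
}

// Self-check using the two example names from the comment.
void check_example() {
    const std::string a = "class1.class2.class3.class4.member1";
    const std::string b = "class1.class2.class3.class4.member2";
    assert(compares_forward(a, b) == 35);   // whole shared prefix is scanned
    assert(compares_backward(a, b) == 1);   // the last character already differs
}
```

The counts only show the best case for suffix-first comparison. Whether a real linker benefits depends on how often it compares full strings at all, which hashing is meant to minimize.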
Striking gold in binutils Posted Mar 26, 2008 19:35 UTC (Wed) by aleXXX (subscriber, #2742) [Link]
> Will this really be a big win for most of us? I hope not. Things that
> take significant time to link tend to be big messes.
Hmm, if you have applications which link to a big number of libraries with many symbols (e.g. KDE) linking does take quite long, so improving this may be really nice. I'll give it a try.
> Now automake and libtool -- that's slow for all of us..
Just use cmake, it does it all and also gets rid of libtool :-)
Alex
Striking gold in binutils Posted Mar 26, 2008 17:45 UTC (Wed) by quotemstr (subscriber, #45331) [Link] Why was ld so slow in the first place? A speedup of five times, IMHO, indicates that the fundamental algorithm used by GNU ld was wrong, not that the same algorithm is implemented better in gold.
Striking gold in binutils Posted Mar 26, 2008 17:54 UTC (Wed) by nix (subscriber, #2304) [Link] Ian talks about this in his linker tutorial, but, yes, basically GNU ld works inside out and upside down ;} it wasn't originally designed for ELF and it shows.
Algorithm Complexity Posted Mar 29, 2008 8:16 UTC (Sat) by pkolloch (subscriber, #21709) [Link] That's funny. For me a factor of five actually suggests that the implementation quality has substantially increased but not the algorithm. If you had a different algorithmic complexity, you should be able to find larger examples with a larger factor. Maybe that's the case but the "factor five" statement would not tell you that.
More on Linkers and Loaders. Posted Mar 26, 2008 17:48 UTC (Wed) by dfarning (subscriber, #24102) [Link] I am providing a link to the online copy of John Levine's book, Linkers and Loaders. It provides good background information on the topic. http://www.iecc.com/linker/ Dave
More on Linkers and Loaders. Posted Mar 26, 2008 18:28 UTC (Wed) by deweerdt (subscriber, #18159) [Link] Ian's linkers series is worth a read too: http://www.airs.com/blog/archives/38
Blogs not useful for documenting complex information! Posted Mar 26, 2008 17:56 UTC (Wed) by cruff (subscriber, #7201) [Link] I followed the two links to blog entries about gold and ggx with interest, but ran head on into a brick wall where it appears to be impossible to quickly read just the links about each project without wading through the rest of the irrelevant blog entries. Are there any links to pages that consolidate the relevant project info?
Blogs not useful for documenting complex information! Posted Mar 26, 2008 18:24 UTC (Wed) by jake (editor, #205) [Link] > Are there any links to pages that consolidate the relevant project info? The first ggx link is a summary page that refers to all the blog entries. I haven't found the equivalent for Ian's linker series. I certainly agree that the blog format is very bad for trying to follow along. jake
Blogs not useful for documenting complex information! Posted Mar 26, 2008 19:59 UTC (Wed) by nix (subscriber, #2304) [Link] That's because the whole of Ian's linker series is almost-never-documented gold dust. Read it all, you know you want to...
Blogs not useful for documenting complex information! Posted Mar 27, 2008 0:41 UTC (Thu) by Ringding (subscriber, #34316) [Link] In Ian's case it's quite easy to access the whole series because the integer numbers at the end of the URLs are consecutive.
C++???? Posted Mar 26, 2008 18:38 UTC (Wed) by sylware (subscriber, #35259) [Link] gold is C++... C++ for a core toolchain program?? Well... -->trash.
C++???? Posted Mar 26, 2008 18:54 UTC (Wed) by ncm (subscriber, #165) [Link] You seem to be confusing C++ with Java.
whatever Posted Mar 26, 2008 19:00 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] Keep your 5x-slower linker then; it's C after all.
whatever Posted Mar 26, 2008 20:04 UTC (Wed) by nix (subscriber, #2304) [Link] sylware seems to have something against all languages other than C (and perhaps some flavour of assembler as well). I wonder if (s)he knows that most large C programs (including libbfd) are in many ways object-oriented...
whatever Posted Mar 26, 2008 20:28 UTC (Wed) by elanthis (subscriber, #6227) [Link] A lot of "old beards" dislike C++ for a variety of (mostly) no longer accurate reasons. Keep in mind I'm a huge proponent of C++ and use it for nearly all of my non-Web projects. It generally isn't about object-oriented programming, but more about low-level technical or political issues that were once an actual problem.

C++ fractured a lot during its pre-standardization days. This led to a lot of issues with porting code between different compilers. Even after standardization, it's taken a long while for certain vendors (*cough* Microsoft) to catch up and play nice, and there are still a few corner cases where certain compilers don't meet the spec. This really isn't a real concern anymore though.

C++ had a non-standard ABI on early Linux systems, leading to constant breakage of C++ programs. Upgrading one's system would often lead to many or sometimes even all C++ programs no longer working. This hasn't happened in many, many years, and is unlikely to ever happen again given that there is a standardized C++ ABI that all major compiler vendors follow.

C++ suffers from a massively huge specification that no one compiler implements bug-free, and compiler updates often include standard-compliance fixes that break old non-compliant (but perfectly functional) code. Upgrading your compiler can sometimes result in old code no longer compiling. This is still true today, unfortunately - the GCC 4.3 release has broken a lot of software, and even though the fixes are generally extremely simple to make, they're still annoying. This is only an issue if you aren't 100% sure you're writing standard-compliant code; unfortunately, few of us can be, given how big that spec is. Valid fear.

C++ can include some (negligible) performance regressions over C when using certain C++ features. However, these features really have no analog in C at all, and the few attempts at providing similar features in C often result in horrifically ugly and difficult to use APIs or in even worse performance than C++'s implementation. These features include exceptions and RTTI. Both of these features can be turned off in almost every compiler, so apps which don't really need them do not need to take the performance hit and can still take advantage of C++'s other features.

C++ makes it easier to write bad code that masquerades as good code. That is, C++ allows you to hide what your code is doing behind operator overloading, function/method overloading, virtual vs non-virtual method calls, and so on. C would require the coder to be explicit about what he is doing, and looking at the code makes it obvious what is going on without having to look up every header to figure out which methods are virtual or which global operators or overloaded functions have been defined. Basically, this boils down to there being a lot of shitty programmers, C being less friendly to shitty programmers, and more of those shitty programmers flocking to C++ than to C. Not an issue when you're dealing with someone who is truly good at software engineering, like the gold author.

C++ does nothing that C can't do. Anything you can write in C++ you can write in C, and make it just as efficient, possibly even more so in some circumstances. This argument is mostly bogus because it overlooks the fact that writing most of those things in C is way, way harder than writing them in C++ (getting the performance of templatized STL container classes out of C without manually maintaining a metric crap load of almost identical code is not really feasible). It's also bogus because, for those few things where doing it "the C way" or eschewing the standard "C++ way" is preferable (like using a custom container for a specific data set that allows for very specific optimization techniques that the STL can't use), C++ is entirely capable of doing it "the C way." Still, it's pretty common to see old beards claim C++ is pointless or meritless, even if it is easily refuted.

C++ has additional runtime requirements that make it less suitable for low-level tools and libraries. A C program can get by with just libc. A C++ program needs some extra libraries, for things like iostreams, the new/delete operators, exception handling, and so on. As with exceptions in general, these features may be done without to avoid the additional dependencies, although the C++ compiler will still link them in anyway by default. These dependencies really don't have any true negative impact (not like some of the big GUI apps, C and C++ alike, that require dozens of shared libraries to run... hello GTK/GNOME), but there is a fear among some of having any kind of dependency that isn't 100% mandatory. If the dependency really is an issue it is avoidable without giving up C++'s other features, and the fear is pretty pointless any more, so this reason is bunk.

So, there are a few valid reasons to possibly avoid C++ for low-level work. Some of those can be worked around with a little effort. In general, given that complete OS kernels have been written in C++ with no ill effect, I think it's safe to say that this kind of fear of C++ in 2008 is greatly unjustified. Old beards aren't likely to change their tune now any more than they were a decade ago, though. :)
+5, Insightful Posted Mar 26, 2008 20:48 UTC (Wed) by nix (subscriber, #2304) [Link] I think I agree with everything you said there, but you said it *very* well.
+5, Insightful Posted Mar 26, 2008 21:08 UTC (Wed) by JoeF (subscriber, #4486) [Link] Ditto. I use C++ a lot, for my day job, and I agree with everything elanthis said. Before my current project, I had to maintain a complex program written almost completely in C. That turned out to be a nightmare, and it actually was fairly well-written code. In my current project, I make use of a lot of STL functionality. Implementing that in C would be hell...
yet another +5 Posted Mar 26, 2008 22:14 UTC (Wed) by pr1268 (subscriber, #24648) [Link] Another C++ fanboy here who really likes Elanthis' comments. I write lots of personal utility programs in C, and yes, the C code is usually leaner and more spry than equivalent C++ code. But, it's hard to ignore just how well C++ satisfies virtually every programming paradigm in existence. For me, trying some new paradigm or language feature set in C++ is usually a fruitful academic exercise where I come away so much more enlightened.

I wrote a C++ program recently that had all kinds of libc calls (stat, mkdir, opendir, and readdir, to name a few) sitting next to standard library vectors and fstreams. It might have looked like a Frankenstein of a program suffering from a source language identity crisis, but I'll be damned if it didn't run perfectly and efficiently (using existing GCC and binutils). And this "hybrid" C/C++ program ran several orders of magnitude faster than my hand-coded linked list version it replaced (but let's not go to why my linked list version sucked - suffice it to say that code was a brown paper bag experience). ;-)

I do understand Sylware's sentiments towards C++ (even if I don't agree with them) - after all, Linus feels the same way. It's all about using the right tool for the right job. With C, you get a well-stocked hand-carry tool box. For many folks, that's perfectly adequate. With C++, you get the entire Sears Craftsman catalog.

I'm actively looking forward to seeing more of Gold and its performance gains. Kudos to the developers.
yet another +5 Posted Mar 27, 2008 4:15 UTC (Thu) by wahern (subscriber, #37304) [Link] If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#? I agree that C++ is in some respects a "better C". But I don't want a "better C". I like C because it's simple, portable, and the lowest common denominator. I can compile most of my own libraries and applications on Linux, *BSD, and Win32 w/o so much as changing a compiler option. If I want a complex language, there are better ones than C++; the cost/benefit tradeoff is superior. When C++ gets garbage collection and closures, call me. (And Boehm better not be on the line.) If I have to put up w/ the extra baggage (language complexity, superfluous libraries, porting headaches), I demand more bang for my buck.
yet another +5 Posted Mar 27, 2008 7:09 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
> If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#?
I can't find in my earlier post where I said I didn't use Java. In fact, I do from time to time. My earlier post was about why I'm such a fan of C++ - and how I can still enjoy all the benefits of both C and C++, even in the same program. However, I take exception to your implication that Java provides competitive "sugar" to C++. Consider the following example of reading an integer from a file:
C++
int x;
ifstream in_file;
in_file.open("foo.txt");
assert(in_file);
in_file >> x;
Java
int x;
String line;
try {
FileReader fr = new FileReader("foo.txt");
BufferedReader br = new BufferedReader(fr);
line = br.readLine();
x = Integer.parseInt(line);
}
catch (Exception e) {
}
Not more sugar, IMO. Yes, I do realize that Java has to abstract a lot of file I/O due to the fact that it supports multi-byte character sets on dozens of architectures with different byte orders and file systems, thus explaining the syntactic "salt". But, still, even the C version is pretty simple:
C
int x;
FILE* in_file;
in_file = fopen("foo.txt", "r");
assert(in_file);
fscanf(in_file, "%d", &x);
But again, I'm not trying to dismiss Java, only to provide a counter example of where Java fails to provide any programming benefit over C++. Actually, the C and C++ examples above would likely only work on ASCII or UTF-8 filesystems, but Java's UTF-16 support is native to the language [1]. So, Java gets to tell C and C++ what "portability" means (even if the programmer has to dig through 67 layers of abstraction to accomplish what he/she set out to do). ;-)

As for C#, well, despite my own thoughts about Microsoft getting in the way, I just don't see why C# even needs to exist. Microsoft was actively marketing Java development suites and compilers in the mid- and late-1990s (Visual J++, anyone?), adding their own APIs and language features, until Sun Microsystems had the guts to stand up to MS and tell them to stop violating the terms of Sun's license (with a successful lawsuit). It was all sour grapes for MS afterwards, so they just had to go run out and create their own Java imitation. C'mon, Microsoft, you already had legions of Visual Basic programmers! Why go out and create a whole new language when there are so many already out there? Was it because Visual Basic wasn't "Java-like" enough? Not that I think C# is a bad language; it does have some interesting features. But, I get this strange feeling that C# skills will be useless come five years from now, just like Visual Basic skills are in much less demand than they were five years ago.

As for coding C#, well, I abandoned MS Windows four years ago for a single-boot Linux. And the only native C# compiler for Linux I know of is Mono. Which causes anomalous behavior on my Slackware machines. Come to find out Mono has library dependency issues with my current toolchain and dynamic linker. In other words, I experienced DLL HELL in Linux [2], simply by installing Mono (version 1.24) in Slackware 12. How ironic this is considering I'm trying to write Windows applications in Linux.
I suppose the gratuitous DLL hell is all part of the Microsoft "experience" I'm supposed to get whilst writing C# code. No thank you. I mostly agree with the rest of your post - I do indeed like C's portability (hey, it was one of the core motivations for creating the language to begin with), and I also like its efficiency. I can't say that adding garbage collection to C++ would be worthwhile; Even Bjarne Stroustrup labored over whether to include garbage collection in C++ back in the early 1980s. My personal thoughts are that garbage collection built-in to the programming language would send several mixed messages to programmers using that language:
> (And Boehm better not be on the line.)
I LOLed at your comment. I downloaded a source tarball several months ago (I forget which program/application) that had a Boehm GC dependency. I thought to myself, C++ doesn't have garbage collection for a reason, but why does this particular project feel that it needs to slather a layer of protection over the code? Are the programmers lazy? Or, do they not know how to use new and delete?
> When C++ gets garbage collection and closures, call me.
Perhaps some of your garbage collection needs could be met by using the C++ standard library container classes (e.g., vector, list, deque, etc.). Or, more appropriately stated, your need for a GC to begin with could be eliminated. But, perhaps that's a discussion for another time. I've rambled on long enough, it's been fun pontificating and bloviating (to quote one of my graduate professors). :-)
[1] C++ does have explicit support for multi-byte character sets with its wchar_t type. I don't know what kind of support C has in this regard, or if it supports multi-byte characters at all.
[2] I'll openly admit that Microsoft unfairly receives the brunt of user frustration over DLL hell when in reality the basic concept of library dependency hell is a Unix creation which predates DLL files by several years.
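A slightly more defensive take on the C++ snippet from this comment, offered as a sketch: it reports failure to the caller instead of asserting, and it parses only the first integer in the file. The name `read_first_int` and its bool-plus-output-parameter shape are choices made for this example, not anything from the thread.

```cpp
#include <cassert>
#include <fstream>

// Reads the first whitespace-delimited integer from the file at `path`.
// Returns true and stores the value in `out` on success. Returns false and
// leaves `out` untouched if the file cannot be opened or does not begin
// with an integer.
bool read_first_int(const char* path, int& out) {
    assert(path != nullptr);            // precondition: a real path is passed
    std::ifstream in(path);
    if (!in) return false;              // missing or unreadable file
    int value = 0;
    if (!(in >> value)) return false;   // no parsable integer at the start
    out = value;
    return true;
}
```

A caller would write something like `int x; if (read_first_int("foo.txt", x)) { ... }`, handling the failure branch however the program sees fit.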
yet another +5 Posted Mar 27, 2008 10:54 UTC (Thu) by nix (subscriber, #2304) [Link] GC support in the language has one major advantage over not having it: if the compiler and GC layer cooperate, the language can do type-accurate garbage collection. That's pretty much impossible with a 'guess if this bit pattern is a pointer' implementation like Boehm. (But still, why GC in a C/C++ program? Easy: sometimes, the lifespan of allocated regions is complex enough that you don't want to track it in your own code. A lot of large C/C++ systems have garbage collectors in them, often hand-rolled. GCC does, for instance, and while its effect on data locality slowed GCC down a lot, it *also* wiped out huge piles of otherwise-intractable bugs. In my own coding I find that Apache-style mempools and disciplined use of ADTs eliminates most of the need for GC while retaining the nice object-lifecycle benefits of C/C++, so I can use RAII without fear. Losing that pattern in Java is a major reason why I try to avoid the language: in effect Java eliminates memory leaks only to replace them with other sorts of resource leak because you can't use RAII to clean them up for you...)
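For readers who haven't met the RAII pattern mentioned above, here is a minimal sketch of it. The class name ScopedFile and the helper read_first_int are made up for illustration; they are not from any library discussed in this thread. The point is only that the destructor releases the resource on every exit path, including when an exception is thrown.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a C FILE*; the name is illustrative only.
class ScopedFile {
public:
    ScopedFile(const std::string& path, const char* mode)
        : fp_(std::fopen(path.c_str(), mode)) {
        if (!fp_) throw std::runtime_error("cannot open " + path);
    }
    // Runs on every exit path from the enclosing scope, including exceptions.
    ~ScopedFile() { std::fclose(fp_); }

    // Copying would lead to a double fclose(), so forbid it.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    std::FILE* get() const { return fp_; }

private:
    std::FILE* fp_;
};

// Reads the first integer from a file; throws if the file cannot be opened
// or does not start with an integer. The file is closed in both cases.
int read_first_int(const std::string& path) {
    ScopedFile f(path, "r");
    int x = 0;
    if (std::fscanf(f.get(), "%d", &x) != 1)
        throw std::runtime_error("no integer in " + path);
    return x;
}
```

The same shape works for locks, sockets, or mempools: acquire in the constructor, release in the destructor, and let scope exit do the bookkeeping.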
Python Posted Mar 27, 2008 12:43 UTC (Thu) by ernstp (subscriber, #13694) [Link]
Sorry, completely off topic, I just had to post this.
Python:
int( file("foo.txt").read() )
:-P
Python Posted Mar 27, 2008 13:21 UTC (Thu) by pr1268 (subscriber, #24648) [Link] Show-off! You forgot to catch the exception of the file not opening. Where's your deadParrot() error-handling function? ;-)
Python Posted Mar 27, 2008 17:46 UTC (Thu) by cwarner (subscriber, #47176) [Link] How far we've come.. how far.
Ruby Posted Mar 28, 2008 1:15 UTC (Fri) by Tuxie (subscriber, #47191) [Link]
sorry, I had to :-)
x = File.read("foo.txt").to_i rescue deadParrot
Ruby Posted Mar 28, 2008 16:08 UTC (Fri) by alkandratsenka (subscriber, #50390) [Link]
Reading whole file in memory just to parse int from it's first line is very funny :)
You'll need a longer version like this
(File.open('foo.txt') {|f| f.readline}).to_i rescue deadParrot
c++ vs c Posted Mar 27, 2008 14:51 UTC (Thu) by jimparis (subscriber, #38647) [Link]
> C++
>
> int x;
> ifstream in_file;
> in_file.open("foo.txt");
> assert(in_file);
> in_file >> x;
> C
>
> int x;
> FILE* in_file;
> in_file = fopen("foo.txt", "r");
> assert(in_file);
> fscanf(in_file, "%d", &x);
Here's something that really bugs me about C++. Where's the documentation? With C, "man
fopen" "man assert" "man fscanf" gives me all the info I need. With C++, I suppose some
manual page for ifstream would be most appropriate, but I don't seem to have it. Which
package is that in? Or must I resort to google searches every time?
Of course, even if I did have C++ manpages, deciphering "in_file >> x" still requires that I
track backwards to figure out the types of "in_file" and/or "x" (yay operator overloading!)
c++ vs c Posted Mar 27, 2008 15:13 UTC (Thu) by pr1268 (subscriber, #24648) [Link] I suppose some manual page for ifstream would be most appropriate, but I don't seem to have it. All Glibc standard library functions have man pages (I'm unsure whether these came before or after the shell functions' man pages). I think this might be related to the founding philosophy that C is supposed to be portable, and the man pages were a convenient way of distributing documentation on the system call interfaces without having to decipher C code you've never seen before (not impossible, but time-consuming). I can't recall ever seeing a C++ man page, but then again, the whole language standard was in limbo up until its 1998 ISO standardization. Not sure why they don't exist nowadays, but perhaps Stroustrup would prefer that you buy his book instead (stupid conspiracy theory). Some of the top links in Google searches for various C++ functions and standard library classes are quite decent (IMO). Personally, I recommend anyone trying to "dive into" C++ go find a used C++ textbook. Just be sure to get one dated more recent than 1998 (because older C++ texts are rife with code that predates the ISO standard).
c++ man pages on gcc.gnu.org Posted Mar 27, 2008 17:01 UTC (Thu) by bkoz (subscriber, #4027) [Link] See: http://gcc-ca.internet.bs/libstdc++/doxygen/ I believe some os vendors (debian, I think) package these. -benjamin
c++ vs c Posted Mar 28, 2008 12:00 UTC (Fri) by cortana (subscriber, #24596) [Link] Indeed, man pages are not really suitable for C++ (and many other languages) for the reasons you state. If you are on a Debian system, run: apt-cache -n search libstdc++ doc and install one of those packages. Then check out its directory in /usr/share/doc. The docs are also online at http://gcc.gnu.org/onlinedocs/libstdc++/. A very nice quick reference to iostreams and the STL can be found at http://cppreference.com/. I have to say I don't really prefer the man pages for C development because often they contain outdated or just plain incorrect information. I prefer to use the glibc manual directly for reference.
Use of assert Posted Mar 30, 2008 4:53 UTC (Sun) by pjm (subscriber, #2080) [Link]
Incidentally, please don't use or encourage use of ‘assert’ for checking for I/O errors or
other can-happen runtime conditions. Such checks will disappear when compiled with -DNDEBUG
(or conversely make it impractical to compile with -DNDEBUG, thus discouraging use of
assertions), and fail to give a meaningful error message. That should be ‘if (!in_file) {
perror("foo.txt"); exit(EXIT_FAILURE); }’.
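A minimal C++ sketch of that advice, for the iostream version of the example. The function name read_int is hypothetical. Note one assumption: the C++ standard does not promise that a failed ifstream open sets errno, although common POSIX implementations do, so treat the strerror() text as best-effort.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

// Returns true and stores the value in `out` on success. On failure it
// reports the problem on stderr and returns false. Unlike assert(), these
// checks are still present when the program is built with -DNDEBUG.
bool read_int(const char* path, int& out) {
    std::ifstream in(path);
    if (!in) {
        // Assumption: errno was set by the underlying open call.
        std::cerr << path << ": " << std::strerror(errno) << '\n';
        return false;
    }
    if (!(in >> out)) {
        std::cerr << path << ": no integer found\n";
        return false;
    }
    return true;
}
```

The caller then decides whether to exit, retry, or fall back to a default, which assert() never lets you do.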
Use of assert Posted Mar 31, 2008 21:05 UTC (Mon) by pr1268 (subscriber, #24648) [Link] Well, I had to level the playing field somewhat since Java forces me to put all that code in a try/catch block... But yeah, I generally do what you recommend in C/C++.
The metric is SPEED not just easier development Posted Mar 27, 2008 10:03 UTC (Thu) by khim (subscriber, #9252) [Link] If the metric is easier development because the language provides more sugar for certain patterns, how do you justify not using Java or C#? Java and C# are using virtual machines and thus are slower. End of story. C is closer to the metal, but suffers from a human problem: it's not feasible to generate 10,000 specializations by hand. You need some metaprogramming. If you take a look at really fast "C libraries" (like FFTW or ATLAS) you'll find out that while they include a bunch of .c files, these .c files are not the source! They are themselves generated by some automatic process. C++ allows you to do something similar without using yet-another-specialized system (STL and especially boost are a big help, but simple template metaprogramming works as well in simple cases). Thus in practice C++ programs written by good programmers are faster than C programs (if you turn off RTTI and exceptions, of course). AFAICS this was the reason for C++ usage in gold, too. Of course it's very easy to misuse C++, too...
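To make the "simple template metaprogramming" remark concrete, here is a small sketch of a compile-time specialized dot product. The names Dot and dot are invented for this example and are not taken from FFTW, ATLAS, Boost, or gold. The compiler instantiates a separate version for each array length, which an optimizer will typically flatten into straight-line code.

```cpp
#include <cstddef>

// Dot product of two N-element arrays, with N known at compile time.
// The recursion is resolved during compilation, so each N gets its own
// specialization without any external code generator.
template <std::size_t N>
struct Dot {
    static double eval(const double* a, const double* b) {
        return a[0] * b[0] + Dot<N - 1>::eval(a + 1, b + 1);
    }
};

// Base case: an empty product ends the recursion.
template <>
struct Dot<0> {
    static double eval(const double*, const double*) { return 0.0; }
};

// Convenience wrapper that deduces N from the array type.
template <std::size_t N>
double dot(const double (&a)[N], const double (&b)[N]) {
    return Dot<N>::eval(a, b);
}
```

Calling dot(a, b) on two double[3] arrays instantiates Dot<3>, Dot<2>, Dot<1> and Dot<0>; whether that actually beats a plain loop depends on the compiler and the target, so measure before relying on it.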
The metric is SPEED not just easier development Posted Mar 27, 2008 10:56 UTC (Thu) by nix (subscriber, #2304) [Link] One point: RTTI and exception handling don't slow down C++ programs anymore, except if dynamic_cast<> is used or exceptions are thrown, and those are things which if you implemented them yourself you'd have a lot of trouble making as efficient as the compiler's implementation (I doubt that you *can* make them as efficient or reliable without compiler support).
Since WHEN? Posted Mar 28, 2008 9:26 UTC (Fri) by khim (subscriber, #9252) [Link] Last time we checked (GCC 4.1.x), removing -fno-rtti and/or -fno-exceptions made real-world programs 5-10% slower (up to 15% combined). What changed in GCC 4.2 and/or GCC 4.3??? If you DO need RTTI and/or exceptions of course it's better to use compiler-provided ones than to write your own, but if not... For things like gold abort() is a perfectly usable alternative to exceptions...
Since WHEN? Posted Mar 28, 2008 21:26 UTC (Fri) by nix (subscriber, #2304) [Link] I think I need to profile this, then, because exception frames should be very nearly free to set up and (non-throw) tear down, certainly not as expensive as 15%. This wasn't on an sjlj target, was it? 'cos they're *so* last millennium.
The metric is SPEED not just easier development Posted Apr 2, 2008 11:12 UTC (Wed) by dvdeug (subscriber, #10998) [Link] Java the programming language doesn't use a virtual machine any more than C does. It happens to usually be implemented using a virtual machine, but there are native compilers, like gcj. Furthermore, the coding matters a lot more than the language, and the language can frequently simplify the coding.
+5, Insightful Posted Mar 28, 2008 0:56 UTC (Fri) by man_ls (subscriber, #15091) [Link] Another +5 here. Only a small detail bothers me: "C being less friendly to shitty programmers". Having seen lots of horrible C code I think that shitty programmers feel as confident obfuscating C, C++ or Java code. Just the liberal use of global variables and gotos can get as bad as the worst class hierarchy.
You missed the point Posted Mar 26, 2008 22:41 UTC (Wed) by sylware (subscriber, #35259) [Link] Anything that makes a system tool dependent on more than the complexity of a C compiler should be trashed, period. Why? This is my limit for containment of the size/complexity of the system software stack. I won't go any further. As you perfectly put forward, a C++ compiler, even a non-optimizing one, is hell on earth to code compared to a C compiler. A linker as a system C++ program would damage the size/complexity of the system software stack. My conclusion is horribly simple: gold has to go in the trash and its coder should work on speeding up the properly (namely C) coded ld (oprofile?).
What a narrow-minded viewpoint! Posted Mar 26, 2008 22:55 UTC (Wed) by felixfix (subscriber, #242) [Link] I don't have a list, but I am sure you use tools every day that were developed in some language other than C. Perl and Python come to mind, but at any rate, restricting your tool chain to C-based code is not possible nowadays. It isn't just narrow-minded to want that, it is burying your head in the sand to pretend it is possible.
You missed the point Posted Mar 27, 2008 0:47 UTC (Thu) by ncm (subscriber, #165) [Link] If sylware thinks he understands Gcc's C compiler, it can only be because he hasn't looked at it in a long, long time.
You missed the point Posted Mar 27, 2008 0:49 UTC (Thu) by epa (subscriber, #39769) [Link] gcc has an extremely complex codebase. Your argument would seem to suggest replacing it with a simpler C compiler such as pcc, in order to reduce the total complexity of the 'system software stack'. Similarly you should be running Minix or another kernel that is simple enough one person can read and understand all the code. And I assume you have no truck with the horribly baroque autoconf/automake/libtool rat's nest. From what I've read, gold is much simpler than the overly-general ld implementation it substitutes for. Of course, part of this simplicity is because it is written in a higher-level language. Often this is a worthwhile tradeoff - after all the compiler only has to be written and debugged once. Were this not the case, all programs would be in assembly.
You missed the point Posted Mar 27, 2008 2:12 UTC (Thu) by elanthis (subscriber, #6227) [Link] Your argument makes no sense. The C++ compiler stack is part of GCC, and the C++ portions of the compiler make up such a small percentage of the complexity of the rest of the stack as to be not worthy of mentioning. Plus, I'm fairly sure (might be wrong) that the current versions of GCC have merged the C and C++ parsers. The complexity of C++ does not mean that you get huge, unwieldy compilers. It just means that you have trouble implementing the spec 100% accurately. It's no different from a protocol like HTTP. HTTP (esp v1.1) is actually pretty hard to get fully implemented correctly. A lot of HTTP servers get it wrong, as do a lot of clients, and thus there are certain combinations of HTTP server and client that just don't work together. Despite this, a fully correct HTTP server or client implementation is still pretty short, sweet, and easy to read. HTTP doesn't force ugly code, it's just not as simple a protocol as one might think it is. You can think of C++ the same way. It doesn't require that much extra effort on top of C to write a compiler for, but trying to make sure you cover 100% of the spec is harder than you might think given how very little C++ adds to the C language.
You missed the point too Posted Mar 27, 2008 2:31 UTC (Thu) by sylware (subscriber, #35259) [Link] If I need to rewrite from scratch a non-optimizing C compiler it's easy. Look at tcc, and I have plenty of students who had the "small and simple" project of writing a C compiler. Of course, when we bring C++ on the table, you hear "insane", "crazy", "far too complex" etc... I was referring to *that* complexity, not the current complexity of the best ultra-super optimizing compiler which is gcc.
You missed the point too Posted Mar 27, 2008 11:04 UTC (Thu) by nix (subscriber, #2304) [Link] But a random C compiler reimplementation isn't capable of compiling most of the other parts of the stack in any case. GCC can be compiled with just about anything that supports ISO C, but you'll need to reproduce a lot of GCC's (largely undocumented) foibles and language extensions before you can compile, say, the Linux kernel with it. I don't really see why the complexity of the *compiler* is relevant anyway. It's not as if GCC is going to abruptly go away or stop working, so its complexity doesn't negatively impact you at all.
You missed the point too Posted Mar 28, 2008 17:20 UTC (Fri) by landley (subscriber, #6789) [Link] Actually, I'm working on making tinycc (a project derived from tcc) compile the rest of the stack, including the kernel. I have rather a lot of work left to do, of course. :) http://landley.net/code/tinycc I find gold interesting, but not useful in my case because tinycc's linker is built-in. (I'm reorganizing the code to work as a "swiss army knife" executable ala busybox, but that's not in -pre2. Maybe -pre3.) As for the kernel, the linux-tiny project is working on getting that more modular so we need to select less of it... Rob
You missed the point Posted Mar 27, 2008 19:17 UTC (Thu) by tjc (subscriber, #137) [Link]
> I'm fairly sure (might be wrong) that the current versions
> of GCC have merged the C and C++ parsers.
The C parser was rewritten in gcc 4.1, and I *think* it's still separate from the C++ parser.
You missed the point Posted Mar 28, 2008 22:25 UTC (Fri) by nix (subscriber, #2304) [Link] Yes. It's not separate from the *Objective C* parser. (The similarity is that, like the C++ parser, the C parser has now made the transition from bison to a hand-rolled parser.)
You missed the point Posted Mar 27, 2008 4:59 UTC (Thu) by artem (subscriber, #51262) [Link] Well then you should really stay away from the programming for some time already. Guess what the language is used for the tools used in designing computer chips? (for reference, see http://www.research.att.com/~bs/applications.html, scroll down to the bullet labeled 'Intel')
You missed the point Posted Mar 27, 2008 5:50 UTC (Thu) by lysse (subscriber, #3190) [Link] colorForth? ;)
You missed the point Posted Mar 27, 2008 7:37 UTC (Thu) by pr1268 (subscriber, #24648) [Link] Google has written a memory allocator library (to compete with the Glibc 2.3 equivalent, ptmalloc2), in C++. Now, my understanding of the memory allocator is that this is a library whose run-time efficiency should be unquestioned. This is code that interfaces with the kernel nearly continuously. Accordingly, C++ would not have been my first choice of programming language in which to implement this (I would have chosen C, but don't mind me--I've never written a memory allocator before!). But, Google's allocator library appears to have improved performance over the incumbent Glibc ptmalloc2 in certain scenarios, according to the graphs near the bottom of that page. And to think this was accomplished with C++ (I'm assuming that the Glibc ptmalloc2 is written in C, but I do ask someone to correct me if I'm wrong).
You missed the point Posted Mar 27, 2008 11:05 UTC (Thu) by nix (subscriber, #2304) [Link] Actually the memory allocator largely interfaces with itself and its userspace callers. Its interface with the kernel is restricted to the occasional sbrk() and mmap() calls.
You missed the point Posted Mar 28, 2008 3:45 UTC (Fri) by pflugstad (subscriber, #224) [Link] Also, TCMalloc doesn't return memory to the kernel at all, while GNU libc's does.
You missed the point Posted Mar 27, 2008 5:49 UTC (Thu) by lysse (subscriber, #3190) [Link] How dare you be this dismissive of *anyone's* work without an alternative to offer? Especially on that flimsiest of pretexts, ideology? You want something done *your* way, do it yourself. Otherwise, take what you're offered. Telling someone that they have to junk what they've done is bad enough when they're only as far as having it working; when they're handily trouncing what they aim to replace, telling them that their replacement isn't "good enough" - because their choice of implementation language doesn't satisfy *your aesthetic tastes* - only exposes *you* as the fool you are. (Unfortunately, yours is the voice of the majority, and humanity is doomed.)
Being dismissive of another's work Posted Mar 28, 2008 0:02 UTC (Fri) by giraffedata (subscriber, #1954) [Link] How dare you be this dismissive of *anyone's* work without an alternative to offer? The way I read it, sylware did offer an alternative: classic binutils 'ld'. He says even as slow as it is, it's better because of the C++ issue. You want something done *your* way, do it yourself. Otherwise, take what you're offered. Surely you don't mean that. We all take a third option all the time: do without.
You missed the point Posted Mar 27, 2008 10:59 UTC (Thu) by nix (subscriber, #2304) [Link] Um, ld's *algorithms* are wrong, and the wrongness is deeply embedded. The only way to speed it up as much as gold is is to rewrite it from scratch. Ian did that, and preferred to do it in C++. Feel free to rewrite it yourself, in whatever language you prefer. When you make something faster and easier to maintain than gold, come back to us.
Good point! Posted Mar 27, 2008 15:31 UTC (Thu) by pr1268 (subscriber, #24648) [Link] Agreed. If you choose C over C++ merely because C++ is "slow", "bloated", or "inefficient" then don't complain any further until you've rewritten all your applications in assembly language! Then we'll talk about efficient code. Now, if you choose C over C++ because you're more comfortable, familiar, or experienced at it, then fine, but don't start making unsubstantiated generalizations about how C++ is slow, bloated, inefficient, etc. C++ isn't nearly as bloated or slow as it might have been a number of years ago. And, the Gold linker may improve this even further.
You missed the point, again. Posted Apr 2, 2008 17:56 UTC (Wed) by sylware (subscriber, #35259) [Link] You are missing the point full throttle, reread my posts.
You missed the point, again. Posted Apr 2, 2008 19:51 UTC (Wed) by nix (subscriber, #2304) [Link] Ah, the last refuge of the erroneous. Sorry, the onus is on *you*: you're the one making the exaggerated claims.
C++ incompatibility history Posted Mar 26, 2008 23:03 UTC (Wed) by jreiser (subscriber, #11027) [Link] The C++ version incompatibility and interoperability nightmare was still very much alive only TWO years ago. "C++ had a non-standard ABI on early Linux systems, leading to constant breakage of C++ programs. Upgrading one's system would often lead to many or sometimes even all C++ programs no longer working. This hasn't happened in many, many years, ..." Fedora Core 3 was still leading edge in August 2005. Its then-current software updates had gcc-c++-3.4.4-2 and compat-gcc-c++-8-3.3.4.2 because there were incompatibilities between 3.3.4 and 3.4.4. In September 2005, the newly-issued Fedora Core 4 had gcc-4.0.1-4 which was again incompatible with 3.4.4. Fedora Core 5 was released in March 2006, finally signalling that FC3 truly had ridden into history.
C++ incompatibility history Posted Mar 27, 2008 0:27 UTC (Thu) by solid_liq (subscriber, #51147) [Link] That was a gcc problem. gcc had ABI compatibility problems leading into the 4.0 release with C too, not just C++. Also, Fedora is not the baseline standard.
C++ incompatibility history Posted Mar 28, 2008 0:12 UTC (Fri) by giraffedata (subscriber, #1954) [Link] It's also important to note that the echoes of those historical compatibility problems can be with us for a long time. I try to use C++, but it's a trial, because I have systems that have roots going back to Gcc 2 days. There is no switch on these systems I can flip to recompile every program and library with a current compiler. A C++ program compiled with Gcc 3 will not work with an existing C++ library, and vice versa. So when I write new C++ code, I compile with Gcc 2, and probably always will. I recently learned, painfully, that current 'ld' has a similar compatibility problem -- something to do with throwing an exception across object modules. It's worth noting that none of these problems exist with C. I.e. the zero-headache alternative for me is to use C.
C++ incompatibility history Posted Mar 28, 2008 3:49 UTC (Fri) by pflugstad (subscriber, #224) [Link] Heh - I'm still forced to use a EGCS 1.1 cross compiler for a certain embedded OS I work with. Talk about painful. Even more so: it's running under an _old_ version of cygwin on windows (and if you know much about cygwin, you know the old versions had lots of interesting bugs and multiple versions on the same system don't play nice together, so it ends up trashing the more modern cygwin installs)... sigh... Sorry, just had to whine...
Not quite Posted Mar 27, 2008 0:44 UTC (Thu) by ncm (subscriber, #165) [Link] As useful as I find C++, some of the above is not right. There is no standard ABI for C++. G++ (in different versions) has two in common use, with a third coming soon; MSVC++ has others. (Other compilers tend to copy one or other of Gcc's or MSVC++'s, depending on target.) What is different now is that people have learned to include version numbers in the names of library files and library packages, so one rarely tries to link to a library built with the wrong compiler. C++ code can be substantially faster than the best macro-obscurified C code, even without fancy template tricks. The reason is, believe it or don't, exceptions. Checking return status codes at each level in C (besides obscuring code logic!) is slower than leaving the stack to be unwound by compiler-generated code in the (unlikely) case of an exception. Shitty programmers are more likely to code in C++ not because they're drawn to it, particularly, but because C++ is what everybody uses in Windows-land, and that's where most of them come from. That could be taken as a slight on typical Windows development habits, but it's really more a matter of the law of big numbers. The only valid reason to consider C++ unsuitable for some particular "low-level" application is if the environment it must be linked/loaded into was built with a C compiler, and lacks the minimal support needed for, e.g., exception handling. An example is the Linux kernel. There's no reason Linux couldn't all be compiled and linked with G++ -- modulo some undisciplined use of C++ keywords as identifiers -- and then C++ drivers would be fine. However, it would be unwise to throw an exception in many contexts there. Finally, the instability introduced in Gcc-4.x has a lot more to do with the optimizer than with changes to C++ or its implementation. That instability affected C programs (including the Linux kernel) as much as C++ programs. 
None of these affect the conclusion, of course.
Not quite Posted Mar 27, 2008 4:51 UTC (Thu) by wahern (subscriber, #37304) [Link] Your theory about C++ exceptions being more performant than a comparable C pattern doesn't pan out. It's a similar argument the Java folk give: "Java *can* be faster, because you can do code optimization on-the-fly". The extra tooling that C++ must put into function prologs and epilogs--and is mandated by the various ABIs--for stack unwinding, as a practical matter, adds at least as much work, and usually more. There are tables to index into--often from within another function which must be called, and maybe using a pointer dereference. Any one of those can add up to several register comparisons. I dunno how function inlining effects exception tooling, but I imagine the relative losses only increase. For the rare instance where you really need to fine tune a block or routine, both C and C++ suffice. I once shaved 20% runtime by changing a single line--loop to GCC built-in; it was in C but would've applied equally to C++. In reality, C applications will be moderately faster. But in most cases we're comparing apples to oranges because, for instance, many people prefer exceptions. If they improve _your_ ability to engineer better solutions, and don't hinder others, there is no other justification required. I don't understand why people try so hard to prove that some feature "tastes better and is less fattening".
Imaginary losses Posted Mar 27, 2008 6:16 UTC (Thu) by ncm (subscriber, #165) [Link] Can you identify any of this "extra tooling" in assembly output from the compiler? Or are you just making it up? You can "imagine" all the "relative losses" you like, but that has nothing to do with the facts. What is factual is that the extra code each programmer must insert in C code to return error codes, to check error codes, and to dispatch based on error codes compiles to actual instructions that must be executed on every function return. When errors are reported by exception, instead, none of those instructions are executed unless an error occurs. The difference has been measured as high as 15%. Now, 15% isn't very much in Moore's Law country, but it's not negligible. It's not a reason to choose one language over another, but it puts the lie to made-up claims that C++ code is slower than C.
Imaginary losses Posted Mar 27, 2008 7:20 UTC (Thu) by alankila (subscriber, #47141) [Link] I'm not sure what kind of code has been used to benchmark that, but assuming the C++ compiler has to insert some low-level call such as malloc() into the generated code to handle the new operator (or whatever), it will have to detect the return code from malloc just the same as the programmer using the C compiler. In general, I suspect C code doesn't execute error paths a lot. In a malloc example there is practically nothing to do but die if it fails. So you'd expect the C++ and C code to actually perform pretty much the same instructions -- both would do the call, and both would test for error, and in case of no error they move forward to the next user construct. In case of error, the C program would do something the programmer wrote, the C++ would do whatever magic is required to raise exception (hopefully without further memory allocations, of course). After this point, things do diverge a lot, but I think in most cases there are no errors to handle. Therefore, it would seem to me that both should perform identically, unless error returns are a common, expected result, in which case you'd have to write dispatch logic in C to deal with each error type (normally a O(log N) switch-case statement I'd guess) while the C++ compiler would probably generate code to figure out which exception handler should receive the exception. Somehow I do get the feeling that C should win in this comparison. After all, it's testing the bits of one integer, while C++ has to test exception class hierarchies. In light of this, it seems ludicrous to claim that C error handlers cost a lot of code that need to be run all the time, but somehow C++ exceptions are "free".
Imaginary losses Posted Mar 27, 2008 7:56 UTC (Thu) by njs (subscriber, #40338) [Link] malloc isn't the example to think of here, because yeah, usually you just abort. And the problem isn't that first if statement, where you detect the error in the first place. The problem is that in well-written C code, practically *every* function call has some sort of error checking wrapped around it, because errors in that function need to be detected and propagated back on up the stack. It's the propagating that really hurts, because you have to do it with if statements, and if statements are expensive. Compare C:
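error_t foo() {
    char * blah;
    error_t e = bar(&blah);  /* bar() allocates blah on success */
    if (e)                   /* nonzero means failure */
        return e;
    e = baz();
    if (e) {
        free(blah);          /* undo bar() before propagating */
        return e;
    }
    /* ... */
}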
versus C++:
void foo() {
std::string blah = bar();
baz();
...
}
One might think that the C++ code has "hidden" if statements; for old C++ compilers, that was true. Modern compilers, though, use Extreme Cleverness to avoid that sort of thing. (If you're curious for details, just g++ -S some simple programs and see.)
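Here is a slightly fuller side-by-side sketch of the two propagation styles being argued about, written so both halves compile as C++. All function names are hypothetical. It only illustrates where the explicit tests appear in the source; it makes no claim about which version is faster on any given compiler.

```cpp
#include <stdexcept>
#include <string>

// Error-code style: each caller tests the result and forwards it by hand.
int parse_digit_rc(char c, int* out) {
    if (c < '0' || c > '9') return -1;  // failure
    *out = c - '0';
    return 0;                           // success
}

int sum_digits_rc(const std::string& s, int* out) {
    int total = 0;
    for (char c : s) {
        int d = 0;
        int rc = parse_digit_rc(c, &d);
        if (rc != 0) return rc;         // explicit propagation at every call
        total += d;
    }
    *out = total;
    return 0;
}

// Exception style: the intermediate function contains no error test at all.
int parse_digit(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

int sum_digits(const std::string& s) {
    int total = 0;
    for (char c : s) total += parse_digit(c);  // errors unwind past this frame
    return total;
}
```

Compiling both with g++ -S and comparing the loops in sum_digits_rc and sum_digits is an easy way to see what your own compiler does with each.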
Imaginary losses Posted Mar 27, 2008 20:40 UTC (Thu) by pphaneuf (subscriber, #23480) [Link] You get a once-per-function setup and teardown that registers destructors (and thus it is amortized better and better the longer the function is). If there are no destructors, it can simply be left out. Even a "just crash" approach involves one test and branch per possible failure point. On modern processors, having branches is expensive, due to mispredicting them. I suspect that's one of the reasons that profile-driven optimizers can be so good: they can figure out which side of a branch is more likely. In the case of error handling, which branch is more likely would be readily obvious to a human, but is harder for a compiler to work out (see the annotations available in GCC, used by the Linux kernel code). The code size increases with error-handling code, often with "dead spots" that get jumped over when there are no errors, which on today's faster and faster machines means increased instruction cache usage, less locality and so on. I don't doubt that when they happen, C++ exceptions might be more expensive, but the thing with exceptions is that they don't happen often, and thus the no-error path is the most interesting case.
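The GCC annotation referred to above is __builtin_expect, which the Linux kernel wraps in its likely()/unlikely() macros. A minimal sketch follows; the UNLIKELY macro and checked_div function are names invented for this example, and the hint only influences code layout, never the result.

```cpp
// Portable wrapper: GCC and Clang understand __builtin_expect; other
// compilers simply see the plain condition.
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Divides num by den. Returns 0 on success and -1 if den is zero.
// The hint tells the compiler the error branch is the cold path, so it
// can keep the common path as straight-line code.
int checked_div(int num, int den, int* out) {
    if (UNLIKELY(den == 0)) return -1;
    *out = num / den;
    return 0;
}
```

Whether the hint helps in practice depends on the processor's branch predictor and the surrounding code, so it is best treated as something to measure rather than assume.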
Imaginary losses Posted Mar 27, 2008 20:14 UTC (Thu) by wahern (subscriber, #37304) [Link] Modern C++ exceptions might be conceptually zero-cost, but it is not less work than comparable C code. The difference is in how the stack is prepared to call the function. There is, evidently, a small fixed cost in every C++ function call which offsets the lack of a test+jump after the call. I admit I was unfamiliar w/ the details of modern exception handling, but I'm glad you forced my hand, because if anything we're cutting through some hyperbole. Also, the error handling pattern in my C code doesn't duplicate as much code as the straw man examples posted here. I'm perfectly capable of using "goto" to jump to a common error handling block within a function, achieving something similar to the range table method of placing the error handling logic outside of the main execution flow. And I do this most of the time, because it just makes sense, and I get, IMO, better readability than exceptions, because there are fewer syntactic blocks to obscure my code. (I admit, that's highly subjective.)
Here's the example you requested. I used GCC--gcc version 4.0.1 (Apple Inc. build 5465)--with -O2 optimization. To compile:
#if CPLUSPLUS
#include <stdlib.h> /* for _Exit() */
void noargs(int i) {
if (i > 1)
throw i;
return /* void */;
}
int main (int argc, char *argv[]) {
try {
noargs(argc);
} catch (int e) {
_Exit(e);
}
return 0;
}
#else
#include <stdlib.h> /* for _Exit() */
int noargs(int i) {
if (i > 1)
return i;
return 0;
}
int main(int argc, char *arg[]) {
int e;
if (0 != (e = noargs(argc))) {
_Exit(e);
}
return 0;
}
#endif
Simple, straight-forward code. Let us count the number of instructions from main() to our call to noargs(), and from return from the noargs() to leaving main(). C++ output:
.globl _main
_main:
LFB1481:
pushl %ebp
LCFI4:
movl %esp, %ebp
LCFI5:
pushl %esi
LCFI6:
subl $20, %esp
LCFI7:
movl 8(%ebp), %eax
movl %eax, (%esp)
LEHB0:
call __Z6noargsi
LEHE0:
addl $20, %esp
xorl %eax, %eax
popl %esi
leave
ret
On the "fast-path", we have 12 instructions for C++. Now, plain C:
.globl _main
_main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %eax
movl %eax, (%esp)
call _noargs
testl %eax, %eax
jne L10
leave
xorl %eax, %eax
ret
And in C, we have... 11 instructions. Well, well! And I'm being charitable, because in fact there are additional instructions for noargs() which increase the disparity: 8 in C, 12 in C++. That makes the total count 19 to 24, but for simplicity's sake, I'm happy to keep things confined to the caller. Explain to me how this is a poor example. I'm willing to entertain you, and I by no means believe that this little example is conclusive. But, it seems pretty telling to me. I admit, I'm surprised how close they are. Indeed, if anybody suggested to me that C++ exceptions introduced too much of a runtime cost, I'd set them straight. But if they looked me straight in the eye and told me unequivocally that they were faster, I'd show them the door.
Imaginary losses Posted Mar 27, 2008 20:56 UTC (Thu) by pphaneuf (subscriber, #23480) [Link] From my experience, the more common thing is not really try/catch, but letting the exception bubble up. Basically, you just want to clean up and tell your caller something went wrong. We'll agree that if there is a clean up to do, it's probably equally there in C and in C++, right? The "big saving" in C++ is in the case where you just clean up and bubble up the exception. If a function doesn't have cleaning up to do, it doesn't even go in that function at all! As they say, the fastest way to do something is to not do it.
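A small sketch of the "bubble up" case described above, with hypothetical function names throughout: the middle layer has nothing to clean up, so it contains no error-handling code at all.

```cpp
// Sketch: an exception passing through a layer that has no cleanup to do.
// parse_positive(), doubled() and describe() are made-up names.
#include <stdexcept>
#include <string>

// Lowest layer: reports failure by throwing.
int parse_positive(int raw) {
    if (raw <= 0)
        throw std::invalid_argument("value must be positive");
    return raw;
}

// Middle layer: no try/catch and no destructors to run, so the source has
// no error handling here; a throw from parse_positive() passes through.
int doubled(int raw) {
    return 2 * parse_positive(raw);
}

// Top layer: the only place that deals with the failure.
std::string describe(int raw) {
    try {
        return std::to_string(doubled(raw));
    } catch (const std::invalid_argument &e) {
        return std::string("error: ") + e.what();
    }
}
```

In the equivalent error-code version, `doubled()` would have to test the result of `parse_positive()` and return early itself.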
Imaginary losses Posted Mar 27, 2008 21:24 UTC (Thu) by wahern (subscriber, #37304) [Link] Hmmm, good point. So, if you don't throw from an intermediate function, you compound the savings. Well... I guess I'll just call "uncle" at this point. I personally don't like exceptions, specifically because in my experience letting errors "bubble up" usually means that much error context is lost, and the programmer gets into the habit of not rigorously handling errors (that's why, I guess, I didn't think about that pattern). But, in a discussion like this that's inapplicable.
Imaginary losses Posted Mar 27, 2008 22:15 UTC (Thu) by pphaneuf (subscriber, #23480) [Link] My theory is that you do something about it where you can. If you can't think of something useful to work around the problem, then just let it bubble up, maybe someone who knows better will take care of it, and if not, it'll be the same as an assert. That's clearly sensible in a lot of cases, because otherwise there would be no such thing as error statuses, they'd just all "handle the errors". I also quite prefer the default failure mode of a programmer failing to handle an error to be a loud BANG than silently going forward...
Imaginary losses Posted Mar 27, 2008 21:04 UTC (Thu) by wahern (subscriber, #37304) [Link]
I forgot to test multiple calls in the same try{} block. Indeed, for every additional back-to-back call C needs an additional two instructions (test+jump). So, for moderately long functions, w/ a single try{} block and lots of calls to some small set of functions, I can see C++ being faster. The trick is that you don't want the fixed costs to exceed the gains, of course. In the above example, C++ pulls ahead at the 4th call to noargs().
It would be an interesting exercise to count the number of function definitions and function calls in my code, and multiply by the respective differences of C and C++. But it seems complicated by the treatment of blocks in C++. I can see how in some tests C++ came out 15% ahead, though.
In any event, there is indeed a fixed cost to C++ exceptions. There might not be a prologue, but the epilogue is invariably longer for functions, and, apparently, some blocks.
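For readers who want to repeat the comparison with several back-to-back calls, the two styles might be set up like this. It is only a sketch with hypothetical names; it shows the source-level shape, not the generated instruction counts, which depend on compiler and flags.

```cpp
// Sketch comparing the two styles for several back-to-back calls.
// step_checked(), step_throwing() and the run_* functions are made up.

// Error-code style: returns 0 on success, nonzero on failure.
int step_checked(int i) {
    return (i > 1) ? i : 0;
}

// Exception style: throws the value on failure, like noargs() in the example.
void step_throwing(int i) {
    if (i > 1)
        throw i;
}

// Each call is followed by its own test and branch.
int run_checked(int a, int b, int c) {
    int e;
    if ((e = step_checked(a)) != 0) return e;
    if ((e = step_checked(b)) != 0) return e;
    if ((e = step_checked(c)) != 0) return e;
    return 0;
}

// One try block covers all three calls; the success path has no
// per-call checks in the source.
int run_throwing(int a, int b, int c) {
    try {
        step_throwing(a);
        step_throwing(b);
        step_throwing(c);
    } catch (int e) {
        return e;
    }
    return 0;
}
```

Compiling both run_* functions with -S and counting instructions, as done earlier in the thread, would show where the break-even point falls for a given toolchain.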
Not quite Posted Mar 27, 2008 16:13 UTC (Thu) by BenHutchings (subscriber, #37955) [Link] Most C++ implementations use range tables for exception handling today, so no extra code is needed in the function prologue or the non-exception epilogue. The possibility of a callee throwing can constrain optimisation of the caller, but so does explicit error checking.
Not quite Posted Mar 27, 2008 20:35 UTC (Thu) by wahern (subscriber, #37304) [Link] From my limited research, it seems the constraint is much greater in C++, because C++ must preserve stack state (minimally, the mere existence of an activation record), whereas in C a compiler can obliterate any evidence of a function call, no matter whether or how the return value is used. Granted, I'm not aware of what kind of requirements the C++ standard mandates; certainly I'd bet in non-conforming mode a compiler could cheat in this respect. I'd like to hear some analysis on this. Inlining in general, though, is actually important, because in C one of the biggest fixed costs you have to keep in mind is the function call. As shown in my example elsethread, there's comparatively quite a lot of work to maintain the stack. This is, of course, a big deal in most other languages, too. If you've ever written much PerlXS (and peered behind the macros), at some point it dawns on you how much work is being done to maintain the stack--it's incredible! The fixed costs of maintaining call state in Perl dwarf most everything else--excepting I/O or process control--including manipulation of dynamically typed objects.
Not quite Posted Mar 27, 2008 22:28 UTC (Thu) by ncm (subscriber, #165) [Link] For the record, nothing about exceptions in the C++ standard or common implementation methods interferes with inlining. In practice, the body of an inlined function is just merged into the body of whatever non-inline function it's expanded in. The only place where exceptions interfere with optimization is in that the state of a function context at a call site must be discoverable by the stack unwinder, so it can know which objects' destructors have to run. In practice this means that calls in short-circuited expressions, e.g. "if (a() && b()) ...", sometimes also set a flag: "if (a() && ((_f=1),b())) ...". This only happens if b() returns an object with a destructor, i.e. rarely.
an "old beard" ? Posted Mar 27, 2008 1:22 UTC (Thu) by tialaramex (subscriber, #21167) [Link] I guess I'm an "old beard". It's strange to hear that, maybe I should be more pleased than I am. I have an old edition of Stroustrup's book; unlike K&R it is dusty and lives on the bottom shelf alongside other technical works that proved useless or unreadable. I must say that, as a beginning programmer with some experience of C++ when I bought it, it was disappointing. A triumph of ego and verbosity, even. Well, on the one hand you're right, after literally decades of work C++ has more or less matured into a language that you can use to write software for the real world without incurring significantly more pain than C. The stable ABI in particular took a lot longer to arrive than it had any reason to, and longer than you've really allowed in your description. But that maturity comes with a lot of caveats. It was already arguably too easy to write C that you couldn't understand, thus making it unmaintainable; C++ provides any number of features which make that worse, and nearly every beginner text seems to emphasise these features as benefits. The result is that a new "C++ programmer" is often pumping out a mixture of pseudo-code masquerading as program code and Perl-style unmaintainable gobbledegook. By the time C++ was being invented we already knew that the challenge wasn't adding more expressiveness (though given a time machine maybe I'd add namespaces and possibly a weak "class" concept to C89), but delivering more maintainability. The author of this program has already expressed his doubts about the maintainability of his code. Is that inevitable in a C++ program? No, but the language definitely isn't helping.
No-one, so far as I can see, is claiming that C++ actually made it significantly easier to write this linker (except perhaps in the sense that the author prefers C++ and he was writing it) or that its performance benefits are in any way linked to the choice of language. So it's understandable that there's concern that we're going to get ourselves an abandoned and unmaintainable piece of software in the core of the tool chain. Maybe one of the people who feels more strongly than me (and has more spare time, it's 0100 and I'm still working) will implement the same approach in C and eliminate the perceived problem.
an "old beard" ? Posted Mar 27, 2008 2:33 UTC (Thu) by felixfix (subscriber, #242) [Link] The problem many of us commenters have with sylware's comment is that it is ludicrous to expect a modern system to rely only upon C and avoid anything fancier. I commented on that above -- but I want to say a little more in response to this. I avoid C++ like the plague for my own purposes, for, I think, pretty much the same reasons -- it is far more complex for not much gain. It's been a long time since I did anything with C++, so maybe these comments will be rusty too, but two bad memories come back. One is malloc/free in C, as horrible as they are and easy to abuse, turning into three pairs of calls (new/delete and new array/delete array) -- mix them and have an instant mess -- not just in the two new ways, but in the additional traps of mixing them up. How can that be considered progress? The other bad memory was needing to specify virtual if you wanted to override methods -- what is the point of object oriented if overriding wasn't the default? The entire language struck me as more of a rushed standard to beat other OO versions of C, and then one pile of rushed patches to the spec after another. Nevertheless, some people get along better with C++ than others do with C. Forcing everyone to write in C will simply result in more bad matchups between personalities, projects, and tools. Old beards like UNIX because it is a system of tools which allow the users and developers to pick the right combination for them and the job. If forcing everybody to use C were the answer, it wouldn't be UNIX, it would merely be Microsoft "my way or the highway" but on the C highway instead of the C++ highway.
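The two C++ rules recalled above can be illustrated with a short sketch. All names are hypothetical; the point is only that each allocation form has its own matching release, and that a member function is dispatched on the dynamic type only when it is declared virtual.

```cpp
// Sketch of the allocation pairs and of virtual vs non-virtual calls.
// Base, Derived and the helper functions are made-up names.
#include <cstdlib>
#include <string>

struct Base {
    virtual ~Base() {}
    std::string plain() const { return "Base"; }            // not virtual
    virtual std::string dynamic() const { return "Base"; }  // virtual
};

struct Derived : Base {
    std::string plain() const { return "Derived"; }    // hides Base::plain()
    std::string dynamic() const { return "Derived"; }  // overrides Base::dynamic()
};

// Calls made through a Base reference: only the virtual function is
// resolved according to the object's dynamic type.
std::string via_base_plain(const Base &b)   { return b.plain(); }
std::string via_base_dynamic(const Base &b) { return b.dynamic(); }

// Each allocation is released by its matching call; mixing them up
// (for example delete on memory from new[]) is undefined behaviour.
int matched_pairs() {
    int *one  = new int(1);        // new     pairs with delete
    int *many = new int[3]();      // new[]   pairs with delete[]
    void *raw = std::malloc(8);    // malloc  pairs with free
    int sum = *one + many[0] + many[1] + many[2];
    delete one;
    delete[] many;
    std::free(raw);
    return sum;
}
```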
C++ features and performance |