LWN: Comments on "What's new in GCC 4.5?"

What's new in GCC 4.5?

Spudd86 — Mon, 31 May 2010 15:52:25 +0000

Err, doesn't setting the control word before doing ANYTHING mean that you'll keep a 53-bit mantissa throughout?

What's new in GCC 4.5?

roelofs — Sun, 30 May 2010 01:01:13 +0000

> Was that with or without debugging symbols?

With. In this application, auto-gdb-backtrace was pretty much a necessity.

I'm no longer working on that particular project (or even in C++), but I'll keep jwakely's and your suggestions handy in case it crops up again.

Thanks,
Greg

What's new in GCC 4.5?

Cosan — Sat, 22 May 2010 23:18:45 +0000

LTO is great. When I saw that GCC 4.5 had support for it, I immediately installed it so I could see how well it works, and I was not disappointed.

I've been working on a project that makes use of a lot of small functions. GCC 4.4, at -O3, inlines many of them and this gives a measurable boost in performance. The problem was that one of my source files was getting pretty large, and I wanted to split it up. Of course, splitting it up meant no more inlining (unless I moved a lot of code into headers, and I didn't want to do that).

Once I had 4.5 installed, I went ahead and did the split. Timing it without LTO showed that it was measurably slower, as expected. However, enabling LTO boosted it right back up to the speed it had been running at previously. There was no loss of performance and the code became much more manageable. Three cheers for LTO!

Open64's LTO (aka IPA) is similarly useful, for the record.

What's new in GCC 4.5?

robert_s — Sat, 22 May 2010 14:00:59 +0000

That's not true.

They're still there, but there are just better replacements for both of them. The only situation where this might be true is if bit 29 of CPUID 0x80000001 is not set, in which case you can't use MMX in long mode.

x87 is always there.

What's new in GCC 4.5?

foo-bar — Fri, 21 May 2010 14:10:07 +0000

On 32-bit x86 SSE2 is sometimes slower than x87.

What's new in GCC 4.5?

mjw — Thu, 20 May 2010 17:47:41 +0000

> Not a word about the speed of the compiler itself.

It became a lot faster!

http://gcc.gnu.org/ml/gcc/2010-04/msg00948.html

"In general GCC-4.5.0 became faster (upto 10%) in -O2 mode. This is first considerable compilation speed improvement since GCC-4.2. GCC-4.5.0 generates a better (1-2% in average upto 4% for x86-64 SPECFP2000 in -O2 mode) code too in comparison with the previous release. That is not including LTO and Graphite which can gives even more (especially LTO) in many cases."

What's new in GCC 4.5?

zaitcev — Thu, 20 May 2010 16:15:29 +0000

Not a word about the speed of the compiler itself.

What's new in GCC 4.5?

jwakely — Thu, 20 May 2010 10:03:58 +0000

See http://www.cs.huji.ac.il/~dants/papers/MinimizeDependenci... for another technique for reducing template instantiations without having to resort to function pointers.

What's new in GCC 4.5?

quotemstr — Thu, 20 May 2010 03:56:03 +0000

> A 15 MB C-only executable grew to ~600 MB as parts of it were rewritten in C++

That's huge! There's no good reason to tolerate that level of bloat. Was that with or without debugging symbols?

Part of the cause is almost certainly forced inline function generation. Using hidden symbols allows the compiler to skip the generation of certain functions --- if they're private symbols, the compiler can assume they're not overwritten at load-time.

Another thing to keep in mind is C++ template generation, as you mentioned. It's easy to achieve a combinatorial explosion of template instantiations when you have a template library used in many different circumstances. It's often worthwhile to have generic, templated code just be an inline-only, typesafe wrapper around concrete code; use function pointers to let that concrete code safely work with whatever the higher-level wrapper gives it.

Using that approach, you give up a tiny bit of runtime performance for a huge reduction in code size. Imagine the difference between qsort() and std::sort --- it's easy to write the latter such that the entire sorting algorithm implementation is emitted once per type sorted! (It's also possible for a C++ library implementor to write std::sort using the type erasure technique I mention.)
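A rough sketch of that wrapper idea, for illustration only. This is not how libstdc++ implements anything, and the names insertion_sort_erased and sort_thin are made up here. It assumes C++11 and trivially copyable element types, and uses an insertion sort just to keep the example short.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Comparison over untyped memory, passed as a plain function pointer.
using less_fn = bool (*)(const void*, const void*);

// Concrete core. In a real project this would live in a single .cpp file,
// so its body is emitted once no matter how many element types use it.
void insertion_sort_erased(void* base, std::size_t count,
                           std::size_t size, less_fn less) {
    unsigned char* bytes = static_cast<unsigned char*>(base);
    std::vector<unsigned char> key(size);  // scratch copy of the element being placed
    for (std::size_t i = 1; i < count; ++i) {
        std::memcpy(key.data(), bytes + i * size, size);
        std::size_t j = i;
        // Shift every element greater than the key one slot to the right.
        while (j > 0 && less(key.data(), bytes + (j - 1) * size)) {
            std::memcpy(bytes + j * size, bytes + (j - 1) * size, size);
            --j;
        }
        std::memcpy(bytes + j * size, key.data(), size);
    }
}

// Thin, typesafe, inline-only wrapper. Only the tiny comparison thunk
// is instantiated per element type.
template <typename T>
inline void sort_thin(T* data, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "the erased core moves elements with memcpy");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "the scratch buffer only guarantees default alignment");
    insertion_sort_erased(data, count, sizeof(T),
                          [](const void* a, const void* b) -> bool {
                              return *static_cast<const T*>(a) <
                                     *static_cast<const T*>(b);
                          });
}
```

Calling sort_thin on an int array and on a double array instantiates two small comparison thunks, while both calls share the single insertion_sort_erased body. The cost is an indirect call per comparison.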

What's new in GCC 4.5?

roelofs — Wed, 19 May 2010 21:46:16 +0000

> Compiling gcc with a C++ compiler has already uncovered a number of latent bugs, such as comparing values of enum_type_1 to values of enum_type_2. That's not an error in C, because enums are just ints, but in C++ they're distinct types and the compiler catches the problem.
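To make the "distinct types" point concrete, here is a small sketch, assuming C++11 for static_assert and <type_traits>. The enumerator names are invented. As far as I know, GCC reports a comparison between two different enum types as a warning (-Wenum-compare) rather than a hard error, whereas implicit conversion between the enum types is rejected outright.

```cpp
#include <type_traits>

enum enum_type_1 { RED, GREEN };
enum enum_type_2 { APPLE, PEAR };

// As in C, each enum still converts implicitly to int ...
static_assert(std::is_convertible<enum_type_1, int>::value,
              "unscoped enums promote to int");

// ... but in C++ there is no implicit conversion to the other enum type,
// and none from int back to an enum either.
static_assert(!std::is_convertible<enum_type_1, enum_type_2>::value,
              "the two enums are distinct types");
static_assert(!std::is_convertible<int, enum_type_1>::value,
              "int does not silently become an enum");

// Run-time visible form of the same fact, for callers that want to check it.
constexpr bool enums_interconvert() {
    return std::is_convertible<enum_type_1, enum_type_2>::value;
}
```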

Those are excellent benefits, and I've come to like C++ for such reasons--as long as one doesn't go overboard, of course. C++ can lead to "write-only" code, i.e., easy to write, impossible to maintain. One needs a little discipline and design sense, which I'm sure the GCC folks have in abundance. (Doug Crockford has made similar comments about JavaScript, btw. Just because the language officially supports something doesn't mean you should actually use it. :-) )

One unforeseen drawback we encountered, however: generated code size (that is, binaries) exploded. A 15 MB C-only executable grew to ~600 MB as parts of it were rewritten in C++. I still think it was worthwhile overall, but holy cow...don't underestimate the pain of creating, deploying, loading into memory, and core-dumping huge binaries. (Some of it might have been due to symbol visibility; I never had time to investigate. I think quite a bit was due to template use. No doubt you guys will figure out ways to keep it under control in GCC...)

Greg

What's new in GCC 4.5?

dark — Tue, 18 May 2010 21:31:50 +0000

This doesn't sound like a complete solution. I think you would also have to use 'double' everywhere and excise 'float' from all your code in order to get consistent results. Though it's probably still okay to use 'float' in arrays as long as you convert to 'double' for all calculations.

What's new in GCC 4.5?

pharm — Tue, 18 May 2010 16:50:09 +0000

Oh wait, I see what you're saying.

I suppose you can set the control word to 53-bit mantissa & copy a value from one FP register to another. That would be a bit slow though.

What's new in GCC 4.5?

pharm — Tue, 18 May 2010 16:47:10 +0000

No, Intel processors can switch between 64-bit & 80-bit floating point register mode. You can use the -mpc64 option to gcc to force 64-bit floats.

I'm not sure why people bang on about -ffloat-store instead of pointing people to -mpc64 when they want to truncate floats to 64 bits on Intel platforms.

Check out the FLDCW (Floating Point Load Control Word) instruction for the gory details.

What's new in GCC 4.5?

mpr22 — Mon, 17 May 2010 08:10:55 +0000

> A feature that forces developers to think very carefully about what they are trying to do. Having to think five times about whether it is possible to free a pointer. Checking a million times for dangling ones.
And get a Schrödinbug when you (almost inevitably) miss one.

I like C. I like C++. I am not so enamoured of either to call it a flawless or even merely universally superior choice in all problem spaces.

What's new in GCC 4.5?

cph — Mon, 17 May 2010 06:19:09 +0000

On the other hand, the programmer can always set the floating-point control word to do 53-bit precision; this makes the in-register values have the same precision as the in-memory ones.

I don't understand why the article didn't mention this. It's a simple fix that gives consistent results regardless of the memory/register optimization.
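For anyone who wants to try it, a minimal sketch assuming glibc on x86, where <fpu_control.h> provides _FPU_GETCW, _FPU_SETCW, _FPU_EXTENDED and _FPU_DOUBLE. The function names are made up for the example. Note that this only touches the x87 control word, not SSE, and that it narrows the mantissa only. The exponent range of the registers stays extended, so I would not expect bit-identical results in overflow or underflow cases.

```cpp
// Only pull in the glibc header where it is expected to exist.
#if defined(__has_include) && (defined(__i386__) || defined(__x86_64__))
#  if __has_include(<fpu_control.h>)
#    include <fpu_control.h>
#    define HAVE_X87_CONTROL_WORD 1
#  endif
#endif

// Switches the x87 precision-control field to 53-bit mantissa (double).
// Returns true if the control word was changed, false on platforms
// where this sketch does not apply.
bool set_x87_double_precision() {
#ifdef HAVE_X87_CONTROL_WORD
    fpu_control_t cw;
    _FPU_GETCW(cw);                                   // read current control word
    cw = static_cast<fpu_control_t>((cw & ~_FPU_EXTENDED) | _FPU_DOUBLE);
    _FPU_SETCW(cw);                                   // FLDCW under the hood
    return true;
#else
    return false;
#endif
}

// Reports whether the precision-control field currently selects double.
bool x87_precision_is_double() {
#ifdef HAVE_X87_CONTROL_WORD
    fpu_control_t cw;
    _FPU_GETCW(cw);
    return (cw & _FPU_EXTENDED) == _FPU_DOUBLE;
#else
    return false;
#endif
}
```

Calling set_x87_double_precision() once at the start of main(), before any floating-point work, is the usage I have in mind.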

What's new in GCC 4.5?

HelloWorld — Sat, 15 May 2010 14:57:07 +0000

> C "bugs" of taking-everything-programmers-throws-at-it is actually a "features".
This is *exactly* the kind of *bullshit* that keeps the same bugs happening over and over again in C programs.

Good programmers think about their code anyway, but no matter how good they are, they *will* make silly mistakes, and if the compiler (or whatever else) catches those, then that is a Good Thing.

What's new in GCC 4.5?

nix — Sat, 15 May 2010 10:28:08 +0000

Your implication here is that the GCC developers are trying to go to C++ because they haven't mastered C. Nothing could be further from the truth. GCC uses every C coding trick going and then some (with one single exception: no tricks relying on GCC extensions are used in the middle-end or C frontend because they must be compilable with non-GCC bootstrap compilers, and no tricks that bootstrap compilers choke on are allowed there either, which is why everyone hated trying to bootstrap with the horrible HP-UX 10 bundled C compiler). The language it's written in uses so many elaborate macros it's barely even C anymore (in this it is similar to many other large C projects). And that's the problem: many of these macros are intrinsically non-typesafe, and bugs *do* crop up as a consequence of this.

Regarding the 'free up a pointer' thing, well, this proved so intractable to get right for GCC (where many objects have extremely hard-to-describe and interacting lifetimes crossing many passes) that it ended up with a garbage collector simply to lift the burden of manual memory management from the developers; it is not known how many bugs this fixed, but it was surely a lot. (Some heavily-used objects have since been shifted back from GC for speed reasons, but it's a case-by-case judgement whether to *not* garbage-collect, rather than vice versa.)

What's new in GCC 4.5?

RCL — Sat, 15 May 2010 06:02:50 +0000

Just use a 64-bit OS. Luckily, there's no FPU in x86-64, it's gone together with MMX.

What's new in GCC 4.5?

daglwn — Sat, 15 May 2010 05:37:07 +0000

Just because the standard might allow it doesn't mean customers will. :)

What's new in GCC 4.5?

arief — Sat, 15 May 2010 05:08:14 +0000

I would second this.

C "bugs" of taking-everything-programmers-throws-at-it is actually a "features".

A feature that forces developers to think very carefully about what they are trying to do. Having to think five times about whether it is possible to free a pointer. Checking a million times for dangling ones.

C is easy to comprehend and hard to master, while C++ is hard to understand and hard to master.

What's new in GCC 4.5?

jwakely — Sat, 15 May 2010 02:13:51 +0000

> I should really have waited for jwakely to answer more authoritatively...

I only focus on the C++ library so I'm not up to speed on LTO either, but stevenb is :-)

What's new in GCC 4.5?

giraffedata — Fri, 14 May 2010 22:51:44 +0000

First, to be clear, I'm using the term "linker" in the same sense as the phrase "link time" in the name LTO, which means the linker is GCC. GCC is the program to which you feed .o files and get an executable out.

If instead of using GCC to link my .o files I use GNU 'ld', it will still work, right? And it looks like 'ld' doesn't know what LTO is.

Even GCC doesn't always know what LTO is. GCC 3 doesn't.

LTO could have been designed so that 'ld' and GCC 3 could not link the .o files created by gcc -flto, but it looks to me like it was a design objective that they be able to.

What's new in GCC 4.5?

stevenb — Fri, 14 May 2010 22:48:13 +0000

I thought Fortran is the most liberal language of all when it comes to re-ordering floating point computations? I'll confess it's been a while since I programmed Fortran, but I seem to remember that anything goes, except when a computation is in parentheses.

What's new in GCC 4.5?

stevenb — Fri, 14 May 2010 22:43:34 +0000

What you were going to say would have been true. Writing GIMPLE doesn't cost much time. The problem is that after writing GIMPLE, the code is pushed through the entire compiler pipeline to write the rest of the assembler output too.

So the GIMPLE goes through the compiler pipeline twice: during compilation to an object file, and during link time optimizations. That is where the extra cost comes from.

We have our smartest people working on a solution for this... ;-)

What's new in GCC 4.5?

nix — Fri, 14 May 2010 21:45:27 +0000

The linker doesn't have to know what LTO is for -flto to work (at least, not unless you put such .o files into .a files); all that needs to know is collect2, and collect2 is part of GCC so it always knows.

What's new in GCC 4.5?

giraffedata — Fri, 14 May 2010 21:18:04 +0000

Thanks for the explanation.

I suppose the objective is not just to let someone choose a non-LTO link, but for the .o file to be usable by a linker that doesn't even know what LTO is.

I was going to say the time to write the GIMPLE shouldn't be enough to be a consideration against using -flto, but then I remembered that I once avoided compiling with debugging information because I was using NFS and writing the .o files took significantly longer with -g.

What's new in GCC 4.5?

daglwn — Fri, 14 May 2010 20:28:51 +0000

> While this problem will never be solved in computers with inexact representation of floating-point numbers,

That's simply not true. Compilers have been dealing with this for a long time. For example, good Fortran compilers take great pains not to reorder floating-point computation. There are many solutions available for the x87 problem other than -ffloat-store. For the vast majority of x86 machines today, compiling for SSE2 works great.

Usually the user cares more about consistency on one architecture (compiler flags not changing results) than consistency across architectures (bitwise matching results on different processors). The latter is indeed very difficult to achieve but even that is possible with enough work. Maintaining consistency across flags (other than those designed to relax consistency) is not very hard at all.
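A tiny illustration of why reordering matters at all: floating-point addition is not associative, so regrouping the same three operands can change the answer. The function names are made up, and this assumes IEEE doubles and a build without -ffast-math or similar flags that permit reassociation.

```cpp
// Same three operands, two groupings.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left gives 1.0 because the large terms cancel first, while sum_right gives 0.0 because the 1.0 is lost when it is added to -1e20.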

What's new in GCC 4.5?

nix — Fri, 14 May 2010 19:21:32 +0000

When compiling a source file with -flto, GCC outputs *two* things; the traditional object file format, run all the way through the target assembler, and (in a single ELF section in the .o) the serialized representation of the GIMPLE tree that gave rise to it (and associated stuff). IIRC, both these outputs come from the same run (so parsing is only done once), but still this is more work and more data to write out than if we could *rely* on -flto being used at link time, because we could stop at the GIMPLE stage and not write out all the native code.

When linking with -flto, only the GIMPLE form is used and the native code in the .o files (and .a files if gold(1) is in use) is thrown away; when linking without it, only the non-GIMPLE form is used, and the GIMPLE in the .o files is thrown away.

(IIRC, of course. I haven't been paying enough attention to GCC development for the last year or so for this to be more authoritative than the ramblings of any passing madman. I should really have waited for jwakely to answer more authoritatively...)

What's new in GCC 4.5?

giraffedata — Fri, 14 May 2010 18:13:28 +0000

Explaining why LTO increases compile time:

> the individual object files are driven all the way to assembler

What does that mean?

What's new in GCC 4.5?

jwakely — Fri, 14 May 2010 09:35:25 +0000

As well as increased type-safety C++ gives you automatic memory management (via destructors) which could potentially replace the garbage collection used today.

Gcc uses lots of hash tables and vectors internally (the VEC type mentioned at the link you gave) which could be replaced by standard C++ containers - although that's a bit less certain, as it would require a working C++ standard library as well as C++ compiler to bootstrap.
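A small sketch of what that looks like in practice, assuming C++11 (std::unique_ptr). The Node type and its live counter are invented for the example and have nothing to do with GCC's actual data structures.

```cpp
#include <memory>
#include <vector>

// Toy heap object that counts how many instances are currently alive.
struct Node {
    static int live;
    int value;
    explicit Node(int v) : value(v) { ++live; }
    ~Node() { --live; }
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;
};
int Node::live = 0;

// Builds `count` nodes in a standard container and reports how many were
// alive just before the function returned. No explicit delete anywhere:
// the vector's destructor destroys each unique_ptr, which frees its Node.
int peak_live_nodes(int count) {
    std::vector<std::unique_ptr<Node>> nodes;
    for (int i = 0; i < count; ++i)
        nodes.push_back(std::unique_ptr<Node>(new Node(i)));
    return Node::live;  // value is read before the locals are destroyed
}
```

After peak_live_nodes returns, Node::live is back to zero, which is the sense in which destructors take over the bookkeeping.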

There are of course downsides to C++, so let's not have a language war here :)

What's new in GCC 4.5?

pr1268 — Fri, 14 May 2010 02:40:20 +0000

> GCC 4.5 is the first release of GCC that can be compiled with a C++ compiler.

I had to think about this for a moment. Why? Isn't GCC already working just fine (i.e. fast and [reasonably] efficient) as is in C? Then, visiting GNU's GCC page link in the article, I began to wonder if the developers want to use those features of C++ not present in C for the compiler? (Like classes, OO, and templates.)

Of course, any compiler can be written in any Turing-complete language. Even the 2nd edition of the Dragon Book has the source for a front-end written in Java.

My questions border on rhetorical, but perhaps I'm just trying to stimulate a discussion on this. Thanks!

My "favourite" FP bug

alex — Thu, 13 May 2010 15:58:38 +0000

I had a real head twister in a previous life caused by FP numbers getting pushed through the x87 when I didn't want them to be. A real pain when you're trying to emulate another architecture's FP behaviour as closely as possible.

What's new in GCC 4.5?

foom — Thu, 13 May 2010 15:52:37 +0000

You're not really supposed to use the x87 FPU these days, anyways. Use SSE2 instead, which actually uses 64bit FP operations instead of 80bit. Then you don't have the problem in the first place.

Unfortunately most software for Linux/x86 is compiled without SSE2 enabled, because distros want to support pre-Pentium4 processors.

What's new in GCC 4.5?

mpr22 — Thu, 13 May 2010 10:09:37 +0000

x86 is fundamentally a big bag of hacks and kludges. Glaring deficiencies are never surprising.

What's new in GCC 4.5?

creemj — Thu, 13 May 2010 00:26:15 +0000

I take it from this discussion that the Intel processors do not have a CPU/FPU instruction to convert the 80-bit representation to the 64-bit IEEE-compliant representation (and vice versa) directly within the FPU register without a memory store? To someone like me, who knows very little of Intel architecture, that is surprising.

What's new in GCC 4.5?

nix — Wed, 12 May 2010 21:26:12 +0000

Among other things, LTO doubles the size of object files and .a files, increases the time taken to compile (as the individual object files are driven all the way to assembler in case they're linked *without* -flto), and in 4.5 at least interacts badly with debugging information, so distros might not be able to use it for most of their packages (as these are normally built with debugging information which is then separated). Perhaps only speed-critical mathematical stuff and things like the compiler itself will see -fltoing immediately.

What's new in GCC 4.5?

arekm — Wed, 12 May 2010 17:55:49 +0000

Why aren't these optimizations (like LTO) on by default?

What's new in GCC 4.5?

eparis123 — Wed, 12 May 2010 16:42:50 +0000

Yes, I misunderstood the context. I misread it as having the desire to put the variable in the 80-bit FPU register for extra precision, instead of the opposite.

What's new in GCC 4.5?

farnz — Wed, 12 May 2010 16:12:27 +0000

Using volatile instead of -ffloat-store forces the selected variable to memory every time it's changed, without forcing all floating point variables to memory on all modification. The goal is to avoid the compiler caching floating point numbers in registers; I don't understand why you think this is the opposite to the standard's use.

What's new in GCC 4.5?

foom — Wed, 12 May 2010 15:45:55 +0000

That's what it means in this context, too.

The problem is that the registers are 80bits but the memory is 64bits, and the datatype is defined to be a 64bit floating point value. By using volatile, you tell the compiler to always write the data back to memory instead of caching it in the larger register, thus ensuring the calculation is using the expected precision.
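A minimal sketch of the idea, with made-up function names. Each result is written to a volatile double, so the compiler has to store it to a 64-bit memory slot and reload it rather than keep it in a wider register. On SSE2 builds this changes nothing, since the registers are already 64-bit.

```cpp
// Both quotients are forced through memory, so both are rounded to
// double before the comparison happens.
bool quotient_is_stable(double a, double b) {
    volatile double first = a / b;
    volatile double second = a / b;
    return first == second;
}

// Same trick for a single intermediate: the sum is stored, then reloaded.
double stored_sum(double a, double b) {
    volatile double result = a + b;
    return result;
}
```

The cost is a store and a load per marked variable, which is why it is usually applied only to the few values where the comparison matters.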