"Dependent on" ???

Posted Feb 28, 2007 12:19 UTC (Wed) by k8to (subscriber, #15413)
In reply to: "Dependent on" ??? by nix
Parent article: Mitchell Baker and the Firefox Paradox (Inc)

Thanks for all that, a fascinating read.

Here is an unfair nitpick:

> You probably won't see huge improvements in speed on targets like
> i386 without someone rewriting the register allocator and reload
> pass, probably the most hairy thing left and a creature of a
> different era.

Indeed, I've been noticing how much _slower_ gcc's generated code has gotten on i386 over the last few release cycles. I'm definitely behind the times here: I only recently bothered to run a bunch of cross-release, cross-feature timings on various performance-sensitive code, since I got a new amd64 box. The surprise was that gcc 3.x generated code was much faster (5-20%) across the board. Not what I was expecting at all.


"Dependent on" ???

Posted Feb 28, 2007 14:28 UTC (Wed) by nix (subscriber, #2304) [Link]

I'd expect this with -O3, as inlining tends to increase register pressure.

I've noticed that quite a few of the recent optimizations tend to do this on register-poor targets, actually; the problem seems to be that the optimizers have no way of telling whether they're causing excessive stack spills, because allocation of non-pseudo-registers doesn't happen until right at the end (and it happens on RTL, which is far lower-level than the GIMPLE that most optimizers chew on). This is also waiting on some heroic figure rewriting the register allocator so that earlier passes can get decent feedback on whether they're going to spill to hell and back or not (and then the earlier passes would have to get updated to use this information...)

(A good few passes are turned off at -O<3 for this reason.)
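To make the register-pressure point concrete, here is a minimal C sketch of the sort of code where inlining can keep many values live at once. The names `mix` and `many_live_values` are hypothetical, made up for illustration; whether any given compiler version actually spills here depends on the target, the flags, and the allocator, so treat it as a shape to experiment with rather than a demonstration of a regression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, small enough that a compiler is likely to inline it. */
static inline long mix(long a, long b, long c, long d)
{
    return a * 3 + b * 5 + c * 7 + d * 11;
}

/*
 * Once mix() is inlined, the eight loaded values, four accumulators, the
 * index, and the base pointer are all candidates for registers in the same
 * loop body. That is more than i386's eight general-purpose registers can
 * hold, so some of them may end up spilled to the stack.
 */
long many_live_values(const long *v, size_t n)
{
    assert(v != NULL);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (size_t i = 0; i + 8 <= n; i += 8) {
        long a = v[i],     b = v[i + 1], c = v[i + 2], d = v[i + 3];
        long e = v[i + 4], f = v[i + 5], g = v[i + 6], h = v[i + 7];
        s0 += mix(a, b, c, d);
        s1 += mix(e, f, g, h);
        s2 += mix(a, c, e, g);
        s3 += mix(b, d, f, h);
    }
    return s0 + s1 + s2 + s3;
}
```

Compiling this to assembly (for example with `-S`) at different -O levels and counting the stack loads and stores inside the loop is one rough way to see how much spilling a particular compiler produces.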

There's still a lot to do. GCC is far from perfect: a lot of the problems it tries to solve (like, well, register allocation) are NP-complete anyway, but it could definitely do a better job than it does.

But it hasn't been stagnating.

(Again, I'm just an observer; if I'm spraying ignorant rubbish someone who does actual useful work like Joe Buck should point it out. :) )

"Dependent on" ???

Posted Feb 28, 2007 15:41 UTC (Wed) by k8to (subscriber, #15413) [Link]

It also seemed true with -O2 and with no -O flags at all. It was also true on amd64, which is not so register-poor.

"Dependent on" ???

Posted Feb 28, 2007 17:50 UTC (Wed) by nix (subscriber, #2304) [Link]

With no -O flags at all I'd expect *huge* register pressure problems at all times! With -O2, well, even there the pressure is building up (although the problem is much less severe than with -O3).

I'm surprised to find problems on amd64. Raise a bug, maybe?

"Dependent on" ???

Posted Feb 28, 2007 17:57 UTC (Wed) by k8to (subscriber, #15413) [Link]

I suspect they know. It's getting better across 4.x as a whole. Also I'm really not excited about filing a bug "these 5 big applications that I don't really understand all do much better on gcc 3.x on i386 and amd64. Here's your several hundred thousand line testcase."

Maybe if I were their developer, or if they were more manageably sized, or if I hadn't read through other bug entries where it's discussed that there are (supposedly) no automated performance regression tests at all.

I don't intend my observations as complaints; they're just matter-of-fact observations. I wish it weren't so but it is, and I suspect half-assed bugs will only cost the project.

"Dependent on" ???

Posted Feb 28, 2007 22:13 UTC (Wed) by massimiliano (subscriber, #3048) [Link]

Want to do a big favor to the gcc developers?

Profile those applications, and find the hot spots that get significantly worse. Then, write small benchmarks with the same code, and test them to see that the slowdown is still there.

Finally, open the bug with the small benchmarks :-)
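A small benchmark of the kind described above could look roughly like the following C sketch. `hot_loop` is a hypothetical stand-in for whatever hot spot the profiler turns up, and `time_hot_loop` is an invented helper; only `clock()` and `CLOCKS_PER_SEC` come from the standard library.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the hot spot extracted from the real application. */
unsigned long hot_loop(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc = acc * 31 + (i ^ (acc >> 3));
    return acc;
}

/*
 * Calls hot_loop() `reps` times and returns the CPU time used, in seconds.
 * Results are accumulated through a volatile pointer so the compiler cannot
 * discard the calls as dead code.
 */
double time_hot_loop(unsigned long n, int reps, volatile unsigned long *sink)
{
    assert(reps > 0 && sink != NULL);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += hot_loop(n + (unsigned long)r);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building the same file with each compiler version at identical flags and comparing the reported times shows whether the slowdown survives outside the full application.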

And if you do it, remember to send the samples to me as well (massi(at)ximian"dot"com)! I work on the Mono JIT, and am generally interested in performance tests...

"Dependent on" ???

Posted Feb 28, 2007 23:20 UTC (Wed) by nix (subscriber, #2304) [Link]

Hell, if they're free software at least say what they are so someone else
with more machine time than sense and heaps of old compiler versions
scattered around can do the profiling :)

"Dependent on" ???

Posted Mar 1, 2007 12:09 UTC (Thu) by k8to (subscriber, #15413) [Link]

Off the top of my head, the n-queens "benchmark" shows a significant drop from 3.x to 4.x: http://www.arch.cs.titech.ac.jp/~kise/nq/index.htm

A larger one I care more about is UAE, the Amiga emulator, without JIT: both mainline and E-UAE from rcdrummond.net.

"Dependent on" ???

Posted Mar 9, 2007 19:49 UTC (Fri) by anton (guest, #25547) [Link]

> Want to do a big favor to the gcc developers?

> Profile those applications, and find the hot spots that get significantly worse. Then, write small benchmarks with the same code, and test them to see that the slowdown is still there.

> Finally, open the bug with the small benchmarks :-)

And see it closed as invalid after an hour (less time than it took to create the bug report).

Given this reaction, I don't think the gcc developers consider such bug reports as favors, and it certainly has not inspired me to report other gcc bugs.

"Dependent on" ???

Posted Mar 9, 2007 20:52 UTC (Fri) by massimiliano (subscriber, #3048) [Link]

> And see it closed as invalid after an hour (less time than it took to create the bug report).

Interesting discussion in that bug :-)

Anyway, I wrote my suggestion without knowing how gcc development actually works... I would welcome bugs like that opened for the Mono JIT!
At worst, I would mark it as "Wishlist", and keep it open that way...

"Dependent on" ???

Posted Mar 10, 2007 1:42 UTC (Sat) by nix (subscriber, #2304) [Link]

It seems that fixing this would be ridiculously difficult and not terribly
beneficial (how many programs have you seen that use computed goto? How
much effort is it worth going to to speed it up?)

I'd have kept it open, too, but there is so much low-hanging optimization
fruit in GCC that fixing fairly small one-platform optimization bugs in
rarely-used language extensions isn't going to get much attention at the
best of times.

GCC development actually works in much the same way everything else works
in the free software community: if you have a performance bug in a tiny
obscure feature it's probably not going to get fixed unless you fix it. A
noninvasive patch would probably have made it...

"Dependent on" ???

Posted Apr 7, 2007 20:29 UTC (Sat) by anton (guest, #25547) [Link]

>how many programs have you seen that use computed goto? How
>much effort is it worth going to to speed it up?

Many interpreters are using labels-as-values for a significant
speedup. And a huge number of programs use interpreters.
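For readers who have not seen the technique, here is a minimal sketch of an interpreter dispatching through labels-as-values. It relies on the GNU C extension (`&&label` and `goto *ptr`), so it needs gcc or a compatible compiler; the opcode set and the `run` function are hypothetical and invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

long run(const long *code)
{
    /* GNU C extension: &&label yields the address of a label as a void *. */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[64];
    size_t sp = 0;
    const long *pc = code;

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[*pc++]

    NEXT();

do_push:
    assert(sp < 64);
    stack[sp++] = *pc++;          /* operand follows the opcode */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef NEXT
}
```

Each handler ends with its own indirect jump instead of returning to a central `switch`, which is where the speedup is usually said to come from. A program such as `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` evaluates (2 + 3) * 4.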

>one-platform optimization bugs

As far as I understand the reply to the bug report, this bug affects
every platform that does not have a conditional indirect branch, i.e.,
pretty much every platform except PPC and IA64.

>A noninvasive patch would probably have made it...

They consider the bug "invalid", so they probably would not have
accepted the patch, and I am glad that I did not waste my time on
trying to build one.

"Dependent on" ???

Posted Feb 28, 2007 22:30 UTC (Wed) by massimiliano (subscriber, #3048) [Link]

> This is also waiting on some heroic figure rewriting the register allocator so that earlier passes can get decent feedback on whether they're going to spill to hell and back or not (and then the earlier passes would have to get updated to use this information...)

How funny! This is exactly what I want to do in the Mono JIT :-)
Here's a link to a presentation about our medium-long term JIT plans: "http://www.go-mono.com/meeting06/MonoSummit2006-JIT.pdf".

Actually we will not go fully SSA so fast, and maybe never. But we will rewrite the register allocator, and in the SSA path I will make sure that the communication between the optimization passes and the regalloc will happen effectively!

If you ever want to "chat" about compiler internals, feel free to drop me a mail (massi(at)ximian"dot"com).

"Dependent on" ???

Posted Feb 28, 2007 23:19 UTC (Wed) by nix (subscriber, #2304) [Link]

Interesting stuff. Of course the Mono JIT is operating under more constraints than GCC in some respects (`compilation' must be *fast*) but it's more amenable to rewrites because you don't have to target a myriad decade-old backends rife with unstated assumptions and with widely-varying constraints on (e.g.) the register file.

Zack Weinberg put it well in an IRC conversation (later reprinted in the acknowledgements to _A Maintenance Programmer's View of GCC_ in the 2003 GCC Summit proceedings):

> Take an H. R. Giger painting, you know, with the perverse and insanely complicated biomechanical constructs.

> Now, instead of being all shiny and new, make it old and overgrown with weeds. Slimy weeds.

Much of GCC is better than that these days (thanks to tree-ssa obsoleting many of the nasty parts, and a lot of effort to clean up the problems Zack identified in that paper), but some parts (notably reload and the rest of the register allocator, and combine) are still deep in the slime.

The first part of Vlad Makarov's _Fighting register pressure in GCC_ (in the 2004 summit proceedings) has a good description of how the current allocator works, and the extreme constraints on changing it. That attempt at rewriting the allocator ran into the slimy weeds and got all tangled up; virtually every subsequent summit proceedings has the skeleton of another attempt in it.

Someday, someone will succeed... you're already talking about moving away from BURG, but one of the earlier rewrite attempts was, I think, trying to move to it. I'm fairly sure the Mono compiler's register allocator can do a better job at this intractable task than GCC's can...

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds