> One important missing piece here is performance. Not that it
> may be slower than gcc build but how much slower is the question.
I have a fairly normal calculation program that, when compiled with clang, runs much faster (about 100 times, IIRC) than when compiled with gcc.
Posted Oct 26, 2010 11:59 UTC (Tue) by jwakely (subscriber, #60262)
Compiles 100 times faster or runs 100 times faster? The latter would be more surprising than the former, though either would be an impressive result.
Clang builds a working 2.6.36 Kernel
Posted Oct 26, 2010 12:32 UTC (Tue) by juliank (subscriber, #45896)
> compiles 100 times faster or runs 100 times faster?
Run time. The following code runs much faster when compiled with clang++ (0.003s) than when compiled with g++ (1.593s):
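#include <iostream>
#include <boost/thread.hpp>

#define len 1000000000L

// Sums all values in [a, b) into *va.
static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

int main()
{
    unsigned long va = 0;
    // Run f(0, 2 * len, &va) on a separate thread and wait for it to finish.
    boost::thread a(f, 0l, 2 * len, &va);
    a.join();
    std::cout << va << std::endl;   // 1999999999000000000 with a 64-bit unsigned long
    return 0;
}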
Posted Oct 26, 2010 13:04 UTC (Tue) by juliank (subscriber, #45896)
It might be that clang is able to optimize (at -O2)
    for (*va = 0; a < b; a++)
        *va += a;
into
    *va = (1 + b)*(b/2) - b
as per Gauss: 1 + 2 + ... + N = (1 + N)*(N/2)
Posted Oct 26, 2010 13:13 UTC (Tue) by juliank (subscriber, #45896)
In
    *va = (1 + b)*(b/2) - b
I forgot about a, so the formula was wrong. The correct way to calculate the sum of all values {x | a <= x < b} is to take the sum over [0, b) and subtract the sum over [0, a):
    *va = b*(b-1)/2 - a*(a-1)/2
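A quick way to convince yourself that the corrected closed form matches the loop is to compare the two directly. This is a minimal C++ sketch: `sum_loop`, `tri` and `sum_closed` are hypothetical names not taken from the thread, and it assumes `a <= b` and a 64-bit `unsigned long` (as on x86-64).

```cpp
#include <cassert>

// The loop from f(): sum of all x with a <= x < b.
static unsigned long sum_loop(unsigned long a, unsigned long b)
{
    unsigned long s = 0;
    for (; a < b; a++)
        s += a;
    return s;
}

// 0 + 1 + ... + (n - 1) = n*(n-1)/2. One of n and n-1 is even, so halve that
// factor first; the division stays exact and the result matches the loop's
// unsigned wraparound even if the product overflows.
static unsigned long tri(unsigned long n)
{
    return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}

// Closed form: sum over [0, b) minus sum over [0, a). Assumes a <= b.
static unsigned long sum_closed(unsigned long a, unsigned long b)
{
    assert(a <= b);
    return tri(b) - tri(a);
}
```

Under these assumptions `sum_closed(0, 2000000000)` gives 1999999999000000000, the same value the loop produces for the program's bounds, without iterating.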
Posted Oct 26, 2010 13:38 UTC (Tue) by jwakely (subscriber, #60262)
Or maybe, instead of using an identity, it just inlines the call through &f (which is quite impressive), then finds that all the values are known at compile time, so the loop can be completely unrolled. Pretty cool either way.
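The "all values are known at compile time, so the loop disappears" idea has a language-level analogue in C++14 `constexpr`, which postdates this thread. The sketch below only illustrates compile-time evaluation of the same loop; it is not what GCC or Clang do internally at -O2, and `sum_range` is a hypothetical name. Small bounds are used because compilers cap how many steps a constant evaluation may take, so the thread's `2 * len` bound would likely be rejected inside a `static_assert`.

```cpp
#include <cassert>

// Same loop as f(), but returning the sum instead of writing through a pointer.
// constexpr (C++14) allows the compiler to run it during compilation when both
// bounds are constant expressions.
constexpr unsigned long sum_range(unsigned long a, unsigned long b)
{
    unsigned long s = 0;
    for (; a < b; a++)
        s += a;
    return s;
}

// These force compile-time evaluation: if they compile, the whole loop was
// executed by the compiler and only the constant result remains.
static_assert(sum_range(0, 10) == 45, "0 + 1 + ... + 9");
static_assert(sum_range(5, 8) == 18, "5 + 6 + 7");
```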
Posted Oct 26, 2010 14:01 UTC (Tue) by juliank (subscriber, #45896)
In any case, I simplified it to C and posted it to my blog at http://juliank.wordpress.com/2010/10/26/simple-code-clang-creates-1600x-faster-executable-than-gcc/
Posted Oct 26, 2010 14:12 UTC (Tue) by rahulsundaram (subscriber, #21946)
You might want to report this to the GCC developers via mailing list or bugzilla.
Posted Oct 26, 2010 14:35 UTC (Tue) by jwakely (subscriber, #60262)
Posted Oct 26, 2010 14:32 UTC (Tue) by tzafrir (subscriber, #11501)
It also becomes way faster in gcc (4.4.5 here) once you remove "__attribute__((noinline))". This looks like a simpler explanation for the simpler code you posted there.
Posted Oct 26, 2010 14:37 UTC (Tue) by juliank (subscriber, #45896)
> It also becomes way faster in gcc (4.4.5 here) once
> you remove "__attribute__((noinline))". This looks
> like a simpler explanation for the simpler code you posted there.
Then gcc calculates the result of f() at compile time and just puts a constant integer in the assembly. Clang does not appear to do this (there's a callq f in clang's assembly).
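To see the effect tzafrir describes, one can put the same loop behind `__attribute__((noinline))` (a GCC/Clang extension) next to an inlinable copy and compare the generated assembly. This is a hypothetical reconstruction, not the code from the blog post; all function names here are made up.

```cpp
#include <cassert>

// noinline forbids inlining, so callers cannot see the loop and fold the
// call into a constant; a real call must be emitted.
__attribute__((noinline))
static void f_noinline(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

// Same loop without the attribute: the optimizer is free to inline it and,
// given constant arguments, may replace the whole call with its result.
static void f_inlinable(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

unsigned long call_noinline()
{
    unsigned long va;
    f_noinline(0, 1000, &va);
    return va;
}

unsigned long call_inlinable()
{
    unsigned long va;
    f_inlinable(0, 1000, &va);
    return va;
}
```

Compiling with `g++ -O2 -S` and reading the `.s` file, `call_noinline` should still contain a call to `f_noinline`, while `call_inlinable` may reduce to loading the constant 499500; whether that fold happens depends on the compiler and version.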
Posted Oct 26, 2010 16:29 UTC (Tue) by gmaxwell (subscriber, #30048)
"Clang 100x faster than GCC!"
"But.. why did you handicap GCC?"
"Cause if I didn't GCC was much faster!"
(just saying :) )
Posted Oct 26, 2010 16:31 UTC (Tue) by juliank (subscriber, #45896)
> "Clang 100x faster than GCC!"
> "But.. why did you handicap GCC?"
> "Cause if I didn't GCC was much faster!"
> (just saying :) )
GCC 4.5 at -O3 is as fast as clang, although not if you call the function via a pointer. GCC 4.4 is equally slow at -O2, -O3, -O4 and -O9.
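The "call via a pointer" case can be forced by reading the function pointer from a `volatile` object: the compiler then cannot assume which function is called, so it must emit a real indirect call and cannot specialize the loop for the constant arguments at the call site. A sketch under that assumption; `sum_fn`, `target` and `sum_via_pointer` are hypothetical names, and a plain (non-volatile) pointer to `f` may still be resolved and inlined by the optimizer.

```cpp
#include <cassert>

static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

using sum_fn = void (*)(unsigned long, unsigned long, unsigned long *);

// The pointer object itself is volatile, so every read of it must really
// happen at run time and its value is opaque to the optimizer.
static sum_fn volatile target = f;

unsigned long sum_via_pointer(unsigned long a, unsigned long b)
{
    unsigned long va;
    sum_fn fn = target;   // volatile read: callee unknown at compile time
    fn(a, b, &va);        // indirect call
    return va;
}
```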
Posted Oct 27, 2010 9:27 UTC (Wed) by jwakely (subscriber, #60262)
-O4 and -O9 don't exist as separate levels (GCC accepts them but they behave like -O3), so it's no surprise they aren't any better than -O3.
GCC 4.5 has apparently already improved. Are you also comparing with a version of Clang from 18 months ago, when GCC 4.4 was released?
Posted Oct 26, 2010 21:47 UTC (Tue) by tzafrir (subscriber, #11501)
To make this "handicap" more realistic, replace the constant with a command-line parameter.
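A minimal sketch of that suggestion, taking the upper bound from `argv[1]` instead of the compile-time constant `2 * len`, so neither compiler can precompute the result and only the loop itself can be optimized. `run_from_args` is a hypothetical helper, kept separate from `main` so it can be exercised directly; the Boost thread is left out for brevity.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>

static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

// Parses the upper bound from argv[1], sums [0, N) and prints the result.
// Returns 0 on success, 1 on a missing or malformed argument.
int run_from_args(int argc, char *argv[], unsigned long *result)
{
    if (argc < 2) {
        std::cerr << "usage: prog N\n";
        return 1;
    }
    char *end = nullptr;
    unsigned long n = std::strtoul(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0') {
        std::cerr << "not a number: " << argv[1] << '\n';
        return 1;
    }
    f(0, n, result);   // bound only known at run time
    std::cout << *result << std::endl;
    return 0;
}
```

`main` can then simply be `int main(int argc, char *argv[]) { unsigned long r; return run_from_args(argc, argv, &r); }`, run as e.g. `./sum 2000000000` to match the original bound.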
Posted Oct 26, 2010 14:19 UTC (Tue) by mjw (subscriber, #16740)
If so, try -funroll-loops, which gcc/g++ doesn't enable by default because it can make the object code larger:
-funroll-loops
Unroll loops whose number of iterations can be determined at
compile time or upon entry to the loop. -funroll-loops implies
-frerun-cse-after-loop, -fweb and -frename-registers. It also
turns on complete loop peeling (i.e. complete removal of loops with
small constant number of iterations). This option makes code
larger, and may or may not make it run faster.
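To show what -funroll-loops asks the compiler to do, here is the loop from f() unrolled by hand by a factor of 4. It's only an illustration: GCC chooses its own unroll factor and remainder handling, and `sum_unrolled4` is a hypothetical name.

```cpp
#include <cassert>

// Loop from f() unrolled by 4: the loop condition is tested once per four
// additions instead of once per addition, at the cost of more code.
static unsigned long sum_unrolled4(unsigned long a, unsigned long b)
{
    unsigned long s = 0;
    // Main body: runs while at least four values remain in [a, b).
    while (a < b && b - a >= 4) {
        s += a;
        s += a + 1;
        s += a + 2;
        s += a + 3;
        a += 4;
    }
    // Remainder: the 0 to 3 leftover iterations.
    for (; a < b; a++)
        s += a;
    return s;
}
```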
Posted Oct 26, 2010 14:38 UTC (Tue) by juliank (subscriber, #45896)
> If so, try -funroll-loops, which isn't automatically
> done with gcc/g++, because it might create larger object code:
That brings it down to 0.200s.
Posted Oct 26, 2010 16:32 UTC (Tue) by gmaxwell (subscriber, #30048)
I believe that unrolling is enabled by default in GCC if you do a profile-guided build.