One important missing piece here is performance. The question is not whether the result is slower than a gcc build, but how much slower.
Also:
"Depending on how the configuration is tweaked/the SMP support of the kernel that I'm building on top of, Clang builds Linux in about 13-15 minutes."
It will be interesting to see whether clang builds faster than gcc with a similar level of optimization enabled.
Posted Oct 26, 2010 6:31 UTC (Tue) by wash (guest, #70825)
As for compiling: as my original post states, the clang-compiled kernel self-hosts. Before I had SMP working, it took about 17 minutes to build (the result of that build was a kernel with SMP). I'm running a self-hosted Linux kernel that was built with Clang right now. I don't have hard numbers, but compiling doesn't seem to be much slower. This isn't an optimized kernel build, though.
I've been unable to find a usable kernel test suite. I've tried a few and been disappointed by all of them, most of all by the Linux Test Project.
The best test suite I've found so far is a Debian package called posixtestsuite. It's a few years old, and looks to have been written by a bunch of Intel hackers. It has a number of desirable features, such as documentation and the ability to be compiled without hair pulling.
I'm open to suggestions regarding broader performance testing solutions.
Clang builds a working 2.6.36 Kernel
Posted Oct 27, 2010 11:27 UTC (Wed) by i3839 (guest, #31386)
A good start would be to post the kernel sizes, for both the size-optimized and the speed-optimized builds, and compare them to the GCC ones.
Posted Oct 26, 2010 11:11 UTC (Tue) by juliank (subscriber, #45896)
> One important missing piece here is performance. Not that it
> may be slower than gcc build but how much slower is the question.
I have a fairly ordinary calculation program that is much (100 times, IIRC) faster when compiled using clang than when compiled using gcc.
Posted Oct 26, 2010 11:59 UTC (Tue) by jwakely (subscriber, #60262)
Compiles 100 times faster or runs 100 times faster? The latter would be more surprising than the former, though either would be an impressive result.
Posted Oct 26, 2010 12:32 UTC (Tue) by juliank (subscriber, #45896)
> compiles 100 times faster or runs 100 times faster?
Run time. The following code runs much faster when compiled with clang++ (0.003s) than when compiled with g++ (1.593s):
#include <iostream>
#include <boost/thread.hpp>

#define len 1000000000L

// Sum all values in [a, b) into *va.
static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}

int main()
{
    unsigned long va = 0;
    // Run f on a separate thread and wait for it to finish.
    boost::thread a(f, 0l, 2 * len, &va);
    a.join();
    std::cout << va << std::endl;
    return 0;
}
Posted Oct 26, 2010 13:04 UTC (Tue) by juliank (subscriber, #45896)
It might be that clang is able to optimize (-O2)
    for (*va = 0; a < b; a++)
        *va += a;
to
    *va = (1 + b)*(b/2) - b
as per Gauss: 1 + 2 + ... + N = (1 + N)*(N/2)
Posted Oct 26, 2010 13:13 UTC (Tue) by juliank (subscriber, #45896)
In
    *va = (1 + b)*(b/2) - b
I left out a, so the result was wrong. The correct way to calculate the sum of all values {x | a <= x < b} is:
    *va = (b)*(b-1)/2 - (a)*(a-1)/2
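A quick way to sanity-check the corrected identity is to compare it against the loop on small ranges. The sketch below is only illustrative: the names sum_loop, sum_closed and check_identity are made up for this example, it uses unsigned long long so the product fits even where long is 32 bits, and it says nothing about what clang actually emits.

```cpp
#include <cassert>

// Sum of all x with a <= x < b, computed with the same loop as f().
static unsigned long long sum_loop(unsigned long long a, unsigned long long b)
{
    unsigned long long s = 0;
    for (; a < b; a++)
        s += a;
    return s;
}

// Closed form from the comment above: b*(b-1)/2 - a*(a-1)/2.
// Assumes a <= b and that b*(b-1) does not overflow 64 bits.
static unsigned long long sum_closed(unsigned long long a, unsigned long long b)
{
    return b * (b - 1) / 2 - a * (a - 1) / 2;
}

// Compare both versions on every small range with 0 <= a <= b < 64.
static void check_identity()
{
    for (unsigned long long b = 0; b < 64; b++)
        for (unsigned long long a = 0; a <= b; a++)
            assert(sum_loop(a, b) == sum_closed(a, b));
}
```

For the range used in the program above (a = 0, b = 2000000000), sum_closed gives 1999999999000000000 without executing a single loop iteration.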
Posted Oct 26, 2010 13:38 UTC (Tue) by jwakely (subscriber, #60262)
Or maybe instead of using an identity it just inlines the call through &f (which is quite impressive) then finds that all the values are known at compile-time, so the loop can be completely unrolled. Pretty cool either way.
Posted Oct 26, 2010 14:01 UTC (Tue) by juliank (subscriber, #45896)
In any case, I simplified it to C and posted it to my blog at http://juliank.wordpress.com/2010/10/26/simple-code-clang-creates-1600x-faster-executable-than-gcc/
Posted Oct 26, 2010 14:12 UTC (Tue) by rahulsundaram (subscriber, #21946)
You might want to report this to the GCC developers via the mailing list or Bugzilla.
Posted Oct 26, 2010 14:35 UTC (Tue) by jwakely (subscriber, #60262)
Posted Oct 26, 2010 14:32 UTC (Tue) by tzafrir (subscriber, #11501)
It also becomes way faster in gcc (4.4.5 here) once you remove "__attribute__((noinline))". This looks like a simpler explanation for the simpler code you posted there.
Posted Oct 26, 2010 14:37 UTC (Tue) by juliank (subscriber, #45896)
> It also becomes way faster in gcc (4.4.5 here) once
> you remove "__attribute__((noinline))". This looks
> like a simpler explanation for the simpler code you posted there.
Then gcc calculates the result of f() at compile time and just has a constant integer in the assembler code. Clang does not appear to do this (there's a callq f in clang's assembly).
Posted Oct 26, 2010 16:29 UTC (Tue) by gmaxwell (subscriber, #30048)
"Clang 100x faster than GCC!"
"But.. why did you handicap GCC?"
"Cause if I didn't GCC was much faster!"
(just saying :) )
Posted Oct 26, 2010 16:31 UTC (Tue) by juliank (subscriber, #45896)
> "Clang 100x faster than GCC!"
> "But.. why did you handicap GCC?"
> "Cause if I didn't GCC was much faster!"
> (just saying :) )
GCC 4.5 at -O3 is as fast as clang, although not if you call the function via a pointer. GCC 4.4 has the same slow speed at -O2, -O3, -O4, -O9.
Posted Oct 27, 2010 9:27 UTC (Wed) by jwakely (subscriber, #60262)
-O4 and -O9 don't exist, so it's no surprise they aren't any better than -O3.
GCC 4.5 has apparently already improved. Are you also comparing with a version of Clang from 18 months ago, when GCC 4.4 was released?
Posted Oct 26, 2010 21:47 UTC (Tue) by tzafrir (subscriber, #11501)
To make this "handicap" more realistic, replace the constant with a command-line parameter.
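One way to read this suggestion: take the upper bound from argv so that neither compiler can fold the loop into a constant at compile time. The following is a minimal sketch under that assumption, not the code from the blog post; sum_range, parse_bound and bench_main are hypothetical names, and no timings are implied.

```cpp
#include <cerrno>
#include <cstdlib>
#include <iostream>

// Same loop as f() in the original program: sum all x with a <= x < b.
static unsigned long sum_range(unsigned long a, unsigned long b)
{
    unsigned long s = 0;
    for (; a < b; a++)
        s += a;
    return s;
}

// Parse an unsigned decimal integer. Returns false if the text does not
// start with a digit, has trailing characters, or is out of range.
static bool parse_bound(const char *text, unsigned long &out)
{
    if (text[0] < '0' || text[0] > '9')
        return false;
    char *end = nullptr;
    errno = 0;
    unsigned long v = std::strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;
    out = v;
    return true;
}

// Hypothetical entry point: rename to main() to build this as a program.
// Usage: ./a.out <upper-bound>
int bench_main(int argc, char **argv)
{
    unsigned long b = 0;
    if (argc != 2 || !parse_bound(argv[1], b)) {
        std::cerr << "usage: " << (argc > 0 ? argv[0] : "bench")
                  << " <upper-bound>\n";
        return 1;
    }
    // The bound is only known at run time, so the sum cannot be precomputed.
    std::cout << sum_range(0, b) << std::endl;
    return 0;
}
```

With the bound supplied at run time, a compiler can still replace the loop with a closed form if it recognizes the pattern, but it can no longer emit a precomputed constant.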
Posted Oct 26, 2010 14:19 UTC (Tue) by mjw (subscriber, #16740)
If so, try -funroll-loops, which isn't automatically done with gcc/g++, because it might create larger object code:
-funroll-loops
Unroll loops whose number of iterations can be determined at
compile time or upon entry to the loop. -funroll-loops implies
-frerun-cse-after-loop, -fweb and -frename-registers. It also
turns on complete loop peeling (i.e. complete removal of loops with
small constant number of iterations). This option makes code
larger, and may or may not make it run faster.
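To make the man page text concrete, here is roughly what unrolling by a factor of four looks like when written by hand. It only illustrates the transformation and is not what GCC generates for the program above; sum_plain, sum_unrolled4 and check_unrolled are hypothetical names.

```cpp
#include <cassert>

// Plain loop: one addition and one loop test per element.
static unsigned long long sum_plain(unsigned long long a, unsigned long long b)
{
    unsigned long long s = 0;
    for (; a < b; a++)
        s += a;
    return s;
}

// Hand-unrolled by four: the loop test runs once per four additions,
// at the cost of a larger loop body plus a remainder loop.
static unsigned long long sum_unrolled4(unsigned long long a, unsigned long long b)
{
    unsigned long long s = 0;
    // Main body: consume four values per iteration while at least four remain.
    for (; a < b && b - a >= 4; a += 4)
        s += a + (a + 1) + (a + 2) + (a + 3);
    // Remainder: at most three values are left.
    for (; a < b; a++)
        s += a;
    return s;
}

// Both versions must agree on every small range, including empty ones.
static void check_unrolled()
{
    for (unsigned long long b = 0; b < 40; b++)
        for (unsigned long long a = 0; a < 40; a++)
            assert(sum_plain(a, b) == sum_unrolled4(a, b));
}
```

The larger body and the extra remainder loop are the "larger object code" trade-off the option description mentions.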
Posted Oct 26, 2010 14:38 UTC (Tue) by juliank (subscriber, #45896)
> If so, try -funroll-loops, which isn't automatically
> done with gcc/g++, because it might create larger object code:
Brings it down to 0.200s
Posted Oct 26, 2010 16:32 UTC (Tue) by gmaxwell (subscriber, #30048)
I believe that unrolling is enabled by default in GCC if you do a profile-guided build.