Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 12:32 UTC (Tue) by juliank (guest, #45896)
In reply to: Clang builds a working 2.6.36 Kernel by jwakely
Parent article: Clang builds a working 2.6.36 Kernel

> compiles 100 times faster or runs 100 times faster?

Run time. The following code runs much faster when compiled with clang++ (0.003s) than when compiled with g++ (1.593s):


#include <iostream>
#include <boost/thread.hpp>

#define len 1000000000L

static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}
    
int main()
{
    unsigned long va = 0;
    boost::thread a(f, 0l,  2* len, &va);
    a.join();
    std::cout << va  << std::endl;
    return 0;
}

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 13:04 UTC (Tue) by juliank (guest, #45896) [Link] (14 responses)

It might be that clang (at -O2) is able to optimize

for (*va = 0; a < b; a++)
    *va += a;

into

*va = (1 + b)*(b/2) - b

as per Gauss: 1+2+...+N = (1 + N)*(N/2)

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 13:13 UTC (Tue) by juliank (guest, #45896) [Link] (13 responses)

> *va = (1 + b)*(b/2) - b

I missed a, and the solution was wrong. The correct way to calculate the sum of all values {x | a <= x < b} is:

*va = (b)*(b-1)/2 - (a)*(a-1)/2
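
The closed form can be checked against the original loop on small ranges. This is only a minimal sketch: the names sum_loop and sum_closed are made up for illustration, and the closed form assumes a <= b and that b*(b-1) fits in an unsigned long (true for b = 2*len when unsigned long is 64 bits wide).

```cpp
// Sketch only: sum_loop and sum_closed are illustrative names, not from the thread.

// Sum of all x with a <= x < b, computed the same way f() does it.
unsigned long sum_loop(unsigned long a, unsigned long b)
{
    unsigned long total = 0;
    for (; a < b; a++)
        total += a;
    return total;
}

// Same sum via the closed form b*(b-1)/2 - a*(a-1)/2.
// Assumes a <= b and that b*(b-1) does not overflow unsigned long.
unsigned long sum_closed(unsigned long a, unsigned long b)
{
    return b * (b - 1) / 2 - a * (a - 1) / 2;
}
```

Whether clang actually performs this rewrite, or gets its speed some other way, is not something this sketch shows; it only confirms that the two functions agree.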

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 13:38 UTC (Tue) by jwakely (subscriber, #60262) [Link] (12 responses)

Or maybe instead of using an identity it just inlines the call through &f (which is quite impressive) then finds that all the values are known at compile-time, so the loop can be completely unrolled. Pretty cool either way.

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:01 UTC (Tue) by juliank (guest, #45896) [Link] (8 responses)

In any case, I simplified it to C and posted it to my blog at http://juliank.wordpress.com/2010/10/26/simple-code-clang-creates-1600x-faster-executable-than-gcc/

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:12 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

You might want to report this to the GCC developers via the mailing list or Bugzilla.

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:35 UTC (Tue) by jwakely (subscriber, #60262) [Link]

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
Thanks for the report.

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:32 UTC (Tue) by tzafrir (subscriber, #11501) [Link] (5 responses)

It also becomes way faster in gcc (4.4.5 here) once you remove "__attribute__((noinline))". This looks like a simpler explanation for the simpler code you posted there.

I didn't check it with clang.

And just to provide a working link:

http://juliank.wordpress.com/2010/10/26/simple-code-clang...

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:37 UTC (Tue) by juliank (guest, #45896) [Link] (4 responses)

> It also becomes way faster in gcc (4.4.5 here) once
> you remove "__attribute__((noinline))". This looks
> like a simpler explanation for the simpler code you posted there.

Then gcc calculates the result of f() at compile time and just has a constant integer in the assembly code. Clang does not appear to do this (there's a callq f in clang's assembly).
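
For readers without the blog post at hand, the two variants being compared look roughly like the sketch below. It is a hypothetical reconstruction, not the code from the post; the function names are invented, and __attribute__((noinline)) is a GCC/Clang extension. What each compiler does with either variant depends on its version and flags, as the numbers in this thread show.

```cpp
// Hypothetical reconstruction for illustration; not the code from the blog post.

// The attribute asks the compiler not to inline this function,
// so call sites keep a real call to it.
__attribute__((noinline))
unsigned long sum_noinline(unsigned long a, unsigned long b)
{
    unsigned long total = 0;
    for (; a < b; a++)
        total += a;
    return total;
}

// Without the attribute the optimizer is free to inline the call and,
// when the arguments are constants, may evaluate the loop at compile time.
unsigned long sum_inlinable(unsigned long a, unsigned long b)
{
    unsigned long total = 0;
    for (; a < b; a++)
        total += a;
    return total;
}
```

Comparing the assembly generated for a call to each function with constant arguments (for example with -S) should show whether a call instruction or a precomputed constant ends up in the output.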

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 16:29 UTC (Tue) by gmaxwell (guest, #30048) [Link] (3 responses)

"Clang 100x faster than GCC!"

"But.. why did you handicap GCC?"

"Cause if I didn't GCC was much faster!"

(just saying :) )

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 16:31 UTC (Tue) by juliank (guest, #45896) [Link] (1 responses)

> "Clang 100x faster than GCC!"
> "But.. why did you handicap GCC?"
> "Cause if I didn't GCC was much faster!"
> (just saying :) )

GCC 4.5 at -O3 is as fast as clang, although not if you call the function via a pointer. GCC 4.4 has the same slow speed at -O2, -O3, -O4, -O9.

Clang builds a working 2.6.36 Kernel

Posted Oct 27, 2010 9:27 UTC (Wed) by jwakely (subscriber, #60262) [Link]

-O4 and -O9 don't exist, so it's no surprise they aren't any better than -O3.

GCC 4.5 has apparently already improved. Are you also comparing with a version of Clang from 18 months ago, when GCC 4.4 was released?

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 21:47 UTC (Tue) by tzafrir (subscriber, #11501) [Link]

To make this "handicap" more realistic, replace the constant with a command-line parameter.
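
A minimal sketch of that idea, assuming the bound arrives as a decimal string such as argv[1]. The helper names parse_bound and sum_range are invented for illustration and do not come from the thread or the blog post.

```cpp
#include <cstdlib>

// Illustrative helper (not from the thread): read the loop bound from a
// command-line argument so the compiler cannot know it at build time.
// Returns false if the argument is missing, empty, or has trailing
// characters that std::strtoul could not consume.
bool parse_bound(const char *arg, unsigned long &bound)
{
    if (arg == nullptr || *arg == '\0')
        return false;
    char *end = nullptr;
    bound = std::strtoul(arg, &end, 10);
    return *end == '\0';
}

// Sum of all x with a <= x < b, same loop as f() in the original program.
unsigned long sum_range(unsigned long a, unsigned long b)
{
    unsigned long total = 0;
    for (; a < b; a++)
        total += a;
    return total;
}
```

A main() would check that argc > 1, call parse_bound(argv[1], b), and print sum_range(0, b). With the bound known only at run time, the compiler can no longer reduce the whole program to a constant, though it may still simplify the loop itself.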

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:19 UTC (Tue) by mjw (subscriber, #16740) [Link] (2 responses)

If so, try -funroll-loops, which isn't automatically done with gcc/g++, because it might create larger object code:

   -funroll-loops
       Unroll loops whose number of iterations can be determined at
       compile time or upon entry to the loop.  -funroll-loops implies
       -frerun-cse-after-loop, -fweb and -frename-registers.  It also
       turns on complete loop peeling (i.e. complete removal of loops with
       small constant number of iterations).  This option makes code
       larger, and may or may not make it run faster.

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 14:38 UTC (Tue) by juliank (guest, #45896) [Link] (1 responses)

> If so, try -funroll-loops, which isn't automatically
> done with gcc/g++, because it might create larger object code:

Brings it down to 0.200s

Clang builds a working 2.6.36 Kernel

Posted Oct 26, 2010 16:32 UTC (Tue) by gmaxwell (guest, #30048) [Link]

I believe that unrolling is enabled by default in GCC if you do a profile-guided build.