Clang builds a working 2.6.36 Kernel
Posted Oct 26, 2010 12:32 UTC (Tue) by juliank (guest, #45896)
In reply to: Clang builds a working 2.6.36 Kernel by jwakely
Parent article: Clang builds a working 2.6.36 Kernel
> compiles 100 times faster or runs 100 times faster?
Run time. The following code runs much faster when compiled with clang++ (0.003s) than when compiled with g++ (1.593s):
#include <iostream>
#include <boost/thread.hpp>
#define len 1000000000L
static void f(unsigned long a, unsigned long b, unsigned long *va)
{
    for (*va = 0; a < b; a++)
        *va += a;
}
    
int main()
{
    unsigned long va = 0;
    boost::thread a(f, 0l,  2* len, &va);
    a.join();
    std::cout << va  << std::endl;
    return 0;
}
        
      Posted Oct 26, 2010 13:04 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (14 responses)
       It might be that clang (at -O2) is able to optimize the loop into a closed form, as per Gauss:
     
    
      Posted Oct 26, 2010 13:13 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (13 responses)
       I missed a, and the solution was wrong. The correct way to calculate the sum of all values {x | a <= x < b} is: 
     
    
      Posted Oct 26, 2010 13:38 UTC (Tue)
                               by jwakely (subscriber, #60262)
                              [Link] (12 responses)
       
     
    
      Posted Oct 26, 2010 14:01 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (8 responses)
       
     
    
      Posted Oct 26, 2010 14:12 UTC (Tue)
                               by rahulsundaram (subscriber, #21946)
                              [Link] (1 responses)
       
     
    
      Posted Oct 26, 2010 14:35 UTC (Tue)
                               by jwakely (subscriber, #60262)
                              [Link] 
       
     
      Posted Oct 26, 2010 14:32 UTC (Tue)
                               by tzafrir (subscriber, #11501)
                              [Link] (5 responses)
       
I didn't check it with clang. 
And just to provide a working link: 
http://juliank.wordpress.com/2010/10/26/simple-code-clang...  
     
    
      Posted Oct 26, 2010 14:37 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (4 responses)
       
Then gcc calculates the result of f() at compile-time and just has a constant integer in the assembler code. Clang does not appear to do this (there's callq f in clang's assembly) 
     
    
      Posted Oct 26, 2010 16:29 UTC (Tue)
                               by gmaxwell (guest, #30048)
                              [Link] (3 responses)
       
"But.. why did you handicap GCC?" 
"Cause if I didn't GCC was much faster!" 
(just saying :) ) 
     
    
      Posted Oct 26, 2010 16:31 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (1 responses)
       
GCC 4.5 at -O3 is as fast as clang, although not if you call the function via a pointer. GCC 4.4 has the same slow speed at -O2, -O3, -O4, -O9. 
     
    
      Posted Oct 27, 2010 9:27 UTC (Wed)
                               by jwakely (subscriber, #60262)
                              [Link] 
       
GCC 4.5 has apparently already improved.  Are you also comparing with a version of Clang from 18 months ago, when GCC 4.4 was released? 
 
     
      Posted Oct 26, 2010 21:47 UTC (Tue)
                               by tzafrir (subscriber, #11501)
                              [Link] 
       
     
      Posted Oct 26, 2010 14:19 UTC (Tue)
                               by mjw (subscriber, #16740)
                              [Link] (2 responses)
       
     
    
      Posted Oct 26, 2010 14:38 UTC (Tue)
                               by juliank (guest, #45896)
                              [Link] (1 responses)
       
Brings it down to 0.200s 
     
    
      Posted Oct 26, 2010 16:32 UTC (Tue)
                               by gmaxwell (guest, #30048)
                              [Link] 
       
 
     
    The loop in question:

      for (*va = 0; a < b; a++)
          *va += a;

    The closed form first suggested, from Gauss's 1+2+...+N = (1 + N)*(N/2):

      *va = (1 + b)*(b/2) - b

    The corrected closed form for the sum of all values {x | a <= x < b}:

      *va = (b)*(b-1)/2 - (a)*(a-1)/2
      
      In any case, I simplified it to C and posted it to my blog at http://juliank.wordpress.com/2010/10/26/simple-code-clang-creates-1600x-faster-executable-than-gcc/
      
> you remove "__attribute__((noinline))". This looks
> like a simpler explanation for the simpler code you posted there.
> "But.. why did you handicap GCC?"
> "Cause if I didn't GCC was much faster!"
> (just saying :) )
      If so, try -funroll-loops, which gcc/g++ does not enable automatically because it might create larger object code:
   -funroll-loops
       Unroll loops whose number of iterations can be determined at
       compile time or upon entry to the loop.  -funroll-loops implies
       -frerun-cse-after-loop, -fweb and -frename-registers.  It also
       turns on complete loop peeling (i.e. complete removal of loops with
       small constant number of iterations).  This option makes code
       larger, and may or may not make it run faster.
      
> done with gcc/g++, because it might create larger object code: