Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 25, 2013 10:12 UTC (Mon) by jezuch (subscriber, #52988)
Parent article: Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

> "Because the SPEC CPU benchmarks are drawn from the compute intensive portion of real applications"

Well...

> for (dd=d[k=0]; k<16; dd=d[++k])

That's.... horrifying.



Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 25, 2013 15:43 UTC (Mon) by hthoma (subscriber, #4743) [Link]

> > for (dd=d[k=0]; k<16; dd=d[++k])

> That's.... horrifying.

Exactly. Looks like obfuscated C contest stuff ...

I would guess that if you write it the "normal" way, i.e.

for(k = 0; k < 16; k++) {
dd = d[k];
satd += (dd < 0 ? -dd : dd);
}

the compiler would not turn the code into an infinite loop and would have a better chance of optimizing it.
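To make the difference concrete, here is a minimal sketch. The function names are hypothetical and not taken from the SPEC source. The first function only records which indices the condensed loop header would read, so the sketch itself never touches memory out of bounds; the second is the conventional loop from above.

```cpp
#include <vector>

// Records every index the condensed loop header would read from d.
// Mirrors: for (dd = d[k = 0]; k < n; dd = d[++k])
// No array is accessed here, so this sketch has no undefined behavior.
std::vector<int> indices_read_by_condensed_loop(int n) {
    std::vector<int> reads;
    int k;
    for (reads.push_back(k = 0); k < n; reads.push_back(++k)) {
        // the original body would use dd here
    }
    return reads;
}

// The conventional form: reads d[0] .. d[15] and nothing else.
int satd_plain(const int (&d)[16]) {
    int satd = 0;
    for (int k = 0; k < 16; k++) {
        int dd = d[k];
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}
```

With n = 16 the first function reports a final read at index 16, one past the end of a 16-element array. That out-of-bounds read is the undefined behavior the compiler is allowed to assume never happens.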

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 8:02 UTC (Tue) by jezuch (subscriber, #52988) [Link]

> Exactly. Looks like obfuscated C contest stuff ...

I guess it's a result of a *very* popular misconception that the more you cram into a single statement the faster it is ;)

Seeing how the compiler unwinds all of this stuff is an eye-opening experience. We, humans, have a very limited operating memory; the compiler can analyze much, much larger structures than we imagine it can.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 26, 2013 9:44 UTC (Tue) by khim (subscriber, #9252) [Link]

It may not run faster, but it certainly is shorter and a lot of guys (including me) always try to make code shorter.

It's funny, really: it looks like I've finally found where these clashes come from. Less than two years ago I had no idea and struggled to understand, but now, after a lot of discussions with other guys about an important piece of code in our project, I know that there are two types of programmers: the ones who think about their program in C (C++, C#, Java, JavaScript (uh-oh), PHP (ugh), Python, etc.) and the ones who think about their program in English (Hebrew, Mandarin, Russian, whatever).

For the "C thinkers" the size of the code is very important (the shorter it is, the easier it is to observe large chunks of code at once) and most comments are just a useless distraction (and/or an admission of defeat: what, you mean this piece of code is so convoluted and cryptic that you can't understand it just from the C code... gosh, I think it's time to give up and add a couple of comments). Sure, the high-level interface must be described in human language (C is great for low-level bit manipulation, but for describing the relationship between an HTML document and the DOM tree created from it, it's too low-level), but everything below it must be understandable from the code.

For "English thinkers" comments are a vital piece of information: they expect to fully understand the program from comments alone and perceive the need to actually read C code as something degrading (or as a necessary evil when something does not work). Even if they read C code they usually just compare it to the comment near it (and they become angry when they find no comments to compare the code to). For them the size of the code is less important (because they only ever perceive it in small pieces) and a verbose style is, actually, better (it makes it easier to compare code to comments).

I'm not sure which style is better, but I've found that C thinkers usually produce fast and efficient code which may contain small, localized bugs (the code in the article is a prime example) while English thinkers produce code which is verbose and slow yet still contains a plethora of bugs - but these bugs are distinctly different: instead of off-by-one errors or a simple "++" vs "--" mixup we have cases where one module produces a subtly broken object which is mishandled by another module and then everything blows up in a third one.

It's easy to understand why: there is no "safety net" in C thinkers' code, thus localized bugs are easy to miss, but interfaces are very narrow and well-defined, while English thinkers produce code which is locally correct but globally hopeless because there are so many interactions between different pieces of code. Think Xbox or Wii bootloader code (a few bugs in the initial runs which were eventually ironed out, and now there are no new bugs in sight) vs JVM code (there are endless bugs with no end in sight - and most of them exist because different pieces of code interact "quirkily").

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 13:32 UTC (Wed) by nye (guest, #51576) [Link]

Oh please. Take your macho trolling somewhere else.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 15:16 UTC (Wed) by redden0t8 (guest, #72783) [Link]

Interesting observation, it really made me think.

I'm definitely a C-thinker, but over-shortening code (like in this article's example) really makes me cringe. I really don't understand the drive to make pieces of code as short as possible. I'm more along the lines of "clear and concise", with "clear" being more important than "concise". I guess you could think of it as writing code so as to optimize the time it takes to read and follow, rather than optimizing the line count.

Then again maybe this comes from being a hobbyist programmer and not a professional... maybe I just have a different definition of "clear" lol.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 17:43 UTC (Wed) by dlang (subscriber, #313) [Link]

The balance between clear and concise will vary depending on how familiar you are with the language in question.

The obfuscated C contest shows clear examples where concise is far more important than clear.

But the line itself is rather fuzzy.

Hijacking an example from elsewhere: if you have a bucket filled with water and start punching holes in the bottom, when does it stop being a bucket that leaks and start being a sieve? At some point it will be very clear that you have passed the line, but exactly where the line is is hard to define.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Mar 27, 2013 19:17 UTC (Wed) by HelloWorld (guest, #56129) [Link]

> It may not run faster, but it certainly is shorter
Uh, no it's not. The obfuscated version is 25 tokens, the sensible one is 22 tokens. Sure, if you use conventional formatting, you'll end up with one more line for the normal version, but nobody says you have to do that...

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 1, 2013 14:51 UTC (Mon) by khim (subscriber, #9252) [Link]

Screen real estate is not measured in tokens. Inches, centimeters, maybe pixels, but most definitely not tokens. By this measure it's shorter. Is it worth it? That's debatable and depends very much on the individual, but of course it's a separate issue.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 1, 2013 16:19 UTC (Mon) by hummassa (subscriber, #307) [Link]

isn't the "proper" c++ version

for(auto x: d)
satd += abs(x);

?? (generates the same code, no errors and READABLE...)

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 1, 2013 16:57 UTC (Mon) by hummassa (subscriber, #307) [Link]

Actually, "properest" version would be

auto satd = accumulate(begin(d), end(d), 0, [](int a, int x) { return a+abs(x); });

But I suppose that has the potential to be less efficient, at least it involves some function calls here...
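For anyone who wants to try both forms, a self-contained sketch follows. The wrapper function names are hypothetical, and the 16-element array parameter is an assumption carried over from the loop bound in the original snippet. The lambda passed to std::accumulate is a candidate for inlining, so whether any real function calls survive depends on the compiler and optimization level.

```cpp
#include <cstdlib>
#include <iterator>
#include <numeric>

// Range-based for: the loop bounds come from the array type itself.
int satd_range_for(const int (&d)[16]) {
    int satd = 0;
    for (auto x : d)
        satd += std::abs(x);
    return satd;
}

// std::accumulate with a lambda that folds in the absolute value.
int satd_accumulate(const int (&d)[16]) {
    return std::accumulate(std::begin(d), std::end(d), 0,
                           [](int a, int x) { return a + std::abs(x); });
}
```

Neither version has an index variable to get wrong, which is the point of both.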

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 2, 2013 17:30 UTC (Tue) by hummassa (subscriber, #307) [Link]

I am replying to my own comments in this subthread, and it really feels like I am losing my mind... :-D
Anyway, I tried this with -O4 and both

for(auto x: d) satd += abs(x);

and

auto satd = accumulate(begin(d), end(d), 0, [](int a, int x) { return a+abs(x); });

generated the same code, roughly:

movl	(%rax), %edx
movl	(%rax), %ecx
addq	$4, %rax    
sarl	$31, %edx   
xorl	%edx, %ecx  
subl	%edx, %ecx  
leaq	64(%rsp), %r
addl	%ecx, %ebx  
cmpq	%rax, %rdx  
jne	.L3	#,

which seemed nice to me.
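The sarl/xorl/subl sequence in that listing is a branch-free absolute value. A rough C++ equivalent, as a sketch only: the function name is hypothetical, and it assumes a 32-bit int and an arithmetic right shift of negative values (guaranteed since C++20, implementation-defined but near-universal before that).

```cpp
// Branch-free absolute value mirroring the sarl/xorl/subl instructions.
// Like std::abs, the result is undefined for the most negative int.
int abs_branchless(int x) {
    int mask = x >> 31;        // sarl $31: 0 if x >= 0, -1 (all ones) if x < 0
    return (x ^ mask) - mask;  // xorl, subl: x unchanged, or ~x + 1 == -x
}
```

The addl that follows in the listing is then just satd += abs(x).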

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 2, 2013 19:17 UTC (Tue) by jwakely (guest, #60262) [Link]

You know GCC doesn't have a -O4 optimisation level, right? ;)

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 2, 2013 20:08 UTC (Tue) by nix (subscriber, #2304) [Link]

I've seen people use -O4, -O6, -O64 ("it's a nice round number and higher than 3" he said, so at least he knew what he was aiming for), and of course glibc, of all things, used -O99 for donkey's years. GCC obviously needs an "-Olots" for these people.

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 2, 2013 22:56 UTC (Tue) by jwakely (guest, #60262) [Link]

It should be called -Over9000 though

Regehr: GCC 4.8 Breaks Broken SPEC 2006 Benchmarks

Posted Apr 2, 2013 21:18 UTC (Tue) by hummassa (subscriber, #307) [Link]

It's on my default makefile since forever, because clang does not like -O5 and beyond and I am too lazy to look up which is the biggest effective level for each compiler...


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds