
The problem with branch prediction

The problem with branch prediction

Posted May 24, 2011 21:43 UTC (Tue) by davecb (subscriber, #1574)
Parent article: The problem with prefetch

I saw a similar issue with prefetch and branch prediction back when I was doing a lot of SPARC work.

Branch prediction gave us a bit of extra performance with a few code bases, but the older and better the code, the less we saw. My favorite example is Samba, so a Smarter Colleague[tm] and I looked at what was actually happening. Turns out most branches were around either short runs of legitimately conditional code or debug macros. In those cases it didn't matter if we set the prediction to correctly predict we'd branch around. The branch was very often far enough we hit a different i-cache line. Since we didn't have a way of hinting what line we'd hit, we'd slow down whenever it wasn't trivial-to-predict straight-line code.
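To make the kind of branch concrete, here is a minimal sketch in C. It is not Samba's actual code and not what we ran on SPARC. It assumes GCC or Clang, where __builtin_expect is the usual way to pass a static prediction hint from C; the DEBUG macro, debug_level and checksum names are hypothetical, made up for illustration.

```c
#include <stdio.h>

/* GCC/Clang builtin: tells the compiler which way a branch usually goes. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical debug level; 0 means no debug output at all. */
static int debug_level = 0;

/* Hypothetical debug macro: the body is almost never executed, so every
 * call site is a branch around a short run of code. */
#define DEBUG(level, ...)                          \
    do {                                           \
        if (unlikely(debug_level >= (level)))      \
            fprintf(stderr, __VA_ARGS__);          \
    } while (0)

/* Sums a buffer, with a debug check on every element. */
long checksum(const unsigned char *buf, size_t len)
{
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        DEBUG(3, "byte %zu = %u\n", i, (unsigned)buf[i]);
        sum += buf[i];
    }
    return sum;
}
```

A compiler may use the hint to move the cold fprintf call out of line so the common path stays straight-line; whether it does is up to the compiler and the optimization level.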

The better and older the code, the less we would get the next i-cache line sitting waiting for us, and the slower we'd run. Grungy straight-line FORTRAN benefited fine.

I don't recollect ever seeing an actual slowdown, but we rarely could see the predicted benefits from branch prediction.

I'd venture as much as a five cent bet we'll see the same with the Intel architecture.

--dave



I think it's about some particular case of branch prediction...

Posted May 25, 2011 6:10 UTC (Wed) by khim (subscriber, #9252)

To fill the pipeline on a contemporary CPU you need 30-50 instructions in flight. Without branch prediction that's just impossible. If you disabled branch prediction on a contemporary CPU the slowdown would be crippling. Sadly only Intel engineers can give you numbers (because there is no way to disable it on a retail CPU), and they are not telling.

I think it's about some particular case of branch prediction...

Posted May 25, 2011 10:58 UTC (Wed) by mingo (subscriber, #31122)

It's relatively easy to measure the cost of branch misses in certain cases: use 'perf stat --repeat N' (the branch miss rate is measured by default) on a testcase that uses a pseudo-RNG, so that the same workload can be run with random and with non-random input, and compare the two.

And yes, missing branches is crippling to performance: a 3% branch miss rate can cause a 5% total execution slowdown and a 20% miss rate can already double the runtime of a workload. (!)

The problem with outguessing the CPU

Posted May 25, 2011 7:16 UTC (Wed) by alex (subscriber, #1355)

I have seen prefetch help on some architectures. When doing DBT (dynamic binary translation) stuff we would often look for places where we could arrange the code to make it as efficient as possible. In the case of Itanium, prefetch was a definite win, as the architecture was structured to leave the hard stuff to the compiler. On x86 our experiments with instruction re-ordering and prefetch generally didn't yield much at all. The main difference is that the x86 expends an awful lot of silicon on logic that attempts to predict all this behaviour for you. It's pretty good at its job as well, given how hard it was for us to squeeze extra out despite having a much better view of how the code was running than a compiler usually has.

The problem with branch prediction

Posted May 29, 2011 21:50 UTC (Sun) by giraffedata (subscriber, #1954)

"Turns out most branches were around either short runs of legitimately conditional code or debug macros. In those cases it didn't matter if we set the prediction to correctly predict we'd branch around."

Why not? I can see there might not be any prefetching advantage because you're branching to something that is already in cache, but you can still get a lot of the following instructions executed while still working on a prior one.

"The branch was very often far enough we hit a different i-cache line. Since we didn't have a way of hinting what line we'd hit, ..."

The line you'd hit is completely determined by the target in the branch instruction, isn't it?
