Likely unlikely()s
The kernel has two macros to assist the compiler and CPU in doing branch prediction: likely() and unlikely(). As their names imply, they are meant to annotate tests in the code based on the likelihood that they will evaluate to true. But, getting it wrong such that something marked likely is actually unlikely—or vice versa—can impact performance as the CPU may prefetch the wrong instructions.
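For reference, both macros are thin wrappers around GCC's __builtin_expect(). Here is a minimal sketch of how they are defined and used; the copy_data() function is made up purely for illustration and is not kernel code, and the real kernel macros add extra plumbing when branch profiling is enabled:

	/* Essentially how the kernel defines the macros. */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* Hypothetical usage: mark the error path as unlikely so the compiler
	 * lays out the common case as straight-line, fall-through code. */
	int copy_data(char *dst, const char *src, unsigned long len)
	{
		if (unlikely(!dst || !src))
			return -1;		/* rare error path */

		while (len--)			/* common case */
			*dst++ = *src++;
		return 0;
	}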
Steven Rostedt has been looking at the problem using the annotated branch profiler and found ten places "that really do not need to have an annotated branch, or the annotation is simply the opposite of what it should be". So, he created a series of patches that either switched the sense of the annotation or removed the likely()/unlikely() entirely.
As an example, page_mapping() had an unlikely() annotation that Rostedt reported as being "correct a total of 1909540379 times and incorrect 1270533123 times, with a 39% being incorrect". Those numbers come from his main workstation, which runs a variety of standard programs (Firefox, XChat, etc.) as well as participating in his build farm, so it should represent a reasonably "normal" workload. Being wrong 39% of the time was pretty obviously too much and led to the removal of the annotation for that test.
The changes touch various subsystems, including the scheduler, memory management, and VFS. So far there have been no complaints, though there have been several requests to completely remove annotations that had just been changed, in order to let the compiler's and CPU's branch-prediction logic make the decision. That, and breaking the patches up into separate sets for each subsystem, caused Rostedt to respin them. It would seem likely() that we'll see them make their way into 2.6.38.
Posted Dec 16, 2010 5:09 UTC (Thu) by jzbiciak (guest, #5246)
What that's saying is that a 60:40 ratio is too close. So what is an appropriate threshold?
It seems like any notable bias in a particular direction should be reason enough. The main key is whether the bias is stable over a range of reasonable workloads. (Note I say "reasonable". If you have likely(doesn't segfault) and then run a workload full of segfaults, well that's just silly.)
Consider a simplified model: You have two costs, k_correct and k_incorrect. If you predict a branch is taken and the branch is in fact taken, then you get charged k_correct, otherwise you get charged k_incorrect. And this whole exercise is only worthwhile if k_correct < k_incorrect. You also have two probabilities: p_correct is the probability that your prediction matches what the computer did, and (1 - p_correct) is the probability it didn't. In this model, then, overall performance should approach k_correct * p_correct + k_incorrect * (1 - p_correct) if you annotated your branch correctly. In the 60:40 example from the article, p_correct would be 0.6.

In this model, you end up with a few possibilities: when p_correct is well above 0.5 the annotation pays off, when it is below 0.5 it actively hurts, and near 0.5 it is roughly a wash. Now you're left weighing the quality of the compiler against your ability to get the annotation right and your desire to maintain that annotation.

And that's just with this simplified model. In a more complex model, the probability a branch gets taken can vary with other factors, some of which may be discoverable by the CPU's branch prediction logic, for example. Given all that, I can see why you'd want a very clear signal for 'likely'/'unlikely'. But still, how clear?
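To make that model concrete, here is a tiny sketch with made-up cost numbers (1 unit for a correctly predicted branch, 15 for a misprediction; neither figure comes from the article) plugging in the 60:40 case:

	#include <stdio.h>

	int main(void)
	{
		/* Assumed, illustrative costs: not measurements. */
		const double k_correct = 1.0, k_incorrect = 15.0;
		const double p_correct = 0.6;	/* the 60:40 case */

		/* expected cost when the annotation matches the common direction */
		double right = k_correct * p_correct + k_incorrect * (1.0 - p_correct);
		/* expected cost when the annotation points the wrong way */
		double wrong = k_correct * (1.0 - p_correct) + k_incorrect * p_correct;

		printf("annotated right: %.1f  annotated wrong: %.1f\n", right, wrong);
		return 0;
	}

With those assumed costs the correct annotation averages 6.6 units per branch against 9.4 for the flipped one, which gives a feel for how much (or how little) a 60:40 signal is worth.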
Posted Dec 16, 2010 6:51 UTC (Thu) by cras (guest, #7000)
Posted Dec 16, 2010 13:45 UTC (Thu) by dmk (guest, #50141)
Posted Dec 16, 2010 13:58 UTC (Thu) by cras (guest, #7000)
	while (likely(!is_world_ending())) {
		update_world_status();
	}
	save_the_world();

And if the branch prediction was wrong pretty much always, that would also increase the amount of time used to update world status. So even when it finally sees the world ending and jumps immediately to saving the world, perhaps it could have done that in the previous iteration already if the inner loop had been faster.
Or something like that :)
Posted Dec 18, 2010 20:54 UTC (Sat) by giraffedata (guest, #1954)
dmk's basic idea seems valuable, but the example is bad. And cras seems to have misread that example. We need a better example.
The problem with what dmk writes ("we would want to get the decision really fast") is that likely() doesn't affect how fast the test is. It affects how fast what you do as a result of that test is. If the test is world_continues and you want to respond quickly when it does, then the truth is the right way to go: world_continues is likely, and that's how you want to annotate it.
I believe cras assumes we want to respond quickly if the world is ending, because it isn't really ending -- we can save it. That's not what dmk described: he said the world really is ending, so all we can do is say good bye, and it just won't make any difference how slowly we do that.
Posted Dec 27, 2010 10:54 UTC (Mon) by tesarik (subscriber, #52705)
unlikely() can also be used to annotate the slow path (ie. on contention):

	while (unlikely(is_locked(lock)))
		cpu_relax();

Although it's likely that the lock is held many times before we can take it, it's more important to run the critical section as fast as possible when it finally becomes available.
Posted Dec 16, 2010 19:44 UTC (Thu) by vonbrand (subscriber, #4458)
Sounds nonsensical to me... if the annotated code is not in any way performance critical, doing the extra work of annotating it is pure waste.
Posted Dec 16, 2010 20:21 UTC (Thu) by cras (guest, #7000)
Although now that I think about it, maybe many of those checks I used to put unlikely() in my code are nowadays pointless, because they call functions that I've since marked with __attribute__((cold)), which should put the unlikely()s there automatically. (But do they? I remember testing that the cold attribute did nothing in some gcc version.)
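For reference, the GCC documentation does describe this behavior: branches leading to calls of a function marked with the cold attribute are treated as unlikely, which is what the comment above is relying on. A small sketch of how that looks; log_fatal() and handle_packet() are made-up names, not a real API:

	#include <stdio.h>

	/* GCC is told this function is rarely executed; callers' branches
	 * that lead to it should be predicted not-taken without an
	 * explicit unlikely(). */
	static void log_fatal(const char *msg) __attribute__((cold));

	static void log_fatal(const char *msg)
	{
		fprintf(stderr, "fatal: %s\n", msg);
	}

	static int handle_packet(const char *buf, int len)
	{
		if (len < 0) {			/* no unlikely() needed here */
			log_fatal("negative length");
			return -1;
		}
		(void)buf;			/* normal processing would go here */
		return 0;
	}

	int main(void)
	{
		return handle_packet("", 0);
	}

Whether a particular gcc version actually acts on the hint is, as the comment notes, worth verifying.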
Posted Dec 19, 2010 3:39 UTC (Sun) by quotemstr (subscriber, #45331)
You're assuming that p_correct and p_incorrect are constant over time, and they're often not. CPUs these days have large global branch prediction buffers and can detect fairly complex patterns (e.g., branch every third time) automatically. The overall result of this dynamic prediction may be superior to a purely static assertion, even if that assertion is correct on average.

Think of a branch that goes one way during the day and another way at night. If we suppose that activity is twice as high during the day, then we'll see a 2:1 branch bias in favor of the daylight path. But over smaller intervals of time (say, an hour), the branch bias will be almost totally in favor of one path or the other. Unlike static assertions, the CPU's automatic prediction can look at the specific circumstances of the system and make predictions based on recent behavior. Using static branch prediction could actually hurt.
Posted Dec 20, 2010 20:35 UTC (Mon) by jzbiciak (guest, #5246)
I did state my model was simplified, and did allow for the fact that branch prediction could change the cost over time and that that wasn't included in my model.

My understanding is that static prediction mainly involves putting a hint on a branch opcode (on architectures that support it) and/or changing which path is the "fall through" path versus the "branch taken" path, overlaid with a rule such as "predict backward branches as taken; forward branches not-taken". In the presence of dynamic prediction, a branch hint should be just that: a hint. A strong prediction signal from the predictor ought to override a hint. The main purpose of branch hinting is to help the cold-predictor case. This article describes one such scheme. Basically, if your branch isn't in the Branch History Table (BHT), then it falls back to the static prediction. This puts an upper bound on how much damage the static prediction can do, as well as an upper bound on how much it can help.

For static branch prediction to hurt, you'd need to have a large enough working set that the branch in question isn't in the BHT (to use the P4's scheme above), but still hit often enough to make a measurable performance difference. For your bimodal case (day/night shifting patterns), whichever way you schedule your code, you'll penalize one or the other. If you have accurate long-term branch statistics, though, you can still pick the one that occurs less often and penalize that. So, if daytime has twice the load as nighttime, it should show up in the statistics and you can put likely/unlikely on the conditional to favor daytime.

Really, the argument that makes the most sense to me against likely/unlikely in marginal cases is that different users could have completely different workloads with wildly different statistics (multimodal across all users, even though each user might be unimodal), and so there's no good training set, so just let the compiler guess. There's no way to avoid penalizing someone, so let the compiler do it. Only put likely/unlikely where it's clear that it's right across the vast majority of workloads. Otherwise, there's a good chance it could be wrong for more than half the users out there, and it's also not worth the maintenance burden.

So, that leaves the question: What's the criterion for figuring that out? Taken vs. not-taken ratio seems like a shaky heuristic to me.
Posted Dec 21, 2010 13:58 UTC (Tue) by vonbrand (subscriber, #4458)
It has been observed that programmers are terrible at guessing where their programs spend most of their time... and that should be much easier than predicting the frequency of some critical branch going one way or the other.
Best leave it alone, or only annotate when extensive measurements show a high bias (and the compiler gets it wrong).
Posted Dec 27, 2010 11:00 UTC (Mon) by tesarik (subscriber, #52705)
Surely, gcc heuristic decisions change with the version...
I recall that there was once a compiler that implemented the user-specified branch likelihood hint backwards and it was at least a year before anyone noticed, but this may be an apocryphal story. Does anyone remember?