Likely unlikely()s
The kernel has two macros to assist the compiler and CPU in doing branch prediction: likely() and unlikely(). As their names imply, they are meant to annotate tests in the code based on the likelihood that they will evaluate to true. But, getting it wrong such that something marked likely is actually unlikely—or vice versa—can impact performance as the CPU may prefetch the wrong instructions.
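For reference, both macros are thin wrappers around GCC's __builtin_expect(). Here is a minimal sketch of how they are defined and used; the copy_data() function is made up purely for illustration and is not kernel code, and the real kernel macros add extra plumbing when branch profiling is enabled:

	/* Essentially how the kernel defines the macros. */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* Hypothetical usage: mark the error path as unlikely so the compiler
	 * lays out the common case as straight-line, fall-through code. */
	int copy_data(char *dst, const char *src, unsigned long len)
	{
		if (unlikely(!dst || !src))
			return -1;		/* rare error path */

		while (len--)			/* common case */
			*dst++ = *src++;
		return 0;
	}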
Steven Rostedt has been looking at the problem using the annotated branch profiler and found ten places "that really do not need to have an annotated branch, or the annotation is simply the opposite of what it should be". So, he created a series of patches that either switched the sense of the annotation or removed the likely()/unlikely() entirely.
As an example, page_mapping() had an unlikely() annotation that Rostedt reported as being "correct a total of 1909540379 times and incorrect 1270533123 times, with a 39% being incorrect". Those numbers come from his main workstation, which runs a variety of standard programs (Firefox, XChat, etc.) as well as participating in his build farm, so it should represent a reasonably "normal" workload. Being wrong 39% of the time was pretty obviously too much and led to the removal of the annotation for that test.
The changes touch various subsystems, including the scheduler, memory management, and VFS. So far there have been no complaints, though there have been several requests to completely remove annotations that had just been changed, in order to let the compiler's and CPU's branch-prediction logic make the decision. That, and breaking the patches up into separate sets for each subsystem, caused Rostedt to respin them. It would seem likely() that we'll see them make their way into 2.6.38.
Posted Dec 16, 2010 5:09 UTC (Thu) by jzbiciak (guest, #5246)
What that's saying is that a 60:40 ratio is too close. So what is an appropriate threshold?
It seems like any notable bias in a particular direction should be reason enough. The main key is whether the bias is stable over a range of reasonable workloads. (Note I say "reasonable". If you have likely(doesn't segfault) and then run a workload full of segfaults, well that's just silly.)
Consider a simplified model: You have two costs, k_correct and k_incorrect. If you predict a branch is taken and the branch is in fact taken, then you get charged k_correct, otherwise you get charged k_incorrect. And this whole exercise is only worthwhile if k_correct < k_incorrect. You also have two probabilities: p_correct is the probability that your prediction matches what the computer did, and (1 - p_correct) is the probability it didn't. In this model, then, overall performance should approach k_correct * p_correct + k_incorrect * (1 - p_correct) if you annotated your branch correctly. In the 60:40 example from the article, p_correct would be 0.6.

In this model, you end up with a few possibilities: when p_correct is well above 0.5 the annotation pays off, when it is below 0.5 it actively hurts, and near 0.5 it is roughly a wash. Now you're left weighing the quality of the compiler against your ability to get the annotation right and your desire to maintain that annotation.

And that's just with this simplified model. In a more complex model, the probability a branch gets taken can vary with other factors, some of which may be discoverable by the CPU's branch prediction logic, for example. Given all that, I can see why you'd want a very clear signal for 'likely'/'unlikely'. But still, how clear?
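To make that model concrete, here is a tiny sketch with made-up cost numbers (1 unit for a correctly predicted branch, 15 for a misprediction; neither figure comes from the article) plugging in the 60:40 case:

	#include <stdio.h>

	int main(void)
	{
		/* Assumed, illustrative costs: not measurements. */
		const double k_correct = 1.0, k_incorrect = 15.0;
		const double p_correct = 0.6;	/* the 60:40 case */

		/* expected cost when the annotation matches the common direction */
		double right = k_correct * p_correct + k_incorrect * (1.0 - p_correct);
		/* expected cost when the annotation points the wrong way */
		double wrong = k_correct * (1.0 - p_correct) + k_incorrect * p_correct;

		printf("annotated right: %.1f  annotated wrong: %.1f\n", right, wrong);
		return 0;
	}

With those assumed costs the correct annotation averages 6.6 units per branch against 9.4 for the flipped one, which gives a feel for how much (or how little) a 60:40 signal is worth.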
Posted Dec 16, 2010 6:51 UTC (Thu) by cras (guest, #7000)
Posted Dec 16, 2010 13:45 UTC (Thu) by dmk (guest, #50141)
Posted Dec 16, 2010 13:58 UTC (Thu) by cras (guest, #7000)
	while (likely(!is_world_ending())) {
		update_world_status();
	}
	save_the_world();

And if the branch prediction was wrong pretty much always, that would also increase the amount of time used to update world status. So even when it finally sees the world ending and jumps immediately to saving the world, perhaps it could have done that in the previous iteration already if the inner loop had been faster.
Or something like that :)
Posted Dec 18, 2010 20:54 UTC (Sat) by giraffedata (guest, #1954)
dmk's basic idea seems valuable, but the example is bad. And cras seems to have misread that example. We need a better example.
The problem with what dmk writes ("we would want to get the decision really fast") is that likely() doesn't affect how fast the test is. It affects how fast what you do as a result of that test is. If the test is world_continues and you want to respond quickly when it does, then the truth is the right way to go: world_continues is likely, and that's how you want to annotate it.
I believe cras assumes we want to respond quickly if the world is ending, because it isn't really ending -- we can save it. That's not what dmk described: he said the world really is ending, so all we can do is say good bye, and it just won't make any difference how slowly we do that.
Posted Dec 27, 2010 10:54 UTC (Mon) by tesarik (subscriber, #52705)
unlikely() can also be used to annotate the slow path (ie. on contention):

	while (unlikely(is_locked(lock)))
		cpu_relax();

Although it's likely that the lock is held many times before we can take it, it's more important to run the critical section as fast as possible when it finally becomes available.
Posted Dec 16, 2010 19:44 UTC (Thu) by vonbrand (subscriber, #4458)
Sounds nonsensical to me... if the annotated code is not in any way performance critical, doing the extra work of annotating it is pure waste.
Posted Dec 16, 2010 20:21 UTC (Thu) by cras (guest, #7000)
Although now that I think about it, maybe many of those checks I used to put unlikely() in my code are nowadays pointless, because they call functions that I've since marked with __attribute__((cold)), which should put the unlikely()s there automatically. (But do they? I remember testing that the cold attribute did nothing in some gcc version.)
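For reference, the GCC documentation does describe this behavior: branches leading to calls of a function marked with the cold attribute are treated as unlikely, which is what the comment above is relying on. A small sketch of how that looks; log_fatal() and handle_packet() are made-up names, not a real API:

	#include <stdio.h>

	/* GCC is told this function is rarely executed; callers' branches
	 * that lead to it should be predicted not-taken without an
	 * explicit unlikely(). */
	static void log_fatal(const char *msg) __attribute__((cold));

	static void log_fatal(const char *msg)
	{
		fprintf(stderr, "fatal: %s\n", msg);
	}

	static int handle_packet(const char *buf, int len)
	{
		if (len < 0) {			/* no unlikely() needed here */
			log_fatal("negative length");
			return -1;
		}
		(void)buf;			/* normal processing would go here */
		return 0;
	}

	int main(void)
	{
		return handle_packet("", 0);
	}

Whether a particular gcc version actually acts on the hint is, as the comment notes, worth verifying.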
Posted Dec 19, 2010 3:39 UTC (Sun) by quotemstr (subscriber, #45331)
You're assuming that p_correct and p_incorrect are constant over time, and they're often not. CPUs these days have large global branch prediction buffers and can detect fairly complex patterns (e.g., branch every third time) automatically. The overall result of this dynamic prediction may be superior to a purely static assertion, even if that assertion is correct on average.

Think of a branch that goes one way during the day and another way at night. If we suppose that activity is twice as high during the day, then we'll see a 2:1 branch bias in favor of the daylight path. But over smaller intervals of time (say, an hour), the branch bias will be almost totally in favor of one path or the other. Unlike static assertions, the CPU's automatic prediction can look at the specific circumstances of the system and make predictions based on recent behavior. Using static branch prediction could actually hurt.
Posted Dec 20, 2010 20:35 UTC (Mon) by jzbiciak (guest, #5246)
I did state my model was simplified, and did allow for the fact that branch prediction could change the cost over time and that that wasn't included in my model.

My understanding is that static prediction mainly involves putting a hint on a branch opcode (on architectures that support it) and/or changing which path is the "fall through" path versus the "branch taken" path, overlaid with a rule such as "predict backward branches as taken; forward branches not-taken". In the presence of dynamic prediction, a branch hint should be just that: a hint. A strong prediction signal from the predictor ought to override a hint. The main purpose of branch hinting is to help the cold-predictor case. This article describes one such scheme. Basically, if your branch isn't in the Branch History Table (BHT), then it falls back to the static prediction. This puts an upper bound on how much damage the static prediction can do, as well as an upper bound on how much it can help.

For static branch prediction to hurt, you'd need to have a large enough working set that the branch in question isn't in the BHT (to use the P4's scheme above), but still hit often enough to make a measurable performance difference. For your bimodal case (day/night shifting patterns), whichever way you schedule your code, you'll penalize one or the other. If you have accurate long-term branch statistics, though, you can still pick the one that occurs less often and penalize that. So, if daytime has twice the load as nighttime, it should show up in the statistics and you can put likely/unlikely on the conditional to favor daytime.

Really, the argument that makes the most sense to me against likely/unlikely in marginal cases is that different users could have completely different workloads with wildly different statistics (multimodal across all users, even though each user might be unimodal), and so there's no good training set, so just let the compiler guess. There's no way to avoid penalizing someone, so let the compiler do it. Only put likely/unlikely where it's clear that it's right across the vast majority of workloads. Otherwise, there's a good chance it could be wrong for more than half the users out there, and it's also not worth the maintenance burden.

So, that leaves the question: What's the criterion for figuring that out? Taken vs. not-taken ratio seems like a shaky heuristic to me.
Posted Dec 21, 2010 13:58 UTC (Tue) by vonbrand (subscriber, #4458)
It has been observed that programmers are terrible at guessing where their programs spend most of their time... and that should be much easier than predicting the frequency of some critical branch going one way or the other.
Best leave it alone, or only annotate when extensive measurements show a high bias (and the compiler gets it wrong).
Posted Dec 27, 2010 11:00 UTC (Mon) by tesarik (subscriber, #52705)
Surely, gcc heuristic decisions change with the version...
I recall that there was once a compiler that implemented the user-specified branch likelihood hint backwards and it was at least a year before anyone noticed, but this may be an apocryphal story. Does anyone remember?