Like other prediction mechanisms, branch predictors are better at predicting strongly biased branch outcomes.
This rule is well understood and widely used; for example, the Intel Itanium compiler assumes that prediction accuracy = MAX(percentage_taken, percentage_not_taken) when performing its profile-guided optimizations. Thus, to improve branch prediction, we want more strongly biased branches and fewer oscillating ones. This post shows a simple trick to do so.
Why are biased branches predicted better?
Predictors typically track each branch with a 2-bit saturating counter. Let's suppose we have the resources to assign every branch its own 2-bit counter (a PAp predictor). Suppose a branch is taken continuously, so the counter saturates at the strongly-taken state (bits:11). The prediction accuracy will be 100%. However, each time the branch is not-taken there will be a misprediction, bumping the counter down to bits:10; the counter bumps back up to 11 the next time the branch is taken. Thus, each not-taken instance of the branch leads to one misprediction, and prediction accuracy will roughly equal the percentage of times the branch is taken.
Since the branch outcomes are randomly distributed, two consecutive instances can both be not-taken. This bumps the counter down to the weakly-not-taken state (bits:01) and leads to three mispredictions: two for the not-taken instances, and one more because the next taken instance will also mispredict. Simple probability shows why bias matters here: a branch that is only 70% taken goes not-taken twice in a row with probability 0.30² = 9%, while a branch that is 95% taken does so with probability only 0.05² = 0.25%.
Thus, a general rule is that the less biased a branch is, the lower its prediction accuracy because of the 2-bit counter behavior.
Please note that the above analysis applies equally if the branch is biased not-taken.
What can software do?
Let me explain with an example. Let's suppose that X is a random variable uniformly distributed between 0 and 99, and I want to run the following code:
if (X >= 5 && X < 95) //branch is taken 90% of the time
However, if I write the code as:
if (X >= 5) //branch is taken 95% of the time
if (X < 95) //branch is taken 95% of the time
then the branch predictor has a better shot at predicting both branches accurately, which may lead to better performance: the counters assigned to these branches are more likely to stay saturated at taken, because two consecutive not-taken outcomes are less likely.
In general, any time you AND/OR conditions in an if-statement, consider whether the combined branch will be more or less biased than the individual conditions, and pick the more biased version.
Play with a 2-bit counter
I have created this small animation so that you can play with a 2-bit counter.