Edit: Moin pointed out a small problem so this post was edited on 5/19
A branch instruction can be a big overhead in pipelined processors. At every branch, the processor has to decide which instruction to fetch after the branch. Instead of waiting for the branch, it tires to predict the branch direction. A correct prediction is a win; but a misprediction requires flushing the whole pipeline wasting all instructions fetched after the branch. The deeper the pipeline, the more instructions need to be flushed. Pipelines are deep in today’s processors hence writing code with branch penalty in mind is a plus for performance as well as power.
Branch prediction is an important concept which all programmers must understand for two reasons: (1) branches are very frequent and can be a major performance overhead if they mispredict in the branch predictor, (2) unlike cache behavior, branch behavior can be impacted from any high level language. My post presents some background on the importance of branches and present two very basic techniques to reduce their overhead.
What kind of branches are hard-to-predict?
Almost all Branch prediction algorithms are history-based so problematic branches are the ones that do not follow a pattern.
The following branch is hard to predict because it depends on the data elements in array A. In the worst case, if A is completely random, the branch will be mispredicted 50% of the time:
for i = 0 to N: if A[i] > 0: # This if will be converted into a branch res = foo() else: res = bar()
How can branches be optimized (eliminated)?
Many branches can be eliminated via a technique called predication. In predication, we change the branch into boolean math. For example, the branch in Figure 1 can be re-written as follows:
for i = 0 to N: p=(A[i] > 0) f = foo() b = bar() res = p & f + !p & b
Yes, we are executing both foo() and bar() methods which introduces its own overhead. This is exactly why predication is not always a good idea unless the two branch directions (foo and bar) are small. See this paper for a good analysis.
Another technique is to promote branches outside a loop using loop splitting. If a branch is executed infrequently then it being mispredicted is a negligible penalty.