May 182011
 

Edit: Moin pointed out a small problem so this post was edited on 5/19

A branch instruction can be a big overhead in pipelined processors. At every branch, the processor has to decide which instruction to fetch after the branch. Instead of waiting for the branch, it tires to predict the branch direction. A correct prediction is a win; but a misprediction requires flushing the whole pipeline wasting all instructions fetched after the branch. The deeper the pipeline, the more instructions need to be flushed. Pipelines are deep in today’s processors hence writing code with branch penalty in mind is a plus for performance as well as power.

Branch prediction is an important concept which all programmers must understand for two reasons: (1) branches are very frequent and can be a major performance overhead if they mispredict in the branch predictor, (2) unlike cache behavior, branch behavior can be impacted from any high level language. My post presents some background on the importance of branches and present two very basic techniques to reduce their overhead.

What kind of branches are hard-to-predict?

Almost all Branch prediction algorithms are history-based so problematic branches are the ones that do not follow a pattern.

The following branch is hard to predict because it depends on the data elements in array A. In the worst case, if A is completely random, the branch will be mispredicted 50% of the time:

    for i = 0 to N:
        if A[i] > 0:     # This if will be converted into a branch
            res = foo()
        else:
            res = bar()

 

How can branches be optimized (eliminated)?

Many branches can be eliminated via a technique called predication. In predication, we change the branch into boolean math. For example, the branch in Figure 1 can be re-written as follows:

    for i = 0 to N:
        p=(A[i] > 0)
        f = foo()
        b = bar()
        res = p & f + !p & b

Yes, we are executing both foo() and bar() methods which introduces its own overhead. This is exactly why predication is not always a good idea unless the two branch directions (foo and bar) are small. See this paper for a good analysis.

Another technique is to promote branches outside a loop using loop splitting. If a branch is executed infrequently then it being mispredicted is a negligible penalty.

  4 Responses to “Basic techniques to help branch prediction”

  1. if A is completely random, the branch will be mispredicted almost all the time: ==> it will be mispredicted half the time

    The following branch is hard-to-predict because it is clearly oscillating (most predictors today are unable to predict an oscillating pattern): ==> A simple 2-level predictor will catch it. In fact one bit of local history is sufficient to get almost 100% accuracy

    • Good catch! I agree with the random comment. It will be mispredicted only half the time.

      However, I disagree with your second correction that the EVEN/ODD branch can be predicted with 100% accuracy with a 2-bit counter. An oscillating branch behaves as follows:


      T N T N T
      T= taken, N = not-taken

      Say the 2-bit counter was in 01 (weakly not taken) state at the start. The transitions will be as follows:

      Actual outcome:                  T    N    T    N    T
      State after update          01   10   01  10   01   10 
      Prediction                  N    T     N   T    N     T
      

      Thus its always mispredicted. Even if the state was strongly taken (11) or strongly not-taken (00) at the start, the accuracy will only be 50%.

      • I mentioned two level predictor with History Length of 1 bit and not a bimodal predictor.

        Consider a two level predictor with history of 1 bit and two counters PHT[0] and PHT[1] . After a few iterations PHT[0] will be set to 11 and PHT[1] will be set to 00. When this happens H=0 will predict an outcome Taken, and H=1 will predict NotTaken.

        The two-level predictor with longer history length works too, after the PHT warmup.

        • Ah! I wasnt not thinking 2-level predictors — an artifact of the branch prediction studies I have been doing lately. The post needs an edit, I will fix it. Thanks for pointing it out.

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>