1
COMP 740: Computer Architecture and Implementation
  • Montek Singh
  • Thu, Feb 19, 2009
  • Topic: Instruction-Level Parallelism III
  • (Dynamic Branch Prediction)

2
Why Do We Need Branch Prediction?
  • Basic blocks are short, and we have done about
    all we can do for them with dynamic scheduling
  • control dependences now become the bottleneck
  • Since branches disrupt sequential flow of instrs
  • we need to be able to predict branch behavior to
    avoid stalling the pipeline
  • What we must predict
  • Branch outcome (Is the branch taken?)
  • Branch Target Address (What is the next
    non-sequential PC value?)

3
A General Model of Branch Prediction
Branch predictor accuracy and branch penalties:
  • T: probability of the branch being taken
  • p: fraction of branches that are predicted to be taken
  • A: accuracy of prediction
  • j, k, m, n: delays (penalties) associated with the four
    prediction/outcome events (n is usually 0)

Branch penalty of a particular prediction method:
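A penalty expression consistent with the definitions above (and with the best-case value jT derived on the next slide) weights the four prediction/outcome events by their delays:

    \text{penalty} \;=\; pA\,j \;+\; p(1-A)\,k \;+\; (1-p)(1-A)\,m \;+\; (1-p)A\,n

where the four terms correspond, in order, to (predict taken, taken), (predict taken, not taken), (predict not taken, taken), and (predict not taken, not taken).
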
4
Theoretical Limits of Branch Prediction
  • Best case: branches are perfectly predicted (A = 1)
  • also assume that n = 0
  • minimum branch penalty = jT
  • Let s be the pipeline stage where BTA becomes known
  • Then j = s - 1
  • See static prediction methods in Lecture 7
  • Thus, performance of any branch prediction
    strategy is limited by
  • s, the location of the pipeline stage that
    develops BTA
  • A, the accuracy of the prediction
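As a quick check against the penalty expression reconstructed above: with perfect prediction the fraction of branches predicted taken equals T, so

    A = 1,\ n = 0,\ p = T \;\Rightarrow\; \text{penalty} = pA\,j = jT = (s-1)\,T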

5
Review Static Branch Prediction Methods
  • Several static prediction strategies
  • Predict all branches as NOT TAKEN
  • Predict all branches as TAKEN
  • Predict all branches with certain opcodes as
    TAKEN, and all others as NOT TAKEN
  • Predict all forward branches as NOT TAKEN, and
    all backward branches as TAKEN
  • Opcodes have default predictions, which the
    compiler may reverse by setting a bit in the
    instruction
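As one illustration, the forward/backward heuristic reduces to a sign test on the branch displacement. A minimal sketch (function name is illustrative):

    /* Forward-not-taken / backward-taken heuristic (illustrative sketch).
       Backward branches (negative displacement) are usually loop-closing
       branches, so they are predicted TAKEN. */
    int static_predict_taken(int displacement)
    {
        return displacement < 0;
    }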

6
Dynamic Branch Prediction
  • Premise: the history of a branch instr's outcomes
    matters!
  • whether a branch will be taken depends greatly on
    the way previous dynamic instances of the same
    branch were decided
  • Dynamic prediction methods
  • take advantage of this fact by making their
    predictions dependent on the past behavior of the
    same branch instr
  • such methods are called Branch History Table
    (BHT) methods

7
BHT Methods for Branch Prediction
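A BHT is commonly organized as a small table of per-branch predictor state indexed by low-order bits of the branch instruction address. A minimal sketch of the indexing, with illustrative sizes and names (the per-entry predictor state is the subject of the next two slides):

    /* Branch History Table indexing (illustrative sketch).
       Only low-order PC bits are used, so distinct branches can alias. */
    #include <stdint.h>

    #define BHT_ENTRIES 4096                  /* must be a power of two */

    static uint8_t bht_state[BHT_ENTRIES];    /* per-entry predictor state */

    static unsigned bht_index(uint32_t pc)
    {
        return (pc >> 2) & (BHT_ENTRIES - 1); /* drop byte offset, keep low bits */
    }

    int bht_lookup(uint32_t pc)
    {
        return bht_state[bht_index(pc)];      /* predictor state for this branch */
    }
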
8
A One-Bit Predictor
State 0: Predict Not Taken
State 1: Predict Taken
  • Predictor misses twice on typical loop branches
  • Once at the end of loop
  • Once at the end of the 1st iteration of next
    execution of loop
  • The outcome sequence NT-T-NT-T makes it miss all
    the time
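A sketch of the one-bit scheme described above (the state is simply the last outcome, used directly as the next prediction):

    /* One-bit predictor (illustrative sketch):
       state 0 = predict NOT TAKEN, state 1 = predict TAKEN. */
    typedef struct { unsigned char state; } one_bit_pred;

    int  ob_predict(const one_bit_pred *p)      { return p->state; }
    void ob_update (one_bit_pred *p, int taken) { p->state = taken ? 1 : 0; }

On a loop that is re-entered repeatedly, the final not-taken outcome flips the state, which is what causes the second miss on the first iteration of the next execution.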

9
A Two-Bit Predictor
  • A four-state Moore machine
  • Predictor misses once on typical loop branches
  • hence popular
  • Outcome sequence NT-NT-T-T-NT-NT-T-T makes it miss
    all the time
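The exact four-state machine is in the slide figure; the sketch below assumes the common saturating-counter variant, which likewise misses only once on a typical loop branch:

    /* Two-bit saturating counter (one common four-state variant; the Moore
       machine in the original figure may differ in detail).
       States 0,1 predict NOT TAKEN; states 2,3 predict TAKEN. */
    typedef struct { unsigned char ctr; } two_bit_pred;   /* 0..3 */

    int  tb_predict(const two_bit_pred *p) { return p->ctr >= 2; }

    void tb_update(two_bit_pred *p, int taken)
    {
        if (taken  && p->ctr < 3) p->ctr++;
        if (!taken && p->ctr > 0) p->ctr--;
    }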

11
Correlating Branch Outcome Predictors
  • The history-based branch predictors seen so far
    base their predictions on the past history of the
    branch that is being predicted
  • A completely different idea
  • The outcome of a branch may well be predicted
    successfully based on the outcome of the last k
    branches executed
  • i.e., the path leading to the branch being
    predicted
  • Much-quoted example from SPEC92 benchmark eqntott

if (aa == 2)       /* b1 */
    aa = 0;
if (bb == 2)       /* b2 */
    bb = 0;
if (aa != bb)      /* b3 */
    ...

TAKEN(b1) && TAKEN(b2) implies NOT-TAKEN(b3)
12
Another Example of Branch Correlation
if (d == 0)        // b1
    d = 1;
if (d == 1)        // b2
    ...
  • Assume multiple runs of the code fragment
  • d alternates between 2 and 0
  • How would a 1-bit predictor initialized to state 0
    behave?

     BNEZ  R1, L1       ; b1: branch if d != 0  (d assumed in R1)
     ADDI  R1, R0, 1    ; d = 1
L1:  SUBI  R3, R1, 1    ; R3 = d - 1
     BNEZ  R3, L2       ; b2: branch if d != 1
L2:
13
A Correlating Branch Predictor
  • Think of having a pair of 1-bit predictors p0,
    p1 for each branch, where we choose between
    predictors (and update them) based on outcome of
    most recent branch (i.e., B1 for B2, and B2 for
    B1)
  • if most recent br was not taken, use and update
    (if needed) predictor p0
  • If most recent br was taken, use and update (if
    needed) predictor p1
  • How would such (1,1) correlating predictors
    behave if initialized to 0,0? (see the simulation
    sketch below)
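A small simulation makes the difference concrete. This is a sketch under the stated assumptions (d alternates between 2 and 0, b1 is taken when d != 0, b2 is taken when d != 1, all predictor state starts at 0 = NOT TAKEN); it should show the per-branch 1-bit predictor missing on every branch, while the (1,1) correlating scheme misses only on the first occurrence of each branch:

    /* Compare a per-branch 1-bit predictor with a (1,1) correlating predictor
       on the alternating-d example (illustrative sketch). */
    #include <stdio.h>

    int main(void)
    {
        int one_bit[2] = {0, 0};            /* 1-bit predictor per branch       */
        int corr[2][2] = {{0,0},{0,0}};     /* corr[branch][outcome of last br] */
        int last = 0;                       /* outcome of most recent branch    */
        int miss_1bit = 0, miss_corr = 0;

        for (int run = 0; run < 10; run++) {
            int d = (run % 2 == 0) ? 2 : 0; /* d alternates 2, 0, 2, 0, ...     */
            int outcome[2];

            outcome[0] = (d != 0);          /* b1 taken?                        */
            if (d == 0) d = 1;
            outcome[1] = (d != 1);          /* b2 taken?                        */

            for (int b = 0; b < 2; b++) {
                if (one_bit[b] != outcome[b]) miss_1bit++;
                one_bit[b] = outcome[b];

                if (corr[b][last] != outcome[b]) miss_corr++;
                corr[b][last] = outcome[b];
                last = outcome[b];
            }
        }
        printf("1-bit misses: %d, (1,1) correlating misses: %d\n",
               miss_1bit, miss_corr);
        return 0;
    }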

14
Organization of (m,n) Correlating Predictor
  • Using the results of the last m branches
  • 2^m possible outcomes
  • can be kept in an m-bit shift register
  • n-bit self-history predictor
  • BHT addressed using
  • m bits of global history
  • select column (particular predictor)
  • some lower bits of branch address
  • select row (particular branch instr)
  • entry holds n previous outcomes
  • Aliasing can occur since BHT uses only portion of
    branch instr address
  • state in various predictors in single row may
    correspond to different branches at different
    points of time
  • m = 0 is an ordinary BHT
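Putting the pieces together, a minimal sketch of (m,n) lookup and update, assuming m = 2 bits of global history and n = 2-bit saturating counters (all sizes and names are illustrative):

    #include <stdint.h>

    #define ROWS 1024                       /* rows selected by low PC bits        */
    #define M    2                          /* bits of global history              */
    #define COLS (1 << M)                   /* 2^m columns (one predictor each)    */

    static uint8_t  table[ROWS][COLS];      /* n-bit (here 2-bit) predictors       */
    static unsigned ghist;                  /* m-bit global history shift register */

    int mn_predict(uint32_t pc)
    {
        unsigned row = (pc >> 2) & (ROWS - 1);   /* low PC bits: aliasing possible */
        unsigned col = ghist & (COLS - 1);       /* outcomes of last m branches    */
        return table[row][col] >= 2;
    }

    void mn_update(uint32_t pc, int taken)
    {
        unsigned row = (pc >> 2) & (ROWS - 1);
        unsigned col = ghist & (COLS - 1);
        if (taken  && table[row][col] < 3) table[row][col]++;
        if (!taken && table[row][col] > 0) table[row][col]--;
        ghist = ((ghist << 1) | (taken ? 1u : 0u)) & (COLS - 1);
    }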

15
Improved Dynamic Branch Prediction
  • Recall that, even with perfect accuracy of
    prediction, branch penalty of a prediction method
    is (s-1)T
  • s is the pipeline stage where BTA is developed
  • T is the frequency of taken branches
  • Further improvements can be obtained only by
    using a cache storing BTAs, and accessing it
    simultaneously with the I-cache
  • Such a cache is called a Branch Target Buffer
    (BTB)
  • BHT and BTB can be used together
  • Coupled: one table holds all the information
  • Uncoupled: two independent tables

16
Using BTB and BHT Together
  • Uncoupled solution
  • BTB stores only the BTAs of taken branches
    recently executed
  • No separate branch outcome prediction (the
    presence of an entry in BTB can be used as an
    implicit prediction of the branch being TAKEN
    next time)
  • Use the BHT in case of a BTB miss
  • Coupled solution
  • Stores BTAs of all branches recently executed
  • Has separate branch outcome prediction for each
    table entry
  • Use BHT in case of BTB hit
  • Predict NOT TAKEN otherwise
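A sketch of the uncoupled fetch-stage decision described above (structures and sizes are illustrative; bht_predict_taken is assumed to be a BHT lookup like the one sketched earlier):

    #include <stdbool.h>
    #include <stdint.h>

    #define BTB_SIZE 512

    typedef struct {
        bool     valid;
        uint32_t tag;                        /* branch instruction address */
        uint32_t bta;                        /* branch target address      */
    } btb_entry;

    static btb_entry btb[BTB_SIZE];          /* holds taken branches only  */

    extern bool bht_predict_taken(uint32_t pc);   /* assumed BHT lookup    */

    typedef struct {
        bool     predict_taken;
        bool     bta_known;                  /* false: target not yet available */
        uint32_t bta;
    } fetch_prediction;

    fetch_prediction predict_next(uint32_t pc)
    {
        fetch_prediction p = { false, false, 0 };
        btb_entry *e = &btb[(pc >> 2) & (BTB_SIZE - 1)];

        if (e->valid && e->tag == pc) {      /* BTB hit: implicit TAKEN + target  */
            p.predict_taken = true;
            p.bta_known     = true;
            p.bta           = e->bta;
        } else if (bht_predict_taken(pc)) {  /* BTB miss: fall back on the BHT;   */
            p.predict_taken = true;          /* the target must still be computed */
        }
        return p;
    }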

17
Parameters of Real Machines
18
Coupled BTB and BHT
19
Decoupled BTB and BHT
20
Reducing Misprediction Penalties
  • Need to recover whenever branch prediction is not
    correct
  • Discard all speculatively executed instructions
  • Resume execution along alternative path (this is
    the costly step)
  • Scenarios where recovery is needed
  • Predict taken, branch is taken, BTA wrong (case
    7)
  • Predict taken, branch is not taken (cases 4 and
    6)
  • Predict not taken, branch is taken (case 3)
  • Preparing for recovery involves working on the
    alternative path
  • On instruction level
  • Two fetch address registers per speculated branch
    (PPC 603 640)
  • Two instruction buffers (IBM 360/91, SuperSPARC,
    Pentium)
  • On I-cache level
  • For PT, also do next-line prefetching
  • For PNT, also do target-line prefetching

21
Predicting Dynamic BTAs
  • Vast majority of dynamic BTAs come from procedure
    returns (85% for SPEC95)
  • Since procedure call-return for the most part
    follows a stack discipline, a specialized return
    address buffer operated as a stack is appropriate
    for high prediction accuracy
  • Pushes return address on call
  • Pops return address on return
  • Depth of RAS should be as large as maximum call
    depth to avoid mispredictions
  • 8-16 elements generally sufficient
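A sketch of such a return-address stack (depth and names are illustrative):

    #include <stdint.h>

    #define RAS_DEPTH 16

    static uint32_t ras[RAS_DEPTH];
    static int      ras_top;                     /* number of valid entries */

    void ras_push_on_call(uint32_t return_addr)
    {
        if (ras_top < RAS_DEPTH)
            ras[ras_top++] = return_addr;
        /* calls deeper than RAS_DEPTH: later returns will be mispredicted */
    }

    uint32_t ras_pop_on_return(void)
    {
        return (ras_top > 0) ? ras[--ras_top] : 0;   /* 0 = no prediction */
    }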