CSE 420/598 Computer Architecture Lec 10 Chapter 2 - DynPred-BTB

1
CSE 420/598 Computer Architecture Lec 10
Chapter 2 - DynPred-BTB
  • Sandeep K. S. Gupta
  • School of Computing and Informatics
  • Arizona State University

Based on Slides by David Patterson
2
Agenda
  • Dynamic Branch Prediction (Review)
  • BTB

3
Applying the Prediction
  • The earliest time we can begin using the
    prediction is when
  • the prediction bits are available
  • the branch target is available
  • The earliest time we can know whether we have
    predicted correctly is when
  • the branch condition is resolved
  • The difference between these times is roughly
    what is saved by a correct prediction
  • If the branch target is available late, the
    window of savings is reduced

4
Correlating Predictors
  • The prediction is a function of the last k branch
    outcomes
  • The branch history buffer is indexed by
  • m bits taken from address of branch
  • k bits of branch history
  • i.e., m + k bits all told
  • Each entry in the branch history buffer has q
    bits (i.e., is a q-bit predictor)
  • The branch history buffer has 2^(m+k) × q bits of
    storage
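
The indexing and storage formula above can be sketched in a few lines. This is a minimal illustration, not the design of any particular processor: the class name, the choice to concatenate address bits with history bits, the 4-byte instruction alignment, and the initial counter value are all assumptions made for the example.

```python
class CorrelatingPredictor:
    """Sketch of a predictor indexed by m address bits and k history bits,
    with one q-bit saturating counter per entry (all names are illustrative)."""

    def __init__(self, m: int, k: int, q: int = 2):
        self.m, self.k, self.q = m, k, q
        self.max_count = (1 << q) - 1
        # 2^(m+k) counters, each q bits wide, initialised just below the
        # taken threshold (an arbitrary choice for this sketch).
        self.table = [self.max_count // 2] * (1 << (m + k))
        self.history = 0  # last k outcomes, most recent in the low bit

    def storage_bits(self) -> int:
        # Matches the slide: 2^(m+k) entries times q bits per entry.
        return (1 << (self.m + self.k)) * self.q

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two PC bits are dropped.
        addr_bits = (pc >> 2) & ((1 << self.m) - 1)
        return (addr_bits << self.k) | self.history

    def predict(self, pc: int) -> bool:
        # Predict taken when the counter is in its upper half.
        return self.table[self._index(pc)] > self.max_count // 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
        # Shift the new outcome into the k-bit history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)
```

With m = 4, k = 2, q = 2 the table holds 2^6 × 2 = 128 bits. A branch that strictly alternates taken / not taken is mispredicted about half the time by a plain 2-bit counter, but two bits of history separate the two cases into different entries, so this sketch predicts it correctly once the counters have warmed up.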

5
Correlating predictor with 2 history bits and 2
state bits (2,2)
6
Local versus Global
7
Hashing Correlation
For the same amount of table storage, we can get
better associativity in the case of fewer
branches but highly correlated behavior.
8
Tournament Predictor
  • Move toward the other predictor when I am wrong
    and he is right
  • Stay put when I am right and he is right, or I am
    wrong and he is wrong.
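
The selector rule above can be written as a small saturating counter. The sketch below assumes a 2-bit selector per entry, where values 0 and 1 favour predictor 1 and values 2 and 3 favour predictor 2; the function names and that encoding are illustrative choices, not taken from the slides.

```python
def update_selector(counter: int, p1_correct: bool, p2_correct: bool) -> int:
    """Return the new 2-bit selector value after a branch resolves.

    Encoding assumed here: 0-1 means "use predictor 1",
    2-3 means "use predictor 2".
    """
    if p1_correct and not p2_correct:
        return max(counter - 1, 0)   # move toward predictor 1
    if p2_correct and not p1_correct:
        return min(counter + 1, 3)   # move toward predictor 2
    return counter                   # both right or both wrong: stay put


def choose(counter: int, pred1: bool, pred2: bool) -> bool:
    """Pick the prediction of whichever predictor the selector favours."""
    return pred2 if counter >= 2 else pred1
```

Because the counter saturates, a single disagreement does not flip a strongly held preference; it takes two in a row to move from one end to the other side of the threshold.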

9
Tournament predictor local vs global
10
Alpha 21264 Branch Predictor
  • Tournament predictor (4K x 2) chooses between
    global and local
  • Global has 4K 2-bit entries indexed by last 12
    branch outcomes XORed with address
  • Local is also a two-level predictor
  • 1K x 10 branch history buffer (last 10 outcomes
    for indexed branch) indexed by address
  • The selected 10-bit history is XORed with address
    to index a table of 3-bit entries
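
As a rough check, the table sizes listed above can be totalled. The entry count of the 3-bit local table is not stated on the slide; the sketch assumes 1K entries because it is indexed with 10 bits. Variable names are illustrative.

```python
K = 1024

selector_bits = 4 * K * 2        # tournament chooser: 4K x 2
global_bits = 4 * K * 2          # global predictor: 4K 2-bit entries
local_history_bits = 1 * K * 10  # local history buffer: 1K x 10
local_counter_bits = (1 << 10) * 3  # assumed: 2^10 entries of 3 bits

total_bits = (selector_bits + global_bits
              + local_history_bits + local_counter_bits)
print(total_bits, total_bits / K)  # 29696 bits, i.e. 29.0 K bits
```

Under that assumption the total is 29K bits, in line with the roughly 30K bits the summary slide quotes for tournament predictors.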

11
Alpha 21264 Predictor
12
Branch Target Buffers (BTB) or Caches (BTC)
  • Branch target calculation is costly and stalls
    the instruction fetch.
  • To reduce the branch penalty
  • need to know what the address is by the end of IF
  • but the instruction isn't even decoded yet
  • so we have to wait a cycle and perhaps get a
    branch (penalty = 1 for MIPS)
  • so use the branch instruction address
  • to predict the branch target
  • if prediction works then penalty goes to 0!

13
BTB - Idea
  • BTB stores PCs the same way as caches
  • Only PCs of predicted taken branches are stored
    (no need to store untaken)
  • The match tag is the PC (associative memory OK if
    it's small)
  • The data field is the predicted PC
  • The PC of a (potential) branch is sent to the BTB
  • When a match is found the corresponding Predicted
    PC is returned
  • If PC not in table, it is taken to mean
  • either not a branch
  • or not predicted taken
  • in either case, continue fetching from PC + k (k
    = 4 for MIPS)
  • If the branch was predicted taken, instruction
    fetch continues at the returned predicted PC
  • BTB gets us the branch target address early
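
The lookup described above can be sketched as follows. A Python dict stands in for a small fully associative BTB with no capacity limit, which is a simplification; the class and method names are hypothetical, and the update policy (insert on a taken branch, evict when a stored branch turns out not taken) is one common choice assumed here rather than something the slide spells out.

```python
class BranchTargetBuffer:
    """Sketch of a BTB: tag = branch PC, data = predicted target PC."""

    def __init__(self, step: int = 4):
        self.entries = {}   # branch PC -> predicted target PC
        self.step = step    # k in "PC + k"; 4 for MIPS

    def next_fetch_pc(self, pc: int) -> int:
        # Hit: fetch from the predicted target.
        # Miss: not a branch, or not predicted taken, so fall through.
        return self.entries.get(pc, pc + self.step)

    def resolve(self, pc: int, taken: bool, target: int) -> None:
        # Only predicted-taken branches are kept in the table.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
first = btb.next_fetch_pc(0x100)      # miss: falls through to 0x104
btb.resolve(0x100, taken=True, target=0x200)
second = btb.next_fetch_pc(0x100)     # hit: redirected to 0x200
```

The point of the structure is that `next_fetch_pc` needs only the PC, so the redirect can happen during instruction fetch, before the instruction is decoded.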

14
Branch Target Buffers
15
Changes in MIPS to incorporate BTB
16
Penalties Using BTB in MIPS
  • Note
  • Penalties for misprediction on more complex
    machines are much higher

17
Questions Concerning BTBs
  • Can BTB be combined with branch prediction
    machinery introduced earlier in this lecture?
    How?
  • What kind of branches can a BTB accelerate that
    are out of the reach of ordinary branch
    predictors?

18
BTB coupled with BHT
19
Improvements
  • Store instructions rather than target address
  • increases entry size but removes Ifetch time
  • permits BTB to run slower and therefore be larger
  • permits branch folding - branches effectively
    disappear
  • branch job is to change PC and get the real
    instruction
  • if you have the instruction then the branch isn't
    there (folded out of the way)
  • result is 0-cycle jumps and effectively 0-cycle
    properly predicted branches
  • however - branches must be checked
  • in a parallel path the branch must be fetched and
    checked to see if the prediction is true
  • Predicting indirect jumps
  • major source is procedure return
  • obvious model is to use a stack as the return
    predictor
  • note this can be combined with the above to get
    jump folding
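
The stack-based return predictor mentioned above can be sketched as a small fixed-depth stack. The depth, the overwrite-oldest policy on overflow, and the assumption that the return address is the call PC plus one instruction (no delay slot) are illustrative choices, as are the names.

```python
class ReturnAddressStack:
    """Sketch of a return-address predictor (names and depth are assumptions)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc: int, instr_size: int = 4) -> None:
        # Push the address of the instruction after the call.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # full: drop the oldest entry
        self.stack.append(call_pc + instr_size)

    def predict_return(self):
        # Pop the most recent return address; None means no prediction.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x100)
ras.on_call(0x200)
inner = ras.predict_return()   # return from the inner call
outer = ras.predict_return()   # return from the outer call
```

A BTB keyed on the return instruction's PC would keep predicting the previous caller when a procedure is called from several sites; a stack tracks the actual call nesting, which is why it suits procedure returns.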

20
Dynamic Branch Prediction Summary
  • Prediction becoming important part of execution
  • Branch History Table: 2 bits for loop accuracy
  • Correlation: Recently executed branches
    correlated with next branch
  • Either different branches (GA)
  • Or different executions of same branches (PA)
  • Tournament predictors take insight to next level,
    by using multiple predictors
  • usually one based on global information and one
    based on local information, and combining them
    with a selector
  • In 2006, tournament predictors using ≈ 30K bits
    are in processors like the Power5 and Pentium 4
  • Branch Target Buffer: include branch address and
    prediction
  • Next Class: Dynamic Scheduling