1
COMP 740: Computer Architecture and Implementation
  • Montek Singh
  • Thu, Feb 19, 2009
  • Topic: Instruction-Level Parallelism III
  • (Dynamic Branch Prediction)

2
Why Do We Need Branch Prediction?
  • Basic blocks are short, and we have done about
    all we can do for them with dynamic scheduling
  • control dependences now become the bottleneck
  • Since branches disrupt sequential flow of instrs
  • we need to be able to predict branch behavior to
    avoid stalling the pipeline
  • What we must predict
  • Branch outcome (Is the branch taken?)
  • Branch Target Address (What is the next
    non-sequential PC value?)

3
A General Model of Branch Prediction
Branch predictor accuracy and branch penalties:
  • T: probability of the branch being taken
  • p: fraction of branches that are predicted to be taken
  • A: accuracy of prediction
  • j, k, m, n: delays (penalties) associated with the four
    prediction/outcome events (n is usually 0)

Branch penalty of a particular prediction method:
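A penalty expression consistent with the definitions above (and with the best-case value jT derived on the next slide) weights the four prediction/outcome events by their delays:

    \text{penalty} \;=\; pA\,j \;+\; p(1-A)\,k \;+\; (1-p)(1-A)\,m \;+\; (1-p)A\,n

where the four terms correspond, in order, to (predict taken, taken), (predict taken, not taken), (predict not taken, taken), and (predict not taken, not taken).
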
4
Theoretical Limits of Branch Prediction
  • Best case: branches are perfectly predicted (A = 1)
  • also assume that n = 0
  • minimum branch penalty = jT
  • Let s be the pipeline stage where BTA becomes known
  • Then j = s - 1
  • See static prediction methods in Lecture 7
  • Thus, performance of any branch prediction
    strategy is limited by
  • s, the location of the pipeline stage that
    develops BTA
  • A, the accuracy of the prediction
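As a quick check against the penalty expression reconstructed above: with perfect prediction the fraction of branches predicted taken equals T, so

    A = 1,\ n = 0,\ p = T \;\Rightarrow\; \text{penalty} = pA\,j = jT = (s-1)\,T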

5
Review Static Branch Prediction Methods
  • Several static prediction strategies
  • Predict all branches as NOT TAKEN
  • Predict all branches as TAKEN
  • Predict all branches with certain opcodes as
    TAKEN, and all others as NOT TAKEN
  • Predict all forward branches as NOT TAKEN, and
    all backward branches as TAKEN
  • Opcodes have default predictions, which the
    compiler may reverse by setting a bit in the
    instruction
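As one illustration, the forward/backward heuristic reduces to a sign test on the branch displacement. A minimal sketch (function name is illustrative):

    /* Forward-not-taken / backward-taken heuristic (illustrative sketch).
       Backward branches (negative displacement) are usually loop-closing
       branches, so they are predicted TAKEN. */
    int static_predict_taken(int displacement)
    {
        return displacement < 0;
    }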

6
Dynamic Branch Prediction
  • Premise: the history of a branch instr's outcomes
    matters!
  • whether a branch will be taken depends greatly on
    the way previous dynamic instances of the same
    branch were decided
  • Dynamic prediction methods
  • take advantage of this fact by making their
    predictions dependent on the past behavior of the
    same branch instr
  • such methods are called Branch History Table
    (BHT) methods

7
BHT Methods for Branch Prediction
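A BHT is commonly organized as a small table of per-branch predictor state indexed by low-order bits of the branch instruction address. A minimal sketch of the indexing, with illustrative sizes and names (the per-entry predictor state is the subject of the next two slides):

    /* Branch History Table indexing (illustrative sketch).
       Only low-order PC bits are used, so distinct branches can alias. */
    #include <stdint.h>

    #define BHT_ENTRIES 4096                  /* must be a power of two */

    static uint8_t bht_state[BHT_ENTRIES];    /* per-entry predictor state */

    static unsigned bht_index(uint32_t pc)
    {
        return (pc >> 2) & (BHT_ENTRIES - 1); /* drop byte offset, keep low bits */
    }

    int bht_lookup(uint32_t pc)
    {
        return bht_state[bht_index(pc)];      /* predictor state for this branch */
    }
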
8
A One-Bit Predictor
State 0: Predict Not Taken
State 1: Predict Taken
  • Predictor misses twice on typical loop branches
  • Once at the end of loop
  • Once at the end of the 1st iteration of next
    execution of loop
  • The outcome sequence NT-T-NT-T makes it miss all
    the time
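A sketch of the one-bit scheme described above (the state is simply the last outcome, used directly as the next prediction):

    /* One-bit predictor (illustrative sketch):
       state 0 = predict NOT TAKEN, state 1 = predict TAKEN. */
    typedef struct { unsigned char state; } one_bit_pred;

    int  ob_predict(const one_bit_pred *p)      { return p->state; }
    void ob_update (one_bit_pred *p, int taken) { p->state = taken ? 1 : 0; }

On a loop that is re-entered repeatedly, the final not-taken outcome flips the state, which is what causes the second miss on the first iteration of the next execution.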

9
A Two-Bit Predictor
  • A four-state Moore machine
  • Predictor misses once on typical loop branches
  • hence popular
  • Outcome sequence NT-NT-T-T-NT-NT-T-T makes it miss
    all the time
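The exact four-state machine is in the slide figure; the sketch below assumes the common saturating-counter variant, which likewise misses only once on a typical loop branch:

    /* Two-bit saturating counter (one common four-state variant; the Moore
       machine in the original figure may differ in detail).
       States 0,1 predict NOT TAKEN; states 2,3 predict TAKEN. */
    typedef struct { unsigned char ctr; } two_bit_pred;   /* 0..3 */

    int  tb_predict(const two_bit_pred *p) { return p->ctr >= 2; }

    void tb_update(two_bit_pred *p, int taken)
    {
        if (taken  && p->ctr < 3) p->ctr++;
        if (!taken && p->ctr > 0) p->ctr--;
    }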

11
Correlating Branch Outcome Predictors
  • The history-based branch predictors seen so far
    base their predictions on the past history of the
    branch that is being predicted
  • A completely different idea
  • The outcome of a branch may well be predicted
    successfully based on the outcome of the last k
    branches executed
  • i.e., the path leading to the branch being
    predicted
  • Much-quoted example from SPEC92 benchmark eqntott

if (aa == 2)       /* b1 */
    aa = 0;
if (bb == 2)       /* b2 */
    bb = 0;
if (aa != bb)      /* b3 */
    ...

TAKEN(b1) && TAKEN(b2) implies NOT-TAKEN(b3)
12
Another Example of Branch Correlation
if (d == 0)        // b1
    d = 1;
if (d == 1)        // b2
    ...
  • Assume multiple runs of the code fragment
  • d alternates between 2 and 0
  • How would a 1-bit predictor initialized to state 0
    behave?

     BNEZ  R1, L1       ; b1: branch if d != 0  (d assumed in R1)
     ADDI  R1, R0, 1    ; d = 1
L1:  SUBI  R3, R1, 1    ; R3 = d - 1
     BNEZ  R3, L2       ; b2: branch if d != 1
L2:
13
A Correlating Branch Predictor
  • Think of having a pair of 1-bit predictors p0,
    p1 for each branch, where we choose between
    predictors (and update them) based on outcome of
    most recent branch (i.e., B1 for B2, and B2 for
    B1)
  • if most recent br was not taken, use and update
    (if needed) predictor p0
  • If most recent br was taken, use and update (if
    needed) predictor p1
  • How would such (1,1) correlating predictors
    behave if initialized to 0,0? (see the simulation
    sketch below)
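A small simulation makes the difference concrete. This is a sketch under the stated assumptions (d alternates between 2 and 0, b1 is taken when d != 0, b2 is taken when d != 1, all predictor state starts at 0 = NOT TAKEN); it should show the per-branch 1-bit predictor missing on every branch, while the (1,1) correlating scheme misses only on the first occurrence of each branch:

    /* Compare a per-branch 1-bit predictor with a (1,1) correlating predictor
       on the alternating-d example (illustrative sketch). */
    #include <stdio.h>

    int main(void)
    {
        int one_bit[2] = {0, 0};            /* 1-bit predictor per branch       */
        int corr[2][2] = {{0,0},{0,0}};     /* corr[branch][outcome of last br] */
        int last = 0;                       /* outcome of most recent branch    */
        int miss_1bit = 0, miss_corr = 0;

        for (int run = 0; run < 10; run++) {
            int d = (run % 2 == 0) ? 2 : 0; /* d alternates 2, 0, 2, 0, ...     */
            int outcome[2];

            outcome[0] = (d != 0);          /* b1 taken?                        */
            if (d == 0) d = 1;
            outcome[1] = (d != 1);          /* b2 taken?                        */

            for (int b = 0; b < 2; b++) {
                if (one_bit[b] != outcome[b]) miss_1bit++;
                one_bit[b] = outcome[b];

                if (corr[b][last] != outcome[b]) miss_corr++;
                corr[b][last] = outcome[b];
                last = outcome[b];
            }
        }
        printf("1-bit misses: %d, (1,1) correlating misses: %d\n",
               miss_1bit, miss_corr);
        return 0;
    }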

14
Organization of (m,n) Correlating Predictor
  • Using the results of the last m branches
  • 2^m possible outcomes
  • can be kept in an m-bit shift register
  • n-bit self-history predictor
  • BHT addressed using
  • m bits of global history
  • select column (particular predictor)
  • some lower bits of branch address
  • select row (particular branch instr)
  • entry holds n previous outcomes
  • Aliasing can occur since BHT uses only portion of
    branch instr address
  • state in various predictors in single row may
    correspond to different branches at different
    points of time
  • m = 0 is an ordinary BHT
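Putting the pieces together, a minimal sketch of (m,n) lookup and update, assuming m = 2 bits of global history and n = 2-bit saturating counters (all sizes and names are illustrative):

    #include <stdint.h>

    #define ROWS 1024                       /* rows selected by low PC bits        */
    #define M    2                          /* bits of global history              */
    #define COLS (1 << M)                   /* 2^m columns (one predictor each)    */

    static uint8_t  table[ROWS][COLS];      /* n-bit (here 2-bit) predictors       */
    static unsigned ghist;                  /* m-bit global history shift register */

    int mn_predict(uint32_t pc)
    {
        unsigned row = (pc >> 2) & (ROWS - 1);   /* low PC bits: aliasing possible */
        unsigned col = ghist & (COLS - 1);       /* outcomes of last m branches    */
        return table[row][col] >= 2;
    }

    void mn_update(uint32_t pc, int taken)
    {
        unsigned row = (pc >> 2) & (ROWS - 1);
        unsigned col = ghist & (COLS - 1);
        if (taken  && table[row][col] < 3) table[row][col]++;
        if (!taken && table[row][col] > 0) table[row][col]--;
        ghist = ((ghist << 1) | (taken ? 1u : 0u)) & (COLS - 1);
    }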

15
Improved Dynamic Branch Prediction
  • Recall that, even with perfect accuracy of
    prediction, branch penalty of a prediction method
    is (s-1)T
  • s is the pipeline stage where BTA is developed
  • T is the frequency of taken branches
  • Further improvements can be obtained only by
    using a cache storing BTAs, and accessing it
    simultaneously with the I-cache
  • Such a cache is called a Branch Target Buffer
    (BTB)
  • BHT and BTB can be used together
  • Coupled: one table holds all the information
  • Uncoupled: two independent tables

16
Using BTB and BHT Together
  • Uncoupled solution
  • BTB stores only the BTAs of taken branches
    recently executed
  • No separate branch outcome prediction (the
    presence of an entry in BTB can be used as an
    implicit prediction of the branch being TAKEN
    next time)
  • Use the BHT in case of a BTB miss
  • Coupled solution
  • Stores BTAs of all branches recently executed
  • Has separate branch outcome prediction for each
    table entry
  • Use BHT in case of BTB hit
  • Predict NOT TAKEN otherwise
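A sketch of the uncoupled fetch-stage decision described above (structures and sizes are illustrative; bht_predict_taken is assumed to be a BHT lookup like the one sketched earlier):

    #include <stdbool.h>
    #include <stdint.h>

    #define BTB_SIZE 512

    typedef struct {
        bool     valid;
        uint32_t tag;                        /* branch instruction address */
        uint32_t bta;                        /* branch target address      */
    } btb_entry;

    static btb_entry btb[BTB_SIZE];          /* holds taken branches only  */

    extern bool bht_predict_taken(uint32_t pc);   /* assumed BHT lookup    */

    typedef struct {
        bool     predict_taken;
        bool     bta_known;                  /* false: target not yet available */
        uint32_t bta;
    } fetch_prediction;

    fetch_prediction predict_next(uint32_t pc)
    {
        fetch_prediction p = { false, false, 0 };
        btb_entry *e = &btb[(pc >> 2) & (BTB_SIZE - 1)];

        if (e->valid && e->tag == pc) {      /* BTB hit: implicit TAKEN + target  */
            p.predict_taken = true;
            p.bta_known     = true;
            p.bta           = e->bta;
        } else if (bht_predict_taken(pc)) {  /* BTB miss: fall back on the BHT;   */
            p.predict_taken = true;          /* the target must still be computed */
        }
        return p;
    }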

17
Parameters of Real Machines
18
Coupled BTB and BHT
19
Decoupled BTB and BHT
20
Reducing Misprediction Penalties
  • Need to recover whenever branch prediction is not
    correct
  • Discard all speculatively executed instructions
  • Resume execution along alternative path (this is
    the costly step)
  • Scenarios where recovery is needed
  • Predict taken, branch is taken, BTA wrong (case
    7)
  • Predict taken, branch is not taken (cases 4 and
    6)
  • Predict not taken, branch is taken (case 3)
  • Preparing for recovery involves working on the
    alternative path
  • On instruction level
  • Two fetch address registers per speculated branch
    (PPC 603 640)
  • Two instruction buffers (IBM 360/91, SuperSPARC,
    Pentium)
  • On I-cache level
  • For PT, also do next-line prefetching
  • For PNT, also do target-line prefetching

21
Predicting Dynamic BTAs
  • Vast majority of dynamic BTAs come from procedure
    returns (85% for SPEC95)
  • Since procedure call-return for the most part
    follows a stack discipline, a specialized return
    address buffer operated as a stack is appropriate
    for high prediction accuracy
  • Pushes return address on call
  • Pops return address on return
  • Depth of RAS should be as large as maximum call
    depth to avoid mispredictions
  • 8-16 elements generally sufficient
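A sketch of such a return-address stack (depth and names are illustrative):

    #include <stdint.h>

    #define RAS_DEPTH 16

    static uint32_t ras[RAS_DEPTH];
    static int      ras_top;                     /* number of valid entries */

    void ras_push_on_call(uint32_t return_addr)
    {
        if (ras_top < RAS_DEPTH)
            ras[ras_top++] = return_addr;
        /* calls deeper than RAS_DEPTH: later returns will be mispredicted */
    }

    uint32_t ras_pop_on_return(void)
    {
        return (ras_top > 0) ? ras[--ras_top] : 0;   /* 0 = no prediction */
    }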