Title: Lecture 21: Instruction Level Parallelism Branch Prediction
Lecture 21: Instruction Level Parallelism (Branch Prediction)
- Computer Engineering 585
- Fall 2001
Branch Prediction Buffer
[Figure: pipeline stages IF, ID, EX, M, WB, with the PC and the I-Cache.]
Branch Target Buffer (BTB)
[Figure: the PC of the instruction to fetch is looked up in the branch-target buffer; each of its entries holds a predicted PC and, optionally, whether the branch is predicted taken or untaken.]
- No: the instruction is not predicted to be a branch. Proceed normally.
- Yes: the instruction is a branch, and the predicted PC should be used as the next PC.
FIGURE 4.22 A branch-target buffer.
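The lookup above can be sketched in a few lines of Python. This is a minimal, hypothetical model rather than the hardware in Figure 4.22: it assumes a direct-mapped buffer indexed by the low-order bits of the word address, stores the full branch PC as a tag for the Yes/No check, and assumes 4-byte instructions, so that proceeding normally means PC + 4. The class and method names are illustrative.

```python
# Hypothetical branch-target buffer model. Assumptions (not from the slide):
# direct-mapped, indexed by (pc >> 2) mod table size, full-PC tag per entry,
# and 4-byte instructions.

class BranchTargetBuffer:
    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each slot holds (branch_pc, predicted_pc), or None when empty.
        self.entries = [None] * num_entries

    def _index(self, pc):
        return (pc >> 2) % self.num_entries

    def next_pc(self, pc):
        """Return the PC to fetch after the instruction at `pc`."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            # Yes: a predicted branch, so use the stored predicted PC.
            return entry[1]
        # No: not predicted to be a branch, so proceed normally.
        return pc + 4

    def update(self, branch_pc, target_pc):
        """Record a taken branch and its target."""
        self.entries[self._index(branch_pc)] = (branch_pc, target_pc)


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x400)))   # 0x404 (miss)
btb.update(0x400, 0x800)
print(hex(btb.next_pc(0x400)))   # 0x800 (hit)
print(hex(btb.next_pc(0x440)))   # 0x444 (same index, different tag)
```

On a miss the fetch simply continues sequentially; only a tag match redirects fetch to the stored target.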
Branch Prediction Steps
Dynamic Branch Prediction
- Performance depends on prediction accuracy and the cost of a misprediction
- Branch History Table (BHT) is the simplest scheme
  - Lower bits of the PC address index a table of 1-bit values
  - Each bit says whether or not the branch was taken last time
  - No address check
- Problem: in a loop, a 1-bit BHT causes two mispredictions (on average, 9 iterations before exit):
  - End of loop case, when it exits instead of looping as before
  - First time through the loop on the next pass through the code, when it predicts exit instead of looping
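As a rough sketch, the 1-bit BHT and its loop problem can be simulated in Python. Assumptions not in the slides: 4-byte instructions (so the index uses `pc >> 2`), a hypothetical branch address, and table entries initialized to not taken. The loop-closing branch is taken 9 times and then falls through once, and the loop is entered three times.

```python
# Hypothetical 1-bit BHT: low-order PC bits index a table of 1-bit values.
# Assumptions: 4-byte instructions, entries start at 0 (not taken).

class OneBitBHT:
    def __init__(self, num_entries=4096):
        self.num_entries = num_entries
        self.bits = [0] * num_entries  # 1 = taken last time, 0 = not taken

    def _index(self, pc):
        # Low-order bits only: there is no tag, so no address check.
        return (pc >> 2) % self.num_entries

    def predict(self, pc):
        return self.bits[self._index(pc)] == 1

    def update(self, pc, taken):
        self.bits[self._index(pc)] = 1 if taken else 0


def loop_branch_outcomes(iterations, passes):
    """Loop-closing branch: taken iterations-1 times, then not taken."""
    for _ in range(passes):
        for i in range(iterations):
            yield i < iterations - 1


bht = OneBitBHT()
pc = 0x1000  # hypothetical address of the loop branch
mispredictions = 0
for taken in loop_branch_outcomes(iterations=10, passes=3):
    if bht.predict(pc) != taken:
        mispredictions += 1
    bht.update(pc, taken)
print(mispredictions)  # 6: two per pass (at the exit and on re-entry)
```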
1-Bit Prediction Drawbacks
LOOP: Inst 1
      Inst 2
      Inst 3
      ...
      Inst k
      Branch
Branch taken 9 times, not taken 1 time
1-bit prediction mispredicts twice: 20% misprediction rate
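The 20% rate follows from counting one pass through the loop once the predictor is warmed up: the branch executes 10 times, and the 1-bit predictor misses once at the exit and once more on the first iteration of the next pass.

```latex
\text{misprediction rate}
  = \frac{2 \text{ mispredictions}}{9 \text{ taken} + 1 \text{ not taken}}
  = \frac{2}{10} = 20\%
```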
Dynamic Branch Prediction
- Solution: a 2-bit scheme that changes its prediction only after it mispredicts twice (Figure 4.13, p. 264)
[Figure: state diagram with four states, Predict taken (11), Predict taken (10), Predict not taken (01), and Predict not taken (00), with transitions on Taken and Not taken outcomes.]
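A minimal Python model of the 2-bit scheme, assuming the common saturating-counter encoding: 00 and 01 predict not taken, 10 and 11 predict taken, and each outcome moves the counter one step toward 11 (taken) or 00 (not taken). The exact transitions in Figure 4.13 may differ in detail, but in this encoding a prediction also flips only after two consecutive mispredictions. The helper `count_mispredictions_per_pass` is hypothetical and uses the 9-taken, 1-not-taken loop branch from the previous slide.

```python
# 2-bit saturating counter (assumed encoding): 0 = 00, 1 = 01 predict
# not taken; 2 = 10, 3 = 11 predict taken.

class TwoBitCounter:
    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions_per_pass(counter, iterations, passes):
    """Mispredictions per pass for a loop branch taken iterations-1 times."""
    misses = []
    for _ in range(passes):
        pass_misses = 0
        for i in range(iterations):
            taken = i < iterations - 1  # taken except on loop exit
            if counter.predict() != taken:
                pass_misses += 1
            counter.update(taken)
        misses.append(pass_misses)
    return misses


print(count_mispredictions_per_pass(TwoBitCounter(), iterations=10, passes=3))
# [3, 1, 1]: after warm-up, one miss per 10 executions
```

Once warmed up, the counter stays in a taken state across the loop exit, so only the exit itself is mispredicted: 1 miss in 10 executions (10%), versus 20% for the 1-bit scheme.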
BHT Accuracy
- Mispredictions occur because of either:
  - A wrong guess for that branch, or
  - The branch history of a different branch, picked up when indexing the table
- With a 4096-entry table, misprediction rates vary from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
- 4096 entries is about as good as an infinite table (in the Alpha 21164)
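The second cause, picking up another branch's history, comes from indexing without an address check. A small Python sketch shows two branches 16 KB apart landing on the same entry of a 4096-entry table. It assumes 4-byte instructions and an index of (pc >> 2) mod 4096 with no tags; the branch addresses are hypothetical.

```python
# Aliasing in an untagged BHT. Assumptions: 4-byte instructions,
# index = (pc >> 2) mod 4096, no address tags.

TABLE_ENTRIES = 4096


def bht_index(pc, entries=TABLE_ENTRIES):
    return (pc >> 2) % entries


branch_a = 0x0040_1000                    # hypothetical branch address
branch_b = branch_a + 4 * TABLE_ENTRIES   # 16 KB further on in the code

# Both branches read and write the same prediction bits.
print(bht_index(branch_a), bht_index(branch_b))  # 1024 1024
```

A larger table pushes such collisions further apart, which is why, past a certain size, adding entries stops helping.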
4096-entry 2-bit Prediction Accuracy
Benchmark (SPEC89)   Frequency of mispredictions
nasa7                 1%
matrix300             0%
tomcatv               1%
doduc                 5%
spice                 9%
fpppp                 9%
gcc                  12%
espresso              5%
eqntott              18%
li                   10%
FIGURE 4.14 Prediction accuracy of a 4096-entry two-bit prediction buffer for the SPEC89 benchmarks.
4096-entry vs. Infinite 2-bit Prediction
Frequency of mispredictions, 2 bits per entry
Benchmark (SPEC89)   4096 entries   Unlimited entries
nasa7                 1%             0%
matrix300             0%             0%
tomcatv               1%             0%
doduc                 5%             5%
spice                 9%             9%
fpppp                 9%             9%
gcc                  12%            11%
espresso              5%             5%
eqntott              18%            18%
li                   10%            10%
Correlating Branches
- Hypothesis: recent branches are correlated; that is, the behavior of recently executed branches affects the prediction of the current branch
- Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table
- In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters
- The old 2-bit BHT is then a (0,2) predictor
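The (m,n) definition maps directly onto a small Python class. This is a sketch under stated assumptions, not a specific machine's design: the m-bit global history selects one of 2^m tables, each table holds n-bit saturating counters indexed by (pc >> 2) mod a hypothetical `entries_per_table`, and a counter in its upper half predicts taken.

```python
# Hypothetical (m, n) correlating predictor: an m-bit global history
# selects one of 2**m tables of n-bit saturating counters.

class CorrelatingPredictor:
    def __init__(self, m, n, entries_per_table=16):
        self.m, self.n = m, n
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)   # counter >= threshold predicts taken
        self.entries = entries_per_table
        self.history = 0                # last m outcomes, 1 = taken
        # tables[h][i]: counter for history pattern h and branch index i
        self.tables = [[0] * entries_per_table for _ in range(1 << m)]

    def _slot(self, pc):
        return self.history, (pc >> 2) % self.entries

    def predict(self, pc):
        h, i = self._slot(pc)
        return self.tables[h][i] >= self.threshold

    def update(self, pc, taken):
        h, i = self._slot(pc)
        c = self.tables[h][i]
        self.tables[h][i] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        # Shift the newest outcome into the m-bit global history.
        if self.m > 0:
            self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self):
        # 2**m tables x entries per table x n bits per counter
        return (1 << self.m) * self.entries * self.n


print(CorrelatingPredictor(m=2, n=2, entries_per_table=16).storage_bits())  # 128
```

With m = 0 there is a single table and the history never changes, so `CorrelatingPredictor(0, 2)` behaves like the plain 2-bit BHT, the (0,2) case above.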
Correlating Branches
if (d == 0)
    d = 1;
if (d == 1)
B1 and B2 are correlated: if B1 is not taken, then B2 is not taken.
Correlating Branch Example
Assume d alternates between 2 and 0.
d   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2   NT              T           T                   NT              T           T
0   T               NT          NT                  T               NT          NT
2   NT              T           T                   NT              T           T
0   T               NT          NT                  T               NT          NT
1-bit predictor mispredicts every branch!
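To check the table, here is a small Python simulation with one 1-bit predictor per branch, both starting at not taken. It assumes b1 is the branch that skips `d = 1` (taken when d != 0) and b2 is taken when d != 1, which matches the actions in the table; `run_one_bit` is a hypothetical helper.

```python
# Separate 1-bit predictors for b1 and b2, both starting at not taken.
# Assumed branch sense (consistent with the table): b1 taken when d != 0,
# b2 taken when d != 1.

def run_one_bit(d_values):
    pred = {"b1": False, "b2": False}   # False = predict not taken
    misses = 0
    for d in d_values:
        b1_taken = d != 0
        if not b1_taken:
            d = 1                       # body of "if (d == 0) d = 1;"
        b2_taken = d != 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            if pred[name] != taken:
                misses += 1
            pred[name] = taken          # 1-bit: remember the last outcome
    return misses


print(run_one_bit([2, 0, 2, 0]))  # 8: all 8 branch executions mispredict
```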
Correlating Branch Example
Prediction bits   Prediction if last branch not taken   Prediction if last branch taken
NT/NT             Not taken                             Not taken
NT/T              Not taken                             Taken
T/NT              Taken                                 Not taken
T/T               Taken                                 Taken
Initial prediction: NT/NT
d   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2   NT/NT           T           T/NT                NT/NT           T           NT/T
0   T/NT            NT          T/NT                NT/T            NT          NT/T
2   T/NT            T           T/NT                NT/T            T           NT/T
0   T/NT            NT          T/NT                NT/T            NT          NT/T
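The same example under the (1,1) scheme, sketched in Python. In each pair X/Y, X is the prediction used when the last branch executed was not taken and Y the one used when it was taken, as in the first table; only the selected bit is updated. Assumptions: both pairs start at NT/NT, the "last branch" starts as not taken, and the branch senses are b1 taken when d != 0 and b2 taken when d != 1. `run_one_one` is a hypothetical helper.

```python
# (1,1) correlating predictor: one global history bit selects which of a
# branch's two 1-bit predictions to use and update.

def run_one_one(d_values):
    # pred[name][0]: prediction if last branch not taken; [1]: if taken
    pred = {"b1": [False, False], "b2": [False, False]}
    last_taken = False                  # assumed initial global history
    misses = 0
    for d in d_values:
        b1_taken = d != 0
        if not b1_taken:
            d = 1
        b2_taken = d != 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            slot = int(last_taken)
            if pred[name][slot] != taken:
                misses += 1
            pred[name][slot] = taken    # update only the selected bit
            last_taken = taken          # 1-bit global history
    return misses, pred


misses, final = run_one_one([2, 0, 2, 0])
print(misses)  # 2
```

Only the first b1 and b2 predictions miss. The final pairs are T/NT for b1 and NT/T for b2, matching the last row of the table.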
Correlating Branches
- (2,2) predictor
  - The behavior of recent branches then selects among, say, four predictions of the next branch, updating just that prediction
[Figure: 4 bits of the branch address and the 2-bit global branch history (00, 01, 10, 11) together select one of the 2-bit per-branch predictors, which supplies the prediction.]
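The indexing in the figure can be sketched as bit arithmetic. Assumptions: the 4 address bits are the low-order bits of the branch's word address (pc >> 2), and they select a row while the 2-bit global history selects one of the four columns. Under those assumptions a 6-bit index picks one of 64 two-bit counters. The function names and the example PC are illustrative.

```python
# (2,2) predictor indexing sketch. Assumptions: 4 low-order bits of the
# word address choose the row, 2 global-history bits choose the column.

ADDR_BITS = 4
HISTORY_BITS = 2


def counter_index(pc, global_history):
    row = (pc >> 2) & ((1 << ADDR_BITS) - 1)
    col = global_history & ((1 << HISTORY_BITS) - 1)
    return (row << HISTORY_BITS) | col


def update_history(global_history, taken):
    # Shift in the newest outcome; keep only the last 2 branches.
    return ((global_history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


counters = [0] * (1 << (ADDR_BITS + HISTORY_BITS))
print(len(counters))  # 64

h = update_history(update_history(0, True), False)  # taken, then not taken
print(bin(h), counter_index(0x1004, h))  # 0b10 6
```

Only `counters[counter_index(pc, h)]` would be read and updated for a given branch, which is the "updating just that prediction" behavior on the slide.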