Analysis of the OGEHL branch predictor



Slides: 25

Provided by: Sez79

Learn more at: https://pages.cs.wisc.edu


Transcript and Presenter's Notes

1
Analysis of the O-GEHL branch predictor

Optimized GEometric History Length
André Seznec
IRISA/INRIA/HIPEAC

2
Objectives

State of the art accuracy
Any gain in branch prediction accuracy results in
performance gain and power consumption gain
Keep the design implementable
Rely on a single prediction scheme
Use only global history information
Designers hate maintaining speculative local
history

3
The basis: a multiple history length predictor
[Figure: tables T0, T1, T2, T3, T4 with history lengths L(0), L(1), L(2), L(3), L(4); the stage combining their outputs is marked "?"]
4
Selecting between multiple predictions

Classic solution:
Use of a meta predictor
wasting storage !?!
choosing among 5 or 10 predictions ??
Neural inspired predictors:
Use an adder tree instead of a meta-predictor

Let's use the adder tree
5
Final computation through a sum
[Figure: adder tree over the table outputs; label L(0)]
Prediction = Sign
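
The sum-then-sign step can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `predict`, the signed counter values, and the rule that a sum of exactly zero counts as taken are all assumptions.

```python
def predict(counters):
    """Sum the signed counters read from the tables; the sign is the prediction.

    Assumption: a sum of zero is treated as "taken".
    Returns (predicted_taken, total).
    """
    total = sum(counters)
    return total >= 0, total


# Hypothetical counter values read from five tables T0..T4.
taken, total = predict([3, -1, -4, 2, 1])
print(taken, total)
```

Because every table contributes to the sum, no meta-predictor is needed to choose among the individual predictions.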
6
From old experience on 2bcgskew

Some applications benefit from 100-bit histories
Some don't !!

7
GEometric History Length predictor
The set of history lengths forms a geometric series
0, 2, 4, 8, 16, 32, 64, 128
What is important: L(i) - L(i-1) is drastically increasing
Spends most of the storage for short history !!
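
A geometric set of history lengths can be generated as sketched below. The helper name `geometric_lengths` and the round-to-nearest rule are assumptions made for illustration; the slides only state that the lengths form a geometric series, so a real configuration need not match this rounding for every value.

```python
def geometric_lengths(first, last, n_tables):
    """Return L(0)..L(n_tables-1) with L(0) = 0 and a geometric series
    from L(1) = first to L(n_tables-1) = last.

    Assumption: each length is rounded to the nearest integer.
    """
    if n_tables < 3:
        raise ValueError("need at least three tables")
    # Common ratio so that L(n_tables-1) lands on `last`.
    alpha = (last / first) ** (1.0 / (n_tables - 2))
    return [0] + [int(first * alpha ** (i - 1) + 0.5) for i in range(1, n_tables)]


print(geometric_lengths(2, 128, 8))  # [0, 2, 4, 8, 16, 32, 64, 128]
```

The gap between consecutive lengths grows with the table index, which is why most tables end up covering short histories.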
8
Update policy

Perceptron inspired, threshold based
Perceptron-like threshold did not work
Reasonable fixed threshold = number of tables
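
A threshold-based update in the perceptron style can be sketched as below. The names `saturating_add` and `update`, the 4-bit default counter width, and the use of `<=` for the threshold comparison are assumptions, not details taken from the slides.

```python
def saturating_add(value, delta, bits):
    """Add delta to a signed counter of the given width, saturating at its limits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value + delta))


def update(counters, taken, threshold, bits=4):
    """Return the counters after one branch outcome.

    Counters are trained only on a misprediction or when the magnitude of
    the sum does not exceed the threshold (assumption: comparison is <=).
    """
    total = sum(counters)
    predicted_taken = total >= 0
    if predicted_taken != taken or abs(total) <= threshold:
        delta = 1 if taken else -1
        return [saturating_add(c, delta, bits) for c in counters]
    return list(counters)


# Misprediction: the sum is 1 (predict taken) but the branch is not taken.
print(update([3, -1, -4, 2, 1], taken=False, threshold=5))
```

With the fixed threshold suggested on the slide, `threshold` would simply be `len(counters)`.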

9
Dynamic update threshold fitting

On an O-GEHL predictor, the best threshold depends on:
the application ?
the predictor size ?
the counter width ?
By chance, on most applications, for the best fixed threshold,
updates on mispredictions ≈ updates on correct predictions
Monitor the difference
and adapt the update threshold
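
One way to monitor that difference is a single signed counter that nudges the threshold whenever it saturates. The sketch below is an assumption-laden illustration: the class name `ThresholdFitter`, the 7-bit monitor width, the initial threshold of 8, and the floor of 1 on the threshold are all hypothetical choices.

```python
class ThresholdFitter:
    """Balance updates on mispredictions against updates on correct predictions."""

    def __init__(self, threshold=8, counter_bits=7):
        self.threshold = threshold
        self.balance = 0  # > 0: more updates came from mispredictions
        self.limit = (1 << (counter_bits - 1)) - 1

    def record_update(self, mispredicted):
        """Call once per counter update (a misprediction, or a correct
        prediction whose sum was within the threshold)."""
        self.balance += 1 if mispredicted else -1
        if self.balance >= self.limit:
            # Too many misprediction updates: train more often.
            self.threshold += 1
            self.balance = 0
        elif self.balance <= -self.limit:
            # Too many low-confidence correct updates: train less often.
            self.threshold = max(1, self.threshold - 1)
            self.balance = 0


fitter = ThresholdFitter()
for _ in range(63):
    fitter.record_update(mispredicted=True)
print(fitter.threshold)  # 9
```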

10
Adaptive history length fitting (inspired by Juan et al. 98)

(½ of the applications: L(7) < 50)
(½ of the applications: L(7) > 150)
Let us adapt some history lengths to the behavior of each application
8 tables:
T2: L(2) and L(8)
T4: L(4) and L(9)
T6: L(6) and L(10)

11
Adaptive history length fitting (2)

Intuition:
if high degree of conflicts on T7, stick with short history
Implementation:
monitoring of aliasing on updates on T7 through a tag bit and a counter
Simple is sufficient:
Flipping from short to long histories and vice-versa
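
The tag-bit-plus-counter monitor can be sketched as follows. Everything concrete here is an assumption made for illustration: the class name `HistoryLengthSelector`, using the low bit of the branch address as the tag, the 9-bit counter, and the +1/-4 step sizes.

```python
class HistoryLengthSelector:
    """Estimate aliasing on T7 and flip between short and long histories."""

    def __init__(self, n_tags=1024, counter_bits=9):
        self.tags = [0] * n_tags  # one tag bit per monitored entry
        self.hi = (1 << (counter_bits - 1)) - 1
        self.lo = -(1 << (counter_bits - 1))
        self.score = 0  # high: few conflicts, low: many conflicts
        self.use_long = False

    def on_t7_update(self, index, pc):
        """Call on each update of T7; returns True when long histories are selected."""
        slot = index % len(self.tags)
        tag = pc & 1  # assumption: one address bit serves as the tag
        if self.tags[slot] == tag:
            self.score = min(self.hi, self.score + 1)  # probably the same branch
        else:
            self.score = max(self.lo, self.score - 4)  # conflict detected
        self.tags[slot] = tag
        if self.score == self.hi:
            self.use_long = True
        elif self.score == self.lo:
            self.use_long = False  # heavy aliasing: stick with short history
        return self.use_long
```

When `use_long` is set, T2, T4 and T6 would be indexed with their longer alternative history lengths; otherwise they keep the short ones.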

12
Evaluation framework

1st Championship Branch Prediction traces
20 traces including system activity
Floating point apps: loop dominated
Integer apps: usual SPECINT
Multimedia apps
Server workload apps: very large footprint

13
Reference configuration presented for Championship Branch Prediction

8 tables
2K entries, except T1: 1K entries
5-bit counters for T0 and T1, 4-bit counters otherwise
1 Kbits of one-bit tags associated with T7
10K + 5K + 6x8K + 1K = 64K
L(1) = 3 and L(10) = 200
0, 3, 5, 8, 12, 19, 31, 49, 75, 125, 200
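
The storage budget above can be checked with a few lines of arithmetic. The function name `storage_bits` is hypothetical; the table sizes and counter widths are the ones listed on the slide.

```python
K = 1024


def storage_bits():
    """Total storage of the reference configuration, in bits."""
    # (entries, counter width in bits) for T0, T1, then T2..T7.
    tables = [(2 * K, 5), (1 * K, 5)] + [(2 * K, 4)] * 6
    table_bits = sum(entries * width for entries, width in tables)
    tag_bits = 1 * K  # one-bit tags associated with T7
    return table_bits + tag_bits


print(storage_bits() // K)  # 64
```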

14
A case for the OGEHL predictor

2nd at CBP: 2.82 misp/KI
Best practice award:
The predictor the closest to a possible hardware implementation
Does not use exotic features:
Various prime numbers, etc.
Strange initial state
Very short warming intervals:
Chaining all simulations: 2.84 misp/KI

15
A case for the OGEHL predictor (2)

High accuracy
32 Kbits (3,150): 3.41 misp/KI
better than any implementable 128 Kbits predictor before CBP:
128 Kbits 2bcgskew (6,6,24,48): 3.55 misp/KI
176 Kbits PBNP (43): 3.67 misp/KI
1 Mbit (5,300): 2.27 misp/KI
1 Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI
1888 Kbits PBNP (58): 3.23 misp/KI

16
A case for the OGEHL predictor (3)

Robustness to variations of history length choices:
L(1) in [2,6], L(10) in [125,300]
misp. rate < 2.96 misp/KI
Geometric series: not a bad formula !!
best geometric: L(1) = 3, L(10) = 223: 2.80 misp/KI
best overall: 0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266: 2.78 misp/KI

17
Impact of the number of components

             4 components    8 components
64 Kbits     3.02            2.84 misp/KI
256 Kbits    2.59            2.44 misp/KI
1 Mbit       2.40            2.27 misp/KI

             6 components    12 components
48 Kbits     3.02            3.03 misp/KI
768 Kbits    2.35            2.25 misp/KI
4 to 12 components bring high accuracy ?

18
Impact of counter width

Robustness to counter width variations
3-bit counter, 49 Kbits: 3.09 misp/KI
Dynamic update threshold fitting helps a lot
5-bit counter, 79 Kbits: 2.79 misp/KI
4-bit is the best tradeoff

19
Prediction computation time

3 successive steps:
Index computation: a 3-entry XOR gate
Table read
Adder tree
May not fit in a single cycle
But can be ahead pipelined !
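
To illustrate why index computation is cheap, the sketch below builds each index bit as the XOR of three input bits. The function name `table_index` and the choice of which address and history bits feed each XOR are assumptions; the slides only state that a 3-entry XOR gate is used.

```python
def table_index(pc, history, hist_len, index_bits):
    """Compute a table index where each bit is the XOR of three input bits.

    Assumed inputs per index bit: one branch address bit and two history
    bits, taken from the hist_len most recent outcomes.
    """
    mask = (1 << index_bits) - 1
    hist = history & ((1 << hist_len) - 1)  # keep only L(i) history bits
    a = pc & mask                           # low address bits
    b = hist & mask                         # most recent history bits
    c = (hist >> index_bits) & mask         # next slice of history bits
    return a ^ b ^ c


print(table_index(0b101, 0b110011, 6, 3))  # 0
```

With a history length of 0, as for T0, the index reduces to the low bits of the branch address.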

20
Ahead pipelining a global history branch
predictor (principle)

Initiate branch prediction X+1 cycles in advance
to provide the prediction in time
Use information available:
X-block ahead instruction address
X-block ahead history
To ensure accuracy:
Use intermediate path information

21
Practice
Ahead OGEHL: 8 parallel prediction computations
[Figure: labels Ha and A]
22
Ahead Pipelined 64 Kbits OGEHL