Title: Analysis of the OGEHL branch predictor
1 Analysis of the O-GEHL branch predictor
- Optimized GEometric History Length
- André Seznec
- IRISA/INRIA/HIPEAC
2 Objectives
- State of the art accuracy
- Any gain in branch prediction accuracy results in performance gain and power consumption gain
- Keep the design implementable
- Rely on a single prediction scheme
- Use only global history information
- Designers hate maintaining speculative local history
3 The basis: a multiple history length predictor
[Figure: tables T0 to T4 indexed with history lengths L(0) to L(4); how their outputs are combined is left open ("?")]
4 Selecting between multiple predictions
- Classic solution
- Use of a meta-predictor
- wasting storage !?!
- choosing among 5 or 10 predictions ??
- Neural-inspired predictors
- Use an adder tree instead of a meta-predictor
Let's use the adder tree
5 Final computation through a sum
[Figure: the counters read from the tables are summed by an adder tree; Prediction = Sign of the sum]
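The final computation through a sum can be illustrated with a minimal Python sketch. It assumes each table supplies one signed saturating counter and that a sum of exactly zero counts as "taken"; the function name and the example values are hypothetical, not taken from the slides.

```python
def predict(counters):
    """Sum the signed counters read from the tables; the sign is the prediction.

    Treating a sum of exactly zero as "taken" is an assumption of this sketch.
    """
    total = sum(counters)
    return total >= 0, total


# One counter per table (hypothetical values).
taken, total = predict([3, -1, -4, 1])
print(taken, total)
```

The magnitude of the sum is returned along with the direction because it can serve as a confidence estimate when deciding whether to update the counters.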
6 From old experience on 2bcgskew
- Some applications benefit from 100-bit histories
- Some don't !!
7 GEometric History Length predictor
The set of history lengths forms a geometric series:
{0, 2, 4, 8, 16, 32, 64, 128}
What is important: L(i)-L(i-1) is drastically increasing
Spends most of the storage on short histories !!
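As an illustration of the geometric series, here is a small Python sketch that derives the history lengths from the first and last non-zero lengths. It assumes L(0) = 0, L(i) = L(1) × α^(i-1) rounded to the nearest integer, and α chosen so that the last table gets the requested length; the function and parameter names are hypothetical.

```python
def geometric_history_lengths(num_tables, first, last):
    """Return [L(0), ..., L(num_tables - 1)] with L(0) = 0 and a geometric tail.

    Assumption of this sketch: L(i) = round(first * alpha ** (i - 1)), where
    alpha is picked so that L(num_tables - 1) == last.
    """
    if num_tables < 3 or first <= 0 or last < first:
        raise ValueError("need at least 3 tables and 0 < first <= last")
    alpha = (last / first) ** (1.0 / (num_tables - 2))
    lengths = [0]
    for i in range(1, num_tables):
        lengths.append(int(first * alpha ** (i - 1) + 0.5))
    return lengths


lengths = geometric_history_lengths(8, 2, 128)
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
# The gap between consecutive lengths keeps growing.
print([b - a for a, b in zip(lengths, lengths[1:])])
```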
8 Update policy
- Perceptron-inspired, threshold-based
- Perceptron-like threshold did not work
- Reasonable fixed threshold = number of tables
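A threshold-based update in the perceptron style can be sketched in Python as follows. This is an illustration under stated assumptions: counters are signed and saturating, and they are all updated when the prediction was wrong or when the magnitude of the sum does not exceed the threshold. The helper names and the 4-bit default width are choices made for this example.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed counter of the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def update(counters, taken, threshold, bits=4):
    """Return the new counter values after the branch outcome is known."""
    total = sum(counters)
    predicted_taken = total >= 0
    # Update on a misprediction or on a low-confidence correct prediction.
    if predicted_taken != taken or abs(total) <= threshold:
        step = 1 if taken else -1
        return [saturate(c + step, bits) for c in counters]
    return list(counters)


# With 4 tables, the fixed threshold suggested above would be 4.
print(update([3, -1, -4, 1], taken=True, threshold=4))
```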
9 Dynamic update threshold fitting
- On an O-GEHL predictor, best threshold depends on
- the application ?
- the predictor size ?
- the counter width ?
- By chance, on most applications, for the best fixed threshold,
- updates on mispredictions ≈ updates on correct predictions
- Monitor the difference
- and adapt the update threshold
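One way to monitor the difference and adapt the threshold is a single saturating counter, sketched below in Python. The class name, the 7-bit counter width, and the reset-on-saturation behavior are assumptions of this sketch rather than details given on the slide.

```python
class ThresholdFitter:
    """Balance updates on mispredictions against updates on correct predictions."""

    def __init__(self, threshold, counter_bits=7):
        self.threshold = threshold
        self.tc = 0  # signed difference monitor
        self.tc_max = (1 << (counter_bits - 1)) - 1
        self.tc_min = -(1 << (counter_bits - 1))

    def observe(self, mispredicted, total):
        """Record one branch; total is the sum that produced the prediction."""
        if mispredicted:
            self.tc += 1
            if self.tc >= self.tc_max:
                # Too many updates come from mispredictions: raise the threshold.
                self.threshold += 1
                self.tc = 0
        elif abs(total) <= self.threshold:
            self.tc -= 1
            if self.tc <= self.tc_min:
                # Too many updates come from correct predictions: lower it.
                self.threshold -= 1
                self.tc = 0
        return self.threshold


fitter = ThresholdFitter(threshold=8, counter_bits=3)
for _ in range(3):
    fitter.observe(mispredicted=True, total=-1)
print(fitter.threshold)
```

A narrow 3-bit monitor is used in the usage lines only to make the adaptation visible after a handful of branches.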
10 Adaptive history length fitting (inspired by Juan et al. 98)
- (½ of the applications: L(7) < 50)
- (½ of the applications: L(7) > 150)
- Let us adapt some history lengths to the behavior of each application
- 8 tables
- T2: L(2) and L(8)
- T4: L(4) and L(9)
- T6: L(6) and L(10)
11 Adaptive history length fitting (2)
- Intuition
- if high degree of conflicts on T7, stick with short histories
- Implementation
- monitoring of aliasing on updates on T7 through a tag bit and a counter
- Simple is sufficient
- Flipping from short to long histories and vice-versa
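The tag-bit-and-counter monitoring can be sketched in Python as below. Everything beyond "one tag bit per entry plus one counter" is an assumption made for illustration: the tag is taken as the low bit of the branch address, a tag match counts +1, a mismatch (aliasing) counts -4, and the mode flips only when the 9-bit counter saturates. The class and method names are hypothetical; the pairing of T2, T4 and T6 with L(2)/L(8), L(4)/L(9) and L(6)/L(10) follows the previous slide.

```python
class HistoryLengthSelector:
    """Choose short or long histories for T2, T4 and T6 from aliasing seen on T7."""

    def __init__(self, num_entries=1024, counter_bits=9, penalty=4):
        self.tags = [0] * num_entries  # one tag bit per monitored entry
        self.ac = 0                    # aliasing counter
        self.ac_max = (1 << (counter_bits - 1)) - 1
        self.ac_min = -(1 << (counter_bits - 1))
        self.penalty = penalty
        self.use_long = False

    def on_update(self, pc, t7_index):
        """Call on each update of T7; returns True when long histories are selected."""
        slot = t7_index % len(self.tags)
        tag = pc & 1
        if self.tags[slot] == tag:
            self.ac = min(self.ac + 1, self.ac_max)             # no conflict seen
        else:
            self.ac = max(self.ac - self.penalty, self.ac_min)  # aliasing
        self.tags[slot] = tag
        if self.ac == self.ac_max:
            self.use_long = True
        elif self.ac == self.ac_min:
            self.use_long = False  # many conflicts: stick with short histories
        return self.use_long

    def history_indices(self):
        """Index i of the length L(i) currently used by each adaptive table."""
        if self.use_long:
            return {"T2": 8, "T4": 9, "T6": 10}
        return {"T2": 2, "T4": 4, "T6": 6}


selector = HistoryLengthSelector(num_entries=4, counter_bits=3, penalty=2)
for _ in range(3):
    selector.on_update(pc=0, t7_index=0)
print(selector.history_indices())
```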
12 Evaluation framework
- 1st Championship Branch Prediction traces
- 20 traces including system activity
- Floating point apps: loop dominated
- Integer apps: usual SPECINT
- Multimedia apps
- Server workload apps: very large footprint
13 Reference configuration presented for Championship Branch Prediction
- 8 tables
- 2K entries, except T1: 1K entries
- 5-bit counters for T0 and T1, 4-bit counters otherwise
- 1 Kbits of one-bit tags associated with T7
- 10K + 5K + 6x8K + 1K = 64 Kbits
- L(1) = 3 and L(10) = 200
- {0, 3, 5, 8, 12, 19, 31, 49, 75, 125, 200}
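The storage budget of this configuration can be checked with a few lines of Python. The function name and its parameter are hypothetical. Letting T0 and T1 always carry one more bit than the other tables is an assumption that matches the 5-bit/4-bit reference configuration; under that assumption the same arithmetic gives 49 Kbits and 79 Kbits for 3-bit and 5-bit base counters.

```python
K = 1024


def storage_bits(base_counter_bits=4):
    """Total storage in bits for the 8-table configuration described above."""
    wide = base_counter_bits + 1  # T0 and T1 (assumed one bit wider)
    tables = [(2 * K, wide), (1 * K, wide)] + [(2 * K, base_counter_bits)] * 6
    tag_bits = 1 * K              # one-bit tags associated with T7
    return sum(entries * bits for entries, bits in tables) + tag_bits


print(storage_bits() // K)  # 64
```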
14 A case for the OGEHL predictor
- 2nd at CBP: 2.82 misp/KI
- Best practice award
- The predictor the closest to a possible hardware implementation
- Does not use exotic features
- Various prime numbers, etc
- Strange initial state
- Very short warming intervals
- Chaining all simulations: 2.84 misp/KI
15 A case for the OGEHL predictor (2)
- High accuracy
- 32 Kbits (3,150): 3.41 misp/KI
- better than any implementable 128 Kbits predictor before CBP
- 128 Kbits 2bcgskew (6,6,24,48): 3.55 misp/KI
- 176 Kbits PBNP (43): 3.67 misp/KI
- 1 Mbit (5,300): 2.27 misp/KI
- 1 Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI
- 1888 Kbits PBNP (58): 3.23 misp/KI
16 A case for the OGEHL predictor (3)
- Robustness to variations of history length choices
- L(1) in [2,6], L(10) in [125,300]
- misp. rate < 2.96 misp/KI
- Geometric series: not a bad formula !!
- best geometric: L(1) = 3, L(10) = 223: 2.80 misp/KI
- best overall: {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266}: 2.78 misp/KI
17 Impact of the number of components
- 4 components vs 8 components
- 64 Kbits: 3.02 vs 2.84 misp/KI
- 256 Kbits: 2.59 vs 2.44 misp/KI
- 1 Mbit: 2.40 vs 2.27 misp/KI
- 6 components vs 12 components
- 48 Kbits: 3.02 vs 3.03 misp/KI
- 768 Kbits: 2.35 vs 2.25 misp/KI
- 4 to 12 components bring high accuracy ?
18 Impact of counter width
- Robustness to counter width variations
- 3-bit counter, 49 Kbits: 3.09 misp/KI
- Dynamic update threshold fitting helps a lot
- 5-bit counter, 79 Kbits: 2.79 misp/KI
- 4-bit is the best tradeoff
19 Prediction computation time
- 3 successive steps
- Index computation: a 3-entry XOR gate
- Table read
- Adder tree
- May not fit in a single cycle
- But can be ahead pipelined !
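To make the "3-entry XOR gate" concrete, the Python sketch below builds each index bit as the XOR of three input bits: one from the branch address and two from the global history. This is only an illustration. The choice of which bits feed each gate, the function name and the parameters are assumptions, and history bits beyond twice the index width are simply ignored here.

```python
def table_index(pc, history, hist_len, index_bits):
    """Hash the address and the hist_len most recent history bits into an index.

    Bit 0 of history is assumed to be the most recent branch outcome.
    """
    mask = (1 << index_bits) - 1
    hist = history & ((1 << hist_len) - 1)  # keep only the bits this table uses
    low = hist & mask                        # first history slice
    high = (hist >> index_bits) & mask       # second history slice
    return (pc & mask) ^ low ^ high          # three inputs per index bit


print(table_index(pc=0x1234, history=0xFF, hist_len=0, index_bits=8))
```

With a history length of 0 the index reduces to the low address bits, which is the behavior expected of a table using L(0) = 0.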
20 Ahead pipelining a global history branch predictor (principle)
- Initiate branch prediction X+1 cycles in advance to provide the prediction in time
- Use information available
- X-block ahead instruction address
- X-block ahead history
- To ensure accuracy
- Use intermediate path information
21 Practice
[Figure: ahead pipelined OGEHL with 8 parallel prediction computations; labels Ha and A]
22 Ahead pipelined 64 Kbits OGEHL
- 3-block: 2.94 misp/KI
- 4-block: 2.99 misp/KI
- 5-block: 3.04 misp/KI
Not such a huge accuracy loss ?
23 A final case for the O-GEHL predictor
- delivers state-of-the-art accuracy
- uses only global information
- Very long history: 200 bits !!
- can be ahead pipelined
- many effective design points
- Number of tables, counter width, history lengths
- prediction computation logic complexity is low
- (compared with competing predictors ?)
24 The End ?