Title: Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
1. Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
Hyesoon Kim, Onur Mutlu, Jared Stark, Yale N. Patt
- The University of Texas at Austin, Electrical and Computer Engineering
- Oregon Microarchitecture Lab, Intel Corporation
2. Talk Outline
- Problem
- Wish Branches
- Experimental Methodology
- Results
- Conclusion
3. Predicated Execution
(predicated code)
A: p1 = (cond)
B: (!p1) mov b, 1
C: (p1) mov b, 0
D: add x, b, 1
- Convert control flow dependency to data dependency
- Pro: Eliminate hard-to-predict branches
- Cons: (1) Fetch blocks B and C all the time
  (2) Wait until p1 is resolved
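As a rough illustration of the conversion above, the following Python sketch models the same four blocks twice: once as normal branch code and once as predicated code. The function names and the `executed` bookkeeping list are illustrative assumptions, not part of the talk; the sketch only shows that both forms compute the same `x` while the predicated form always passes through blocks B and C.

```python
def branch_version(cond):
    """Normal branch code: only one of blocks B and C is executed."""
    executed = ["A"]
    if cond:                 # branch p1, TARGET
        b = 0                # block C: mov b, 0
        executed.append("C")
    else:
        b = 1                # block B: mov b, 1
        executed.append("B")
    x = b + 1                # block D: add x, b, 1
    executed.append("D")
    return x, executed


def predicated_version(cond):
    """Predicated code: B and C are always fetched; p1 selects the result."""
    executed = ["A"]
    p1 = bool(cond)          # block A: p1 = (cond)
    b = None
    executed.append("B")     # block B: (!p1) mov b, 1
    if not p1:
        b = 1
    executed.append("C")     # block C: (p1) mov b, 0
    if p1:
        b = 0
    x = b + 1                # block D: add x, b, 1 (must wait for b, hence for p1)
    executed.append("D")
    return x, executed
```

Calling both functions with the same `cond` returns the same `x`; only the list of blocks differs, which is the fetch overhead named in the cons above.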
4. The Overhead of Predicated Execution
[Figure: chart; labels: -2, 13, 16, non-predicated]
(Predicated code)
A: p1 = (cond)
B: (!p1) mov b, 1
C: (p1) mov b, 0
D: add x, b, 1
Same code with the predicates replaced by constants:
A: p1 = (cond)
B: (0) mov b,1
C: (1) mov b,0
If all overhead is ideally eliminated, predicated execution would provide a 16% improvement in average execution time
5. The Problem
- Due to the predication overhead, predicated execution sometimes reduces performance
- Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior. The compiler cannot accurately estimate the run-time behavior of branches
7. Wish Branches
- A new type of control flow instruction
  - 3 types: wish jump/join and wish loop
- The compiler generates code (with wish branches) that can be executed either as predicated code or non-predicated code (normal branch code)
- The hardware decides to execute predicated code or normal branch code at run-time based on the confidence of branch prediction
  - Easy to predict: normal branch code
  - Hard to predict: predicated code
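8. Wish Jump/Join
[Figure: control-flow graph of blocks A, B, C, D with Taken / Not-Taken edges. A ends with a wish jump and B ends with a wish join; panels show the High Confidence and Low Confidence cases, with nop shown in place of a wish branch.]
wish jump/join code:
A: p1 = (cond)
   wish.jump p1 TARGET
B: (!p1) mov b,1
   wish.join !p1 JOIN
C: TARGET: (p1) mov b,0
D: JOIN:
High Confidence:
A: p1 = (cond)
   branch p1, TARGET
B: (1) mov b,1
   wish.join (1) JOIN
C: TARGET: (1) mov b,0
D: JOIN:
Low Confidence: nop (the wish branch is not taken and the code runs as predicated code)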
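The two execution modes of the wish jump/join code can be sketched in Python as below. This is a simplified model under stated assumptions, not the hardware described in the talk: `run_wish_jump_join`, its parameters, and the returned tuple are hypothetical names, and a misprediction is modeled only as a boolean flush flag.

```python
def run_wish_jump_join(cond, high_confidence, predicted_taken=None):
    """Model one pass through blocks A, B, C, D.

    Returns (x, blocks, flushed), where `blocks` is the path that finally
    completes and `flushed` says whether a pipeline flush was needed.
    """
    p1 = bool(cond)                        # block A: p1 = (cond)
    if high_confidence:
        # Normal branch mode: the wish jump follows the predicted direction.
        flushed = predicted_taken != p1    # wrong direction -> flush
        blocks = ["A", "C", "D"] if p1 else ["A", "B", "D"]
    else:
        # Predicated mode: wish jump and wish join fall through like nops,
        # so both B and C are fetched and no flush can occur.
        flushed = False
        blocks = ["A", "B", "C", "D"]
    b = 0 if p1 else 1                     # (p1) mov b,0  /  (!p1) mov b,1
    x = b + 1                              # block D: add x, b, 1
    return x, blocks, flushed
```

For example, `run_wish_jump_join(True, high_confidence=False)` passes through all four blocks without a flush, while `run_wish_jump_join(True, high_confidence=True, predicted_taken=False)` reports a flush.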
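9. Wish Loop
[Figure: control-flow graph of blocks H, X, Y with a backward edge on X; edges labeled T (taken) and N (not-taken); panels for Low Confidence and High Confidence.]
wish loop code:
H: mov p1, 1
X: LOOP: (p1) add a, a, 1
         (p1) add i, i, 1
         (p1) p1 = (cond)
         wish.loop p1, LOOP
Y: EXIT:
High Confidence: the three (p1) predicates above are shown as (1).
normal backward branch code:
X: LOOP: add a, a, 1
         add i, i, 1
         p1 = (i<N)
         branch p1, LOOP
Y: EXIT: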
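A minimal Python sketch of the wish loop code in predicated mode follows. It assumes the loop condition is `i < N`, as in the normal backward branch code; the function name `wish_loop` and the parameter `fetched_iterations` (how many times the front end fetches the loop body) are hypothetical and only serve to show that iterations fetched after `p1` becomes false behave like nops.

```python
def wish_loop(n, fetched_iterations):
    """Run the predicated loop body `fetched_iterations` times for trip count n.

    Returns (a, i, useful), where `useful` counts iterations whose predicate
    was true; the remaining fetched iterations change no state.
    """
    a = i = 0
    p1 = True                    # H: mov p1, 1
    useful = 0
    for _ in range(fetched_iterations):
        if p1:
            a += 1               # (p1) add a, a, 1
            i += 1               # (p1) add i, i, 1
            p1 = i < n           # (p1) p1 = (cond), assumed to be i < N
            useful += 1
        # p1 false: the body is fetched but acts as nops
    return a, i, useful
```

With `n = 3`, fetching three or five iterations yields the same final `a` and `i`; fetching only two leaves the loop unfinished, which corresponds to a case that needs recovery.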
10. Mispredicted Case 1: Early-Exit
[Figure: loop control-flow graph H, X, Y with T (taken) and N (not-taken) edges.]
Correct execution: H X1 X2 X3 Y (T T N)
Early-exit (Low confidence): H X1 X2 Y (T N), flush pipeline, then X3 Y (N)
- Compared to normal branch code
  - predicate data dependency and one extra instruction (-)
11. Mispredicted Case 2: Late-Exit
[Figure: loop control-flow graph H, X, Y with T (taken) and N (not-taken) edges.]
Correct execution: H X1 X2 X3 Y (T T N)
Late-exit (Low confidence): H X1 X2 X3 X4 X5 Y (T T T T N); X4 and X5 become nops
- Compared to normal branch code
  - pro: reduce flush penalty (+)
  - cons: predicate data dependency and one extra instruction (-)
12. Mispredicted Case 3: No-Exit
[Figure: loop control-flow graph H, X, Y with T (taken) and N (not-taken) edges.]
Correct execution: H X1 X2 X3 Y (T T N)
No-exit (Low confidence): H X1 X2 X3 X4 X5 X6 (T T T T T T), flush pipeline, then Y
- Compared to normal branch code
  - predicate data dependency and one extra instruction (-)
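The three mispredicted cases can be summarized with a small Python classifier. This is a sketch under simplifying assumptions: the function name, its parameters, and the use of `None` to mean "the front end is still fetching the loop when the exit branch resolves" are all hypothetical modeling choices, not the mechanism from the talk.

```python
def classify_wish_loop_outcome(actual_iters, fetched_iters=None):
    """Classify a low-confidence wish loop prediction.

    `fetched_iters` is the number of loop iterations fetched before the front
    end left the loop; None models a front end that never left the loop
    before the exit branch was resolved. Returns (case, needs_flush).
    """
    if fetched_iters is None:
        return "no-exit", True        # still in the loop: flush, then fetch Y
    if fetched_iters == actual_iters:
        return "correct", False
    if fetched_iters < actual_iters:
        return "early-exit", True     # missing iterations: flush and refetch
    return "late-exit", False         # extra iterations become nops, no flush
```

Using the iteration counts from the slides, `classify_wish_loop_outcome(3, 2)` is an early-exit, `classify_wish_loop_outcome(3, 5)` is a late-exit, and `classify_wish_loop_outcome(3)` is a no-exit.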
13. Advantages/Disadvantages of Wish Branches
- Advantages compared to predicated execution
  - Reduce the overhead of predication
  - Increase the benefits of predicated code by allowing the compiler to generate more aggressively-predicated code
  - Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops)
  - Make predicated code less dependent on machine configuration (e.g., branch predictor)
14. Advantages/Disadvantages of Wish Branches
- Disadvantages compared to predicated execution
  - Extra branch instructions use machine resources
  - Extra branch instructions increase the contention for branch predictor table entries
  - May constrain the compiler's scope for code optimizations
15. Wish Branch Support
- ISA Support
  - predicated execution, wish branch instruction
- Compiler Support
  - Wish branch generation algorithms
  - The compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches
- Hardware Support
  - Confidence estimator
  - Front-end and branch misprediction detection/recovery module
17. Experimental Infrastructure
[Figure: tool flow] Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator
- IA-64 provides full support for predication
- Convert IA-64 traces to micro-ops to simulate an out-of-order superscalar processor model
18. Simulation Methodology
- Nine SPEC 2000 integer benchmarks
- Baseline Processor Configuration
  - Front End
    - Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local)
    - Minimum 30-cycle branch misprediction penalty
    - 64KB, 2-cycle latency I-cache
  - Execution Core
    - 8-wide out-of-order processor
    - 512-entry instruction window
  - Confidence Estimator
    - 1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)
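To make the confidence estimator concrete, here is a simplified Python sketch in the style of a JRS estimator: a table of resetting counters indexed by the branch address combined with global branch history, where a counter at or above a threshold signals high confidence. The class name, table size, counter width, threshold, and XOR indexing are assumptions chosen for illustration, and tagging is omitted; the configuration used in the talk may differ.

```python
class JRSConfidenceEstimator:
    """Untagged resetting-counter confidence estimator (illustrative sketch)."""

    def __init__(self, entries=1024, history_bits=16, counter_max=15, threshold=15):
        self.entries = entries                      # assumed table size
        self.history_mask = (1 << history_bits) - 1
        self.counter_max = counter_max              # assumed saturation value
        self.threshold = threshold                  # assumed confidence threshold
        self.table = [0] * entries
        self.history = 0                            # global branch history register

    def _index(self, pc):
        # Combine branch address and history (XOR indexing is an assumption).
        return (pc ^ self.history) % self.entries

    def is_high_confidence(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct, taken):
        idx = self._index(pc)
        if prediction_correct:
            # Count consecutive correct predictions, saturating at counter_max.
            self.table[idx] = min(self.table[idx] + 1, self.counter_max)
        else:
            # Reset on a misprediction.
            self.table[idx] = 0
        # Shift the branch outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

In a wish-branch front end, a high-confidence result for a wish branch would select normal branch code and a low-confidence result would select predicated code.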
20. Performance Improvement
[Figure: performance chart; labels: -4, 14, 2.02, 8, 24, non-predicated]
- 16% over conditional branch prediction (w/o mcf), 11% over selective-predication (w/o mcf), 7% over aggressive predication (w/o mcf)
- 14% over conditional branch prediction, 13% over selective-predication, and 16% over aggressive-predication
- 12% over conditional branch prediction, 11% over selective-predication, 13% over aggressive predication
AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated
SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis
22. Conclusion
- New control flow instructions: wish branches (jump/join/loop)
- Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture
  - Compiler analyzes the control-flow graph and generates code
  - Microarchitecture makes run-time decision to use predication
- Wish branches provide significant performance benefits
  - 16% compared to conditional branch prediction
  - 13% compared to selectively predicated code
- Wish branches can make predicated execution more viable and effective in high performance processors
  - By enabling adaptive and aggressive predicated execution