Performance-Aware Speculation Control using Wrong Path Usefulness Prediction - PowerPoint PPT Presentation

Slides: 27
Provided by: ece9
Learn more at: http://users.ece.cmu.edu


1
Performance-Aware Speculation Control using Wrong Path Usefulness Prediction
  • Chang Joo Lee
  • Hyesoon Kim
  • Onur Mutlu
  • Yale N. Patt

HPS Research Group, University of Texas at Austin
School of Computer Science, Georgia Institute of Technology
Microsoft Research
2
Outline
  • Motivation
  • Mechanism
  • Experimental Evaluation
  • Conclusion

3
Fetch Gating (Pipeline Gating)
  • Proposed by Manne et al. [ISCA98]
  • Stops fetching instructions when the processor is likely on the
    wrong path, to save energy.
  • Assumes wrong-path instructions consume energy but do not
    contribute to performance.
  • Various fetch gating mechanisms
    • Baniasadi and Moshovos [ISLPED01], Karkhanis et
      al. [ISLPED02], Aragon et al. [HPCA03],
      Buyuktosunoglu et al. [GLSVLSI03], Collins et
      al. [MICRO04]
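Manne-style pipeline gating can be sketched as follows. This is an illustrative Python model, not the original hardware: it assumes a JRS-style table of 4-bit resetting miss distance counters (MDCs) indexed by branch PC XOR global history, treats a branch as high confidence only when its counter is saturated, and stalls fetch while more than the gating threshold of low-confidence branches are unresolved. The 4K entries, 4-bit counters, and threshold of 3 follow the evaluation setup; the indexing, the saturation rule, and the strict `>` comparison are assumptions, and all class names are hypothetical.

```python
TABLE_ENTRIES = 4096   # 4K-entry JRS table (from the evaluation setup)
MDC_MAX = 15           # 4-bit miss distance counter saturates at 15
GATE_THRESHOLD = 3     # gating threshold: 3 low-confidence branches


class JRSEstimator:
    """Resetting-counter confidence estimator (JRS-style sketch)."""

    def __init__(self) -> None:
        self.mdc = [0] * TABLE_ENTRIES

    def _index(self, pc: int, ghist: int) -> int:
        # Assumed indexing: branch PC XOR global history, modulo table size.
        return (pc ^ ghist) % TABLE_ENTRIES

    def is_high_confidence(self, pc: int, ghist: int) -> bool:
        # Assumed rule: high confidence only when the counter is saturated.
        return self.mdc[self._index(pc, ghist)] == MDC_MAX

    def update(self, pc: int, ghist: int, correct: bool) -> None:
        i = self._index(pc, ghist)
        # Count correct predictions since the last miss; reset on a misprediction.
        self.mdc[i] = min(self.mdc[i] + 1, MDC_MAX) if correct else 0


class PipelineGate:
    """Stall fetch while too many low-confidence branches are unresolved."""

    def __init__(self, threshold: int = GATE_THRESHOLD) -> None:
        self.threshold = threshold
        self.low_conf_in_flight = 0

    def on_fetch_branch(self, high_conf: bool) -> None:
        if not high_conf:
            self.low_conf_in_flight += 1

    def on_resolve_branch(self, high_conf: bool) -> None:
        # `high_conf` is the confidence recorded when the branch was fetched.
        if not high_conf:
            self.low_conf_in_flight -= 1

    def gate_fetch(self) -> bool:
        return self.low_conf_in_flight > self.threshold
```

The estimator table is the kind of "critical/power-hungry" extra structure the later slides aim to avoid.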

4
Limitations of Previous Mechanisms
  • Hardware complexity
    • Branch confidence estimator, changes to
      critical/power-hungry structures
    • Additional hardware can offset the energy savings
      of fetch gating.
  • Assumption
    • Wrong-path execution consumes energy but is
      useless for performance.

5
Is Wrong-Path Execution Really Useless?
  • Perfect fetch gating

Perfect fetch gating increases the performance of most
benchmarks.
mcf: performance degrades by 30% and energy
consumption increases by 15%.
parser: energy consumption decreases by 28% but
performance degrades by 5%.
6
Why Does Performance Degrade with Perfect Fetch
Gating?
[Figure: wrong-path L2 cache fills for mcf (MPKI 36.6) and parser (MPKI 1.5)]
mcf: almost all wrong-path L2 fills are used, and mcf is
memory intensive (MPKI 36.6) → 30% performance
degradation with perfect fetch gating
parser: 37% used wrong-path fills, 14% unused
wrong-path fills → 5% performance
degradation with perfect fetch gating
Wrong-path execution can prefetch useful
data [Butler Thesis93, Pierce and Mudge IPPS94,
MICRO96, Mutlu et al. IEEE TC05]
7
Why Can Wrong-Path Execution Be Useful?
  • From mcf
    • Hammock structure within a frequently executed
      loop
    • BR in BB2 is frequently mispredicted
  • Since memory latency is large, the wrong-path
    prefetching benefit can be significant
  • Taking wrong-path usefulness into account is
    important

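[Figure: hammock in mcf. BB2 ends with BR BB4, which is mispredicted. Wrong path (not-taken): BB3 executes Load A and Load B (L2 cache misses), then JMP BB5, where Load C also misses in the L2 cache. After misprediction recovery, the correct path (taken) executes Load A and Load B in BB4 and then Load C in BB5, all of which hit in the cache.]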
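A toy Python model of this hammock shows why removing wrong-path loads can hurt. It is an idealized sketch: the cache is unbounded, every wrong-path fill is assumed to complete before the correct path reaches it, and misses are charged serially with no memory-level parallelism. Only the block structure and the 300-cycle memory latency come from the slides; `correct_path_miss_cycles` is a hypothetical helper.

```python
MISS_LATENCY = 300  # cycles, matching the 300-cycle memory latency in the setup

WRONG_PATH = ["A", "B", "C"]    # Load A, Load B in BB3, then Load C in BB5
CORRECT_PATH = ["A", "B", "C"]  # Load A, Load B in BB4, then Load C in BB5


def correct_path_miss_cycles(fetch_gated: bool) -> int:
    """Stall cycles the correct path spends on misses (serial, idealized)."""
    cache: set[str] = set()
    if not fetch_gated:
        # Wrong-path loads miss, but their fills still land in the cache.
        cache.update(WRONG_PATH)
    stall = 0
    for addr in CORRECT_PATH:
        if addr not in cache:
            stall += MISS_LATENCY
            cache.add(addr)
    return stall
```

With fetch gating, the correct path pays all three misses (900 cycles in this model); without it, the wrong-path fills turn them into hits. Real hardware sits in between, since a correct-path load often finds its line still in flight and hides only part of the latency.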
8
Outline
  • Motivation
  • Mechanism
  • Experimental Evaluation
  • Conclusion

9
Our Solution: Performance-Aware Speculation
Control
  • Hardware complexity → simple, low-cost fetch gating
    mechanism
  • Wrong-path usefulness → low-cost Wrong Path
    Usefulness Predictor (WPUP)

[Figure: Performance-Aware Speculation Control. The fetch gating logic receives the branch count and the WPUP lookup result (Useful) and drives Gate Enable to the fetch engine.]
Fetch gate only when wrong-path execution is
useless.
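The gate-enable decision in this block diagram reduces to a single condition. A minimal sketch, assuming the branch count and threshold come from the branch-count fetch gating logic and `wpup_useful` is the WPUP's prediction; the function name is hypothetical.

```python
def gate_enable(branch_count: int, threshold: int, wpup_useful: bool) -> bool:
    """Gate fetch only if the fetch gating logic wants to gate
    and the WPUP does not predict the wrong path to be useful."""
    return branch_count > threshold and not wpup_useful
```

Because the WPUP only vetoes gating, it can sit in front of any fetch gating policy that produces a gate/no-gate signal.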
10
Our Fetch Gating Mechanism
  • Branch-count-based mechanism
    • More branches → higher chance of misprediction.
    • Fetch gate if (# of branches) > threshold
  • Mispredictions show phase behavior.
    • Threshold is determined by branch prediction
      accuracy over a certain period.
    • Higher accuracy → higher threshold
  • No need for complex logic (e.g., a confidence
    estimator)
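A minimal Python sketch of branch-count fetch gating follows. It assumes that "# of branches" means branches fetched but not yet resolved, and that a misprediction squashes all younger in-flight branches; the class and method names are hypothetical, and in the evaluated design the threshold would come from the accuracy-based table in the methodology.

```python
class BranchCountGate:
    """Gate fetch when too many branches are in flight (illustrative sketch)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.in_flight = 0  # branches fetched but not yet resolved

    def on_fetch_branch(self) -> None:
        self.in_flight += 1

    def on_resolve_correct(self) -> None:
        self.in_flight -= 1

    def on_resolve_mispredict(self, younger_in_flight: int) -> None:
        # The mispredicted branch resolves and all younger branches are squashed.
        self.in_flight -= 1 + younger_in_flight

    def should_gate(self) -> bool:
        return self.in_flight > self.threshold
```

Unlike confidence-based gating, this needs only a counter and a comparator, with no per-branch table.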

11
Two WPUP Mechanisms
  • Branch PC-based WPUP (Fine grained)
  • Phase-based WPUP (Coarse grained)

Can be combined with other fetch gating
mechanisms.
12
Branch PC-based WPUP
  • Basic idea
    • Identifies and records conditional branch PCs
      that lead to useful wrong-path memory references
    • If the fetched branch is recorded as useful, do
      not fetch gate

13
Branch PC-based WPUP
  • Implementation
    • Fetch engine
      • Latest Branch PC Register (LBPC, 16 bits)
      • LBPC value carried through the pipeline
    • Miss Status Holding Registers (MSHR)
      • Branch ID field (BID, 10 bits)
        • Already used for branch misprediction recovery
      • Branch PC field (BPC, 16 bits)
      • Wrong Path field (WP, 1 bit)
    • WPUP cache
      • 4-way set-associative, no data store, LRU
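The LBPC and the added MSHR fields can be modeled as below. A sketch under stated assumptions: the LBPC keeps the low-order 16 bits of the most recently fetched branch's PC (which 16 bits the hardware keeps is not stated on the slide), every later instruction carries that value, and `FetchEngine`, `MSHREntry`, and `low_bits` are hypothetical names.

```python
from dataclasses import dataclass

LBPC_BITS = 16  # Latest Branch PC Register width
BID_BITS = 10   # Branch ID width (existing field)


def low_bits(value: int, bits: int) -> int:
    """Keep the low-order `bits` bits of `value` (assumed truncation scheme)."""
    return value & ((1 << bits) - 1)


class FetchEngine:
    def __init__(self) -> None:
        self.lbpc = 0  # Latest Branch PC Register

    def fetch(self, pc: int, is_branch: bool) -> int:
        """Fetch one instruction; return the LBPC value it carries down the pipeline."""
        if is_branch:
            self.lbpc = low_bits(pc, LBPC_BITS)
        return self.lbpc


@dataclass
class MSHREntry:
    addr: int
    bid: int          # BID, 10 bits: already used for misprediction recovery
    bpc: int          # BPC, 16 bits: LBPC value of the load that missed
    wp: bool = False  # WP, 1 bit: set if the miss came from the wrong path

    def __post_init__(self) -> None:
        if not 0 <= self.bid < (1 << BID_BITS):
            raise ValueError("BID does not fit in 10 bits")
        if not 0 <= self.bpc < (1 << LBPC_BITS):
            raise ValueError("BPC does not fit in 16 bits")
```

Only BPC and WP are new per MSHR entry; BID already exists for misprediction recovery.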

14
Branch PC-Based WPUP (Training)

[Figure: training example. BR 2 in BB2 (PC2) is fetched, so LBPC = PC2, and the branch unit assigns it BID 2. The branch is mispredicted. On the wrong path, Load A and Load B in BB3 and Load C in BB5 carry PC2 and BID 2 and miss in the L2 cache, allocating MSHR entries. After misprediction recovery, Load A in BB4 on the correct path accesses the MSHR.]

MSHR
Addr  BID  BPC  WP
A     2    PC2  0 → 1
B     2    PC2  0 → 1
C     2    PC2  0 → 1

MSHR hit → wrong path was useful. BPC 2 is stored
in the WPUP cache.
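The training step can be sketched in Python as below. Assumptions, labeled as such: addresses are strings as in the slide's example, the WPUP cache is replaced by a plain set, recovery marks every MSHR entry whose BID is among the squashed branch IDs, and training happens only on an MSHR hit (a correct-path access that arrives after the fill has completed is not detected here). `WPUPTrainer` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MSHREntry:
    bid: int
    bpc: int
    wp: bool = False


class WPUPTrainer:
    def __init__(self) -> None:
        self.mshr: dict[str, MSHREntry] = {}
        self.wpup: set[int] = set()  # stand-in for the WPUP cache

    def l2_miss(self, addr: str, bid: int, lbpc: int) -> str:
        """Handle an L2 miss; return 'allocate' or 'mshr_hit'."""
        entry = self.mshr.get(addr)
        if entry is None:
            self.mshr[addr] = MSHREntry(bid=bid, bpc=lbpc)
            return "allocate"
        if entry.wp:
            # A wrong-path miss is needed again: the wrong path was useful.
            self.wpup.add(entry.bpc)
        return "mshr_hit"

    def recover(self, squashed_bids: set[int]) -> None:
        """Misprediction recovery: mark misses issued under squashed branches."""
        for entry in self.mshr.values():
            if entry.bid in squashed_bids:
                entry.wp = True

    def fill(self, addr: str) -> None:
        """The memory request completes and frees its MSHR entry."""
        self.mshr.pop(addr, None)
```

Replaying the slide's example (three wrong-path misses under BID 2, recovery, then Load A on the correct path) leaves PC2 in `wpup`.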
15
Branch PC-Based WPUP (Prediction)

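[Figure: prediction example. When BR 2 in BB2 (PC2) is fetched again, LBPC = PC2 and PC2 is looked up in the WPUP cache (fields: Addr, LRU), which holds PC2 from training, to decide whether to fetch gate. If the branch is mispredicted and fetch is not gated, wrong-path execution proceeds through BB3 and BB5.]
Hit → do not fetch gate.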
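A tag-only WPUP cache and the prediction check can be sketched as below. The slides specify 4 ways, no data store, and LRU replacement; the 16-set geometry, splitting the BPC into index and tag by modulo, and the names `WPUPCache` and `allow_gating` are assumptions for illustration.

```python
class WPUPCache:
    """Set-associative, tag-only (no data store) cache with LRU replacement."""

    def __init__(self, num_sets: int = 16, ways: int = 4) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of tags ordered from LRU (front) to MRU (back).
        self.sets: list[list[int]] = [[] for _ in range(num_sets)]

    def _split(self, bpc: int) -> tuple[int, int]:
        return bpc % self.num_sets, bpc // self.num_sets

    def lookup(self, bpc: int) -> bool:
        index, tag = self._split(bpc)
        s = self.sets[index]
        if tag in s:
            s.remove(tag)
            s.append(tag)  # a hit makes this tag most recently used
            return True
        return False

    def insert(self, bpc: int) -> None:
        index, tag = self._split(bpc)
        s = self.sets[index]
        if tag in s:
            s.remove(tag)
        elif len(s) == self.ways:
            s.pop(0)  # evict the least recently used tag
        s.append(tag)


def allow_gating(cache: WPUPCache, branch_pc: int) -> bool:
    """A hit means this branch's wrong path was useful, so do not fetch gate."""
    return not cache.lookup(branch_pc)
```

Storing tags only keeps each entry at a fraction of the 16-bit BPC plus LRU state.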
16
Phase-based WPUP
  • Basic idea
    • Predict whether the current phase will provide useful
      wrong-path memory references
    • If so, do not fetch gate

17
Phase-based WPUP
  • Implementation
    • Wrong Path Usefulness Counter (WPUC, 5 bits)
      • Incremented for each useful wrong-path memory
        reference
      • Reset periodically
    • Do not fetch gate if WPUC > threshold
    • No BPC fields or WPUP cache needed
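The phase-based predictor needs only a counter. A minimal sketch: the 5-bit WPUC, periodic reset, and the "do not fetch gate if WPUC > threshold" rule come from the slide, while saturation at 31, the threshold value, the reset period in cycles, and comparing against the running count (rather than the previous period's final count) are assumptions; `PhaseWPUP` is a hypothetical name.

```python
class PhaseWPUP:
    WPUC_MAX = (1 << 5) - 1  # 5-bit counter; saturating at 31 is an assumption

    def __init__(self, threshold: int, period: int) -> None:
        self.threshold = threshold  # hypothetical value
        self.period = period        # hypothetical reset interval, in cycles
        self.wpuc = 0
        self.cycle = 0

    def on_useful_wrong_path_reference(self) -> None:
        self.wpuc = min(self.wpuc + 1, self.WPUC_MAX)

    def tick(self) -> None:
        self.cycle += 1
        if self.cycle % self.period == 0:
            self.wpuc = 0  # periodic reset starts a new phase

    def allow_gating(self) -> bool:
        # Do not fetch gate if WPUC > threshold.
        return self.wpuc <= self.threshold
```

A useful wrong-path reference can still be detected as an MSHR hit on an entry with WP set; only the BPC bookkeeping goes away.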

18
Outline
  • Motivation
  • Mechanism
  • Experimental Evaluation
  • Conclusion

19
Simulation Methodology
  • Alpha ISA execution-driven simulator
  • Baseline processor configuration
    • 2 GHz, 8-wide issue, out-of-order, 128-entry ROB
    • Hybrid branch predictor (64K-entry gshare and
      64K-entry PAs)
    • 11 stages (minimum branch misprediction penalty)
    • 1 MB, 8-way unified L2 cache
    • 32 L2 MSHRs, 300-cycle memory latency
    • Stream prefetcher
  • Wattch power model: 100 nm, 1.2 V technology
  • Manne's fetch gating
    • Gating threshold: 3 low-confidence branches
    • JRS confidence estimator (4K-entry, 4-bit MDC)
    • Tuned for the best energy-delay product
  • Branch count-based fetch gating

    BP accuracy (%)  100-99  99-97  97-95  95-93  93-90  90-85  85-0
    Threshold            18     16     13     12     11      7     3
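The accuracy-to-threshold table can be applied as a simple lookup. A sketch assuming each range includes its lower bound (so exactly 99% maps to 18), accuracy is measured over the most recent period, and a period with no branches counts as 100% accurate; `threshold_for` and `THRESHOLD_TABLE` are hypothetical names.

```python
# (lower bound of the accuracy range in %, gating threshold), from the table above.
THRESHOLD_TABLE = [(99.0, 18), (97.0, 16), (95.0, 13), (93.0, 12),
                   (90.0, 11), (85.0, 7), (0.0, 3)]


def threshold_for(correct: int, total: int) -> int:
    """Branch-count gating threshold for the accuracy seen in the last period."""
    accuracy = 100.0 * correct / total if total else 100.0
    for lower, threshold in THRESHOLD_TABLE:
        if accuracy >= lower:
            return threshold
    return THRESHOLD_TABLE[-1][1]
```

Higher accuracy yields a higher threshold, so well-predicted phases are rarely gated.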
20
Branch-Count Based Fetch Gating
Performance and energy savings are higher than with
Manne's fetch gating.
Both Manne's and our fetch gating degrade the performance
of mcf and parser.
21
WPUP Mechanisms
WPUP mechanisms improve performance and energy savings compared
to Manne's fetch gating.
They improve the performance of mcf and parser.
22
Hardware Cost
Performance-Aware Speculation Control
vs. Manne's Fetch Gating

                    Hardware cost
                    Fetch Gating   WPUP    Total
Manne               2049 B         -       2049 B
FG-BR/PC-WPUP       6 B            260 B   266 B
FG-BR/PHASE WPUP    6 B            45 B    51 B
23
Comparison with Manne's Fetch Gating
WPUPs improve the performance and energy efficiency
of Manne's fetch gating:
2.5% less performance degradation, 1.0% more
energy savings
24
Energy-Delay Product
Improves energy-delay product (by 2.6% compared to
Manne's fetch gating)
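The energy-delay product multiplies energy by execution time, so relative results can be compared directly. A small helper, assuming energy and delay are already normalized to a common baseline; the function names are hypothetical and the example values below are made up, not results from the slides.

```python
def normalized_edp(energy: float, delay: float) -> float:
    """EDP relative to a baseline, given energy and delay normalized to it."""
    return energy * delay


def edp_improvement(edp_new: float, edp_ref: float) -> float:
    """Fractional EDP reduction of `edp_new` relative to `edp_ref`."""
    return 1.0 - edp_new / edp_ref
```

For example, 10% lower energy at 2% longer execution time gives a normalized EDP of 0.9 × 1.02 = 0.918, an 8.2% improvement over a baseline EDP of 1.0.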
25
Conclusion
  • Performance-Aware Speculation Control
    • Branch count-based fetch gating
      • Simple and low cost.
    • Introduced Wrong Path Usefulness Prediction
      • Recovers performance loss due to fetch gating by
        executing useful wrong-path instructions.
      • Can be combined with other fetch gating
        mechanisms.
    • Reduces performance loss due to fetch gating and
      also saves energy.

26
Questions?