CS 7810 Lecture 13 - PowerPoint PPT Presentation

Slides: 22
Provided by: rajeevbala


1
CS 7810 Lecture 13
Pipeline Gating: Speculation Control for Energy Reduction
S. Manne, A. Klauser, D. Grunwald
Proceedings of ISCA-25, June 1998
2
Cost of Speculation
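[Figure lost in extraction. Surviving values, labelled "Mispredict rates": 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7. Benchmark names and units are not recoverable.]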
3
Pipeline Gating
  • Low-confidence branches throttle instruction fetch until they are resolved
  • Pipeline gating usually lasts for fewer than five cycles
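Below is a minimal sketch of the gating rule. It rests on our own assumptions. Fetch stalls in any cycle where the number of unresolved low-confidence branches is at or above a gating threshold N. The threshold is named `gating_threshold` here, a hypothetical name. All class and method names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class GatingFetchUnit:
    """Toy model of pipeline gating (hypothetical names, not from the paper).

    Fetch is gated in any cycle where the number of unresolved
    low-confidence branches is at or above `gating_threshold` (N).
    """
    gating_threshold: int = 1
    low_conf_in_flight: int = 0
    gated_cycles: int = 0

    def on_branch_fetched(self, low_confidence: bool) -> None:
        # The confidence estimator labels each branch when it is fetched.
        if low_confidence:
            self.low_conf_in_flight += 1

    def on_branch_resolved(self, low_confidence: bool) -> None:
        # A resolved low-confidence branch no longer holds the gate closed.
        if low_confidence:
            self.low_conf_in_flight -= 1

    def tick(self) -> bool:
        """Advance one cycle; return True if fetch may proceed this cycle."""
        allowed = self.low_conf_in_flight < self.gating_threshold
        if not allowed:
            # Count cycles in which fetch was throttled.
            self.gated_cycles += 1
        return allowed
```

With `gating_threshold=2`, one unresolved low-confidence branch does not stop fetch, but a second one does. Fetch resumes as soon as either branch resolves, which is why gating episodes tend to be short.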

4
Metrics
  • SPEC (specificity): fraction of all mispredicted branches detected as low-confidence by the confidence estimator (coverage)
  • PVN (predictive value of a negative test): probability of a low-confidence branch being incorrectly branch-predicted (accuracy)
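Both metrics can be computed from a trace of branches. Each branch carries two flags: whether the estimator marked it low-confidence, and whether it was actually mispredicted. The sketch below assumes that record format; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def spec_and_pvn(records: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (SPEC, PVN) from per-branch (low_confidence, mispredicted) records.

    SPEC = low-confidence mispredicts / all mispredicts       (coverage)
    PVN  = low-confidence mispredicts / all low-confidence    (accuracy)
    A metric whose denominator is zero is reported as 0.0 (a choice made here).
    """
    mispredicts = 0
    low_conf = 0
    low_conf_mispredicts = 0
    for is_low_conf, is_mispredicted in records:
        mispredicts += int(is_mispredicted)
        low_conf += int(is_low_conf)
        low_conf_mispredicts += int(is_low_conf and is_mispredicted)
    spec = low_conf_mispredicts / mispredicts if mispredicts else 0.0
    pvn = low_conf_mispredicts / low_conf if low_conf else 0.0
    return spec, pvn
```

Take a made-up trace with 4 mispredicts, 3 of them flagged low-confidence, and 6 flagged branches in total. It gives SPEC = 3/4 = 0.75 and PVN = 3/6 = 0.5.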

5
Confidence Estimators
  • Perfect: to gauge potential benefits
  • Static: branches that have low prediction rates
  • JRS: if a branch has yielded N successive correct predictions, it has high confidence
  • Saturating counters: an unbiased counter value or disagreement between two predictors → low confidence
  • Distance: mispredicts are clustered, hence the first 4 branches after a mispredict have low confidence
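The sketch below covers two of these estimators. The class names, table size and default threshold are hypothetical.

- `JRSEstimator` keeps a per-branch resetting counter of consecutive correct predictions. It is indexed by PC only, which is a simplification; a real design may also fold branch history into the index.
- `DistanceEstimator` marks the first 4 branches after a mispredict as low confidence, as on the slide.

```python
class JRSEstimator:
    """JRS-style estimator: per-branch counter of consecutive correct predictions.

    Assumption: a PC-only index and a table of `table_size` counters.
    """

    def __init__(self, threshold: int = 15, table_size: int = 1024) -> None:
        self.threshold = threshold        # N successive correct predictions
        self.table_size = table_size
        self.counters = [0] * table_size

    def _index(self, pc: int) -> int:
        return pc % self.table_size

    def high_confidence(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, correct: bool) -> None:
        i = self._index(pc)
        if correct:
            # Saturate at the threshold; more history adds no information.
            self.counters[i] = min(self.counters[i] + 1, self.threshold)
        else:
            # Any mispredict resets the run of correct predictions.
            self.counters[i] = 0


class DistanceEstimator:
    """Low confidence for the first `window` branches after a mispredict."""

    def __init__(self, window: int = 4) -> None:
        self.window = window
        self.since_mispredict = window    # start out in high confidence

    def high_confidence(self) -> bool:
        return self.since_mispredict >= self.window

    def update(self, correct: bool) -> None:
        if correct:
            self.since_mispredict = min(self.since_mispredict + 1, self.window)
        else:
            self.since_mispredict = 0
```

The JRS estimator is per-branch, so it captures branches that are hard to predict. The distance estimator is global and exploits the clustering of mispredicts.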

6
SPEC and PVN
SPEC (coverage): % of mispredicted branches detected by the low-confidence estimator
PVN (accuracy): % of low-confidence branches that are branch mispredicts
  • It is easier to achieve a high SPEC value than a high PVN value
  • A high PVN value can be achieved by using N low-confidence branches to invoke gating: if PVN is 30%, re-defining low-confidence as two low-confidence branches increases PVN to 51%
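A simple model reproduces the jump from 30% to 51%. It assumes the low-confidence branches in flight are mispredicted independently, each with probability equal to the single-branch PVN $p$. That independence assumption is ours, not the slide's. Gating on N such branches is then justified whenever at least one of them is mispredicted:

```latex
\mathrm{PVN}_N = 1 - (1 - p)^N,
\qquad
\mathrm{PVN}_2 = 1 - (1 - 0.30)^2 = 1 - 0.49 = 0.51
```

The price is that fewer mispredicts trigger gating, because gating now waits for a second low-confidence branch. Real mispredicts are also clustered, so they are not independent, and measured values can deviate from this estimate.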

7
Perfect
8
Gating Results
9
Results
  • Can gating improve performance? Only if cache pollution from wrong-path instructions is significant
  • Less than 1% performance loss and up to 38% reduction in extra work
  • Energy consumption could go up: some work is independent of the number of executed instructions (e.g., clock distribution), so increased execution time can increase energy
  • Pipeline gating should reduce power consumption
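The energy caveat can be made concrete with a deliberately simple model. This model is an assumption for illustration, not the paper's energy model. Energy is split into two parts:

- a part proportional to execution time $T$ (e.g., clock distribution), drawing power $P_{\text{fixed}}$;
- a part proportional to the number of executed instructions $N_{\text{exec}}$, including wrong-path ones, at $e_{\text{instr}}$ each.

Suppose gating removes $\Delta N$ executed instructions and lengthens execution by $\Delta T \ge 0$:

```latex
\begin{aligned}
E &= P_{\text{fixed}}\,T + e_{\text{instr}}\,N_{\text{exec}} \\
\Delta E &= P_{\text{fixed}}\,\Delta T - e_{\text{instr}}\,\Delta N \\
\Delta E < 0 &\iff e_{\text{instr}}\,\Delta N > P_{\text{fixed}}\,\Delta T
\end{aligned}
```

Under this model, average power $E/T = P_{\text{fixed}} + e_{\text{instr}}\,N_{\text{exec}}/T$ always drops when $N_{\text{exec}}$ falls and $T$ grows. That matches the last bullet even in cases where total energy does not drop.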

10
Results
11
CS 7810 Lecture 13
Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28, July 2001
12
Leakage Power Trends
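  • Circuit delay ∝ 1/(V − Vth), so lowering the supply voltage without losing speed requires lowering Vth
  • Leakage ∝ number of transistors (increasing) × supply voltage (decreasing) × an exponential term in the threshold voltage, which grows as Vth is lowered (increasing)
  • L1 and L2 caches are the biggest contributors (high transistor budgets)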
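The exponential term can be written out with the standard first-order subthreshold-leakage approximation. This is a textbook model, not taken from the slides. Here $V_T = kT/q$ is the thermal voltage and $n$ is the subthreshold slope factor:

```latex
\begin{aligned}
P_{\text{leak}} &\propto N_{\text{transistors}} \cdot V_{dd} \cdot e^{-V_{th}/(n V_T)} \\
\frac{P_{\text{leak}}(V_{th} - \Delta V_{th})}{P_{\text{leak}}(V_{th})} &= e^{\Delta V_{th}/(n V_T)}
\end{aligned}
```

At room temperature, $V_T \approx 26$ mV, and $n$ lies between roughly 1 and 1.5. Each reduction of $n V_T \ln 10$, about 60 to 90 mV, in $V_{th}$ therefore multiplies leakage by about 10×. This is why the low-$V_{th}$ term dominates the trend even as $V_{dd}$ falls.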

13
Vdd-Gating
  • Leakage can be reduced by gating off the supply voltage to the circuit
  • When applied to a cache, the contents of the SRAM cells are lost
  • Cache decay: apply Vdd-gating when you do not care about the cache contents

14
Lifetime of a Cache Line
15
Overheads
  • Hardware to determine when to decay
  • Introduces additional cache misses
  • Normalized cache leakage power:
      • Active ratio (fraction of the cache that is powered on)
      • (Counter overhead / Leak) × activity
      • (L2 access energy / Leak) × num-misses
  • Increased execution time (< 0.7%)
  • L2 access/leakage ratio is 9
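The three terms above can be combined into a small calculator, under stated assumptions:

- the terms are simply added;
- each term is already normalized to the leakage of the cache without decay, so the result is a fraction of baseline leakage;
- `activity` and `extra_misses` are expressed in whatever units make their ratios dimensionless.

The function and parameter names are hypothetical. Only the ratio of 9 comes from the slide.

```python
L2_ACCESS_TO_LEAK = 9.0  # L2 access/leakage ratio quoted on the slide


def normalized_leakage(active_ratio: float, counter_to_leak: float,
                       activity: float, l2_access_to_leak: float,
                       extra_misses: float) -> float:
    """Normalized cache leakage with decay (hypothetical helper).

    active_ratio:      fraction of the cache that is powered on, in [0, 1]
    counter_to_leak:   decay-counter overhead relative to cache leakage
    activity:          counter switching activity (assumed normalized)
    l2_access_to_leak: energy of one L2 access relative to leakage
    extra_misses:      decay-induced misses (assumed normalized)
    """
    if not 0.0 <= active_ratio <= 1.0:
        raise ValueError("active_ratio must lie in [0, 1]")
    return (active_ratio
            + counter_to_leak * activity
            + l2_access_to_leak * extra_misses)
```

With made-up inputs (30% of lines powered on, counter cost ignored, and 0.01 normalized extra misses), the result is 0.3 + 9 × 0.01 = 0.39. Each induced miss is weighted heavily, so an overly aggressive decay interval can erase much of the saving.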

16
Skier's Dilemma
New skis: 400; ski rental: 20 per trip
Heuristic: buy skis once the total rental cost equals the purchase price

Ski trips     5    10    15    20    25    50
Optimal     100   200   300   400   400   400
Heuristic   100   200   300   800   800   800

Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far
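The table can be reproduced with a few lines of code. The function names are hypothetical. Prices default to the slide's values of 400 and 20. `optimal_cost` knows the number of trips in advance. `heuristic_cost` only sees one trip at a time and buys once its rental spending reaches the purchase price.

```python
def optimal_cost(trips: int, rent: int = 20, buy: int = 400) -> int:
    """Offline optimum: knowing the trip count, either rent every time or buy at once."""
    return min(trips * rent, buy)


def heuristic_cost(trips: int, rent: int = 20, buy: int = 400) -> int:
    """Online heuristic: rent until total rent paid equals the purchase price, then buy."""
    paid_rent = 0
    for _ in range(trips):
        paid_rent += rent
        if paid_rent >= buy:
            # Rental spending has reached the purchase price: buy and stop renting.
            return paid_rent + buy
    return paid_rent


for t in [5, 10, 15, 20, 25, 50]:
    print(t, optimal_cost(t), heuristic_cost(t))
```

In every column the heuristic pays at most twice the optimum. Cache decay applies the same reasoning: keep paying leakage on an idle line until that cost equals the cost of the miss that decaying it might cause.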
17
Tracking Dead Time
  • Each line has a 2-bit counter that is reset on every access and incremented every 2500 cycles through a global signal (negligible overhead)
  • After 10,000 clock cycles, the counter reaches the max value and triggers a decay
  • Adaptive decay: start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
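The sketch below models one line's counter and the adaptive rule. One reading of the slide is that a line decays on the global tick that arrives while its counter is already saturated. That is the fourth tick after the last access, or 10,000 cycles when the access lines up with a tick. This trigger rule is our assumption. The names and the interval bounds in `adapt_decay_interval` are hypothetical.

```python
GLOBAL_TICK = 2500  # cycles between global increment signals (from the slide)
COUNTER_MAX = 3     # largest value of a 2-bit counter


class DecayLine:
    """One cache line's 2-bit decay counter (sketch, hypothetical names)."""

    def __init__(self) -> None:
        self.counter = 0
        self.powered = True

    def access(self) -> None:
        # Every access resets the counter; an access to a decayed line
        # is a miss that refills the line and powers it back on.
        self.counter = 0
        self.powered = True

    def global_tick(self) -> None:
        if not self.powered:
            return
        if self.counter == COUNTER_MAX:
            self.powered = False  # Vdd-gate the line; its contents are lost
        else:
            self.counter += 1


def still_powered_after(idle_cycles: int) -> bool:
    """True if a line accessed at cycle 0 is still powered after `idle_cycles`."""
    line = DecayLine()
    line.access()
    for cycle in range(1, idle_cycles + 1):
        if cycle % GLOBAL_TICK == 0:
            line.global_tick()
    return line.powered


def adapt_decay_interval(interval: int, quick_miss: bool,
                         min_interval: int = 1_000,
                         max_interval: int = 64_000) -> int:
    """Adaptive decay: a quick miss after decaying means the line was decayed too
    early, so double the period; no miss means halve it. Bounds are assumptions."""
    if quick_miss:
        return min(interval * 2, max_interval)
    return max(interval // 2, min_interval)
```

Because the tick is global and not aligned with accesses, the actual idle time before decay varies between 3 and 4 tick periods (7,500 to 10,000 cycles). That coarseness is the price of needing only two bits per line.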

18
Results
19
Overheads
20
Other Results
  • The L2 cache is equally suitable for decay techniques: lifetimes are scaled by a factor of 10, and an extra miss also costs a lot more
  • For their experiments, there is little interference from multiprogramming
  • Some instructions can easily be identified as last touches to a cache block: potential for early cache decay
  • Can this apply to the branch predictor or the register file?
