1
Drowsy Caches: Simple Techniques for Reducing Leakage Power
  • Krisztián Flautner
  • Nam Sung Kim
  • Steve Martin
  • David Blaauw
  • Trevor Mudge

krisztian.flautner_at_arm.com
kimns_at_eecs.umich.edu
stevenmm_at_eecs.umich.edu
blaauw_at_eecs.umich.edu
tnm_at_eecs.umich.edu
2
Motivation
  • Ever-increasing leakage power: as feature size
    shrinks, Vt scales down, which causes an
    exponential increase in leakage power.
  • On-chip caches are responsible for 15% to 20% of
    the total power.
  • Leakage power can exceed 50% of total cache power,
    according to our projection using Berkeley
    Predictive Models.

3
Processor power trends
  • Based on ITRS roadmap and transistor count
    estimates.
  • Total power in this projection cannot come true.

4
An observation about data caches
  • L1 data caches
  • Working set: fraction of cache lines accessed in
    a time window.
  • Window size: 2000 cycles.
  • Only a small fraction of lines are accessed in a
    window.

[Figure: working set of the current window, and working set of the current and 1, 8, and 32 previous windows]
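The working-set measurement described on this slide can be sketched as follows. This is only an illustration: the trace format (pairs of cycle and line index), the function name, and the example cache size are assumptions, not taken from the presentation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def working_set_fractions(accesses: Iterable[Tuple[int, int]],
                          num_lines: int,
                          window: int = 2000) -> List[float]:
    """Return, for each time window, the fraction of cache lines accessed.

    accesses: (cycle, line_index) pairs; this trace format is hypothetical.
    window:   window size in cycles (2000 on the slide).
    """
    touched: Dict[int, Set[int]] = {}
    for cycle, line in accesses:
        # Group the distinct lines touched by the window they fall into.
        touched.setdefault(cycle // window, set()).add(line)
    if not touched:
        return []
    last_window = max(touched)
    return [len(touched.get(w, set())) / num_lines
            for w in range(last_window + 1)]


# Tiny made-up trace for a 1024-line cache.
trace = [(0, 3), (10, 3), (1500, 7), (2100, 3), (4500, 1), (4600, 2), (4700, 1)]
fractions = working_set_fractions(trace, num_lines=1024)
```

In this made-up trace the three windows touch 2, 1, and 2 distinct lines, so each fraction is a small part of the 1024 lines.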
5
The Drowsy Cache approach
Instead of being sophisticated about predicting
the working set, reduce the penalty for being
wrong.
  • Algorithm:
  • Periodically put all lines in cache into drowsy
    mode.
  • When accessed, wake up the line.
  • Optimize across circuit-microarchitecture
    boundary:
  • Use of the appropriate circuit technique enables
    simplified microarchitectural control.
  • Requirement: state preservation in low-leakage
    mode.
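A minimal sketch of the periodic policy, assuming awake tags and a one-cycle wake-up penalty. The class name, its fields, and the toy access pattern are hypothetical and only illustrate the control idea: a global counter puts every line into drowsy mode, and an access wakes the line it touches.

```python
class DrowsyCacheModel:
    """Toy model of the periodic drowsy policy (not a cache simulator)."""

    def __init__(self, num_lines: int, window: int = 2000,
                 wake_penalty: int = 1) -> None:
        self.drowsy = [False] * num_lines
        self.window = window              # cycles between global drowsy signals
        self.wake_penalty = wake_penalty  # extra cycles to wake a drowsy line
        self.extra_cycles = 0             # total wake-up latency paid
        self.drowsy_line_cycles = 0       # sum over cycles of drowsy lines

    def tick(self, cycle: int) -> None:
        # Global counter: every `window` cycles all lines become drowsy.
        if cycle > 0 and cycle % self.window == 0:
            self.drowsy = [True] * len(self.drowsy)
        self.drowsy_line_cycles += sum(self.drowsy)

    def access(self, line: int) -> int:
        # Wake the line if needed and return the extra latency paid.
        if self.drowsy[line]:
            self.drowsy[line] = False
            self.extra_cycles += self.wake_penalty
            return self.wake_penalty
        return 0


# Made-up run: 4 lines, a 10-cycle window, accesses given as {cycle: line}.
cache = DrowsyCacheModel(num_lines=4, window=10)
accesses = {3: 0, 12: 0, 15: 1, 16: 0}
for cycle in range(20):
    cache.tick(cycle)
    if cycle in accesses:
        cache.access(accesses[cycle])

drowsy_fraction = cache.drowsy_line_cycles / (4 * 20)
```

No per-line counter or predictor appears in the model; the only policy state is the window length, which is what keeps the control simple.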

6
Access control flow: Awake tags
[Flowchart, awake tags. Hit path: awake tag match, line wake up, line access, hit. Miss path: awake tag miss, line wake up, miss, memory, replacement.]
  • A drowsy hit or miss adds at most 1 cycle of latency
  • Access to an awake line is not penalized

7
Access control flow: Drowsy tags
[Flowchart, drowsy tags. Hit path nodes: awake tag match, line wake up, line access, tag wake up, hit. Miss path nodes: awake tag miss, line wake up, tag wake up, unneeded tags and lines back to drowsy, miss, memory, replacement.]
  • Drowsy tags implementation is more complicated
  • Is the complexity worth it?
  • Tags use about 7% of data bits (32-bit address)
  • Only small incremental leakage reduction
  • Worst case: 3 cycles of extra latency
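The "about 7%" tag overhead can be checked with a short calculation. The cache geometry used here (32 KB, 4-way, 32-byte lines) is an assumption chosen for illustration; the slide only states the 32-bit address. Valid, dirty, and replacement bits are ignored.

```python
def tag_overhead(cache_bytes: int, ways: int, line_bytes: int,
                 address_bits: int = 32):
    """Return (tag bits per line, tag bits as a fraction of data bits).

    Sizes are assumed to be powers of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, tag_bits / (line_bytes * 8)


# Assumed geometry: 32 KB, 4-way set associative, 32-byte lines.
tag_bits, fraction = tag_overhead(32 * 1024, ways=4, line_bytes=32)
```

With these assumed parameters each line carries a 19-bit tag next to 256 data bits, roughly 7.4%, so putting tags into drowsy mode can only add a small leakage saving.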

8
Low-leakage circuit techniques
Circuit: pros and cons
  • Gated-VDD. Pros: largest leakage reduction, fast
    mode switching, easy implementation. Cons: loses
    cell state.
  • ABB-MTCMOS. Pros: retains cell state. Cons: slow
    mode switching.
  • DVS. Pros: retains cell state, fast mode
    switching, more power reduction than ABB. Cons:
    more susceptible to SEU noise.
9
Drowsy memory using DVS
  • Low supply voltage for inactive memory cells
  • Low voltage reduces leakage current too!
  • Quadratic reduction in leakage power

[Figure labels: supply voltage for normal mode; supply voltage for drowsy mode; leakage path]
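One way to read the "quadratic" claim: leakage power is supply voltage times leakage current, and lowering the voltage lowers the current as well. The sketch below assumes, purely for illustration, that leakage current scales linearly with supply voltage; the function name and the example voltages are hypothetical, and real devices do not follow this first-order model exactly.

```python
def drowsy_leakage_ratio(v_drowsy: float, v_normal: float) -> float:
    """Leakage power in drowsy mode relative to normal mode.

    Assumption: P_leak = V * I_leak with I_leak proportional to V,
    so the ratio is (v_drowsy / v_normal) ** 2.
    """
    voltage_ratio = v_drowsy / v_normal
    current_ratio = v_drowsy / v_normal  # first-order assumption
    return voltage_ratio * current_ratio


# Hypothetical supplies: halving the voltage quarters the leakage power.
ratio = drowsy_leakage_ratio(v_drowsy=0.5, v_normal=1.0)
```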
10
Leakage reduction using DVS
  • High-Vt devices for access transistors:
  • reduce leakage power
  • increase access time of cache
  • Right trade-off point:
  • 91% leakage reduction
  • 6% cycle time increase

Projections for 0.07µm process
11
Drowsy cache line architecture
12
Energy reduction
  • Projections for 0.07µm process
  • High leakage lines have to be powered up when
    accessed.
  • Drowsy circuit:
  • Without high-Vt device (in SRAM): 6x leakage
    reduction, no access delay.
  • With high-Vt device: 10x leakage reduction, 6%
    access time increase.
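The slides quote leakage reduction both as a factor (6x, 10x) and as a percentage (91%). A small conversion, with hypothetical helper names, shows how the two forms relate:

```python
def reduction_percent(factor: float) -> float:
    """Percentage of leakage removed by an N-times reduction."""
    return (1 - 1 / factor) * 100


def reduction_factor(percent: float) -> float:
    """N-times reduction corresponding to a percentage removed."""
    return 1 / (1 - percent / 100)


six_x = reduction_percent(6)      # roughly 83%
ten_x = reduction_percent(10)     # 90%
from_91 = reduction_factor(91)    # roughly 11x
```

Read this way, the 91% figure and the 10x figure describe about the same operating point.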

13
1 cycle vs. 2 cycle wake up
  • Fast wake-up is important but easy to accomplish!
  • Cache access time: 0.57ns (for 0.07µm from CACTI
    using 0.18µm baseline).
  • Speed depends on voltage controller size: 64 x
    Leff: 0.28ns (half cycle at 4 GHz), 32 x Leff:
    0.42ns, 16 x Leff: 0.77ns.
  • Impact of drowsy tags is quite similar to
    double-cycle wake-up.

14
Policy comparison
15
Energy reduction
              Normalized total energy     Normalized leakage energy   Run-time
              DVS     Theoretical min.    DVS     Theoretical min.    increase
Awake tags    0.46    0.35                0.29    0.15                0.41%
Drowsy tags   0.42    0.31                0.24    0.09                0.84%
  • Theoretical minimum assumes zero leakage in
    drowsy mode
  • Total energy reduction within 0.1 of theoretical
    minimum
  • Diminishing returns for better leakage reduction
    techniques
  • Above figures assume 6x leakage reduction, 10x
    possible with small additional run-time impact
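The gap to the theoretical minimum can be read directly off the table. The short calculation below uses only the normalized total-energy values shown above; the dictionary layout and names are illustrative.

```python
# Normalized total energy, copied from the table on this slide.
results = {
    "awake tags": {"total_dvs": 0.46, "total_min": 0.35},
    "drowsy tags": {"total_dvs": 0.42, "total_min": 0.31},
}

# Distance between the DVS result and the zero-leakage-in-drowsy-mode bound.
gaps = {policy: round(entry["total_dvs"] - entry["total_min"], 2)
        for policy, entry in results.items()}

# Fraction of total energy saved by the DVS drowsy cache.
savings = {policy: round(1 - entry["total_dvs"], 2)
           for policy, entry in results.items()}
```

Both policies sit about 0.11 above the bound while already saving more than half of the total energy, which is the sense in which a better leakage-reduction technique brings diminishing returns.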

16
Conclusions
  • Simple circuit technique
  • Need high-Vt transistors, low Vdd supply
  • Simple architecture
  • No need to keep counter/predictor state for each
    line
  • Periodic global counter asserts drowsy signal
  • Window size (for periodic drowsy transition)
    depends on the core: 4000 cycles has a good
    E-delay trade-off
  • Technique also works well on in-order processors
  • Memory subsystem is already latency tolerant
  • Drowsy circuit is good enough
  • Diminishing returns on further leakage reduction
  • Focus is again on dynamic energy