Drowsy Caches: Simple Techniques for Reducing Leakage Power (presentation transcript)

1
Drowsy Caches: Simple Techniques for Reducing
Leakage Power
29th Annual International Symposium on Computer
Architecture (ISCA 2002), pages 148-157
  • Authors
  • Krisztián Flautner (ARM Ltd)
  • Nam Sung Kim, Steve Martin, David Blaauw,
    Trevor Mudge (Advanced Computer Architecture
    Lab, The University of Michigan)
  • In-class presentation on 11/24/2008 by
    Harshit Khanna (1200127817)

2
Outline
  • Summary
  • Motivation
  • Circuit Techniques
  • Traditional Circuit Techniques
  • Gated-VDD
  • ABB-MTCMOS
  • Dynamic VDD Scaling (DVS)
  • Comparison of various low-leakage circuit
    techniques
  • Proposed circuit technique
  • Policies
  • Implementation of drowsy cache line
  • Additions to the traditional cache line
  • Basic working description
  • Working set characteristics
  • Observations
  • Results
  • Policy evaluation
  • Test setup
  • Energy consumption
  • Future work

3
Summary
  • Simplest policy: cache lines are periodically
    put into a low-power mode without regard to their
    access histories. This can reduce the cache's
    static power consumption by more than 80%.
  • The total energy consumed in the cache can be
    reduced by an average of 54%.
  • The fraction of leakage energy is reduced from an
    average of 76% in projected conventional caches
    to an average of 50% in the drowsy cache.
  • Performance degradation: 9% for crafty, < 4%
    for equake.

4
Motivation
  • Higher speed and higher density lead to higher
    leakage (static) power consumption.
  • Leakage power accounts for 15-20% of the total
    power on chips.
  • As processor technology moves below 0.1 micron,
    static power consumption is set to increase
    exponentially and is on the path to dominating
    the total power used by the CPU.
  • The on-chip caches are one of the main candidates
    for leakage reduction, since they contain a
    significant fraction of the processor's
    transistors.


5
Circuit Techniques
6
Traditional Circuit Techniques
  • Gated-VDD
  • Working
  • Reduces the leakage power by using a high
    threshold (high-Vt) transistor to turn off the
    power to the memory cell when the cell is set to
    low-power mode.
  • Advantages
  • Leakage significantly reduced.
  • Disadvantages
  • It loses any information stored in the cell when
    switched into low-leakage mode.
  • Performance penalty.
  • Requires special high-Vt devices for the control
    logic.

7
Traditional Circuit Techniques (contd.)
  • ABB-MTCMOS
  • Working
  • Threshold voltages of the transistors in the cell
    are dynamically increased when the cell is set to
    drowsy mode by raising the source to body voltage
    of the transistors in the circuit.
  • Advantages
  • Leakage significantly reduced
  • Disadvantages
  • Supply voltage of the circuit is increased,
    thereby offsetting some of the gain in total
    leakage power.
  • Requires special high-Vt devices for the control
    logic.

8
Dynamic VDD Scaling (DVS)
  • Advantages
  • Retains cell information in low-power mode.
  • Fast switching between power modes.
  • Easy implementation.
  • More power reduction than ABB-MTCMOS.
  • Disadvantages
  • Dependent on process variation.
  • More susceptible to noise.

9
Comparison of various low-leakage circuit
techniques

10
Proposed circuit technique
  • Choose between two different supply voltages for
    each cache line.
  • The DVS technique has been used in the past to
    trade off dynamic power consumption and
    performance.
  • Here, voltage scaling is exploited to reduce
    static power consumption.
  • Due to short-channel effects in deep-submicron
    processes, leakage current reduces significantly
    with voltage scaling (see the relation sketched
    below).
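For context, a standard subthreshold-leakage relation (a textbook device
model, not taken from these slides) makes the voltage dependence explicit.
For an idle transistor with V_GS = 0 and V_DS roughly equal to V_DD:

    I_leak ≈ I_0 · exp((-V_th + η·V_DS) / (n·v_T)) · (1 − exp(−V_DS / v_T))

where η is the drain-induced barrier lowering (DIBL) coefficient, n the
subthreshold slope factor, and v_T the thermal voltage. Because the DIBL
term effectively lowers the threshold at high drain voltage, scaling V_DD
down reduces the leakage current more than linearly.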

11
Policies
12
Implementation of the drowsy cache line
  • The technique is applied to the L1 data cache.
  • All lines in an L2 cache can be kept in drowsy
    mode without significant impact on performance.

13
Additions to the cache line
  • Word-line gating circuit
  • Prevents accesses while the line is in drowsy
    mode, since unchecked accesses to a drowsy line
    could destroy the memory's contents.
  • Voltage controller
  • Determines the operating voltage of the array of
    memory cells in the cache line.
  • Switches the array voltage between the high
    (active) and low (drowsy) supply voltages
    depending on the state of the drowsy bit.
  • Drowsy bit
  • Controls the voltage supplied to the memory cells
    (see the sketch below).
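As a rough behavioural sketch (not the authors' circuit or RTL), these
additions can be modelled as per-line state in Python. The class name, the
supply-voltage values, and the extra accessed flag (used later by the
noaccess policy) are illustrative assumptions:

    from dataclasses import dataclass

    # Illustrative supply voltages; the real values are a circuit-level
    # design choice and are not taken from the slides.
    VDD_HIGH = 1.0   # active (awake) mode supply
    VDD_LOW = 0.3    # drowsy (low-power) mode supply

    @dataclass
    class DrowsyCacheLine:
        """Behavioural sketch of one cache line with the three additions."""
        tag: int = 0
        data: bytes = bytes(32)   # 32-byte line, as in the evaluated caches
        drowsy: bool = True       # drowsy bit: set -> line is in low-power mode
        accessed: bool = False    # per-line history bit for the noaccess policy

        @property
        def supply_voltage(self) -> float:
            # Voltage controller: selects the array supply from the drowsy bit.
            return VDD_LOW if self.drowsy else VDD_HIGH

        def wordline_enabled(self) -> bool:
            # Word-line gating: accesses are blocked while the line is drowsy,
            # so the low-voltage cells never see the precharged bit lines.
            return not self.drowsy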

14
Basic working description
  • If a drowsy cache line is accessed, the drowsy
    bit is cleared and, consequently, the supply
    voltage is switched to the high VDD.
  • The word-line gating circuit is used to prevent
    accesses while in drowsy mode: since the supply
    voltage of a drowsy cache line is lower than the
    bit-line precharge voltage, unchecked accesses to
    a drowsy line could destroy the memory's
    contents.
  • Whenever a cache line is accessed, the cache
    controller checks the voltage state of the cache
    line by reading the drowsy bit.
  • If the accessed line is in normal mode,
  • then the contents of the cache line are read
    without losing any performance, because the power
    mode of the line can be checked by reading the
    drowsy bit concurrently with the read and
    comparison of the tag.
  • If the accessed line is in drowsy mode,
  • then the discharge of the bit lines of the memory
    array is prevented (because incorrect data might
    otherwise be read out).
  • The line is woken up automatically during the
    next cycle, and the data can be accessed during
    consecutive cycles (see the sketch below).
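The flow above can be sketched as follows, reusing the DrowsyCacheLine class
from the sketch two slides back; the one-cycle wake-up cost and the return
convention are modelling assumptions rather than the paper's exact timing:

    def access(line: DrowsyCacheLine, tag: int):
        """One access to a single cache line, following the flow above.

        Returns (data_or_None, extra_cycles). extra_cycles models the wake-up
        penalty of a drowsy line; an awake line is read with no performance
        loss because the drowsy bit is checked in parallel with the tag read
        and comparison.
        """
        extra_cycles = 0
        if not line.wordline_enabled():
            # Word-line gating blocks the read: the bit lines are not
            # discharged, so the low-voltage cells cannot be read out
            # incorrectly or corrupted.
            line.drowsy = False   # clear the drowsy bit -> switch to high VDD
            extra_cycles = 1      # the line wakes up during the next cycle
        line.accessed = True      # record the access for the noaccess policy
        if line.tag == tag:
            return line.data, extra_cycles
        return None, extra_cycles # tag mismatch: the normal miss path applies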

15
Working set characteristics

ExecFactor - expected worst-case execution time
increase for the baseline algorithm (see the
reconstruction below).
accs - the number of accesses.
wakelatency - wakeup latency (1 cycle).
accsperline - the number of accesses per line.
memimpact - how much impact a single memory access
has on overall performance. Assumption: an increase
in cache access latency translates directly into an
increase in execution time, so memimpact is set
to 1.
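The formula itself did not survive the transcript; one plausible way to
assemble the terms above (an assumption, not a quotation of the paper) is:

    ExecFactor ≈ (window + (accs / accsperline) · wakelatency · memimpact)
                 / window

Here accs / accsperline estimates the number of distinct lines touched
during an update window, each of which pays the wake-up latency once, and
the added cycles are normalized to the window length.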
16
Observations
Should tags be put into drowsy mode along with
the data?
  • In both cases, no extra latency is involved when
    an awake line is accessed.
  • A drowsy access takes at least three cycles to
    complete.
  • In direct-mapped caches there is no performance
    advantage to keeping the tags awake: there is
    only one possible line for each index, so if that
    line is drowsy, it must be woken up immediately
    to be accessed.

17
Results
  • The fraction of unique cache lines accessed
    during an update window is relatively small.
  • On most benchmarks, more than 90% of the lines
    can be in drowsy mode at any one time.
  • Performance degradation: 9% for crafty, < 4%
    for equake.
  • Advantages
  • Significantly reduces the static power
    consumption of the cache.
  • Prediction techniques to control the drowsy cache
    are not necessary if the cache can transition
    between drowsy and awake modes relatively
    quickly.

18
Policy evaluation
19
Policy evaluation
  • The following parameters can be varied:
  • Update window size - specifies, in cycles, how
    frequently decisions are made about which lines
    to put into drowsy mode.
  • Simple or noaccess policy - the policy that uses
    no per-line access history is referred to as the
    simple policy. In this case, all lines in the
    cache are put into drowsy mode periodically (the
    period is the window size). The noaccess policy
    puts only those lines into drowsy mode that have
    not been accessed during the window (see the
    sketch below).
  • Awake or drowsy tags - specifies whether the tags
    in the cache may be drowsy or not.
  • Transition time - the number of cycles needed to
    wake up or put to sleep a cache line. The authors
    only consider 1- or 2-cycle transition times,
    since their circuit simulations indicate that
    these are reasonable assumptions.
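A minimal sketch of the two policies, again reusing the DrowsyCacheLine
sketch from earlier (its accessed flag plays the role of the per-line access
history); the function names and the end-of-window hook are illustrative
assumptions:

    UPDATE_WINDOW = 4000   # cycles; the window size the authors settle on

    def end_of_window_simple(lines):
        # simple policy: every line goes drowsy at the end of each window,
        # regardless of its access history.
        for line in lines:
            line.drowsy = True

    def end_of_window_noaccess(lines):
        # noaccess policy: only lines not accessed during the window go
        # drowsy.
        for line in lines:
            if not line.accessed:
                line.drowsy = True
            line.accessed = False   # reset the history for the next window

Either hook would be invoked by the cache controller once every
UPDATE_WINDOW cycles.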

20
Test setup
  • The authors run various benchmarks from the
    SPEC2000 suite on SimpleScalar using the Alpha
    instruction set.
  • All simulations were run for 1 billion
    instructions.
  • The simulator configuration parameters are
    summarized below:
  • OO4 - 4-wide superscalar pipeline; 32 KB
    direct-mapped L1 icache, 32-byte line size,
    1-cycle hit latency; 32 KB 4-way set-associative
    L1 dcache, 32-byte line size, 1-cycle hit
    latency; 8-cycle L2 cache latency.
  • IO2 - 2-wide in-order pipeline; cache parameters
    same as for OO4.

21
Energy consumption
  • The authors find that the simple policy with a
    window size of 4000 cycles reaches a reasonable
    compromise between simplicity of implementation,
    power savings, and performance.
  • The impact of this policy on leakage energy is
    characterized by:
  • Normalized total energy - the ratio of the total
    energy used in the drowsy cache to the total
    energy consumed in a regular cache.
  • Normalized leakage energy - the ratio of the
    leakage energy in the drowsy cache to the leakage
    energy in a normal cache (both ratios are written
    out below).
  • The data in the DVS columns show the energy
    savings resulting from the scaled-VDD (DVS)
    circuit technique.
  • The data in the theoretical minimum column assume
    that leakage in low-power mode can be reduced to
    zero (without losing state), i.e. they estimate
    the energy savings given the best possible
    hypothetical circuit technique.
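Written out, directly from the definitions above, the two metrics are simple
ratios:

    normalized total energy   = E_total(drowsy cache) / E_total(regular cache)
    normalized leakage energy = E_leak(drowsy cache) / E_leak(regular cache)

so, for example, a value of 0.5 means that the drowsy cache consumes half
the energy of the conventional cache.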

22
  • The drowsy cache implementation reduces the total
    energy consumed in the data cache by more than
    50% without significantly impacting performance.
  • Total leakage energy is reduced by
  • an average of 71% when tags are always awake.
  • an average of 76% using the drowsy tag scheme.

23
Future work
  • The proposed scheme is not a solution for all
    caches in the processor.
  • The L1 instruction cache does not do as well with
    the proposed algorithm.
  • Investigate the use of instruction-prefetch
    algorithms combined with the drowsy circuit
    technique.
  • Extend these techniques to other memory
    structures, such as branch predictors.
  • Investigate the impact of an adaptive window
    size.

24
Thank you! Questions?