Drowsy Caches: Simple Techniques for Reducing Leakage Power (presentation transcript)

1
Drowsy Caches: Simple Techniques for Reducing
Leakage Power
29th Annual International Symposium on Computer
Architecture (ISCA 2002), pages 148-157
  • Authors
  • Krisztián Flautner (ARM Ltd)
  • Nam Sung Kim, Steve Martin, David Blaauw,
    Trevor Mudge (Advanced Computer Architecture
    Lab, The University of Michigan)
  • In-class presentation on 11/24/2008 by
    Harshit Khanna (1200127817)

2
Outline
  • Summary
  • Motivation
  • Circuit Techniques
  • Traditional Circuit Techniques
  • Gated-VDD
  • ABB-MTCMOS
  • Dynamic VDD Scaling (DVS)
  • Comparison of various low-leakage circuit
    techniques
  • Proposed circuit technique
  • Policies
  • Implementation of drowsy cache line
  • Additions to the traditional cache line
  • Basic working description
  • Working set characteristics
  • Observations
  • Results
  • Policy evaluation
  • Test setup
  • Energy consumption
  • Future work

3
Summary
  • Simplest policy: cache lines are periodically
    put into a low-power mode without regard to their
    access histories. This can reduce the cache's
    static power consumption by more than 80%.
  • The total energy consumed in the cache can be
    reduced by an average of 54%.
  • The fraction of leakage energy is reduced from an
    average of 76% in projected conventional caches
    to an average of 50% in the drowsy cache.
  • Performance degradation: 9% for crafty, < 4%
    for equake.

4
Motivation
  • Higher speed and higher density lead to higher
    leakage (static) power consumption.
  • Leakage power accounts for 15-20% of the total
    power on chips.
  • As processor technology moves below 0.1 micron,
    static power consumption is set to increase
    exponentially and is on the path to dominating
    the total power used by the CPU.
  • The on-chip caches are one of the main candidates
    for leakage reduction, since they contain a
    significant fraction of the processor's
    transistors.


5
Circuit Techniques
6
Traditional Circuit Techniques
  • Gated-VDD
  • Working
  • Reduces the leakage power by using a high
    threshold (high-Vt) transistor to turn off the
    power to the memory cell when the cell is set to
    low-power mode.
  • Advantages
  • Leakage significantly reduced.
  • Disadvantages
  • It loses any information stored in the cell when
    switched into low-leakage mode.
  • Performance penalty.
  • Requires special high-Vt devices for the control
    logic.

7
Traditional Circuit Techniques (contd.)
  • ABB-MTCMOS
  • Working
  • Threshold voltages of the transistors in the cell
    are dynamically increased when the cell is set to
    drowsy mode by raising the source to body voltage
    of the transistors in the circuit.
  • Advantages
  • Leakage significantly reduced
  • Disadvantages
  • Supply voltage of the circuit is increased,
    thereby offsetting some of the gain in total
    leakage power.
  • Requires special high-Vt devices for the control
    logic.

8
Dynamic VDD Scaling (DVS)
  • Advantages
  • Retains cell information in low-power mode.
  • Fast switching between power modes.
  • Easy implementation.
  • More power reduction than ABB-MTCMOS.
  • Disadvantages
  • Dependent on process variation.
  • More susceptible to noise.

9
Comparison of various low-leakage circuit
techniques

10
Proposed circuit technique
  • Choose between two different supply voltages for
    each cache line.
  • The DVS technique has been used in the past to
    trade off dynamic power consumption and
    performance.
  • Here, voltage scaling is exploited to reduce
    static power consumption.
  • Due to short-channel effects in deep-submicron
    processes, leakage current reduces significantly
    with voltage scaling (see the relation sketched
    below).
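For context, a standard subthreshold-leakage relation (a textbook device
model, not taken from these slides) makes the voltage dependence explicit.
For an idle transistor with V_GS = 0 and V_DS roughly equal to V_DD:

    I_leak ≈ I_0 · exp((-V_th + η·V_DS) / (n·v_T)) · (1 − exp(−V_DS / v_T))

where η is the drain-induced barrier lowering (DIBL) coefficient, n the
subthreshold slope factor, and v_T the thermal voltage. Because the DIBL
term effectively lowers the threshold at high drain voltage, scaling V_DD
down reduces the leakage current more than linearly.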

11
Policies
12
Implementation of the drowsy cache line
  • The technique is applied to the L1 data cache.
  • All lines in an L2 cache can be kept in drowsy
    mode without significant impact on performance.

13
Additions to the cache line
  • Word-line gating circuit
  • Prevents accesses while the line is in drowsy
    mode, since unchecked accesses to a drowsy line
    could destroy the memory's contents.
  • Voltage controller
  • Determines the operating voltage of the array of
    memory cells in the cache line.
  • Switches the array voltage between the high
    (active) and low (drowsy) supply voltages
    depending on the state of the drowsy bit.
  • Drowsy bit
  • Controls the voltage supplied to the memory cells
    (see the sketch below).
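As a rough behavioural sketch (not the authors' circuit or RTL), these
additions can be modelled as per-line state in Python. The class name, the
supply-voltage values, and the extra accessed flag (used later by the
noaccess policy) are illustrative assumptions:

    from dataclasses import dataclass

    # Illustrative supply voltages; the real values are a circuit-level
    # design choice and are not taken from the slides.
    VDD_HIGH = 1.0   # active (awake) mode supply
    VDD_LOW = 0.3    # drowsy (low-power) mode supply

    @dataclass
    class DrowsyCacheLine:
        """Behavioural sketch of one cache line with the three additions."""
        tag: int = 0
        data: bytes = bytes(32)   # 32-byte line, as in the evaluated caches
        drowsy: bool = True       # drowsy bit: set -> line is in low-power mode
        accessed: bool = False    # per-line history bit for the noaccess policy

        @property
        def supply_voltage(self) -> float:
            # Voltage controller: selects the array supply from the drowsy bit.
            return VDD_LOW if self.drowsy else VDD_HIGH

        def wordline_enabled(self) -> bool:
            # Word-line gating: accesses are blocked while the line is drowsy,
            # so the low-voltage cells never see the precharged bit lines.
            return not self.drowsy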

14
Basic working description
  • If a drowsy cache line is accessed, the drowsy
    bit is cleared and, consequently, the supply
    voltage is switched to the high VDD.
  • The word-line gating circuit is used to prevent
    accesses while in drowsy mode: since the supply
    voltage of a drowsy cache line is lower than the
    bit-line precharge voltage, unchecked accesses to
    a drowsy line could destroy the memory's
    contents.
  • Whenever a cache line is accessed, the cache
    controller checks the voltage state of the cache
    line by reading the drowsy bit.
  • If the accessed line is in normal mode,
  • then the contents of the cache line are read
    without losing any performance, because the power
    mode of the line can be checked by reading the
    drowsy bit concurrently with the read and
    comparison of the tag.
  • If the accessed line is in drowsy mode,
  • then the discharge of the bit lines of the memory
    array is prevented (because incorrect data might
    otherwise be read out).
  • The line is woken up automatically during the
    next cycle, and the data can be accessed during
    consecutive cycles (see the sketch below).
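The flow above can be sketched as follows, reusing the DrowsyCacheLine class
from the sketch two slides back; the one-cycle wake-up cost and the return
convention are modelling assumptions rather than the paper's exact timing:

    def access(line: DrowsyCacheLine, tag: int):
        """One access to a single cache line, following the flow above.

        Returns (data_or_None, extra_cycles). extra_cycles models the wake-up
        penalty of a drowsy line; an awake line is read with no performance
        loss because the drowsy bit is checked in parallel with the tag read
        and comparison.
        """
        extra_cycles = 0
        if not line.wordline_enabled():
            # Word-line gating blocks the read: the bit lines are not
            # discharged, so the low-voltage cells cannot be read out
            # incorrectly or corrupted.
            line.drowsy = False   # clear the drowsy bit -> switch to high VDD
            extra_cycles = 1      # the line wakes up during the next cycle
        line.accessed = True      # record the access for the noaccess policy
        if line.tag == tag:
            return line.data, extra_cycles
        return None, extra_cycles # tag mismatch: the normal miss path applies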

15
Working set characteristics

ExecFactor - expected worst-case execution time
increase for the baseline algorithm (see the
reconstruction below).
accs - the number of accesses.
wakelatency - wakeup latency (1 cycle).
accsperline - the number of accesses per line.
memimpact - how much impact a single memory access
has on overall performance. Assumption: an increase
in cache access latency translates directly into an
increase in execution time, so memimpact is set
to 1.
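The formula itself did not survive the transcript; one plausible way to
assemble the terms above (an assumption, not a quotation of the paper) is:

    ExecFactor ≈ (window + (accs / accsperline) · wakelatency · memimpact)
                 / window

Here accs / accsperline estimates the number of distinct lines touched
during an update window, each of which pays the wake-up latency once, and
the added cycles are normalized to the window length.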
16
Observations
Should tags be put into drowsy mode along with
the data?
  • In both cases, no extra latency is involved when
    an awake line is accessed.
  • A drowsy access takes at least three cycles to
    complete.
  • In direct-mapped caches there is no performance
    advantage to keeping the tags awake: there is
    only one possible line for each index, so if that
    line is drowsy, it must be woken up immediately
    to be accessed.

17
Results
  • The fraction of unique cache lines accessed
    during an update window is relatively small.
  • On most benchmarks, more than 90% of the lines
    can be in drowsy mode at any one time.
  • Performance degradation: 9% for crafty, < 4%
    for equake.
  • Advantages
  • Significantly reduces the static power
    consumption of the cache.
  • Prediction techniques to control the drowsy cache
    are not necessary if the cache can transition
    between drowsy and awake modes relatively
    quickly.

18
Policy evaluation
19
Policy evaluation
  • The following parameters can be varied:
  • Update window size - specifies, in cycles, how
    frequently decisions are made about which lines
    to put into drowsy mode.
  • Simple or noaccess policy - the policy that uses
    no per-line access history is referred to as the
    simple policy. In this case, all lines in the
    cache are put into drowsy mode periodically (the
    period is the window size). The noaccess policy
    puts only those lines into drowsy mode that have
    not been accessed during the window (see the
    sketch below).
  • Awake or drowsy tags - specifies whether the tags
    in the cache may be drowsy or not.
  • Transition time - the number of cycles needed to
    wake up or put to sleep a cache line. The authors
    only consider 1- or 2-cycle transition times,
    since their circuit simulations indicate that
    these are reasonable assumptions.
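A minimal sketch of the two policies, again reusing the DrowsyCacheLine
sketch from earlier (its accessed flag plays the role of the per-line access
history); the function names and the end-of-window hook are illustrative
assumptions:

    UPDATE_WINDOW = 4000   # cycles; the window size the authors settle on

    def end_of_window_simple(lines):
        # simple policy: every line goes drowsy at the end of each window,
        # regardless of its access history.
        for line in lines:
            line.drowsy = True

    def end_of_window_noaccess(lines):
        # noaccess policy: only lines not accessed during the window go
        # drowsy.
        for line in lines:
            if not line.accessed:
                line.drowsy = True
            line.accessed = False   # reset the history for the next window

Either hook would be invoked by the cache controller once every
UPDATE_WINDOW cycles.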

20
Test setup
  • The authors run various benchmarks from the
    SPEC2000 suite on SimpleScalar using the Alpha
    instruction set.
  • All simulations were run for 1 billion
    instructions.
  • The simulator configuration parameters are
    summarized below:
  • OO4 - 4-wide superscalar pipeline; 32 KB
    direct-mapped L1 icache, 32-byte line size,
    1-cycle hit latency; 32 KB 4-way set-associative
    L1 dcache, 32-byte line size, 1-cycle hit
    latency; 8-cycle L2 cache latency.
  • IO2 - 2-wide in-order pipeline; cache parameters
    same as for OO4.

21
Energy consumption
  • The authors find that the simple policy with a
    window size of 4000 cycles reaches a reasonable
    compromise between simplicity of implementation,
    power savings, and performance.
  • The impact of this policy on leakage energy is
    characterized by:
  • Normalized total energy - the ratio of the total
    energy used in the drowsy cache to the total
    energy consumed in a regular cache.
  • Normalized leakage energy - the ratio of the
    leakage energy in the drowsy cache to the leakage
    energy in a normal cache (both ratios are written
    out below).
  • The data in the DVS columns show the energy
    savings resulting from the scaled-VDD (DVS)
    circuit technique.
  • The data in the theoretical minimum column assume
    that leakage in low-power mode can be reduced to
    zero (without losing state), i.e. they estimate
    the energy savings given the best possible
    hypothetical circuit technique.
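Written out, directly from the definitions above, the two metrics are simple
ratios:

    normalized total energy   = E_total(drowsy cache) / E_total(regular cache)
    normalized leakage energy = E_leak(drowsy cache) / E_leak(regular cache)

so, for example, a value of 0.5 means that the drowsy cache consumes half
the energy of the conventional cache.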

22
  • The drowsy cache implementation reduces the total
    energy consumed in the data cache by more than
    50% without significantly impacting performance.
  • Total leakage energy is reduced by
  • an average of 71% when tags are always awake.
  • an average of 76% using the drowsy tag scheme.

23
Future work
  • The proposed scheme is not a solution for all
    caches in the processor.
  • The L1 instruction cache does not do as well with
    the proposed algorithm.
  • Investigate the use of instruction-prefetch
    algorithms combined with the drowsy circuit
    technique.
  • Extend these techniques to other memory
    structures, such as branch predictors.
  • Investigate the impact of an adaptive window
    size.

24
Thank you! Questions?