Title: Low Static-Power Frequent-Value Data Caches
1. Low Static-Power Frequent-Value Data Caches
- Chuanjun Zhang, Jun Yang, and Frank Vahid
- Dept. of Electrical Engineering
- Dept. of Computer Science and Engineering
- University of California, Riverside
- Also with the Center for Embedded Computer Systems at UC Irvine
- This work was supported in part by the National Science Foundation and the Semiconductor Research Corporation
2. Leakage Power Dominates
- Growing impact of leakage power
  - Leakage power increases as transistor lengths and threshold voltages scale down.
  - The power budget limits the use of fast, leaky transistors.
- Caches consume much static power
  - Caches account for most of the transistors on a die.
- Related work
  - DRG dynamically resizes the cache by monitoring the miss rate.
  - Cache-line decay dynamically turns off cache lines.
  - Drowsy caches use a low-leakage mode.
3. Frequent Values in Data Cache (J. Yang and R. Gupta, MICRO 2002)
[Figure: a microprocessor exchanging addresses and data with the L1 data cache]
- Frequently accessed values behavior
4. Frequent Values in Data Cache (J. Yang and R. Gupta, MICRO 2002)
- 32 FVs account for around 36% of the total data cache accesses for 11 SPEC 95 benchmarks.
- FVs can be dynamically captured.
- FVs are also widespread within the data cache
  - Not just frequently accessed, but also stored throughout the cache.
- FVs are stored in encoded form.
  - 4 or 5 bits represent 16 or 32 FVs, respectively.
- Non-FVs are stored in unencoded form.
- The set of frequent values remains fixed for a given program run.
[Figure: example words illustrating FVs stored in the data cache and FVs among accessed data, e.g. 0x00000000 and 0xFFFFFFFF]
5. Original Frequent-Value Data Cache Architecture
- The data cache memory is separated into a low-bit array and a high-bit array.
  - 5 bits encode 32 FVs.
  - The remaining 27 bits are not accessed for FVs.
- A register file holds the decoded lines.
- Dynamic power is reduced.
- Accessing a non-FV takes two cycles.
- Flag bit: 1 = FV, 0 = non-FV.
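The two-cycle read path above can be modeled as a toy sketch (my simplification, not the actual circuit); `FV_TABLE` stands in for the register file of decoded values.

```python
# Toy read-path model of the original FV cache: cycle 1 reads the flag and
# the 5-bit array; the 27-bit array is read in a second cycle only when the
# word is a non-FV. FV_TABLE is an illustrative register-file stand-in.
FV_TABLE = [0x00000000, 0xFFFFFFFF, 0x00000001, 0x00100000]

def read_word(flag, low5, high27):
    """Return (value, cycles) for one cache-word read."""
    if flag:
        return FV_TABLE[low5], 1       # FV: decode via the register file
    return (high27 << 5) | low5, 2     # non-FV: concatenate both arrays
```

Reading an FV costs one cycle; reading a non-FV costs two, which is the penalty the new design in the next slide removes.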
6. New FV Cache Design: One-Cycle Access to Non-FVs
- No extra delay in determining accesses to the 27-bit portion.
- Leakage energy is proportional to program execution time.
- The new word-line driver is as fast as the original, by tuning the NAND gate transistor parameters.
- Flag bit: 0 = FV, 1 = non-FV.
[Figure: a 32-bit cache line split into a 5-bit array and a 27-bit array; the new word-line driver gates the 27-bit portion from the decoder output, compared with the original cache line architecture and its original word-line driver]
7. Low-Leakage SRAM Cell and Flag Bit
[Figure: (left) SRAM cell with a pMOS gated-Vdd control between Vdd and the cell; (right) the flag-bit SRAM cell, whose output provides the gated-Vdd control]
8. Experiments
- SimpleScalar simulator.
- Eleven SPEC 2000 benchmarks.
- Fast-forward the first 1 billion instructions, then execute 500M.
[Table: configuration of the simulated processor]
9. Performance Improvement of One-Cycle Access to Non-FVs
[Figure: hit rate of FVs in the data cache]
- Two-cycle accesses lengthen execution time and hence increase leakage energy.
- One-cycle access to non-FVs achieves a 5.5% performance improvement (and reduces leakage energy correspondingly).
[Figure: performance (IPC) improvement of the one-cycle FV cache vs. the two-cycle FV cache]
10. Distribution of FVs in Data Cache
- FVs are widely found in the data cache memory: 49.2% of words on average.
- The leakage power reduction is proportional to the percentage of words that are FVs.
[Figure: percentage of data cache words (on average) that are FVs]
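As a rough sanity check (my own back-of-envelope arithmetic, not the paper's energy model): if a fraction f of cached words are FVs and the 27 high-order bits of each FV word are gated off, the ideal data-array reduction is about f × 27/32.

```python
# Back-of-envelope upper bound on static-energy savings, assuming the 27
# high-order bits of every FV word can be fully gated off. Real savings are
# lower because of flag-bit storage and gated-Vdd control overheads.
f = 0.492                     # average FV occupancy from this slide
ideal_saving = f * 27 / 32    # fraction of data-array bits switched off
print(f"{ideal_saving:.1%}")  # ~41.5% ideal; the reported net saving is ~33%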
11. Static Energy Reduction
- 33% total static energy savings for the data cache.
12. How to Determine the FVs
- Application-specific processors
  - The FVs can be identified offline through profiling and then synthesized into the cache, so that power consumption is optimized for the hard-coded FVs.
- Processors that run multiple applications
  - The FVs can be held in a register file to which different applications write their own sets of FVs.
- Dynamically determined FVs
  - Embed the process of identifying and updating FVs into registers, so that the design dynamically and transparently adapts to different workloads with different inputs.
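The offline-profiling option above can be sketched as a simple frequency count over a data-access trace. `find_frequent_values` is a hypothetical helper, not the paper's mechanism, and the trace below is made up for illustration.

```python
from collections import Counter

def find_frequent_values(trace, n=32):
    """Offline-profiling sketch: count how often each value appears in a
    data-access trace and keep the n most common as the FV set."""
    return [value for value, _ in Counter(trace).most_common(n)]

# Tiny illustrative trace: 0x0 appears three times, 0xFFFFFFFF twice.
trace = [0x0, 0x0, 0xFFFFFFFF, 0x0, 0x1, 0xFFFFFFFF, 0x7]
print([hex(v) for v in find_frequent_values(trace, n=2)])  # ['0x0', '0xffffffff']
```

For a hard-coded FV cache, the resulting set would be synthesized into the decoder; for the register-file variant, it would be written at application load time.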
13. Conclusion
- Two improvements to the original FV data cache:
  - One-cycle access to non-FVs
    - Improves performance by 5.5%, and hence reduces static leakage energy.
  - Shutting off the unused 27-bit portion of each FV
    - The scheme does not increase the data cache miss rate.
    - It further reduces data cache static energy by over 33% on average.