Title: A Highly Configurable Cache Architecture for Embedded Systems
1A Highly Configurable Cache Architecture for Embedded Systems
Chuanjun Zhang, Frank Vahid, and Walid Najjar
University of California, Riverside
The Center for Embedded Computer Systems at UC Irvine
ISCA 2003
Reviewer: Liang Zhu
2Outline
- Why a Configurable Cache?
- Configurable Associativity by Way Concatenation
- Configurable Way Concatenation and Way Shutdown
- Conclusions
3Computing Total Memory-Related Energy
- Considers CPU stall energy and off-chip memory energy
- Excludes CPU active energy
- Thus, represents all memory-related energy
energy_mem = energy_dynamic + energy_static
energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
energy_static = cycles * energy_static_per_cycle
- Underlined: measured quantities
- SimpleScalar (cache_hits, cache_misses, cycles)
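The energy model on this slide can be written as a small function. This is only a sketch: the function name and the numbers in the usage lines are hypothetical and use arbitrary energy units, not measured values from the paper.

```python
def memory_energy(cache_hits, cache_misses, cycles,
                  energy_hit, energy_miss, energy_static_per_cycle):
    """Total memory-related energy, following the slide's model."""
    # Dynamic part: every hit and every miss costs a fixed amount of energy.
    energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
    # Static part: leakage accumulates on every cycle the program runs.
    energy_static = cycles * energy_static_per_cycle
    return energy_dynamic + energy_static


# Hypothetical inputs (arbitrary units). In the slide's flow, the counts
# cache_hits, cache_misses, and cycles would come from SimpleScalar.
total = memory_energy(cache_hits=900, cache_misses=100, cycles=5000,
                      energy_hit=10, energy_miss=200,
                      energy_static_per_cycle=1)
print(total)
```

With these made-up inputs the dynamic term is 29000 and the static term is 5000, so misses dominate even though they are only 10% of accesses.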
4Why Choose Cache: Impacts Performance and Power
- Performance impacts are well known
- Power
- ARM920T: Caches consume 50% of total processor system power (Segars 01)
- MCORE: Unified cache consumes 50% of total processor system power (Lee/Moyer/Arends 99)
5Cache Associativity
- Reduces miss rate thus improving performance
- Impact on power and energy?
6Associativity is Costly
- Associativity improves hit rate, but at the cost of more power per access
- Are the power savings from reduced misses outweighed by the increased power per hit?
[Figure: Energy per access for 8 Kbyte cache]
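The question of whether fewer misses pay for a more expensive hit can be checked with simple arithmetic. The sketch below compares the dynamic energy of a direct-mapped and a four-way cache for two hypothetical programs; all energy values and miss counts are invented for illustration and are in arbitrary units.

```python
def dynamic_energy(accesses, misses, energy_hit, energy_miss):
    """Dynamic energy: hits cost energy_hit each, misses cost energy_miss each."""
    hits = accesses - misses
    return hits * energy_hit + misses * energy_miss


# Hypothetical per-access energies: the four-way cache reads more ways per hit.
E_HIT_DIRECT = 10
E_HIT_FOUR_WAY = 25
E_MISS = 200  # assumed to be the same for both organizations

# Program A: direct-mapped already has few misses, so its cheap hits win.
a_direct = dynamic_energy(1000, 50, E_HIT_DIRECT, E_MISS)
a_four = dynamic_energy(1000, 20, E_HIT_FOUR_WAY, E_MISS)

# Program B: direct-mapped suffers many conflict misses, so four-way wins.
b_direct = dynamic_energy(1000, 300, E_HIT_DIRECT, E_MISS)
b_four = dynamic_energy(1000, 20, E_HIT_FOUR_WAY, E_MISS)

print("A:", a_direct, a_four)
print("B:", b_direct, b_four)
```

Under these assumptions neither organization is best for both programs, which is the situation a configurable cache is meant to address.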
7Associativity and Energy
- Best performing cache is not always lowest energy
8Associativity Dilemma
- Direct mapped cache
- Poor hit rate on some examples
- But low power per access
- Four-way set-associative cache
- Good hit rate on nearly all examples
- But high power per access
9So What's the Best Cache?
- Looking at popular embedded processors, there's obviously no standard cache
- Dilemma
- Direct mapped: good performance and energy for most programs
- Four-way: good performance for all programs, but at the cost of higher power per access for all programs
- Whether to design for the average case or the worst case?
10Solution to the Dilemma
- Configurable cache can be configured as four-way, two-way, or one-way
- Four-way set-associative base cache: ways can be concatenated to form two-way
- Can be further concatenated to direct-mapped
[Figure: four-way base cache with Way 1, Way 2, Way 3, Way 4]
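One way to picture way concatenation is as a change in how an address is split: the total cache size stays fixed, and each halving of associativity moves one bit from the tag into the index. The sketch below assumes an 8 Kbyte cache with 32-byte lines built from four banks; the line size, the bank numbering, and the function name are assumptions made for illustration, not details taken from the paper's circuit.

```python
CACHE_BYTES = 8 * 1024  # total size (assumed, matching the 8 Kbyte example)
LINE_BYTES = 32         # line size (assumed)
BASE_WAYS = 4           # physical banks in the four-way base cache


def decode(addr, ways):
    """Split an address for a cache configured as 4-, 2-, or 1-way."""
    if ways not in (1, 2, 4):
        raise ValueError("ways must be 1, 2, or 4")
    rows_per_bank = CACHE_BYTES // (LINE_BYTES * BASE_WAYS)  # 64 rows
    num_sets = CACHE_BYTES // (LINE_BYTES * ways)  # grows as ways shrink
    line = addr // LINE_BYTES
    index = line % num_sets
    tag = line // num_sets
    row = index % rows_per_bank     # row inside each physical bank
    group = index // rows_per_bank  # extra index bits pick a bank group
    banks = [group * ways + i for i in range(ways)]  # banks actually read
    return {"tag": tag, "index": index, "row": row, "banks": banks}


addr = 0x1A20
for ways in (4, 2, 1):
    print(ways, decode(addr, ways))
```

In this model the number of banks read per access equals the configured associativity, which is where the per-access energy saving comes from, while every bank still stores data so capacity is unchanged.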
11Original Cache Layout
12Configurable Cache Design: Way Concatenation
13Analyzing the Results
- Simulated the circuit in Cadence's Spectre
- Note energy savings with reduction of ways
- Concerns over access time addressed
- With transistor sizing to match delay: 1% area overhead
14Way Concatenate Experiments
- Experiment
- Motorola PowerStone benchmark g3fax
- Considering dynamic power only
- L1 access energy, CPU stall energy, memory access energy
15Previous Method: Way Shutdown
- Albonesi proposed a cache where ways could be shut down
- To save dynamic power
- Motorola MCORE has the same way-shutdown feature
- Unified cache even allows setting each way as I, D, both, or off
[Figure: four-way cache with Way 1, Way 2, Way 3, Way 4]
- Reduces dynamic power by accessing fewer ways
- But decreases total size, so may increase miss rate
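The size penalty of way shutdown can be illustrated with a tiny LRU cache simulation. The sketch below is not the authors' experimental setup: the cache geometry (64 sets in the four-way base) and the synthetic trace are assumptions picked so that the working set fits in the full cache but not in half of it.

```python
from collections import OrderedDict


def count_misses(line_addrs, num_sets, ways):
    """Count misses of a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for line in line_addrs:
        lines_in_set = sets[line % num_sets]
        tag = line // num_sets
        if tag in lines_in_set:
            lines_in_set.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines_in_set) >= ways:
                lines_in_set.popitem(last=False)  # evict least recently used
            lines_in_set[tag] = True
    return misses


# Synthetic trace: sweep over 200 cache lines, repeated 10 times.
trace = list(range(200)) * 10

base_four_way = count_misses(trace, num_sets=64, ways=4)    # 256 lines total
shutdown_two = count_misses(trace, num_sets=64, ways=2)     # 128 lines total
concatenated_two = count_misses(trace, num_sets=128, ways=2)  # 256 lines total

print(base_four_way, shutdown_two, concatenated_two)
```

Both two-way configurations read two ways per access, but only the concatenated one keeps the full capacity. On this trace the shutdown cache thrashes while the concatenated cache sees only cold misses; real programs will show smaller and more varied differences.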
16Experimental results
17Way Shutdown Can Be Good for Static Power
- Static power (leakage) increasingly important in nanoscale technologies
- We combine way shutdown with way concatenate
- Use sleep transistor method of Powell (ISLPED 2000)
When off, prevents leakage, but at the cost of 5% area overhead and 8% performance overhead
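The static-power trade-off can be sketched with the same static term as the energy model, energy_static = cycles * energy_static_per_cycle. The sketch below makes several loud assumptions that are not in the slides: the "8" figure is read as an 8% increase in cycle count, leakage is split evenly across ways, and a shut-down way leaks nothing. All numbers are hypothetical and in arbitrary units.

```python
def static_energy(cycles, leak_per_way_per_cycle, active_ways,
                  slowdown_percent=0):
    """Static energy when only active_ways ways are powered.

    slowdown_percent models the assumed performance overhead of the sleep
    transistors as extra cycles (integer arithmetic to keep results exact).
    """
    effective_cycles = cycles * (100 + slowdown_percent) // 100
    return effective_cycles * leak_per_way_per_cycle * active_ways


# Conventional four-way cache without sleep transistors.
baseline = static_energy(cycles=1000, leak_per_way_per_cycle=5, active_ways=4)

# Sleep transistors present, two ways shut down.
two_ways_on = static_energy(1000, 5, active_ways=2, slowdown_percent=8)

# Sleep transistors present, but all four ways left on.
all_ways_on = static_energy(1000, 5, active_ways=4, slowdown_percent=8)

print(baseline, two_ways_on, all_ways_on)
```

Under these assumptions, shutting down two ways roughly halves leakage energy, while leaving all ways on costs slightly more than the baseline because the program runs longer.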
18Way Concatenate Plus Way Shutdown
[Figure: energy normalized so that 100% = 4-way conventional cache]
19Conclusions
- We have introduced a novel configurable cache design method called way concatenation
- For dynamic power, way concatenation shows average energy savings of 40% compared to a conventional four-way set-associative cache
- Way concatenation is shown to be superior to previously proposed way shutdown methods
- A configurable cache with way concatenation and way shutdown can save a lot of energy