Title: Fast Configurable-Cache Tuning with a Unified Second-Level Cache
1Fast Configurable-Cache Tuning with a Unified Second-Level Cache
- Ann Gordon-Ross and Frank Vahid, Department of Computer Science and Engineering, University of California, Riverside (also with the Center for Embedded Computer Systems, UC Irvine)
- Nikil Dutt, Center for Embedded Computer Systems, School for Information and Computer Science, University of California, Irvine
This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.
2Cache Hierarchy Optimizations
ARM920T (Segars 01)
- The cache hierarchy is a good candidate for optimizations
- Applications require highly diverse cache configurations for optimal energy consumption of the cache subsystem
- Over 50% energy savings possible in the cache subsystem due to configuration (Gordon-Ross 04)
3Previous Cache Tuning Methodologies
- Previous methods limit configurability to
facilitate easier heuristic development
[Figure: two block diagrams, each showing a microprocessor, a tuner, separate instruction (I) and data (D) caches, and main memory]
- Single-level cache subsystem with separate caches: fewer than 50 configurations
- Multi-level cache subsystem with separate caches: a few hundred configurations
4Motivation
- Unified second-level caches are commonplace in desktop computers and are becoming increasingly popular in embedded microprocessors
- Current cache tuning heuristics do not directly apply due to the complexity of tuning in the presence of a unified second level of cache
  - Circular dependency: a change in any cache affects the performance of all other caches in the hierarchy
  - Search space explodes to 18,000 configurations
[Figure: cache hierarchy with L1 I and L1 D caches and a unified L2 (L2 U)]
5Motivation
- We present an effective and efficient cache
tuning heuristic for a highly configurable cache
hierarchy including a unified second level of
cache.
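[Figure: microprocessor with a tuner, separate level one instruction (I) and data (D) caches, a unified (U) second-level cache, and main memory]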
6Level One Configurable Cache
- The base cache consists of four 2 KB banks that may individually be shut down for size configuration
- Line size is configurable
- Way concatenation allows for configurable associativity
- For evaluation of energy savings, we used a base cache of size 8 KB with a 32-byte line size and 4-way associativity
[Figure: 2 KB banks shown in three arrangements labeled 8 KBytes, 4 KBytes, and 8 KBytes 2-way, illustrating way shutdown and way concatenation]
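To make the level one design space concrete, the following minimal sketch enumerates the configurations reachable by way shutdown and way concatenation. It rests on two assumptions that the slide does not state: only power-of-two bank counts are used, and the candidate line sizes are 16, 32, and 64 bytes. The function name `l1_configurations` is hypothetical.

```python
from itertools import product

BANK_KB = 2  # from the slide: the base cache is built from 2 KB banks

# Assumptions (not stated on the slide): only power-of-two bank counts are
# used, and the candidate line sizes are 16, 32, and 64 bytes.
ACTIVE_BANKS = (1, 2, 4)
LINE_SIZES = (16, 32, 64)


def l1_configurations():
    """Return (size_kb, ways, line_bytes) tuples reachable by way shutdown
    and way concatenation under the assumptions above."""
    configs = []
    for banks, line in product(ACTIVE_BANKS, LINE_SIZES):
        size_kb = banks * BANK_KB
        # Concatenating banks merges them into fewer, larger ways, so the
        # associativity is any power of two up to the active bank count.
        ways = 1
        while ways <= banks:
            configs.append((size_kb, ways, line))
            ways *= 2
    return configs


if __name__ == "__main__":
    configs = l1_configurations()
    print(len(configs), "configurations per level one cache")
    print((8, 4, 32) in configs)  # the base cache from the slide
```

Under these assumptions a 2 KB cache can only be direct-mapped, while the full 8 KB cache can be 1-, 2-, or 4-way. A different set of line sizes or bank counts would change the total.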
7Level Two Configurable Cache
- For maximum configurability, the level two cache uses Motorola MCORE-style way management
  - Ways can be designated as instruction, data, unified, or off
- Line size is configurable
- For evaluation of energy savings, we used a base cache size of 64 KB with a 64-byte line size and 4 fully unified ways
[Figure: example way designations (U-way, D-way, I-way)]
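The way-management scheme can be illustrated with a short sketch that enumerates way designations and reports how much of the cache instruction and data accesses each see. It assumes that the four ways are equal in size (16 KB each for the 64 KB base cache), that the physical position of a way does not matter, and that the all-off case is excluded. The names `l2_way_assignments` and `visible_capacity` are hypothetical.

```python
from itertools import combinations_with_replacement

# 64 KB base cache with 4 ways; equal-sized ways are an assumption.
WAY_KB = 64 // 4
ROLES = ("I", "D", "U", "off")


def l2_way_assignments(num_ways=4):
    """Enumerate way designations, ignoring which physical way gets which
    role and dropping the case where every way is off."""
    return [a for a in combinations_with_replacement(ROLES, num_ways)
            if any(role != "off" for role in a)]


def visible_capacity(assignment):
    """Return the (ways, KB) visible to instruction and to data accesses."""
    i_ways = sum(role in ("I", "U") for role in assignment)
    d_ways = sum(role in ("D", "U") for role in assignment)
    return {"instruction": (i_ways, i_ways * WAY_KB),
            "data": (d_ways, d_ways * WAY_KB)}


if __name__ == "__main__":
    print(len(l2_way_assignments()), "way designations")
    print(visible_capacity(("I", "D", "U", "U")))
```

Because each way contributes both capacity and one degree of associativity, turning a way on or off changes size and associativity together, which is why the two cannot be tuned independently in this style of cache.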
8Alternating Cache Exploration with Additive Way Tuning (ACE-AWT)
- Tune level one sizes
- Tune level one line sizes
- Tune level one associativities
- Tune level two associativity
- Tune level two line size
- Tune level two size
[Figure: flow diagram of the tuning steps, with instruction (I) and data (D) cache labels on the individual steps]
The level two size and associativity steps are difficult because changing size and associativity is synonymous in a way-management-style cache.
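The general idea of tuning one parameter at a time in a fixed order, rather than searching every combination, can be sketched as below. This is not the ACE-AWT heuristic itself: it does not model the alternation between caches or additive way tuning, and the parameter names, candidate values, and `toy_energy` function are all hypothetical stand-ins for a real simulator or measurement.

```python
def tune_one_parameter_at_a_time(start, order, choices, energy):
    """Greedy search: for each parameter in `order`, try every value while
    holding the others fixed, and keep the value with the lowest energy."""
    best = dict(start)
    evaluated = 0
    for name in order:
        for value in choices[name]:
            candidate = dict(best, **{name: value})
            evaluated += 1
            if energy(candidate) < energy(best):
                best = candidate
    return best, evaluated


def toy_energy(cfg):
    # Hypothetical energy model; the numbers have no physical meaning.
    return (abs(cfg["l1_size_kb"] - 4)
            + abs(cfg["l1_line_b"] - 32) / 16
            + abs(cfg["l2_line_b"] - 64) / 64)


# Hypothetical parameter order and candidate values.
ORDER = ["l1_size_kb", "l1_line_b", "l2_line_b"]
CHOICES = {
    "l1_size_kb": [2, 4, 8],
    "l1_line_b": [16, 32, 64],
    "l2_line_b": [64, 128],
}
START = {"l1_size_kb": 8, "l1_line_b": 16, "l2_line_b": 64}

best, evaluated = tune_one_parameter_at_a_time(START, ORDER, CHOICES, toy_energy)
print(best, evaluated)
```

In this toy setting the greedy search evaluates 3 + 3 + 2 = 8 candidates where an exhaustive search would evaluate 3 x 3 x 2 = 18. The gap widens quickly as parameters are added, which is the motivation for a heuristic. A greedy search of this kind is only guaranteed to find the optimum when the parameters do not interact, and the circular dependency in a unified second-level cache is exactly such an interaction.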
9ACE-AWT - First Phase
- The first phase is applied during size exploration
[Figure: flowchart of the first phase, ending in DONE]
10ACE-AWT - Fine Tuning Phase
- The fine tuning phase is applied during
associativity exploration
[Figure: flowchart of the fine tuning phase, which starts with the resulting cache from the first phase and ends in DONE]
11Results - Energy Savings
- Heuristic achieved near-optimal results (when optimal could be computed)
  - 62% energy savings compared to base cache
  - Yet only searched 0.2% of the search space
- Also improved performance by 35% compared to base cache due to tuned line sizes
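As a rough sense of scale, reading the 0.2 figure as a percentage and taking the search space as the roughly 18,000 configurations quoted elsewhere in the talk, the heuristic evaluates on the order of:

```latex
\[
0.002 \times 18{,}000 = 36 \ \text{configurations}
\]
```

This is an estimate from the rounded figures on the slides, not a reported measurement.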
12Conclusions and Future Work
- We developed an efficient and effective cache tuning heuristic to tune a two-level cache with a unified second level of cache
  - 18,000 possible configurations
- Compared to a reasonable base cache configuration
  - 62% energy savings
  - Explores only 0.2% of the search space
  - 35% improvement in performance
- Future work includes application of the tuning heuristic to different execution phases in the application