Title: Configurable Cache Subsetting for Fast Cache Tuning
Pablo Viana¹, Ann Gordon-Ross², Eamonn Keogh², Edna Barros¹, Frank Vahid²
¹ Computer Science Institute (CIn), Federal University of Pernambuco (UFPE), Brazil
² Department of Computer Science and Engineering, University of California, Riverside
This work was supported in part by the CAPES Foundation (BEX 1366/04-1).
Outline
- Introduction
- Configurable cache tuning
- Cache configuration space
- Configuration space subsetting problem
- An exhaustive analysis
- A heuristic to subset the configuration space
- Applying the subsetting method to a 2-level cache
- Future work on other configurable platforms
Introduction
- Caches can improve system performance; however, caches are power hungry.
- Thus, the cache is a good candidate for optimization.
[Figure: Power analysis of the ARM970TS. Segars (ISSCC, 2001)]
Motivation
- Tuning cache parameters (total size, line size, associativity) to an application can reduce energy by 60% on average.
[Figure: microprocessor, cache, and main memory; energy across the possible cache configurations]
Cache Configuration Tuning
[Figure: Applications 1 to N, each tuned to a configuration in the cache configuration space]
Related Work
- Configurable caches
  - Soft-core processors
    - ARM
    - MIPS
    - Tensilica
    - etc.
  - Hard-core processors
    - Motorola MCore (Malik, ISLPED'00)
    - Albonesi (MICRO'00)
    - Zhang (ISCA'03)
- Configurable cache tuning
  - Mostly done manually in practice
  - Sub-optimal
  - Time-consuming
Related Work
- Automated methods for cache tuning
  - Methods and heuristics for design space exploration
  - Single-level caches (tens of configurations)
    - Platune (Givargis, TCAD'02; Palesi, CODES'02)
    - Zhang (RSP'03)
  - Two-level caches (hundreds to thousands of configurations)
    - Tcat (Gordon-Ross, DATE'04)
[Figure: a two-level cache; Level 1 and Level 2 are each parameterized by total size, line size, and associativity. Say 50 configurations per level, giving 2500 two-level configurations.]
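As a rough sketch of why the two-level space grows so quickly, the snippet below forms every Level 1/Level 2 pairing as a Cartesian product. It assumes, as the slide does, 50 configurations per level; the labels are hypothetical placeholders, not real cache parameters.

```python
from itertools import product

# Hypothetical per-level configuration labels; 50 per level, as on the slide.
level1 = [f"L1_cfg{i}" for i in range(50)]
level2 = [f"L2_cfg{j}" for j in range(50)]

# A two-level configuration pairs one Level 1 choice with one Level 2 choice,
# so the combined space is the Cartesian product of the two per-level spaces.
two_level = list(product(level1, level2))

print(len(two_level))  # 2500
```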
Problem Context
- Do we really need such a large number of cache configurations?
- Could just a few carefully chosen configurations adequately cover the large configuration space?
- A smaller configuration space would be easier to explore (via simulation-based or dynamic tuning).
Problem Context
- Potential scenario
  - A configurable microprocessor vendor pre-selects a subset of configurations for each particular application domain.
  - The user selects the most appropriate domain for a target application and examines only the pre-selected subset.
Cache Configuration Tuning
[Figure: energy to run a given application on different configurations]
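Tuning a cache for one application amounts to picking the configuration with the lowest energy for it. A minimal sketch, assuming a hypothetical energy table `energy[config][i]` (energy of application i on that configuration); the function name `tune` and the numbers are illustrative, not measured data.

```python
from typing import Dict, List

# energy[config][i]: energy of application i on cache configuration `config`.
Energy = Dict[str, List[float]]


def tune(energy: Energy, allowed: List[str], app: int) -> str:
    """Return the configuration in `allowed` with the lowest energy for `app`."""
    return min(allowed, key=lambda c: energy[c][app])


if __name__ == "__main__":
    # Illustrative numbers only.
    energy = {
        "c1": [5.0, 9.0, 4.0],
        "c2": [6.0, 3.0, 7.0],
        "c3": [4.5, 8.0, 6.0],
    }
    for app in range(3):
        print(f"application {app}: {tune(energy, list(energy), app)}")
```

Restricting `allowed` to a subset of the space is what subsetting does: tuning becomes faster, but an application whose optimum was dropped pays some extra energy.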
Cache Configuration Subsetting
[Figure: Applications 1 to N tuned within a subset of the cache configuration space: tuning remains near-optimal, but the average energy increases]
Cache Configuration Subsetting
- Problem definition
  - Identify the subset of configurations that adequately covers the configuration space for the application domain.
  - We assume that p configurations are needed to cover a space of m configurations.
- Exhaustive approach
  - Select the subset of p configurations from the space of m configurations.
- Criterion for selection
  - Choose the subset that keeps the average energy of the tuned cache closest to the optimal.
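One way to write this criterion down, in our own notation (an assumed formalization, not taken from the slides): applications a_1, ..., a_n, a configuration space C with |C| = m, and e(c, a_i) the energy of application a_i on configuration c. The average energy of the tuned cache restricted to a subset S, its increase over the full space, and the exhaustive choice for a given p could then be

```latex
\bar{E}(S) = \frac{1}{n} \sum_{i=1}^{n} \min_{c \in S} e(c, a_i),
\qquad
\Delta(S) = \frac{\bar{E}(S)}{\bar{E}(C)} - 1,
\qquad
S_p^{*} = \operatorname*{arg\,min}_{S \subseteq C,\ |S| = p} \bar{E}(S).
```

Because a minimum over a subset can never be smaller than the minimum over the whole space, \bar{E}(S) \ge \bar{E}(C), so \Delta(S) \ge 0 for every subset S.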
Experimental Setup
- Configurable cache architecture for initial experiments
  - Our target configurable cache architecture is based on Zhang, Vahid, and Najjar's "A Highly Configurable Cache Architecture for Embedded Systems" (ISCA 2003).
  - We set the base cache to support 18 configurations.
- Energy model for estimation
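For concreteness, the 18 configurations can be enumerated as below. This is a sketch based on our reading of Zhang, Vahid, and Najjar's design (an 8 KB cache built from four 2 KB banks, with way concatenation, way shutdown, and line concatenation); the exact set used in these experiments is an assumption here.

```python
from itertools import product

# Assumption: (total size in KB, associativity) pairs reachable by way
# concatenation and way shutdown of an 8 KB cache made of four 2 KB banks.
SIZE_ASSOC = [(8, 4), (8, 2), (8, 1), (4, 2), (4, 1), (2, 1)]
# Assumption: line sizes reachable by line concatenation.
LINE_SIZES_BYTES = [16, 32, 64]

CONFIGS = [
    {"size_kb": size, "assoc": assoc, "line_b": line}
    for (size, assoc), line in product(SIZE_ASSOC, LINE_SIZES_BYTES)
]

print(len(CONFIGS))  # 18
```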
Exhaustive Approach
- Subsets of size p = 18 down to 1 were chosen through exhaustive evaluation of the average energy increase.
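The exhaustive search can be sketched as follows, assuming a hypothetical energy table `energy[config][i]` and scoring each subset by the average energy of the tuned cache; the function names and numbers are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# energy[config][i]: energy of application i on cache configuration `config`.
Energy = Dict[str, List[float]]


def avg_tuned_energy(energy: Energy, subset: Sequence[str]) -> float:
    """Average, over all applications, of the best energy reachable within `subset`."""
    n_apps = len(next(iter(energy.values())))
    return sum(min(energy[c][i] for c in subset) for i in range(n_apps)) / n_apps


def best_subset(energy: Energy, p: int) -> Tuple[Tuple[str, ...], float]:
    """Try every subset of p configurations and keep the one closest to optimal."""
    best = min(combinations(energy, p), key=lambda s: avg_tuned_energy(energy, s))
    return best, avg_tuned_energy(energy, best)


if __name__ == "__main__":
    energy = {  # illustrative numbers only
        "c1": [5.0, 9.0, 4.0],
        "c2": [6.0, 3.0, 7.0],
        "c3": [4.5, 8.0, 6.0],
    }
    optimal = avg_tuned_energy(energy, list(energy))
    for p in range(len(energy), 0, -1):  # p = m down to 1
        subset, avg = best_subset(energy, p)
        print(p, subset, f"increase = {avg / optimal - 1:.1%}")
```

Each call to `best_subset` scans all C(m, p) subsets, which is what makes this approach expensive as m grows.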
Exhaustive Approach
- Good results, but determining the subsets exhaustively requires too much computation.
  - For m = 18 and 1 ≤ p ≤ 18, there are 262,143 possible combinations.
  - That's too expensive!
- We need a more efficient way to find the subset of configurations...
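The 262,143 figure is the number of non-empty subsets of an 18-element set: the sum of C(18, p) over p = 1, ..., 18, which equals 2^18 - 1. A quick check:

```python
from math import comb

m = 18  # configurations in the single-level space

# Number of ways to choose p configurations, summed over every subset size p.
total = sum(comb(m, p) for p in range(1, m + 1))

print(f"{total:,}")  # 262,143
assert total == 2**m - 1
```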
Looking for a Heuristic
- First attempt
  - Choose the p configurations that offered optimal energy for the largest number of applications.
[Plot: energy increase]
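A minimal sketch of this first attempt, assuming a hypothetical energy table `energy[config][i]`: count how often each configuration is an application's optimum and keep the p most frequent. This counting ignores how much energy the remaining applications lose, and a configuration that is never optimal is never selected.

```python
from collections import Counter
from typing import Dict, List

# energy[config][i]: energy of application i on cache configuration `config`.
Energy = Dict[str, List[float]]


def most_frequent_optima(energy: Energy, p: int) -> List[str]:
    """Keep the p configurations that are optimal for the most applications.

    Returns fewer than p configurations if fewer than p are ever optimal.
    """
    n_apps = len(next(iter(energy.values())))
    wins = Counter(min(energy, key=lambda c: energy[c][i]) for i in range(n_apps))
    return [config for config, _ in wins.most_common(p)]
```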
Looking for a Heuristic
- Second attempt
  - Hierarchical clustering of the configurations according to their similarity in average energy savings.
[Plot: energy increase]
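A minimal sketch of this second attempt, under assumptions the slides do not specify: each configuration is described by its vector of per-application energies, similarity is a hypothetical Euclidean distance between those vectors, clusters are merged bottom-up with average linkage until p remain, and each cluster is represented by its member with the lowest total energy.

```python
from typing import Dict, List

# energy[config][i]: energy of application i on cache configuration `config`.
Energy = Dict[str, List[float]]


def distance(u: List[float], v: List[float]) -> float:
    """Hypothetical similarity measure: Euclidean distance of energy profiles."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def cluster_configs(energy: Energy, p: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering of configurations into p groups."""
    clusters = [[c] for c in energy]
    while len(clusters) > max(p, 1):
        best = None  # (linkage distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(energy[a], energy[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


def representatives(energy: Energy, clusters: List[List[str]]) -> List[str]:
    """Pick, from each cluster, the configuration with the lowest total energy."""
    return [min(group, key=lambda c: sum(energy[c])) for group in clusters]
```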
Similar Problem
- We found a problem similar to our subsetting puzzle:
  - the problem of segmenting time series.
Similar Problem
- The method proposed by Keogh (IEEE Int. Conf. on Data Mining, 2001) can be applied to reduce the number of primary colors used to display an image.
- Colors (nuances) of the image are merged according to the error each merge introduces in the final image.
Adapting to the Subsetting Problem
- Similarly, we may iteratively discard configurations by merging them.
- Merging two configurations c_j and c_k into c_k means that all applications that were tuned to c_j now use c_k.
- e(c_j, a_i): energy of application a_i on configuration c_j
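The merge operation and its energy cost can be sketched as below, assuming `tuned[i]` holds the configuration that application a_i currently uses and `energy[c][i]` plays the role of e(c, a_i); both names are hypothetical. Merging c_j into c_k adds e(c_k, a_i) - e(c_j, a_i) for every application currently tuned to c_j.

```python
from typing import Dict, List

# energy[c][i] plays the role of e(c, a_i).
Energy = Dict[str, List[float]]


def merge_cost(energy: Energy, tuned: List[str], cj: str, ck: str) -> float:
    """Extra energy if every application tuned to cj is moved to ck."""
    return sum(energy[ck][i] - energy[cj][i] for i, c in enumerate(tuned) if c == cj)


def merge(tuned: List[str], cj: str, ck: str) -> List[str]:
    """Merge cj into ck: applications that used cj now use ck."""
    return [ck if c == cj else c for c in tuned]
```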
Keogh's Heuristic
- All possible merges of two adjacent (neighboring) configurations in the space are evaluated to find the best merge.
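Putting the pieces together, the bottom-up merging can be sketched as a greedy loop in the spirit of Keogh's segmentation heuristic: start from the fully tuned assignment, evaluate every candidate merge, apply the cheapest, and stop once the next merge would push the energy increase past a threshold (5% by default, matching the budget used in the results). Assumption: the slides do not define adjacency in the configuration space, so this sketch treats every pair of remaining configurations as neighbors; the energy table and function name are hypothetical.

```python
from typing import Dict, List

# energy[c][i]: energy of application i on configuration c, i.e. e(c, a_i).
Energy = Dict[str, List[float]]


def greedy_subset(energy: Energy, max_increase: float = 0.05) -> List[str]:
    """Merge configurations bottom-up while the energy increase stays within budget."""
    n_apps = len(next(iter(energy.values())))
    # Start fully tuned: each application uses its own optimal configuration.
    tuned = [min(energy, key=lambda c: energy[c][i]) for i in range(n_apps)]
    optimal = sum(energy[c][i] for i, c in enumerate(tuned))
    current = optimal
    kept = list(energy)
    while len(kept) > 1:
        best = None  # (cost, cj, ck) of the cheapest merge of cj into ck
        for cj in kept:
            for ck in kept:  # assumption: every pair counts as neighbors
                if cj == ck:
                    continue
                cost = sum(energy[ck][i] - energy[cj][i]
                           for i, c in enumerate(tuned) if c == cj)
                if best is None or cost < best[0]:
                    best = (cost, cj, ck)
        cost, cj, ck = best
        # Comparing totals gives the same ratio as comparing averages.
        if (current + cost) / optimal - 1 > max_increase:
            break  # the cheapest merge would exceed the energy budget
        tuned = [ck if c == cj else c for c in tuned]
        kept.remove(cj)
        current += cost
    return kept
```

Each pass evaluates k(k - 1) candidate merges for k remaining configurations, far fewer than enumerating every subset, at the cost of losing any optimality guarantee.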
Keogh's Heuristic
Comparing the Results
- We kept merging configurations while the energy increase remained under 5% (4 configurations).
Two-level Configurable Cache
Average energy increase: 3.36%
Where Else Could This Method Be Applied?
- Other parameterized IP cores
  - Buses, I/O devices (word size, bandwidth, etc.)
- Parameterized platforms
  - Processor functional units, cache, bus, multiplier, IPs
- Optimization for low energy, area, performance, and other metrics
Conclusions
- The configurable cache subsetting problem was presented.
- A data mining algorithm for segmenting time series was adapted to tackle the cache subsetting problem.
- Subsetting the two-level cache configuration space while keeping the average energy increase under 5% takes:
  - Keogh's heuristic: less than 1 minute
  - Exhaustive search: around 53 hours
- Thanks to the versatility of the proposed method, our next step is to apply it to platform customization.
Thank you!