Configurable Cache Subsetting for Fast Cache Tuning

1
Configurable Cache Subsetting for Fast Cache
Tuning
  • Pablo Viana (1), Ann Gordon-Ross (2), Eamonn Keogh (2),
    Edna Barros (1), Frank Vahid (2)

(1) Computer Science Institute (CIN), Federal University of Pernambuco (UFPE), Brazil
(2) Department of Computer Science and Engineering, University of California, Riverside
This work was supported in part by the Capes Foundation, BEX 1366/04-1.
2
Outline
  • Introduction
  • Configurable cache tuning
  • Cache configuration space
  • Configuration space subsetting problem
  • An exhaustive analysis
  • A heuristic to subset the configuration space
  • Applying the subsetting method to a 2-level cache
  • Future work on other configurable platforms

3
Introduction
  • Caches can improve system performance. However,
    caches are power hungry.
  • Thus, the cache is a good candidate for
    optimization

Power analysis of ARM970TS. Segars (ISSCC, 2001)
4
Motivation
  • Tuning cache parameters (total size, line size,
    associativity) to an application can reduce
    energy by 60% on average.

Figure: microprocessor, cache, and main memory, with energy plotted over the possible cache configurations
5
Cache Configuration Tuning
Figure: applications 1 to N, each mapped to a configuration in the cache configuration space
6
Related Work
  • Configurable caches
    • Soft-core processors: ARM, MIPS, Tensilica, etc.
    • Hard-core processors
      • Motorola M-CORE (Malik, ISLPED'00)
      • Albonesi (MICRO'00)
      • Zhang (ISCA'03)
  • Configurable cache tuning
    • Mostly done manually in practice
      • Sub-optimal
      • Time-consuming

7
Related Work
  • Automated methods for cache tuning
    • Methods and heuristics for design space exploration
    • Single-level caches (tens of configurations)
      • Platune (Givargis, TCAD'02; Palesi, CODES'02)
      • Zhang (RSP'03)
    • Two-level caches (hundreds to thousands of configurations)
      • TCaT (Gordon-Ross, DATE'04)

Figure: level 1 and level 2 caches, each configurable in total size, line size, and associativity. With, say, 50 configurations per level, the two-level space has 50 x 50 = 2500 configurations.
8
Problem Context
  • Do we really need such a large number of cache
    configurations?
  • Could just a few carefully chosen configurations
    adequately cover the large configuration space?
  • A smaller configuration space would be easier to
    explore (via simulation-based or dynamic tuning)

9
Problem Context
  • Potential scenario
  • A configurable microprocessor vendor pre-selects
    a subset of configurations for each particular
    application domain.
  • The user selects the most appropriate domain for
    a target application and examines only the
    pre-selected subset.

10
Cache Configuration Tuning
Energy to run a given application on different
configurations
11
Cache Configuration Subsetting
Figure: applications 1 to N mapped onto a reduced subset of the cache configuration space. Cache tuning stays near optimal, but the average energy increases.
12
Cache Configuration Subsetting
  • Problem definition
    • Identify the subset of configurations that
      adequately covers the configuration space for the
      application domain.
    • We state that p configurations are necessary to
      cover a space of m points.
  • Exhaustive approach
    • Evaluate every subset of p configurations drawn from
      the space of m configurations.
  • Criterion for selection
    • Choose the subset that keeps the average
      energy of the tuned cache nearest to the optimal.
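The exhaustive approach and its selection criterion can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy table, the function names, and the use of mean relative energy increase as the "distance from optimal" are all assumptions made for the example.

```python
from itertools import combinations

def average_increase(energy, subset):
    """Mean relative energy increase when every application is tuned to its
    best configuration inside `subset`, compared with its best configuration
    in the full space. energy[c][a] = energy of application a on config c."""
    n_apps = len(energy[0])
    total = 0.0
    for a in range(n_apps):
        optimal = min(row[a] for row in energy)        # best over all m configs
        tuned = min(energy[c][a] for c in subset)      # best inside the subset
        total += (tuned - optimal) / optimal
    return total / n_apps

def best_subset(energy, p):
    """Exhaustively evaluate every subset of p configurations."""
    m = len(energy)
    return min(combinations(range(m), p),
               key=lambda s: average_increase(energy, s))

# Hypothetical energy table: 4 configurations (rows) x 3 applications (columns).
energy = [
    [1.0, 4.0, 3.0],
    [2.0, 1.0, 3.5],
    [3.0, 2.0, 1.0],
    [1.5, 1.5, 1.5],
]

print(best_subset(energy, 1))   # the single config closest to optimal on average
print(best_subset(energy, 3))   # a 3-config subset with no energy increase
```

The cost of `best_subset` grows with the number of subsets of size p, which is what makes the exhaustive approach expensive for larger spaces.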

13
Experimental Setup
  • Configurable cache architecture for initial
    experiments
    • Our target configurable cache architecture is
      based on Zhang, Vahid, and Najjar's "A Highly
      Configurable Cache Architecture for Embedded
      Systems" (ISCA 2003)
    • We set the base cache to support the following 18
      configurations
  • Energy model for estimation

14
Exhaustive Approach
  • Subsets of size p = 18 down to 1 were chosen
    through the exhaustive evaluation of the average
    energy increase.

15
Exhaustive Approach
  • Good results, but exhaustively determining the
    subsets requires too much computation
    • For m = 18 and 1 ≤ p ≤ 18, there are
      2^18 - 1 = 262,143 possible subsets.
    • That's too expensive!
  • We need a more efficient way to find the subset
    of configurations...
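The figure of 262,143 matches the number of non-empty subsets of 18 configurations, that is, the sum of C(18, p) over all subset sizes p from 1 to 18. A quick check, with variable names chosen only for this example:

```python
from math import comb

m = 18  # configurations in the single-level space

# Number of candidate subsets for each subset size p.
subsets_per_size = {p: comb(m, p) for p in range(1, m + 1)}
total = sum(subsets_per_size.values())

print(total)            # 262143
print(2**m - 1)         # same value: all subsets except the empty one
```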

16
Looking for a Heuristic
  • First attempt
  • Choose the p configurations that offered optimal
    energy for the largest number of applications.
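This first attempt can be sketched as a simple vote count. The code below is an illustrative assumption about how such a ranking might be done; the energy table and the tie-breaking rule (lower configuration index first) are hypothetical.

```python
from collections import Counter

def most_frequently_optimal(energy, p):
    """Pick the p configurations that are the optimal choice for the largest
    number of applications. energy[c][a] = energy of app a on config c."""
    n_configs, n_apps = len(energy), len(energy[0])
    wins = Counter()
    for a in range(n_apps):
        best = min(range(n_configs), key=lambda c: energy[c][a])
        wins[best] += 1
    # Stable sort: ties are broken by lower configuration index.
    ranked = sorted(range(n_configs), key=lambda c: -wins[c])
    return sorted(ranked[:p])

# Hypothetical table: 3 configurations x 6 applications.
energy = [
    [1, 1, 1, 5, 5, 5],
    [2, 2, 2, 1, 1, 4],
    [3, 3, 3, 2, 2, 1],
]

print(most_frequently_optimal(energy, 2))   # [0, 1]
```

Note that this ranking ignores how much energy the applications left without their optimal configuration lose.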

Figure: energy increase for the first heuristic
17
Looking for a Heuristic
  • Second attempt
  • Hierarchical clustering according to the
    similarity between configurations on their
    average energy savings.

Figure: energy increase for the second heuristic
18
Similar Problem
  • We found a problem similar to our subsetting
    puzzle: the problem of segmenting time series.

19
Similar Problem
  • The method proposed by Keogh (IEEE Int. Conf. on
    Data Mining, 2001) can be applied to reduce the
    number of primary colors used to display an image.
  • Colors/nuances of the image are merged according
    to the associated error/difference in the final
    image.

20
Adapting to the Subsetting Problem
  • Similarly, we may iteratively discard
    configurations by merging them.
  • Merging two configs cj and ck into ck means
    that all applications that were tuned to cj now
    use ck.
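A merge and its energy cost can be sketched as below. The list-based assignment (entry a holds the configuration currently used by application a) and the function names are assumptions for illustration; the cost is simply the extra energy paid by the applications that move from cj to ck.

```python
def merge_cost(energy, assignment, cj, ck):
    """Extra energy incurred if every application currently on cj moves to ck.
    energy[c][a] = energy of application a on configuration c."""
    return sum(energy[ck][a] - energy[cj][a]
               for a, c in enumerate(assignment) if c == cj)

def merge(assignment, cj, ck):
    """Return the new assignment after merging cj into ck."""
    return [ck if c == cj else c for c in assignment]

# Hypothetical table: 3 configurations x 3 applications.
energy = [
    [1.0, 4.0, 3.0],
    [2.0, 1.0, 3.5],
    [3.0, 2.0, 1.0],
]
assignment = [0, 1, 2]          # each application starts on its optimal config

print(merge_cost(energy, assignment, 0, 1))   # 1.0
print(merge(assignment, 0, 1))                # [1, 1, 2]
```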

e(cj, ai): energy of application ai on configuration cj
21
Keogh's Heuristic
  • All the possible merges of two adjacent
    (neighboring) configurations in the space are
    evaluated to find the best merging.
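A bottom-up merging loop in this spirit might look like the sketch below. It is a simplified assumption, not the authors' code: the slides do not define which configurations count as neighbors, so this version evaluates every ordered pair of surviving configurations, and the function name and energy table are hypothetical.

```python
def bottom_up_merge(energy, p):
    """Greedily merge configurations until only p remain.
    energy[c][a] = energy of application a on configuration c.
    Assumption: every pair of surviving configs is a merge candidate."""
    n_configs, n_apps = len(energy), len(energy[0])
    # Start with every application on its own optimal configuration.
    assignment = [min(range(n_configs), key=lambda c: energy[c][a])
                  for a in range(n_apps)]
    survivors = set(range(n_configs))
    while len(survivors) > p:
        best = None  # (cost, cj, ck) of the cheapest merge found so far
        for cj in sorted(survivors):
            for ck in sorted(survivors):
                if cj == ck:
                    continue
                # Extra energy if all applications on cj switch to ck.
                cost = sum(energy[ck][a] - energy[cj][a]
                           for a, c in enumerate(assignment) if c == cj)
                if best is None or cost < best[0]:
                    best = (cost, cj, ck)
        _, cj, ck = best
        assignment = [ck if c == cj else c for c in assignment]
        survivors.remove(cj)
    return sorted(survivors), assignment

# Hypothetical table: 4 configurations x 3 applications.
energy = [
    [1.0, 4.0, 3.0],
    [2.0, 1.0, 3.5],
    [3.0, 2.0, 1.0],
    [1.5, 1.5, 1.5],
]

print(bottom_up_merge(energy, 2))   # ([1, 2], [1, 1, 2])
```

Each pass evaluates at most m(m-1) candidate merges, so reducing m configurations to p takes a polynomial number of evaluations rather than one per subset. Being greedy, it is not guaranteed to find the subset an exhaustive search would pick.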

Figure: candidate merges between neighboring configurations (c1, c2, c7, c13)
22
Keogh's Heuristic
  • Accuracy of the results

23
Comparing the Results
  • We kept merging configurations while the average
    energy increase remained under 5% (4 configs).

24
Two-level Configurable Cache
Average energy increase: 3.36%
25
Where Else Could This Method Be Applied?
  • Other parameterized IP cores
    • Buses, I/O devices (word size, bandwidth, etc.)
  • Parameterized platforms
    • Processor functional units, cache, bus,
      multiplier, IPs
  • Optimization for low energy, area, performance,
    and others

26
Conclusions
  • The configurable cache subsetting problem was
    presented.
  • A data mining algorithm for segmenting time
    series was adapted to tackle the cache subsetting
    problem.
  • Subsetting the two-level cache configuration
    space while keeping the average energy increase
    under 5% takes:
    • Keogh's heuristic: less than 1 minute
    • Exhaustive search: around 53 hours
  • Thanks to the versatility of the proposed method,
    our next step is to apply it to platform
    customization.

Thank you!