A First Look at the Interplay of Code Reordering and Configurable Caches

1
A First Look at the Interplay of Code Reordering
and Configurable Caches
  • Ann Gordon-Ross and Frank Vahid
  • Department of Computer Science and Engineering
  • University of California, Riverside
  • Also with the Center for Embedded Computer
    Systems, UC Irvine

Nikil Dutt
Center for Embedded Computer Systems
School for Information and Computer Science
University of California, Irvine
This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.
2
Optimizations
  • Optimization is an important part of the design
    of an application or system

Optimization targets: area, performance, and power and/or energy
3
Instruction Cache Optimizations
  • The instruction cache is a good candidate for
    optimizations

Instruction caches have predictable spatial and temporal locality: 90% of execution time is spent in 10% of the code (Gordon-Ross 04)
Power hungry: 29% of power consumption in the ARM920T (Segars 01)
4
Instruction Cache Tuning - Code Reordering
  • Tune the instruction stream for increased cache
    utilization and thus increased performance
  • Reorder the code so that infrequently executed
    regions of code do not pollute the instruction
    cache.

Code reordering is typically applied at link time. Runtime methods exist, but they incur undesirable runtime overhead.
[Figure: tool flow with stages labeled Compile, obj file, Link, App, Download, Execute]
5
Instruction Cache Tuning - Code Reordering
[Figure: flowchart of an input loop, shown before and after code reordering. Blocks: while (input); Read input; Is the input valid?; yes: Process input; no: Error handling routine; Done. Edge labels: 100 and 1. In the reordered version, the error handling routine is listed after Process input and Done.]
6
Instruction Cache Tuning - Configurable Cache
Tuning
  • Tune the cache to the instruction stream for
    decreased energy and/or increased performance
  • Cache tuning can be performed during
    application/platform design, or in-system
    during runtime, incurring no runtime overhead
    (Zhang, DATE 04)

7
Instruction Cache Tuning - Configurable Cache
Tuning
  • Tunable parameters of the L1 cache include
    total cache size, cache line size, and cache
    associativity

8
Motivation - Code Reordering and Cache Configuration
Code reordering tunes the instruction stream for the cache
Cache configuration tunes the cache to the instruction stream
How do these optimizations affect each other? Do they complement, obviate, or degrade one another?
9
Pettis and Hansen Code Reordering
  • Many current code reordering techniques are based
    heavily on the Pettis and Hansen code
    reordering algorithm (1990)
  • Reorder basic blocks using edge profiling to
    increase locality
  • Orders basic blocks so that the most frequently
    executed path through the basic blocks is placed
    as straight-line code

10
Pettis and Hansen Bottom-up Positioning Algorithm
  • Process arc weights in decreasing order
  • For each arc, merge the basic blocks at its
    source and destination to form a chain
  • If one of the blocks is already in the middle of
    a chain, form a new chain

[Figure: control flow graph of basic blocks annotated with execution frequencies, and the resulting reordered basic block chains]
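The chaining steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' or PLTO's implementation: the function name `build_chains`, the block names, and the arc weights are hypothetical, and the merge rule used here (join two chains only when the arc leaves the tail of one chain and enters the head of another, otherwise leave the blocks in their existing chains) is one reading of the "middle of a chain" bullet.

```python
def build_chains(blocks, arcs):
    """Greedy bottom-up chaining of basic blocks.

    blocks: list of basic block names in their original order.
    arcs:   dict mapping (src, dst) -> execution count from an edge profile.
    Returns a list of chains, each chain being a list of block names.
    """
    # Every block starts out as its own one-element chain.
    chain_of = {b: [b] for b in blocks}

    # Visit arcs from the heaviest to the lightest.
    for (src, dst), _weight in sorted(arcs.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        if a is b:
            continue  # both blocks already sit in the same chain
        # Assumed rule: merge only if src ends its chain and dst starts its chain.
        if a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Collect each distinct chain once, in order of first appearance.
    seen, chains = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical profile for an input loop with a rarely taken error path.
blocks = ["read", "check", "error", "process", "done"]
arcs = {
    ("read", "check"): 101,
    ("check", "process"): 100,
    ("process", "read"): 100,
    ("check", "error"): 1,
    ("error", "done"): 1,
}
print(build_chains(blocks, arcs))
# [['read', 'check', 'process'], ['error', 'done']]
```

With these made-up counts, the hot path read, check, process ends up as straight-line code, and the error handling block is pushed into a separate chain.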
11
Configurable Cache Architecture
  • We used the configurable cache architecture
    proposed by Zhang (ISCA 03)

12
Configurable Cache Architecture
  • The base cache consists of four 2-KByte banks
    that may individually be shut down for size
    configuration
  • Way concatenation allows for
    configurable associativity

[Figure: example configurations labeled way shutdown, 8 KBytes, 4 KBytes, and 8 KBytes 2-way]
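To make the configuration space concrete, the following sketch enumerates the configurations such a banked cache could offer. It assumes that each active bank acts as one way, so associativity cannot exceed the number of active banks. That constraint, the function name `valid_configs`, and the constant names are assumptions made for illustration; the slides state only the bank count, the bank size, and the parameter values.

```python
BANK_KB = 2                  # each bank holds 2 KBytes (from the slide)
LINE_SIZES = (16, 32, 64)    # line sizes in bytes (from the heuristic slide)


def valid_configs():
    """Return (size_kb, ways, line_bytes) tuples for the banked cache.

    Assumption: associativity is limited by the number of active banks,
    e.g. a 4 KByte cache (two banks) cannot be 4-way.
    """
    configs = []
    for active_banks in (1, 2, 4):          # way shutdown leaves 1, 2, or 4 banks on
        size_kb = active_banks * BANK_KB
        for ways in (1, 2, 4):              # way concatenation picks the associativity
            if ways > active_banks:
                continue
            for line in LINE_SIZES:
                configs.append((size_kb, ways, line))
    return configs


for cfg in valid_configs():
    print("%d KB, %d-way, %d-byte lines" % cfg)
```

Under this assumption the space contains 18 configurations, which is small enough for an exhaustive search to serve as a reference point.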
13
Configurable Cache Heuristic
First tune cache size: 2, 4, and 8 KBytes
Then tune cache line size: 16, 32, and 64 bytes
Finally tune cache associativity: direct-mapped, 2-way, and 4-way
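The tuning order on this slide (size, then line size, then associativity) can be sketched as a one-parameter-at-a-time search. This is a simplified illustration, not the published heuristic: the starting point, the rule of keeping the lowest-energy value at each step, the bank-based limit on associativity, and the toy energy function are all assumptions, and `tune` and `toy_energy` are hypothetical names.

```python
def tune(energy):
    """Tune size, then line size, then associativity, one parameter at a time.

    energy: callable (size_kb, line_bytes, ways) -> energy estimate.
    Returns the chosen (size_kb, line_bytes, ways).
    """
    # Assumed starting point: 2 KB, 16-byte lines, direct-mapped.
    size_kb, line, ways = 2, 16, 1

    # Step 1: pick the cache size with the other parameters held fixed.
    size_kb = min((2, 4, 8), key=lambda s: energy(s, line, ways))
    # Step 2: pick the line size for the chosen cache size.
    line = min((16, 32, 64), key=lambda l: energy(size_kb, l, ways))
    # Step 3: pick the associativity; assume one way per active 2 KB bank.
    allowed_ways = [w for w in (1, 2, 4) if w * 2 <= size_kb]
    ways = min(allowed_ways, key=lambda w: energy(size_kb, line, w))
    return size_kb, line, ways


def toy_energy(size_kb, line, ways):
    """Made-up energy surface with its minimum at 4 KB, 32-byte lines, 1 way."""
    return abs(size_kb - 4) + abs(line - 32) / 16 + 0.1 * ways


print(tune(toy_energy))  # (4, 32, 1)
```

The point of the ordering is that the search evaluates at most nine configurations instead of the whole space, at the cost of possibly missing the optimum when the parameters interact strongly.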
14
Evaluation Framework
Instrument the executable to gather edge profiles (PLTO, the Pentium Link Time Optimizer, provided by the University of Arizona)
Execute the application to gather edge profiles
Provide edge profiles to perform code reordering, producing the code reordered executable
Execute the application, with and without code reordering, to obtain hit and miss ratios for each configuration
Energy model: cache energy from Cacti, main memory energy from Samsung memory
Cache exploration heuristic selects the chosen cache configuration
Exhaustive search for comparison purposes
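One simple way to turn hit and miss ratios into an energy figure is to charge every instruction fetch one cache access and every miss one additional main memory access. The sketch below assumes exactly that. It is not the energy model used in the study, which may include further terms such as static energy or stall energy, and the function name, parameter names, and numbers are hypothetical placeholders rather than Cacti or Samsung figures.

```python
def fetch_energy_nj(accesses, miss_ratio, cache_access_nj, memory_access_nj):
    """Estimate instruction fetch energy in nanojoules.

    Assumed model: each access pays the cache access energy, and each miss
    additionally pays one main memory access energy.
    """
    misses = accesses * miss_ratio
    return accesses * cache_access_nj + misses * memory_access_nj


# Hypothetical numbers: 1000 fetches, 25% misses, 0.5 nJ per cache access,
# 20 nJ per main memory access.
print(fetch_energy_nj(1000, 0.25, 0.5, 20.0))  # 5500.0
```

Even in this crude form the trade-off is visible: a larger cache raises the per-access term while lowering the miss term, which is why the lowest-energy configuration depends on the application.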
15
Results - Energy Savings
Base cache: 2 KB, direct-mapped, 16-byte line size
[Figure: chart comparing four setups: Base Cache Without Code Reordering, Base Cache With Code Reordering, Configured Cache Without Code Reordering, Configured Cache With Code Reordering. Data labels: 1.5 and 1.5.]
  • Code reordering alone: 3.5% energy reduction
  • Cache configuration alone: 15% energy reduction
  • Cache configuration and code reordering: 17%
    energy reduction

16
Results - Performance Benefits
[Figure: chart comparing four setups: Base Cache Without Code Reordering, Base Cache With Code Reordering, Configured Cache Without Code Reordering, Configured Cache With Code Reordering. Data labels: 1.5 and 1.6.]
  • Code reordering alone: 3.5% performance benefit
  • Cache configuration alone: 17% performance
    benefit
  • Cache configuration and code reordering: 18.5%
    performance benefit
  • On average, code reordering gives little
    additional benefit over cache configuration
    alone. However, a few benchmarks see added
    benefits.

17
Change in Cache Requirements Due to Code
Reordering
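[Table: per-benchmark changes in cache requirements due to code reordering, for the Powerstone, MediaBench, and EEMBC suites. Legend: x = reduction in cache area; further markers, lost in extraction, indicate larger line size and smaller cache size.]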

18
Conclusions
  • We explore the interplay of two instruction cache
    optimization techniques - code reordering and
    cache configuration
  • Cache configuration largely obviates the need for
    code reordering with respect to energy and
    performance
  • Cache configuration applied dynamically during
    runtime eliminates the need for designer-applied
    code reordering
  • Code reordering improved cache utilization in 52%
    of the benchmarks
  • Reduced instruction cache size by an average of
    13% and by as much as 90%, which is beneficial
    for small custom synthesized embedded systems
    where area is critical

19
Future Work
  • We plan to use a more advanced code reordering
    methodology that will take into account set
    associativity or multiple levels of cache
  • We plan to study the iterative interplay of code
    reordering and cache configuration using a code
    reordering technique that takes the cache
    configuration into consideration