Title: A First Look at the Interplay of Code Reordering and Configurable Caches
Slide 1: A First Look at the Interplay of Code Reordering and Configurable Caches
- Ann Gordon-Ross and Frank Vahid, Department of Computer Science and Engineering, University of California, Riverside (also with the Center for Embedded Computer Systems, UC Irvine)
- Nikil Dutt, Center for Embedded Computer Systems, School of Information and Computer Science, University of California, Irvine
- This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.
Slide 2: Optimizations
- Optimization is an important part of the design of an application or system:
  - Area
  - Performance
  - Power and/or energy
Slide 3: Instruction Cache Optimizations
- The instruction cache is a good candidate for optimization (Gordon-Ross 04):
  - Instruction caches have predictable spatial and temporal locality: 90% of execution time is spent in 10% of the code.
  - Power hungry: 29% of power consumption on the ARM920T (Segars 01).
Slide 4: Instruction Cache Tuning - Code Reordering
- Tune the instruction stream for increased cache utilization and thus increased performance.
- Reorder the code so that infrequently executed regions of code do not pollute the instruction cache.
- Code reordering is typically applied at link time; runtime methods do exist but incur undesirable runtime overhead.
(Diagram: application flow - compile to an obj file, link, download, execute - with reordering applied at link time.)
Slide 5: Instruction Cache Tuning - Code Reordering
(Diagram: a loop - while (input): read input; is the input valid? The "yes" edge, taken 100 times, leads to "process input"; the "no" edge, taken once, leads to an error handling routine. After code reordering, the frequently executed path is laid out as straight-line code and the error handling routine is moved out of it.)
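The reordering illustrated above can be sketched in a few lines; the block names, execution counts, and hot/cold threshold below are hypothetical, chosen to mirror the diagram rather than any real profiler output.

```python
# Sketch of the idea behind code reordering (hypothetical block names
# and counts). Blocks are laid out so the hot path (valid input ->
# process input) is straight-line code, and the rarely executed error
# handler no longer sits between hot blocks in the instruction stream.

# (block name, execution count from profiling)
blocks = [
    ("read_input", 101),
    ("check_valid", 101),
    ("error_handler", 1),    # cold: taken on invalid input only
    ("process_input", 100),  # hot: taken on valid input
]

def reorder(blocks, cold_threshold=10):
    """Place frequently executed blocks first, in straight-line order,
    and push cold blocks to the end of the function."""
    hot = [b for b in blocks if b[1] >= cold_threshold]
    cold = [b for b in blocks if b[1] < cold_threshold]
    return hot + cold

layout = reorder(blocks)
print([name for name, _ in layout])
# The hot path is now contiguous; "error_handler" is placed last.
```

With the cold block moved out of the way, the hot blocks pack into fewer cache lines, which is exactly the "pollution" reduction the slide describes.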
Slide 6: Instruction Cache Tuning - Configurable Cache Tuning
- Tune the cache to the instruction stream for decreased energy and/or increased performance.
- Cache tuning can be performed during application/platform design, or even in-system at runtime, incurring no runtime overhead (Zhang - DATE04).
Slide 7: Instruction Cache Tuning - Configurable Cache Tuning
- Tunable parameters include:
  - Cache associativity
  - Cache line size
  - Total cache size
Slide 8: Motivation - Code Reordering + Cache Configuration
- Code reordering tunes the instruction stream for the cache.
- Cache configuration tunes the cache to the instruction stream.
- How do these optimizations affect each other? Do they complement, obviate, or degrade one another?
Slide 9: Pettis and Hansen Code Reordering
- Many current code reordering techniques are based heavily on the Pettis and Hansen code reordering algorithm (1990).
- Reorders basic blocks using edge profiling to increase locality.
- Orders basic blocks so that the most frequently executed path through the basic blocks is placed as straight-line code.
Slide 10: Pettis and Hansen Bottom-up Positioning Algorithm
- Process arc weights in decreasing order.
- For each arc, merge the basic blocks at the source and destination of the arc to form a chain.
- If one of the blocks is already in the middle of a chain, form a new chain.
(Diagram: a control flow graph of basic blocks annotated with execution frequencies, and the resulting reordered basic block chains.)
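The merging rule above can be sketched as follows; this is a minimal illustration of the bottom-up chaining step only, not a full Pettis and Hansen implementation (which also orders the resulting chains and aligns branches).

```python
# Minimal sketch of Pettis & Hansen bottom-up chain building.
# blocks: iterable of basic-block ids; arcs: (src, dst, weight) edge
# profiles. Arcs are processed in decreasing weight order, and two
# chains merge only when the arc connects the tail of one chain to
# the head of another - a block already inside a chain stays put.

def build_chains(blocks, arcs):
    chains = {b: [b] for b in blocks}  # each block starts its own chain
    for src, dst, _w in sorted(arcs, key=lambda a: -a[2]):
        c_src, c_dst = chains[src], chains[dst]
        if c_src is not c_dst and c_src[-1] == src and c_dst[0] == dst:
            c_src.extend(c_dst)          # merge dst chain onto src chain
            for b in c_dst:              # repoint merged blocks
                chains[b] = c_src
    # Collect the distinct chains that remain.
    seen, result = set(), []
    for c in chains.values():
        if id(c) not in seen:
            seen.add(id(c))
            result.append(c)
    return result
```

For example, with arcs A→B (100), B→D (100), B→C (1), and C→D (1), the heavy arcs form the chain [A, B, D]; C stays in its own chain, since by the time C→D is processed, D is no longer at the head of a chain.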
Slide 11: Configurable Cache Architecture
- We used the configurable cache architecture proposed by Zhang (ISCA03).
Slide 12: Configurable Cache Architecture
- The base cache consists of four 2-KByte banks that may individually be shut down for size configuration.
- Way concatenation allows for configurable associativity.
(Diagram: the 8-KByte base cache; a 4-KByte configuration via way shutdown; an 8-KByte 2-way configuration via way concatenation.)
Slide 13: Configurable Cache Heuristic
- First tune cache size: 2, 4, and 8 KBytes.
- Then tune cache line size: 16, 32, and 64 bytes.
- Finally tune cache associativity: direct-mapped, 2-way, and 4-way.
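The three-step heuristic can be sketched as a greedy search that fixes one parameter at a time; `evaluate` is a hypothetical stand-in for simulating a configuration and returning its energy (lower is better), not the paper's actual simulator.

```python
# Sketch of the cache tuning heuristic: tune size, then line size,
# then associativity, keeping the best value found at each step.
# evaluate(cfg) is a hypothetical cost function (e.g., energy).

SIZES_KB   = [2, 4, 8]
LINE_SIZES = [16, 32, 64]
ASSOCS     = [1, 2, 4]   # direct-mapped, 2-way, 4-way

def tune_cache(evaluate):
    best = {"size_kb": 2, "line": 16, "assoc": 1}  # start from smallest
    for param, values in (("size_kb", SIZES_KB),
                          ("line", LINE_SIZES),
                          ("assoc", ASSOCS)):
        scored = []
        for v in values:
            cfg = dict(best, **{param: v})  # vary one parameter only
            scored.append((evaluate(cfg), v))
        best[param] = min(scored)[1]        # keep the cheapest value
    return best
```

This explores 3 + 3 + 3 = 9 configurations instead of the 27 an exhaustive search would visit, which is why the slides run the exhaustive search only for comparison.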
Slide 14: Evaluation Framework
(Diagram: the evaluation tool flow.)
- Code reordering: instrument the executable, execute the application to gather edge profiles, and provide the edge profiles to PLTO (Pentium Link Time Optimizer, provided by the University of Arizona) to perform code reordering, producing a code-reordered executable.
- Cache exploration: execute the application (with and without code reordering) under the cache exploration heuristic to obtain hit and miss ratios for each configuration and the chosen cache configuration; an exhaustive search is also run for comparison purposes.
- Energy model: cache energy from Cacti; main memory energy from a Samsung memory model.
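As a rough illustration of how the hit and miss ratios above might translate into an energy number, the sketch below uses placeholder per-access energies; in the actual framework these values come from Cacti and the Samsung memory model.

```python
# Hypothetical energy model: both per-access energy values are
# placeholders, not Cacti or Samsung datasheet numbers.

def total_energy(accesses, miss_ratio,
                 cache_access_nj=0.5,   # placeholder: energy per cache access (nJ)
                 mem_access_nj=50.0):   # placeholder: energy per miss serviced by memory (nJ)
    """Total energy = cache dynamic energy + main memory energy."""
    misses = accesses * miss_ratio
    return accesses * cache_access_nj + misses * mem_access_nj
```

Because a main memory access is far more expensive than a cache access, even small changes in miss ratio dominate the total, which is why both optimizations target misses.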
Slide 15: Results - Energy Savings
- Base cache: 2 KB, direct-mapped, 16-byte line size.
(Chart: normalized energy per benchmark for the base cache with and without code reordering and the configured cache with and without code reordering.)
- Code reordering alone: 3.5% energy reduction.
- Cache configuration alone: 15% energy reduction.
- Cache configuration + code reordering: 17% energy reduction.
Slide 16: Results - Performance Benefits
(Chart: normalized performance per benchmark for the base cache with and without code reordering and the configured cache with and without code reordering.)
- Code reordering alone: 3.5% performance benefit.
- Cache configuration alone: 17% performance benefit.
- Cache configuration + code reordering: 18.5% performance benefit.
- On average, code reordering gives little additional benefit over cache configuration alone; however, a few benchmarks see added benefits.
Slide 17: Change in Cache Requirements Due to Code Reordering
(Table: for benchmarks from the Powerstone, Mediabench, and EEMBC suites, marks indicate which benchmarks saw a reduction in cache area, a larger line size, or a smaller cache size after code reordering.)
Slide 18: Conclusions
- We explore the interplay of two instruction cache optimization techniques: code reordering and cache configuration.
- Cache configuration largely obviates the need for code reordering with respect to energy and performance.
- Cache configuration applied dynamically at runtime eliminates the need for designer-applied code reordering.
- Code reordering improved cache utilization in 52% of the benchmarks.
- Code reordering reduced instruction cache size by an average of 13% and by as much as 90%, which is beneficial for small custom synthesized embedded systems where area is critical.
Slide 19: Future Work
- We plan to use a more advanced code reordering methodology that takes into account set associativity or multiple levels of cache.
- We plan to study the iterative interplay of code reordering and cache configuration using a code reordering technique that takes the cache configuration into consideration.