Generational Cache Management of Code Traces in Dynamic Optimization Systems

1
Generational Cache Management of Code Traces in
Dynamic Optimization Systems

Kim Hazelwood and Michael D. Smith
Harvard University
MICRO-36

2
Tasks of a Dynamic Optimizer

Apply code optimizations to binaries at run time:
  Observing execution, forming superblocks
  Optimization
  Code caching
To maximize performance, the vast majority of execution should occur in the code cache

3
Code Cache Management

Goals:
  Maximize execution in code cache
  Minimize runtime overhead
Previous solutions:
  Cache flush on program phase change
  Unbounded caches
All motivated by SPEC benchmark performance

4
Our Contributions

Characterization of different caching needs of SPEC and interactive applications
Investigation of superblock lifetimes
Generational code cache algorithm
Evaluation:
  Miss rates
  Overheads
Discussion of implementation challenges

5
Do We Need Cache Management?

For SPEC2000: probably not

6
Interactive Windows Applications

Unbounded caches become impractical

7
As a General Rule (in DynamoRIO)

Code Expansion = Final Code Cache Size / Application Footprint

8
Local vs. Global Cache Management

Local cache management: eviction policy for a single code cache (LRU, FIFO, etc.)
Global cache management: policy of interaction between multiple code caches
  Basic block vs. superblock cache
  Generational code caches

9
Basic Block and Superblock Caches

DynamoRIO interprets by copying all basic
blocks into a code cache
Once the basic blocks become hot, superblocks
are formed and copied into the superblock cache
One weakness of a single FIFO cache is that all
superblocks are treated equally
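
The two-level flow described on this slide can be sketched as a toy model. This is an illustration only, not DynamoRIO's implementation: the class name `TwoLevelCache`, the per-address counter, and the exact point at which a block counts as hot are assumptions. Only the figure of 50 executions comes from the slide.

```python
from collections import defaultdict

HOT_THRESHOLD = 50  # executions before a basic block is treated as hot (value from the slide)


class TwoLevelCache:
    """Toy model: blocks start in a basic block cache; hot ones form superblocks."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.exec_counts = defaultdict(int)  # basic block cache: address -> execution count
        self.superblocks = set()             # superblock cache: head addresses

    def execute(self, addr):
        """Record one execution at addr and report which cache served it."""
        if addr in self.superblocks:
            return "superblock"
        self.exec_counts[addr] += 1
        if self.exec_counts[addr] >= self.hot_threshold:
            # Hypothetical stand-in for real superblock formation.
            self.superblocks.add(addr)
        return "basic_block"
```

In this sketch the first 50 executions of an address are served from the basic block cache, and later ones from the superblock cache.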

[Figure: Basic Block Cache -> Superblock Formation (after 50 executions) -> Superblock Cache]

10
Superblock Lifetimes
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
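
A small worked example of the lifetime metric, reading the slide's definition as (last execution time minus first execution time) divided by total execution time. The function name and the time units are assumptions made for illustration.

```python
def superblock_lifetime(first_exec, last_exec, total_exec):
    """Fraction of the program's run during which a superblock was in use.

    All three arguments share one time unit (an assumption, e.g. instruction count).
    """
    if total_exec <= 0:
        raise ValueError("total_exec must be positive")
    if not 0 <= first_exec <= last_exec <= total_exec:
        raise ValueError("expected 0 <= first_exec <= last_exec <= total_exec")
    return (last_exec - first_exec) / total_exec


# A superblock first executed at t=100 and last executed at t=350,
# in a run lasting 1000 time units, was live for a quarter of the run.
print(superblock_lifetime(100, 350, 1000))  # 0.25
```

A value near 0 marks a short-lived superblock, and a value near 1 marks one that is used for almost the whole run.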
11
Generational Code Caches
[Figure: a new superblock enters the nursery cache (a circular buffer with FIFO eviction). On eviction, a live superblock is promoted to the persistent cache (also a circular buffer); a dead superblock is deleted.]
12
Generational Hypothesis

Generational hypothesis from garbage collection: objects tend to die young
Unfortunately, garbage collectors know when an object is dead, whereas a code cache manager cannot know when a superblock is dead
A superblock is dead when it will never be executed again (impossible to determine before the program ends)
Guessing incorrectly doesn't impact correctness

13
The Probation Cache
[Figure: a new superblock enters the nursery cache. On FIFO eviction it moves to the probation cache. When it leaves the probation cache, it is promoted to the persistent cache if the threshold was met and deleted otherwise. All three caches are circular buffers.]
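
The nursery, probation, and persistent flow can be sketched as three FIFO queues. This is a simplified model under stated assumptions, not the authors' implementation: capacities count superblocks rather than bytes, the threshold counts hits received while on probation, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class GenerationalCache:
    """Toy generational code cache: nursery -> probation -> persistent, all FIFO."""

    def __init__(self, nursery_size, probation_size, persistent_size, threshold=1):
        if min(nursery_size, probation_size, persistent_size) < 1:
            raise ValueError("every cache needs room for at least one superblock")
        self.nursery_size = nursery_size
        self.probation_size = probation_size
        self.persistent_size = persistent_size
        self.threshold = threshold
        self.nursery = OrderedDict()     # sb_id -> None, oldest entry first
        self.probation = OrderedDict()   # sb_id -> hits received while on probation
        self.persistent = OrderedDict()  # sb_id -> None, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, sb_id):
        """Simulate one superblock access; return True on a hit."""
        if sb_id in self.probation:
            self.probation[sb_id] += 1   # updating a value keeps FIFO order intact
        if sb_id in self.nursery or sb_id in self.probation or sb_id in self.persistent:
            self.hits += 1
            return True
        self.misses += 1
        self._insert_nursery(sb_id)      # a miss regenerates the superblock
        return False

    def _insert_nursery(self, sb_id):
        if len(self.nursery) >= self.nursery_size:
            victim, _ = self.nursery.popitem(last=False)   # evict the oldest entry
            self._insert_probation(victim)
        self.nursery[sb_id] = None

    def _insert_probation(self, sb_id):
        if len(self.probation) >= self.probation_size:
            victim, hits = self.probation.popitem(last=False)
            if hits >= self.threshold:
                self._insert_persistent(victim)            # threshold met: promote
            # Otherwise the victim is simply dropped (deleted).
        self.probation[sb_id] = 0

    def _insert_persistent(self, sb_id):
        if len(self.persistent) >= self.persistent_size:
            self.persistent.popitem(last=False)            # FIFO eviction, deleted
        self.persistent[sb_id] = None
```

With one slot per cache and a threshold of 1, a superblock that is re-executed while on probation reaches the persistent cache, while one that is never touched again is deleted. A wrong guess only costs a later miss and regeneration, which matches the earlier point that guessing incorrectly does not affect correctness.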
14
Experimental Approach
[Figure: benchmarks run under DynamoRIO, which emits a superblock trace (superblock details, insertions, accesses). The trace feeds a code cache simulator, which produces the results.]
15
Experimental Comparison

Ensure pressure: cacheSize = 1/3 maxCache
Local policy fixed at FIFO for all caches
Base case: one unified FIFO cache
Generational case: nursery, probation, and persistent caches totaling cacheSize
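
The base case can be sketched as a trace replay through one unified FIFO cache sized at one third of the unbounded maximum. The sketch makes simplifying assumptions that are not from the slides: capacity is counted in superblocks rather than bytes, the trace is a plain list of superblock identifiers, and the function names are hypothetical.

```python
from collections import deque


def pressured_size(max_cache):
    """Apply the cacheSize = 1/3 maxCache rule, keeping at least one slot."""
    return max(1, max_cache // 3)


def fifo_miss_rate(trace, cache_size):
    """Replay a superblock access trace through one FIFO cache; return the miss rate."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    resident = set()   # superblocks currently cached
    order = deque()    # insertion order, oldest on the left
    misses = 0
    for sb_id in trace:
        if sb_id in resident:
            continue                              # hit: FIFO order is not updated
        misses += 1
        if len(order) >= cache_size:
            resident.discard(order.popleft())     # evict the oldest superblock
        order.append(sb_id)
        resident.add(sb_id)
    return misses / len(trace) if trace else 0.0


# Toy trace with six distinct superblocks, so the unbounded size is 6 entries.
trace = list("AABBCDEFAB")
size = pressured_size(len(set(trace)))
print(size, fifo_miss_rate(trace, size))
```

Replaying the same trace with a cache large enough for all six superblocks gives the lower bound on misses, which shows how much pressure the one-third rule adds.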

16
Windows Application Miss Rates
17
SPEC2000 Miss Rates
18
Incorporating Overheads

Using Pentium performance monitors and PAPI, we collected the overheads listed below
All overheads are reported in instructions

Overhead                   Calculation              Size = 242 B
Superblock formation       865 * SBSize^0.8         69,834
Cache eviction             2.75 * SBSize + 2650     3,316
Cache promotion            22 * SBSize + 8030       13,354
DynamoRIO context switch   25                       25

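
The calculations on this slide can be checked with a few lines of arithmetic. The formulas below are read from the slide's numbers (for example, 865 times SBSize to the power 0.8 reproduces the 69,834 figure at 242 bytes). The function and constant names are illustrative assumptions.

```python
def formation_overhead(sb_size):
    """Instructions to form a superblock of sb_size bytes."""
    return 865 * sb_size ** 0.8


def eviction_overhead(sb_size):
    """Instructions to evict a superblock of sb_size bytes."""
    return 2.75 * sb_size + 2650


def promotion_overhead(sb_size):
    """Instructions to promote a superblock of sb_size bytes."""
    return 22 * sb_size + 8030


CONTEXT_SWITCH_OVERHEAD = 25  # instructions per DynamoRIO context switch

SB_SIZE = 242  # bytes, the example size used on the slide
for name, cost in [
    ("formation", formation_overhead(SB_SIZE)),
    ("eviction", eviction_overhead(SB_SIZE)),
    ("promotion", promotion_overhead(SB_SIZE)),
    ("context switch", CONTEXT_SWITCH_OVERHEAD),
]:
    print(f"{name}: {cost:.0f} instructions")
```

At this size, forming a superblock costs roughly twenty times as much as evicting one, which is why eliminating misses (each of which forces a regeneration) matters more than the added promotion cost.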
19
Generating Overhead Estimates
Each overhead estimate was generated using
least-squares linear regression over 30,000
samples
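
As an illustration of the fitting step, here is a minimal least-squares line fit in plain Python. The sample points are synthetic and noise-free, generated from the slide's eviction formula (2.75 * SBSize + 2650) purely to show that the fit recovers a slope and intercept. They are not the authors' measurements, and the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic (superblock size in bytes, eviction cost in instructions) samples.
sizes = [64, 128, 242, 512, 1024]
costs = [2.75 * s + 2650 for s in sizes]
slope, intercept = least_squares_line(sizes, costs)
print(f"cost = {slope:.2f} * SBSize + {intercept:.0f}")
```

With real measurements the points would scatter around the line, and the fitted coefficients would be estimates rather than exact values.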
20
Reduction in Runtime Overhead
21
Performance Impact

Results varied and were highly dependent on the number of misses eliminated
Gzip: 2,288 misses eliminated, resulting in a 0.07% reduction in execution time
Crafty: 292,486 misses eliminated, resulting in an 8.09% reduction in execution time

22
Conclusions

Large, interactive applications impose limiting
constraints on code caches
We can leverage observations of superblock
lifetimes to improve management policies
Based on trace-driven simulation, replacing a single code cache with multiple generational code caches results in:
  Reduced miss rates
  Reduced runtime overhead
