Title: Generational Cache Management of Code Traces in Dynamic Optimization Systems
1. Generational Cache Management of Code Traces in Dynamic Optimization Systems
- Kim Hazelwood, Michael D. Smith
- Harvard University
- MICRO-36
2. Tasks of a Dynamic Optimizer
- Apply code optimizations to binaries at run time
- Observing execution, forming superblocks
- Optimization
- Code caching
- To maximize performance, the vast majority of execution should occur in the code cache
3. Code Cache Management
- Goals
- Maximize execution in code cache
- Minimize runtime overhead
- Previous solutions
- Cache flush on program phase change
- Unbounded caches
- All motivated by SPEC benchmark performance
4. Our Contributions
- Characterization of the different caching needs of SPEC and interactive applications
- Investigation of superblock lifetimes
- Generational code cache algorithm
- Evaluation
- Miss rates
- Overheads
- Discussion of implementation challenges
5. Do We Need Cache Management?
- For SPEC2000: probably not
6. Interactive Windows Applications
- Unbounded caches become impractical
7. As a General Rule (in DynamoRIO)
$$\text{Code Expansion} = \frac{\text{Final Code Cache Size}}{\text{Application Footprint}}$$
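As a rough illustration of the ratio above, the sketch below computes code expansion and inverts it to estimate how large an unbounded cache would grow for a given application footprint. The function names and byte counts are hypothetical, chosen only to show the arithmetic.

```python
def code_expansion(final_cache_bytes: int, footprint_bytes: int) -> float:
    """Code expansion = final code cache size / application footprint."""
    if footprint_bytes <= 0:
        raise ValueError("application footprint must be positive")
    return final_cache_bytes / footprint_bytes


def estimated_cache_bytes(footprint_bytes: int, expansion: float) -> float:
    """Invert the ratio: predicted final cache size for a known expansion factor."""
    return footprint_bytes * expansion


# Hypothetical numbers, for illustration only.
ratio = code_expansion(6_000_000, 2_000_000)
print(ratio)                                     # 3.0
print(estimated_cache_bytes(50_000_000, ratio))  # 150000000.0
```

Under this rule, a large interactive application with a big footprint grows an unbounded cache proportionally, which is why unbounded caches become impractical for such applications.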
8. Local vs. Global Cache Management
- Local cache management: eviction policy for a single code cache (LRU, FIFO, etc.)
- Global cache management: policy for the interaction between multiple code caches
  - Basic block vs. superblock cache
  - Generational code caches
9. Basic Block and Superblock Caches
- In place of interpretation, DynamoRIO copies all basic blocks into a code cache
- Once the basic blocks become hot, superblocks are formed and copied into the superblock cache
- One weakness of a single FIFO cache is that all superblocks are treated equally
[Figure: basic block cache → superblock formation (after 50 executions) → superblock cache]
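The hot-block promotion in the figure can be modeled with a per-block execution counter. This is a minimal sketch assuming a superblock is formed at a block's address once its count reaches the 50 executions shown on the slide; the class and method names are hypothetical, not DynamoRIO's API, and a real superblock also contains the blocks that execute after the hot head, which this sketch omits.

```python
from collections import defaultdict

HOT_THRESHOLD = 50  # executions before superblock formation (value from the slide)


class TwoLevelTranslator:
    """Toy model: every block is copied to the basic block cache; hot ones form superblocks."""

    def __init__(self, threshold: int = HOT_THRESHOLD) -> None:
        self.threshold = threshold
        self.bb_cache: set = set()             # start addresses of cached basic blocks
        self.exec_counts: defaultdict = defaultdict(int)
        self.superblock_cache: set = set()     # head addresses of formed superblocks

    def execute(self, addr: int) -> str:
        """Simulate one execution starting at `addr`; report where it ran."""
        if addr in self.superblock_cache:
            return "superblock"
        if addr not in self.bb_cache:
            self.bb_cache.add(addr)            # first execution: copy into the basic block cache
        self.exec_counts[addr] += 1
        if self.exec_counts[addr] >= self.threshold:
            self.superblock_cache.add(addr)    # block is hot: form a superblock headed here
            return "formed"
        return "basic_block"


tr = TwoLevelTranslator()
states = [tr.execute(0x400) for _ in range(51)]
print(states[48], states[49], states[50])  # basic_block formed superblock
```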
10. Superblock Lifetimes
$$\text{Lifetime} = \frac{\text{LastExecutionTime} - \text{FirstExecutionTime}}{\text{TotalExecutionTime}}$$
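The lifetime formula above can be computed in one pass over an execution trace. This sketch assumes a hypothetical trace format of (timestamp, superblock_id) pairs in time order; a lifetime near 0 means the superblock "died young".

```python
from typing import Hashable, Iterable


def superblock_lifetimes(
    trace: Iterable[tuple], total_time: float
) -> dict:
    """Lifetime = (last execution time - first execution time) / total execution time.

    `trace` yields (timestamp, superblock_id) pairs in nondecreasing time order.
    """
    first: dict = {}
    last: dict = {}
    for t, sb in trace:
        first.setdefault(sb, t)   # keep only the first timestamp seen
        last[sb] = t              # overwrite until the final timestamp remains
    return {sb: (last[sb] - first[sb]) / total_time for sb in first}


events = [(0, "A"), (10, "B"), (20, "A"), (90, "A"), (95, "B"), (97, "C")]
print(superblock_lifetimes(events, total_time=100))
# {'A': 0.9, 'B': 0.85, 'C': 0.0}
```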
11. Generational Code Caches
[Figure: A new superblock enters the nursery cache. On FIFO eviction from the nursery, a live superblock is promoted to the persistent cache and a dead one is deleted. Both caches are circular buffers.]
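The two-level scheme can be sketched with two FIFO buffers. This is a minimal sketch under stated assumptions: capacities count superblocks rather than bytes, and since true liveness cannot be known at run time, `is_live` is a hypothetical predicate supplied by the caller. Class and method names are illustrative only.

```python
from collections import deque
from typing import Callable, Hashable


class GenerationalCache:
    """Nursery + persistent code caches, each managed as a FIFO buffer."""

    def __init__(self, nursery_cap: int, persistent_cap: int,
                 is_live: Callable[[Hashable], bool]) -> None:
        self.nursery: deque = deque()
        self.persistent: deque = deque()
        self.nursery_cap = nursery_cap
        self.persistent_cap = persistent_cap
        self.is_live = is_live  # hypothetical stand-in for (unknowable) liveness

    def __contains__(self, sb: Hashable) -> bool:
        # Linear scans keep the sketch short; a real cache would use a hash table.
        return sb in self.nursery or sb in self.persistent

    def insert(self, sb: Hashable) -> None:
        """Place a newly formed superblock in the nursery."""
        if len(self.nursery) >= self.nursery_cap:
            victim = self.nursery.popleft()  # FIFO eviction from the nursery
            if self.is_live(victim):
                self._promote(victim)        # live: PROMOTE
            # dead: DELETE (the victim is simply dropped)
        self.nursery.append(sb)

    def _promote(self, sb: Hashable) -> None:
        if len(self.persistent) >= self.persistent_cap:
            self.persistent.popleft()        # FIFO eviction from the persistent cache
        self.persistent.append(sb)


cache = GenerationalCache(2, 2, is_live=lambda sb: sb.startswith("hot"))
for sb in ["hot1", "cold1", "x", "y"]:
    cache.insert(sb)
print(list(cache.nursery), list(cache.persistent))  # ['x', 'y'] ['hot1']
```

Only superblocks that survive the nursery ever reach the persistent cache, so short-lived superblocks no longer push long-lived ones out of a single shared FIFO.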
12. Generational Hypothesis
- Generational hypothesis from garbage collection: objects tend to die young
- Unfortunately, garbage collectors know when an object is dead; a code cache manager does not
- A superblock is dead when it will never be executed again (impossible to determine before the program ends)
- Guessing incorrectly doesn't impact our correctness, only performance: a superblock deleted too early is regenerated if it executes again
13. The Probation Cache
[Figure: A new superblock enters the nursery cache; on FIFO eviction it moves to the probation cache. On eviction from probation, it is promoted to the persistent cache if threshold_met, or deleted if threshold_not_met. All three caches are circular buffers.]
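The probation scheme replaces the liveness oracle with an observable test. This is a minimal sketch assuming `threshold` counts the executions a superblock receives while it sits in the probation cache, and that capacities are measured in superblocks rather than bytes; the class and method names are hypothetical.

```python
from collections import deque
from typing import Hashable


class ProbationCache:
    """Nursery -> probation -> persistent, each a FIFO buffer sized in superblocks."""

    def __init__(self, nursery_cap: int, probation_cap: int,
                 persistent_cap: int, threshold: int) -> None:
        self.nursery: deque = deque()
        self.probation: deque = deque()
        self.persistent: deque = deque()
        self.caps = (nursery_cap, probation_cap, persistent_cap)
        self.threshold = threshold
        self.probation_hits: dict = {}  # executions seen while on probation

    def lookup(self, sb: Hashable) -> bool:
        """Record one execution of `sb`; return True on a cache hit."""
        if sb in self.probation:
            self.probation_hits[sb] += 1
            return True
        return sb in self.nursery or sb in self.persistent

    def insert(self, sb: Hashable) -> None:
        """Insert a newly formed superblock (call only after a miss)."""
        if len(self.nursery) >= self.caps[0]:
            self._to_probation(self.nursery.popleft())  # FIFO eviction from the nursery
        self.nursery.append(sb)

    def _to_probation(self, sb: Hashable) -> None:
        if len(self.probation) >= self.caps[1]:
            victim = self.probation.popleft()
            if self.probation_hits.pop(victim) >= self.threshold:
                self._to_persistent(victim)             # threshold_met: PROMOTE
            # threshold_not_met: DELETE (victim dropped)
        self.probation.append(sb)
        self.probation_hits[sb] = 0

    def _to_persistent(self, sb: Hashable) -> None:
        if len(self.persistent) >= self.caps[2]:
            self.persistent.popleft()                   # FIFO eviction from the persistent cache
        self.persistent.append(sb)


cache = ProbationCache(1, 1, 1, threshold=1)
cache.insert("A")
cache.insert("B")   # A moves to probation
cache.lookup("A")   # A executes once while on probation
cache.insert("C")   # A leaves probation with 1 hit: promoted
cache.insert("D")   # B leaves probation with 0 hits: deleted
print(list(cache.persistent), cache.lookup("B"))  # ['A'] False
```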
14. Experimental Approach
[Figure: Benchmarks run under DynamoRIO, which produces a superblock trace (superblock details, insertions, accesses); a code cache simulator consumes the trace and produces the results.]
15. Experimental Comparison
- Ensure cache pressure: cacheSize = 1/3 × maxCache
- Local policy fixed at FIFO for all caches
- Base case: one unified FIFO cache
- Generational case: nursery, probation, and persistent caches totaling cacheSize
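A trace-driven simulation of the base case can be sketched in a few lines. This sketch assumes maxCache is the size an unbounded cache would reach (every distinct superblock in the trace stored once) and that the trace is a sequence of superblock ids with known byte sizes; the function names are hypothetical. The generational case would split the same byte budget across nursery, probation, and persistent buffers, which is omitted here for brevity.

```python
from collections import OrderedDict
from typing import Hashable, Mapping, Sequence


def max_cache_bytes(trace: Sequence[Hashable], sizes: Mapping[Hashable, int]) -> int:
    """Size an unbounded cache would reach: every distinct superblock, stored once."""
    return sum(sizes[sb] for sb in set(trace))


def fifo_miss_rate(trace: Sequence[Hashable], sizes: Mapping[Hashable, int],
                   capacity: int) -> float:
    """Miss rate of one unified FIFO code cache of `capacity` bytes (the base case)."""
    cache: OrderedDict = OrderedDict()  # insertion order == FIFO eviction order
    used = misses = 0
    for sb in trace:
        if sb in cache:
            continue                    # hit: FIFO order is not updated on access
        misses += 1
        size = sizes[sb]
        if size > capacity:
            raise ValueError(f"superblock {sb!r} is larger than the cache")
        while used + size > capacity:
            _, evicted = cache.popitem(last=False)  # evict the oldest superblock
            used -= evicted
        cache[sb] = size
        used += size
    return misses / len(trace)


trace = ["A", "B", "A", "C", "A", "B"]
sizes = {"A": 100, "B": 100, "C": 100}
max_size = max_cache_bytes(trace, sizes)           # 300
print(fifo_miss_rate(trace, sizes, max_size))       # 0.5: only the 3 compulsory misses
print(fifo_miss_rate(trace, sizes, max_size // 3))  # 1.0: every access misses
```

Shrinking the cache to one third of maxCache is what exposes differences between policies; with an unbounded cache, every policy sees only compulsory misses.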
16. Windows Application Miss Rates
17. SPEC2000 Miss Rates
18. Incorporating Overheads
- Using Pentium performance monitors and PAPI, we collected overheads for superblock formation, cache eviction, cache promotion, and DynamoRIO context switches
- All overheads are reported in instructions

Overhead                    Calculation             SBSize = 242 B
Superblock formation        865 × SBSize^0.8        69,834
Cache eviction              2.75 × SBSize + 2650    3,316
Cache promotion             22 × SBSize + 8030      13,354
DynamoRIO context switch    25                      25
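The fitted formulas in the table can be evaluated directly. This sketch reproduces the 242 B column and combines per-event costs into a total, assuming the event counts come from the cache simulator and that every event involves a superblock of the same size; `total_overhead` and the other names are hypothetical.

```python
AVG_SB_SIZE = 242    # bytes; the reference superblock size used in the table
CONTEXT_SWITCH = 25  # instructions per DynamoRIO context switch


def formation_cost(sb_size: float) -> float:
    return 865 * sb_size ** 0.8


def eviction_cost(sb_size: float) -> float:
    return 2.75 * sb_size + 2650


def promotion_cost(sb_size: float) -> float:
    return 22 * sb_size + 8030


def total_overhead(formations: int, evictions: int, promotions: int,
                   context_switches: int, sb_size: float = AVG_SB_SIZE) -> float:
    """Estimated instructions spent on cache management, assuming every event
    involves a superblock of size `sb_size` (a simplification)."""
    return (formations * formation_cost(sb_size)
            + evictions * eviction_cost(sb_size)
            + promotions * promotion_cost(sb_size)
            + context_switches * CONTEXT_SWITCH)


print(round(formation_cost(242)), eviction_cost(242), promotion_cost(242))
# 69834 3315.5 13354
```

Because formation costs several times more than a promotion, a policy that promotes more but re-forms fewer superblocks can still lower total overhead.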
19. Generating Overhead Estimates
Each overhead estimate was generated using least-squares linear regression over 30,000 samples.
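A closed-form ordinary least-squares fit is enough to produce formulas like those in the overhead table. This sketch uses synthetic samples; the slide does not say how the nonlinear formation model was fit, so `power_law_fit` (linear regression on log-transformed data) is an assumption, and both function names are hypothetical.

```python
import math
from typing import Sequence


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> tuple:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def power_law_fit(xs: Sequence[float], ys: Sequence[float]) -> tuple:
    """Fit y = a * x**b by linear least squares on (log x, log y)."""
    b, log_a = least_squares_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b


# Synthetic, noise-free samples generated from the table's formulas.
sizes = [64, 128, 256, 512]
slope, intercept = least_squares_line(sizes, [22 * s + 8030 for s in sizes])
a, b = power_law_fit(sizes, [865 * s ** 0.8 for s in sizes])
print(round(slope, 6), round(intercept, 6), round(a, 6), round(b, 6))
# 22.0 8030.0 865.0 0.8
```

With real measurements, each of the 30,000 samples would pair a superblock size with the instruction count observed for that operation.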
20. Reduction in Runtime Overhead
21. Performance Impact
- Results varied and were highly dependent on the number of misses eliminated
- Gzip: 2,288 misses eliminated, resulting in a 0.07% reduction in execution time
- Crafty: 292,486 misses eliminated, resulting in an 8.09% reduction in execution time
22. Conclusions
- Large, interactive applications impose limiting constraints on code caches
- We can leverage observations of superblock lifetimes to improve management policies
- Based on trace-driven simulation, replacing a single code cache with multiple generational code caches results in:
  - Reduced miss rates
  - Reduced runtime overhead