Generational Cache Management of Code Traces in Dynamic Optimization Systems

1
Generational Cache Management of Code Traces in
Dynamic Optimization Systems
  • Kim Hazelwood and Michael D. Smith
  • Harvard University
  • MICRO-36

2
Tasks of a Dynamic Optimizer
  • Apply code optimizations to binaries at run time:
    • Observing execution, forming superblocks
    • Optimization
    • Code caching
  • To maximize performance, vast majority of
    execution should occur in the code cache

3
Code Cache Management
  • Goals:
    • Maximize execution in code cache
    • Minimize runtime overhead
  • Previous solutions:
    • Cache flush on program phase change
    • Unbounded caches
  • All motivated by SPEC benchmark performance

4
Our Contributions
  • Characterization of different caching needs of
    SPEC and interactive applications
  • Investigation of superblock lifetimes
  • Generational code cache algorithm
  • Evaluation:
    • Miss rates
    • Overheads
  • Discussion of implementation challenges

5
Do We Need Cache Management?
  • For SPEC2000: probably not

6
Interactive Windows Applications
  • Unbounded caches become impractical

7
As a General Rule (in DynamoRIO)
Code Expansion = Final Code Cache Size / Application Footprint

8
Local vs. Global Cache Management
  • Local cache management: eviction policy for a
    single code cache (LRU, FIFO, etc.)
  • Global cache management: policy of interaction
    between multiple code caches
    • Basic block vs. superblock cache
    • Generational code caches

9
Basic Block and Superblock Caches
  • DynamoRIO interprets by copying all basic
    blocks into a code cache
  • Once the basic blocks become hot, superblocks
    are formed and copied into the superblock cache
  • One weakness of a single FIFO cache is that all
    superblocks are treated equally

[Figure: blocks in the basic block cache undergo superblock formation after 50 executions and are copied into the superblock cache]
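The hot-block trigger described on this slide can be sketched with a per-block execution counter. This is only a minimal illustration, not DynamoRIO's actual trace-selection code: the 50-execution threshold comes from the slide, while `BlockProfiler`, `record_execution`, and the use of block addresses as keys are hypothetical names chosen for the example.

```python
from collections import Counter

HOT_THRESHOLD = 50  # execution count taken from the slide


class BlockProfiler:
    """Hypothetical helper: counts basic-block executions and flags hot blocks."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()         # block address -> executions seen so far
        self.superblock_heads = set()   # blocks that already triggered formation

    def record_execution(self, block_addr):
        """Count one execution; return True only when the block first becomes hot."""
        if block_addr in self.superblock_heads:
            return False  # already promoted to a superblock, nothing to do
        self.counts[block_addr] += 1
        if self.counts[block_addr] >= self.threshold:
            self.superblock_heads.add(block_addr)
            return True   # caller would form a superblock starting here
        return False
```

A caller would invoke `record_execution` each time a basic block runs from the basic block cache and build a superblock when it returns `True`.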
10
Superblock Lifetimes
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
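As a small worked example, the lifetime metric can be computed as the span between a superblock's first and last execution, expressed as a fraction of total execution time. The function name and the choice of time unit are assumptions for illustration; any consistent unit (instructions, cycles, seconds) works.

```python
def superblock_lifetime(first_exec_time, last_exec_time, total_exec_time):
    """Return the fraction of the run during which the superblock was live.

    All three arguments must use the same time unit.
    """
    if total_exec_time <= 0:
        raise ValueError("total_exec_time must be positive")
    if not 0 <= first_exec_time <= last_exec_time <= total_exec_time:
        raise ValueError("expected 0 <= first <= last <= total")
    return (last_exec_time - first_exec_time) / total_exec_time


# A superblock first executed at t=10 and last executed at t=35 in a
# run of length 100 was live for a quarter of the run.
short_lived = superblock_lifetime(10, 35, 100)
```

A value near 0 indicates a short-lived superblock, and a value near 1 indicates one that stays useful for almost the whole run.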
11
Generational Code Caches
[Figure: a new superblock enters the nursery cache (circular buffer, FIFO eviction); on eviction it is promoted to the persistent cache (circular buffer) if live, or deleted if dead]

12
Generational Hypothesis
  • Generational hypothesis from garbage collection:
    "Objects tend to die young"
  • Unfortunately, unlike a code cache manager, a
    garbage collector knows when an object is dead
  • A superblock is dead when it will never be
    executed again (impossible to determine before
    the program ends)
  • Guessing incorrectly doesn't affect
    correctness

13
The Probation Cache
[Figure: a new superblock enters the nursery cache; FIFO eviction moves it to the probation cache; from there it is promoted to the persistent cache if threshold_met, or deleted if threshold_not_met. All three caches are circular buffers.]
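The three-cache flow in the diagram can be sketched as follows. This is an illustrative reading of the diagram labels, not the authors' implementation: it assumes the promotion test is "executed at least `threshold` times while on probation", applied when a superblock is evicted from the probation cache, and it counts capacity in entries rather than bytes to keep the example short. `FifoCache` and `GenerationalCache` are hypothetical names.

```python
from collections import OrderedDict


class FifoCache:
    """Fixed-capacity FIFO cache (capacity counted in entries, must be >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # superblock id -> hits since insertion

    def __contains__(self, key):
        return key in self.entries

    def hit(self, key):
        self.entries[key] += 1  # updating a value does not change FIFO order

    def insert(self, key):
        """Insert key; return the evicted (key, hits) pair, or None."""
        evicted = None
        if len(self.entries) >= self.capacity:
            evicted = self.entries.popitem(last=False)  # oldest entry
        self.entries[key] = 0
        return evicted


class GenerationalCache:
    def __init__(self, nursery, probation, persistent, threshold=1):
        self.nursery = FifoCache(nursery)
        self.probation = FifoCache(probation)
        self.persistent = FifoCache(persistent)
        self.threshold = threshold  # assumed promotion criterion
        self.misses = 0

    def access(self, sb):
        """Return True on a hit in any cache, False on a miss."""
        for cache in (self.nursery, self.probation, self.persistent):
            if sb in cache:
                cache.hit(sb)
                return True
        self.misses += 1
        self._insert_new(sb)
        return False

    def _insert_new(self, sb):
        out = self.nursery.insert(sb)           # new superblocks start in the nursery
        if out is None:
            return
        out = self.probation.insert(out[0])     # nursery victim goes on probation
        if out is None:
            return
        victim, hits = out
        if hits >= self.threshold:
            self.persistent.insert(victim)      # promote; persistent victim is deleted
        # otherwise the probation victim is simply dropped
```

With one entry per cache and the access sequence A, B, A, C, D, A, superblock A is hit once while on probation, is promoted when C arrives, and is still resident for its final access, whereas B is never reused on probation and is deleted.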
14
Experimental Approach
[Figure: benchmarks run under DynamoRIO, which emits a superblock trace (superblock details, insertions, accesses); the trace drives a code cache simulator that produces the results]

15
Experimental Comparison
  • Ensure cache pressure: cacheSize = 1/3 × maxCache
  • Local policy fixed at FIFO for all caches
  • Base case: one unified FIFO cache
  • Generational case: nursery, probation, and
    persistent caches totaling cacheSize
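The base case can be illustrated with a small trace-driven simulation of one unified FIFO cache. This sketch makes several assumptions that are not stated on the slide: `maxCache` is taken to be the total size of all distinct superblocks in the trace (the size an unbounded cache would reach), the trace is a list of superblock ids, and `simulate_unified_fifo` is a hypothetical name.

```python
from collections import OrderedDict


def simulate_unified_fifo(trace, sizes, pressure=3):
    """Replay a superblock access trace against one FIFO cache.

    trace:    sequence of superblock ids, in execution order
    sizes:    dict mapping superblock id -> size in bytes
    pressure: cacheSize = maxCache / pressure (3 on the slide)
    Returns the miss rate (misses / accesses).
    """
    max_cache = sum(sizes.values())    # assumed meaning of maxCache
    cache_size = max_cache // pressure
    resident = OrderedDict()           # id -> size, oldest first
    used = 0
    misses = 0
    for sb in trace:
        if sb in resident:
            continue                   # hit: FIFO order is not updated
        misses += 1
        # Evict oldest superblocks until the new one fits.
        while resident and used + sizes[sb] > cache_size:
            _, freed = resident.popitem(last=False)
            used -= freed
        if sizes[sb] <= cache_size:
            resident[sb] = sizes[sb]
            used += sizes[sb]
    return misses / len(trace)


# Three 100-byte superblocks give maxCache = 300, so cacheSize = 100.
example_sizes = {"A": 100, "B": 100, "C": 100}
example_rate = simulate_unified_fifo(list("AABAC"), example_sizes)
```

Replaying the same trace against a generational configuration whose three caches sum to the same `cacheSize` gives the comparison described above.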

16
Windows Application Miss Rates

17
SPEC2000 Miss Rates

18
Incorporating Overheads
  • Using Pentium performance monitors and PAPI, we
    collected overheads for the operations listed below
  • All overheads are reported in instructions

Overhead                   Calculation             Size = 242 B
Superblock formation       865 × SBSize^0.8        69834
Cache eviction             2.75 × SBSize + 2650    3316
Cache promotion            22 × SBSize + 8030      13354
DynamoRIO context switch   25                      25
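The overhead formulas can be checked numerically. The sketch below reads the table's Calculation column as 865 × SBSize^0.8, 2.75 × SBSize + 2650, and 22 × SBSize + 8030, a reading that reproduces the 242-byte column to within rounding. The function names are hypothetical, and results are in instructions.

```python
def formation_overhead(sb_size):
    """Estimated instructions to form a superblock of sb_size bytes."""
    return 865 * sb_size ** 0.8


def eviction_overhead(sb_size):
    """Estimated instructions to evict a superblock of sb_size bytes."""
    return 2.75 * sb_size + 2650


def promotion_overhead(sb_size):
    """Estimated instructions to promote a superblock of sb_size bytes."""
    return 22 * sb_size + 8030


CONTEXT_SWITCH_OVERHEAD = 25  # constant, independent of superblock size

# Evaluate each formula at the 242-byte size used in the table.
SB_SIZE = 242
estimates = {
    "formation": formation_overhead(SB_SIZE),
    "eviction": eviction_overhead(SB_SIZE),
    "promotion": promotion_overhead(SB_SIZE),
    "context_switch": CONTEXT_SWITCH_OVERHEAD,
}
```

At this size, forming a superblock costs roughly 21 times as much as evicting one, which is why eliminating misses (each of which forces a re-formation) matters more than the added cost of promotions.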
19
Generating Overhead Estimates
Each overhead estimate was generated using
least-squares linear regression over 30,000
samples
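For reference, an ordinary least-squares fit of a line to (superblock size, measured instructions) pairs can be written in a few lines. This is a generic sketch, not the authors' tooling. The data below is synthetic, generated from the slide's eviction formula purely to show that the fit recovers a slope and intercept; it is not measured data, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic samples (NOT measurements): sizes in bytes, costs in instructions.
sizes = [64, 128, 242, 512, 1024]
costs = [2.75 * s + 2650 for s in sizes]
slope, intercept = fit_line(sizes, costs)
```

With real samples the points would scatter around the line, and the fitted slope and intercept become the per-byte and fixed components of the overhead estimate.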
20
Reduction in Runtime Overhead

21
Performance Impact
  • Results varied and were highly dependent on
    the number of misses eliminated
  • Gzip: 2,288 misses eliminated, resulting in a
    0.07% reduction in execution time
  • Crafty: 292,486 misses eliminated, resulting in
    an 8.09% reduction in execution time

22
Conclusions
  • Large, interactive applications impose limiting
    constraints on code caches
  • We can leverage observations of superblock
    lifetimes to improve management policies
  • Based on trace-driven simulation, replacing a
    single code cache with multiple generational code
    caches results in:
    • Reduced miss rates
    • Reduced runtime overhead