Generational Cache Management of Code Traces in Dynamic Optimization Systems

1
Generational Cache Management of Code Traces in Dynamic Optimization Systems
  • Original authors: Kim Hazelwood and Michael D. Smith,
    Harvard University
  • Presented with modifications by William M. Jones
    for the ECE 903 Seminar (Spring 2004), Clemson University

2
[Figure: diagram labeled "Core Algorithms" with items numbered 1 to 4]
3
[Figure: diagram labeled "Core Algorithms" with items numbered 1 to 4, captioned "Just Kidding"]
4
Dynamic Optimization Systems
  • Run-time overheads
    • Observing execution
    • Forming code regions
    • Optimization
    • Code caching
  • ALL OVERHEAD

[Figure: dynamic optimization system; labels: exe, Profile, Transform Code, Code Cache, CPU]
For good performance, the vast majority of execution
should occur in the code cache. Cache management
must be efficient and must not dominate run time.
5
Code Caches Store Superblocks
[Figure: original CFG and the resulting superblocks; labels: potential superblock caching entities, common dynamic sequence of basic blocks]
  • Essential Idea

6
Code Cache Management
  • Goals
    • Maximize execution in code cache
    • Minimize runtime overhead
  • Previous solutions
    • Cache flush on program phase change
    • Unbounded caches
    • Research tools
  • All motivated by SPEC benchmark performance

7
Contributions of the Research
  • Characterization of SPEC and interactive
    applications
  • Investigation of superblock lifetimes
  • Generational code cache algorithms
  • Evaluation
    • Miss rates
    • Overheads
    • Performance improvements

8
The DynamoRIO Collaboration
Dynamo, from Hewlett-Packard Laboratories
  • Targets Windows NT and Linux
  • Addresses the challenges of providing a dynamic
    optimization infrastructure


RIO (Runtime Introspection and Optimization), from
MIT's Laboratory for Computer Science
9
System Layout
[Figure: system layout; labels: Benchmarks, DynamoRIO, Superblock Trace (SB details, insertions, accesses), Code Cache Simulator, Results; note on Windows repeatability]
10
Initial Studies
  • Determine maximum code cache size
  • Calculate code expansion factors
  • Obtain trace generation frequency
  • Identify causes of fragmentation in cache

11
Do We Need Cache Management?
  • For SPEC2000: probably not

12
Interactive Windows Applications
  • 20-fold increase
  • Unbounded caches become impractical

13
As a General Rule (in DynamoRIO)
Code Expansion = Final Code Cache Size / Application Footprint
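
Reading the rule on this slide as code expansion = final code cache size / application footprint, the arithmetic can be sketched as below. The function name and the sample byte counts are hypothetical and only illustrate the ratio; they are not measurements from the talk.

```python
def code_expansion(final_cache_bytes: int, app_footprint_bytes: int) -> float:
    """Ratio of the final code cache size to the application's code footprint."""
    if app_footprint_bytes <= 0:
        raise ValueError("application footprint must be positive")
    return final_cache_bytes / app_footprint_bytes


# Hypothetical sizes, chosen only to show the calculation.
ratio = code_expansion(final_cache_bytes=5_000_000, app_footprint_bytes=1_000_000)
```
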
14
Trace Generation Frequency: SPEC2000
15
Trace Generation Frequency: Windows Benchmark
16
Fragmentation: Superblocks Vary in Size
17
Unmapped Memory: Additional Fragmentation
18
Local vs. Global Cache Management: Two Granularities
  • Local cache management: eviction policy for a
    single code cache (FIFO, LRU, etc.)
  • Global cache management: policy of interaction
    between multiple code caches
    • Basic block vs. superblock cache
    • Generational code caches

19
Cache Management Challenges
  • Low overhead
    • Impacts runtime performance
    • Complex calculations not feasible
  • Emphasize temporal locality
    • Intuitively obvious (that's the whole point)
  • Minimize fragmentation
    • Insertion and deletion
    • Unmapping memory (dynamic loading)
    • Circular buffer management

20
Circular Buffer Challenges
  • Undeletable traces
    • Suspended execution
    • Exception handling
  • Program-forced evictions
    • Unmapped memory
    • Fragmentation
  • Complications warrant new buffer design
    • Pseudo-circular buffer (not strict FIFO)
    • Skip undeletable traces and ignore program-forced
      evictions
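
The pseudo-circular buffer described on this slide can be sketched roughly as follows. This is a minimal illustration, not DynamoRIO's implementation: it assumes equal-sized slots instead of variable-sized traces, and the names `Trace`, `PseudoCircularBuffer`, `insert`, and `remove` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    name: str
    undeletable: bool = False  # e.g. execution is currently suspended inside it


class PseudoCircularBuffer:
    """Circular buffer of equal-sized slots (a simplifying assumption)."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Trace]] = [None] * capacity
        self.head = 0  # next slot considered for insertion/eviction

    def _advance(self) -> None:
        self.head = (self.head + 1) % len(self.slots)

    def insert(self, trace: Trace) -> Optional[Trace]:
        """Insert at the head, stepping over undeletable occupants.

        Returns the evicted trace, or None if the chosen slot was empty.
        """
        for _ in range(len(self.slots)):
            victim = self.slots[self.head]
            if victim is None or not victim.undeletable:
                self.slots[self.head] = trace
                self._advance()
                return victim
            self._advance()  # skip: this is what makes it "not strict FIFO"
        raise RuntimeError("every slot holds an undeletable trace")

    def remove(self, name: str) -> None:
        """Program-forced eviction: leave a hole, do not move the head."""
        for i, occupant in enumerate(self.slots):
            if occupant is not None and occupant.name == name:
                self.slots[i] = None


buf = PseudoCircularBuffer(3)
buf.insert(Trace("A", undeletable=True))
buf.insert(Trace("B"))
buf.insert(Trace("C"))
evicted = buf.insert(Trace("D"))  # A is skipped, so B is the victim
```

In this sketch a program-forced eviction simply leaves a hole that is reused when the head pointer next reaches it, which is one way to read "ignore program-forced evictions".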

21
Basic Block and Superblock Caches
  • DynamoRIO generates traces by first copying all
    basic blocks into a code cache
  • Once basic blocks become hot, superblocks are
    formed and copied into the superblock cache
  • One weakness of a single FIFO (circular) cache is
    that all superblocks are treated equally
[Figure: Basic Block Cache -> (50 executions) -> Superblock Formation -> Superblock Cache]
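
The two-level arrangement on this slide can be sketched as a counter per basic block that triggers superblock formation once the block is hot. The sketch assumes the "50 executions" label in the figure is the hotness threshold; `TwoLevelCache`, `execute`, and `HOT_THRESHOLD` are hypothetical names, and real superblock formation (choosing and stitching a block sequence) is reduced to recording the head address.

```python
from collections import Counter

HOT_THRESHOLD = 50  # assumption: the "50 executions" label is the trigger


class TwoLevelCache:
    def __init__(self) -> None:
        self.basic_blocks = set()     # addresses cached as basic blocks
        self.superblocks = set()      # head addresses of formed superblocks
        self.exec_counts = Counter()  # executions seen per basic block

    def execute(self, block_addr: int) -> str:
        """Record one execution at block_addr and report which cache served it."""
        if block_addr in self.superblocks:
            return "superblock cache"
        self.basic_blocks.add(block_addr)
        self.exec_counts[block_addr] += 1
        if self.exec_counts[block_addr] >= HOT_THRESHOLD:
            # The block is hot: form a superblock headed at this address.
            self.superblocks.add(block_addr)
        return "basic block cache"


cache = TwoLevelCache()
served = [cache.execute(0x400) for _ in range(51)]
```
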
22
Superblock Lifetimes -- SPEC2000
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
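
Reading the lifetime metric on this slide as (last execution time - first execution time) / total execution time, a small helper makes the normalization explicit. The function name and the sample timestamps are hypothetical; any consistent time unit works because the result is a fraction of the run.

```python
def superblock_lifetime(first_exec: float, last_exec: float, total_exec: float) -> float:
    """Fraction of the whole run between a superblock's first and last execution."""
    if total_exec <= 0:
        raise ValueError("total execution time must be positive")
    if not 0 <= first_exec <= last_exec <= total_exec:
        raise ValueError("expected 0 <= first <= last <= total")
    return (last_exec - first_exec) / total_exec


# Hypothetical timestamps: first run at t=25, last run at t=75, program ends at t=100.
lifetime = superblock_lifetime(first_exec=25, last_exec=75, total_exec=100)
```

A value near 0 means the superblock was only used during a short window of the run, and a value near 1 means it stayed in use for almost the whole run.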
23
Superblock Lifetimes -- Windows
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
24
Generational Code Caches: Initial Conceptual Design
[Figure: a new superblock enters the nursery cache (circular buffer, FIFO eviction). On eviction from the nursery, a live superblock is promoted to the persistent cache (circular buffer) and a dead one is deleted.]
25
Generational Hypothesis
  • Generational hypothesis from garbage collection:
    objects tend to die young
  • Unfortunately, unlike a code cache, a garbage
    collector knows when an object is dead
  • A superblock is dead when it will never be
    executed again (impossible to determine before
    the program ends)
  • Guessing incorrectly doesn't affect correctness;
    it is just expensive

26
The Probation Cache
[Figure: a new superblock enters the nursery cache (circular buffer, FIFO eviction) and, when evicted, moves to the probation cache (circular buffer). On eviction from probation: if threshold_met, promote to the persistent cache (circular buffer); if threshold_not_met, delete.]
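
The nursery / probation / persistent flow can be sketched as three FIFO queues. This is a simplified model under stated assumptions, not the authors' simulator: capacities are counted in superblocks instead of bytes, "threshold met" is taken to mean a minimum number of hits while on probation, and the names `GenerationalCache` and `access` are hypothetical.

```python
from collections import OrderedDict


class GenerationalCache:
    def __init__(self, nursery_size: int, probation_size: int,
                 persistent_size: int, threshold: int = 1) -> None:
        self.sizes = (nursery_size, probation_size, persistent_size)
        self.threshold = threshold
        self.nursery = OrderedDict()     # insertion order doubles as FIFO order
        self.probation = OrderedDict()   # superblock -> hits while on probation
        self.persistent = OrderedDict()

    def access(self, sb: str) -> bool:
        """Return True on a hit, False on a miss (superblock is regenerated)."""
        if sb in self.nursery or sb in self.persistent:
            return True
        if sb in self.probation:
            self.probation[sb] += 1
            return True
        self._insert_nursery(sb)
        return False

    def _insert_nursery(self, sb: str) -> None:
        if len(self.nursery) >= self.sizes[0]:
            oldest, _ = self.nursery.popitem(last=False)
            self._insert_probation(oldest)
        self.nursery[sb] = None

    def _insert_probation(self, sb: str) -> None:
        if len(self.probation) >= self.sizes[1]:
            oldest, hits = self.probation.popitem(last=False)
            if hits >= self.threshold:
                self._insert_persistent(oldest)  # threshold met: promote
            # otherwise the superblock is simply deleted
        self.probation[sb] = 0

    def _insert_persistent(self, sb: str) -> None:
        if len(self.persistent) >= self.sizes[2]:
            self.persistent.popitem(last=False)  # plain FIFO eviction
        self.persistent[sb] = None


cache = GenerationalCache(nursery_size=1, probation_size=1, persistent_size=1)
hits = [cache.access(sb) for sb in ["A", "B", "A", "C", "D"]]
```

In this run "A" is reused while on probation and ends up in the persistent cache, while "B" is never reused and is deleted when it leaves probation.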
27
Experimental Comparison
  • Ensure cache pressure: cacheSize = (1/3) maxCache
  • Local policy fixed at FIFO for all caches
  • Base case: one unified FIFO cache
  • Generational case: nursery, probation, and persistent
    caches totaling cacheSize
  • How big should each cache be?
  • What should the probation threshold be?

28
Windows Application Miss Rates
29
SPEC2000 Miss Rates
30
Incorporating Overheads
  • Using Pentium-4 performance monitors and PAPI
    (counters), we collected overheads for
  • All overheads reported in instructions

31
Generating Overhead Estimates
Each overhead estimate was generated using
least-squares linear regression over 30,000
samples
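
For readers unfamiliar with the fitting step, a least-squares line through (x, y) samples can be computed as below. The slide does not say which quantity each overhead was regressed against, so the choice of x (for example, superblock size) and y (instructions spent) is an assumption, as are the function name and the sample data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples lying exactly on y = 2x + 3.
slope, intercept = fit_line([1, 2, 3, 4], [5, 7, 9, 11])
```
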
32
Reduction in Runtime Overhead (45-10-45 proportion scheme)
33
Actual Execution Time Improvement
  • Results varied and were highly dependent on
    number of misses eliminated
  • Gzip: 2,288 misses eliminated, resulting in a 0.07%
    reduction in execution time
  • Crafty: 292,486 misses eliminated, resulting in an
    8.09% reduction in execution time

34
Conclusions
  • Large, interactive applications impose limiting
    constraints on code caches
  • Leverage observations of superblock lifetimes to
    improve management policies
  • Based on trace-driven simulation, replacing a
    single code cache with multiple generational code
    caches results in
    • Reduced miss rates
    • Reduced runtime overhead

35
Ongoing Research
  • Increasing cache pressure
    • Management overhead can dominate
  • Code cache eviction granularities
    • Fine grain: lower miss rate
    • Coarse grain: less total management overhead
  • Going from SIMULATION to IMPLEMENTATION

36
Questions ?
37
Appendix A -- Metrics
HMEAN: harmonic mean, H
GMEAN: geometric mean, G
H ≤ G ≤ arithmetic mean (equal only when all values are identical)
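
The ordering of the three means can be checked numerically with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The sample values are arbitrary positive numbers chosen for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [1.0, 2.0, 4.0]  # arbitrary positive sample values

h = harmonic_mean(values)   # n / sum(1 / x_i)
g = geometric_mean(values)  # (product of x_i) ** (1 / n)
a = mean(values)            # sum(x_i) / n

print(f"H = {h:.4f}, G = {g:.4f}, A = {a:.4f}")
```
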
38
Potential Questions
  • FIFO versus LRU: why?
  • Published paper at Interact'02 BIB12

39
DynamoRIO Block Diagram