Generational Cache Management of Code Traces in Dynamic Optimization Systems - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Generational Cache Management of Code Traces in Dynamic Optimization Systems

1
Generational Cache Management of Code Traces in
Dynamic Optimization Systems

Original Authors
Kim Hazelwood
Michael D. Smith
Harvard University
Presented w/ modifications by
William M. Jones
For ECE 903 Seminar (Spring 2004)
Clemson University

2
[Diagram: Core Algorithms]
3
[Diagram: Core Algorithms]
Just Kidding
4
Dynamic Optimization Systems
Profile

Run-Time Overheads
Observing execution
Forming code regions
Optimization
Code caching
ALL OVERHEAD

Transform Code
Code Cache
exe
CPU
For good performance, vast majority of execution
should occur in the code cache. Cache management
must be efficient and cannot dominate.
5
Code Caches Store Superblocks
Resulting Superblocks
Original CFG
Potential superblock caching entities
Common dynamic sequence of basic blocks

Essential Idea

6
Code Cache Management

Goals
Maximize execution in code cache
Minimize runtime overhead
Previous solutions
Cache flush on program phase change
Unbounded caches
Research tools
All motivated by SPEC benchmark performance

7
Contributions of the Research

Characterization of SPEC and interactive
applications
Investigation of superblock lifetimes
Generational code cache algorithms
Evaluation
Miss rates
Overheads
Performance improvements

8
The DynamoRIO Collaboration
Dynamo: from Hewlett-Packard Laboratories

Targets Windows NTs and Linux
Addresses challenges of providing dynamic
optimization infrastructure

RIO (Runtime Introspection and Optimization): from
MIT's Laboratory for Computer Science
9
System Layout
[Diagram labels: Benchmarks, DynamoRIO, Superblock Trace (SB details, insertions, accesses), Code Cache Simulator, Results, Windows repeatability]
10
Initial Studies

Determine maximum code cache size
Calculate code expansion factors
Obtain trace generation frequency
Identify causes of fragmentation in cache

11
Do We Need Cache Management?

For SPEC2000: probably not

12
Interactive Windows Applications

20-fold increase
Unbounded caches become impractical

13
As a General Rule (in DynamoRIO)
Code Expansion = Final Code Cache Size / Application Footprint
14
Trace Generation Frequency: SPEC2000
15
Trace Generation Frequency: Windows Benchmark
16
Fragmentation: Superblocks Vary in Size
17
Unmapped Memory: Additional Fragmentation
18
Local vs. Global Cache Management: Two
Granularities

Local Cache Management: Eviction policy for a
single code cache (FIFO, LRU, etc.)
Global Cache Management: Policy of interaction
between multiple code caches
Basic block vs. superblock cache
Generational code caches

19
Cache Management Challenges

Low overhead
Impacts runtime performance
Complex calculations not feasible
Emphasize temporal locality
Intuitively obvious (that's the whole point)
Minimize fragmentation
Insertion and deletion
Unmapping memory (dynamic loading)
Circular buffer management

20
Circular Buffer Challenges

Undeletable traces
Suspended execution
Exception handling
Program-forced evictions
Unmapped memory
Fragmentation
Complications warrant new buffer design
Pseudo-circular buffer (not strict FIFO)
Skip undeletable traces and ignore program-forced
evictions

21
Basic Block & Superblock Caches

DynamoRIO generates traces by copying all basic
blocks into a code cache
Once the basic blocks become hot, superblocks
are formed and copied into the superblock cache
One weakness of a single FIFO (circular) cache is
that all superblocks are treated equally

[Diagram: Basic Block Cache -> (50 executions) -> Superblock Formation -> Superblock Cache]
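
The two-level arrangement on this slide can be sketched as a per-block execution counter with a hotness threshold. This is only an illustration: the `TwoLevelCache` class is hypothetical, and treating the slide's "50 executions" label as a "count reaches 50" trigger is an assumption.

```python
from collections import Counter

HOT_THRESHOLD = 50  # taken from the slide's "50 executions" label


class TwoLevelCache:
    """Toy model: blocks run from the basic block cache until they become hot."""

    def __init__(self):
        self.exec_counts = Counter()  # basic block cache: head address -> executions
        self.superblocks = set()      # superblock cache: heads with a formed superblock

    def execute(self, head: int) -> str:
        """Record one execution starting at `head` and report where it ran."""
        if head in self.superblocks:
            return "superblock"
        self.exec_counts[head] += 1
        if self.exec_counts[head] >= HOT_THRESHOLD:
            self.superblocks.add(head)  # superblock formation (assumed trigger)
        return "basic block"
```

In this model the first 50 executions of a block run from the basic block cache, and every later execution runs from the superblock cache.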
22
Superblock Lifetimes -- SPEC2000
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
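
As a worked example, the lifetime metric can be computed as below. The sketch assumes the slide's formula reads as the span between first and last execution divided by total execution time; the function name and the sample numbers are made up for illustration.

```python
def superblock_lifetime(first_exec: float, last_exec: float, total_exec: float) -> float:
    """Fraction of the program's run during which the superblock was live.

    Assumed reading of the slide: (last - first) / total.
    """
    if total_exec <= 0:
        raise ValueError("total execution time must be positive")
    if not (0 <= first_exec <= last_exec <= total_exec):
        raise ValueError("expected 0 <= first <= last <= total")
    return (last_exec - first_exec) / total_exec


# Hypothetical superblock first executed at t=10 and last executed at t=35
# in a run of length 100: it was live for a quarter of the run.
print(superblock_lifetime(10, 35, 100))
```

A value near 0 means the superblock died young; a value near 1 means it stayed in use for almost the whole run.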
23
Superblock Lifetimes -- Windows
Lifetime = (LastExecutionTime - FirstExecutionTime) / TotalExecutionTime
24
Generational Code Caches: Initial Conceptual Design
Nursery Cache
Persistent Cache
New Superblock
FIFO Eviction
If (Live) PROMOTE
Circular Buffer
Circular Buffer
If (Dead) DELETE
25
Generational Hypothesis

Generational hypothesis from garbage collection
Objects tend to die young
Unfortunately, unlike us, garbage collectors know
when an object is dead
A superblock is dead when it will never be
executed again (impossible to determine before
the program ends)
Guessing incorrectly doesn't affect
correctness; it is just expensive

26
The Probation Cache
Nursery Cache
Persistent Cache
New Superblock
FIFO Eviction
If (threshold_met) PROMOTE
Probation Cache
Circular Buffer
Circular Buffer
Circular Buffer
If (threshold_not_met) DELETE
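
The flow in this diagram can be sketched as three FIFO caches chained together. This is a simplified model, not the authors' simulator: capacities are counted in superblocks rather than bytes, each capacity is assumed to be at least 1, and the class name, the `threshold` parameter, and the rule "count hits while in probation" are assumptions.

```python
from collections import OrderedDict


class GenerationalCache:
    """Nursery -> probation -> persistent, each managed as a FIFO."""

    def __init__(self, nursery_cap, probation_cap, persistent_cap, threshold=1):
        self.caps = {"nursery": nursery_cap,
                     "probation": probation_cap,
                     "persistent": persistent_cap}
        # Each cache maps superblock id -> hits observed while in that cache.
        self.caches = {name: OrderedDict() for name in self.caps}
        self.threshold = threshold
        self.misses = 0

    def _insert(self, level, sb, hits=0):
        cache = self.caches[level]
        if len(cache) >= self.caps[level]:
            victim, victim_hits = cache.popitem(last=False)  # FIFO: oldest out
            self._evicted(level, victim, victim_hits)
        cache[sb] = hits

    def _evicted(self, level, sb, hits):
        if level == "nursery":
            self._insert("probation", sb)       # every nursery victim gets a trial
        elif level == "probation" and hits >= self.threshold:
            self._insert("persistent", sb)      # threshold met: PROMOTE
        # Otherwise the superblock is deleted.

    def access(self, sb):
        """Return the cache that served `sb`, or "miss" if it had to be regenerated."""
        for level, cache in self.caches.items():
            if sb in cache:
                if level == "probation":
                    cache[sb] += 1              # updating a value keeps FIFO order
                return level
        self.misses += 1
        self._insert("nursery", sb)
        return "miss"
```

A superblock that is re-executed while in the probation cache survives into the persistent cache, while one that is never touched there is deleted and costs a miss if it is needed again.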
27
Experimental Comparison

Ensure pressure: cacheSize = (1/3) maxCache
Local policy fixed at FIFO for all caches
Base case: one unified FIFO cache
Generational case: nursery, probation, and persistent
caches totaling cacheSize
How big should each cache be?
Probation threshold?

28
Windows Application Miss Rates
29
SPEC2000 Miss Rates
30
Incorporating Overheads

Using Pentium-4 performance monitors and PAPI
(counters), we collected overheads for
All overheads reported in instructions

31
Generating Overhead Estimates
Each overhead estimate was generated using
least-squares linear regression over 30,000
samples
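
A least-squares line fit of the kind mentioned here can be sketched in a few lines. The slide does not say what the independent variable was, so modeling overhead (in instructions) as a linear function of superblock size is an assumption, and the sample data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: superblock size in bytes vs. measured instructions.
sizes = [100, 200, 300, 400]
instructions = [520, 1010, 1490, 2020]
slope, intercept = fit_line(sizes, instructions)
print(f"overhead ~= {slope:.2f} * size + {intercept:.2f} instructions")
```

The fitted line then gives an overhead estimate for any superblock size, which is how a per-event cost could be plugged into a simulator.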
32
Reduction in Runtime Overhead (45-10-45
proportion scheme)
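
For concreteness, a proportion scheme like this one can be turned into cache sizes as sketched below. Reading 45-10-45 as percentages for the nursery, probation, and persistent caches in that order is an assumption, as are the function name and the example total.

```python
def partition_cache(total_size: int, proportions=(45, 10, 45)) -> dict:
    """Split total_size into nursery/probation/persistent sizes by percentage."""
    if sum(proportions) != 100:
        raise ValueError("proportions must sum to 100")
    nursery_pct, probation_pct, _ = proportions
    nursery = total_size * nursery_pct // 100
    probation = total_size * probation_pct // 100
    persistent = total_size - nursery - probation  # remainder absorbs rounding
    return {"nursery": nursery, "probation": probation, "persistent": persistent}


# Hypothetical 1000 KB total cache budget.
print(partition_cache(1000))
```

Giving the remainder to the last cache keeps the three sizes summing exactly to the total budget.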
33
Actual Execution Time Improvement

Results varied and were highly dependent on the
number of misses eliminated
Gzip: 2,288 misses eliminated, resulting in a 0.07%
reduction in execution time
Crafty: 292,486 misses eliminated, resulting in an
8.09% reduction in execution time

34
Conclusions

Large, interactive applications impose limiting
constraints on code caches
Leverage observations of superblock lifetimes to
improve management policies
Based on trace-driven simulation, replacing a
single code cache with multiple generational code
caches results in
Reduced miss rates
Reduced runtime overhead

35
Ongoing Research