Title: 380C
- Where are we, and where are we going?
- Managed languages
- Dynamic compilation
- Inlining
- Garbage collection
- What else can you do when you examine the heap a lot?
- Why you need to care about workloads
- Alias analysis
- Dependence analysis
- Loop transformations
- EDGE architectures
380C Lecture 18
- Garbage Collection
- Why use garbage collection?
- What is garbage?
- Reachable vs live, stack maps, etc.
- Allocators and their collection mechanisms
- Semispace
- Marksweep
- Performance comparisons
- Mark Region
- Incremental age based collection
- Write barriers: friend or foe?
- Generational
- Beltway
Mark Region and Other Advances in Garbage Collection
PLDI08: Immix: A Mark-Region Collector with Space Efficiency, Fast Collection, and Mutator Performance
- Kathryn S. McKinley, University of Texas at Austin
- Stephen M. Blackburn, Australian National University
Isn't GC a bit retro?
"Languages without automated garbage collection are getting out of fashion. The chance of running into all kinds of memory problems is gradually outweighing the performance penalty you have to pay for garbage collection."
Paul Jansen, managing director of TIOBE Software, in Dr. Dobb's, April 2008
GC Fundamentals: The Time-Space Tradeoff
Our Goal
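The tradeoff can be made concrete with a common back-of-the-envelope model. This model is an assumption and is not data from these slides. Take a full-heap tracing collector with heap size H and a constant live size L, where each collection frees all H − L non-live bytes. Allocating A bytes then triggers about A / (H − L) collections, and each collection traces L bytes. The function name `gc_cost` is hypothetical.

```python
def gc_cost(heap: float, live: float, allocated: float) -> tuple:
    """First-order model of a full-heap tracing collector (assumed, simplified).

    Returns (number of collections, total bytes traced) for allocating
    `allocated` bytes in a heap of size `heap` holding `live` bytes of live data.
    """
    if heap <= live:
        raise ValueError("heap must be larger than the live data")
    collections = allocated / (heap - live)  # each GC frees heap - live bytes
    traced = collections * live              # each GC traces all live data
    return collections, traced


# Doubling the headroom above the live size sharply cuts tracing work.
small = gc_cost(heap=200, live=100, allocated=1000)
large = gc_cost(heap=400, live=100, allocated=1000)
```

The model shows why GC time falls steeply as the heap grows past the minimum needed for the live data. It also shows why a collector that needs extra space, such as a semi-space copy reserve, pays for that space in this tradeoff.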
GC Fundamentals: Algorithmic Components
- Allocation: Free List; Bump Allocation
- Identification: Tracing (implicit); Reference Counting (explicit)
- Reclamation: Sweep-to-Free; Compact; Evacuate
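The two allocation components differ mainly in their fast path. Below is a minimal sketch of each. It assumes addresses are plain integers and uses a segregated-fits free list (one list per size class), which is one common free-list design and not necessarily the one measured here. The class names are hypothetical.

```python
from __future__ import annotations


class BumpAllocator:
    """Bump-pointer allocation into one contiguous region of memory."""

    def __init__(self, start: int, limit: int) -> None:
        self.cursor = start
        self.limit = limit

    def alloc(self, size: int) -> int | None:
        # Fast path: one bounds check and one addition.
        if self.cursor + size > self.limit:
            return None  # region exhausted; a real VM would trigger a GC
        addr = self.cursor
        self.cursor += size
        return addr


class SegregatedFreeList:
    """Free-list allocation with one list of free cells per size class."""

    def __init__(self, size_classes: list[int]) -> None:
        self.size_classes = sorted(size_classes)
        self.free: dict[int, list[int]] = {c: [] for c in self.size_classes}

    def size_class(self, size: int) -> int:
        for c in self.size_classes:
            if size <= c:
                return c
        raise ValueError("object too large for any size class")

    def release(self, addr: int, size_class: int) -> None:
        # Sweep-to-free pushes each reclaimed cell back onto its list.
        self.free[size_class].append(addr)

    def alloc(self, size: int) -> int | None:
        cells = self.free[self.size_class(size)]
        return cells.pop() if cells else None
```

Bump allocation places consecutively allocated objects next to each other, which is the source of its locality advantage. The free list reuses scattered cells and avoids moving objects.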
GC Fundamentals: Canonical Garbage Collectors
Mark-Sweep: Free List Allocation + Trace + Sweep-to-Free
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
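Here is a minimal sketch of the trace and sweep-to-free steps. It assumes a toy heap that maps integer object ids to lists of referenced ids, with a free list that collects the ids of reclaimed cells. The function names are hypothetical.

```python
from __future__ import annotations


def mark(heap: dict[int, list[int]], roots: list[int]) -> set[int]:
    """Trace from the roots and return the set of reachable object ids."""
    marked: set[int] = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue  # already visited; this also terminates on cycles
        marked.add(obj)
        worklist.extend(heap[obj])  # push every outgoing reference
    return marked


def sweep(heap: dict[int, list[int]], marked: set[int], free_list: list[int]) -> None:
    """Sweep-to-free: every unmarked cell goes back on the free list."""
    for obj in list(heap):  # copy the keys because we delete while iterating
        if obj not in marked:
            del heap[obj]
            free_list.append(obj)


def mark_sweep(heap: dict[int, list[int]], roots: list[int], free_list: list[int]) -> None:
    sweep(heap, mark(heap, roots), free_list)
```

Objects never move, so the collector needs no copy reserve. Later allocations must still come from wherever the free cells happen to lie.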
Mark-Compact: Bump Allocation + Trace + Compact
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
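The compact step can be illustrated with a three-pass sliding compactor in the style of the classic Lisp-2 algorithm. This is one well-known compaction technique and not necessarily the one measured on the slide. The sketch assumes a mark phase has already produced the set of live addresses. `Obj` and `compact` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Obj:
    addr: int
    size: int
    refs: list[int] = field(default_factory=list)  # addresses of referents


def compact(objects: list[Obj], live_addrs: set[int], roots: list[int]):
    """Slide live objects to the bottom of the heap, preserving address order.

    Returns (compacted objects, updated roots, new bump-allocation cursor).
    """
    # Pass 1: assign forwarding addresses to live objects in address order.
    forward: dict[int, int] = {}
    cursor = 0
    for obj in sorted(objects, key=lambda o: o.addr):
        if obj.addr in live_addrs:
            forward[obj.addr] = cursor
            cursor += obj.size
    # Pass 2: rewrite references held by live objects and by the roots.
    survivors = [o for o in objects if o.addr in live_addrs]
    for obj in survivors:
        obj.refs = [forward[r] for r in obj.refs]
    new_roots = [forward[r] for r in roots]
    # Pass 3: move each object to its forwarding address.
    for obj in survivors:
        obj.addr = forward[obj.addr]
    survivors.sort(key=lambda o: o.addr)
    return survivors, new_roots, cursor
```

After compaction, all free space is one contiguous block starting at `cursor`, so allocation can bump from there. The extra passes over the heap are what make compaction more expensive than a sweep.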
Semi-Space: Bump Allocation + Trace + Evacuation
Space inefficient
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
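Semi-space evacuation is usually explained with Cheney's breadth-first copying algorithm. Here is a minimal sketch under toy assumptions. From-space maps old object ids to lists of referenced ids. To-space is a Python list, so an object's index serves as its new address. `semispace_collect` is a hypothetical name.

```python
from __future__ import annotations


def semispace_collect(from_space: dict[int, list[int]], roots: list[int]):
    """Copy everything reachable from the roots into a fresh to-space.

    Returns (to_space, new_roots). to_space is indexed by new id, and its
    entries hold references already rewritten to new ids.
    """
    to_space: list[list[int]] = []
    forwarding: dict[int, int] = {}

    def evacuate(old: int) -> int:
        # Copy on first visit; later visits reuse the forwarding pointer.
        if old not in forwarding:
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))  # refs are still old ids
        return forwarding[old]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    # Cheney scan: to-space itself serves as the breadth-first work queue.
    while scan < len(to_space):
        to_space[scan] = [evacuate(r) for r in to_space[scan]]
        scan += 1
    return to_space, new_roots
```

Only live objects are touched, and survivors end up contiguous. In a real heap, however, to-space must be reserved in advance and sized for the worst case, which is why semi-space is listed as space inefficient.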
Mark-Region with Sweep-to-Region Reclamation
Sweep-to-Free
Mark-Region: Bump Allocation + Trace + Sweep-to-Region
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
Naïve Mark-Region
- Contiguous allocation into regions
- Excellent locality
- For simplicity, objects cannot span regions
- Simple mark phase (like mark-sweep)
- Mark objects and their containing region
- Unmarked regions can be freed
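The naïve scheme described above fits in a few lines. This sketch assumes a hypothetical 256-byte region size and assumes that tracing has already produced the set of live object ids. The class name is also hypothetical. Objects may not span regions, and a region is reusable only when it holds no marked object.

```python
from __future__ import annotations

REGION_SIZE = 256  # bytes per region; hypothetical value for illustration


class NaiveMarkRegion:
    """Bump allocation into fixed-size regions; whole regions are reclaimed."""

    def __init__(self, num_regions: int) -> None:
        self.num_regions = num_regions
        self.free_regions = list(range(num_regions))
        self.objects: dict[int, tuple[int, int]] = {}  # id -> (region, size)
        self.current: int | None = None  # region being bump-allocated into
        self.cursor = 0
        self.next_id = 0

    def alloc(self, size: int) -> int | None:
        if size > REGION_SIZE:
            raise ValueError("objects cannot span regions")
        if self.current is None or self.cursor + size > REGION_SIZE:
            if not self.free_regions:
                return None  # no free region: a real VM would collect here
            self.current = self.free_regions.pop(0)
            self.cursor = 0
        self.cursor += size  # contiguous (bump) allocation within the region
        obj = self.next_id
        self.next_id += 1
        self.objects[obj] = (self.current, size)
        return obj

    def collect(self, live: set[int]) -> None:
        """Reclaim regions, given the live ids found by a tracing mark phase."""
        marked_regions = {self.objects[o][0] for o in live}
        self.objects = {o: v for o, v in self.objects.items() if o in live}
        # A region containing no marked object is free in its entirety.
        self.free_regions = [r for r in range(self.num_regions) if r not in marked_regions]
        # Naive: free space inside marked regions is never reused.
        self.current = None
```

The weakness is visible in `collect`. One small live object pins an entire region, which motivates the finer-grained lines in Immix.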
Immix: Efficient Mark-Region Garbage Collection
Lines and Blocks
- More contiguous allocation
- Increased metadata overhead
- Constrained object sizes
(Figure: blocks divided into lines, labeled "Free" and "Recyclable lines")
- TLB locality, cache locality
- Block > 4 × max object size
- Objects span lines
- Lines marked with objects
- Less fragmentation
- Fast common case
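A minimal sketch of line marking and hole finding inside one block follows. It assumes 128-byte lines and 256 lines per block (32 KB). Treat these constants as illustrative assumptions. Objects may span lines, each live object marks every line it overlaps, and the allocator bumps through each run of unmarked lines (a "hole"). The function names are hypothetical.

```python
from __future__ import annotations

LINE_SIZE = 128        # bytes per line (assumed)
LINES_PER_BLOCK = 256  # 256 * 128 B = 32 KB per block (assumed)


def mark_lines(line_marks: list[bool], offset: int, size: int) -> None:
    """Mark every line overlapped by a live object at byte `offset` in the block."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    for line in range(first, last + 1):
        line_marks[line] = True


def find_holes(line_marks: list[bool]) -> list[tuple[int, int]]:
    """Return (start_byte, end_byte) for each maximal run of unmarked lines."""
    holes = []
    line = 0
    while line < len(line_marks):
        if line_marks[line]:
            line += 1
            continue
        start = line
        while line < len(line_marks) and not line_marks[line]:
            line += 1
        holes.append((start * LINE_SIZE, line * LINE_SIZE))
    return holes
```

A bump allocator then treats each hole's start and end as its cursor and limit. That is how a partially marked ("recyclable") block is reused without a free list.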
Allocation Policy (Recycling)
- Recycle partially marked blocks first
- Minimizes fragmentation
- Maximizes sharing of freed blocks
- Recycle in address order
- We explored other options
- Allocate into free blocks last
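The policy above amounts to an ordering over blocks after a collection. This sketch assumes each block is described by its address and its line mark bits. Sorting the free blocks by address is an extra assumption, since the slide only says they come last. `allocation_order` is a hypothetical name.

```python
from __future__ import annotations


def allocation_order(blocks: dict[int, list[bool]]) -> list[int]:
    """Order in which a recycling allocator visits blocks after a collection.

    blocks maps block address -> line mark bits from the last collection.
    Partially marked (recyclable) blocks come first, in address order; completely
    free blocks come last. Fully marked blocks have no holes and are skipped.
    """
    recyclable = sorted(a for a, m in blocks.items() if any(m) and not all(m))
    free = sorted(a for a, m in blocks.items() if not any(m))
    return recyclable + free
```

Filling holes in recyclable blocks before touching free blocks keeps fragmentation low. It also leaves whole free blocks available to be shared by other allocators.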
Opportunistic Defragmentation
- Opportunistically evacuate fragmented blocks
- Lightweight, uses same allocation mechanism
- No cost in common case (specialized GC)
- Identify source and target blocks
- (see paper for heuristics)
- Evacuate objects in source blocks
- Allocate into target blocks
- Opportunistic
- Leave in place if no space, or object pinned
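The steps above can be sketched as follows. The heuristics that pick source and target blocks are not shown; the sets are simply passed in. Free space in target blocks is modeled as a byte count, which is a simplifying assumption. `LiveObject` and `defragment` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LiveObject:
    name: str
    size: int
    block: int
    pinned: bool = False


def defragment(objects: list[LiveObject], sources: set[int],
               target_space: dict[int, int]) -> dict[str, int]:
    """Opportunistically evacuate live objects out of source blocks.

    target_space maps target block id -> free bytes remaining.
    Returns a forwarding map (object name -> new block) for objects that moved.
    """
    forwarded: dict[str, int] = {}
    for obj in objects:
        if obj.block not in sources or obj.pinned:
            continue  # not a candidate, or pinned: mark it in place
        target = next((t for t, free in target_space.items() if free >= obj.size), None)
        if target is None:
            continue  # no space left: leave in place (opportunistic)
        target_space[target] -= obj.size  # same bump-style allocation path
        obj.block = target
        forwarded[obj.name] = target
    return forwarded
```

Evacuation never fails. An object that cannot move is left where it is and treated like any other marked object.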
Other Optimizations
- Most objects small
- Small objects implicitly mark next line
- Very fast common case
- Large objects mark lines exactly
- Multi-line objects may skip many small holes
- Overflow allocation (used on failure)
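A sketch of the first and last optimizations follows. It assumes 128-byte lines, and its function names and policy strings are hypothetical. A small object marks only its first line. When finding holes, the allocator conservatively skips the first unmarked line after any marked line, because a small object starting in the marked line may spill into it. Multi-line objects mark every line they overlap. A medium object that does not fit in the current hole goes to a separate overflow allocator instead of skipping holes.

```python
from __future__ import annotations

LINE_SIZE = 128  # bytes per line (assumed, illustrative)


def mark_object(line_marks: list[bool], offset: int, size: int) -> None:
    first = offset // LINE_SIZE
    if size <= LINE_SIZE:
        # Small object: mark only its first line; the next line is treated
        # as implicitly marked when holes are found.
        line_marks[first] = True
    else:
        # Multi-line object: mark every line it overlaps, exactly.
        last = (offset + size - 1) // LINE_SIZE
        for line in range(first, last + 1):
            line_marks[line] = True


def find_holes_conservative(line_marks: list[bool]) -> list[tuple[int, int]]:
    """Runs of unmarked lines, skipping the first unmarked line after a marked one."""
    holes = []
    line, n = 0, len(line_marks)
    while line < n:
        if line_marks[line]:
            line += 1
            if line < n and not line_marks[line]:
                line += 1  # implicitly marked: may hold a small object's tail
            continue
        start = line
        while line < n and not line_marks[line]:
            line += 1
        holes.append((start * LINE_SIZE, line * LINE_SIZE))
    return holes


def pick_allocator(size: int, hole_free: int) -> str:
    """Which allocator services a request (hypothetical policy sketch)."""
    if size <= hole_free:
        return "bump"       # common case: fits in the current hole
    if size > LINE_SIZE:
        return "overflow"   # medium object: avoid skipping many small holes
    return "next-hole"      # small object: move on to the next hole
```

Marking a single line per small object keeps the common case cheap. The price is at most one wasted line after each marked run.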
Results
Complete data available at http://cs.anu.edu.au/Steve.Blackburn/pubs
Evaluation
Hardware
- Core 2 Duo: 2.4GHz, 32KB L1, 4MB L2, 2GB RAM
- AMD Athlon 3500: 2.2GHz, 64KB L1, 512KB L2, 2GB RAM
- PowerPC 970: 1.6GHz, 32KB L1, 512KB L2, 2GB RAM
Benchmarks
- DaCapo
- SPECjvm98
- SPEC jbb2000
Collectors
- Full Heap: Immix, MarkSweep, MarkCompact, SemiSpace
- Generational: GenIX, GenMS, GenCopy
- Sticky: StickyIX, StickyMS
Methodology
- MMTk
- Jikes RVM 2.9.3
- (Perf HotSpot 1.5)
- Replay compiler
- Discard outliers
- Report 95th percentile
Please see the paper for details.
Mutator Time
Geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
Minimum Heap
GC Time
Geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
Total Performance
Geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
Generational Performance
Geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
Sticky Performance
Geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
PseudoJBB 2000
On 2.4GHz Core 2 Duo
Prior Work
- IBM product collector: http://www.ibm.com/developerworks/ibm/library/i-garbage1/
- Mark-Region not characterized
- Collector not evaluated
- Product and basis for other research
- Domani et al. 2000; Kermany & Petrank 2006
Mark-Region Collection
Sweep-to-Free
Immix: Efficient Mark-Region Collection
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
Open Source
Code available in Jikes RVM 2.9.3 onward: http://www.jikesrvm.org
Complete data available at http://cs.anu.edu.au/Steve.Blackburn/pubs
Research History
- PLDI 1998
- Clinger & Hanson postulated the radioactive decay model for object lifetimes
- Genesis of Older-First
- Stefanovic, McKinley, Moss OOPSLA99
Garbage Collection Hypotheses
- Generational hypothesis: younger objects die quickly, so collect them first
- Older-first hypothesis: the collector can collect less the longer it waits
(Figure: an age-ordered heap, from younger to older, and the survival function s(v) of the object lifetime distribution, plotted for ages 0 to V)
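The radioactive decay model treats object lifetimes as exponentially distributed, so every live object dies at the same constant rate whatever its age. This sketch assumes that reading and parameterizes s(v) by a hypothetical half-life. The model is memoryless: the chance of surviving v more time units does not depend on how old an object already is. Under this model, youth alone does not predict imminent death, which is the assumption the generational hypothesis relies on.

```python
import math


def survival(v: float, half_life: float) -> float:
    """s(v): fraction of a cohort still live at age v under exponential decay."""
    return math.exp(-math.log(2) * v / half_life)


def survive_further(age: float, v: float, half_life: float) -> float:
    """Probability of surviving v more time units, given survival to `age`."""
    return survival(age + v, half_life) / survival(age, half_life)


# An object aged 0 and an object aged 100 are equally likely to live 5 more units.
young = survive_further(0, 5, half_life=10)
old = survive_further(100, 5, half_life=10)
```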
Older-first Algorithm
Next Steps
- Beltway
- BJMM PLDI02
- Increments
- Belts
- Combines generational and older-first
- Ulterior Reference Counting
- BM OOPSLA03
- Reference count on a per-object basis
- Responsiveness and throughput
- MMTk BCM SIGMETRICS04 ICSE04
- Toolkit for building and understanding GC
- Motivated today's work
Garbage Collection is the Answer to All Your Problems
- Improves data and code locality
- Huang et al. OOPSLA02 ISMM04, VEE04
- Cooperative GC optimizations
- Colocation Guyer OOPSLA05
- Free-me Guyer et al. PLDI06
- Finds leaks
- Bond ASPLOS06, Jump POPL07
- Tolerates leaks
- Bond OOPSLA08
- Helps with dynamic software updating!
- Subramaniam, Hicks ??08
- DaCapo Benchmarks
- Blackburn et al. OOPSLA06 CACM08
380C
- Where are we, and where are we going?
- Why you need to care about workloads
- Managed languages
- Dynamic compilation
- Inlining
- Garbage collection
- Opportunity to improve data locality on-the-fly
- Read X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng, "The Garbage Collection Advantage: Improving Program Locality," ACM Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 69-80, Vancouver, Canada, October 2004.
- Alias analysis
- Dependence analysis
- Loop transformations
- EDGE architectures