Title: A Comparative Evaluation of Parallel Garbage Collectors
1 A Comparative Evaluation of Parallel Garbage Collectors
- Dick Attanasio, David F. Bacon
- Anthony Cocchi, Stephen Smith
- IBM T.J. Watson Research Center
- Presented at the University of Washington
- June 5, 2001
2 Jalapeño Goals
- Java VM for large MP servers
- Scalability
- Large heaps
- Large numbers of processors
- Large numbers of threads
- Example: IBM RS/6000 S80 running WebSphere
- 50 GB RAM
- 24 64-bit PowerPC RS64 III Processors
- Thousands of servlets/beans
- GC must be parallel and/or concurrent
3 The Problem with Parallel GC
- Limited experience
- Logic programming, functional languages
- Applicability to Java?
- Unknown relative performance of techniques
- Scaling properties not known
- Processor scaling
- some studies
- Memory scaling
- zilch
4 Compare Parallel GCs
- Implement a wide variety of techniques
- Gain first-hand experience
- Both general and Java-specific issues
- Short term
- Find best GC and use as default
- Long term
- Understand how to target GCs to apps/desiderata
- Select automatically or even adaptively
5 Outline
- Motivation
- Overview of Collectors
- Memory Organization
- Phases of Collection
- Performance overview: 6 benchmarks
- Performance details: SPECjbb
- Conclusions
6 Jalapeño Features
- Entirely written in Java
- M x N threading
- Java threads multiplexed onto virtual processors
- Processor-local allocation
- Small requests handled from VP-local blocks
- Safe points
- Context switch at compiler-controlled points
- Type accuracy
- Exact GC
- Inlined allocation
- Just another inlined Java method
7 Modular Collector Design
- Multiple compilers (base, quick, opt)
- Multiple GCs
- Shared code
- Parallel collection phases and synchronization
- Stack/object scanning
- Large object management
- GC must be selected at build time
- Bootstrapping issues
8 Garbage Collectors Implemented
- Copying semi-space (CP)
- Non-compacting mark-sweep (MS)
- Generational
- Copying (CG)
- Mark-sweep (MSG)
- Hybrid copying young, mark-sweep old (H)
- Concurrent reference counting/cycle collecting
- Described in PLDI01 and ECOOP01 papers
10 Memory Layout
11 Large Object Space
- Diagram regions: Boot Image, Small Object Space, Large Object Space
- Uses first-fit strategy with 4KB pages
- For objects larger than 2KB
- Objects never move
- Large object code shared by all collectors
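The first-fit policy on this slide can be illustrated with a minimal sketch. Only the 4 KB page size and the 2 KB threshold come from the slide; the class name `LargeObjectSpaceSketch`, the bitmap representation, and the method names are assumptions made for illustration and are not taken from the Jalapeño sources.

```java
import java.util.BitSet;

/** Hypothetical first-fit page allocator; structure and names are illustrative only. */
final class LargeObjectSpaceSketch {
    static final int PAGE_SIZE = 4096;        // 4 KB pages, as stated on the slide
    static final int LARGE_THRESHOLD = 2048;  // objects larger than 2 KB are routed here

    private final BitSet used;                // one bit per page: set means in use
    private final int pageCount;

    LargeObjectSpaceSketch(int pageCount) {
        this.pageCount = pageCount;
        this.used = new BitSet(pageCount);
    }

    /** Returns the first page of the first free run that fits, or -1 if none does. */
    int allocate(int sizeInBytes) {
        int needed = (sizeInBytes + PAGE_SIZE - 1) / PAGE_SIZE;  // round up to whole pages
        int start = used.nextClearBit(0);
        while (start + needed <= pageCount) {
            int nextUsed = used.nextSetBit(start);
            if (nextUsed < 0 || nextUsed - start >= needed) {
                used.set(start, start + needed);  // claim the run; the object never moves
                return start;
            }
            start = used.nextClearBit(nextUsed);  // skip past the occupied pages
        }
        return -1;                                // caller would trigger a collection
    }

    /** Releases the pages of an object found dead by the collector. */
    void free(int firstPage, int sizeInBytes) {
        int pages = (sizeInBytes + PAGE_SIZE - 1) / PAGE_SIZE;
        used.clear(firstPage, firstPage + pages);
    }
}
```

Because objects in this space never move, freeing a run simply clears its bits, and a later request reuses the lowest-addressed gap that is large enough.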
12 Semi-space Heap Layout
- Diagram regions: Boot Image, Semispace 1, Semispace 2, Large Object Space
- Allocated objects shown within the heap
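How a semi-space collector uses the two halves can be sketched with a single-threaded Cheney-style copy. This is a generic textbook illustration, not the parallel collector evaluated in the talk: the `Obj` class, the `forward` field, and the use of a list as to-space are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-threaded semi-space copy; not the Jalapeño implementation. */
final class SemiSpaceSketch {
    static final class Obj {
        final Obj[] fields;
        Obj forward;                  // forwarding pointer, set once the object is copied
        Obj(int fieldCount) { fields = new Obj[fieldCount]; }
    }

    /** Copies everything reachable from roots and returns to-space in copy order. */
    static List<Obj> collect(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(roots[i], toSpace);           // roots now point into to-space
        }
        // Cheney scan: to-space doubles as the work queue.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj obj = toSpace.get(scan);
            for (int f = 0; f < obj.fields.length; f++) {
                obj.fields[f] = copy(obj.fields[f], toSpace);
            }
        }
        return toSpace;
    }

    /** Copies a from-space object on first visit; later visits reuse the forwarding pointer. */
    private static Obj copy(Obj from, List<Obj> toSpace) {
        if (from == null) return null;
        if (from.forward == null) {
            Obj to = new Obj(from.fields.length);
            System.arraycopy(from.fields, 0, to.fields, 0, from.fields.length);
            from.forward = to;
            toSpace.add(to);
        }
        return from.forward;
    }
}
```

After `collect` returns, the old semispace holds only garbage and stale copies, so the roles of the two semispaces can be swapped. A parallel version would need to install the forwarding pointer atomically.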
13 Generational Semi-space Layout
- Diagram regions: Boot Image, Semispace 1, Semispace 2, Nursery, Large Object Space
- Allocated objects shown within the heap
14 Mark-sweep Heap Layout
- Diagram regions: Boot Image, Large Object Space
15 Mark-sweep Heap Layout
- Diagram regions: Boot Image, Large Object Space, Mark Arrays
16 Hybrid Heap Layout
- Diagram regions: Boot Image, Nursery, Mature Space, Large Object Space
17 Allocation
- Inlined automatically
- all Java code
- Properties analyzed at compile-time
- Object size
- Finalizable
- Allocation cost
- 10 PowerPC instructions for copying
- 17 PowerPC instructions for mark-sweep
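The gap between the two instruction counts is easier to see with a sketch of the two fast paths. It assumes, as is common but not stated on the slide, that the copying collectors bump a pointer through a processor-local block while the mark-sweep collectors pop a cell from a size-segregated free list. The size classes, the 8-byte alignment, and all names below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical allocation fast paths; addresses are plain ints for illustration. */
final class AllocationSketch {
    /** Bump-pointer allocation inside one local block. */
    static final class BumpAllocator {
        private int cursor;
        private final int limit;

        BumpAllocator(int start, int limit) { this.cursor = start; this.limit = limit; }

        /** Returns the new object's address, or -1 when the block is exhausted. */
        int allocate(int size) {
            int aligned = (size + 7) & ~7;            // assumed 8-byte alignment
            if (cursor + aligned > limit) return -1;  // slow path: new block or GC
            int result = cursor;
            cursor += aligned;
            return result;
        }
    }

    /** Free-list allocation with one list per (assumed) size class. */
    static final class FreeListAllocator {
        private static final int[] SIZE_CLASSES = {16, 32, 64, 128};
        private final List<ArrayDeque<Integer>> freeLists = new ArrayList<>();

        FreeListAllocator(int base, int cellsPerClass) {
            int address = base;
            for (int cellSize : SIZE_CLASSES) {
                ArrayDeque<Integer> cells = new ArrayDeque<>();
                for (int i = 0; i < cellsPerClass; i++) {
                    cells.add(address);
                    address += cellSize;
                }
                freeLists.add(cells);
            }
        }

        /** Picks the smallest class that fits, then pops its first free cell. */
        int allocate(int size) {
            for (int i = 0; i < SIZE_CLASSES.length; i++) {
                if (size <= SIZE_CLASSES[i]) {
                    Integer cell = freeLists.get(i).poll();
                    return cell == null ? -1 : cell;  // -1: class exhausted, slow path
                }
            }
            return -1;                                // too large for any size class
        }
    }
}
```

The bump path is an add, a compare, and a store; the free-list path also has to select a size class and unlink a cell, which is consistent with the mark-sweep path being the longer one.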
19 General Properties of Collectors
- Parallel
- Stop-the-world
- Phases separated by barrier synchronization
- Work buffers for scanning
- Load balancing
- Since the VM is written in Java, the copying collector
- Must move stack and register-save objects
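The combination of work buffers and load balancing can be sketched as a parallel mark loop. This is an illustrative design, not the talk's implementation: each worker keeps a small local buffer and spills overflow to a shared queue that idle workers drain. The buffer limit, the `pending` counter used to detect termination, and all class names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical parallel marker with local work buffers and a shared overflow queue. */
final class ParallelMarkSketch {
    static final class Node {
        final AtomicBoolean marked = new AtomicBoolean(false);
        final List<Node> children = new ArrayList<>();
    }

    private static final int LOCAL_LIMIT = 32;  // assumed local buffer capacity

    private final ConcurrentLinkedQueue<Node> shared = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();  // marked but not yet scanned

    /** Marks everything reachable from roots using the given number of worker threads. */
    void markFrom(List<Node> roots, int workers) {
        for (Node root : roots) {
            if (root.marked.compareAndSet(false, true)) {
                pending.incrementAndGet();
                shared.add(root);
            }
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(this::drain);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();                        // end of the mark phase
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during marking", e);
            }
        }
    }

    private void drain() {
        ArrayDeque<Node> local = new ArrayDeque<>();
        while (true) {
            Node node = local.poll();
            if (node == null) node = shared.poll();   // take work spilled by other workers
            if (node == null) {
                if (pending.get() == 0) return;       // nothing queued, nothing in flight
                Thread.yield();
                continue;
            }
            for (Node child : node.children) {
                if (child.marked.compareAndSet(false, true)) {  // exactly one worker wins
                    pending.incrementAndGet();
                    if (local.size() < LOCAL_LIMIT) local.push(child);
                    else shared.add(child);           // spill overflow for load balancing
                }
            }
            pending.decrementAndGet();                // this node is fully scanned
        }
    }
}
```

The counter is incremented before a node becomes visible to other workers and decremented only after its children are queued, so a worker that sees zero with both queues empty can safely stop.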
20 Alloc Fails
- Timeline diagram: threads A-F on virtual processors VP1-VP3
- Mutator activity, then garbage collection once an allocation fails
- GC phases: Init., Find Roots, Mark/Copy All Reachable Objects, Find Final., Scan Final., End GC
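The phase sequence in the diagram, together with the barrier synchronization mentioned on the previous slide, can be sketched with `java.util.concurrent.CyclicBarrier`. The phase names follow the diagram labels; the logging stand-in for real work and the class name are assumptions.

```java
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

/** Hypothetical phase driver: every collector thread finishes a phase before any starts the next. */
final class PhasedCollectionSketch {
    static final String[] PHASES = {
        "Init", "Find Roots", "Mark/Copy", "Find Final.", "Scan Final.", "End GC"
    };

    /** Runs all phases on the given number of threads and returns the event log. */
    static List<String> collect(int collectorThreads) {
        List<String> log = new CopyOnWriteArrayList<>();
        CyclicBarrier barrier = new CyclicBarrier(collectorThreads);
        Thread[] threads = new Thread[collectorThreads];
        for (int id = 0; id < collectorThreads; id++) {
            threads[id] = new Thread(() -> {
                try {
                    for (String phase : PHASES) {
                        log.add(phase);   // stand-in for this thread's share of the phase
                        barrier.await();  // wait until every thread has finished the phase
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[id].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during collection", e);
            }
        }
        return log;
    }
}
```

In the returned log, all entries for one phase precede every entry for the next, which is the ordering guarantee the barriers exist to provide.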
22 Performance Factors and Winner(s)
- Object allocation: CP, H, CG
- Copying: MS, H, CG
- Frequency of collection: MS, H
- Space loss
- Semispace (mature, nursery): MS, H
- Fragmentation: CP
- Write barriers: CP, MS
- Locality: CP
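The write-barrier row favors CP and MS, presumably because non-generational collectors need no barrier at all. A minimal sketch of the cost the generational collectors pay on each pointer store follows. It assumes a simple remembered set of old objects that point into the nursery; the slides do not say which barrier Jalapeño uses, and all names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical generational write barrier; not taken from the Jalapeño sources. */
final class WriteBarrierSketch {
    static final class Obj {
        final boolean inNursery;
        Obj field;
        Obj(boolean inNursery) { this.inNursery = inNursery; }
    }

    // Old objects that may hold pointers into the nursery (extra roots for a minor GC).
    private final Set<Obj> rememberedSet = new LinkedHashSet<>();

    /** Every pointer store goes through this check before the actual write. */
    void writeField(Obj holder, Obj value) {
        if (!holder.inNursery && value != null && value.inNursery) {
            rememberedSet.add(holder);  // record the old-to-young pointer
        }
        holder.field = value;
    }

    Set<Obj> rememberedSet() { return rememberedSet; }
}
```

A minor collection can then treat the remembered set as additional roots instead of scanning the whole mature space.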
23 Performance: Fastest
24 Performance: Smallest
26 SPECjbb 400 MB: Processor Scaling
27 SPECjbb 8 CPUs: Heap Scaling
28 SPECjbb: Total Mature GC Time
29 SPECjbb: Total GC Time
30 SPECjbb: Mutator-to-GC Switch
31 SPECjbb: Major Collection Time
32 SPECjbb: Minor Collection Time
34 Conclusions
- JVM in Java wins
- M x N threading allows fast GC transition
- Exact GC
- When memory is abundant
- Winner is not obvious
- When memory is constrained
- Mark-sweep or hybrid always wins
35 How to choose GC dynamically?
- Memory headroom
- Working set size
- Allocation rate
36 Current Status
- Mature PowerPC compiler
- Optimizer
- Feedback-directed optimization
- Six garbage collectors
- Linux/Intel port underway
- Baseline compiler due 6/01, opt compiler due 12/01
- Other unofficial ports
- Win32
- Linux/Macintosh
37 Availability
- University license available. Licensees:
- Michigan State U. Kent
- Purdue U., U. Massachusetts
- Rutgers U., U. New Mexico
- U. Colorado, U. Wisconsin
- U. Illinois
- More information
- dfb_at_watson.ibm.com
- http://www.research.ibm.com/jalapeno