Parallel Garbage Collection


1
Parallel Garbage Collection
  • Timmie Smith
  • CPSC 689
  • Spring 2002

2
Outline
  • Sequential Garbage Collection Methods
  • Multi-threaded Methods
  • Parallel Methods for Shared Memory
  • Parallel Methods for Distributed Memory

3
Motivation
  • Good software design requires it
  • Modular programming, OO even more so, mandates
    components be independent
  • Explicit memory management requires modules to
    know what others are doing so they can deallocate
    objects safely.
  • Introduces bookkeeping that makes modules
    brittle, hard to reuse, and hard to extend
  • Garbage collection allows modules to not worry
    about memory management
  • Modules don't have to have bookkeeping code
  • Reusability and extensibility are improved
    immediately
  • Memory leaks are avoided

4
Sequential Garbage Collection
  • Basic Collection Techniques
  • Reference Counting
  • Mark-Sweep
  • Mark-Compact
  • Copying
  • Non-Copying Implicit Collection
  • Incremental Tracing Techniques
  • Generational Techniques

5
Garbage Collection Abstraction
  • An object is not garbage if it is live, or is
    reachable from any live object.
  • A two-phase abstraction is used: garbage detection
    followed by collection.
  • Detection determines which objects are live.
  • Root set: all global objects, local objects, and
    objects on the stack
  • Iteratively add objects reachable from the root
    set to the live set until nothing new is added
  • Collection frees any object that is not live.
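The detection phase above can be sketched as a worklist traversal. This is a minimal illustration, not the presenter's code: the heap is modeled as a dictionary from object names to the names they reference, and `find_live` and the sample heap are hypothetical.

```python
from collections import deque

def find_live(roots, refs):
    """Return the set of objects reachable from the root set.

    `refs` maps each object name to the names it points to (assumed model).
    """
    live = set(roots)
    work = deque(roots)
    while work:
        obj = work.popleft()
        for child in refs.get(obj, ()):
            if child not in live:      # newly discovered live object
                live.add(child)
                work.append(child)
    return live

# "d" and "e" reference each other but are unreachable from the root "a".
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
live = find_live(["a"], heap)
garbage = set(heap) - live   # the collection phase would free these
```

The loop stops exactly when a pass adds nothing new, which matches the "until nothing is added" condition on the slide.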

6
Reference Counting
  • Object headers store number of references to
    object
  • Object collected as soon as there are no
    references to it
  • Operations to update count make technique
    expensive
  • Reference cycles between objects limit
    effectiveness
  • Method can be incremental to limit program pauses
  • Overhead of method is proportional to work done
    by program
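A small sketch may help show both the immediate reclamation and the cycle problem. The `RCHeap` class and its method names are hypothetical, and counts are kept in a side table rather than in real object headers.

```python
class RCHeap:
    """Toy reference-counted heap (illustrative only)."""

    def __init__(self):
        self.count = {}    # object name -> number of references to it
        self.fields = {}   # object name -> names it points to
        self.freed = []    # order in which objects were reclaimed

    def new(self, name):
        self.count[name] = 0
        self.fields[name] = []

    def add_ref(self, target):
        self.count[target] += 1

    def store(self, src, dst):
        """src gains a pointer to dst."""
        self.fields[src].append(dst)
        self.add_ref(dst)

    def drop_ref(self, target):
        self.count[target] -= 1
        if self.count[target] == 0:
            # Reclaim immediately and release everything it pointed to.
            for child in self.fields.pop(target):
                self.drop_ref(child)
            del self.count[target]
            self.freed.append(target)

h = RCHeap()
for name in ("x", "y", "p", "q"):
    h.new(name)
h.add_ref("x"); h.store("x", "y")                      # root -> x -> y
h.add_ref("p"); h.store("p", "q"); h.store("q", "p")   # root -> p <-> q
h.drop_ref("x")   # x and y are freed at once
h.drop_ref("p")   # p and q keep each other alive: the cycle leaks
```

Every pointer store and drop touches a count, which is the per-operation overhead the slide refers to.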

7
Mark-Sweep Collectors
  • Traces from the root set and marks all live
    objects, then sweeps heap to collect unmarked
    objects
  • Collected objects linked to free lists used by
    allocator
  • Disadvantages include fragmentation, cost of
    collection, and decrease of locality
  • Fragmentation caused by objects not being
    compacted
  • Cost of collection is proportional to size of the
    heap
  • Spatial locality lost as objects allocated among
    older objects
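The two phases can be sketched as follows, assuming a heap modeled as a list of fixed-size slots holding object names. The functions `mark` and `sweep` and the sample data are hypothetical.

```python
def mark(roots, refs):
    """Trace from the roots and return the set of marked (live) objects."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(refs[obj])
    return marked

def sweep(heap_slots, marked):
    """Visit every slot in the heap; unmarked objects go on the free list."""
    free_list = []
    for addr, obj in enumerate(heap_slots):
        if obj is not None and obj not in marked:
            heap_slots[addr] = None
            free_list.append(addr)
    return free_list

heap_slots = ["a", "x", "b", "y", "c"]
refs = {"a": ["b"], "b": ["c"], "c": [], "x": ["y"], "y": []}
marked = mark(["a"], refs)
free_list = sweep(heap_slots, marked)
```

The sweep loop runs over every slot regardless of how many objects are live, and the freed slots end up scattered between survivors, which illustrates the heap-proportional cost and the fragmentation listed above.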

8
Mark-Compact Collectors
  • Sweep phase of Mark-Sweep modified
  • Collected objects not linked to free list
  • Marked objects copied into contiguous memory
  • Pointer to end of contiguous space maintained for
    new allocation
  • Overhead of Sweep not improved
  • Entire heap still swept to find unreachable
    objects
  • Live objects must be swept several times
  • First pass relocates objects
  • Additional passes required to update pointers
  • Mechanisms to handle pointers also add overhead
  • Lookup table kept while objects being relocated
  • Indirection of forwarding pointers used if
    program not stopped
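A sliding compaction with a relocation lookup table might look like the sketch below. It assumes the program is stopped, that objects occupy one slot each, and that pointers are slot indices; `compact` and the heap layout are hypothetical.

```python
def compact(heap, marked):
    """Slide marked objects to the bottom of the heap.

    Returns the new free pointer (first slot available for allocation).
    """
    # Pass 1: build the lookup table, old address -> new address.
    new_addr = {}
    free = 0
    for addr in range(len(heap)):
        if addr in marked:
            new_addr[addr] = free
            free += 1
    # Pass 2: update the pointers held by live objects.
    for addr in marked:
        heap[addr]["ptrs"] = [new_addr[p] for p in heap[addr]["ptrs"]]
    # Pass 3: move objects in address order so no live object is overwritten.
    for addr in sorted(marked):
        heap[new_addr[addr]] = heap[addr]
    for addr in range(free, len(heap)):
        heap[addr] = None
    return free

heap = [
    {"name": "a", "ptrs": [2]},
    {"name": "x", "ptrs": []},
    {"name": "b", "ptrs": [4]},
    {"name": "y", "ptrs": [1]},
    {"name": "c", "ptrs": []},
]
free = compact(heap, marked={0, 2, 4})
names = [obj["name"] if obj else None for obj in heap]
```

Live objects are visited once per pass, which is the repeated sweeping the slide counts as overhead.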

9
Copying Collectors
  • Heap is split into from-space and to-space
  • Collection triggered when object cannot be
    allocated in the current space
  • Program stopped to avoid pointer inconsistencies
  • Forwarding pointers used to handle objects
    referenced multiple times
  • Work proportional to number of live objects
  • Collection frequency decreased by increasing size
    of memory spaces
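A Cheney-style copy is one common way to realize this scheme; the slides do not name a specific algorithm, so treat the following as an assumed illustration. The forwarding table `forward` stands in for forwarding pointers written into from-space objects, and `copy_collect` is a hypothetical name.

```python
def copy_collect(from_space, roots):
    """Copy live objects into a fresh to-space; return (to_space, new_roots)."""
    to_space = []
    forward = {}   # from-space address -> to-space address

    def copy(addr):
        if addr not in forward:   # first visit: copy and record forwarding
            forward[addr] = len(to_space)
            old = from_space[addr]
            to_space.append({"name": old["name"], "ptrs": list(old["ptrs"])})
        return forward[addr]      # later visits reuse the forwarded address

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):   # scan pointer chases the allocation pointer
        obj = to_space[scan]
        obj["ptrs"] = [copy(p) for p in obj["ptrs"]]
        scan += 1
    return to_space, new_roots

from_space = [
    {"name": "d", "ptrs": [1]},      # garbage: nothing live points to it
    {"name": "a", "ptrs": [3, 2]},
    {"name": "c", "ptrs": []},       # referenced by both a and b
    {"name": "b", "ptrs": [2]},
]
to_space, new_roots = copy_collect(from_space, roots=[1])
names = [obj["name"] for obj in to_space]
```

The garbage object is never touched, so the work done depends only on the live objects.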

10
Non-copying Collectors
  • Spaces of copying collector treated as a set
  • Tracing moves live objects to second set
  • After tracing, objects left in the first set are
    garbage
  • Sets are implemented as linked lists
  • Subject to same locality and fragmentation issues
    as Mark-Sweep collectors

11
Incremental Tracing Collectors
  • Collection interleaved with program execution
  • No stop-the-world pause in program execution.
  • Program can change reachability of objects while
    collector is running.
  • Program is referred to as the mutator.
  • Collector must be conservative to be correct
  • Restarting to collect all garbage caused by
    changes doesn't help.
  • Some garbage is left floating until the next
    collection

12
Tri-color marking system
  • Object traversal status kept by object coloring
  • Simple mark-sweep or copying need only two colors
    because collection occurs when mutator paused.
  • Incremental approaches require third color to
    handle changes in reachability.
  • Black object is live and all children have been
    traversed
  • Grey object is live, children have not been
    traversed
  • White object not yet reached
  • Mutator must coordinate with collector if a
    pointer to a white object is added to a black
    object.
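The color transitions can be sketched with a plain worklist, here without a running mutator. `tricolor_mark` and the sample graph are hypothetical; the inline assertion checks the invariant that no black object points to a white one.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

def tricolor_mark(roots, refs):
    """Mark from the roots, returning the final color of every object."""
    color = {obj: WHITE for obj in refs}
    grey = []
    for r in roots:
        color[r] = GREY
        grey.append(r)
    while grey:
        obj = grey.pop()
        for child in refs[obj]:
            if color[child] == WHITE:   # reached for the first time
                color[child] = GREY
                grey.append(child)
        color[obj] = BLACK              # all children have been visited
        # Invariant: no black object points to a white object.
        assert all(color[c] != WHITE
                   for o in refs if color[o] == BLACK
                   for c in refs[o])
    return color

refs = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": []}
color = tricolor_mark(["a"], refs)
```

Objects still white when the grey list empties ("e" here) are the ones a collector would reclaim.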

13
Tri-color Marking Example
[Figure: two diagrams of objects A, B, C, and D]
  • Mutator modifies A and B while garbage collector
    examines B's descendants
  • Mutator must coordinate with garbage collector to
    prevent D being collected.

14
Mutator/Collector Coordination
  • Coordination must update collector when a pointer
    is overwritten.
  • Read barrier: detects when the mutator accesses a
    pointer to a white object and immediately colors
    the object grey.
  • Write barrier: mutator attempts to write a
    pointer into an object are trapped.
  • Two different write barrier approaches

15
Write Barrier Approaches
  • Snapshot-at-the-Beginning
  • Ensures a pointer to an object is not destroyed
    before the collector traverses it.
  • Pointers are saved before they are overwritten.
  • Incremental Update
  • When a pointer is written into a black object,
    the object is changed to gray and is rescanned
    before collection is completed.
  • No extra bookkeeping structure needed.
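The two barriers can be compared on a small scenario in which the mutator hides an object from the collector mid-trace. This is a sketch under assumptions: `IncrementalMarker`, its `barrier` argument, and the three-object graph are all hypothetical, and pointer fields are modeled as list slots.

```python
class IncrementalMarker:
    def __init__(self, refs, roots, barrier):
        self.refs = {k: list(v) for k, v in refs.items()}
        self.color = {k: "white" for k in refs}
        self.grey = []
        self.barrier = barrier   # "none", "snapshot", or "incremental"
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        if obj is not None and self.color[obj] == "white":
            self.color[obj] = "grey"
            self.grey.append(obj)

    def step(self):
        """Collector scans one grey object and blackens it."""
        obj = self.grey.pop()
        for child in self.refs[obj]:
            self.shade(child)
        self.color[obj] = "black"

    def write(self, src, index, new_target):
        """Mutator stores new_target into field `index` of src."""
        if self.barrier == "snapshot":
            self.shade(self.refs[src][index])   # save the overwritten pointer
        self.refs[src][index] = new_target
        if self.barrier == "incremental" and self.color[src] == "black":
            self.color[src] = "grey"            # black object must be rescanned
            self.grey.append(src)

    def finish(self):
        while self.grey:
            self.step()
        return {o for o, c in self.color.items() if c == "white"}

def run(barrier):
    m = IncrementalMarker({"A": ["B", None], "B": ["D"], "D": []}, ["A"], barrier)
    m.step()                # collector blackens A; B is grey, D still white
    m.write("A", 1, "D")    # mutator stores a pointer to D into black A
    m.write("B", 0, None)   # and erases the only path the collector would follow
    return m.finish()       # set of objects that would be reclaimed

lost_without_barrier = run("none")
```

Without a barrier the live object D is reclaimed; either barrier keeps it, and the incremental-update variant reuses the grey list rather than a separate structure.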

16
Generational Collectors
  • Based on empirical evidence that most objects are
    short lived.
  • Heap space split into generational spaces
  • Younger generation spaces are smaller
  • Spaces collected when allocation in the space
    fails
  • Live objects found during collection of a
    generation advanced to older generation
  • Long-lived objects copied fewer times than in
    copying collector
  • Heuristics used to determine when to advance
    objects to next generation

17
Intergenerational References
  • Method must be able to collect one generation
    without collecting others
  • Pointers from older generations to younger
    generation.
  • Table to store pointers in older objects used in
    root set
  • Write barrier technique used in incremental
    collectors
  • Pointers from young generations into older
    generations
  • Write barrier technique to trap all pointer
    assignments
  • Use live objects in all younger generations in
    root set
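The table of old-to-young pointers is often called a remembered set. The sketch below assumes two generations and promotes every survivor after one collection; both choices, along with `GenHeap` and its methods, are illustrative assumptions rather than the presenter's design.

```python
class GenHeap:
    def __init__(self):
        self.young = {}           # name -> names it points to
        self.old = {}
        self.remembered = set()   # old objects holding pointers into young

    def write(self, src, dst):
        """Write barrier: record old-to-young pointer stores."""
        space = self.old if src in self.old else self.young
        space[src].append(dst)
        if src in self.old and dst in self.young:
            self.remembered.add(src)

    def minor_collect(self, roots):
        """Collect only the young generation; survivors are promoted."""
        stack = [r for r in roots if r in self.young]
        for holder in self.remembered:   # remembered entries act as extra roots
            stack.extend(t for t in self.old[holder] if t in self.young)
        live = set()
        while stack:
            obj = stack.pop()
            if obj in self.young and obj not in live:
                live.add(obj)
                stack.extend(self.young[obj])
        for obj in live:
            self.old[obj] = self.young[obj]
        self.young = {}                  # everything left behind is garbage
        self.remembered.clear()          # no young objects remain
        return live

h = GenHeap()
h.old = {"cache": []}
h.young = {"n1": ["n2"], "n2": [], "tmp": [], "loc": []}
h.write("cache", "n1")                   # old object now points at a young one
survivors = h.minor_collect(roots=["loc"])
```

The old generation is never traced during the minor collection; only the remembered entries are consulted.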

18
Multi-threaded Methods
  • Attempt to reduce pauses caused by stopping the
    world [2]
  • Garbage collector is a separate thread that is
    run concurrently with the application.
  • Coordination with application is minimized
  • Sweep proceeds while application running
  • Application marks pages when object modified
  • Dirty pages rescanned before collection

19
Parallel Garbage Collection
  • Parallelization of sequential methods
  • Mark-and-Sweep
  • Reference Counting
  • Different issues in each environment
  • Shared variable access in shared memory systems
  • Disjoint address spaces in distributed memory
    systems
  • Scheduling in both environments involves stopping
    application threads during tracing.
  • Long pauses avoided by incremental collection
  • Improves performance in SPMD programs since
    application has frequent global synchronizations.

20
Shared Memory
  • Reference Counting
  • References to object updated by all processors
  • Locks on object headers limit scalability
  • Mark-Sweep
  • Each processor begins marking from a local root
    set, and atomically marks an object
  • Poor scalability unless some mechanism for load
    balancing implemented
  • Processor must mark all descendants of an object
    it marks
  • Work stealing allows load rebalancing and
    improved results
  • Splitting large objects also allows for better
    load balance.
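Parallel marking with work stealing can be sketched with Python threads standing in for processors. Everything here is an assumed illustration: `parallel_mark`, the lock-protected mark set that plays the role of an atomic mark bit, and the pending counter used to detect termination.

```python
import threading
from collections import deque

def parallel_mark(refs, local_roots):
    n = len(local_roots)
    marked = set()
    mark_lock = threading.Lock()
    stacks = [deque() for _ in range(n)]
    stack_locks = [threading.Lock() for _ in range(n)]
    pending = [0]                    # objects pushed but not yet fully scanned
    pending_lock = threading.Lock()

    def try_mark(obj):
        with mark_lock:              # stands in for an atomic test-and-set
            if obj in marked:
                return False
            marked.add(obj)
            return True

    def push(i, obj):
        with pending_lock:
            pending[0] += 1
        with stack_locks[i]:
            stacks[i].append(obj)

    def pop_or_steal(i):
        with stack_locks[i]:
            if stacks[i]:
                return stacks[i].pop()
        for j in range(n):           # own stack empty: steal from another
            if j != i:
                with stack_locks[j]:
                    if stacks[j]:
                        return stacks[j].popleft()
        return None

    def worker(i):
        while True:
            obj = pop_or_steal(i)
            if obj is None:
                with pending_lock:
                    if pending[0] == 0:
                        return       # nothing queued or in progress anywhere
                continue
            for child in refs[obj]:
                if try_mark(child):
                    push(i, child)
            with pending_lock:
                pending[0] -= 1

    for i, roots in enumerate(local_roots):
        for r in roots:
            if try_mark(r):
                push(i, r)
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked

# Binary tree of 63 objects plus two unreachable ones; only worker 0 has roots.
refs = {i: [2 * i + 1, 2 * i + 2] for i in range(31)}
refs.update({i: [] for i in range(31, 63)})
refs.update({100: [101], 101: []})
marked = parallel_mark(refs, local_roots=[[0], [], []])
```

Workers 1 and 2 start with empty root sets, so any marking they do comes from stolen work. A single global lock on the mark set would itself limit scalability on real hardware; it is used here only to keep the sketch short.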

21
Distributed Memory
  • Biggest challenge is representing cross-processor
    references.
  • Remote processor: a stub entry is pointed to by
    the pointer
  • Processor id of the object owner
  • Complement of the remote object address
  • Local processor: an entry table maintains all
    references
  • First export of an object reference enters object
    in table
  • Object is never reclaimed without cooperation of
    processors
  • Fields of stub and entry table objects are the
    same
  • Flag distinguishes type of object
  • Count: the number of unreceived messages
    referencing the object.

22
Distributed Memory
  • Marking Phase
  • Processors begin with local root set and mark all
    local objects
  • When local marking is complete, mark messages
    are sent to remote processors for each marked
    stub
  • Remote processor receives message and adds object
    to mark stack and continues local marking.
  • When local marking complete and no more messages
    are received, remote processor acknowledges
    messages sent.
  • Marking complete when acknowledgement for first
    message sent is received.
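The message-driven marking phase can be simulated sequentially. In this sketch a stub is modeled as an `(owner, name)` tuple, and termination is detected simply by observing that no mark messages remain in flight; that simplification replaces the acknowledgement scheme described above, and `distributed_mark` is a hypothetical name.

```python
def distributed_mark(heaps, roots):
    """heaps[p] maps a local object name to its references.

    A reference is a local name (str) or a stub (owner, name) tuple.
    Returns the per-processor marked sets and the number of mark messages.
    """
    n = len(heaps)
    marked = [set() for _ in range(n)]
    messages = 0

    def mark_local(p, start):
        """Mark locally from `start`; return the stubs that were reached."""
        stack = list(start)
        stubs = []
        while stack:
            obj = stack.pop()
            if obj in marked[p]:
                continue
            marked[p].add(obj)
            for ref in heaps[p][obj]:
                if isinstance(ref, tuple):
                    stubs.append(ref)    # marked stub: send a mark message
                else:
                    stack.append(ref)
        return stubs

    outgoing = [mark_local(p, roots[p]) for p in range(n)]
    while any(outgoing):                 # some mark message still in flight
        inbox = [[] for _ in range(n)]
        for stubs in outgoing:
            for owner, name in stubs:
                inbox[owner].append(name)
                messages += 1
        # Each receiver adds the named objects to its mark stack and continues.
        outgoing = [mark_local(p, inbox[p]) for p in range(n)]
    return marked, messages

heaps = [
    {"a": ["b", (1, "x")], "b": [], "g": [(1, "z")]},
    {"x": ["y"], "y": [(0, "b")], "z": []},
]
marked, messages = distributed_mark(heaps, roots=[["a"], []])
```

Object "z" on processor 1 is referenced only from the unreachable "g" on processor 0, so neither is marked.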

23
Distributed Memory
  • Collection Phase
  • Expand the heap
  • Processors notified of largest local heap at end
    of each collection. |H| < cM, where c < 1 and M
    is the max heap size.
  • Local collection occurs when the heap cannot be
    expanded.
  • Global collection occurs when local collection is
    insufficient.
  • Global collection allows entry tables to be
    cleared.
  • Infrequent global collections minimize impact of
    collector on application performance.

24
Summary
  • Non-copying methods are the safest for languages
    where pointers are not identifiable
  • Fragmentation and loss of locality limit
    performance of these methods
  • Copying collectors are preferred in cases where
    memory is limited and pointers can be found
  • Parallel Garbage Collection can be based on
    parallelization of sequential methods.
  • Parallel collectors subject to same issues as
    their sequential counterparts
  • Parallel collectors also subject to
    synchronization and communication issues while
    maintaining references and performing collection.

25
References
  • [1] Hans Boehm and Mark Weiser. Garbage
    Collection in an Uncooperative Environment.
    Software: Practice and Experience. September
    1988.
  • [2] Hans-J. Boehm, Alan J. Demers, and Scott
    Shenker. Mostly Parallel Garbage Collection.
    Proceedings of the Conference on Programming
    Language Design and Implementation (PLDI). 1991.
  • [3] Hans-J. Boehm. Fast Multiprocessor Memory
    Allocation and Garbage Collection. External
    Technical Report HPL-2000-165, HP Labs. December
    2000.
  • [4] David L. Detlefs, Al Dosser, and Benjamin
    Zorn. Memory Allocation Costs in Large C and C++
    Programs. Technical Report CU-CS-665-93,
    University of Colorado - Boulder. 1993.
  • [5] John R. Ellis and David L. Detlefs. Safe,
    Efficient Garbage Collection for C++. Technical
    report, Xerox Palo Alto Research Center. June
    1993.
  • [6] Kenjiro Taura and Akinori Yonezawa. An
    Effective Garbage Collection Strategy for
    Parallel Programming Languages on Large Scale
    Distributed-Memory Machines. Proceedings of the
    Symposium on Principles and Practice of Parallel
    Programming (PPOPP). 1997.
  • [7] Paul R. Wilson. Uniprocessor Garbage
    Collection Techniques. Proceedings of the
    International Workshop on Memory Management
    (IWMM). 1992.
  • [8] Toshio Endo, Kenjiro Taura, and Akinori
    Yonezawa. A Scalable Mark-Sweep Garbage Collector
    on Large-Scale Shared-Memory Machines.
    Proceedings of High Performance Networking and
    Computing (SC97). November 1997.
  • [9] Hirotaka Yamamoto, Kenjiro Taura, and Akinori
    Yonezawa. Comparing Reference Counting and Global
    Mark-and-Sweep on Parallel Computers. Lecture
    Notes in Computer Science (LNCS), Languages,
    Compilers, and Run-time Systems (LCR98), volume
    1511, pp. 205-218. May 1998.