1
A Parallel, Real-Time Garbage Collector
  • Authors: Perry Cheng, Guy E. Blelloch
  • Presenter: Jun Tao

2
Outline
  • Introduction
  • Background and definitions
  • Theoretical algorithm
  • Extended algorithm
  • Evaluation
  • Conclusion

3
Introduction
  • First garbage collectors
  • Non-incremental, non-parallel
  • Recent collectors
  • Incremental
  • Concurrent
  • Parallel

4
Introduction
  • Scalably parallel and real-time collector
  • All aspects of the collector are incremental
  • Parallel
  • Arbitrary number of application and collector
    threads
  • Tight theoretical bounds on
  • Pause time for any application
  • Total memory usage
  • Asymptotically but not practically efficient

5
Introduction
  • Extended collector algorithm
  • Work with generations
  • Increase the granularity of the incremental steps
  • Separately handle global variables
  • Delay the copy on write
  • Reduce the synchronization cost of copying small
    objects
  • Parallelize the processing of large objects
  • Reduce double allocation during collection
  • Allow program stacks

6
Background and Definitions
  • A semispace stop-and-copy collector (sketched
    below)
  • Divide heap memory into two equally-sized halves
  • From-space and to-space
  • Suspend the mutator and copy reachable objects to
    to-space when from-space is full
  • Update root values and reverse the roles of
    from-space and to-space
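
A minimal sketch of such a semispace stop-and-copy collector, using Cheney's scan over to-space; the object layout, root-set interface and bump allocator are illustrative assumptions rather than the paper's code.

#include <stddef.h>
#include <string.h>

typedef struct Obj {
    struct Obj *forward;    /* points to the to-space copy once copied */
    size_t      nfields;    /* number of pointer fields */
    struct Obj *fields[];   /* pointer fields only, for simplicity */
} Obj;

static char *to_alloc;      /* bump pointer into to-space */

static size_t obj_size(const Obj *o) {
    return sizeof(Obj) + o->nfields * sizeof(Obj *);
}

/* Copy one object into to-space and leave a forwarding pointer behind. */
static Obj *copy(Obj *o) {
    if (o == NULL) return NULL;
    if (o->forward) return o->forward;     /* already copied */
    Obj *replica = (Obj *)to_alloc;
    to_alloc += obj_size(o);
    memcpy(replica, o, obj_size(o));       /* forward is still NULL here */
    o->forward = replica;
    return replica;
}

/* Stop the mutator, copy everything reachable from the roots into
   to-space; the caller then reverses the roles of the two spaces. */
void collect(Obj **roots, size_t nroots, char *to_space) {
    to_alloc = to_space;
    char *scan = to_alloc;                 /* gray objects live in [scan, to_alloc) */
    for (size_t i = 0; i < nroots; i++)
        roots[i] = copy(roots[i]);
    while (scan < to_alloc) {              /* Cheney scan: breadth-first traversal */
        Obj *o = (Obj *)scan;
        for (size_t i = 0; i < o->nfields; i++)
            o->fields[i] = copy(o->fields[i]);
        scan += obj_size(o);
    }
}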

7
Background and Definitions
  • Types of Garbage Collectors

8
Background and Definitions
  • Types of Garbage Collectors (continued)

9
Background and Definitions
  • Real-time Collector
  • Maximum pause time
  • Utilization
  • The fraction of time that the mutator executes
  • Minimum Mutator Utilization (MMU, sketched below)
  • A function of window size
  • Minimum utilization over all windows of that size
  • 0 when the window size is less than the maximum
    pause time
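
As a rough illustration (not from the paper), the sketch below computes the minimum mutator utilization for a window of w time slices from a hypothetical 0/1 trace of which slices the collector occupied; note that any single pause longer than w forces the result to 0.

#include <stddef.h>

/* paused[i] is 1 if time slice i was spent in the collector, else 0.
   Returns the minimum mutator utilization over all windows of size w. */
double mmu(const int *paused, size_t n, size_t w) {
    if (w == 0 || w > n) return 0.0;
    long in_gc = 0;
    for (size_t i = 0; i < w; i++) in_gc += paused[i];
    long worst = in_gc;
    for (size_t i = w; i < n; i++) {       /* slide the window across the trace */
        in_gc += paused[i] - paused[i - w];
        if (in_gc > worst) worst = in_gc;
    }
    return 1.0 - (double)worst / (double)w;
}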

10
Theoretical Algorithm
  • A parallel, incremental and concurrent collector
  • Based on Cheney's simple copying collector
  • All objects are stored in a shared global pool of
    memory
  • Two atomic instructions (sketched below)
  • FetchAndAdd
  • CompareAndSwap
  • Collector interfaces with the application
  • Allocating space for a new object
  • Initializing the fields of a new object
  • Modifying the field of an existing object
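
The theoretical algorithm relies on just these two atomic primitives; here is a sketch of their semantics using standard C11 atomics (the wrapper names are illustrative): FetchAndAdd returns the old value, CompareAndSwap reports whether the swap took place.

#include <stdatomic.h>
#include <stdbool.h>

/* FetchAndAdd: atomically add n to *loc and return the previous value. */
long fetch_and_add(atomic_long *loc, long n) {
    return atomic_fetch_add(loc, n);
}

/* CompareAndSwap: if *loc == expected, store desired and report success. */
bool compare_and_swap(atomic_long *loc, long expected, long desired) {
    return atomic_compare_exchange_strong(loc, &expected, desired);
}

FetchAndAdd is used, for example, to reserve allocation regions and shared-stack slots, while CompareAndSwap resolves races when two threads try to copy the same object (slides 12 and 13).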

11
Theoretical Algorithm
  • Scalable Parallelism
  • Maintain the set of gray objects
  • Cheney's technique
  • Keeping them in contiguous locations in to-space
  • Pros
  • Simple
  • Cons
  • Restricts the traversal order to breadth-first
  • Difficult to implement in a parallel setting

12
Theoretical Algorithm
  • Scalable Parallelism (continued)
  • Explicitly managed local stack
  • Each processor maintains a stack
  • A shared stack of gray objects
  • Periodically transfer gray objects between local
    and shared stack
  • Avoid idleness
  • Pushes (or pops) can proceed in parallel
  • Reserve a target region before transfer
  • Pushes and pops are not concurrent
  • Room synchronization (see the shared-stack sketch
    below)
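
A sketch of the local/shared gray-stack transfer, with assumed stack sizes and a transfer quantum of 32 objects: a FetchAndAdd on the shared top reserves a private region, so parallel pushes (or parallel pops) never interleave; the room discipline described on slide 20 keeps pushes and pops from running at the same time.

#include <stdatomic.h>
#include <stddef.h>

#define XFER 32                                  /* objects moved per transfer */

typedef struct { void *slot[1 << 20]; atomic_long top; } SharedStack;
typedef struct { void *slot[256]; size_t top; } LocalStack;

/* Push up to XFER gray objects; the FetchAndAdd reserves a private region,
   so parallel pushers write disjoint slots. Valid only in the push room. */
void push_batch(SharedStack *s, LocalStack *l) {
    long n = (long)(l->top < XFER ? l->top : XFER);
    long base = atomic_fetch_add(&s->top, n);    /* reserve [base, base + n) */
    for (long i = 0; i < n; i++)
        s->slot[base + i] = l->slot[--l->top];
}

/* Pop up to XFER gray objects; over-subtraction is repaired afterwards.
   Pops run only in the pop room, so no pushes are concurrent.
   Assumes the local stack has room for the fetched objects. */
long pop_batch(SharedStack *s, LocalStack *l) {
    long old = atomic_fetch_sub(&s->top, (long)XFER);
    long got = old > XFER ? XFER : (old > 0 ? old : 0);
    for (long i = 0; i < got; i++)
        l->slot[l->top++] = s->slot[old - 1 - i];
    atomic_fetch_add(&s->top, (long)XFER - got); /* give back what we did not take */
    return got;
}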

13
Theoretical Algorithm
  • Scalable Parallelism (continued)
  • Avoid white objects being copied twice
  • Exclusive access by atomic instructions
  • Copy-copy synchronization
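
A sketch of the copy-copy synchronization, under the assumption that every object carries a forwarding word: each collector thread reserves space for a replica and then races a CompareAndSwap on that word, so only one replica is ever installed and a white object is never copied twice. The layout and the reserve_tospace helper are hypothetical.

#include <stdatomic.h>
#include <string.h>
#include <stddef.h>

typedef struct Obj {
    _Atomic(struct Obj *) forward;   /* NULL until the object is copied */
    size_t size;                     /* total object size in bytes */
} Obj;

extern void *reserve_tospace(size_t size);       /* hypothetical bump allocator */

Obj *copy_object(Obj *o) {
    Obj *fwd = atomic_load(&o->forward);
    if (fwd) return fwd;                         /* someone already copied it */
    Obj *replica = reserve_tospace(o->size);
    memcpy(replica, o, o->size);                 /* fill the replica before publishing */
    atomic_store(&replica->forward, NULL);
    Obj *expected = NULL;
    if (atomic_compare_exchange_strong(&o->forward, &expected, replica))
        return replica;                          /* we won the race */
    return expected;                             /* lost: the CAS left the winner here;
                                                    our reservation is simply wasted */
}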

14
Theoretical Algorithm
  • Incremental and Replicating Collection
  • Baker's incremental collector
  • Copy k units of data for each unit of data
    allocated
  • Bound the pause time
  • Mutator can only see copied objects in to-space
  • A read barrier is needed
  • Modification to avoid the read barrier
  • Mutator can only see the original objects in
    from-space
  • A write barrier is needed
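
A sketch of the write barrier that the replicating scheme needs, assuming an illustrative object layout: the mutator reads and writes only the from-space original, and the barrier propagates the write to the replica if one already exists.

typedef struct Obj {
    struct Obj *forward;       /* replica in to-space, or NULL if not yet copied */
    long        field[8];      /* illustrative fixed layout */
} Obj;

void write_field(Obj *x, int i, long y) {
    x->field[i] = y;           /* mutator writes the from-space original */
    if (x->forward)            /* write barrier: keep the replica consistent */
        x->forward->field[i] = y;
}

The extended algorithm (slide 18) instead records the triple <x, i, y> in a write log and lets the collector apply the fix-up later.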

15
Theoretical Algorithm
  • Concurrency
  • Program and collector execute simultaneously
  • The program manipulates the primary memory graph
  • The collector manipulates the replica graph
  • A copy-write synchronization is needed
  • Replica objects must be updated correspondingly
  • Avoid race conditions
  • Mark objects being copied
  • The mutator's updates to the replica should be
    delayed
  • A write-write synchronization is needed
  • Prohibit different mutator threads from modifying
    the same memory location concurrently

16
Theoretical Algorithm
  • Space and Time Bounds
  • Time bound on each memory operation: c·k
  • c: a constant
  • k: the number of words collected per word
    allocated
  • Space bound: 2(R(1 + 1.5/k) + N + 5PD)
  • R: reachable space
  • N: maximum object count
  • P: number of processors (P-way multiprocessor)
  • D: maximum depth of the memory graph
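
For concreteness, a hypothetical instantiation of the space bound; all numbers below are illustrative assumptions, not measurements from the paper. With R = 10^7 words, k = 2, N = 10^6 objects, P = 8 processors and D = 100:

  2(R(1 + 1.5/k) + N + 5PD)
    = 2(10^7 × 1.75 + 10^6 + 5 × 8 × 100)
    = 2 × 18,504,000
    ≈ 3.7 × 10^7 words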

17
Extended Algorithm
  • Globals, Stacks and Stacklets
  • Globals
  • Updated when collection ends
  • Arbitrarily many -> unbounded time
  • Replicate globals like other heap objects
  • Every global has two locations
  • A single flag is used for all globals
  • Stacks and Stacklets
  • Divide stacks into fixed-size stacklets
  • At most one stacklet is active and the others can
    be replicated safely
  • Also bounds the wasted space per stack

18
Extended Algorithm
  • Granularity
  • Block Allocation and Free Initialization
  • Avoid calling FetchAndAdd for every memory
    allocation
  • Each processor maintains a local pool in
    from-space and a local pool in to-space while the
    collector is on
  • Use a single FetchAndAdd when allocating a local
    pool
  • Write Barrier
  • Avoid updating copied objects on every write
  • Record a triple <x, i, y> in a write log and
    defer the update
  • Invoke the collector when the write log is full
  • Eliminating frequent context switches
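
A sketch combining both granularity ideas, with assumed pool and log sizes: one FetchAndAdd claims an entire local pool, after which allocation is a synchronization-free bump, and writes are recorded as <x, i, y> triples until the log fills. The collector_flush_log entry point is hypothetical.

#include <stdatomic.h>
#include <stddef.h>

#define POOL_WORDS 4096
#define LOG_SIZE   1024

extern atomic_long heap_frontier;      /* shared allocation frontier (word index) */
extern long        heap[];             /* the current allocation space */

typedef struct {
    long  pool_base, pool_end;         /* this processor's local pool */
    struct { void *x; int i; long y; } write_log[LOG_SIZE];
    int   log_len;
} Proc;

/* Bump-allocate n words locally; only refilling the pool touches the
   shared frontier. Assumes n <= POOL_WORDS. */
long *alloc_words(Proc *p, long n) {
    if (p->pool_base + n > p->pool_end) {
        p->pool_base = atomic_fetch_add(&heap_frontier, POOL_WORDS);
        p->pool_end  = p->pool_base + POOL_WORDS;
    }
    long *obj = &heap[p->pool_base];
    p->pool_base += n;
    return obj;
}

/* Deferred write barrier: record <x, i, y>; when the log fills up,
   hand control to the collector to apply the updates to the replicas. */
extern void collector_flush_log(Proc *p);   /* hypothetical; also resets log_len */

void log_write(Proc *p, void *x, int i, long y) {
    p->write_log[p->log_len].x = x;
    p->write_log[p->log_len].i = i;
    p->write_log[p->log_len].y = y;
    if (++p->log_len == LOG_SIZE)
        collector_flush_log(p);
}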

19
Extended Algorithm
  • Small and Large Objects
  • Original Algorithm
  • One field at a time
  • Reinterpretation of the tag word
  • Transferring the object from and to the local
    stack
  • Extended Algorithm
  • Small objects
  • Locked down and copied all at once
  • Large objects
  • Divided into segments
  • One segment at a time
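
A sketch of segment-at-a-time copying for large objects, with an assumed segment size and descriptor: collector threads claim segment indices with FetchAndAdd and copy them independently, so the copy is parallel and no single increment of collector work is large.

#include <stdatomic.h>
#include <string.h>
#include <stddef.h>

#define SEG_BYTES 512                   /* assumed segment size */

typedef struct {
    char       *from, *to;              /* original and replica payloads */
    size_t      bytes;                  /* total payload size */
    atomic_long next_seg;               /* next segment index to claim */
    atomic_long done_segs;              /* segments fully copied so far */
} LargeCopy;

/* Copy one segment if any remain; returns 1 if this call did work. */
int copy_one_segment(LargeCopy *lc) {
    long nsegs = (long)((lc->bytes + SEG_BYTES - 1) / SEG_BYTES);
    long seg = atomic_fetch_add(&lc->next_seg, 1);
    if (seg >= nsegs) return 0;                        /* nothing left to claim */
    size_t off = (size_t)seg * SEG_BYTES;
    size_t len = off + SEG_BYTES <= lc->bytes ? SEG_BYTES : lc->bytes - off;
    memcpy(lc->to + off, lc->from + off, len);
    if (atomic_fetch_add(&lc->done_segs, 1) + 1 == nsegs) {
        /* last segment finished: the replica is now complete */
    }
    return 1;
}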

20
Extended Algorithm
  • Algorithmic Modifications
  • Reducing double allocation
  • One allocation by mutator and one by collector
  • Deferring the double allocation
  • Rooms and Better Rooms
  • A push room and a pop room
  • Only one room can be non-empty
  • Rooms
  • Enter the pop room, fetch work and perform it,
    transition to the push room, then push objects
    back to the shared stack
  • Graying objects is time-consuming, so processors
    wait to enter the push room
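
A minimal sketch of room synchronization, with assumed counters and busy-waiting (the paper's room abstraction has more machinery than this toy version, which for instance can livelock): a thread sits in either the pop room or the push room, and at most one of the two rooms is ever non-empty.

#include <stdatomic.h>

typedef struct { atomic_int occupancy[2]; } Rooms;   /* [0] = pop room, [1] = push room */

void enter_room(Rooms *r, int which) {
    for (;;) {
        if (atomic_load(&r->occupancy[1 - which]) == 0) {
            atomic_fetch_add(&r->occupancy[which], 1);
            if (atomic_load(&r->occupancy[1 - which]) == 0)
                return;                                /* safely inside */
            atomic_fetch_sub(&r->occupancy[which], 1); /* raced: back off and retry */
        }
    }
}

void exit_room(Rooms *r, int which) {
    atomic_fetch_sub(&r->occupancy[which], 1);
}

A collector thread would enter the pop room before fetching shared-stack work and the push room before pushing gray objects back; with the better rooms of the next slide, it leaves the pop room immediately after fetching instead of staying inside while it grays objects.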

21
Extended Algorithm
  • Algorithmic Modifications (continued)
  • Rooms and Better Rooms (continued)
  • Better rooms
  • Leave the pop room after fetching work from the
    shared stack
  • Detect that the shared stack is empty by
    maintaining a borrow counter
  • Generational Collection
  • Nursery and tenured space
  • Trigger a minor collection when nursery space is
    full
  • Trigger a major collection when tenured space is
    full
  • Tenured references might not be modified during
    collection
  • Hold two fields for each mutable pointer
  • one for the mutator to use, the other for the
    collector to update

22-28
Evaluation
  • (performance figures and charts; no text on these
    slides)
29
Conclusion
  • The authors implement a scalably parallel,
    concurrent, real-time garbage collector
  • Thread synchronization is minimized