Title: Conservative Garbage Collection
1Conservative Garbage Collection
- Stephan Lesch, January 9, 2002, slesch_at_studcs.uni-sb.de
2Contents
- Intro
- Conservative GC
- Mostly Copying Collection
- Hidden Pointer Problems
- GC for C++
3So Far
- Type-accurate GC
- locations of pointers are known
- no pointer arithmetic
- often tailored to one software product
- usually supported by compiler/runtime system
4Ambiguous Roots Collection
- every register/word is a potential pointer
- non-supportive environment
- little/no knowledge about
- register usage
- object/stack layout
- should work with any C/C++ program
- programmers don't want to pay for GC unless needed
- must coexist with explicit memory management
- The middle way
- programmer/compiler provide information to recognize pointers
5Conservative GC
- Boehm/Demers/Weiser (Xerox PARC) 1988
- non-moving mark-and-deferred-sweep collector
- fully conservative, no reliance on compiler
- no extra bits to distinguish pointer/non-pointer
- no additional object headers
- for C and C++
- for Unix, OS/2, Mac, Win95/NT
- supports incremental/generational collection
- can function as space leak detector
6Heap Layout
- Two logically distinct heaps
- Standard heap
- malloc / free
- compatible with existing code
- no pointers to collected heap!
- Collected heap
- GC_malloc
- GC_free to free known garbage
- pointers to standard heap ignored
7Layout of Collected Heap
- made up of blocks (e.g. 4 K, aligned to 4 K boundaries)
- one object size per block
- for each object size
- bitmap to mark allocated objects
- freelist (linked list of heap block slots)
- reclaimable blocks queue (deferred sweep)
- heap-block free-list
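The layout above can be pictured with a small sketch of a per-block header. The names `block_header`, `MIN_OBJ`, `slot_of`, `set_mark` and `is_marked` are hypothetical and chosen for illustration; the real collector's header format differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u               /* block size from the 4 K example */
#define MIN_OBJ    16u                 /* assumed smallest object size */
#define MAX_OBJS   (BLOCK_SIZE / MIN_OBJ)

/* Hypothetical header: one object size per block, one bit per object slot. */
typedef struct {
    size_t  obj_size;
    uint8_t mark[MAX_OBJS / 8];
} block_header;

static void header_init(block_header *h, size_t obj_size) {
    assert(obj_size >= MIN_OBJ && obj_size <= BLOCK_SIZE);
    h->obj_size = obj_size;
    memset(h->mark, 0, sizeof h->mark);
}

/* Slot index of the object that contains byte offset `off` of the block. */
static size_t slot_of(const block_header *h, size_t off) {
    return off / h->obj_size;
}

static void set_mark(block_header *h, size_t slot) {
    h->mark[slot / 8] |= (uint8_t)(1u << (slot % 8));
}

static int is_marked(const block_header *h, size_t slot) {
    return (h->mark[slot / 8] >> (slot % 8)) & 1;
}
```

Because every object in a block has the same size, a slot index is a single division, and no per-object header is needed.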
8Finding headers bit maps
9Allocation
- for objects > 1/2 block: allocate a chunk of blocks (heap-block free-list)
- none available: GC
- not enough space reclaimed: expand heap
- for small objects: pop the free-list for this size
- free-list is empty: resume sweep phase
- still empty: GC
- not enough space reclaimed: expand heap
- Clear object after allocation!
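The small-object path can be sketched roughly as follows. This is a toy model under stated assumptions: a single size class (`OBJ_SIZE`), a fixed static `arena`, and a `refill` function that stands in for the "resume sweep / GC / expand heap" fallbacks by simply threading a fresh block onto the free-list. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096u
#define N_BLOCKS   4u

/* A free slot stores the link to the next free slot in its first word. */
typedef struct free_slot { struct free_slot *next; } free_slot;

static _Alignas(16) unsigned char arena[N_BLOCKS * BLOCK_SIZE]; /* toy heap */
static size_t       blocks_used;      /* index of the next unused heap block */
static free_slot   *free_list;        /* free-list for ONE object size */
static const size_t OBJ_SIZE = 32;    /* assumed size class */

/* Stand-in for "resume sweep, then GC, then expand heap". */
static int refill(void) {
    if (blocks_used == N_BLOCKS) return 0;             /* toy heap exhausted */
    unsigned char *blk = arena + blocks_used++ * BLOCK_SIZE;
    for (size_t off = 0; off + OBJ_SIZE <= BLOCK_SIZE; off += OBJ_SIZE) {
        free_slot *s = (free_slot *)(blk + off);
        s->next = free_list;
        free_list = s;
    }
    return 1;
}

static void *alloc_small(void) {
    if (free_list == NULL && !refill()) return NULL;
    free_slot *s = free_list;          /* pop the free-list for this size */
    free_list = s->next;
    memset(s, 0, OBJ_SIZE);            /* clear object after allocation */
    return s;
}
```

Clearing the slot removes the stale free-list link, which would otherwise look like a pointer to the collector.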
10Finding Roots Pointers
- possible roots: registers, stack, static areas
- no cooperation from compiler
- treat every word as potential pointer
- ignore interior pointers (standard)
- prefer marking from false pointers over ignoring valid pointers
- Conservative Pointer Identification: given word p
- does p refer to the collected heap?
- does it point into a heap block allocated by the collector?
- does it point to the beginning of an object in that block?
- if yes,
- mark object in block header
- push object onto mark stack
- finally reset mark bits of objects on free-lists
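The validity tests can be sketched as a single function. The toy heap, the `obj_size_of_block` table and the name `identify` are assumptions made for this example; marking and the mark stack are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u
#define N_BLOCKS   8u

static unsigned char heap[N_BLOCKS * BLOCK_SIZE]; /* toy collected heap */
static size_t obj_size_of_block[N_BLOCKS];        /* 0 = block not allocated */

/* Returns the object start if `word` passes all validity tests, else NULL. */
static void *identify(uintptr_t word) {
    uintptr_t lo = (uintptr_t)heap, hi = lo + sizeof heap;
    if (word < lo || word >= hi) return NULL;     /* not in the collected heap */

    size_t blk = (word - lo) / BLOCK_SIZE;
    size_t sz  = obj_size_of_block[blk];
    if (sz == 0) return NULL;                     /* block not allocated */

    size_t off = (word - lo) % BLOCK_SIZE;
    if (off % sz != 0) return NULL;               /* interior pointer: ignored */
    if (off + sz > BLOCK_SIZE) return NULL;       /* partial slot at block end */
    return (void *)word;
}
```

An integer that happens to pass all three tests is indistinguishable from a pointer, which is the misidentification problem of the next slide.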
11Misidentification
- integers accidentally fulfilling validity tests
- avoid need to trace from interior pointers...
- ... or unaligned pointers (e.g. the adjacent words 00000009 0000000A)
- avoid addresses with lots of trailing 0s
- try to avoid generating false references
- collector clears non-atomic objects after alloc
- GC_malloc_atomic for objects without pointers
- programmer: initialize structures
- programmer: destroy obsolete pointers (dead pointers on the stack are often the most significant source of leaks)
12Black Listing
- Idea: don't allocate in heap blocks at addresses likely to collide with invalid pointers
- black-list references to the vicinity of the heap which fail validity tests
- extra run before first allocation finds false references in static data
- additional space overhead < 10%
- but difficult to allocate > 100 KB without spanning black-listed blocks
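A minimal sketch of the idea, assuming a fixed address range of `N_BLOCKS` blocks starting at a made-up `heap_base`. The names `note_false_reference` and `alloc_blocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u
#define N_BLOCKS   16u                     /* assumed range the heap may use */

static uintptr_t     heap_base = 0x100000; /* assumed start of that range */
static unsigned char allocated[N_BLOCKS];  /* 1 = block is in use */
static unsigned char black[N_BLOCKS];      /* 1 = block is black-listed */

/* Called for a scanned word that failed the validity tests. */
static void note_false_reference(uintptr_t word) {
    if (word < heap_base) return;
    size_t blk = (word - heap_base) / BLOCK_SIZE;
    if (blk < N_BLOCKS && !allocated[blk]) black[blk] = 1;
}

/* First fit: n contiguous blocks that are free and not black-listed.
 * Returns the index of the first block, or -1 if no run exists. */
static long alloc_blocks(size_t n) {
    assert(n > 0);
    for (size_t start = 0; start + n <= N_BLOCKS; start++) {
        size_t i = 0;
        while (i < n && !allocated[start + i] && !black[start + i]) i++;
        if (i == n) {
            for (i = 0; i < n; i++) allocated[start + i] = 1;
            return (long)start;
        }
    }
    return -1;
}
```

A single black-listed block splits the range into shorter runs, which is why large allocations become hard to place.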
13Influence of Data Structures
- Problems with
- large structures, interior pointers
- strongly connected structures
- Lisp
- small disjoint garbage structures
- lists constructed of cons-cells
- => Conservative GC worked well, memory leaks remain bounded (< 8% leakage, constant amount)
- KRC
- large, strongly connected structures
- next pointers in objects
- => collector thrashed
- Wentworth, 1990
14Efficiency (1)
- Comparative studies by Zorn, 1992; Detlefs et al., 1994
- real-world C programs (perl, xfig, GhostScript)
- comparing BDW with explicit managers
- replace malloc() with GC_malloc(), remove free()
- no further adaptation
- used outdated versions (4.3 vs. 1.6/2.6)
15Efficiency (2)
- realistic alternative to explicit memory management (20% avg. execution time overhead over best managers, up to 57% in worst case)
- marks 3 MB/s on SparcStation II
- up to 3 times heap usage for small heaps (fixed cost for collector's internal structures)
- needs substantially more space to avoid over-frequent GC
- works best with programs using very small objects
- might co-exist poorly with cache management (heap blocks aligned on 4K boundaries)
16Incremental/Generational Mode
- marking in small steps interleaved with mutator
- need to detect later changes to connectivity in traced parts of graph
- read dirty bits for pages
- write-protect memory and catch faults
- when mark stack is empty: trace from all marked objects on dirty heap blocks
- reduces avg. pause times, increases total execution time
- generational GC uses knowledge of which pages were recently modified
17Mostly Copying Collection
- Joel Bartlett, 1988 (Digital)
- hybrid conservative / copying collector
- roots are treated conservatively (don't move referenced objects)
- objects only accessible from heap-allocated objects are copied (assumes pointers in heap-allocated data can be found accurately)
- faster allocation
- fewer problems with pointer identification
- more accurate GC
18Object layout
[Figure: object layout. Labels: header (size, pointers), pointers, user data (non-pointers)]
- programmer has no control over object layout
- what if object layout should match hardware
registers or file structures?
19Heap layout
blocks with space identifiers
root
- current_space = 1
- next_space = 1
[Figure: heap blocks with space identifiers 0, 1, 1 and 42; two blocks are labelled as currently unused]
20Allocation
- within a block
- inc free-pointer
- dec free-slots-count
- if necessary search for free block
- (space_id != current_space/next_space)
- set its space_id to next_space
- current_space = next_space during allocation
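The allocation steps can be sketched with a toy heap of four blocks. The block table, `grab_block` and `alloc_words` are hypothetical names; sizes are counted in words to keep the arithmetic simple.

```c
#include <assert.h>
#include <stddef.h>

#define N_BLOCKS    4u
#define BLOCK_WORDS 128u

typedef struct { unsigned space_id; } block_info;

static block_info blocks[N_BLOCKS];             /* all space_ids start at 0 */
static size_t     words[N_BLOCKS][BLOCK_WORDS]; /* toy heap storage */
static unsigned   current_space = 1, next_space = 1;
static size_t    *free_ptr;                     /* next free word in current block */
static size_t     free_slots;                   /* words left in current block */

/* Search for a block that belongs to neither current_space nor next_space. */
static int grab_block(void) {
    for (size_t b = 0; b < N_BLOCKS; b++) {
        if (blocks[b].space_id != current_space &&
            blocks[b].space_id != next_space) {
            blocks[b].space_id = next_space;    /* set its space_id to next_space */
            free_ptr   = words[b];
            free_slots = BLOCK_WORDS;
            return 1;
        }
    }
    return 0;                                   /* no free block: time to collect */
}

static size_t *alloc_words(size_t n) {
    assert(n > 0 && n <= BLOCK_WORDS);
    if (free_slots < n && !grab_block()) return NULL;
    size_t *p = free_ptr;
    free_ptr   += n;                            /* inc free-pointer */
    free_slots -= n;                            /* dec free-slots-count */
    return p;
}
```

Allocation inside a block is a pointer bump, which is where the speed advantage over free-list allocation comes from.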
21Collection
- GC when heap is half full (half of heap blocks have space_id = current_space)
- next_space = (current_space + 1) mod n
- Fromspace: current_space blocks
- Tospace: next_space blocks
- scan roots conservatively for pointers into heap
- move potentially referenced objects to Tospace
- changing space_id of their blocks to next_space
- add block to Tospace scan list
- copy graphs accessible from blocks on scan list
22Heap after Collection
root
- current_space = 2
- next_space = 2
[Figure: heap blocks with space identifiers 2, 2, 1 and 42; two blocks are labelled as currently unused]
23Bartletts GC algorithm (1)
gc():
    next_space = (current_space + 1) mod 077777
    Tospace_queue = empty
    for R in Roots
        promote(block(R))
    while Tospace_queue != empty
        blk = pop(Tospace_queue)
        for obj in blk
            for S in Children(obj)
                S = copy(S)
    current_space = next_space
24Bartletts GC algorithm (2)
promote(block):
    if Heap_bottom <= block <= Heap_top
       and space(block) == current_space
        space(block) = next_space
        allocatedBlocks = allocatedBlocks + 1
        push(block, Tospace_queue)

copy(p):
    if space(p) == next_space or p == nil
        return p
    if forwarded(p)
        return forwarding_address(p)
    np = move(p, free)
    free = free + size(p)
    forwarding_address(p) = np
    return np
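The copy step can be sketched in C as follows. This is a simplification under stated assumptions: the space identifier is stored per object rather than per block, Tospace is a fixed buffer, and the `object` layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object header. */
typedef struct object {
    size_t         size;     /* total size in bytes, header included */
    struct object *forward;  /* forwarding address, NULL if not yet moved */
    int            space;    /* space_id (per object here, per block in the slides) */
    int            payload;
} object;

static int next_space = 2;
static _Alignas(16) unsigned char tospace[1024];  /* toy Tospace */
static unsigned char *free_ptr = tospace;

static object *copy(object *p) {
    if (p == NULL || p->space == next_space) return p;  /* nil or already in Tospace */
    if (p->forward != NULL) return p->forward;          /* already moved */
    assert(free_ptr + p->size <= tospace + sizeof tospace);
    object *np = (object *)free_ptr;
    memcpy(np, p, p->size);       /* np = move(p, free) */
    free_ptr += p->size;          /* free = free + size(p) */
    np->space = next_space;
    p->forward = np;              /* forwarding_address(p) = np */
    return np;
}
```

The forwarding address guarantees that an object reachable along several paths is copied once and that all referents end up pointing at the same copy.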
25Generational Mode (1)
- One bit in space_id indicates young/old generation
- Other bits approximate age of objects/blocks
- Minor collection
- when 50% of free space after last GC is full
- young objects reachable from roots/remembered set are promoted en masse (change space_id/copy)
- remembered set maintained via memory protection
26Generational Mode (2)
- Major collection (mark-compact)
- when old generation occupies > 85% of heap
- mark accessible objects in old generation
- pass 1: find old generation blocks < 1/3 filled, copy objects to free space leaving forwarding addresses
- pass 2: rescan old generation, correct pointers using forwarding addresses
- expand heap if > 75% full
- maintaining remembered set costs time, but often saves more time during GC (20% time improvement on Scheme compiler)
- also reduces pause times in interactive programs
27Efficiency (1)
- no thorough studies
- space overhead: space_ids, type info, block links, promotion bits: 2% for 512-byte blocks; tagging data increases overhead
- Mostly Copying vs. BDW: Mostly Copying probably better with many short-lived objects, benefits from faster allocation
28Experiences
- generational version: 20% runtime improvement for Scheme-to-C compiler
- significant performance increase in CAD program (reduced paging)
- bad results for non-generational collector for Modula-2 with very large heaps (10s of megabytes)
- choose GC strategy that fits behaviour of mutator
29The optimising Compiler/User Devil
- conservative GC defeated by temporarily hidden pointers
- parts of graph may be unreachable during a GC
- pointer arithmetic
- adding tag bits
- e.g. optimized array traversal
    for (i = 0; i < SIZE; i++) ...x[i]...;
    ...x...
  may be compiled as
    xend = x + SIZE;
    for (; x < xend; x++) ...*x...;
    x -= SIZE;
    ...x...
- inside the loop x is an interior pointer, afterwards x points one past the end
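Both forms of the traversal can be written out as ordinary C to see where the base pointer disappears. The function names and the `x_after_loop` out-parameter are hypothetical and exist only to make the intermediate value observable.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8

/* Source form: x remains a base pointer for the whole loop. */
static long sum_indexed(const int *x) {
    long s = 0;
    for (int i = 0; i < SIZE; i++) s += x[i];
    return s;
}

/* Form an optimiser may produce: x itself is advanced. */
static long sum_strength_reduced(const int *x, const int **x_after_loop) {
    long s = 0;
    const int *xend = x + SIZE;
    for (; x < xend; x++) s += *x;  /* x is an interior pointer here */
    *x_after_loop = x;              /* x now points one past the end */
    x -= SIZE;                      /* base pointer is only restored here */
    (void)x;
    return s;
}
```

If x held the only reference to the array, a collector that ignores interior and one-past-the-end pointers could reclaim the array while the second loop is running.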
30Machine-specific Optimizations
    struct l_thing {
        char thing[35000];
        struct l_thing *next;
    };
    struct l_thing *
    tail(struct l_thing *x)
    {
        return (x->next);
    }
- on IBM RISC System/6000, tail() translates to
    AIU r3=r3,1                 (r3 = r3 + 65536)
    L   r3=SHADOW(r3, -30536)   (load from original r3 + 35000)
    BA  lr
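The constants in that instruction sequence follow from splitting the field offset 35000 into a multiple of 65536 plus a signed 16-bit displacement. The sketch below reproduces the arithmetic; `split_offset` is a hypothetical helper, and the check on `offsetof` assumes a typical ABI in which `next` follows `thing` without padding.

```c
#include <assert.h>
#include <stddef.h>

struct l_thing {
    char thing[35000];
    struct l_thing *next;
};

/* Split a non-negative offset into hi * 65536 + lo, where lo fits a
 * signed 16-bit displacement (-32768 .. 32767). */
static void split_offset(long off, long *hi, long *lo) {
    assert(off >= 0);
    *hi = (off + 32768) / 65536;
    *lo = off - *hi * 65536;
}
```

For offset 35000 this gives hi = 1 and lo = -30536. After the first instruction the register holds x + 65536, which lies beyond the end of the object, so for a moment no register points into the object at all.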
31Boehm and Chases Solution (1)
- local root set of function f at any point in execution
- register/auto variables
- previously computed values of direct sub-expressions of incompletely evaluated expressions (e.g. malloc's return value in malloc(size) + 4)
- global root set
- declared static and extern variables
- local root sets of all call sites in call chain
- any values stored in other areas scanned by collector
- valid base pointer
- pointer to anywhere inside an object or one past its end
- BDW can handle such pointers
32Boehm and Chases Solution (2)
- every object on garbage collected heap must be accessible from global root set through chain of base pointers
- conservative collection safe with strictly ANSI-compatible programs
- suggested implementation
- preprocess source using macros that prevent code generator from discarding live base pointers prematurely
- compile normally
- post-process assembly code, removing macro artifacts
- transparent to programmer and compiler
- may interfere with instruction scheduling
- may increase register pressure
33Ellis and Detlefs solution
- annotate operations on pointers with names of base pointers from which they're derived
- compiler treats these operations as uses of the original base pointers, extending their live ranges
- code generation must respect live ranges
- requires changes to compiler
- does not alter sources
- does not rely on behaviour of volatile declarations
34GC for C++
- object-oriented languages often use more heap-allocated data
- generate more complex data structures
- GC uncouples memory management from class interfaces instead of dispersing it through code
35Conservative GC for C++
- requires no changes to language
- restriction on coding style: no hidden pointers (converted to int)
- existing code may violate the restriction
- aggressive optimisers may as well
- safety must be enforced in code-generator
- some support for finalization (GC_register_finalizer)
- assuming few objects need finalization
36Mostly Copying for C++
- storing all pointers at beginning of objects interferes with inheritance (fast field lookup)
- here user supplies callback methods to identify pointers
    class Tree {
    public:
        Tree *left;
        Tree *right;
        int data;
        Tree(int x);
        GCCLASS(Tree);
    };
    ...
    GCPOINTERS(Tree) {
        gcpointer(left);
        gcpointer(right);
    }
- GCPOINTERS macro generates callback method Tree::GCPointers
- currently no support for finalisation
37Benefits of pointer locating methods
- programmer may solve the unsure reference problem, e.g. union { int n; thing *ptr; } x;
- enables semantically accurate marking, e.g. stacks, queues
- automatic GC retains uncleared references to removed elements
- programmer can omit them
- even better than type-accurate GC
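The stack case can be sketched in C. The callback signature `visit_fn` and the function `stack_pointers` are hypothetical stand-ins for a user-supplied pointer locating method; they are not the interface of any particular collector.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8

typedef struct {
    void  *slot[CAP];
    size_t top;                 /* slots [0, top) hold live elements */
} stack;

typedef void (*visit_fn)(void *p, void *ctx);

static void push(stack *s, void *p) {
    assert(s->top < CAP);
    s->slot[s->top++] = p;
}

/* pop leaves the old reference in the array, as typical code does. */
static void *pop(stack *s) {
    assert(s->top > 0);
    return s->slot[--s->top];
}

/* Programmer-supplied pointer locator: reports only the live slots. */
static void stack_pointers(const stack *s, visit_fn visit, void *ctx) {
    for (size_t i = 0; i < s->top; i++) visit(s->slot[i], ctx);
}

/* Example visitor that just counts the reported pointers. */
static void count_visit(void *p, void *ctx) {
    (void)p;
    ++*(size_t *)ctx;
}
```

A collector that scans the whole array, type-accurate or not, would still see the popped element; the locator skips it because it knows the meaning of `top`.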
38Using Object Descriptors
- Detlefs, 1991 extension to Mostly Copying
- insert descriptor into object headers
- Bitmap format
- 1 word with 32 bits indicating pointer/non-pointer words
- use if only first 32 words of user data contain pointers; can't handle unsure references
- Indirect format
- pointer to byte array encoding sure/unsure references and non-pointer values
- array can be compressed using repeat counts
- Fast indirect format
- array of ints: 1st number indicates repetitions of rest
- subsequent numbers: number of words to skip to reach next pointer, negative number indicates unsure reference
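The bitmap format is the simplest of the three to decode. The sketch below assumes that bit i of the descriptor corresponds to word i of the user data; the names `trace_bitmap`, `visit_fn` and `log_t` are hypothetical, and the two indirect formats are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*visit_fn)(void **field, void *ctx);

/* Assumed encoding: bit i set => word i of the user data holds a pointer. */
static void trace_bitmap(void **data, uint32_t bitmap,
                         visit_fn visit, void *ctx) {
    for (size_t i = 0; bitmap != 0; i++, bitmap >>= 1)
        if (bitmap & 1u) visit(&data[i], ctx);
}

/* Example visitor that records which fields were reported. */
typedef struct { void **seen[32]; size_t n; } log_t;

static void log_visit(void **field, void *ctx) {
    log_t *l = ctx;
    l->seen[l->n++] = field;
}
```

The visitor receives the address of each pointer field rather than its value, so a copying collector can update the field after moving the referent.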
39Conclusion
- GC effective for traditional imperative languages
- realistic alternative to explicit memory management for most applications
- not yet suitable for real-time / safety-critical applications
- no big constraints on coding style, except hidden pointer problem
- GC'ing allocators competitive even with code not written for GC
- GC should have hooks for client/programmer to communicate their knowledge
- explicit deallocation calls
- atomic objects
- hints of appropriate times to collect