Garbage collection for data structure organisation - PowerPoint PPT Presentation

1
Garbage collection for data structure organisation
  • by George Kurian

2
The basics
  • Garbage Collector
  • Responsible for managing memory in Java (and
    other languages).
  • It is run to free memory.
  • Traverses the objects in the heap determining if
    they are live or not.
  • Tracing from the roots.
  • Method stack.
  • Global variables
  • Others
  • Once every linked object is scanned and marked,
    we know everything else is garbage.
  • Live objects and dead objects
  • Live objects are those currently still in use by
    the running program.
  • References contained somewhere.
  • Dead objects have no references to them; such
    objects can be garbage collected and their space
    recovered.
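
The tracing step described above can be sketched as a plain reachability walk. This is a minimal illustration only, not the collector from the talk: the MarkObj and MarkSketch names are hypothetical, and a real collector works on raw heap memory rather than Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus its outgoing references.
final class MarkObj {
    final String name;
    final List<MarkObj> refs = new ArrayList<>();
    MarkObj(String name) { this.name = name; }
}

final class MarkSketch {
    // Returns every object reachable from the roots (the live set).
    // Anything not in the returned set is garbage.
    static Set<MarkObj> mark(List<MarkObj> roots) {
        Set<MarkObj> live =
            Collections.newSetFromMap(new IdentityHashMap<MarkObj, Boolean>());
        Deque<MarkObj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            MarkObj o = worklist.pop();
            if (live.add(o)) {            // first visit: mark, then scan children
                worklist.addAll(o.refs);
            }
        }
        return live;
    }
}
```

An object that references a live object but is itself referenced by nothing stays out of the live set, which is the "dead object" case on this slide.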

[Figure: heap diagram with labels Live objects, External References and Dead objects.]
3
The problem
  • Complex data structures
  • Highly connected internally.
  • Object references not always a good indication of
    the necessity of an object.
  • Object might be garbage but is still referenced
    internally.
  • Garbage Collection only understands Object
    references.
  • No concept of the internals of the data
    structure.
  • Not possible for data structure to correctly
    identify needed objects without knowledge of
    every object that uses it.
  • Garbage Collector has this knowledge.
  • Because we keep maintaining unused objects, we
    create a memory leak.

4
The target problem
  • Persistent Data Structures
  • Persistent Array
  • Stores multiple versions of the same data.
  • Might have versions that no one uses.
  • The items are garbage but we cannot safely remove
    them easily.
  • Union Find
  • Provides the simplest case for reasoning about
    the solution.
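
To make the persistent array case concrete, here is one possible representation, assumed purely for illustration: each version is a small diff (index, value) that points at its parent version. The PVersion class and its methods are hypothetical names, not taken from the talk.

```java
// Hypothetical persistent array: each version is a small diff on its parent.
final class PVersion {
    final int index;        // slot changed by this version (-1 for the root)
    final Object value;     // value written to that slot
    final PVersion parent;  // previous version, null for the root

    private PVersion(int index, Object value, PVersion parent) {
        this.index = index;
        this.value = value;
        this.parent = parent;
    }

    static PVersion empty() { return new PVersion(-1, null, null); }

    // Returns a new version; the receiver stays valid and unchanged.
    PVersion set(int i, Object v) { return new PVersion(i, v, this); }

    // Walks towards the root until some version has written slot i.
    Object get(int i) {
        for (PVersion v = this; v != null; v = v.parent) {
            if (v.index == i) return v.value;
        }
        return null; // slot never written
    }
}
```

If v1 = PVersion.empty().set(0, "a") and v2 = v1.set(0, "b"), then dropping every external reference to v1 does not make it collectable: v2.parent still points at it, even though no read through v2 can ever return v1's value.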

5
The solution must be...
  • Transparent
  • The complexity of addressing the cleanup should
    not be exposed to the user or the programmer of
    the data structure (or at least be minimised).
  • Efficient
  • We should not impose undue cost on the usual
    functioning of the GC. The advantages must be
    worth having.
  • Minimal Change
  • There should be the least change possible made to
    the garbage collector.
  • Ensures better portability and generalisability.
  • Safe
  • We should not affect the semantics of existing
    objects.

6
Failed solutions
  • Each approach upset the balance between these
    requirements.
  • Branch adjusted list
  • Data structure intensive solution.
  • Relies on run time reorganisation.
  • Uses standard GC to clean up objects.
  • Fat Nodes and Finalizers
  • Uses already available GC features.
  • Data structure representation vastly different
    from usual.
  • Table of weak references
  • Support from GC and Data Structure
  • Entries are added at mutator time and verified
    at collection time.
  • Data Structure controls cleanup.

7
Failed solutions (2)
  • Fat Nodes and Finalizers
  • No significant change to garbage collector
    required.
  • - Exceedingly bulky data structure, even for
    simple purposes.
  • - Low user transparency, must always deal with
    tokens.
  • Branch adjusted list
  • No change to the garbage collector required.
  • Access of elements has the shortest possible
    path to the parent.
  • - High penalty on addition and removal of data.
  • - Low sharing: greater number of objects.
  • - Low programmer transparency: the programmer has
    to deal with exactly how to translate the
    structure.

8
The basics (2)
  • Generational Copying Collector
  • Heap divided into different regions.
  • Maintains different generations of objects.
  • Mature, Nursery.
  • Collection is done on separate regions.
  • Newly created objects tend to die first.
  • Nursery collection is always done.
  • Mature collection done when we need additional
    space.
  • Conservative, assumes mature objects are always
    live.
  • Copies live objects from one half to the other.
  • Minimises the time taken in doing a GC.
  • We don't check if every object in the heap is
    live every time.
  • Write barrier.
  • Objects in nursery that have a reference from
    Mature always need to be preserved.
  • Remset
  • Offers a way of storing Object References during
    the GC.

9
Generational Copying Collector
  • Normal GC

[Figure: heap regions Nursery, Mature0 and Mature1. Objects are allocated into the nursery; live objects are copied into mature; dead objects are left behind.]
10
Generational Copying Collector
  • Full GC

[Figure: heap regions Nursery, Mature0 and Mature1, with an existing mature object and allocation into the nursery. Live objects are copied into the new mature half; dead objects are left behind.]
Once the GC is finished, the space is available to be allocated into again.
11
JikesRVM
  • Provides a framework for implementing GC
    algorithms in Java.
  • Extended current implementation of GenCopy to add
    complex data structure handling.

12
Solution components
  • Data structure
  • Identify needed elements within the structure.
  • Reorganise the data structure so as not to point
    to unneeded objects.
  • Garbage Collector
  • Inform the GC about which objects we need
    information about.
  • The data structure objects.
  • GC tells us which of these are live and calls
    appropriate methods on them.
  • We need information about the fringes of the data
    structure.
  • Clears up unneeded items.
  • Items left behind after reorganisation as well as
    other dead objects.

13
Features of the Data Structure
  • Interface for cleanup
  • Specifies the actions the data structure must do
    with the information provided by the garbage
    collector.
  • Cleanup treats every object independently of data
    structure.
  • No way to determine which object belongs to which
    structure.
  • Can generalise cleanup for anything that
    implements the interface.
  • Reference count field
  • Exists per item of the data structure.
  • Maintains a count of how many versions require
    that specific object.
  • Three stage cleanup
  • Stage 1
  • Needed nodes are identified by checking whether
    the index has already been seen in the version.
  • As we traverse the data structure, increment the
    count of every needed object.
  • Stage 2
  • Traverse the data structure again; when an
    object's parent is not marked as needed, change
    that object's parent to the parent of the
    unneeded object and continue.
  • Stage 3
  • Reset the counts on the objects in the data
    structure to prepare it for the next time GC is
    run.
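
The three stages above can be sketched on a list-based version chain. This is a sketch under stated assumptions rather than the implementation from the talk: it assumes each node records one (index, value) update and a parent link, and that the collector hands over the externally referenced versions as a list. VNode, CleanupSketch and cleanup are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical version node with the per-item reference count field.
final class VNode {
    final int index;
    final Object value;
    VNode parent;   // mutable so cleanup can relink around unneeded nodes
    int count;      // how many live versions need this node

    VNode(int index, Object value, VNode parent) {
        this.index = index;
        this.value = value;
        this.parent = parent;
    }

    Object get(int i) {
        for (VNode n = this; n != null; n = n.parent) {
            if (n.index == i) return n.value;
        }
        return null;
    }
}

final class CleanupSketch {
    // heads: the externally referenced versions reported by the collector.
    static void cleanup(List<VNode> heads) {
        // Stage 1: a node is needed by a version if its index has not
        // already been seen on the walk from that version to the root.
        for (VNode head : heads) {
            Set<Integer> seen = new HashSet<>();
            for (VNode n = head; n != null; n = n.parent) {
                if (seen.add(n.index)) n.count++;
            }
        }
        // Stage 2: relink each node past parents that no version needs.
        for (VNode head : heads) {
            for (VNode n = head; n != null; n = n.parent) {
                while (n.parent != null && n.parent.count == 0) {
                    n.parent = n.parent.parent;
                }
            }
        }
        // Stage 3: reset the counts ready for the next collection.
        for (VNode head : heads) {
            for (VNode n = head; n != null; n = n.parent) n.count = 0;
        }
    }
}
```

A node is only unlinked when its count is zero across all live versions, so a node that is shadowed in one version but still read through another stays in place. Once unlinked, the node has no remaining references and the ordinary collector can reclaim it.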

14
Identifying Special objects
  • Allocating special objects
  • Requires identification of object.
  • Check every object
  • Implements an interface.
  • Very expensive.
  • Advice file
  • Specifies where the object is created in the
    program.
  • Value passed along with allocation is used to
    allocate into special space.
  • Annotations?
  • Can be picked up by the runtime to handle the
    object in a special way.
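
The annotation idea floated above could look roughly like the following. This only shows the standard Java annotation mechanism; the SpecialSpace annotation and the isSpecial check are hypothetical, and how a runtime such as JikesRVM would actually route the allocation is not shown here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker: instances of annotated classes belong in the special space.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SpecialSpace { }

// Example data structure node opting in to special handling.
@SpecialSpace
final class AnnotatedNode {
    AnnotatedNode parent;
}

final class SpecialCheck {
    // The check is made once per class, not once per object,
    // which avoids the cost of inspecting every allocation.
    static boolean isSpecial(Class<?> type) {
        return type.isAnnotationPresent(SpecialSpace.class);
    }
}
```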

15
Garbage Collector
  • Modified the number of spaces.
  • Added a special space to hold our data structure
    objects.
  • Allows us to identify our special objects from
    every other object.
  • Can call appropriate methods on special objects
    to clean the structure.
  • Can find out if an object is a special object
    based on its physical location.
  • Makes checking easier during tracing.
  • Modified the allocator
  • Allocates the objects into the special space.
  • Reads the allocation details before we run the
    program.
  • Modified the Trace
  • We don't carry out a standard trace over our
    special objects without processing first.
  • Two Stages
  • Tracing continues as normal across the heap.
  • If we are referring to a special object, we copy
    the object into the new half.
  • Further tracing is not done, instead we hold it
    in our own remset for processing later.
  • After tracing is over
  • Process the remset, reshuffling the links between
    the objects.
  • Restore objects back to the standard trace and
    finish the trace.
  • Modified the collection time
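
The two-stage trace described above can be simulated as follows. This is a simplified model, not the JikesRVM code: a boolean flag stands in for "lives in the special space", copying is omitted, and the HeapObj, reorganise and TwoStageTrace names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object; 'special' stands in for the special space.
class HeapObj {
    final List<HeapObj> refs = new ArrayList<>();
    final boolean special;
    HeapObj(boolean special) { this.special = special; }
    // Cleanup hook called on externally referenced special objects.
    void reorganise() { }
}

final class TwoStageTrace {
    static Set<HeapObj> collect(List<HeapObj> roots) {
        Set<HeapObj> live =
            Collections.newSetFromMap(new IdentityHashMap<HeapObj, Boolean>());
        Deque<HeapObj> worklist = new ArrayDeque<>(roots);
        List<HeapObj> specialRemset = new ArrayList<>();

        // Stage 1: normal trace, but special objects are held back
        // instead of being scanned. Everything held here was reached
        // from outside the structure, i.e. it is externally referenced.
        while (!worklist.isEmpty()) {
            HeapObj o = worklist.pop();
            if (!live.add(o)) continue;
            if (o.special) specialRemset.add(o);
            else worklist.addAll(o.refs);
        }

        // Stage 2: let the structure reshuffle its links, then restore
        // the held objects to the standard trace and finish it.
        for (HeapObj s : specialRemset) s.reorganise();
        for (HeapObj s : specialRemset) worklist.addAll(s.refs);
        while (!worklist.isEmpty()) {
            HeapObj o = worklist.pop();
            if (live.add(o)) worklist.addAll(o.refs);
        }
        return live;
    }
}
```

Because the links are reshuffled before the special objects are scanned, any node bypassed during reorganisation is never marked and is reclaimed in the same collection.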

16
Union Find example
  • Given we have interconnected nodes in our special
    space.
  • We want to reroot the externally referenced
    nodes.
  • During the trace we store away these externally
    referenced objects into our own remset.
  • Once the trace is finished we look through the
    remset, casting them to the interface type.
  • We call the reroot methods on the nodes.
  • We restore the nodes that we kept into the main
    tracing remset and we complete tracing over them.
  • In the persistent data structure we removed
    unneeded versions.
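
A reroot method for the Union Find case might look like the sketch below. It assumes that "reroot" means pointing an externally referenced node directly at the root of its set, in the manner of path compression; the slides do not spell this out, and the UFNode name and its methods are hypothetical.

```java
// Hypothetical union-find node with a parent pointer.
final class UFNode {
    UFNode parent;   // null means this node is the root of its set

    // Follows parent links to the representative of the set.
    UFNode find() {
        UFNode n = this;
        while (n.parent != null) n = n.parent;
        return n;
    }

    // Points this node straight at its root, so intermediate nodes
    // are no longer kept alive through this node's parent chain.
    void reroot() {
        UFNode r = find();
        if (r != this) parent = r;
    }
}
```

After rerooting every externally referenced node, an intermediate node with no external reference of its own is unreachable and can be collected normally.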

17
Garbage Collector internals
  • Space layout

[Figure: space layout with regions Special0, Special1, Mature0, Mature1 and Nursery; labels: Special Space, Externally referenced.]
18
Constant problems
  • Allocation during Garbage Collection.
  • Space might not be available during the GC.
  • Might trample on pre existing data.
  • Compromises some transparency of data structures.
  • Have to add fields to the objects themselves to
    help us.
  • Trivial cost, we make the data structure pay for
    carrying out cleanup.
  • Loading of unknown methods during GC causes
    allocation.
  • Problem faced and not ideally resolved.
  • Methods for cleanup must be preloaded.

19
Resolution
  • We do a normal trace over all the objects in the
    special space.
  • At the very end once we are clear of the Garbage
    Collector we process the remset.
  • Avoids allocation problems as the allocator is
    pointing to the right space to allocate into.
  • Defers garbage collection of the data structure
    until the next GC.
  • But normal garbage will still be collected.

20
Optimisations
  • Optimisations to the data structure like minimal
    connectivity can be done during Garbage
    Collection.
  • Additional cost in terms of algorithmic
    complexity and time we spend in a GC.
  • Improves overall access times to elements in the
    data structure.
  • The costs have to be weighed; this is not always
    possible on data structures.
  • It was possible in our list based structure.

21
Optimisation example
  • Nodes not garbage, but connectivity could be
    improved.

[Figure: version list before (From) and after (To) the optimisation, each showing ROOT, numbered nodes 1 to 3, and Versions.]
22
Conclusions
  • Complexity distributed over GC and data
    structure.
  • Good transparency.
  • The user has to know very little about how the
    garbage collector will collect the structure.
    Just needs to implement simple methods.
  • Object comparison
  • Rest are inherited
  • Good efficiency.
  • Initial results show time taken on standard GC is
    shorter.
  • Results still underway.
  • Reasonable changes.
  • Implementation extends the use of existing models
    already in use.
  • Keeps the implementation simple.

23
Generalisations
  • Works for a specific case.
  • Should work for any case that needs to identify
    external references.
  • Requires assistance from both Garbage Collector
    and Data Structure.
  • Cheapest way of cleaning up the data structure.
  • Data structure abstract class deals with
    list-based structures only at present.
  • To Do
  • Cycles.
  • Simpler way of allocating objects.
  • Annotations
  • Other data structures that aren't list based.
  • Loading of the interface methods before usage.
  • Requires change to the Class Loader

24
The End.
  • Questions?