Title: CSCI 435 Compiler Design
1 CSCI 435 Compiler Design
- Week 9 Class 1
- Section 5.2.3 to 5.2.4.3
- (pp. 415-425)
- Ray Schneider
2 Topics of the Day
- Actual Garbage Collection Algorithms
- REFERENCE COUNTING
- MARK and SCAN
3 Reference Counting
- An intuitive garbage collection algorithm that
records in each chunk the number of pointers that
point to it
- When the number drops to zero the chunk can be
declared garbage
- Reference Counting literally collects garbage,
while the other methods actually collect
reachable chunks
4 How Reference Counting Works
- When a Chunk is allocated from the Heap its
Reference Count is initialized to one
- Whenever a reference to the Chunk is duplicated
its Reference Count is incremented
- Whenever a reference to the Chunk is deleted its
Reference Count is decremented
- If the Reference Count drops to zero the Chunk
can be freed because it is no longer reachable
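The four rules above can be sketched in a few lines of Python. This is only an illustration: the names Chunk, duplicate and delete are hypothetical, and setting a freed flag stands in for returning the chunk to the free list.

```python
class Chunk:
    """A heap chunk that records how many references point to it."""
    def __init__(self, name):
        self.name = name
        self.reference_count = 1   # allocation hands out the first reference
        self.freed = False

def duplicate(chunk):
    """Copy a reference: one more pointer now points to the chunk."""
    chunk.reference_count += 1
    return chunk

def delete(chunk):
    """Drop a reference; free the chunk when no references remain."""
    chunk.reference_count -= 1
    if chunk.reference_count == 0:
        chunk.freed = True   # stand-in for returning the chunk to the free list

a = Chunk("a")          # count is 1
alias = duplicate(a)    # count is 2
delete(alias)           # count is 1, chunk still reachable
still_live = not a.freed
delete(a)               # count is 0, chunk is freed
```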
5 Chunks with Reference Count in a Heap
Returning the chunk with a zero reference count
to the free list is not enough to reclaim all
garbage.
6 Result of removing the reference to b
For example, deleting the reference to chunk b
means that chunk e also becomes garbage, but f
remains in use since there is still one reference
left so its reference count does not drop to zero.
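A small sketch of this cascade, assuming a heap shape that the lost figure may not match exactly: b points to e and f, a second chunk d also points to f, and b and d are each referenced once from the program data area. All names are hypothetical.

```python
class Chunk:
    def __init__(self, name, *children):
        self.name = name
        self.reference_count = 0
        self.children = list(children)
        for child in self.children:
            child.reference_count += 1   # each stored pointer is a reference

freed = []   # names of chunks returned to the free list, in order

def release(chunk):
    """Delete one reference to chunk and free whatever becomes garbage."""
    chunk.reference_count -= 1
    if chunk.reference_count == 0:
        for child in chunk.children:
            release(child)               # the chunk's own pointers die with it
        freed.append(chunk.name)

e = Chunk("e")
f = Chunk("f")
b = Chunk("b", e, f)
d = Chunk("d", f)            # assumed second owner of f
b.reference_count += 1       # reference from the program data area
d.reference_count += 1       # reference from the program data area

release(b)                   # remove the reference to b
```

Here e is freed together with b, while f keeps a count of one because d still points to it.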
7 Two Main Issues
- Implementing Reference Counting requires
- 1) Keeping track of all reference manipulations,
and
- 2) Recursively freeing chunks with zero reference
count
- The compiler plays an important part in keeping
track of references
- The compiler inserts special code for all
reference manipulations
- Recursive freeing is accomplished by a run-time
routine
8 Compiler Tracking of References
- Compiler inserts special code for ALL reference
manipulations
- Incrementing the reference count when a reference
to a chunk is duplicated
- References are typically duplicated as an effect
of some assignment in the source language: to a
variable in the program data area, to a field in a
dynamically allocated data structure, or in a
parameter transfer, since passing a reference is
effectively an assignment to a local variable
- Not all references are to the heap; many point to
the program data area and these must not be
followed
- Decrementing the reference count when such a
reference is deleted
- References are deleted implicitly by assignment.
Assignment to a reference variable overwrites the
current reference with a new value, so before
installing the new reference we must decrement
the reference count of the chunk the current
reference points to
9 Code generated for pointer assignment
- Given the pointer assignment p := q we generate
code such as that shown below
- Another source of reference deletion is returning
from a routine call
- On return all Local Variables are deleted
- Local variables holding references to chunks
should be processed to decrement the associated
reference count
- This also applies to parameters that hold
references
Processing a Pointer Assignment
    IF Points into the heap (q):
        Increment q .reference count;
    IF Points into the heap (p):
        Decrement p .reference count;
        IF p .reference count = 0:
            Free recursively depending on reference counts (p);
    SET p TO q;
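A rough Python rendering of the code a compiler might emit for p := q. The names assign, points_into_heap and the variables dictionary are hypothetical, and the recursive free is deliberately simplified.

```python
class Chunk:
    def __init__(self, name):
        self.name = name
        self.reference_count = 0
        self.pointers = []      # references stored inside this chunk

freed = []

def points_into_heap(ref):
    # Assumption: heap references are Chunk objects; anything else
    # (None, numbers, strings) stands for a value outside the heap.
    return isinstance(ref, Chunk)

def free_recursively(chunk):
    # Simplified: the references held by the chunk are deleted first.
    for child in chunk.pointers:
        if points_into_heap(child):
            child.reference_count -= 1
            if child.reference_count == 0:
                free_recursively(child)
    freed.append(chunk.name)

def assign(variables, p, q):
    """Emitted for p := q, where p and q are variable names."""
    new, old = variables[q], variables[p]
    if points_into_heap(new):
        new.reference_count += 1          # the reference in q is duplicated
    if points_into_heap(old):
        old.reference_count -= 1          # the reference in p is overwritten
        if old.reference_count == 0:
            free_recursively(old)
    variables[p] = new

x, y = Chunk("x"), Chunk("y")
x.reference_count = y.reference_count = 1
variables = {"p": x, "q": y}
assign(variables, "p", "q")   # x loses its only reference and is freed
assign(variables, "p", "p")   # self-assignment leaves the count unchanged
```

Incrementing before decrementing matters for the self-assignment case: the count of y briefly rises to three instead of touching zero.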
10 Memory Deallocation
- The proper way to reclaim memory allocated to a
chunk is to first decrement recursively the
reference counts of ALL references contained in
the chunk and then return it to the free list
- RECURSION tends to require an unpredictable
amount of stack space and needs to be avoided
- Three Techniques
- BEST is using pointer reversal, which we'll see in
Section 5.2.4.3
- An adequate solution is the "bounded stack"
discussed in problem 5.10(c)
- A simple trick that alleviates the problem
somewhat, but is not a solution, is Tail Recursion
Elimination
11 Recursively freeing chunks While Avoiding
Tail Recursion
Use repetition rather than recursion for the last
pointer in the chunk to be freed. This avoids
using the stack for freeing chunks that form a
linear list.
    PROCEDURE Free recursively depending on reference counts (Pointer):
        WHILE Pointer /= No chunk:
            IF NOT Points into the heap (Pointer): RETURN;
            IF NOT Pointer .reference count = 0: RETURN;
            FOR EACH Index IN 1 .. Pointer .number of pointers - 1:
                Free recursively depending on reference counts
                    (Pointer .pointer [Index]);
            SET Aux pointer TO Pointer;
            IF Pointer .number of pointers = 0:
                SET Pointer TO No chunk;
            ELSE Pointer .number of pointers > 0:
                SET Pointer TO Pointer .pointer [Pointer .number of pointers];
            Free chunk (Aux pointer);   // the actual freeing operation
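A Python sketch of the same idea. Unlike the slide pseudocode it performs the decrement of each contained reference explicitly, which is an assumption about where that step belongs; max_depth is added only to show how deep the recursion goes.

```python
class Chunk:
    def __init__(self, name, pointers=()):
        self.name = name
        self.reference_count = 1
        self.pointers = list(pointers)

freed = []
max_depth = 0   # deepest recursion level reached, for illustration only

def free_recursively(pointer, depth=1):
    global max_depth
    max_depth = max(max_depth, depth)
    while pointer is not None:
        if pointer.reference_count != 0:
            return
        # Recurse for every pointer except the last one.
        for child in pointer.pointers[:-1]:
            if child is not None:
                child.reference_count -= 1
                free_recursively(child, depth + 1)
        aux = pointer
        if pointer.pointers:
            # The last pointer is handled by the loop instead of a call.
            pointer = pointer.pointers[-1]
            if pointer is not None:
                pointer.reference_count -= 1
        else:
            pointer = None
        freed.append(aux.name)   # the actual freeing operation

# A linear list of 5000 chunks, far deeper than Python's default
# recursion limit of 1000.
head = None
for i in range(5000):
    head = Chunk(i, [head] if head is not None else [])
head.reference_count -= 1    # the last outside reference is deleted
free_recursively(head)
```

The whole list is freed at recursion depth one, because every chunk's only pointer is also its last pointer.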
12 On the other hand ... Serious Drawbacks
- 1) Reference Counting cannot reclaim cyclic data
structures
- The reference count of a was two and drops to one
when the pointer from the program data area is
released, but the chunk cannot be reclaimed
because the remaining reference comes from inside
the cycle
- Result: Memory Leaks
[Figure: heap containing chunks a, c, d and f, each with reference count 1]
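The leak can be reproduced with a short sketch. The order of the chunks within the cycle (a to d to f to c and back to a) is an assumption, since only the chunk names and counts are known; the helper names are hypothetical.

```python
class Chunk:
    def __init__(self, name):
        self.name = name
        self.reference_count = 0
        self.pointers = []
        self.freed = False

def add_reference(target):
    target.reference_count += 1
    return target

def delete_reference(target):
    target.reference_count -= 1
    if target.reference_count == 0:
        target.freed = True
        for child in target.pointers:
            delete_reference(child)

a, c, d, f = (Chunk(n) for n in "acdf")
# Assumed cycle: a -> d -> f -> c -> a
a.pointers.append(add_reference(d))
d.pointers.append(add_reference(f))
f.pointers.append(add_reference(c))
c.pointers.append(add_reference(a))
root = add_reference(a)    # pointer from the program data area: count of a is 2

delete_reference(root)     # count of a drops to 1, never to 0
leaked = [ch.name for ch in (a, c, d, f) if not ch.freed]
```

No chunk in the cycle is reachable from the program any more, yet none of them is freed.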
13 ... more Serious Drawbacks
- 2) Reference Counting is inefficient due to the
amount of pointer monitoring it must do.
Other techniques don't require pointer monitoring
and reclaim garbage chunks only when needed.
- 3) The final problem is memory fragmentation.
The free memory is reclaimed but remains
fragmented. One could do compaction in principle
but few do.
- Still, Reference Counting is a popular technique
for managing small numbers of data structures in
handwritten software (e.g. the UNIX kernel's
recovery of file descriptors).
14 Mark and Scan (also called Mark and Sweep)
- Mark and Scan garbage collection is very
effective. It frees all the memory that can be
freed.
- When combined with compaction it also provides
the largest possible chunk of free memory
- Two Phase Method
- 1) Marking Phase: marks all chunks that are still
reachable, and
- 2) Scan Phase: scans the allocated memory,
considers free all chunks that are not marked
reachable, and makes them available again
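A minimal sketch of the two phases, assuming the heap is a plain list of chunk objects and the root set is a list of directly reachable chunks. The marking here is the simple recursive version; the names are hypothetical.

```python
class Chunk:
    def __init__(self, name):
        self.name = name
        self.pointers = []
        self.marked = False
        self.free = False

def mark(chunk):
    """Marking phase: mark everything reachable from chunk."""
    if chunk.marked:
        return                    # already visited, so cycles terminate
    chunk.marked = True
    for child in chunk.pointers:
        mark(child)

def scan(heap):
    """Scan phase: unmarked chunks are garbage, marked ones are reset."""
    for chunk in heap:
        if chunk.marked:
            chunk.marked = False  # ready for the next collection
        else:
            chunk.free = True

def collect(root_set, heap):
    for root in root_set:
        mark(root)
    scan(heap)

a, b, c, d = (Chunk(n) for n in "abcd")
a.pointers.append(b)
c.pointers.append(d)      # c and d form an unreachable cycle
d.pointers.append(c)
collect([a], [a, b, c, d])
```

Unlike reference counting, the unreachable cycle c, d is reclaimed, because reachability is computed from the root set rather than from local counts.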
15 Marking 1
- Based on two principles
- 1) Chunks reachable through the root set are
reachable, and
- 2) Any chunk reachable from a pointer in a
reachable chunk is itself reachable
- We are assuming that the root set resides in a
program data area or the topmost activation
record, and a data type description for it has
been constructed and made available by the
compiler
- Chunks are marked reachable in a simple recursive
depth-first search, guaranteed to terminate
because chunks are marked only once and are
finite in number
16 Marking 2
- Marking requires another bit to be added to the
chunk, the MARKED BIT, in addition to the FREE BIT
- The main problem with a recursive process like
this is that it needs a stack of unknown size just
at a time when memory is limited (after all,
garbage collection has been triggered)
- Where to put the stack? is the question. A
clever and workable answer is to distribute the
required memory throughout the chunks
- One pointer and a small counter are sufficient
- The pointer points back to the parent chunk
- The counter counts how many pointers have already
been processed in the present chunk
17 Marking the third child of a chunk
- This technique costs room for one pointer, one
counter, and one bit per allocated chunk.
Whether or not this is a problem depends on the
average size of a chunk. We will see a way to
mark the directed graph of all reachable chunks
without using space for the extra pointer in a
brief discussion of the Schorr and Waite
algorithm.
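Marking with the stack distributed over the chunks might look like the sketch below. The field names parent and counter are hypothetical stand-ins for the extra pointer and counter in each chunk.

```python
class Chunk:
    def __init__(self, name):
        self.name = name
        self.pointers = []
        self.marked = False
        self.parent = None    # the extra pointer: back to the parent chunk
        self.counter = 0      # number of pointers already processed

def mark_without_stack(root):
    root.marked = True
    root.parent = None
    root.counter = 0
    current = root
    while current is not None:
        if current.counter < len(current.pointers):
            child = current.pointers[current.counter]
            current.counter += 1
            if child is not None and not child.marked:
                child.marked = True
                child.parent = current   # remember where to return to
                child.counter = 0
                current = child          # descend without a call
        else:
            current = current.parent     # all children done: go back up

# A chain of 5000 chunks whose last chunk points back to the first.
chunks = [Chunk(i) for i in range(5000)]
for first, second in zip(chunks, chunks[1:]):
    first.pointers.append(second)
chunks[-1].pointers.append(chunks[0])
unreachable = Chunk("u")
mark_without_stack(chunks[0])
```

The chain is deeper than Python's default recursion limit, which the loop handles because the return path lives in the chunks themselves.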
18 Scanning and freeing
- Freeing the Unreachable Chunks is now easy
- Using the lengths noted in the chunks we step
through memory from chunk to chunk, checking
whether each has been marked reachable (if
reachable we clear the marked bit, and if not we
set the free bit)
- We can also, relatively easily, merge adjacent
free chunks
- We keep a pointer F to the first free chunk, find
and note its size, and as long as we keep meeting
free chunks we just add up their sizes until we
run into a chunk that is in use. Then we update
the administration of the chunk pointed to by F
to the total size of the chunks we've encountered
- Repeat on encountering the next free chunk
- The result of Mark and Scan is a heap in which ALL
chunks marked in use are reachable and each pair
of free chunks is separated by chunks in use.
This is as good as it gets unless you move chunks
in a Compaction Phase so that all free chunks are
merged into one large chunk.
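A sketch of the scan with merging of adjacent free chunks. It assumes the heap is modeled as a list of chunk records in address order, and it does the freeing and the merging in a single pass. The last element of result plays the role of the pointer F while it is a free chunk.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    size: int
    marked: bool = False
    free: bool = False

def scan_and_merge(heap):
    """Return the chunk list after freeing and merging unmarked chunks."""
    result = []
    for chunk in heap:
        if chunk.marked:
            chunk.marked = False           # reachable: keep, reset the bit
            result.append(chunk)
        else:
            chunk.free = True              # unreachable or already free
            if result and result[-1].free:
                result[-1].size += chunk.size   # extend the free chunk at F
            else:
                result.append(chunk)            # start a new free chunk
    return result

heap = [Chunk(16, marked=True), Chunk(8), Chunk(24),
        Chunk(32, marked=True), Chunk(8, free=True), Chunk(8)]
result = scan_and_merge(heap)
sizes = [ch.size for ch in result]
free_flags = [ch.free for ch in result]
```

The two free runs become chunks of 32 and 16 units, separated by the chunk still in use.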
19 Pointer Reversal: marking without using stack
space
- We can reduce the pointer overhead almost
entirely by using a method called Pointer
Reversal
- It allows one to visit all the nodes of a directed
graph without using additional stack space for
the graph; also called the Schorr and Waite
algorithm after its inventors (Schorr and Waite,
1967)
- How does it work? (the basic observation)
- When the marking algorithm is working on chunk C,
it finds the pointers in C, which point to the
children of C, and visits them one by one. If the
marking algorithm has gone off to visit the nth
child of C, say D, then when it returns from its
visit it still holds a pointer to D, the chunk it
just left, and that is the same pointer that
resides in the nth pointer field of C. So while
visiting the nth child of C, the contents of the
nth pointer field in C are redundant - AHA!
20 Pointer Reversal 2
- Schorr and Waite graph marking takes advantage of
this redundancy to store the return pointer in
the chunk rather than on the stack
- The algorithm maintains two auxiliary pointers:
Parent pointer and Chunk pointer
- Chunk pointer points to the chunk being processed
- Parent pointer points to its parent
- Each chunk also contains a Counter Field which
records the number of the pointer in the chunk
that is being followed, starting with 0 and
finishing when it has reached the total number of
pointers in the chunk
21 Schorr and Waite: Arriving at C
The Situation when the processing of chunk C
begins! We assume that the processing of
pointers in C proceeds until we reach the nth
pointer, pointing to D
22 Moving to D
To move to D we circularly shift the contents of
the Parent pointer, the Chunk pointer, and the
nth pointer field in C, using a temporary
variable Old parent pointer.
[Figure labels: P, Parent pointer, nth pointer, Chunk pointer, D, 0]
23 the hoochey koochey ...
    // C is pointed to by Chunk pointer.
    SET Old parent pointer TO Parent pointer;
    SET Parent pointer TO Chunk pointer;
    // C is pointed to by Parent pointer.
    SET Chunk pointer TO n-th pointer field in C;
    SET n-th pointer field in C TO Old parent pointer;
This results in the return pointer to the parent P
of C being stored in the nth pointer field of C,
which normally points to D. The situation is now
equivalent to our first arrival at C, except that
C is now the parent and D is the chunk being
processed.
24 Return to D
This shows the situation when we are about to
return from visiting D. The only difference is
that the counter in D has reached the number of
pointers in D. Now we have to circularly shift
back the pointers and increment the counter in C.
25 Gett'in outta Dodge: returning from D
    // C is pointed to by Parent pointer.
    SET Old parent pointer TO Parent pointer;
    // C is pointed to by Old parent pointer.
    SET Parent pointer TO n-th pointer field in C;
    SET n-th pointer field in C TO Chunk pointer;
    SET Chunk pointer TO Old parent pointer;
    // C is pointed to by Chunk pointer.
The whole fancy footwork is then repeated for the
(n+1)-th pointer in C until all children of C have
been visited. Then we return from C.
[Figure labels: P, C, D, Cn, pp, opp, cp]
26 About to return from C
What we've seen is a clever technique for
avoiding a stack while visiting all the nodes
in a graph. To avoid looping, the nodes need to
be marked at the beginning of a visit, and a node
that is already marked must not be visited again.
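Putting the two circular shifts and the marking rule together gives roughly the following sketch. The field names are hypothetical, each chunk's counter is assumed to start at 0, and resetting the counter on the way out is an addition so the chunk is ready for a later collection.

```python
class Chunk:
    def __init__(self, name, n_pointers):
        self.name = name
        self.pointers = [None] * n_pointers
        self.marked = False
        self.counter = 0      # index of the pointer currently being followed

def mark_by_pointer_reversal(root):
    if root is None or root.marked:
        return
    parent, chunk = None, root
    chunk.marked = True                     # mark at the start of the visit
    while True:
        if chunk.counter < len(chunk.pointers):
            n = chunk.counter
            child = chunk.pointers[n]
            if child is None or child.marked:
                chunk.counter += 1          # nothing to visit through this pointer
            else:
                # Moving to the child: the n-th field now holds the way back.
                chunk.pointers[n] = parent
                parent, chunk = chunk, child
                chunk.marked = True
        else:
            chunk.counter = 0               # assumption: reset for the next collection
            if parent is None:
                return                      # back at the root: done
            # Returning: shift the pointers back and restore the n-th field.
            n = parent.counter
            old_parent = parent
            parent = old_parent.pointers[n]
            old_parent.pointers[n] = chunk
            chunk = old_parent
            chunk.counter += 1

a, b, c, d, u = (Chunk(n, 2) for n in "abcdu")
a.pointers = [b, c]
b.pointers = [d, a]       # cycle back to a
c.pointers = [d, None]    # d is shared by b and c
u.pointers = [a, None]    # u itself is unreachable
mark_by_pointer_reversal(a)
```

After the traversal every pointer field holds its original value again, which is what makes the temporary reversal safe.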
27 Next time ...
- We'll start looking at Chapter 6, which is the
last chapter we'll be looking at ...
- Paradigm Specific Techniques
- Specifically IMPERATIVE and OBJECT ORIENTED
PROGRAMS
- We'll be covering the material from
- Section 6.0 to 6.2.3.2, 28 pages
- Section 6.2.4 to 6.2.10, 17 pages, and
- Section 6.4 to 6.4.2.3, 19 pages, for a total of
64 pages
28 Homework for Week 11
29 References
- Text: Modern Compiler Design