Title: Type Compatibility

Provided by: WakeF

Transcript and Presenter's Notes
1
Type Compatibility
  • Another type checking issue: what are equivalent
    types?
  • Need to be concerned with how the type is named
    and/or defined
  • Equivalence by name -> name type compatibility
  • Equivalence through definition -> structure type
    compatibility

2
Type Compatibility
  • Name Type Compatibility
  • Two variables have compatible types only if they
    are defined in declarations that use the same
    type name.
  • Easy to implement
  • Checking of type bindings stored in symbol table
  • Safe
  • "different names, different meanings"

3
Type Compatibility
  • Strict name type compatibility is very
    restrictive
  • The following is an Ada declaration:
  • type Indextype is range 1..100;
  • count : Integer;
  • index : Indextype;
  • If Ada used strict name type compatibility, we
    could not assign count to index or vice-versa

4
Type Compatibility
  • Name type compatibility clearly makes sense in
    one direction
  • index := count should be illegal (as Indextype is
    a subset of the integers)
  • count := index should be alright, so we may need
    to augment name type compatibility to allow this
    to happen

5
Type Compatibility
  • Structure type compatibility
  • Two variables have compatible types only if their
    types have identical structures
  • Usually, for aggregates, requires that grouping
    type is composed of same types in same order
  • Same memory layout
  • More flexible than name compatibility, but
    difficult to implement.
  • Implementation
  • Instead of just checking names in the symbol
    table, the entire structure of two types must be
    compared.

6
Type Bindings
  • Type binding also provides us with information on
    how much memory is required for a variable
  • Primitive types: specific number of bytes
  • int x -> 4 bytes
  • Aggregate types: sum over the requirements from
    grouping multiple primitives or other aggregates
  • int x[100] -> 400 bytes

7
Memory Layouts
Note layout of structs: supports additive
movement through structs and arrays
8
Type Compatibility
  • Checking type structures
  • Arrays?
  • Same data-type, same length?
  • Same data-type, same length, different array
    indices?
  • Same data-type, different length?
  • Self-referencing types?
  • struct LinkedListNode { int data; LinkedListNode
    *next; };

9
Type Compatibility
  • Checking type structures
  • Are these two types compatible?
  • struct person { string name; int age; };
  • struct vehicle { string name; int numberOfTires; };
  • Are these two types compatible?
  • struct PersonType1 { int age; string name; };
  • struct PersonType2 { string name; int age; };
  • How do you differentiate?
  • struct person { string name; int age; };
  • struct vehicle { string name; int age; };
  • Different names generally mean we desire different
    abstractions; maybe structure compatibility is
    too flexible?

10
Type Compatibility
  • Languages you are using
  • In general, Java and C++ use name type
    compatibility
  • Plain old vanilla C, for all types except structs
    and unions, uses structural type compatibility

11
C Type Compatibility
  • In most cases, plain, original C uses structural
    type compatibility
  • Every struct and union declaration creates a new
    type incompatible with other types, even if they
    are structurally the same.
  • However, structs and unions declared in separate
    files do use structural type equivalence.
  • Any type defined with typedef is equivalent to
    its parent type (as typedef just provides an
    alias).

12
Example of C structural compatibility vs C++ name
compatibility
Same program written in C and C++. Enums are
constructions over integers (each enum item maps
to an integer). C (structural): they're both ints!
Feel free to assign away! C++ (name): two
different things (Suits and Colors).
Differences arise when compiled using the
respective compilers
13
Coercion/Casting
  • Most modern languages support automatic coercion
    between types
  • Coercion: automatic translation of a type to
    another type to fulfill type checking
    requirements
  • A more specific type is translated to the more
    general type
  • int i = 2; double v = i;
  • Truck t = new Truck(); Automobile A = t;
  • Guaranteed that operations defined on more
    general type are defined on more specific type,
    as all items of specific type are also items of
    general type
  • Doubles are a larger set of numbers than
    integers.
  • Automobiles are a larger set of vehicles than
    Trucks.

14
Coercion/Casting
  • Think of coercion as automated casting.
  • Casting: programmer-specified translation of
    types
  • Translation from more general to more specific
  • double c = 2.5; int i = (int)c;
  • Truck t = (Truck)myVector.elementAt(0);
  • Java containers store Objects, the super-class of
    everything (most general)

15
Coercion/Casting
  • Moving to a more general type: widening
    conversion
  • Widening is almost always safe (not losing
    information), so it is usually performed
    automatically.
  • Moving to a more specific type: narrowing
    conversion
  • Narrowing can lose information (preciseness of a
    number: 2.5 as a double vs 2 as an int), so it
    usually requires programmer request.

See pg. 323 (Section 7.4) for more info on this
topic
16
Coercion/Casting
  • Liskov Substitution Principle
  • I view it as an argument for why automatic
    coercion is safe
  • "Let q(x) be a property provable about objects x
    of type T. Then q(y) should be true for objects y
    of type S where S is a subtype of T"
  • From Wikipedia: based on the idea of
    substitutability; that is, "if S is a subtype of
    T, then objects of type T in a program may be
    replaced with objects of type S without altering
    any of the desirable properties of that program
    (e.g. correctness)"

http://en.wikipedia.org/wiki/Liskov_substitution_principle
17
Implementation of casting/coercion
Casting instructions are hardware-based; a good
thing.
18
Semantic checks
Why is this check required? Can it raise runtime
errors?
19
Reinterpret casts
  • reinterpret_cast
  • static_cast
  • dynamic_cast: used in subtype
    casts, like Java;
    does a runtime check

20
Coercion/Casting
  • While C/C++/Java do support automatic coercion,
    not all languages do
  • An example?
  • ML

21
Location Bindings
  • Location binding: binding a variable to a
    particular location in memory
  • Also referred to as storage binding
  • Two parts of the process, with familiar terms:
  • Allocation: reserving a block from a pool of
    available memory
  • Deallocation: returning a bound block to the pool
    of available memory

22
Pools of Memory
  • Two areas where memory can be allocated from:
  • Stack
  • Heap
  • Stack is allocated in a strict order
  • Heap is much more like a free pool of memory
  • Hope they don't collide with each other

[Diagram: Heap and Stack at opposite ends of the
memory allocated to the program, growing toward
each other]

Another example (Windows related):
http://www.nostarch.com/download/greatcode2_ch8.pdf
23
Lifetime
  • The period for which a variable is bound to a
    particular memory address is its lifetime.
  • There are four common storage/lifetime bindings
    for variables:
  • Static
  • Stack-dynamic
  • Explicit heap-dynamic
  • Implicit heap-dynamic

24
Static Variables
  • Static variables
  • Bound to memory cells before execution, released
    at termination
  • Compile time binding of storage
  • Placed in a spot that can be accessed directly;
    the rest of the code can ask for that spot
    specifically
  • Commonly allocated from (around) the heap area
  • Used for global variables in languages that
    support globals, and for true static variables in
    C (where value is persistent across function
    calls)

25
Static Variables
  • Fast to use: no runtime overhead to
    create/delete, fixed direct addressing
  • But, if you have only static variables in your
    language, you can't support recursion
  • In recursion, same variables are used multiple
    times, and previous uses are still important and
    must be maintained

26
Stack Dynamic Variables
  • Run time binding of storage
  • Allocated when declaration is encountered
  • Deallocated when execution moves out of the
    scope of the declaration (no longer visible)
  • Types are already statically bound via
    compilation
  • Used for allocation of local variables in
    subprograms
  • Usually all local variables, even if declared in
    middle of function, are allocated space at start
    of function call
  • Deallocation when subprogram terminates
  • Allocated from stack part of memory
  • We'll talk about stack management more later
    (Chapter 10)

27
Stack Dynamic Variables
  • Important for recursion
  • Allows each recursive call to allocate memory
    from the stack for the variable instances in that
    particular call to the subprogram
  • Disadvantages
  • Overhead of allocation, deallocation at runtime
    (every method call)
  • Not that expensive, though: the compiler can ask
    for a whole chunk, as it can precompute the
    amount it needs (again, wait until Chapter 10)
  • Requires indirect addressing (relative position
    in stack)
  • Don't know where your method is being put on the
    stack until the method starts
  • Does not allow history-sensitive variables like
    static does

28
Explicit Heap-Dynamic Variables
  • Allocated, deallocated explicitly by the
    programmer via special instructions
  • Referenced through pointer/reference variables
  • Indirect addressing (2 memory accesses)
  • Run time binding of storage

29
Explicit Heap-Dynamic Variables
  • C++: new and delete statements, usable on all
    types (scalars, aggregates)
  • int *intnode;
  • intnode = new int;
  • delete intnode;
  • Java: every object (instance of a class)
  • PrintWriter pw = new PrintWriter();
  • // no delete in Java; we'll see this again
    (Chapter 6)

30
Explicit Heap-Dynamic Variables
  • Advantages
  • Useful for dynamic structures (linked lists,
    trees) that can adapt to a program's data
    requirements
  • Disadvantages
  • Multiple memory accesses for pointers
  • Difficulty of programming with pointers correctly
  • Heap management: we'll come back to this; not
    trivial (Chapter 6)

31
Implicit Heap-Dynamic Variables
  • Bound to heap storage only when assigned values
  • Essentially, these are dynamically typed
    variables, where all features (type, value,
    location) are bound upon assignment
  • JavaScript example again:
  • list = [10.2, 3.5];
  • list = 47; -> reallocated storage?
  • list = [10.2, 3.5, 28.2]; -> reallocation
    (bigger)
  • While flexible, suffers from the usual dynamic
    binding problems discussed earlier, as well as
    heap management problems mentioned on previous
    slide

32
Examples
33
Constants
  • Constants are interesting, two key ways of
    implementing
  • Placement in special read only memory
  • Compiler verification won't allow changes after
    the constant is defined
  • Any guess on what C does?

34
Example: Constant
35
Pointers
  • Pointer definition
  • A data object (variable) whose value is
  • The memory location of another data object, or
  • Null, a general term for a pointer to nowhere
  • Pointers, when available, are at the same level
    as other types
  • Can declare pointer variables
  • Can hold pointers in an array
  • Can have a pointer as part of an aggregate
    data structure (ListNode for example)

36
Pointer Specifications
  • Attributes
  • Pointer variable name
  • Type of data being pointed to
  • double *myDoublePointer;
  • Truck *myTruckObjectPointer;
  • Values pointer can take on
  • Any addressable memory address
  • Usually any integer
  • (64 bit architectures?)

37
Pointer Specifications
  • Operations: Declaration
  • Sets up space for the pointer
  • Could come from stack: stack-dynamic
  • (if pointer is a local variable)
  • Could come from heap: heap-dynamic
  • (if requested at runtime)

38
Pointer Specifications
  • Operations Assignment with object creation
  • A very common use of pointers is when objects and
    variable are heap-dynamic.
  • C++ Syntax:
  • double *myDoublePtr = new double;
  • RHS requests allocation of a fixed-size data
    object
  • The return value from the new statement is the
    address of the data object just created.
  • Address is stored in the pointer variable

[Diagram: a pointer variable holding the address
of the actual data object (2.345) elsewhere in
memory]
39
Pointer Specifications
  • Operations Assignment with object creation
  • double *myDoublePtr = new double;
  • Data objects created on the RHS are anonymous
  • Not bound to a name in the program
  • Thus, they can be lost as well
  • Creation and assignment can be performed at any
    time during program execution

40
Review Question
  • Imagine this is a part of a method you are
    writing
  • List l = new List();
  • Which part(s) are stack dynamic and which are
    heap dynamic?
  • Which part(s) are allocated at method startup
    and which are allocated at statement execution
    time?

41
Pointer Specifications
  • Operations Dereferencing
  • The operation that allows the data referenced by
    the pointer to be accessed
  • In C/C++, this uses the * (asterisk) operator
  • Syntactic sugar: as a shortcut to a field/method
    of an aggregate data structure, can use the ->
    (arrow) operator
  • cout << *myDoublePointer << endl;
  • cout << (*myAutomobilePointer).getYear() << endl;
  • cout << myAutomobilePointer->getYear() << endl;

42
Pointer Specifications
  • Operations Assignment with addressing
  • Pointers can be used to reference variables that
    aren't created using new
  • To obtain the address of an arbitrary variable,
    use the addressing operator.
  • C/C++: use & (ampersand)
  • int myIntVariable = 5;
  • int *myIntPointer = &myIntVariable;
  • *myIntPointer = *myIntPointer + 1;

43
Pointer Specifications
  • Operations Arithmetic
  • Mathematical operations that work directly on the
    pointer
  • int *ptr = new int;
  • ptr = ptr + 1;
  • Changes the value of the pointer, not what is
    being referenced
  • Doesn't necessarily update the address by one
    byte (as it syntactically would appear)

44
Pointer Specifications
  • Pointers reference a particular type
  • i.e. an integer pointer references an item 4
    bytes long
  • For all fixed-size data structures (primitives,
    classes, structs, unions, etc.), the compiler can
    figure out the size beforehand
  • Pointer arithmetic moves the pointer up by the
    size of the data item being pointed to
  • (i.e. it moves completely over that item)

45
Example
46
Pointer Specifications
  • Many systems implement arrays using pointers
  • int list[5]; int *ptr = list;
  • // int *list = new int[5]; (dynamic allocation)
  • cout << *ptr << endl; // print list[0]
  • cout << *(ptr + 1) << endl; // print list[1]
  • // print first 5 items
  • int i = 0; while (i < 5) { cout << *(ptr + i)
    << endl; i++; }
  • void updateArray(int *list); // array parameter

47
Review Question
  • What is the difference in outputs between these
    two sets of code?
  • int *list = new int[3]; list[0] = 24; list[1]
    = 33; list[2] = 52;
  • for (int i = 0; i < 3; ) {
  •     *list++;
  •     i++;
  • }
  • // print list next
  • int *list = new int[3]; list[0] = 24; list[1]
    = 33; list[2] = 52;
  • for (int i = 0; i < 3; ) {
  •     (*list)++;
  •     i++;
  • }
  • // print list next

48
Pointer Specifications
49
Pointers with Arrays
  • Are there advantages?

50
Pointer specifications
  • Operations: Object deletion through pointer
  • Free up memory from dynamically allocated objects
    by calling delete on a pointer to the object
  • double *myDoublePtr = new double;
  • delete myDoublePtr;
  • Truck *myTruckPtr = new Truck();
  • delete myTruckPtr;
  • int *myIntegerArray = new int[10];
  • delete[] myIntegerArray;

51
Object deletion example
52
Pointer Specifications
  • Most languages specify that a pointer points to a
    particular type
  • Would it be reasonable for another use of
    pointers to be allowed to point to any type?
  • What would this require for the language to do?

53
Pointer Specifications
  • Object oriented languages allow a third technique
    for pointers
  • Can point to any type that is a subtype of the
    original pointer type
  • Doesn't hold for primitive type-subtype
    (double/int) relationships

What does this mean the system has to do
for pointers to objects? Runtime
bindings? Dynamic method calls?
54
void pointers
  • In C/C++, can use void pointers (void *)
  • Syntax for dealing with pointers where the
    referenced type is unimportant
  • Used in malloc, free (old school memory
    allocation)
  • Used in functions where we want arbitrary types
    to be accepted
  • Have to cast to appropriate type pointer before
    dereferencing or doing arithmetic

55
void pointers Examples
  • stdlib.h defines qsort (quicksort) as
  • void qsort(void *base, size_t num_elements,
    size_t element_size, int (*compare)(void const
    *a, void const *b));
  • Takes an array of any type of object; you just
    have to make sure you send it how to compare
    two of those objects.
  • compare must return < 0 if a < b, 0 if a == b,
    > 0 if a > b

56
void pointers Examples
57
Function Pointers
  • C/C++ also allow you to use function pointers
  • /* function returning pointer to int */
  • int *func(int a, float b);
  • /* pointer to function returning int */
  • int (*func)(int a, float b);

58
Function Pointers
  • void qsort(void *base, size_t num_elements,
    size_t element_size, int (*compare)(void const
    *a, void const *b));
  • qsort(studentArray, numberOfStudents,
    sizeof(Student), compareByName);
  • Array names and function names are converted to
    addresses

59
Function pointers
60
(No Transcript)
61
Problems with Pointers
  • Dangling Pointer
  • When a pointer continues to hold the memory
    address of a heap-allocated variable that has
    been deallocated.
  • Why is this a problem?
  • Memory pointed to could now be in use by another
    variable.

62
Dangling Pointers
  • New variable in old memory spot may have
    different type than previous, almost definitely
    new value
  • Type checking is going to OK everything: "oh,
    he's adding two integers through integer
    pointers, that's fine"
  • But, meaning of underlying data may be different
  • Writing into that spot could mess up the other
    variable

63
Dangling Pointers
  • Code which results in dangling pointers?
  • int *p1 = new int;
  • int *p2 = p1;
  • delete p1;
  • Some languages will set p1 to 0, but won't touch
    p2 (not my version of C++ though; it doesn't
    even clean up p1).

64
Problems with Pointers
  • Lost heap dynamic variables
  • When the address of the variable being pointed to
    is lost
  • Why is this a problem?
  • Prevents further use of the variable (don't know
    how to get to it)
  • Can't delete and reuse that part of memory
    (already allocated)

65
Lost Heap Dynamic Variables
  • How do you end up with lost heap dynamic
    variables?
  • int *p1 = new int;
  • p1 = new int;
  • Is there a common name for this?
  • Commonly referred to as a memory leak
  • See memory eaten away

66
Solutions to Pointer Problems
  • Ways to work around dangling pointers:
  • 1. Don't allow the user control (Java)
  • Requires system-controlled memory management,
    garbage collection
  • 2. Safety algorithms
  • These don't prevent the problem, but prevent the
    user from fiddling with memory they shouldn't by
    throwing an error

67
Safety Algorithms
  • If you trust the programmer, dangling pointers
    can be resolved if
  • A programmer always sets all pointers to a
    variable to null after the variable is
    de-allocated
  • A system is likely to only set one to zero (the
    one through which the deletion occurred), rest
    are up to the programmer
  • Letting the system do it all: large overhead of
    recording who is pointing to whom

68
Safety Algorithms
  • A way to work around dangling pointers
  • Tombstones: each heap-dynamic variable, when
    allocated, is also given another memory location
    called the tombstone.
  • This tombstone memory location is a pointer to
    the variable.
  • All user defined pointers to the variable
    actually get the address of the tombstone

69
Tombstones

[Diagram. Old way: each pointer-to-data points
directly at the data. With tombstone: each
pointer-to-data points at the tombstone, and only
the tombstone points at the data.]
70
Tombstones
  • When variable is de-allocated, the tombstone
    remains and is set to null.
  • Only one pointer has to be set to zero (the
    tombstone)
  • All of the pointers that were pointing to the
    variable now hit the tombstone's zero value
  • If a reference is made through any of those
    pointers, they refer to a zero address, which is
    an error

71
Tombstones
  • Tombstones work around the dangling pointer
    problem at the expense of
  • An extra 4 bytes per variable allocated from the
    heap
  • Those extra 4 bytes generally can't be
    deallocated.
  • An extra memory access (another layer of
    indirection) is required every time the variable
    is used
  • Not found in any popular modern languages
  • Maybe it should be, with loads of fast memory
    available

72
Lock and Key for Dangling Pointers
  • When a variable is allocated off the heap
  • Allocate storage for the structure
  • Allocate a memory cell for an integer which holds
    a lock value
  • Return pointer to the variable as a pair
    (integer key, integer address)
  • Key value in pointer is set to lock value

73
Lock and Key
  • When a pointer is copied (with an assignment
    statement),
  • copy the key and address

[Diagram. From the initial new (allocate)
statement: a (key, pointer-to-data) pair
referencing the data, which is stored next to its
lock. From making a copy of the original pointer:
a second (key, pointer-to-data) pair referencing
the same data.]
74
Lock and Key
  • When a pointer is de-referenced
  • Verify the key stored with the pointer matches
    the lock out in memory next to the item being
    pointed to.
  • If the lock and key don't match, throw an error
  • When would the lock and key not match?

75
Lock and Key
  • When a variable from the heap is de-allocated,
    set the lock to an illegal key value
  • Overhead
  • An integer comparison to check lock and key
  • Extra space to hold the lock, which can't be
    de-allocated
  • Extra space to hold the key in each pointer
  • Implemented in versions of Pascal

76
Heap Management
  • Heap
  • Portion of computer memory where space for
    dynamically allocated variables is taken from.
  • Varying levels at which programmer can interact
    with heap.
  • Java everything is handled for you
  • C++: new and delete allow the programmer to ask
    for memory, but the system still controls how the
    heap is managed

77
Heap Management
  • Look at two different implementations
  • Heap as a group of fixed, single size cells
  • More likely to be seen when the language system
    is requesting from the heap: implicit
    heap-dynamic languages
  • Heap as a group of variable-sized segments
  • Required to support programmer requests (arrays
    of arbitrary size, contiguous memory): explicit
    heap-dynamic languages
  • Two primary uses
  • Obtaining memory (allocation) from heap
  • Returning memory (de-allocation) to heap

78
Heaps: Single-Sized Cells
  • Define a cell as a unit that contains space for
    item of interest and a pointer
  • Often implemented as a circular linked list of
    cells
  • Available heap often called the free list

Data Goes Here
79
Heaps: Single-Sized Cells
  • Allocation
  • Remove cell from front of free list
  • De-allocation
  • Attach released cells to the front of the free
    list

[Diagram: the free list before allocation, and the
updated free list after allocation]
80
Heaps: Variable-Sized Cells
  • More applicable to most programming languages
    needs
  • General approach
  • Have AvailableStart pointer initially point to
    a single cell that is sized to be all of
    available free memory.
  • Allocation When a request is made
  • If the cell at the front of the list is large
    enough, break the cell into two pieces, one being
    the requested size and the other being everything
    else
  • For a while, this technique will work fine
  • If the front cell isn't large enough, what to
    do? Let's look at deallocation first
81
Heaps: Variable-Sized Cells
  • Deallocation:
  • Reclaimed, variable-sized cells are added back
    onto the list
  • May check to see if directly adjacent neighbors
    can be coalesced together with these cells, OR we
    might wait to do this until we need to
  • Allocation
  • If the front cell isn't large enough, try the
    next free block(s) on the list until we find one
    that is large enough.

82
Heaps: Variable-Sized Cells
  • This approach does entail list overhead
  • Requires searching through lists to check and see
    if there is a block of appropriate size available
  • May hit a point where we only have lots of small
    blocks sitting around
  • Requires joining small blocks that came from
    adjacent parts of memory back together
  • Any over-allocations also waste space
  • Does this sound familiar? (CSC 241)
  • Internal/external fragmentation

83
Heaps: Variable-Sized Cells
  • Implementation Questions
  • Do you
  • Take the first block that is big enough to handle
    the request (first-fit)
  • Look for the "best fit" block, which could
    require looking at every block?
  • Costly; tends to leave small leftover blocks
  • Do you keep the list of different size blocks in
    sorted order by size?

84
Heaps: Allocation
  • Costs of heap search are based on number of items
    in heap (linear)
  • Some languages maintain heaps for different size
    requests. Why?
  • Searching through a smaller list!
  • Move broken off chunks of a large allocation onto
    smaller lists

85
Heaps: Compaction
  • Compaction
  • Moving items that are already allocated in memory
    to different locations
  • Can free up larger chunks of contiguous space
  • Costly: requires updating all pointers pointing
    to a particular spot in memory
  • Tombstones? Only need to update the tombstones!

86
Heap Management
  • Approach 1: Reference Counters
  • Eager approach: incremental reclamation as soon
    as cells become free
  • Requirements
  • In every cell, additional space has to be
    reserved to hold an integer
  • The integer, the reference count, holds the
    number of pointers pointing to the cell
  • If the reference count ever hits zero, the cells
    can be returned to the free list.

87
Reference Counters
  • The initial allocation of memory and assignment
    of the returned pointer sets the reference count
    to 1
  • Reference count management involves overhead:
    adding code to pointer operations to ensure
    counts are updated
  • Whenever a pointer is connected to the variable,
    including via a copy, the reference count is
    incremented
  • Whenever a pointer is disconnected from a
    variable, the reference count is decremented
  • Disconnects: explicit re-assignment, local stack
    variable disappearance, pointer inside an object
    being cleaned up

88
Reference Counters
  • Reference counter example

[Diagram: ListHead pointing to a chain of three
blocks, each with reference count 1, the last
pointing to null]

Remove ListHead pointer: block 1 ref count goes to
0; return block 1 to free list. Block 2 ref count
goes to 0; return block 2 to free list. Block 3 ref
count goes to 0; return block 3 to free list.
89
Reference Counters
  • Reference counters can help work around dangling
    pointers
  • Even if user calls free through one pointer, the
    reference counter will see that there are other
    pointers directed towards the data
  • Forces programmer to assign all pointers
    elsewhere (to 0?) and then call free before free
    actually works, disposing of data

90
Reference Counters
  • Reference counter concerns
  • Additional instruction overhead for reference
    management (previous slide)
  • In some languages (LISP) nearly every instruction
    causes the system to change pointers around
  • Increased memory usage
  • Reference counter on each item allocated
  • Handling circularly connected cells?

91
Reference Counters
  • Circularly connected cells
  • Every cell in the list has a reference count of
    at least 1, so the cells can never be reclaimed:
    their own circular references keep them alive

[Diagram: ListHead pointing into a circularly
linked list]

Without the circular link, setting the ListHead
pointer to null would cause a cascade of cleanups.
With the circular link, the cells sit there with
ref counts of 1. There are alternatives, but they
are not as intuitive to program.
92
Reference Counters
  • Reference counts can also help us implement
    dangling pointer protection: provides a means for
    removing tombstones
  • If all pointers to tombstones have been moved
    elsewhere, the tombstone can be freed.

93
Garbage Collection
  • Garbage Collection: a periodic process
  • Garbage accumulates, and is cleaned up at regular
    intervals or as necessary
  • Remember, ref counting was incremental: cleaned
    up as soon as possible
  • A garbage collector has to examine the heap,
    find anything allocated but not actively being
    used, and free up that memory.

94
Garbage Collection
  • Every heap cell has an extra bit or field
    (indicator) that is exploited by the garbage
    collector
  • 3 phase collection process
  • Initialize
  • Trace and Mark
  • Sweep and Clean

95
Garbage Collection
  • Initialize: every cell in the heap is marked as
    garbage in its indicator field
  • Trace and Mark: a trace from every active pointer
    in the program is made to see if a cell is
    reachable from a valid pointer. If so, the
    indicator is set to not-garbage.
  • "Active" needs a definition
  • Sweep and Clean: return to the free list any
    cells still marked as garbage.

96
Garbage Collection
  • An element is active if it is
  • Referenced by a pointer on the function call
    stack
  • Referenced by a pointer from another active part
    of the heap

97
Garbage Collection
References from function call stack
References from inside of objects
Basic view of results of the mark-and-trace GC
algorithm for Java (circa 1998):
http://java.sun.com/developer/technicalArticles/ALT/RefObj/
98
Garbage Collection
  • GC costs depend on
  • Total size of heap memory
  • Initialization
  • Sweep and clean
  • Number of active pointers
  • Trace and Mark

99
Garbage Collection
  • Within a process, GC is often implemented as a
    thread
  • Stops other parts of program from executing when
    it uses CPU
  • Should not be interrupted itself
  • If the GC is interrupted, the whole process
    should be restarted as the other code executed
    may have made changes to memory (which GC worked
    hard to gather statistics on).

Why?
Often times, when GC runs, your Java program
hiccups
100
An Application of Garbage Collection Java
  • Original versions of Java used Trace and Mark on
    a large heap
  • Now allows generational collection
  • Exploits common properties of programs and
    object lifetimes.

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
101
An Application of Garbage Collection Java
  • Java uses multiple generations to store objects
  • New objects are stored in the Young generation
  • Object that exist long enough are moved to the
    Tenured generation
  • Young and Tenured are GC'ed when they fill up.
  • Exploits infant mortality: many objects are
    deleted soon after being allocated
  • Finally, some objects, known to exist through the
    whole program, are in the Permanent generation,
    which never needs to be garbage collected.

Have we seen this general idea before?
Decomposing your work area into smaller pieces?
102
Garbage Collection
  • Copying GC
  • Separate heap into two large blocks
  • block A/block B
  • Initially all data is allocated from block A
  • When block A fills up:
  • it is relabeled as block B
  • Copy all directly (function-call-stack-)pointed-to
    items from block B into the new block A
  • Copy all items pointed to by items in block A
    into block A
  • Allocate from the new block A

103
Garbage Collection
  • Implicitly doing marking (as not-garbage) by
    moving
  • Costs: two large blocks of memory reserved for
    each program's heap, one of which is essentially
    empty
  • Benefits
  • No per-object bloat for mark tag
  • No separate clean-up phase
  • Automatic compaction

104
GC Recursion
  • Is recursion an issue for tracing garbage
    collection?
  • If it's a recursive heap object not actively
    pointed to, it will never get marked not-garbage
    and will thus get cleaned up.
  • The collector can tell when it has already marked
    something not-garbage, so it won't get stuck in a
    loop on cycles.

105
Structured Data Types
  • Array (vector) data type
  • Fixed number of components
  • Declared by user with size (C arrays: A[10]), or
    lower, upper bound (Pascal: A[-5..5])
  • Homogeneous in type
  • Declared by user
  • Allocated linearly in memory
  • Managed by system
  • Big question: how is component access
    implemented?

106
Structured Data Types
  • Work under the assumption the user can specify a
    lower bound (such as -5, or 1, or 10)
  • Zero as a lower bound is just an instance of this
    assumption, with some nice properties
  • General formula
  • address(A[I]) = base + ((I - LB) * E)

107
Structured Data Types
  • address(A[I]) = base + ((I - LB) * E)
  • base: starting location of array
  •   - Could be on stack or in heap
  • I: index of interest
  • LB: lower bound on indices
  • E: size of an element

108
Structured Data Types
  • address(A[I]) = base + ((I - LB) * E)
  • Assume indices are -3..3
  • Holds doubles (8 bytes each)
  • Base is 00032
  • Calling A[1]
  • A[1] is actually the 5th element because indices
    start at -3
  • Address = 00032 + ((1 - (-3)) * 8)
  • = 00032 + (4 * 8) = 00064
00032 -3
00040 -2
00048 -1
00056 0
00064 1
00072 2
00080 3
109
Structured Data Types
  • address(A[I]) = alpha + ((I - LB) * E)
  • is equivalent to
  • address(A[I]) = (alpha - (LB * E)) + (I * E)
  • Immediately after allocation, could compute
    (alpha - (LB * E)) once and re-use
  • Use it as a base; the offset is index * size.
    Called the virtual origin (where A[0] would lie)
  • A[0] might not even be valid for accessing!

110
Structured Data Types
  • C/C++
  • Implementation
  • of subscripting

111
Structured Data Types
  • C/C++
  • Implementation
  • of subscripting

Direct addressing
Offset addressing: -24(%ebp) is the base; %eax holds I
(the index)
112
Structured Data Types
  • Multi-dimensional arrays
  • Generalization of single-dimensional (standard)
    arrays
  • Declaration syntax requires size or upper/lower
    bounds for each dimension
  • Accessing a single element requires a subscript
    entry for each dimension
  • Accessing a subarray requires entries for only a
    partial set of dimensions, but they must be
    specified contiguously, starting from the first
    dimension

113
Structured Data Types
  • Multidimensional arrays
  • Memory itself is linear, so map n-dimensional
    into linear format
  • Two major memory layouts
  • Row major
  • Column major

Example 3x3 2-D array
114
Structured Data Types
  • Row Major Order

115
Structured Data Types
  • Column Major Order

Of major languages, only Fortran uses
column-major order
116
Structured Data Types
  • Statically allocated arrays in C: true
    contiguous layout

117
Structured Data Types
  • Dynamically allocated in C: also have to hold
    pointer references

Pointers to data | Actual data
There are some gaps here
118
Structured Data Types
  • Why is knowing order of multi-dimensional arrays
    important?
  • If using pointer operations, what does pointer
    arithmetic get you (over 1 or down 1)?
  • Imagine you need to perform some operation on
    each element in the array, order of work on
    element unimportant to results
  • Accessing elements in order that language stores
    elements is typically more efficient data
    locality
  • Paging for large arrays?
  • Cache loading?

119
Structured Data Types
  • Virtual memory Your program may only have a
    (few) page(s) of memory allocated to it
  • Other pages are stored on disk until needed,
    brought in with others swapped out
  • Large arrays may fill multiple pages.
  • Sequential access is likely to stay in the same
    page.

120
Structured Data Types
for (i = 0; i < 3; i++)
  for (k = 0; k < 3; k++)
    cout << data[i][k];

for (int i = 0; i < 3; i++)
  for (k = 0; k < 3; k++)
    cout << data[k][i];

(diagram: pages touched by each loop; the first, row-order
loop stays within one page at a time, while the second,
column-order loop jumps between pages on successive accesses)
121
Structured Data Types
  • Address in a multi-dimensional array
  • For 2D, general approach is
  • Declaration A[-5..5][0..4] =>
  • LB1 = -5, UB1 = 5, LB2 = 0, UB2 = 4
  • address(A[I][J]) = alpha + ((I - LB1) * S) + ((J - LB2)
    * E)
  • S is the size of a row
  • E is the size of an element
  • How do we come up with S?
  • ((UB2 - LB2 + 1) * E)
  • (number of columns in a row * size of an element)

122
Structured Data Types
  • Row Major Order

Declared A[-1..1][-1..1]. A[0][0]? Holds value 2.
S = (1 - (-1) + 1) * 4 = 3 * 4 => 12 (each 12 bytes is a
new row)
A[0][0] = 100 + ((0 - (-1)) * 12) +
((0 - (-1)) * 4) => 100 + 12 + 4 = 116
Generalizes to higher dimensions; have to take into
account the size of the lower-dimension unit: point, row,
plane, cube, ...
123
Structured Data Types
  • Can still use the same virtual origin trick to cut
    out some repeated computations
  • For a 1-d array: alpha - (LB * E)
  • For a 2-d array: alpha - (LB1 * S) - (LB2 * E)
  • Verification: for the array on the previous slide,
    LB1 is -1, LB2 is -1, S = 12, E is 4
  • alpha - (-1 * 12) - (-1 * 4) => alpha + 16

124
Back to stack management
  • Stack management a few final details
  • Storing old ebp, eip
  • Return values
  • Debugging support
  • Optimization

125
Back to the stack subprograms
  • Activation Records
  • Stores the data state of the subprogram as it's
    executing
  • Created each time subprogram is called
  • Removed each time subprogram completes
  • Note
  • 1 instance of subprogram code segment
  • Multiple instances of subprogram activation
    records

126
Pools of Memory
(diagram: memory allocated to a program on the Intel
architecture, from Address 0 to Address 2048: the heap
near Address 0; the stack, tracked by ESP and EBP, with
unused space adjacent to it; and the code segment for
main and the other functions near Address 2048, with EIP
pointing into the currently executing code)
127
Subprograms
  • At a subprogram call, need to store
  • Old EIP
  • Which instruction to return back to
  • Old EBP
  • Where on the stack to return back to
  • Implementation
  • These are themselves stored directly on the stack

128
Subprograms
  • CALL instruction pushes old EIP on stack
    (implicitly), changes ESP
  • PUSHL EBP pushes old base pointer onto top of
    stack, modifying ESP
  • Updated ESP (top of stack) is set as new EBP
    (base for next function)

129
Subprograms
  • Leaving a function
  • Reset ESP: no longer need local variables
  • Reset EBP: point back to the caller's entries on
    the stack
  • Reset EIP: point back to the next instruction in
    the caller

130
Subprograms
  • Returning from a subprogram call
  • LEAVE is a macro for
  • movl %ebp, %esp
  • Copy the current base pointer value into the
    stack pointer
  • popl %ebp
  • Item on top of stack is put in EBP
  • ESP moves back another 4
  • RET pops old EIP off stack, jumps back to that
    instruction

Essentially undoes all actions from function
setup
131
Subprograms
  • Note that stack memory is not actually scrubbed
    between function calls
  • No re-initialization (setting to all zeros)
  • Just a simple pointer replacement (ESP, EBP)
  • That part's no longer in use
  • Can lead to some tricky debugging situations

132
Subprograms
  • Second function call laid right on top of
    previous call
  • All variables line up in the same place
  • Nothing seems strange at all!

133
Subprograms
  • With this approach
  • Have a stack of activation records.
  • Each AR can reference one below it.
  • There are instruction pointers nestled below each
    function call (the line from the calling
    function)
  • It is exactly this type of information that GDB and
    Java exception handling use to let you trace
    program execution by viewing the function call stack

134
Subprograms
135
Subprograms
  • Return value
  • For simple return values, compiler chooses a
    register and says
  • Callee should store the return value in that
    register
  • Caller should look in that register when using
    return value.

Share eax for return value
136
Subprograms
  • Complicated Return Types

Returning a struct of two ints: 2 registers
used
137
Subprograms
  • Sometimes, the overhead of a function call is
    more than the cost of the instructions themselves
  • Overhead: instructions to manage the function call
  • Saving and resetting state, parameter passing,
    return values
  • Inline functions
  • Can tag functions with the inline keyword
  • Prompts the compiler to use the copy rule to
    replace a call to the function with the function
    code itself
  • In C++, it is up to the compiler to decide whether
    or not it actually inlines (you can suggest, but
    it makes the final decision)

138
Subprograms
139
Left: not inlined. Right: inlined. Required g++
-O2