Title: CSCI 435 Compiler Design
1. CSCI 435 Compiler Design
- Week 6 Class 1
- Sections 3.2.2 to 3.2.2.3
- (pages 245-253)
- Ray Schneider
2. Topics of the Day
- Symbolic Interpretation
- Simple Symbolic Interpretation
- Full Symbolic Interpretation
3. Symbolic interpretation
- Upon execution, control follows one possible path
  through the control flow graph
- Code executed at the nodes represents the RUN-TIME
  semantics of the node, NOT the compile-time
  context relations.
- Ex. attribute evaluation code is concerned with
  the AST and the distribution of information about
  the syntax; AT RUN TIME an if-statement is only
  about a JMP to the THEN-part or the ELSE-part,
  depending on the state of the condition just computed
4. Compile-Time versus Run-Time interests
If_statement (INH successor, SYN first) ->
    'IF' Condition 'THEN' Then_part 'ELSE' Else_part 'END' 'IF'
    ATTRIBUTE RULES
        SET If_statement .first TO Condition .first
        SET Condition .true successor TO Then_part .first
        SET Condition .false successor TO Else_part .first
        SET Then_part .successor TO If_statement .successor
        SET Else_part .successor TO If_statement .successor
[Figure: the AST of the if-statement and the run-time path through it]
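The attribute rules above can be mirrored in a few lines of Python. This is only a sketch: the `Node` and `IfStatement` classes and the `thread_if` helper are hypothetical names invented for illustration, and each sub-part is simplified to a single leaf node.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    """A leaf of the threaded AST (hypothetical, for illustration only)."""
    name: str
    successor: Optional["Node"] = None        # filled in by threading
    true_successor: Optional["Node"] = None   # only meaningful for conditions
    false_successor: Optional["Node"] = None  # only meaningful for conditions

    @property
    def first(self) -> "Node":
        # For a leaf, the first node to execute is the node itself.
        return self

@dataclass(eq=False)
class IfStatement:
    condition: Node
    then_part: Node
    else_part: Node
    successor: Optional[Node] = None  # inherited attribute

    @property
    def first(self) -> Node:
        # SET If_statement .first TO Condition .first
        return self.condition.first

def thread_if(stmt: IfStatement) -> None:
    """Apply the remaining four attribute rules of the if-statement."""
    stmt.condition.true_successor = stmt.then_part.first
    stmt.condition.false_successor = stmt.else_part.first
    stmt.then_part.successor = stmt.successor
    stmt.else_part.successor = stmt.successor

cond, then_p, else_p, after = Node("y > 0"), Node("then"), Node("else"), Node("next")
stmt = IfStatement(cond, then_p, else_p, successor=after)
thread_if(stmt)
```

After threading, following `true_successor` or `false_successor` from the condition and then `successor` traces one run-time path, which is all the run-time side cares about.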
5. Run-time behavior depends on variables
- Much contextual information about variables can
  be deduced statically at compile time by
  simulating the run-time process, using a technique
  called
- SYMBOLIC INTERPRETATION or
- SIMULATION ON THE STACK
- We attach a Stack Representation to each arrow in
  the Control Flow Graph
- This compile-time representation of the run-time
  stack holds an entry for each identifier visible
  at that point in the program
- We are mostly interested in Variables and Constants
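6. Stack representations in CFG of an if-statement
[Figure: control flow graph of an if-statement with a stack representation on each arrow. Condition: y > 0, with a "true" arrow labeled. Annotation: x is initialized and y has the value 5.]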
7. Symbolically interpret an if statement
- Two parameters are provided: a Stack Representation
  and an If node. The routine
  1) symbolically interprets the condition (giving a
     new stack with the condition on top),
  2) unstacks the condition and uses the result to get
     the stack representations at the end of the
     then and else clauses,
  3) merges these two stack representations

FUNCTION Symbolically interpret an if statement (
    Stack representation, If node
) RETURNING a stack representation:
    SET New stack representation TO
        Symbolically interpret a condition (
            Stack representation, If node .condition
        )
    Discard top entry from New stack representation
    RETURN Merge stack representations (
        Symbolically interpret a statement sequence (
            New stack representation, If node .then part
        ),
        Symbolically interpret a statement sequence (
            New stack representation, If node .else part
        )
    )
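The outline above can be tried out in Python. The following is a minimal sketch under heavy simplifying assumptions that are not in the slides: a stack representation is a plain dict from variable name to status, a statement sequence is just a list of the variable names it assigns to, and all function names are invented for illustration.

```python
UNINIT, MAYBE, INIT = "Uninitialized", "May be initialized", "Initialized"

def merge(a, b):
    """Merge two stack representations that hold the same names."""
    return {name: a[name] if a[name] == b[name] else MAYBE for name in a}

def interpret_condition(stack, condition):
    """Return a copy of the stack with an entry for the condition value on top."""
    new_stack = dict(stack)
    new_stack["<condition>"] = INIT
    return new_stack

def interpret_statements(stack, statements):
    """Each statement is modelled as the name of the variable it assigns to."""
    new_stack = dict(stack)  # copy, so both branches start from the same state
    for target in statements:
        new_stack[target] = INIT
    return new_stack

def interpret_if(stack, if_node):
    new_stack = interpret_condition(stack, if_node["condition"])
    del new_stack["<condition>"]  # discard top entry
    # A missing ELSE part is treated as an empty statement sequence.
    return merge(
        interpret_statements(new_stack, if_node["then_part"]),
        interpret_statements(new_stack, if_node.get("else_part", [])),
    )

before = {"x": UNINIT, "y": INIT}
after = interpret_if(before, {"condition": "y > 0", "then_part": ["x"], "else_part": []})
```

Here `x` is assigned only in the then-part, so after the merge its status is `May be initialized`, while `before` is left untouched because every branch works on its own copy.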
8. Reality
- The example is oversimplified; more details are
  needed in actual code
- ex. a check for the presence of an ELSE part,
  since the statement could be an IF...THEN only
- we might need to copy the stack representation to
  pass a copy to each branch
- Symbolic Interpretation was already in use in the
  1960s for type checking in Algol 60 (Naur 1965)
- Next we'll look at checking for uninitialized
  variables using two variants of symbolic
  interpretation
- SIMPLE Symbolic Interpretation (works for
  structured programs and simple properties);
  L-attributes
- FULL Symbolic Interpretation (more general);
  works for general attribute grammars and
  L-attributed grammars.
9. Simple Symbolic Interpretation 1
- Checking for the use of uninitialized variables
- make a compile-time representation of the local
  stack of a routine (possibly including its
  parameter stack)
- follow the representation through the entire routine
- (the representation can be a linked list of (name,
  property) pairs called a PROPERTY LIST)
- The list starts empty, or initialized with the
  parameters and their properties
- IN parameters: Initialized
- INOUT parameters: Initialized
- OUT parameters: Uninitialized
- ALSO A RETURN LIST, which combines the stack
  representations as found at return statements and
  routine exit
(p. 247)
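As a small illustration of how the property list might start out, here is a sketch in Python. The function name `initial_property_list` and the tuple encoding of parameters are assumptions made for this example; the slides only prescribe the initial status per parameter mode.

```python
def initial_property_list(parameters):
    """Build the starting property list from (name, mode) pairs.

    mode is one of "IN", "INOUT", "OUT" (hypothetical encoding).
    """
    status_for_mode = {
        "IN": "Initialized",
        "INOUT": "Initialized",
        "OUT": "Uninitialized",
    }
    return [(name, status_for_mode[mode]) for name, mode in parameters]

property_list = initial_property_list(
    [("n", "IN"), ("buf", "INOUT"), ("result", "OUT")]
)
return_list = []  # filled in later, at return statements and at routine exit
```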
10. Simple Symbolic Interpretation 2
- Then we follow the arrows of the control flow
  graph, updating our list as we go
- At each node we do the appropriate thing, ex.
- DECLARATION: add the declared name to the list
  with its status (initialized or uninitialized)
- IF FLOW OF CONTROL SPLITS: a copy of the list
  follows each path; when the paths combine, the
  lists are merged (generally the merger is obvious,
  except when a variable gets a value in one path
  and not in the other; then label it May be
  initialized)
- ASSIGNMENT: set the left variable to initialized and
  check the variables in the expression on the right
  side, issuing an error if they are uninitialized or
  a warning if they May Be Initialized
- ROUTINE CALL: different run-time stack (not our
  problem), but we ought to check the parameters
- FOR: pass through the bounds and the initialization
  of the control variable, and make a copy of the
  list, the LOOP-EXIT LIST
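The split, merge and assignment rules above can be sketched as follows. This is an illustrative Python fragment, not the textbook's code: the property list is simplified to a dict, and `merge_lists`, `interpret_assignment` and the diagnostic strings are made-up names.

```python
UNINIT, MAYBE, INIT = "Uninitialized", "May be initialized", "Initialized"

def merge_lists(a, b):
    """Merge the property lists of two joining paths (same names assumed)."""
    return {n: a[n] if a[n] == b[n] else MAYBE for n in a}

def interpret_assignment(props, target, used, diagnostics):
    """Check the variables on the right-hand side, then mark the target initialized."""
    for name in used:
        if props[name] == UNINIT:
            diagnostics.append(f"error: {name} is uninitialized")
        elif props[name] == MAYBE:
            diagnostics.append(f"warning: {name} may be uninitialized")
    props[target] = INIT

diagnostics = []
props = {"a": UNINIT, "b": INIT, "c": UNINIT}      # state after three declarations
then_path, else_path = dict(props), dict(props)    # flow of control splits
interpret_assignment(then_path, "a", ["b"], diagnostics)   # a := b, then-branch only
props = merge_lists(then_path, else_path)                  # paths join again
interpret_assignment(props, "b", ["a", "c"], diagnostics)  # b := a + c
```

Because `a` is assigned on one path only, the later use of `a` produces a warning, while the use of `c`, which is never assigned, produces an error.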
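11. Execution of a For_Loop
[Figure: stack representations in the control flow graph of a for-loop. Nodes: For_statement, Body, End_for. Stack entries: v (the control variable), from, to. Annotations on the arrows: F T; E F T; E' F T. One arrow is marked "body exec 0 times".]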
12. Gettin' Outta Dodge
- When we find an exit_loop inside a loop, we merge
  the list we've collected with the loop-exit list
  and continue with the empty list (same with
  return statements and the end of the routine)
- If we find an exit_loop outside a loop, we give an
  error.
- When all stack representations have been computed,
  check the return list to see if all OUT parameters
  have been defined. Error if not!
- Check all variables at the end of the routine and
  give a warning for any that are not initialized
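A sketch of the return-list bookkeeping, in Python. The helper names `merge_into` and `check_out_parameters`, the use of `None` for a list that has not received anything yet, and the wording of the error message are all assumptions for illustration.

```python
UNINIT, MAYBE, INIT = "Uninitialized", "May be initialized", "Initialized"

def merge_into(accumulated, current):
    """Merge `current` into an accumulating list (loop-exit list or return list)."""
    if accumulated is None:  # nothing has arrived here yet
        return dict(current)
    return {n: accumulated[n] if accumulated[n] == current[n] else MAYBE
            for n in accumulated}

def check_out_parameters(return_list, out_parameters):
    """Report every OUT parameter that is not Initialized in the return list."""
    return [f"error: OUT parameter {p} is not defined on every path"
            for p in out_parameters if return_list[p] != INIT]

return_list = None
# A return statement reached with result already assigned:
return_list = merge_into(return_list, {"result": INIT, "tmp": INIT})
# The end of the routine reached on a path that never assigned result:
return_list = merge_into(return_list, {"result": UNINIT, "tmp": INIT})
errors = check_out_parameters(return_list, ["result"])
```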
13. Extending the system
- once we have a system of symbolic interpretation
  in place, it is relatively easy to extend
- extend the tracking to variables, constants, field
  selectors, etc.
- extend beyond status (ex. initialized) to value,
  range, or set of possible values, a technique called
  CONSTANT PROPAGATION
- 2 purposes: 1) identify variables used as constants,
  2) get a tighter grip on the tests in for- and
  while-loops; or last-def analysis (3.2.2.3)

    int i = 0;
    while (some condition) {
        if (i > 0) printf("Loop reentered: i = %d\n", i);
        i++;
    }
    // fig 3.46

If we try to eliminate the printf( ) because we
know i = 0, so that the first test fails, we rather
obviously have concluded too much.
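To see why one scan concludes too much for the loop of fig 3.46, the set of possible values of i at the test can be simulated by hand. The Python sketch below is an illustration only; the function name and the idea of counting passes are assumptions, and the loop body is reduced to its single effect on i.

```python
def values_of_i_at_test(passes):
    """Possible values of i at `if (i > 0)` after scanning the loop `passes` times."""
    at_loop_head = {0}  # int i = 0;
    for _ in range(passes - 1):
        after_body = {v + 1 for v in at_loop_head}  # effect of i++
        at_loop_head |= after_body                  # merge at the jump back
    return at_loop_head

one_scan = values_of_i_at_test(1)
two_scans = values_of_i_at_test(2)
three_scans = values_of_i_at_test(3)
```

A single scan sees only the value 0 and would wrongly discard the printf; every further scan adds another value, so the set keeps growing and never settles.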
14. What kind of properties allow simple symbolic
interpretation to WORK?
- Four requirements for it to work
- 1. The program must consist of flow-of-control
  structures with one entry point and one exit
  point only (Structured Programming)
- 2. The values of the property must form a lattice,
  i.e. the values can be ordered in a sequence
  v_1..v_n such that no operation can transform v_j
  into v_i where i < j; we will write v_i < v_j for
  all i < j.
- 3. The result of merging two values must be at least
  as large as the smaller of the two values
- 4. An action taken on v_i in a given situation must
  make any action taken on v_j in that same
  situation superfluous, for v_i < v_j
15. Why? 1
- 1. The program must consist of flow-of-control
  structures with one entry point and one exit
  point only (Structured Programming)
- this allows each control structure to be treated
  in isolation, with the property being analyzed
  well-defined at both the entry point and the exit
  point of the structure
- The other three requirements allow us to ignore the
  jump back to the beginning of looping control
  structures
16. Why? 2
- v_in is the property at the entrance to the loop and
  v_out is the property at the loop exit
- (2) guarantees that v_in ≤ v_out
- (3) guarantees that when we merge v_out after the
  first pass through the loop with v_in to obtain
  the value v_new at the start of the second pass,
  v_new ≥ v_in
- from (4) it follows that we don't need to perform
  a second scan or to consider the jump back
- consider the initialization example: the values
  v_1 = Uninitialized, v_2 = May be initialized, and
  v_3 = Initialized fulfill the requirements, since
  the initialization status can only progress from
  left to right
- If the four requirements are not satisfied then
  FULL SYMBOLIC INTERPRETATION is necessary.
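For the initialization property, the argument can be checked by brute force over the three values. The Python sketch below assumes the usual merge rule (equal values stay, different values become May be initialized); the function names are invented for this illustration.

```python
from itertools import product

ORDER = ["Uninitialized", "May be initialized", "Initialized"]  # v_1 < v_2 < v_3
RANK = {value: i for i, value in enumerate(ORDER)}

def merge(a, b):
    """Merge rule assumed here: equal values stay, different ones become 'May be'."""
    return a if a == b else "May be initialized"

def requirement_3_holds():
    """The merge result is at least as large as the smaller of its two arguments."""
    return all(RANK[merge(a, b)] >= min(RANK[a], RANK[b])
               for a, b in product(ORDER, repeat=2))

def loop_argument_holds():
    """For every v_in and every v_out >= v_in, merging gives v_new >= v_in."""
    return all(RANK[merge(v_in, v_out)] >= RANK[v_in]
               for v_in, v_out in product(ORDER, repeat=2)
               if RANK[v_out] >= RANK[v_in])
```

Both checks succeed for this property, which is why a second scan of the loop would only repeat actions already taken on v_in.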
17. Full Symbolic Interpretation
- Full Symbolic Interpretation consists of
  performing Simple Symbolic Interpretation
  repeatedly until no more changes in the values of
  the properties occur (a closure algorithm)
- GOTO statements cannot be handled by Simple
  Symbolic Interpretation (Requirement 1 is
  violated)
- For each label in the routine we need a separate
  list. The lists start off empty, but every time we
  jump to a label we merge the present list with
  the label list and continue with an empty list.
  We're not done then, since the next pass might
  modify the list, so we repeat until nothing
  changes. Then we can be certain we have found
  all paths by which a variable can be
  uninitialized at a particular label
- This is a Closure Algorithm, and we have to postpone
  actions on initialization status until everything
  is known
- Simple => single pass, Full => multipass
18. Full Symbolic Interpretation as a Closure
Algorithm
- Data definitions: stack representations, with
  entries for every item we are interested in.
- Initializations:
- 1. Empty stack representations are attached to
  all arrows in the control flow graph residing in
  the threaded AST.
- 2. Some stack representations at strategic points
  are initialized in accordance with properties of
  the source language, ex. the stack representations
  of input parameters are initialized to
  Initialized
- Inference rules: for each node type, source
  language dependent rules allow inferences to be
  made, adding information to the stack
  representations on the outgoing arrows based on
  those on the incoming arrows and the node itself,
  and vice versa.
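The closure algorithm can be sketched in Python for one variable and one property. Everything here is an assumption made for illustration: the graph encoding (a dict of nodes and a list of arrows), the use of `None` for an empty stack representation, and the single inference rule that an assigning node sets the status to Initialized. The sketch only propagates information forward along the arrows.

```python
UNINIT, MAYBE, INIT = "Uninitialized", "May be initialized", "Initialized"

def merge(a, b):
    if a is None:  # None stands for an empty stack representation
        return b
    if b is None:
        return a
    return a if a == b else MAYBE

def full_symbolic_interpretation(nodes, edges, entry):
    """nodes: name -> True if the node assigns the variable; edges: (src, dst) pairs.

    Returns the status of the variable on entry to each node.
    """
    incoming = {n: None for n in nodes}  # initialization 1: everything empty
    incoming[entry] = UNINIT             # initialization 2: a strategic point
    changed = True
    while changed:                       # repeat until nothing changes
        changed = False
        for src, dst in edges:
            if incoming[src] is None:
                continue                 # nothing known yet, no inference possible
            outgoing = INIT if nodes[src] else incoming[src]  # inference rule
            merged = merge(incoming[dst], outgoing)
            if merged != incoming[dst]:
                incoming[dst] = merged
                changed = True
    return incoming

# start -> test; test -> assign -> L; test -> L (goto skipping the assignment);
# L -> test (backward goto)
nodes = {"start": False, "test": False, "assign": True, "L": False}
edges = [("start", "test"), ("test", "assign"), ("test", "L"),
         ("assign", "L"), ("L", "test")]
result = full_symbolic_interpretation(nodes, edges, "start")
```

The status at `test` only becomes `May be initialized` once the backward jump from `L` has been merged in, which is why more than one pass is needed. Termination is guaranteed in this sketch because each entry can change at most twice.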
19. So where are we? ...
- Full symbolic interpretation removes almost all
  the requirements listed in the previous section,
  but IT DOESN'T SOLVE ALL THE PROBLEMS
- ex. there is no guarantee that the algorithm
  terminates
- open-ended algorithms like fig 3.46
- The complete set of possible values of a variable
  cannot be determined at compile time in all cases
- Approximation: 1) a set of at most two values, or
  2) any value
- a little like the counting of primitive tribes: 1,
  2, 3, many
- Read section 3.2.2.3 on Last-Def analysis
- it involves collecting all the places where a
  variable is defined (used for code generation and
  in particular register allocation); also called
  REACHING-DEFINITIONS ANALYSIS
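The approximation "a set of at most two values, or any value" can be sketched in Python as follows. The names `ANY`, `merge_values`, `increment` and `values_at_loop_head` are hypothetical, and the loop is assumed to have the shape of fig 3.46: a counter initialized once and incremented in the body.

```python
ANY = "any value"

def merge_values(a, b, limit=2):
    """Merge two value sets, giving up and returning ANY above `limit` values."""
    if a == ANY or b == ANY:
        return ANY
    union = a | b
    return union if len(union) <= limit else ANY

def increment(values):
    """Effect of i++ on a value set."""
    return ANY if values == ANY else frozenset(v + 1 for v in values)

def values_at_loop_head(initial):
    """Iterate the loop until the value set at the loop head stops changing."""
    head = frozenset({initial})
    while True:
        new_head = merge_values(head, increment(head))
        if new_head == head:
            return head
        head = new_head

result = values_at_loop_head(0)
```

With the cap in place the iteration goes {0}, {0, 1}, any value, and then stops, so the closure algorithm terminates at the price of losing precision.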
20. Homework for Week 8
- Bison Familiarization
- Read the entire 39 pages of "A Compact Guide To
  Lex and Yacc" (you can skim through it the
  first time)
- THEN concentrate first on getting the lex example
  on page 10 running
- THEN, after you have that running, go on to
  Practice, Part 1 and strive to get the primitive
  calculator running (pages 14 through 17)
21. References
- Text: Modern Compiler Design