Title: CSCI 435 Compiler Design
1 CSCI 435 Compiler Design
- Week 6 Class 2
- Section 3.2.3 to section 3.2.3.2
- (253-260)
- Ray Schneider
2 Topics of the Day
- Data Flow Equations
- Setting them up
- Solving them
3 Data-flow Equations
- a half-way automation of full symbolic interpretation
- Stack Representation is replaced by a Collection of Sets
- the Semantics of the Node is described more formally
- Interpretation is replaced by a built-in and fixed propagation mechanism
- two set variables are associated with each node N in the control flow graph (both start off empty)
- the input set IN(N), and
- the output set OUT(N)
- they replace the stack representation and are computed by the propagation mechanism
4 Node(s) of the Control Flow Graph
- for each Node N
- IN(N): input set
- OUT(N): output set
- Input / Output sets contain static information about the run-time situation at the node, ex.
- "Variable x is equal to 1 here"
- "There has been no remote procedure call in any path from the routine entry to here"
- "Definitions for the variable y reach here from nodes N1 and N2"
- "Global variable line_count has been modified since routine entry"
- GEN(N) contains the items added by the node N
- KILL(N) contains the items removed by the node N
5 Interpretation mechanism is missing so ...
- nodes that modify the stack size are not handled easily in setting up the data flow equations
- ex. nodes occurring in expressions, like '+', which will remove two entries from the stack and push one entry back on.
- in practice this is dealt with by combining groups of control flow nodes such that there is no net stack effect
- ex. for data-flow equation purposes the entire set of control flow nodes in such a group is considered a single node, with one IN, OUT, GEN, and KILL set.
6 Setting up the data-flow equations
$$IN(N) = \bigcup_{M \,\in\, \text{dynamic predecessors of } N} OUT(M)$$
$$OUT(N) = (IN(N) \setminus KILL(N)) \cup GEN(N)$$
- actual data-flow equations are the same for all nodes
- information at the ENTRANCE of a node N is equal to the union of the information at the exit of all dynamic predecessors of N
- obviously true since no information is lost going from Node to Node
- information at the EXIT of a node N is in principle equal to that at the entrance, except that all the information in the KILL set has been removed from it and all the information in the GEN set has been added to it. (The order of removing and adding is important: first the information being invalidated is removed, then the new information is added.)
7 example
- Arrive at a node x := y with the IN set { "Variable x is equal to 0 here" }.
- THEN the KILL set of the node contains the item "Variable x is equal to * here" (where * stands for any value) and the GEN set contains "Variable x equals y here"
- 1) all items in the IN set that are also in the KILL set are erased, i.e. "Variable x is equal to * here" subsumes "Variable x equals 0 here", so that item is erased
- 2) next, the items from the GEN set are added,
- 3) so the OUT set is { "Variable x equals y here" }
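The three steps above can be sketched in Python. This is only an illustration, not the textbook's implementation: the function name `transfer`, the encoding of an item as a `(variable, value)` pair, and the representation of the wildcard KILL item as a set of variable names are all assumptions made for the sketch.

```python
def transfer(in_set, killed_vars, gen_set):
    """Compute OUT = (IN minus KILL) union GEN for one node.

    Items are (variable, value) pairs meaning "Variable <variable> is
    equal to <value> here" (an assumed encoding). killed_vars lists the
    variables whose items are invalidated, which plays the role of the
    wildcard item "Variable x is equal to * here".
    """
    # Step 1: erase every IN item that the KILL set subsumes.
    survivors = {(var, val) for (var, val) in in_set if var not in killed_vars}
    # Step 2: only then add the items generated by the node.
    return survivors | set(gen_set)


# Node "x := y" reached with IN = { x is equal to 0 }.
out_set = transfer({("x", "0")}, {"x"}, {("x", "y")})
print(out_set)
```

Doing the two steps in the opposite order would first add `("x", "y")` and then erase it again, which is why the order matters.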
8 Interpreting the Data-flow Equations
- While the operators for set union (∪) and set difference (\) are used, they really apply as information union and information difference operators
- sometimes they behave as ordinary set union and difference and can be implemented with binary, Boolean representations (say bit sets)
- ex. "Variable V may be uninitialized here": set union (i.e. some predecessor node may be uninitialized)
- ex. "Variable V is guaranteed to have a value here": set intersection (i.e. all predecessor nodes must have a value)
- sometimes the information is more complicated, ex. "Variable x has a value in the range M to N", requiring ad hoc code to be designed and written that knows how to create, merge and examine such ranges
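A minimal Python sketch of these merge operators, assuming one bit per variable in a Python integer used as a bit set. The names `VARIABLES`, `to_bits`, `merge_may`, `merge_must` and `merge_range` are hypothetical, and taking the smallest enclosing range is just one possible way to merge range information.

```python
from functools import reduce
from operator import and_, or_

VARIABLES = ["a", "b", "c"]  # assumed variable order: bit i stands for VARIABLES[i]


def to_bits(names):
    """Encode a collection of variable names as an integer bit set."""
    bits = 0
    for name in names:
        bits |= 1 << VARIABLES.index(name)
    return bits


def merge_may(pred_outs):
    """'May' information (ex. may be uninitialized): union over predecessors."""
    return reduce(or_, pred_outs, 0)


def merge_must(pred_outs):
    """'Guaranteed' information: intersection over a non-empty list of predecessors."""
    return reduce(and_, pred_outs)


def merge_range(r1, r2):
    """Ad hoc merge for 'x has a value in the range M to N': smallest enclosing range."""
    return (min(r1[0], r2[0]), max(r1[1], r2[1]))
```

For instance, if `a` may be uninitialized on one incoming path and `c` on another, `merge_may` reports both, while `merge_must` keeps only what holds on every incoming path.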
9 Third Data-flow Equation
- Zeroth Data Flow Equation
- Defines the IN set of the first node of the routine as the set of information items established by the parameters of the routine
- in particular each IN and INOUT parameter gives rise to an item 'Parameter Pi has a value here'
IN: all value parameters have values
KILL: all local information
10 Solving the Data-flow equations (Closure)
The first Data Flow equation tells us how to obtain the IN set of all nodes when we know the OUT sets of all nodes. The second Data Flow equation tells us how to obtain the OUT set of a node if we know its IN set (and its GEN and KILL sets, but they are constants).
Closure Algorithm for Solving the Data-Flow Equations
Data definitions
1. Constant KILL and GEN sets for each node.
2. Variable IN and OUT sets for each node.
Initializations
1. The IN set of the top node is initialized with information established externally.
2. For all other nodes N, IN(N) and OUT(N) are set to empty.
Inference rules
$$IN(N) = \bigcup_{M \,\in\, \text{dynamic predecessors of } N} OUT(M)$$
$$OUT(N) = (IN(N) \setminus KILL(N)) \cup GEN(N)$$
11 Implementation of the Closure Algorithm
- implemented by traversing the control flow graph repeatedly, computing the IN and OUT sets of the nodes visited, until a complete traversal of the Control Flow Graph produces no further change
- Now we're ready to use the information for context checking and code generation.
- NOTE: predecessors of a node are easy to find if the Control Flow Graph is doubly linked as shown earlier
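The repeated-traversal scheme can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the course's reference code: the function `solve_dataflow`, its parameters, and the tiny loop example (two definitions `d1` and `d2` of the same variable) are hypothetical.

```python
def solve_dataflow(nodes, preds, gen, kill, entry, entry_in):
    """Iterate the two data-flow equations until nothing changes.

    nodes:    node names in traversal order
    preds:    dict mapping a node to its list of predecessors
    gen/kill: dicts mapping a node to its constant GEN and KILL sets
    entry_in: externally established information for the entry node
    """
    IN = {n: set() for n in nodes}
    OUT = {n: set() for n in nodes}
    IN[entry] = set(entry_in)

    changed = True
    while changed:                      # one pass = one complete traversal
        changed = False
        for n in nodes:
            # IN(N) = union of OUT(M) over all predecessors M
            new_in = set(entry_in) if n == entry else set()
            for m in preds.get(n, ()):
                new_in |= OUT[m]
            # OUT(N) = (IN(N) \ KILL(N)) union GEN(N)
            new_out = (new_in - kill[n]) | gen[n]
            if new_in != IN[n] or new_out != OUT[n]:
                IN[n], OUT[n] = new_in, new_out
                changed = True
    return IN, OUT


# Hypothetical loop: entry -> d1 -> test -> d2 -> test, and test -> exit.
nodes = ["entry", "d1", "test", "d2", "exit"]
preds = {"d1": ["entry"], "test": ["d1", "d2"], "d2": ["test"], "exit": ["test"]}
gen = {"entry": set(), "d1": {"d1"}, "test": set(), "d2": {"d2"}, "exit": set()}
kill = {"entry": set(), "d1": {"d2"}, "test": set(), "d2": {"d1"}, "exit": set()}
IN, OUT = solve_dataflow(nodes, preds, gen, kill, "entry", set())
```

In this example both definitions reach `test` and `exit`, because the loop body feeds `d2` back into `test`; the second traversal is what discovers that.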
12 Trivalent Logic for initialization of variables
Note:
- 11: may or may not have a value
- 10: definitely does not have a value
- 01: definitely has a value
- 00: an error
- x is guaranteed to have a value; y may or may not have a value
- x is guaranteed not to have a value; the combination of 00 for y is an error
13 if y > 0 then x := y else y := 0 end if
[Figure: bit pairs for x and y at each point of the statement above, using the encoding in the Note on slide 12]
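The two-bit encoding can be tried out on this if-statement with a short Python sketch. The entry state (x has no value, y has a value) is an assumption chosen for illustration, as are the names `assign`, `merge` and `analyze_if`. Merging the two branches is a bitwise OR of the bit pairs.

```python
NO_VALUE = 0b10   # definitely does not have a value
HAS_VALUE = 0b01  # definitely has a value
MAYBE = 0b11      # may or may not have a value
ERROR = 0b00      # impossible combination


def assign(state, target):
    """After an assignment the target definitely has a value."""
    new_state = dict(state)
    new_state[target] = HAS_VALUE
    return new_state


def merge(state_a, state_b):
    """Join of two branches: bitwise OR of the bit pairs, per variable."""
    return {var: state_a[var] | state_b[var] for var in state_a}


def analyze_if(entry_state):
    """if y > 0 then x := y else y := 0 end if"""
    then_state = assign(entry_state, "x")   # x := y
    else_state = assign(entry_state, "y")   # y := 0
    return merge(then_state, else_state)


# Assumed entry state: x uninitialized, y initialized.
result = analyze_if({"x": NO_VALUE, "y": HAS_VALUE})
```

Under that assumption x ends up as 11 (it is assigned on only one branch) and y stays 01.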
14 Summing Up a Little
- Generally we visit all the nodes in Control Flow Graph order; this is not necessary but is generally logical and convenient
- The data-flow algorithm in itself only collects information; it does no checking and does not generate error messages or warnings
- Additional traversals are needed to use the information
- ex. checking for uninitialized variables
15 Flex and Bison
- Lex/Flex, as we have seen previously, is a program generator for lexical processing of character input streams
- It accepts a high-level description for character string matching and produces a program which recognizes regular expressions
- Lex-written code recognizes the expressions in the input
- The Lex source file associates regular expressions and program fragments provided by the user, which are executed as each expression appears in the input
16 General Format of Lex
ex.
    %%
    [ \t]+$   ;
    [ \t]+    printf(" ");
    /* this lex input causes lex to ignore sequences of 1 or more blanks or tabs up to the end of line, and for blanks not followed by the end of line it will substitute a single blank */
- definitions
- %%
- rules
- %%
- user subroutines
User definitions and user subroutines are often omitted
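The effect of the two rules in the example can be imitated in Python with the standard `re` module. This is only a sketch of the behaviour described in the comment, not what Lex generates (Lex produces a C scanner); the function name `squeeze_blanks` is hypothetical.

```python
import re


def squeeze_blanks(text):
    """Imitate the two Lex rules on a whole input string."""
    # Rule 1: blanks or tabs up to the end of a line are ignored.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Rule 2: any remaining run of blanks or tabs becomes a single blank.
    return re.sub(r"[ \t]+", " ", text)


cleaned = squeeze_blanks("a  \t b   \nc\t\n")
```

Applying the end-of-line rule first mirrors the rule order in the Lex source, where the rule with `$` is listed first.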
17 Uses of Lex
- It can be used alone for simple transformations of files, or for analysis and statistics gathering at the lexical level
- Lex generates lexical analyzers that are easy to interface with Yacc/Bison
- Lex programs recognize only regular expressions
- Yacc writes parsers that accept a large class of context free grammars, but requires a lower level analyzer to recognize the input tokens
18 Combining Lex and Yacc
- Lex is used to partition the input stream and Yacc (the parser generator) assigns structure to the resulting pieces
Lexical Rules -> Lex -> yylex
Grammar Rules -> Yacc -> yyparse
Input -> yylex -> yyparse -> Parsed Input
Note: all Yacc variables begin with 'yy', so you can avoid collisions by not using that prefix in user generated code.
19 Yacc Specifications
- Generally the Lexical Analyzer (ex. yylex.c) is included as part of the Yacc Specification File
- The full Yacc Specification File looks like
- declarations
- %%
- rules
- %%
- programs
- Where have we seen this before? Structure is similar to Lex input, but what goes in the sections is different.
20 Grammar Rules and Actions
Grammar Rules
Smallest legal Yacc Specification is: %% rules
Grammar Rules look like A : BODY ; where A is a non-terminal name, BODY is a sequence of zero or more names and literals, and the ':' and ';' are Yacc punctuation.
Actions Associated with Rules
With each grammar rule, the user may associate actions to be performed each time the rule is recognized in the input process. An action is specified by one or more statements enclosed in curly braces '{' and '}'.
21 Examples
- A : '(' B ')' { hello(1, "abc"); }
- XXX : YYY ZZZ { printf("a message \n"); flag = 25; }
- To facilitate easy communication between the actions and the parser, Yacc uses the special '$' symbol. '$$' is a pseudo-variable for the left hand side of the grammar rule, and $1, $2, etc. are pseudo-variables for the elements of the rhs
- A : B C D ; here $2 has the value returned by C, etc.
- default is $$ = $1, the value of the first element
22 How the parser works
- Yacc turns the specification file into a C program which parses the input according to the specification given.
- The parser that is produced consists of a Finite State Machine with a stack and a look ahead token. The current state is the one on top of the stack.
- The machine has only four actions: shift, reduce, accept and error
- We'll LEAVE YACC THERE. You need to read about it so you can use it.
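To make the four actions concrete, here is a hand-written Python sketch of such a stack machine for a toy grammar, E : E '+' NUM | NUM. Everything in it is an assumption for illustration: Yacc emits C, not Python, and the `ACTION`, `GOTO` and `RULES` tables below were worked out by hand for this grammar rather than produced by Yacc. The value stack shows how an action computes `$$` from `$1`, `$2`, `$3`.

```python
# Rules: number -> (left hand side, length of rhs, action computing $$)
RULES = {
    1: ("E", 3, lambda d1, d2, d3: d1 + d3),  # E : E '+' NUM  { $$ = $1 + $3; }
    2: ("E", 1, lambda d1: d1),               # E : NUM        (default $$ = $1)
}
# (state, look ahead token) -> action; a missing entry means "error".
ACTION = {
    (0, "NUM"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$end"): ("accept",),
    (2, "+"): ("reduce", 2),
    (2, "$end"): ("reduce", 2),
    (3, "NUM"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$end"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}  # state to enter after reducing to E


def parse(tokens):
    """tokens: list of (kind, value) pairs. Returns the value of E."""
    tokens = list(tokens) + [("$end", None)]
    states, values, pos = [0], [], 0
    while True:
        kind, value = tokens[pos]                     # the look ahead token
        action = ACTION.get((states[-1], kind), ("error",))
        if action[0] == "shift":                      # push state, consume token
            states.append(action[1])
            values.append(value)
            pos += 1
        elif action[0] == "reduce":                   # pop the rhs, push the lhs
            lhs, length, semantic_action = RULES[action[1]]
            rhs_values = values[len(values) - length:]
            del states[len(states) - length:]
            del values[len(values) - length:]
            values.append(semantic_action(*rhs_values))
            states.append(GOTO[(states[-1], lhs)])
        elif action[0] == "accept":
            return values[-1]
        else:
            raise SyntaxError("unexpected token %r" % kind)


total = parse([("NUM", 1), ("+", None), ("NUM", 2), ("+", None), ("NUM", 3)])
```

The state on top of `states` is always the current state, and a reduce pops as many states and values as the rule's right hand side has elements before pushing the result.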
23 Homework for Week 8
- Bison Familiarization
- Read the entire 39 pages of "A Compact Guide To Lex and Yacc" // you can skim through it the first time
- THEN concentrate first on getting the lex example on page 10 running
- THEN after you have that running go on to Practice, Part 1 and strive to get the primitive calculator running (pages 14 through 17)
24 References
- Text: Modern Compiler Design (Figures)
- Lex: A Lexical Analyzer Generator, by M.E. Lesk and E. Schmidt
- Yacc: Yet Another Compiler-Compiler, by Stephen C. Johnson
- see http://dinosaur.compilertools.net/yacc/index.html and http://dinosaur.compilertools.net/lex/index.html