COS 320 Compilers - PowerPoint PPT Presentation


1
COS 320 Compilers
  • David Walker

2
The Front End
  • Lexical Analysis: create a sequence of tokens from
    characters (Chap 2)
  • Syntax Analysis: create an abstract syntax tree from
    the sequence of tokens (Chap 3)
  • Type Checking: check the program for well-formedness
    constraints

stream of characters → Lexer → stream of tokens → Parser → abstract syntax → Type Checker
3
Parsing with CFGs
  • Context-free grammars are (often) given by BNF
    expressions (Backus-Naur Form)
  • Appel Chap 3.1
  • More powerful than regular expressions
  • Matching parens
  • Nested comments
  • wait, we could do nested comments with ML-LEX!
  • CFGs are good for describing the overall
    syntactic structure of programs.

4
Context-Free Grammars
  • Context-free grammars consist of
  • Set of symbols
  • terminals that denote token types
  • non-terminals that denote sets of strings
  • Start symbol
  • Rules
  • left-hand side: a non-terminal
  • right-hand side: terminals and/or non-terminals
  • rules explain how to rewrite non-terminals
    (beginning with the start symbol) into terminals

symbol → symbol symbol ... symbol
5
Context-Free Grammars
  • A string is in the language of the CFG if and only if
    it is possible to derive that string using the
    following non-deterministic procedure:
  • begin with the start symbol
  • while any non-terminals exist, pick a
    non-terminal and rewrite it using a rule
  • stop when all you have left are terminals (and
    check that you arrived at the string you were hoping
    for)
  • Parsing is the process of checking that a string
    is in the language of the CFG for your programming
    language. It is usually coupled with creating an
    abstract syntax tree.

6
  • non-terminals: S, E, Elist
  • terminals: ID, NUM, PRINT, +, :=, (, ), ;, and the comma (,)
  • rules:

S → S ; S
S → ID := E
S → PRINT ( Elist )
E → ID
E → NUM
E → E + E
E → ( S , Elist )
Elist → E
Elist → Elist , E
7
  • non-terminals: S, E, Elist
  • terminals: ID, NUM, PRINT, +, :=, (, ), ;, and the comma (,)
  • rules:

1. S → S ; S
2. S → ID := E
3. S → PRINT ( Elist )
4. E → ID
5. E → NUM
6. E → E + E
7. E → ( S , Elist )
8. Elist → E
9. Elist → Elist , E
Derive me!  ID := NUM ; PRINT ( NUM )
8
Derive me!  ID := NUM ; PRINT ( NUM )
S
9
Derive me!  ID := NUM ; PRINT ( NUM )
S → ID := E
10
Derive me!  ID := NUM ; PRINT ( NUM )
S → ID := E
oops, can't make progress
11
Derive me!  ID := NUM ; PRINT ( NUM )
S
12
Derive me!  ID := NUM ; PRINT ( NUM )
S → S ; S
13
Derive me!  ID := NUM ; PRINT ( NUM )
S → S ; S → ID := E ; S
14
Derive me!  ID := NUM ; PRINT ( NUM )
S → S ; S → ID := E ; S → ID := NUM ; S → ID := NUM ; PRINT ( Elist )
  → ID := NUM ; PRINT ( E ) → ID := NUM ; PRINT ( NUM )
15
S → S ; S → ID := E ; S → ID := NUM ; S → ID := NUM ; PRINT ( Elist )
  → ID := NUM ; PRINT ( E ) → ID := NUM ; PRINT ( NUM )
(left-most derivation)

Another way to derive the same string:
S → S ; S → S ; PRINT ( Elist ) → S ; PRINT ( E ) → S ; PRINT ( NUM )
  → ID := E ; PRINT ( NUM ) → ID := NUM ; PRINT ( NUM )
(right-most derivation)
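The left-most derivation above can be replayed mechanically. Below is a minimal Standard ML sketch, not code from the course: the names `sym`, `rules`, `leftmostStep`, `derive` and `show` are hypothetical. It stores rules 1-9 as data and, for each rule number in a list, rewrites the leftmost non-terminal of the current string, raising `BadStep` if that rule's left-hand side does not match it.

```sml
(* Grammar symbols: terminals (token names) and non-terminals *)
datatype sym = T of string | N of string

(* Rules 1-9 from the slides; rule k is element k-1 *)
val rules : (string * sym list) list =
  [ ("S", [N "S", T ";", N "S"]),                    (* 1 *)
    ("S", [T "ID", T ":=", N "E"]),                  (* 2 *)
    ("S", [T "PRINT", T "(", N "Elist", T ")"]),     (* 3 *)
    ("E", [T "ID"]),                                 (* 4 *)
    ("E", [T "NUM"]),                                (* 5 *)
    ("E", [N "E", T "+", N "E"]),                    (* 6 *)
    ("E", [T "(", N "S", T ",", N "Elist", T ")"]),  (* 7 *)
    ("Elist", [N "E"]),                              (* 8 *)
    ("Elist", [N "Elist", T ",", N "E"]) ]           (* 9 *)

exception BadStep of int

(* Rewrite the leftmost non-terminal of a sentential form with rule k *)
fun leftmostStep (form : sym list, k : int) : sym list =
  let
    val (lhs, rhs) = List.nth (rules, k - 1)
    fun go [] = raise BadStep k                 (* no non-terminal left *)
      | go (T t :: rest) = T t :: go rest       (* skip over terminals *)
      | go (N x :: rest) =
          if x = lhs then rhs @ rest else raise BadStep k
  in
    go form
  end

(* Apply a list of rule numbers, starting from the start symbol S *)
fun derive (ks : int list) : sym list =
  List.foldl (fn (k, form) => leftmostStep (form, k)) [N "S"] ks

fun show (form : sym list) : string =
  String.concatWith " " (List.map (fn T t => t | N x => x) form)

val result = show (derive [1, 2, 5, 3, 8, 5])
(* result = "ID := NUM ; PRINT ( NUM )" *)
```

Replaying the rule numbers 1, 2, 5, 3, 8, 5 reproduces the left-most derivation; the right-most derivation would need a variant of `leftmostStep` that scans the sentential form from the right.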
16
Parse Trees
  • Representing derivations as trees
  • useful in compilers: parse trees correspond
    quite closely (but not exactly) with the abstract
    syntax trees we're trying to generate
  • difference: abstract syntax vs. concrete (parse)
    syntax
  • each internal node is labeled with a non-terminal
  • each leaf node is labeled with a terminal
  • each use of a rule in a derivation explains how
    to generate children in the parse tree from the
    parents
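A parse tree maps directly onto an ML datatype. In this hypothetical Standard ML sketch (the `tree` type, `yield` and `example` are illustrative names, not course code), internal nodes carry a non-terminal and their children, leaves carry a terminal, and reading the leaves left to right recovers the derived token string.

```sml
(* Internal nodes: a non-terminal and its children; leaves: a terminal *)
datatype tree = Node of string * tree list
              | Leaf of string

(* The yield of a tree: its leaves read left to right, i.e. the token
   string produced by any derivation that builds this tree *)
fun yield (Leaf t) = [t]
  | yield (Node (_, kids)) = List.concat (List.map yield kids)

(* Parse tree for ID := NUM ; PRINT ( NUM ) *)
val example =
  Node ("S",
        [ Node ("S", [Leaf "ID", Leaf ":=", Node ("E", [Leaf "NUM"])]),
          Leaf ";",
          Node ("S", [ Leaf "PRINT", Leaf "(",
                       Node ("Elist", [Node ("E", [Leaf "NUM"])]),
                       Leaf ")" ]) ])

val tokens = yield example
(* tokens = ["ID", ":=", "NUM", ";", "PRINT", "(", "NUM", ")"] *)
```

Both the left-most and the right-most derivation of this string correspond to `example`: the tree records which rule was used at each node, not the order in which the rules were applied.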

17
Parse Trees
  • Example

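S → S ; S → ID := E ; S → ID := NUM ; S → ID := NUM ; PRINT ( Elist )
  → ID := NUM ; PRINT ( E ) → ID := NUM ; PRINT ( NUM )

S
├── S
│   ├── ID
│   ├── :=
│   └── E
│       └── NUM
├── ;
└── S
    ├── PRINT
    ├── (
    ├── Elist
    │   └── E
    │       └── NUM
    └── )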
18
Parse Trees
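  • Example: 2 derivations, but 1 tree

S → S ; S → ID := E ; S → ID := NUM ; S → ID := NUM ; PRINT ( Elist )
  → ID := NUM ; PRINT ( E ) → ID := NUM ; PRINT ( NUM )
S → S ; S → S ; PRINT ( Elist ) → S ; PRINT ( E ) → S ; PRINT ( NUM )
  → ID := E ; PRINT ( NUM ) → ID := NUM ; PRINT ( NUM )

[Figure: the parse tree from the previous example; both derivations build this same tree]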
19
Parse Trees
  • parse trees have meaning.
  • order of children, nesting of subtrees is
    significant

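[Figure: two parse trees built from the same kinds of nodes (S, E, L, ID, NUM, PRINT, parentheses) but with their children ordered and nested differently, so they describe different programs]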
20
Ambiguous Grammars
  • a grammar is ambiguous if the same sequence of
    tokens can give rise to two or more parse trees

21
Ambiguous Grammars
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)
non-terminals: E
terminals: ID NUM PLUS MULT
E → ID | NUM | E + E | E * E

E
├── E
│   └── NUM(4)
├── +
└── E
    ├── E
    │   └── NUM(5)
    ├── *
    └── E
        └── NUM(6)

I like using this notation, where I avoid
repeating E → for every alternative
22
Ambiguous Grammars
a second parse tree for the same tokens:
E
├── E
│   ├── E
│   │   └── NUM(4)
│   ├── +
│   └── E
│       └── NUM(5)
├── *
└── E
    └── NUM(6)
23
Ambiguous Grammars
  • problem: compilers use parse trees to interpret
    the meaning of parsed expressions
  • different parse trees have different meanings
  • e.g., (4 + 5) * 6 is not 4 + (5 * 6)
  • languages with ambiguous grammars are DISASTROUS:
    the meaning of programs isn't well-defined! You
    can't tell what your program might do!
  • solution: rewrite the grammar to eliminate ambiguity
  • fold precedence rules into the grammar to
    disambiguate
  • fold associativity rules into the grammar to
    disambiguate
  • other tricks as well
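A small evaluator makes the ambiguity problem concrete. This Standard ML sketch (the `exp` type and `eval` are hypothetical, and ID is left out for brevity) builds the two parse trees for NUM(4) PLUS NUM(5) MULT NUM(6) as abstract syntax and evaluates each one.

```sml
(* Abstract syntax for E -> NUM | E + E | E * E (ID omitted) *)
datatype exp = Num of int
             | Plus of exp * exp
             | Mult of exp * exp

fun eval (Num n) = n
  | eval (Plus (a, b)) = eval a + eval b
  | eval (Mult (a, b)) = eval a * eval b

(* The two trees for the same token sequence *)
val plusAtRoot = Plus (Num 4, Mult (Num 5, Num 6))   (* 4 + (5 * 6) *)
val multAtRoot = Mult (Plus (Num 4, Num 5), Num 6)   (* (4 + 5) * 6 *)

val plusValue = eval plusAtRoot   (* 34 *)
val multValue = eval multAtRoot   (* 54 *)
```

The same tokens mean 34 or 54 depending on which tree the parser happens to build.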

24
Building Parsers
  • In theory classes, you might have learned about
    general mechanisms for parsing all CFGs
  • algorithms for parsing all CFGs are expensive
    (the classic general algorithms, such as CYK and
    Earley's, take cubic time in the worst case)
  • to compile 1/10/100 million-line applications,
    compilers must be fast.
  • even for 10 thousand-line apps, speed is nice
  • sometimes 1/3 of compilation time is spent in
    parsing
  • compiler writers have developed specialized
    algorithms for parsing the kinds of CFGs that you
    need to build effective programming languages
  • LL(k) and LR(k) grammars can be parsed efficiently (in linear time).

25
Recursive Descent Parsing
  • Recursive Descent Parsing (Appel Chap 3.2)
  • aka predictive parsing, top-down parsing
  • simple, efficient
  • can be coded by hand in ML quickly
  • parses many, but not all CFGs
  • parses LL(1) grammars
  • Left-to-right parse, Leftmost derivation, 1
    symbol lookahead
  • key ideas
  • one recursive function for each non-terminal
  • each production becomes one clause in the function

26
non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
1. S → IF E THEN S ELSE S
2. S → BEGIN S L
3. S → PRINT E
4. L → END
5. L → ; S L
6. E → NUM = NUM
27
Step 1: Represent the tokens
datatype token = NUM | IF | THEN | ELSE | BEGIN
               | END | PRINT | SEMI | EQ
Step 2: Build infrastructure for reading tokens
from the lexing stream
(* getToken : unit -> token is supplied by the lexer;
   error () signals a syntax error *)
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (!tok = t) then advance () else error ()
29
Step 3: write the parser → one function per
non-terminal, one clause per rule
fun S () = (case !tok of
              IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
            | BEGIN => (eat BEGIN; S (); L ())
            | PRINT => (eat PRINT; E ())
            | _     => error ())
and L () = (case !tok of
              END  => eat END
            | SEMI => (eat SEMI; S (); L ())
            | _    => error ())
and E () = (eat NUM; eat EQ; eat NUM)
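To run this parser without a real lexer, the sketch below replaces `getToken` with a hypothetical token source that reads from a list, and adds an `EOF` token and a `SyntaxError` exception (both assumptions, not on the slides). A `parse` wrapper reports whether the whole token list is one statement.

```sml
datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ | EOF

exception SyntaxError

(* Hypothetical stand-in for the lexer: tokens come from a list,
   and EOF is returned once the list is exhausted *)
val input : token list ref = ref []
fun getToken () =
  case !input of
    [] => EOF
  | t :: rest => (input := rest; t)

val tok = ref EOF
fun advance () = tok := getToken ()
fun error () = raise SyntaxError
fun eat t = if !tok = t then advance () else error ()

(* One function per non-terminal, one clause per rule *)
fun S () =
      (case !tok of
         IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
       | BEGIN => (eat BEGIN; S (); L ())
       | PRINT => (eat PRINT; E ())
       | _     => error ())
and L () =
      (case !tok of
         END  => eat END
       | SEMI => (eat SEMI; S (); L ())
       | _    => error ())
and E () = (eat NUM; eat EQ; eat NUM)

(* Parse one statement followed by end of input *)
fun parse (ts : token list) : bool =
  (input := ts; advance (); S (); eat EOF; true)
  handle SyntaxError => false
```

For example, `parse [BEGIN, PRINT, NUM, EQ, NUM, SEMI, PRINT, NUM, EQ, NUM, END]` succeeds, while `parse [PRINT, NUM]` fails at the missing `=`. A real front end would feed the parser from the generated lexer instead of a list.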
30
non-terminals: A, S, E, L
rules:
1. A → S EOF
2. S → ID := E
3. S → PRINT ( L )
4. E → ID
5. E → NUM
6. L → E
7. L → L , E

fun A () = (S (); eat EOF)
and S () = (case !tok of
              ID    => (eat ID; eat ASSIGN; E ())
            | PRINT => (eat PRINT; eat LPAREN; L (); eat RPAREN)
            | _     => error ())
and E () = (case !tok of
              ID  => eat ID
            | NUM => eat NUM
            | _   => error ())
and L () = (case !tok of
              ID  => ???
            | NUM => ???)
31
problem
  • predictive parsing only works for grammars where
    the first terminal symbol of each subexpression
    provides enough information to choose which
    production to use
  • LL(1)
  • if !tok = ID, the parser cannot determine which
    production to use:

6. L → E       (E could be ID)
7. L → L , E   (L could be E, which could be ID)
32
solution
  • eliminate left-recursion
  • rewrite the grammar so it parses the same
    language but the rules are different

original:
A → S EOF
S → ID := E | PRINT ( L )
E → ID | NUM
L → E | L , E

rewritten:
A → S EOF
S → ID := E | PRINT ( L )
E → ID | NUM
L → E M
M → , E M | ε
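With the rewritten grammar, every choice is determined by the next token. A minimal Standard ML sketch, again assuming a hypothetical list-backed `getToken`, an `EOF` token and a `SyntaxError` exception: `L` needs no case analysis, and `M` picks its empty rule when the next token is `)`, the only token that can follow M.

```sml
datatype token = ID | NUM | PRINT | ASSIGN | LPAREN | RPAREN | COMMA | EOF

exception SyntaxError

(* Hypothetical token source standing in for the lexer *)
val input : token list ref = ref []
fun getToken () =
  case !input of [] => EOF | t :: rest => (input := rest; t)

val tok = ref EOF
fun advance () = tok := getToken ()
fun error () = raise SyntaxError
fun eat t = if !tok = t then advance () else error ()

fun A () = (S (); eat EOF)
and S () =
      (case !tok of
         ID    => (eat ID; eat ASSIGN; E ())
       | PRINT => (eat PRINT; eat LPAREN; L (); eat RPAREN)
       | _     => error ())
and E () =
      (case !tok of
         ID  => eat ID
       | NUM => eat NUM
       | _   => error ())
(* L -> E M : only one rule, so no case analysis *)
and L () = (E (); M ())
(* M -> , E M | (empty) : the empty rule is chosen on Follow(M) = { ) } *)
and M () =
      (case !tok of
         COMMA  => (eat COMMA; E (); M ())
       | RPAREN => ()
       | _      => error ())

fun parse (ts : token list) : bool =
  (input := ts; advance (); A (); true)
  handle SyntaxError => false
```

The `???` cases from the left-recursive version disappear: after parsing one E, the parser only has to look for a comma.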
33
eliminating left-recursion in general
  • Original grammar form
    X → base | X repeat
    strings: base repeat repeat ...
  • Transformed grammar
    X → base Xnew
    Xnew → repeat Xnew | ε
    strings: base repeat repeat ...
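The general transformation can be written as a function on rules. In this hypothetical Standard ML sketch, right-hand sides are lists of symbol names and the new non-terminal is named by appending "new" (an arbitrary choice). It handles only immediate left recursion (X → X ...), not left recursion that passes through other non-terminals.

```sml
type rhs = string list

(* Eliminate immediate left recursion for non-terminal x:
     x    -> base_1 | ... | base_m | x repeat_1 | ... | x repeat_n
   becomes
     x    -> base_1 xnew | ... | base_m xnew
     xnew -> repeat_1 xnew | ... | repeat_n xnew | (empty)            *)
fun elimLeftRec (x : string, alts : rhs list) : (string * rhs) list =
  let
    val xnew = x ^ "new"
    fun isLeftRec (y :: _) = y = x
      | isLeftRec [] = false
    val repeats = List.map tl (List.filter isLeftRec alts)  (* drop leading x *)
    val bases = List.filter (not o isLeftRec) alts
  in
    List.map (fn b => (x, b @ [xnew])) bases
    @ List.map (fn r => (xnew, r @ [xnew])) repeats
    @ [(xnew, [])]
  end

val transformed = elimLeftRec ("L", [["E"], ["L", ",", "E"]])
(* [("L", ["E", "Lnew"]), ("Lnew", [",", "E", "Lnew"]), ("Lnew", [])] *)
```

Applied to L → E | L , E, it yields the same rules as the rewritten grammar, with Lnew playing the role of M.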
34
Recursive Descent Parsing
  • Unfortunately, left factoring doesn't always work
  • Questions
  • how do we know when we can parse grammars using
    recursive descent?
  • Is there an algorithm for generating such parsers
    automatically?

35
Constructing RD Parsers
  • To construct an RD parser, we need to know what
    rule to apply when
  • we have seen a non-terminal X
  • we see the next terminal a in the input
  • We apply rule X → s when
  • a is the first symbol that can be generated by
    string s, OR
  • s can derive the empty string (is nullable) and a
    is the first symbol in any string that can follow
    X

37
Constructing Predictive Parsers
1. Y → ε     2. Y → bb
3. X → c     4. X → Y Z
5. Z → d

non-terminal seen   next terminal   rule
X                   c
X                   b
X                   d
38
Constructing Predictive Parsers
non-terminal seen   next terminal   rule
X                   c               3
X                   b
X                   d
39
Constructing Predictive Parsers
non-terminal seen   next terminal   rule
X                   c               3
X                   b               4
X                   d
40
Constructing Predictive Parsers
non-terminal seen   next terminal   rule
X                   c               3
X                   b               4
X                   d               4
41
Constructing Predictive Parsers
  • in general, we must compute:
  • for each production X → s, whether s
    can derive the empty string
  • if yes, X ∈ Nullable
  • for each production X → s, the
    set of all first terminals Q derivable from s
  • Q ∈ First(X)
  • for each non-terminal X, all terminal
    symbols Q that immediately follow X
  • Q ∈ Follow(X)

42
Iterative Analysis
  • Many compiler algorithms are iterative
    techniques.
  • Iterative analysis applies when
  • we must compute a set of objects with some
    property P
  • P is defined inductively, i.e., there are
  • base cases: objects o1, o2 obviously have
    property P
  • inductive cases: if certain objects (o3, o4)
    have property P, this implies other objects (f
    o3, f o4) have property P
  • The number of objects in the set is finite
  • or we can represent infinite collections using
    some finite notation, and we can find effective
    termination conditions

43
Iterative Analysis
  • general form
  • initialize set S with base cases
  • apply the inductive rules over and over until you
    reach a fixed point
  • a fixed point is a set that does not change when
    you apply an inductive rule
  • Nullable, First and Follow sets can be determined
    through iteration
  • many program optimizations use iteration
  • worst-case complexity is bad
  • average-case complexity is good: iteration
    usually terminates in a couple of rounds
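The general form above fits in one higher-order function. A minimal Standard ML sketch (all names hypothetical): `fixpoint` applies a step function until the result stops changing, and a toy property, the numbers reachable from 1 by doubling while staying below 20, shows the base-case-plus-inductive-rule pattern.

```sml
(* Apply step until the result stops changing.  Returns the fixed
   point and the number of rounds, counting the final round that
   confirms nothing changed. *)
fun fixpoint (step : 'a -> 'a) (same : 'a * 'a -> bool) (init : 'a) : 'a * int =
  let
    fun loop (x, n) =
      let val x' = step x
      in if same (x, x') then (x, n) else loop (x', n + 1)
      end
  in
    loop (init, 1)
  end

(* Finite sets of ints as duplicate-free lists *)
fun member (x, xs) = List.exists (fn y => y = x) xs
fun sameSet (xs, ys) =
  List.all (fn x => member (x, ys)) xs andalso List.all (fn y => member (y, xs)) ys

(* Base case: 1 has P.  Inductive case: if n has P and 2n < 20, then
   2n has P.  One round applies the rule to every element found so far. *)
fun doubleStep (s : int list) : int list =
  List.foldl (fn (n, acc) =>
                if 2 * n < 20 andalso not (member (2 * n, acc))
                then acc @ [2 * n] else acc)
             s s

val (powers, rounds) = fixpoint doubleStep sameSet [1]
(* powers = [1, 2, 4, 8, 16], rounds = 5 *)
```

Each round here adds one new element, so the number of rounds grows with the length of the longest chain of inductive steps; the Nullable, First and Follow computations usually settle in far fewer.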

44
Computing Nullable Sets
  • Nullable is the smallest set of non-terminals satisfying
    the following constraints (computed using
    iterative analysis)
  • base case
  • if (X → ε) then X is Nullable
  • inductive case
  • if (X → ABC...) and A, B, C, ... are all
    Nullable then X is Nullable
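The Nullable computation can be sketched in Standard ML as follows. The grammar is the one used in the "building a predictive parser" slides (Z → X Y Z | d, Y → c | ε, X → a | b Y e); the `sym` type and all function names are hypothetical. Each round adds X whenever some rule X → A B C ... has only Nullable symbols on its right-hand side, and an empty right-hand side is the base case.

```sml
datatype sym = T of string | N of string

(* Z -> X Y Z | d     Y -> c | (empty)     X -> a | b Y e *)
val grammar : (string * sym list) list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

fun member (x, xs) = List.exists (fn y => y = x) xs

(* One round of the inductive rule.  Terminals are never Nullable;
   List.all on an empty right-hand side is true (the base case). *)
fun nullableRound (ns : string list) : string list =
  List.foldl
    (fn ((x, rhs), acc) =>
       if not (member (x, acc))
          andalso List.all (fn N y => member (y, ns) | T _ => false) rhs
       then acc @ [x] else acc)
    ns grammar

(* Iterate from the empty set; rounds only add, so an unchanged
   size means a fixed point *)
fun nullable () : string list =
  let
    fun loop ns =
      let val ns' = nullableRound ns
      in if length ns' = length ns then ns else loop ns' end
  in
    loop []
  end

val nullableSet = nullable ()   (* ["Y"] *)
```

Only Y is Nullable: Z and X each have a rule, but every one of those rules contains a terminal or a non-Nullable symbol.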

45
Computing First Sets
  • First(X) is computed iteratively
  • base case
  • if T is a terminal symbol then First(T) = {T}
  • inductive case
  • if X is a non-terminal and (X → ABC...) then
  • First(X) = First(X) ∪ First(ABC...)
  • where First(ABC...) = F1 ∪ F2 ∪ F3 ∪ ... and
  • F1 = First(A)
  • F2 = First(B), if A is Nullable
  • F3 = First(C), if A is Nullable and B is Nullable
  • ...
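The First computation follows the same pattern. This Standard ML sketch (hypothetical names, same grammar as the "building a predictive parser" slides) takes the Nullable set {Y} as a given input. Each round adds First(s) to First(X) for every rule X → s, and `firstOfString` implements F1 ∪ F2 ∪ ... by moving past a symbol only when it is Nullable.

```sml
datatype sym = T of string | N of string

(* Z -> X Y Z | d     Y -> c | (empty)     X -> a | b Y e *)
val grammar : (string * sym list) list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

fun member (x, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys

(* Assumed input: the Nullable set computed earlier *)
val nullableSet = ["Y"]
fun isNullable (T _) = false
  | isNullable (N x) = member (x, nullableSet)

(* Current set for a non-terminal in a table of (name, set) pairs *)
fun lookup (x, tbl : (string * string list) list) =
  case List.find (fn (y, _) => y = x) tbl of
    SOME (_, s) => s
  | NONE => []

(* First(ABC...) = F1 U F2 U ...; First(T) = {T} for a terminal T *)
fun firstOfString (_, []) = []
  | firstOfString (_, T t :: _) = [t]
  | firstOfString (tbl, (s as N x) :: rest) =
      if isNullable s then union (lookup (x, tbl), firstOfString (tbl, rest))
      else lookup (x, tbl)

(* One round: First(X) := First(X) U First(s) for every rule X -> s *)
fun firstRound tbl =
  List.map
    (fn (x, fx) =>
       (x, List.foldl (fn ((y, rhs), acc) =>
                         if y = x then union (acc, firstOfString (tbl, rhs))
                         else acc)
                      fx grammar))
    tbl

fun firstSets () =
  let
    fun size tbl = List.foldl (fn ((_, s), n) => n + length s) 0 tbl
    fun loop tbl =
      let val tbl' = firstRound tbl
      in if size tbl' = size tbl then tbl else loop tbl' end
  in
    loop [("Z", []), ("Y", []), ("X", [])]
  end

val firstTbl = firstSets ()
(* [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])] *)
```

The result matches the slides: First(Z) = {d, a, b}, First(Y) = {c}, First(X) = {a, b}. Z needs a second round because First(X) is still empty during the first one.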

46
Computing Follow Sets
  • Follow(X) is computed iteratively
  • base case
  • initially, we assume nothing in particular
    follows X
  • (Follow(X) is initially {})
  • inductive case
  • if (Y → s1 X s2) for any strings s1, s2 then
  • Follow(X) = First(s2) ∪ Follow(X)
  • if (Y → s1 X s2) for any strings s1, s2 then
  • Follow(X) = Follow(Y) ∪ Follow(X), if s2 is
    Nullable
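Follow sets can be sketched the same way. This hypothetical Standard ML code takes the Nullable and First sets as given inputs and, each round, visits every rule Y → s1 X s2; updates made within a round are visible immediately, which is one of several valid orders. No end-of-input marker is added, matching the slides' tables, so Follow(Z) stays empty.

```sml
datatype sym = T of string | N of string

(* Z -> X Y Z | d     Y -> c | (empty)     X -> a | b Y e *)
val grammar : (string * sym list) list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

fun member (x, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys
fun lookup (x, tbl : (string * string list) list) =
  case List.find (fn (y, _) => y = x) tbl of
    SOME (_, s) => s
  | NONE => []

(* Assumed inputs: Nullable and First sets from the earlier slides *)
val nullableSet = ["Y"]
val firstTbl = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]

fun isNullable (T _) = false
  | isNullable (N x) = member (x, nullableSet)

fun firstOfString [] = []
  | firstOfString (T t :: _) = [t]
  | firstOfString ((s as N x) :: rest) =
      if isNullable s then union (lookup (x, firstTbl), firstOfString rest)
      else lookup (x, firstTbl)

(* One round over every rule Y -> s1 X s2, visiting each non-terminal X
   on the right-hand side with s2 = the symbols after it:
     Follow(X) := First(s2) U Follow(X)
     Follow(X) := Follow(Y) U Follow(X)   if s2 is Nullable *)
fun followRound (tbl : (string * string list) list) =
  let
    fun addTo (x, new, tbl) =
      List.map (fn (z, s) => if z = x then (z, union (s, new)) else (z, s)) tbl
    fun scan (_, [], tbl) = tbl
      | scan (y, T _ :: s2, tbl) = scan (y, s2, tbl)
      | scan (y, N x :: s2, tbl) =
          let
            val tbl = addTo (x, firstOfString s2, tbl)
            val tbl = if List.all isNullable s2
                      then addTo (x, lookup (y, tbl), tbl)
                      else tbl
          in
            scan (y, s2, tbl)
          end
  in
    List.foldl (fn ((y, rhs), tbl) => scan (y, rhs, tbl)) tbl grammar
  end

(* Start from empty Follow sets and iterate to a fixed point *)
fun followSets () =
  let
    fun size tbl = List.foldl (fn ((_, s), n) => n + length s) 0 tbl
    fun loop tbl =
      let val tbl' = followRound tbl
      in if size tbl' = size tbl then tbl else loop tbl' end
  in
    loop [("Z", []), ("Y", []), ("X", [])]
  end

val followTbl = followSets ()
```

The result is Follow(Y) = {e, d, a, b} and Follow(X) = {c, d, a, b}. The terminal e cannot follow X: X occurs only in Z → X Y Z, and what comes after it there starts with c (from Y) or with a symbol in First(Z).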

47
building a predictive parser
Z → X Y Z     Z → d
Y → c         Y → ε
X → a         X → b Y e

     nullable   first    follow
Z
Y
X
48
building a predictive parser
     nullable   first    follow
Z    no
Y    yes
X    no
base case
49
building a predictive parser
     nullable   first    follow
Z    no
Y    yes
X    no
after one round of induction, we realize we have
reached a fixed point
50
building a predictive parser
     nullable   first    follow
Z    no         d
Y    yes        c
X    no         a,b
base case
51
building a predictive parser
     nullable   first    follow
Z    no         d,a,b
Y    yes        c
X    no         a,b
after one round of induction, no fixed point
52
building a predictive parser
     nullable   first    follow
Z    no         d,a,b
Y    yes        c
X    no         a,b
after two rounds of induction, no more changes ⇒ fixed point
53
building a predictive parser
     nullable   first    follow
Z    no         d,a,b
Y    yes        c
X    no         a,b
base case
54
building a predictive parser
     nullable   first    follow
Z    no         d,a,b
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b
after one round of induction, no fixed point
55
building a predictive parser
     nullable   first    follow
Z    no         d,a,b
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b
after two rounds of induction, fixed point (here Follow(X) and
Follow(Y) depend only on First sets, so the order in which they
are computed does not change the number of rounds)
56
Grammar
Z → X Y Z     Z → d
Y → c         Y → ε
X → a         X → b Y e

Computed Sets
     nullable   first    follow
Z    no         d,a,b
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b

  • for each rule X → s: if T ∈ First(s) then
  • enter (X → s) in row X, col T
  • if s is Nullable and T ∈ Follow(X)
  • enter (X → s) in row X, col T

Build a parsing table where row X, col T tells the
parser which clause to execute in function X when the
next token is T
     a          b          c          d          e
Z
Y
X
57
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ
Y
X
58
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y
X
59
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y                          Y → c
X
60
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X
61
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X    X → a      X → b Y e
62
What are the blanks?
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X    X → a      X → b Y e
63
What are the blanks? → syntax errors
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X    X → a      X → b Y e
64
Is it possible to put 2 grammar rules in the same
box?
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X    X → a      X → b Y e
65
Grammar
Z → X Y Z     Z → d     Z → d e
Y → c         Y → ε
X → a         X → b Y e

Computed Sets
     nullable   first    follow
Z    no         d,a,b
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b
Is it possible to put 2 grammar rules in the same
box?
     a          b          c          d          e
Z    Z → XYZ    Z → XYZ               Z → d, Z → d e
Y    Y → ε      Y → ε      Y → c      Y → ε      Y → ε
X    X → a      X → b Y e
66
predictive parsing tables
  • if a predictive parsing table constructed this
    way contains no duplicate entries, the grammar is
    called LL(1)
  • Left-to-right parse, Left-most derivation, 1
    symbol lookahead
  • if not, the grammar is not LL(1)
  • in an LL(k) parsing table, columns include every
    k-length sequence of terminals

aa ab ba bb ac ca ...
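The table construction and the duplicate-entry test fit in one sketch. This hypothetical Standard ML code takes the computed Nullable, First and Follow sets as inputs, generates the (row, column, rule) entries using the two rules from the parsing-table slides, and reports every cell that receives more than one rule. Adding Z → d e leaves all three sets unchanged, so the same inputs serve both grammars.

```sml
datatype sym = T of string | N of string

fun member (x, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  List.foldl (fn (y, acc) => if member (y, acc) then acc else acc @ [y]) xs ys
fun lookup (x, tbl : (string * string list) list) =
  case List.find (fn (y, _) => y = x) tbl of
    SOME (_, s) => s
  | NONE => []

(* Assumed inputs: the computed Nullable, First and Follow sets *)
val nullableSet = ["Y"]
val firstTbl  = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]
val followTbl = [("Z", []), ("Y", ["e", "d", "a", "b"]), ("X", ["c", "d", "a", "b"])]

fun isNullable (T _) = false
  | isNullable (N x) = member (x, nullableSet)

fun firstOfString [] = []
  | firstOfString (T t :: _) = [t]
  | firstOfString ((s as N x) :: rest) =
      if isNullable s then union (lookup (x, firstTbl), firstOfString rest)
      else lookup (x, firstTbl)

(* Entries (row X, column T, rule) for one rule X -> s:
   T in First(s), plus T in Follow(X) when s is Nullable *)
fun entries (x, s) =
  let
    val cols = firstOfString s
    val cols = if List.all isNullable s
               then union (cols, lookup (x, followTbl)) else cols
  in
    List.map (fn t => (x, t, (x, s))) cols
  end

fun table rules = List.concat (List.map entries rules)

(* Cells that received more than one rule; none means LL(1) *)
fun conflicts rules =
  let
    val es = table rules
    fun count (x, t) =
      length (List.filter (fn (y, u, _) => y = x andalso u = t) es)
  in
    List.foldl (fn ((x, t, _), acc) =>
                  if count (x, t) > 1 andalso not (member ((x, t), acc))
                  then acc @ [(x, t)] else acc)
               [] es
  end

(* Z -> X Y Z | d     Y -> c | (empty)     X -> a | b Y e *)
val grammar =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]),               ("Y", []),
    ("X", [T "a"]),               ("X", [T "b", N "Y", T "e"]) ]

val withDE = grammar @ [("Z", [T "d", T "e"])]

val c1 = conflicts grammar   (* [] : LL(1) *)
val c2 = conflicts withDE    (* [("Z", "d")] : not LL(1) *)
```

For the extended grammar the only clash is row Z, column d, the box that would hold both Z → d and Z → d e.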

67
another trick
  • Previously, we saw that grammars with
    left-recursion were problematic, but could be
    transformed into LL(1) in some cases
  • the example non-LL(1) grammar we just saw
  • how do we fix it?

Z → X Y Z     Z → d     Z → d e
Y → c         Y → ε
X → a         X → b Y e
68
another trick
  • solution here is left-factoring

Z → X Y Z     Z → d     Z → d e
Y → c         Y → ε
X → a         X → b Y e

becomes

Z → X Y Z     Z → d W
Y → c         Y → ε
X → a         X → b Y e
W → ε         W → e
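The left-factored grammar is LL(1), so it can be parsed by recursive descent. A minimal Standard ML sketch under stated assumptions: tokens are single-letter strings from a hypothetical list-backed stream, and "$" marks end of input. The slides' Follow sets have no end marker, but W's empty rule needs one: with it, Follow(W) = Follow(Z) = {$}.

```sml
exception SyntaxError

(* Hypothetical token stream: single-letter tokens from a list;
   "$" stands for end of input *)
val input : string list ref = ref []
val tok = ref "$"
fun advance () =
  case !input of
    [] => tok := "$"
  | t :: rest => (tok := t; input := rest)
fun error () = raise SyntaxError
fun eat t = if !tok = t then advance () else error ()

(* Z -> X Y Z | d W     W -> e | (empty)
   Y -> c | (empty)     X -> a | b Y e    *)
fun Z () =
      (case !tok of
         "a" => (X (); Y (); Z ())
       | "b" => (X (); Y (); Z ())
       | "d" => (eat "d"; W ())
       | _   => error ())
(* W's empty rule is chosen on Follow(W) = { $ } *)
and W () =
      (case !tok of
         "e" => eat "e"
       | "$" => ()
       | _   => error ())
(* Y's empty rule is chosen on Follow(Y) = { e, d, a, b } *)
and Y () =
      (case !tok of
         "c" => eat "c"
       | t   => if List.exists (fn f => f = t) ["e", "d", "a", "b"]
                then () else error ())
and X () =
      (case !tok of
         "a" => eat "a"
       | "b" => (eat "b"; Y (); eat "e")
       | _   => error ())

fun parse (ts : string list) : bool =
  (input := ts; advance (); Z (); eat "$"; true)
  handle SyntaxError => false
```

Both `parse ["d"]` and `parse ["d", "e"]` succeed: after the shared d, the choice between the two original rules is postponed to W, which looks at the next token.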
69
summary
  • CFGs are good at specifying programming language
    structure
  • parsing general CFGs is expensive, so we define
    parsers for simple classes of CFG
  • LL(k), LR(k)
  • we can build a recursive descent parser for LL(k)
    grammars by
  • computing nullable, first and follow sets
  • constructing a parse table from the sets
  • checking for duplicate entries, which indicates
    failure
  • creating an ML program from the parse table
  • if parser construction fails we can
  • rewrite the grammar (left factoring, eliminating
    left recursion) and try again
  • try to build a parser using some other method