CSC 3130: Automata theory and formal languages

1
Fall 2009
The Chinese University of Hong Kong
CSC 3130 Automata theory and formal languages
Parsers for programming languages
Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
2
CFG of the Java programming language
Identifier:
    IDENTIFIER
QualifiedIdentifier:
    Identifier { . Identifier }
Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral
Expression:
    Expression1 [AssignmentOperator Expression1]
AssignmentOperator:
    =
    +=
    -=
    *=
    /=
    ...

from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
3
Parsing Java programs
    class Point2d {
        /* The X and Y coordinates of the point--instance variables */
        private double x;
        private double y;
        private boolean debug;  // A trick to help with debugging

        public Point2d (double px, double py) {  // Constructor
            x = px;
            y = py;
            debug = false;  // turn off debugging
        }

        public Point2d () {  // Default constructor
            this (0.0, 0.0);  // Invokes 2 parameter Point2D constructor
        }
        // Note that a this() invocation must be the BEGINNING of
        // statement body of constructor

        public Point2d (Point2d pt) {  // Another constructor
            x = pt.getX();
            y = pt.getY();
        ...

Simple Java program: about 1000 symbols
4
Parsing algorithms
  • How long would it take to parse this?
      exhaustive algorithm: about 10^80 years (longer than the life of the universe)
      CYK algorithm: about 1 week!
  • Can we parse faster?
  • No! CYK is the fastest known general-purpose parsing algorithm
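
For a sense of what the CYK algorithm named on this slide does, here is a minimal sketch of a CYK recognizer. It assumes the grammar is given in Chomsky normal form. The toy grammar (for strings a^n b^n) and the names UNIT, PAIR, and cyk are illustrative assumptions, not taken from the lecture. The three nested loops over substring length, start position, and split point are where the cubic running time in the input length comes from.

```python
# Hypothetical grammar in Chomsky normal form for { a^n b^n : n >= 1 }:
#   S -> AB | AC,  C -> SB,  A -> a,  B -> b
UNIT = {"a": {"A"}, "b": {"B"}}                      # rules X -> terminal
PAIR = {("A", "B"): {"S"}, ("A", "C"): {"S"}, ("S", "B"): {"C"}}  # rules X -> YZ


def cyk(text, start="S"):
    """Return True if the grammar above derives text."""
    n = len(text)
    if n == 0:
        return False  # this toy grammar has no rule for the empty string
    # table[i][l] = set of variables that derive text[i : i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(text):
        table[i][1] = set(UNIT.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for y in table[i][split]:
                    for z in table[i + split][length - split]:
                        table[i][length] |= PAIR.get((y, z), set())
    return start in table[0][n]


if __name__ == "__main__":
    for word in ["ab", "aabb", "abab", "aab"]:
        print(word, cyk(word))
```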
5
Another way of thinking
Scientist: Find an algorithm that can parse strings in any grammar
Engineer: Design your grammar so it has a very fast parsing algorithm
6
An example
Grammar: S → Tc (1)    T → TA (2) | A (3)    A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε

[Figure: parse tree for abaabbc]
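
The trace on this slide can be checked mechanically. The following is a minimal sketch that replays a given list of shift and reduce actions on a stack. It does not decide which action to take; that is what the rest of the lecture is about. The names PRODUCTIONS, ACTIONS, and replay are illustrative assumptions.

```python
# Productions numbered as on the slide: number -> (left-hand side, right-hand side).
PRODUCTIONS = {
    1: ("S", "Tc"),
    2: ("T", "TA"),
    3: ("T", "A"),
    4: ("A", "aTb"),
    5: ("A", "ab"),
}

# The action column of the slide: "shift" or the number of the production to reduce by.
ACTIONS = ["shift", "shift", 5, 3, "shift", "shift", "shift", 5, 3,
           "shift", 4, 2, "shift", 1]


def replay(text, actions):
    """Apply the actions in order; return the list of (stack, remaining input) pairs."""
    stack, rest = "", text
    trace = [(stack, rest)]
    for action in actions:
        if action == "shift":
            if not rest:
                raise ValueError("cannot shift: input is empty")
            stack, rest = stack + rest[0], rest[1:]
        else:
            lhs, rhs = PRODUCTIONS[action]
            if not stack.endswith(rhs):
                raise ValueError(f"cannot reduce ({action}): {rhs} is not on top of the stack")
            stack = stack[: len(stack) - len(rhs)] + lhs
        trace.append((stack, rest))
    return trace


if __name__ == "__main__":
    for stack, rest in replay("abaabbc", ACTIONS):
        print(f"{stack or 'ε':<6}{rest or 'ε'}")
```

A reduce raises an error unless the right-hand side of the production sits on top of the stack, so a wrong action sequence is rejected rather than silently applied.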
7
Items
S → Tc   (1)
T → A    (3)
T → TA   (2)
A → aTb  (4)
A → ab   (5)

A → •aTb   A → a•Tb   A → aT•b   A → aTb•
A → •ab    A → a•b    A → ab•
T → •A     T → A•
S → •Tc    S → T•c    S → Tc•
T → •TA    T → T•A    T → TA•

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift

Idea of parsing algorithm: Try to match complete items to top of stack
8
Some terminology
Grammar: S → Tc (1)    T → TA (2) | A (3)    A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε
handle
valid items aTb, ab
valid items Ta, Tc, aTb
9
Outline of LR(0) parsing algorithm
  • As the string is being read, it is pushed on a
    stack
  • Algorithm keeps track of all valid items
  • Algorithm can perform two actions
      shift: no complete item is valid
      reduce: there is one valid item, and it is complete
10
Running the algorithm
Stack   Input   Valid items                                 Action
ε       aabb    A → •aAb, A → •ab                           S
a       abb     A → a•Ab, A → a•b, A → •aAb, A → •ab        S
aa      bb      A → a•Ab, A → a•b, A → •aAb, A → •ab        S
aab     b       A → ab•                                     R
aA      b       A → aA•b                                    S
aAb     ε       A → aAb•                                    R
A       ε

A → aAb | ab
A ⇒ aAb ⇒ aabb
11
How to update valid items
  • Initial set of valid items
      S → •α for every production S → α
  • Updating valid items on "shift b"
      A → α•bβ is updated to A → αb•β
      A → α•Xβ disappears if X ≠ b
  • After these updates, for every valid item A → α•Cβ and production C → δ, we also add C → •δ as a valid item

notation: a, b: terminals    A, B: variables    X, Y: mixed symbols    α, β: mixed strings
12
How to update valid items
  • Updating valid items on "reduce β to B"
  • First, we backtrack to valid items before reduce
  • Then, we apply same rules as for "shift B" (as if B were a terminal)
      A → α•Bβ is updated to A → αB•β
      A → α•Xβ disappears if X ≠ B
      C → •δ is added for every valid item A → α•Cβ and production C → δ
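
The update rules for shift and reduce can be sketched in a few lines. This is a minimal illustration under stated assumptions: an item is represented as a tuple (left-hand side, right-hand side, dot position), the grammar is the one from the earlier example, and the function names closure, initial_items, advance, and show are hypothetical. The dot is printed as "." in the output.

```python
GRAMMAR = {"S": ["Tc"], "T": ["TA", "A"], "A": ["aTb", "ab"]}
START = "S"


def closure(items):
    """For every item with the dot before a variable C, add C -> .delta."""
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], delta, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def initial_items():
    """Initial valid items: START -> .alpha for every production, then closure."""
    return closure({(START, alpha, 0) for alpha in GRAMMAR[START]})


def advance(items, symbol):
    """Update on 'shift symbol': move the dot over symbol, drop all other items.

    For a reduce to variable B, call this on the items that were valid
    before the handle was pushed, with symbol = B.
    """
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def show(items):
    return sorted(f"{l} -> {r[:d]}.{r[d:]}" for (l, r, d) in items)


if __name__ == "__main__":
    start = initial_items()
    print(show(start))
    print(show(advance(start, "a")))               # after shift a
    print(show(advance(advance(start, "a"), "b")))  # after shift a, shift b
    print(show(advance(start, "A")))               # after reducing ab to A
```

The last call shows the backtracking step: reducing ab to A goes back to the items that were valid before a and b were shifted, then treats A like a shifted symbol.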
13
Viable item updates by NFA
  • States of NFA will be items (plus a start state q0)
  • For every item S → •α we have a transition
        q0 --ε--> S → •α
  • For every item A → α•Xβ we have a transition
        A → α•Xβ --X--> A → αX•β
  • For every item A → α•Cβ and production C → δ we have a transition
        A → α•Cβ --ε--> C → •δ
14
Example
A → aAb | ab

NFA states (items):
    A → •aAb   A → a•Ab   A → aA•b   A → aAb•
    A → •ab    A → a•b    A → ab•
15
Convert NFA to DFA
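State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5
All other transitions lead to the state "die"

States correspond to sets of valid items
Transitions are labeled by variables / terminals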
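
The DFA on this slide can be computed directly as sets of items, without building the NFA first. The following is a sketch under assumptions: items are tuples (left-hand side, right-hand side, dot position), the empty item set plays the role of the "die" state and is not numbered, and the names closure, advance, and build_dfa are hypothetical. States are numbered in the order they are discovered, which for the symbol order a, b, A happens to reproduce the numbering 1 to 5 used on the slide.

```python
GRAMMAR = {"A": ["aAb", "ab"]}
START = "A"
SYMBOLS = "abA"  # terminals first, then variables


def closure(items):
    """For every item with the dot before a variable C, add C -> .delta."""
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], delta, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def advance(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa():
    """Return (list of states, edges) where edges maps (state number, symbol) to a state number."""
    start = closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
    number = {start: 1}
    order = [start]
    edges = {}
    for state in order:  # the list grows while we iterate over it
        for symbol in SYMBOLS:
            target = advance(state, symbol)
            if not target:
                continue  # empty item set: the "die" state
            if target not in number:
                number[target] = len(number) + 1
                order.append(target)
            edges[(number[state], symbol)] = number[target]
    return order, edges


if __name__ == "__main__":
    states, edges = build_dfa()
    for i, state in enumerate(states, start=1):
        print(i, sorted(f"{l} -> {r[:d]}.{r[d:]}" for (l, r, d) in state))
    print(edges)
```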
16
Shift states and reduce states
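State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5

States 1, 2, 4 are shift states
States 3, 5 are reduce states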
17
Attempt at parsing with DFA
Stack   Input   DFA state                                      Action
ε       aabb    1: A → •aAb, A → •ab                           S
a       abb     2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
aa      bb      2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
aab     b       3: A → ab•                                     R
aA      b       ?: A → aA•b

A → aAb | ab
A ⇒ aAb ⇒ aabb
18
Remember the state in stack!
Stack      Input   DFA state                                      Action
1          aabb    1: A → •aAb, A → •ab                           S
1a2        abb     2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
1a2a2      bb      2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
1a2a2b3    b       3: A → ab•                                     R
1a2A4      b       4: A → aA•b                                    S
1a2A4b5    ε       5: A → aAb•                                    R
1A         ε

A → aAb | ab
A ⇒ aAb ⇒ aabb
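
Keeping the states on the stack is what makes the reduce step work: after popping the handle, the state underneath tells the parser where to continue. Below is a minimal sketch of that loop for the grammar A → aAb | ab. The DFA is hard-coded from the earlier slide. The names EDGES, REDUCE, and parse, and the choice to accept once the stack is back to state 1 with the start variable and no input left, are assumptions made for this illustration.

```python
START = "A"
# Transitions of the DFA (missing entries lead to the "die" state).
EDGES = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states and the complete item (as a production) that each one holds.
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}


def parse(text):
    """Return the list of stack contents if text is accepted, else None."""
    stack, rest, trace = [1], text, []
    while True:
        trace.append("".join(str(x) for x in stack))
        state = stack[-1]
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            del stack[-2 * len(rhs):]  # pop each handle symbol and its state
            symbol = lhs
        elif rest:
            symbol, rest = rest[0], rest[1:]  # shift the next input symbol
        else:
            return None  # shift state, but nothing left to shift
        if stack == [1] and symbol == START and not rest:
            trace.append("1" + START)
            return trace
        target = EDGES.get((stack[-1], symbol))
        if target is None:
            return None  # the DFA died
        stack += [symbol, target]


if __name__ == "__main__":
    print(parse("aabb"))
    print(parse("aab"))
```

On the input aabb the returned trace matches the stack column of the slide.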
19
Reconstructing the parse tree
Stack   Input   DFA state                                      Action
1       aabb    1: A → •aAb, A → •ab                           S
12      abb     2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
122     bb      2: A → a•Ab, A → a•b, A → •aAb, A → •ab        S
1223    b       3: A → ab•                                     R
124     b       4: A → aA•b                                    S
1245    ε       5: A → aAb•                                    R
1       ε

[Figure: parse tree with leaves a a b b, built up as the reductions are performed]

A → aAb | ab
A ⇒ aAb ⇒ aabb
20
LR(0) grammars and deterministic PDAs
  • The parsing procedure can be implemented by a deterministic pushdown automaton
  • A PDA is deterministic if in every state there is at most one possible transition for every input symbol and pop symbol, including ε
  • Example: PDA for w#w^R is deterministic, but PDA for ww^R is not

21
LR(0) grammars and deterministic PDAs
  • Not every PDA can be made deterministic
  • Since PDAs are equivalent to CFGs, LR(0) parsing algorithm must fail for some CFG, e.g. a grammar for
        L = {ww^R : w ∈ {a, b}*}
  • Why does LR(0) parsing algorithm fail?
22
Example 1
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
23
Example 1
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
[Figure: DFA of item sets for this grammar, with transitions labeled a, b, and A]
24
Example 1
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
[Figure: DFA of item sets for this grammar, with transitions labeled a, b, and A]
input: abba
25
When you can't LR(0) parse
  • Algorithm can perform two actions
      shift (S): no complete item is valid
      reduce (R): there is one valid item, and it is complete
  • What if
      some valid items complete, some not: S / R conflict
      more than one valid complete item: R / R conflict
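
The four cases on this slide amount to a simple test on a set of valid items. The sketch below assumes items are tuples (left-hand side, right-hand side, dot position); an item is complete when the dot is at the end of the right-hand side. The function name classify and the R/R example items are hypothetical and chosen only for illustration.

```python
def classify(items):
    """Return 'shift', 'reduce', or a description of the conflict in this item set."""
    complete = [item for item in items if item[2] == len(item[1])]
    conflicts = []
    if complete and len(complete) < len(items):
        conflicts.append("S/R")  # some valid items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R")  # more than one valid complete item
    if conflicts:
        return " and ".join(conflicts) + " conflict"
    return "reduce" if complete else "shift"


if __name__ == "__main__":
    # Start state for A -> aAb | ab: no complete item.
    print(classify({("A", "aAb", 0), ("A", "ab", 0)}))
    # State holding only A -> ab. (dot at the end).
    print(classify({("A", "ab", 2)}))
    # Start state for A -> aAa | bAb | ε: the item A -> . is already complete.
    print(classify({("A", "aAa", 0), ("A", "bAb", 0), ("A", "", 0)}))
    # Hypothetical item set with two complete items.
    print(classify({("A", "a", 1), ("B", "a", 1)}))
```

The third call shows why the grammar A → aAa | bAb | ε is not LR(0): the item for the ε-production is complete from the start, so the very first state already mixes a complete item with incomplete ones.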
26
Example 2
L = {w#w^R : w ∈ {a, b}*}
A → aAa | bAb | #
[Figure: NFA on the items of this grammar, with start state q0 and transitions labeled ε, a, b, and A]
27
Example 2
L = {w#w^R : w ∈ {a, b}*}
A → aAa | bAb | #
[Figure: DFA of item sets for this grammar, with states numbered 1 to 6]
No S/R or R/R conflicts!
28
Example 2 parsing
Stack    State   Action
1        1       S
12       2       S
122      2       S
1226     6       R
1223     3       S
12234    4       R
123      3       S
1236     5       R
1

[Figure: parse tree built up as the reductions are performed]
29
Hierarchy of context-free grammars
context-free grammars: parse using CYK algorithm (slow)
to be continued