Title: Bottom Up Parsing
Parsing Techniques
- Top-down parsers (LL(1), recursive descent)
  - Start at the root of the parse tree from the start symbol and grow toward the leaves (similar to a derivation).
  - Pick a production and try to match the input.
  - Bad pick → may need to backtrack.
  - Some grammars are backtrack-free (predictive parsing).
- Bottom-up parsers (LR(1), operator precedence)
  - Start at the leaves and grow toward the root.
  - We can think of the process as reducing the input string to the start symbol.
  - At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left side of the production.
  - Bottom-up parsers handle a large class of grammars.
Bottom-up Parsing
- A general style of bottom-up syntax analysis is known as shift-reduce parsing.
- Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.
  - At each shift action, the current symbol in the input string is pushed onto a stack.
  - At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the non-terminal on the left side of that production.
  - There are also two more actions: accept and error.
Shift-Reduce Parsing
- A shift-reduce parser tries to reduce the given input string to the starting symbol:
  a string → (reduced to) → the starting symbol
- At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule.
- If the substring is chosen correctly, the rightmost derivation of that string is created in reverse order.
  - Rightmost derivation: S ⇒*rm ω
  - Shift-reduce parser finds: ω ⇐rm ... ⇐rm S
Bottom Up Parsing
- Shift-Reduce Parsing
  - Reduce a string to the start symbol of the grammar.
  - At every step a particular substring is matched (in left-to-right fashion) to the right side of some production, and then it is substituted by the non-terminal on the left-hand side of the production.
Consider: S → aABe, A → Abc | b, B → d
Reverse order: abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
Rightmost derivation: S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
Handles
- Handle of a string: a substring that matches the RHS of some production AND whose reduction to the non-terminal on the LHS is a step along the reverse of some rightmost derivation.
- A handle of a right-sentential form γ (≡ αβω) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:
  S ⇒*rm αAω ⇒rm αβω
  - i.e., A → β is a handle of αβω at the location immediately after the end of α.
  - ω is a string of terminals.
- If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
Example
Consider: S → aABe, A → Abc | b, B → d
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
It follows that:
- S → aABe is a handle of aABe in location 1.
- B → d is a handle of aAde in location 3.
- A → Abc is a handle of aAbcde in location 2.
- A → b is a handle of abbcde in location 2.
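The handles listed above can be checked mechanically: given two consecutive right-sentential forms, try every production and every position and keep the one whose reduction gives the earlier form. A minimal Python sketch, assuming each grammar symbol is a single character with upper case for non-terminals (the name find_handle and this encoding are illustrative):

```python
# Grammar from the example: S -> aABe, A -> Abc | b, B -> d.
# Each symbol is one character; upper case marks a non-terminal.
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def find_handle(previous, current):
    """Return (lhs, rhs, location) such that replacing rhs at the 1-based
    `location` in `current` by lhs yields `previous`, or None if no single
    reduction does that."""
    for lhs, rhs in GRAMMAR:
        start = current.find(rhs)
        while start != -1:
            reduced = current[:start] + lhs + current[start + len(rhs):]
            if reduced == previous:
                return lhs, rhs, start + 1
            start = current.find(rhs, start + 1)
    return None


derivation = ["S", "aABe", "aAde", "aAbcde", "abbcde"]
for prev, cur in zip(derivation, derivation[1:]):
    lhs, rhs, loc = find_handle(prev, cur)
    print(f"{lhs} -> {rhs} is a handle of {cur} in location {loc}")
```

It prints the four handles in the same order as the list above, from aABe down to abbcde.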
Handle Pruning
- A rightmost derivation in reverse can be obtained by handle pruning.
- Apply this to the previous example: S → aABe, A → Abc | b, B → d
  - abbcde: find the handle b at location 2 → aAbcde
  - aAbcde: b at location 3 is not a handle; reducing it gives aAAcde ... blocked.
Handle-pruning, Bottom-up Parsers
- The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning.
- Handle pruning forms the basis for a bottom-up parsing method.
- To construct a rightmost derivation
  S = γ₀ ⇒rm γ₁ ⇒rm γ₂ ⇒rm ... ⇒rm γₙ₋₁ ⇒rm γₙ = ω   (ω is the input string)
  apply the following simple algorithm:
  - Start from γₙ, find a handle Aₙ → βₙ in γₙ, and replace βₙ by Aₙ to get γₙ₋₁.
  - Then find a handle Aₙ₋₁ → βₙ₋₁ in γₙ₋₁, and replace βₙ₋₁ by Aₙ₋₁ to get γₙ₋₂.
  - Repeat this until we reach S.
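What happens when a substring that is not a handle gets reduced can be seen with a small backtracking search over the example grammar. This Python sketch (the name prune and the one-character-per-symbol encoding are assumptions) tries every possible reduction and backs out of dead ends. It is only a demonstration: a real bottom-up parser identifies the handle directly instead of searching.

```python
# Example grammar: S -> aABe, A -> Abc | b, B -> d (one character per symbol).
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]
START = "S"


def prune(form):
    """Search for a sequence of reductions that takes `form` back to START.

    Returns the sentential forms visited (form first, START last), or None
    when every choice of substring eventually blocks."""
    if form == START:
        return [form]
    for lhs, rhs in GRAMMAR:
        start = form.find(rhs)
        while start != -1:
            rest = prune(form[:start] + lhs + form[start + len(rhs):])
            if rest is not None:
                return [form] + rest
            start = form.find(rhs, start + 1)
    return None


print(prune("abbcde"))  # ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
print(prune("aAAcde"))  # None: reducing b at location 3 leads to a dead end
```

For this grammar the search always terminates, because each reduction either shortens the string or replaces a terminal by a non-terminal.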
A Shift-Reduce Parser
Grammar:
  E → E+T | T
  T → T*F | F
  F → (E) | id
Rightmost derivation of id+id*id:
  E ⇒rm E+T ⇒rm E+T*F ⇒rm E+T*id ⇒rm E+F*id ⇒rm E+id*id ⇒rm T+id*id ⇒rm F+id*id ⇒rm id+id*id

Right-sentential form    Reducing production
[id]+id*id               F → id
[F]+id*id                T → F
[T]+id*id                E → T
E+[id]*id                F → id
E+[F]*id                 T → F
E+T*[id]                 F → id
E+[T*F]                  T → T*F
[E+T]                    E → E+T
E

Handles are shown in brackets in the right-sentential forms.
A Stack Implementation of a Shift-Reduce Parser
- There are four possible actions of a shift-reduce parser:
  - Shift: the next input symbol is shifted onto the top of the stack.
  - Reduce: replace the handle on the top of the stack by the non-terminal.
  - Accept: successful completion of parsing.
  - Error: the parser discovers a syntax error and calls an error recovery routine.
- The initial stack contains only the end-marker $.
- The end of the input string is marked by the end-marker $.
Shift Reduce Parsing with a Stack
- Two problems:
  - locate a handle, and
  - decide which production to use (if there is more than one candidate production).
- General construction using a stack:
  - shift input symbols onto the stack until a handle is found on top of it;
  - reduce the handle to the corresponding non-terminal;
  - other operations: accept when the input is consumed and only the start symbol is on the stack; also error.
A Stack Implementation of a Shift-Reduce Parser
Stack       Input        Action
$           id+id*id$    shift
$id         +id*id$      reduce by F → id
$F          +id*id$      reduce by T → F
$T          +id*id$      reduce by E → T
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce by F → id
$E+F        *id$         reduce by T → F
$E+T        *id$         shift
$E+T*       id$          shift
$E+T*id     $            reduce by F → id
$E+T*F      $            reduce by T → T*F
$E+T        $            reduce by E → E+T
$E          $            accept
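The trace above can be reproduced with a small Python sketch. The shift/reduce decisions are hard-coded for this one grammar, as a hypothetical stand-in for the parsing table an LR parser generator would build: reduce whenever a handle is on top of the stack, except that T is not reduced while the next input symbol is *. The function name shift_reduce_parse and the row format are illustrative.

```python
def shift_reduce_parse(tokens):
    """Shift-reduce parse for E -> E+T | T, T -> T*F | F, F -> (E) | id.

    The decisions are hand-written for this grammar only (one token of
    lookahead), not a general LR parser. Returns (stack, input, action) rows."""
    stack = ["$"]
    rest = list(tokens) + ["$"]
    trace = []

    def record(action):
        # Log the configuration *before* the action, as in the trace table.
        trace.append(("".join(stack), "".join(rest), action))

    while True:
        look = rest[0]
        if stack[-1] == "id":
            record("reduce by F -> id")
            stack[-1:] = ["F"]
        elif stack[-3:] == ["(", "E", ")"]:
            record("reduce by F -> (E)")
            stack[-3:] = ["F"]
        elif stack[-3:] == ["T", "*", "F"]:
            record("reduce by T -> T*F")
            stack[-3:] = ["T"]
        elif stack[-1] == "F":
            record("reduce by T -> F")
            stack[-1:] = ["T"]
        elif stack[-3:] == ["E", "+", "T"] and look != "*":
            record("reduce by E -> E+T")
            stack[-3:] = ["E"]
        elif stack[-1] == "T" and look != "*":
            record("reduce by E -> T")
            stack[-1:] = ["E"]
        elif stack == ["$", "E"] and look == "$":
            record("accept")
            return trace
        elif look != "$":
            record("shift")
            stack.append(rest.pop(0))
        else:
            record("error")
            return trace


for stack, remaining, action in shift_reduce_parse(["id", "+", "id", "*", "id"]):
    print(f"{stack:<10}{remaining:<12}{action}")
```

The order of the checks matters: T*F is tested before F alone, and E+T before T alone, so that the longer handle wins.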
Conflicts During Shift-Reduce Parsing
- There are context-free grammars for which shift-reduce parsers cannot be used.
- The stack contents and the next input symbol may not decide the action:
  - shift/reduce conflict: the parser cannot decide whether to make a shift operation or a reduction.
  - reduce/reduce conflict: the parser cannot decide which of several reductions to make.
- If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar.
  - LR(k): L = left-to-right scanning, R = rightmost derivation (in reverse), k = number of lookahead symbols.
- An ambiguous grammar can never be an LR grammar.
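An ambiguous grammar shows where a shift/reduce conflict comes from. The Python sketch below uses the hypothetical ambiguous grammar E → E+E | E*E | id (not one of the grammars on these slides) and enumerates, by brute force, every shift/reduce sequence that reduces id+id*id to E. Two sequences succeed, and they first differ when the stack holds $E+E and the next input is *: one reduces by E → E+E, the other shifts *. A parser that sees only the stack and the next symbol cannot choose between them.

```python
# Hypothetical ambiguous grammar for illustration: E -> E+E | E*E | id
PRODUCTIONS = [("E", ("E", "+", "E")), ("E", ("E", "*", "E")), ("E", ("id",))]


def accepting_runs(stack, rest, actions=()):
    """Yield every shift/reduce action sequence that turns the input into E.

    Brute force: try every reduction whose right side is on top of the
    stack, then try shifting. This is not a parser; it only shows which
    choices can succeed."""
    if stack == ["$", "E"] and rest == ["$"]:
        yield list(actions)
        return
    for lhs, rhs in PRODUCTIONS:
        if tuple(stack[-len(rhs):]) == rhs:
            yield from accepting_runs(stack[:-len(rhs)] + [lhs], rest,
                                      actions + (f"reduce {lhs}->{''.join(rhs)}",))
    if rest[0] != "$":
        yield from accepting_runs(stack + [rest[0]], rest[1:], actions + ("shift",))


runs = list(accepting_runs(["$"], ["id", "+", "id", "*", "id", "$"]))
print(len(runs))  # 2
for run in runs:
    print(run[5])  # the action taken with stack $E+E and next input *
```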
Shift-Reduce Parsers
- There are two main categories of shift-reduce parsers:
  - Operator-Precedence Parser
    - simple, but handles only a small class of grammars.
  - LR Parsers
    - cover a wide range of grammars.
    - SLR: simple LR parser
    - Canonical LR: most general LR parser
    - LALR: intermediate LR parser (lookahead LR parser)
- SLR, Canonical LR and LALR work the same way; only their parsing tables are different.
Operator-Precedence Parser
- Operator grammar
  - a small, but important class of grammars
  - we may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
- In an operator grammar, no production rule can have:
  - ε at the right side
  - two adjacent non-terminals at the right side.
- Examples:
  - E → AB, A → a, B → b: not an operator grammar
  - E → EOE, E → id, O → + | * | /: not an operator grammar
  - E → E+E | E*E | E/E | id: operator grammar
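The two restrictions are easy to check mechanically. A minimal Python sketch, assuming productions are given as (lhs, rhs) pairs with the right side as a list of symbols and an empty list standing for ε (this representation and the name is_operator_grammar are illustrative), applied to the three example grammars:

```python
def is_operator_grammar(productions, nonterminals):
    """Check the two operator-grammar restrictions.

    productions: list of (lhs, rhs) pairs; rhs is a list of symbols and
    [] stands for an epsilon right side."""
    for _lhs, rhs in productions:
        if not rhs:
            return False  # epsilon on the right side
        for left, right in zip(rhs, rhs[1:]):
            if left in nonterminals and right in nonterminals:
                return False  # two adjacent non-terminals
    return True


g1 = [("E", ["A", "B"]), ("A", ["a"]), ("B", ["b"])]
g2 = [("E", ["E", "O", "E"]), ("E", ["id"]),
      ("O", ["+"]), ("O", ["*"]), ("O", ["/"])]
g3 = [("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]),
      ("E", ["E", "/", "E"]), ("E", ["id"])]

print(is_operator_grammar(g1, {"E", "A", "B"}))  # False
print(is_operator_grammar(g2, {"E", "O"}))       # False
print(is_operator_grammar(g3, {"E"}))            # True
```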
Precedence Relations
- In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals:
  a <· b   b has higher precedence than a
  a =· b   b has the same precedence as a
  a ·> b   b has lower precedence than a
- The determination of correct precedence relations between terminals is based on the traditional notions of associativity and precedence of operators. (Unary minus causes a problem.)
Using Operator-Precedence Relations
- The intention of the precedence relations is to find the handle of a right-sentential form, with
  - <· marking the left end,
  - =· appearing in the interior of the handle, and
  - ·> marking the right end.
- In our input string $a₁a₂...aₙ$, we insert the precedence relation between each pair of adjacent terminals (the precedence relation holds between the terminals in that pair).
Using Operator-Precedence Relations
- E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
- The partial operator-precedence table for this grammar:
        id    +     *     $
  id          ·>    ·>    ·>
  +     <·    ·>    <·    ·>
  *     <·    ·>    ·>    ·>
  $     <·    <·    <·
- Then the input string id+id*id with the precedence relations inserted will be:
  $ <· id ·> + <· id ·> * <· id ·> $
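Inserting the relations is a lookup in the table for each adjacent pair of terminals. A minimal Python sketch, assuming the partial table is stored as a nested dictionary REL with "<", "=", ">" for the three relations (the names REL, SHOW and annotate are illustrative):

```python
# Partial precedence table: REL[a][b] is the relation between terminal a
# (row) and terminal b (column). A missing entry means a syntax error.
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}
SHOW = {"<": "<·", "=": "=·", ">": "·>"}


def annotate(tokens):
    """Insert the relation between each adjacent pair of terminals in
    $ tokens $. Raises KeyError for a pair that has no relation."""
    symbols = ["$"] + list(tokens) + ["$"]
    out = [symbols[0]]
    for a, b in zip(symbols, symbols[1:]):
        out += [SHOW[REL[a][b]], b]
    return " ".join(out)


print(annotate(["id", "+", "id", "*", "id"]))
# $ <· id ·> + <· id ·> * <· id ·> $
```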
To Find The Handles
- Scan the string from the left end until the first ·> is encountered.
- Then scan backwards (to the left) over any =· until a <· is encountered.
- The handle contains everything to the left of the first ·> and to the right of the <· encountered.

Relations between terminals           Reduce by   Current string
$ <· id ·> + <· id ·> * <· id ·> $    E → id      $id+id*id$
$ <· + <· id ·> * <· id ·> $          E → id      $E+id*id$
$ <· + <· * <· id ·> $                E → id      $E+E*id$
$ <· + <· * ·> $                      E → E*E     $E+E*E$
$ <· + ·> $                           E → E+E     $E+E$
                                                  $E$
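The scanning rule can be sketched in Python over a list of symbols that mixes terminals with the non-terminal E. Non-terminals are skipped when relations are looked up and end up inside the handle, which matches the table above. The names find_handle and reduce_all are hypothetical, and the sketch assumes a valid input (it does not report errors).

```python
# Same partial table (rows and columns id, +, *, $).
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def find_handle(symbols):
    """Return (start, end) so that symbols[start:end] is the handle, or None
    if there is no ·> (nothing left to reduce)."""
    terms = [i for i, s in enumerate(symbols) if s in REL]  # terminal positions
    # Scan from the left for the first ·>.
    for k in range(len(terms) - 1):
        if REL[symbols[terms[k]]].get(symbols[terms[k + 1]]) == ">":
            break
    else:
        return None
    # Scan back over =· until the <· that opens the handle. The leading $
    # is never ·> or =· anything, so terms[j - 1] always exists.
    j = k
    while j > 1 and REL[symbols[terms[j - 1]]].get(symbols[terms[j]]) == "=":
        j -= 1
    # Everything strictly between the <· terminal and the ·> terminal.
    return terms[j - 1] + 1, terms[k + 1]


def reduce_all(tokens):
    """Repeatedly replace the handle by E; return the reductions and the result."""
    symbols = ["$"] + list(tokens) + ["$"]
    used = []
    span = find_handle(symbols)
    while span is not None:
        start, end = span
        used.append("E -> " + "".join(symbols[start:end]))
        symbols[start:end] = ["E"]
        span = find_handle(symbols)
    return used, symbols


used, final = reduce_all(["id", "+", "id", "*", "id"])
print(used)   # ['E -> id', 'E -> id', 'E -> id', 'E -> E*E', 'E -> E+E']
print(final)  # ['$', 'E', '$']
```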
Operator-Precedence Parsing Algorithm
- The input string is w$, the initial stack is $, and a table holds precedence relations between certain terminals.
- Algorithm:
  set p to point to the first symbol of w$;
  repeat forever
    if ($ is on top of the stack and p points to $) then return
    else {
      let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
      if (a <· b or a =· b) then {     /* SHIFT */
        push b onto the stack;
        advance p to the next input symbol;
      }
      else if (a ·> b) then            /* REDUCE */
        repeat pop stack
        until (the top of stack terminal is related by <· to the terminal most recently popped);
      else error();
    }
Operator-Precedence Parsing Algorithm -- Example
      id    +     *     $
id          ·>    ·>    ·>
+     <·    ·>    <·    ·>
*     <·    ·>    ·>    ·>
$     <·    <·    <·

Stack     Input        Relation    Action
$         id+id*id$    $ <· id     shift
$id       +id*id$      id ·> +     reduce E → id
$         +id*id$      $ <· +      shift
$+        id*id$       + <· id     shift
$+id      *id$         id ·> *     reduce E → id
$+        *id$         + <· *      shift
$+*       id$          * <· id     shift
$+*id     $            id ·> $     reduce E → id
$+*       $            * ·> $      reduce E → E*E
$+        $            + ·> $      reduce E → E+E
$         $                        accept
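A direct Python transcription of the algorithm, assuming the same partial table and, as in the trace above, pushing only terminals so that the top of the stack is always the topmost terminal. It returns the sequence of actions (the name op_precedence_parse is illustrative) and raises SyntaxError where the pseudocode calls error().

```python
# Partial table from the example: REL[a][b] for terminals id, +, *, $.
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def op_precedence_parse(tokens):
    """Operator-precedence parsing loop; returns the actions taken."""
    stack = ["$"]
    inp = list(tokens) + ["$"]
    p = 0
    actions = []
    while True:
        a, b = stack[-1], inp[p]  # only terminals are pushed
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        rel = REL[a].get(b)
        if rel in ("<", "="):  # SHIFT
            stack.append(b)
            p += 1
            actions.append("shift")
        elif rel == ">":  # REDUCE: pop until the new top is <· the last popped
            while True:
                popped = stack.pop()
                if REL[stack[-1]].get(popped) == "<":
                    break
            actions.append("reduce")
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


print(op_precedence_parse(["id", "+", "id", "*", "id"]))
# ['shift', 'reduce', 'shift', 'shift', 'reduce', 'shift', 'shift', 'reduce', 'reduce', 'reduce', 'accept']
```

The action sequence lines up row by row with the trace table above.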
How to Create Operator-Precedence Relations
- We use associativity and precedence relations among operators.
- If operator θ₁ has higher precedence than operator θ₂:
  θ₁ ·> θ₂ and θ₂ <· θ₁
- If operator θ₁ and operator θ₂ have equal precedence:
  - if they are left-associative: θ₁ ·> θ₂ and θ₂ ·> θ₁
  - if they are right-associative: θ₁ <· θ₂ and θ₂ <· θ₁
- For all operators θ: θ <· id, id ·> θ, θ <· (, ( <· θ, θ ·> ), ) ·> θ, θ ·> $, and $ <· θ.
- Also, let
  ( =· )    $ <· (    id ·> )    ) ·> $
  ( <· (    $ <· id   id ·> $    ) ·> )
  ( <· id
Operator-Precedence Relations
      +    -    *    /    ^    id   (    )    $
+     ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
-     ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
*     ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
/     ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
^     ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
id    ·>   ·>   ·>   ·>   ·>             ·>   ·>
(     <·   <·   <·   <·   <·   <·   <·   =·
)     ·>   ·>   ·>   ·>   ·>             ·>   ·>
$     <·   <·   <·   <·   <·   <·   <·
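The rules for creating the relations are mechanical enough to generate the whole table. The Python sketch below assumes + and - share the lowest level, * and / the next, and ^ the highest, with ^ right-associative (the numeric levels are arbitrary; only their order matters), and it assumes all operators on one level share the same associativity. The names OPERATORS and build_relations are illustrative.

```python
# Operators with (precedence level, associativity); ^ is exponentiation.
OPERATORS = {"+": (1, "left"), "-": (1, "left"),
             "*": (2, "left"), "/": (2, "left"),
             "^": (3, "right")}


def build_relations(ops):
    """Apply the construction rules; returns {(a, b): '<' | '=' | '>'}."""
    rel = {}
    for t1, (p1, assoc) in ops.items():
        for t2, (p2, _) in ops.items():
            if p1 != p2:  # higher precedence binds tighter
                rel[t1, t2] = ">" if p1 > p2 else "<"
            else:  # equal precedence: associativity decides
                rel[t1, t2] = ">" if assoc == "left" else "<"
        rel.update({(t1, "id"): "<", ("id", t1): ">", (t1, "("): "<",
                    ("(", t1): "<", (t1, ")"): ">", (")", t1): ">",
                    (t1, "$"): ">", ("$", t1): "<"})
    rel.update({("(", ")"): "=", ("$", "("): "<", ("id", ")"): ">",
                (")", "$"): ">", ("(", "("): "<", ("$", "id"): "<",
                ("id", "$"): ">", (")", ")"): ">", ("(", "id"): "<"})
    return rel


SYMBOLS = ["+", "-", "*", "/", "^", "id", "(", ")", "$"]
SHOW = {"<": "<·", "=": "=·", ">": "·>"}
table = build_relations(OPERATORS)
print("    " + "".join(f"{b:>4}" for b in SYMBOLS))
for a in SYMBOLS:
    print(f"{a:>4}" + "".join(f"{SHOW.get(table.get((a, b)), ''):>4}" for b in SYMBOLS))
```

The printed rows should agree with the table above, including ^ <· ^ for the right-associative operator.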
Handling Unary Minus
- Operator-precedence parsing cannot handle the unary minus when we also have the binary minus in our grammar.
- The best approach to this problem is to let the lexical analyzer handle it.
  - The lexical analyzer will return two different tokens for the unary minus and the binary minus.
  - The lexical analyzer will need a lookahead to distinguish the binary minus from the unary minus.
- Then, we make
  - θ <· unary-minus for any operator θ
  - unary-minus ·> θ if unary-minus has higher precedence than θ
  - unary-minus <· θ if unary-minus has lower (or equal) precedence than θ
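One way a lexical analyzer can tell the two minus signs apart: a minus is unary when there is nothing before it to subtract from, i.e. at the start of the input, after another operator, or after "(". This Python sketch decides by looking at the token it emitted just before the minus sign, which is an assumption about how to realise the check; the token name "uminus" is hypothetical.

```python
import re

# One token per match: an identifier or a single operator/parenthesis,
# optionally preceded by whitespace.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[-+*/^()])")
OPERATORS = {"+", "-", "*", "/", "^", "uminus"}


def tokenize(text):
    """Return token names: identifiers become 'id', a unary minus becomes
    'uminus', and every other lexeme is its own token."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        lexeme, pos = match.group(1), match.end()
        if lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("id")
        elif lexeme == "-" and (not tokens or tokens[-1] in OPERATORS or tokens[-1] == "("):
            tokens.append("uminus")  # nothing to subtract from: must be unary
        else:
            tokens.append(lexeme)
    return tokens


print(tokenize("-a - (b * -c)"))
# ['uminus', 'id', '-', '(', 'id', '*', 'uminus', 'id', ')']
```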
Precedence Functions
- Compilers using operator-precedence parsers do not need to store the table of precedence relations.
- The table can be encoded by two precedence functions f and g that map terminal symbols to integers.
- For symbols a and b:
  - f(a) < g(b) whenever a <· b
  - f(a) = g(b) whenever a =· b
  - f(a) > g(b) whenever a ·> b
Algorithm 4.6: Constructing precedence functions
Constructing precedence functions
- Method:
  - (1) Create symbols f_a and g_a for each a that is a terminal or $.
  - (2) Partition the created symbols into as many groups as possible, in such a way that if a =· b, then f_a and g_b are in the same group.
  - (3) Create a directed graph whose nodes are the groups found in (2). For any a and b, if a <· b, place an edge from the group of g_b to the group of f_a. If a ·> b, place an edge from the group of f_a to that of g_b.
  - (4) If the graph constructed in (3) has a cycle, then no precedence functions exist. If there are no cycles, let f(a) be the length of the longest path beginning at the group of f_a, and let g(a) be the length of the longest path beginning at the group of g_a.
Example
     +   *   id   $
f    2   4   4    0
g    1   3   5    0
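The four-step method can be run on the partial table for id, +, * and $ used earlier. This Python sketch uses a small union-find for step (2) and a memoised depth-first search with cycle detection for the longest paths in step (4); these implementation choices, and the names precedence_functions and CycleError, are assumptions rather than part of the method. It should reproduce the values in the example table.

```python
class CycleError(Exception):
    """Raised when the graph of step (3) has a cycle."""


def precedence_functions(terminals, rel):
    """Build f and g from {(a, b): '<' | '=' | '>'}; None if no functions exist."""
    # Steps (1)-(2): nodes ('f', a) and ('g', a); merge f_a and g_b when a =· b.
    parent = {}

    def group(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            node = parent[node]
        return node

    for (a, b), r in rel.items():
        if r == "=":
            parent[group(("f", a))] = group(("g", b))

    # Step (3): a <· b gives g_b -> f_a; a ·> b gives f_a -> g_b.
    edges = {}
    for (a, b), r in rel.items():
        if r == "<":
            edges.setdefault(group(("g", b)), set()).add(group(("f", a)))
        elif r == ">":
            edges.setdefault(group(("f", a)), set()).add(group(("g", b)))

    # Step (4): length (in edges) of the longest path starting at each group.
    longest, active = {}, set()

    def visit(node):
        if node in active:
            raise CycleError(node)
        if node not in longest:
            active.add(node)
            longest[node] = max((1 + visit(m) for m in edges.get(node, ())), default=0)
            active.remove(node)
        return longest[node]

    try:
        f = {a: visit(group(("f", a))) for a in terminals}
        g = {a: visit(group(("g", a))) for a in terminals}
    except CycleError:
        return None
    return f, g


# Partial table for id, +, * and $ from the earlier slides.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}
f, g = precedence_functions(["+", "*", "id", "$"], REL)
print(f)  # {'+': 2, '*': 4, 'id': 4, '$': 0}
print(g)  # {'+': 1, '*': 3, 'id': 5, '$': 0}
```

Comparing f(a) with g(b) then replaces a lookup in the relation table.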
Disadvantages of Operator Precedence Parsing
- Disadvantages
  - It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
  - Small class of grammars.
  - Difficult to decide which language is recognized by the grammar.
- Advantages
  - simple
  - powerful enough for expressions in programming languages