Title: Syntactic Analysis I
Chapter 3
- The Syntactic Analyzer, or Parser, is the heart of the front end of the compiler.
- The parser's main task is to analyze the structure of the program and its component statements.
- Our principal resource in parser design is the theory of formal languages.
- We will use and study context-free grammars.
- Rare exceptions occur when a context-free grammar cannot enforce a language requirement. For example, it cannot enforce the rule that identifiers must be declared before use.
1. Grammars
- Informal definition: a finite set of rules for generating an infinite set of sentences.
- In natural languages, sentences are made up of words.
- In programming languages, sentences are made up of tokens.
- Def. Generative Grammar: this type of grammar builds a sentence in a series of steps, refining each step, to go from an abstract to a concrete sentence.
- Analyzing, or parsing, the sentence consists of reconstructing the way in which the sentence was formed. This is done through a parse tree.
- Def. Parse Tree: a tree that represents the analysis/structure of a sentence (following the refinement steps used by a generative grammar to build it).
- If you build the parse tree from top to bottom (top-down), you are reconstructing the steps of the speaker (or writer) in creating the sentence.
- If you build the parse tree from bottom to top (bottom-up), you are reconstructing the steps of the listener (or reader) in understanding the sentence.
Sample English Grammar Rules
- A sentence can consist of a noun phrase and a verb phrase.
- A noun phrase can consist of an article and a noun.
- A verb phrase can consist of a verb and a noun phrase.
- Possible nouns are dog, cat, bone, etc.
- Possible articles are the, a, an, etc.
- Possible verbs are gnawed, saw, walks, etc.
- These rules can be concisely represented by:
- <sentence> -> <noun phrase> <verb phrase>
- <noun phrase> -> <article> <noun>
- <verb phrase> -> <verb> <noun phrase>
- <noun> -> dog, cat, bone, ...
- <article> -> a, an, the, ...
- <verb> -> contains, gnawed, saw, walks, ...
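- For example, the sentence "the dog gnawed a bone" is generated by refining step by step:
- <sentence> => <noun phrase> <verb phrase> => <article> <noun> <verb phrase> => the dog <verb phrase>
- => the dog <verb> <noun phrase> => the dog gnawed <article> <noun> => the dog gnawed a bone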
Sample programming language grammar rules
- Rules
- <expression> -> <expression> + <expression>
- <expression> -> <expression> * <expression>
- <expression> -> a, b, c, ...
- You can use these rules to parse the expression
- a + b * c
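- For example, one leftmost derivation of a + b * c:
- <expression> => <expression> + <expression> => a + <expression> => a + <expression> * <expression> => a + b * <expression> => a + b * c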
- Def. Productions/Rewrite Rules: rules that explain how to refine the steps of a generative grammar.
- Def. Terminals: the actual words in the language.
- Def. Non-Terminals: symbols not in the language, but part of the refinement process.
1.1 Syntax and Semantics
- Syntax deals with the way a sentence is put together.
- Semantics deals with what the sentence means.
- There are sentences that are grammatically correct that do not make any sense.
- There are things that make sense that are not grammatically correct.
- The compiler will check for syntactic correctness, yet it is the programmer's responsibility (usually during debugging) to make sure the program makes sense.
1.2 Grammars: Formal Definition
- G = (T, N, S, R)
- T: set of terminals
- N: set of non-terminals
- S: start symbol (an element of N)
- R: set of rewrite rules (productions) of the form α -> β
- If, in every rewrite rule, α is a single non-terminal, the language is context-free.
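As a concrete illustration (not from the text), here is one way the four components of G could be held in Python, using the English grammar above; the variable names are just for this sketch:

grammar = {
    'T': {'the', 'a', 'dog', 'cat', 'bone', 'gnawed', 'saw'},       # terminals
    'N': {'sentence', 'noun phrase', 'verb phrase',
          'article', 'noun', 'verb'},                                # non-terminals
    'S': 'sentence',                                                 # start symbol
    'R': {                                                           # rewrite rules
        'sentence':    [['noun phrase', 'verb phrase']],
        'noun phrase': [['article', 'noun']],
        'verb phrase': [['verb', 'noun phrase']],
        'article':     [['the'], ['a']],
        'noun':        [['dog'], ['cat'], ['bone']],
        'verb':        [['gnawed'], ['saw']],
    },
}
# Every left-hand side in R is a single non-terminal, so this grammar is context-free.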
BNF and EBNF
- BNF stands for Backus-Naur Form
- ::= is used in place of ->
- Uses | when representing productions with the same left-hand side
- EBNF extends BNF to include additional constructs:
- { } indicates zero or more repetitions (equivalent to ( )*)
- [ ] is used to indicate an optional element
- <integer_const> ::= [ + | - ] <digit> { <digit> }
1.3 Parse Trees and Derivations
- Use α1 => α2 to show that string α1 is changed to string α2 by applying one production.
- Use α =>* β to show that you can get to β from α in 0 or more productions.
- Sentential forms are the strings appearing in the various derivation steps.
- L(G) = { w | S =>* w } represents the set of all strings of terminals derivable from S using G.
1.4 Rightmost and Leftmost Derivations
- Which non-terminal do you rewrite (expand) when there is more than one to choose from?
- If you always select the rightmost non-terminal to expand, it is a rightmost derivation.
- Leftmost and rightmost derivations must result in a unique parse tree; otherwise the grammar is ambiguous.
- Def.: any sentential form occurring in a leftmost derivation is termed a left sentential form.
- Def.: any sentential form occurring in a rightmost derivation is termed a right sentential form.
- Some parsers construct leftmost derivations and others rightmost, so it is important to understand the difference.
- Given (pg 72) GE = (T, N, S, R)
- T = { i, +, -, *, /, (, ) }
- N = { E }
- S = E
- R =
- E -> E + E     E -> E - E
- E -> E * E     E -> E / E
- E -> ( E )     E -> i
- Consider (i+i)/(i-i)
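- One leftmost derivation:
- E => E / E => ( E ) / E => ( E + E ) / E => ( i + E ) / E => ( i + i ) / E
- => ( i + i ) / ( E ) => ( i + i ) / ( E - E ) => ( i + i ) / ( i - E ) => ( i + i ) / ( i - i )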
1.5 Ambiguous Grammars
- Given (pg 72) GE = (T, N, S, R)
- T = { i, +, -, *, /, (, ) }
- N = { E }
- S = E
- R =
- E -> E + E     E -> E - E
- E -> E * E     E -> E / E
- E -> ( E )     E -> i
- Consider i + i * i
- A grammar in which it is possible to parse even one sentence in two or more different ways is ambiguous.
- A language for which no unambiguous grammar exists is said to be inherently ambiguous.
- The previous example is "fixed" by operator-precedence rules,
- or by rewriting the grammar:
- E -> E + T | E - T | T
- T -> T * F | T / F | F
- F -> ( E ) | i
- Try i + i * i
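- With the rewritten grammar there is only one leftmost derivation:
- E => E + T => T + T => F + T => i + T => i + T * F => i + F * F => i + i * F => i + i * i
- The * is forced below the + in the parse tree, so multiplication binds tighter than addition.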
1.6 The Chomsky Hierarchy (from the outside in)
- Type 0 grammars
- γAδ -> γβδ
- These are called phrase-structured, or unrestricted, grammars.
- It takes a Turing machine to recognize these types of languages.
- Type 1 grammars
- γAδ -> γβδ, with β ≠ ε;
- therefore the sentential form never gets shorter.
- Context-sensitive grammars.
- Recognized by a simpler Turing machine: a linear bounded automaton (LBA).
- Type 2 grammars
- A -> β
- Context-free grammars.
- It takes a stack automaton to recognize CFGs (an FSA with temporary storage).
- A nondeterministic stack automaton cannot in general be mapped to a DSA, but all the languages we will look at will be recognizable by DSAs.
- Type 3 grammars
- The right-hand side may be
- a single terminal, or
- a single non-terminal followed by a single terminal.
- Regular grammars.
- Recognized by FSAs.
1.7 Some Context-Free and Non-Context-Free Languages
- Example 1
- S -> S S
-    | ( S )
-    | ( )
- This is context-free.
- Example 2
- a^n b^n c^n
- This is NOT context-free.
- Example 3
- S -> aSBC
- S -> abC
- CB -> BC
- bB -> bb
- bC -> bc
- cC -> cc
- This is a context-sensitive grammar (it generates a^n b^n c^n).
- L2 = { wcw | w in (T - {c})* } is NOT a context-free language.
1.8 More about the Chomsky Hierarchy
- There is a close relationship between the productions in a CFG and the corresponding computations to be carried out for the program being parsed.
- This is the basis of syntax-directed translation, which we use to generate intermediate code.
2. Top-Down Parsers
- The top-down parser must start at the root of the tree and determine, from the token stream, how to grow the parse tree that results in the observed tokens.
- This approach runs into several problems, which we will now deal with.
2.1 Left Recursion
- Productions of the form A -> Aα are left recursive.
- No top-down parser can handle left-recursive grammars.
- Therefore we must rewrite the grammar and eliminate both direct and indirect left recursion.
- How to eliminate left recursion (direct):
- Given
- A -> Aα1 | Aα2 | Aα3 | ...
- A -> δ1 | δ2 | δ3 | ...
- Introduce A'
- A -> δ1 A' | δ2 A' | δ3 A' | ...
- A' -> ε | α1 A' | α2 A' | α3 A' | ...
- Example (a small code sketch of this transformation follows below)
- S -> Sa | b
- becomes
- S -> bS'
- S' -> ε | a S'
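A minimal Python sketch of this transformation (the function name and the grammar encoding are illustrative, not from the slides):

def remove_immediate_left_recursion(name, rhss):
    """rhss is a list of right-hand sides, each a list of symbols."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == name]   # the alpha_i parts
    other = [rhs for rhs in rhss if not rhs or rhs[0] != name]        # the delta_i parts
    if not recursive:
        return {name: rhss}                                           # nothing to do
    new = name + "'"
    return {
        name: [rhs + [new] for rhs in other],                 # A  -> delta_i A'
        new: [rhs + [new] for rhs in recursive] + [['eps']],  # A' -> alpha_i A' | eps
    }

# The slide's example: S -> Sa | b   becomes   S -> bS',  S' -> aS' | eps
print(remove_immediate_left_recursion('S', [['S', 'a'], ['b']]))
# {'S': [['b', "S'"]], "S'": [['a', "S'"], ['eps']]}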
- How to remove ALL left recursion:
- 1. Make a list of all nonterminals in the sequence in which they occur in the list of productions.
- 2. For each nonterminal: if a RHS begins with a nonterminal that appears earlier in the production list, as in prod. 2 below, where A appears earlier in the list,
- A -> γ1 | γ2 | γ3 | ... (prod. 1)
- B -> Aβ (prod. 2)
- then replace B as follows:
- A -> γ1 | γ2 | γ3 | ... (prod. 1)
- B -> γ1β | γ2β | γ3β | ... (new prod. 2)
- 3. After all are done, remove immediate left recursion.
- Example
- S -> aA | b | cS     (A -> γ1 | γ2 | γ3 ...)
- A -> Sd | ε           (B -> Aβ)
- becomes
- S -> aA | b | cS     (A -> γ1 | γ2 | γ3 ...)
- A -> aAd | bd | cSd | ε     (B -> γ1β | γ2β | γ3β ...)
2.2 Backtracking
- One way to carry out a top-down parse is simply to have the parser try all applicable productions exhaustively until it finds a tree.
- This is sometimes called the brute-force method.
- It is similar to a depth-first search of a graph.
- Tokens may have to be put back on the input stream.
- Given the grammar
- S -> ee | bAc | bAe
- A -> d | cA
- A backtracking algorithm will not work properly with this grammar.
- Example: the input string is bcde.
- When you see b, you select S -> bAc. Then use A -> cA to expand A. Then use A -> d to expand A again. This yields bcdc.
- This is wrong, since the last letter is e, not c.
- The solution is left factorization.
- Def. Left Factorization: create a new non-terminal for each unique right part of a left-factorable production.
- Left-factor the grammar given previously:
- S -> ee | bAQ
- Q -> c | e
- A -> d | cA
- This is a viable solution to backtracking only if terminals precede nonterminals.
3. Recursive-Descent Parsing
- There is one function for each non-terminal; these functions try each production and call other functions for the non-terminals they contain.
- The stack needed to recognize a CFG is invisible: it is hidden in the recursive function calls.
- The problem is: a new grammar requires new code.
- Example
- S -> bA | c
- A -> dSd | e
- Code

function S : boolean;
begin
  S := true;
  if token_is('b') then
    if A then
      writeln('S --> bA')
    else
      S := false
  else
    if token_is('c') then
      writeln('S --> c')
    else
      begin
        error('S');
        S := false
      end
end; { S }
function A : boolean;
begin
  A := true;
  if token_is('d') then
    begin
      if S then
        if token_is('d') then
          writeln('A --> dSd')
        else
          begin
            error('A');
            A := false
          end
      else
        A := false
    end
  else
    if token_is('e') then
      writeln('A --> e')
    else
      begin
        error('A');
        A := false
      end
end; { A }
Input string: bdcd
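For comparison, here is a minimal Python sketch of the same recursive-descent idea for S -> bA | c, A -> dSd | e; the class and function names are illustrative, not from the book:

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def token_is(self, ch):
        # Consume ch if it is the next input character.
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            return True
        return False

    def parse_S(self):                      # S -> bA | c
        if self.token_is('b'):
            if self.parse_A():
                print('S --> bA')
                return True
            return False
        if self.token_is('c'):
            print('S --> c')
            return True
        return False

    def parse_A(self):                      # A -> dSd | e
        if self.token_is('d'):
            if self.parse_S() and self.token_is('d'):
                print('A --> dSd')
                return True
            return False
        if self.token_is('e'):
            print('A --> e')
            return True
        return False

p = Parser('bdcd')
ok = p.parse_S() and p.pos == len(p.text)
print('accepted' if ok else 'rejected')     # bdcd: S -> bA, A -> dSd, inner S -> c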
Problems with Recursive-Descent Parsers
- Can potentially require a good deal of backtracking and experimenting before the right derivation is found.
- Very expensive in processing time.
- Left factorization cannot solve backtracking if nonterminals are not preceded by terminal symbols.
- Solution: give the parser the ability to look ahead in the grammar: PREDICTIVE PARSERS.
4. Predictive Parsers
- Answers the following questions:
- Given multiple RHSs that start with the same nonterminal, which one does the parser choose?
- If a nonterminal derives ε, how does the parser know which production to use next?
- What if a nonterminal derives a nonterminal that derives ε?
- The goal of a predictive parser is to know which characters of the input string trigger which productions in building the parse tree.
- Backtracking can be avoided if the parser has the ability to look ahead in the grammar so as to anticipate which terminals are derivable (by leftmost derivations) from each of the various nonterminals on the RHS.
- Rules to construct FIRST(α):
- 1. If α begins with a terminal x, then FIRST(α) = {x}.
- 2. If α =>* ε, then FIRST(α) includes ε.
- 3. FIRST(ε) = {ε}.
- 4. If α begins with a nonterminal A, then FIRST(α) includes FIRST(A) - {ε}.
- Hidden trap with Rule 4:
- If α = ABδ and A is nullable, you have to include FIRST(B). Similarly, if B is also nullable, you have to include FIRST(δ), and so forth.
- This requires a modification of the rules for deriving FIRST sets.
Rules to construct FIRST Sets
- Case 1: α is a single symbol or ε
- If α is a terminal y, then FIRST(α) = {y}
- Else if α is ε, then FIRST(α) = {ε}
- Else if α is a nonterminal with productions α -> β1 | β2 | β3 | ...,
- then FIRST(α) = ∪k FIRST(βk)
- Case 2: α = X1 X2 ... Xn
- FIRST(α) = {}
- j = 0
- repeat
-   j = j + 1
-   include FIRST(Xj) - {ε} in FIRST(α)
- until Xj is non-nullable or j = n
- If Xn is nullable, then add ε to FIRST(α).
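A small Python sketch of this computation, iterated to a fixed point; it assumes the grammar is stored as a dict mapping each nonterminal to its list of right-hand sides (tuples of symbols), with 'eps' marking the empty string (the encoding and names are illustrative, not from the slides):

EPSILON = 'eps'

def first_sets(productions):
    first = {n: set() for n in productions}

    def first_of_string(symbols):
        # FIRST of a sequence X1 X2 ... Xn (Case 2 above).
        result = set()
        for x in symbols:
            if x not in productions:          # terminal (or eps)
                result.add(x)
                return result
            result |= first[x] - {EPSILON}
            if EPSILON not in first[x]:       # stop at the first non-nullable symbol
                return result
        result.add(EPSILON)                   # every symbol was nullable
        return result

    changed = True
    while changed:                            # repeat until nothing new is added
        changed = False
        for a, rhss in productions.items():
            for rhs in rhss:
                new = first_of_string(rhs)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

# The expression grammar used later in these slides:
g = {
    'E':  [('T', "E'")],
    "E'": [('+', 'T', "E'"), ('-', 'T', "E'"), (EPSILON,)],
    'T':  [('F', "T'")],
    "T'": [('*', 'F', "T'"), ('/', 'F', "T'"), (EPSILON,)],
    'F':  [('(', 'E', ')'), ('i',)],
}
print(first_sets(g))   # e.g. FIRST(E) = {'i', '('}, FIRST(E') = {'+', '-', 'eps'}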
Problems with FIRST Sets
- If FIRST(α) and FIRST(β) are not disjoint, FIRST sets are useless.
- For grammars that acquire ε-productions as a result of removing left recursion, the FIRST sets will not tell us when to choose A -> ε.
- For these cases, FOLLOW sets are needed.
FOLLOW Sets
- Given a nonterminal symbol A:
- FOLLOW(A) is the set of all terminals that can come after A in any sentential form of G. If A can come right at the end, then FOLLOW(A) includes the end marker, $.
- FOLLOW(A):
- 1. If A is the start symbol, then put the end marker $ into FOLLOW(A).
- 2. For each production with A on the right-hand side, Q -> xAy:
-   1. If y begins with a terminal q, then q is in FOLLOW(A).
-   2. Else FOLLOW(A) includes FIRST(y) - {ε}.
-   3. If y = ε, or y is nullable (y =>* ε), then add FOLLOW(Q) to FOLLOW(A).
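A companion sketch for FOLLOW sets, reusing EPSILON, g, and first_sets from the FIRST-set sketch above ('$' is used as the end marker; names are illustrative):

def follow_sets(productions, start, first):
    follow = {n: set() for n in productions}
    follow[start].add('$')                        # rule 1: end marker for the start symbol

    def first_of_string(symbols):
        result = set()
        for x in symbols:
            if x not in productions:
                result.add(x)
                return result
            result |= first[x] - {EPSILON}
            if EPSILON not in first[x]:
                return result
        result.add(EPSILON)
        return result

    changed = True
    while changed:                                # iterate to a fixed point
        changed = False
        for q, rhss in productions.items():
            for rhs in rhss:
                for i, a in enumerate(rhs):
                    if a not in productions:      # only nonterminals have FOLLOW sets
                        continue
                    tail = rhs[i + 1:]
                    f = first_of_string(tail) if tail else {EPSILON}
                    # rules 2.1-2.3: FIRST(tail) minus eps, plus FOLLOW(Q) if tail is nullable
                    new = (f - {EPSILON}) | (follow[q] if EPSILON in f else set())
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow

print(follow_sets(g, 'E', first_sets(g)))
# e.g. FOLLOW(E) = {')', '$'}, FOLLOW(F) = {'+', '-', '*', '/', ')', '$'}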
- Grammar
- E -> T E'
- E' -> + T E' | - T E' | ε
- T -> F T'
- T' -> * F T' | / F T' | ε
- F -> ( E ) | i
- Construction of FIRST and FOLLOW sets
- FIRST Set Construction
- FIRST(E) = { i, ( }
- FIRST(E') = { +, -, ε }
- FIRST(T) = { i, ( }
- FIRST(T') = { *, /, ε }
- FIRST(F) = { i, ( }
- FOLLOW Set Construction
- FOLLOW(E) = { $, ) }
- FOLLOW(E') = FOLLOW(E) = { $, ) }
- FOLLOW(T) = (FIRST(E') - {ε}) ∪ FOLLOW(E) = { +, -, ), $ }
- FOLLOW(T') = FOLLOW(T)
- FOLLOW(F) = (FIRST(T') - {ε}) ∪ FOLLOW(T) = { *, /, +, -, ), $ }
4.1 A Predictive Recursive-Descent Parser
- The book builds a predictive recursive-descent parser for
- E -> E + T | T
- T -> T * F | F
- F -> ( E ) | i
- First step: remove left recursion.
- Then the FIRST and FOLLOW sets are determined.
- Impractical, because a function must be written for every production.
- If the grammar is changed, one or more functions have to be rewritten.
- The solution to this is a table-driven parser.
4.2 Table-Driven Predictive Parsers
- Grammar
- E -> E + T | E - T | T
- T -> T * F | T / F | F
- F -> ( E ) | i
- Step 1: Eliminate left recursion.
- Grammar without left recursion
- E -> T E'
- E' -> + T E' | - T E' | ε
- T -> F T'
- T' -> * F T' | / F T' | ε
- F -> ( E ) | i
- It is easier to show the table, and how it is used, first, and to show how the table is constructed afterward.
- Driver Algorithm
- Push the end marker $ onto the stack.
- Put a similar end marker, $, on the end of the input string.
- Push the start symbol onto the stack.
- While (stack not empty) do
-   Let x = top of stack and a = incoming token.
-   If x is in T (a terminal):
-     if x = a, then pop x and go to the next input token
-     else error
-   Else (x is a nonterminal):
-     if Table[x, a] is defined:
-       pop x
-       push Table[x, a] onto the stack in reverse order
-     else error
- It is a successful parse if the stack is empty and the input is used up.
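A minimal Python sketch of this driver loop, assuming the table maps (nonterminal, lookahead) pairs to the right-hand side to push (the names are illustrative):

def ll1_parse(tokens, table, start, nonterminals):
    tokens = list(tokens) + ['$']        # end marker on the input
    stack = ['$', start]                 # end marker, then the start symbol
    i = 0
    while stack:
        x, a = stack[-1], tokens[i]
        if x not in nonterminals:        # x is a terminal or the end marker
            if x == a:
                stack.pop()
                i += 1                   # matched: advance to the next input token
            else:
                return False             # error
        elif (x, a) in table:
            stack.pop()
            stack.extend(reversed(table[(x, a)]))   # push the RHS in reverse order
        else:
            return False                 # error: empty table slot
    return i == len(tokens)              # success: stack empty and input used up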
4.3 Constructing the Predictive Parser Table
- Go through all the productions. X -> β is your typical production.
- 1. For all terminals a in FIRST(β), except ε, Table[X, a] = β.
- 2. If β = ε, or if ε is in FIRST(β), then for ALL a in FOLLOW(X), Table[X, a] = β.
- So, construct FIRST and FOLLOW for all left- and right-hand sides.
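A sketch tying the earlier pieces together: it builds the table from the first_sets and follow_sets sketches above and runs the ll1_parse driver on i + i * i (the names are illustrative):

def build_table(productions, start):
    first = first_sets(productions)
    follow = follow_sets(productions, start, first)
    table = {}
    for x, rhss in productions.items():
        for rhs in rhss:
            body = () if rhs == (EPSILON,) else rhs
            # FIRST of this particular right-hand side
            f = set()
            for sym in rhs:
                if sym == EPSILON:
                    f.add(EPSILON)
                    break
                if sym not in productions:
                    f.add(sym)
                    break
                f |= first[sym] - {EPSILON}
                if EPSILON not in first[sym]:
                    break
            else:
                f.add(EPSILON)
            for a in f - {EPSILON}:
                table[(x, a)] = body           # rule 1
            if EPSILON in f:
                for a in follow[x]:
                    table[(x, a)] = body       # rule 2
    return table

table = build_table(g, 'E')
print(ll1_parse(['i', '+', 'i', '*', 'i'], table, 'E', set(g)))   # True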
4.4 Conflicts
- A conflict occurs if there is more than one entry in a table slot. This can sometimes be fixed by left factoring, etc.
- If a grammar is LL(1), there will not be multiple entries.
5. Summary
- Left recursion
- Left factorization
- FIRST(A)
- FOLLOW(A)
- Predictive parsers (table-driven)