Title: Chapter 4. Syntax Analysis (1)
Application of a production φAψ → φωψ in a derivation step α_i ⇒ α_{i+1}
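One application of a production rewrites a single occurrence of its left side and copies the context on either side unchanged. The minimal Python sketch below checks whether u ⇒ v holds in one step. It assumes, for illustration only, that every grammar symbol is one character, that productions are (lhs, rhs) string pairs, and that the start symbol is written as the character "σ"; the helper name derives_in_one_step is hypothetical.

```python
# Productions of G1 (introduced on the next slide) as (lhs, rhs) pairs.
G1 = [("σ", "A"), ("CB", "BC"), ("A", "aABC"), ("bB", "bb"),
      ("A", "abC"), ("bC", "bc"), ("cC", "cc")]


def derives_in_one_step(u, v, productions):
    """Return (lhs, rhs, i) if v results from u by rewriting lhs at index i, else None."""
    for lhs, rhs in productions:
        i = u.find(lhs)
        while i != -1:
            # The left context u[:i] and right context u[i+len(lhs):] are copied unchanged.
            if u[:i] + rhs + u[i + len(lhs):] == v:
                return (lhs, rhs, i)
            i = u.find(lhs, i + 1)
    return None


print(derives_in_one_step("aABC", "aabCBC", G1))  # ('A', 'abC', 1)
print(derives_in_one_step("aABC", "aBAC", G1))    # None
```

Note that the left side may be longer than one symbol (for example CB → BC), so the search is over substrings rather than single symbols.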
Formal grammars (1/3)
- Example: Let G1 have N = {A, B, C}, T = {a, b, c} and the set of productions
  - σ → A        CB → BC
  - A → aABC     bB → bb
  - A → abC      bC → bc
  - cC → cc
- The reader should convince themselves that the word a^k b^k c^k is in L(G1) for all k ≥ 1 and that only these words are in L(G1). That is,
  - L(G1) = {a^k b^k c^k | k ≥ 1}.
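The claim about L(G1) can be spot-checked by brute force. Every production of G1 has a right side at least as long as its left side, so sentential forms never shrink, and a breadth-first search that discards forms longer than some bound still finds every word up to that bound. This is a sketch under the same assumed encoding as before (one character per symbol, "σ" as the start symbol); the function name words_up_to is hypothetical, and a finite search only illustrates the claim, it does not prove it.

```python
from collections import deque

# G1 from the slide: σ is the start symbol, A, B, C are nonterminals, a, b, c terminals.
G1 = [("σ", "A"), ("CB", "BC"), ("A", "aABC"), ("bB", "bb"),
      ("A", "abC"), ("bC", "bc"), ("cC", "cc")]


def words_up_to(productions, start, terminals, max_len):
    """Breadth-first search over sentential forms of length <= max_len.

    Valid only for noncontracting grammars (len(rhs) >= len(lhs) everywhere):
    then forms never shrink, and pruning longer forms loses no short words.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(sym in terminals for sym in form):
            words.add(form)
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


print(sorted(words_up_to(G1, "σ", "abc", 9), key=len))
# ['abc', 'aabbcc', 'aaabbbccc']
```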
Formal grammars (2/3)
- Example: Grammar G2 is a modification of G1:
  - G2: σ → A    CB → BC
  - A → aABC     bB → bb
  - A → abC      bC → b
- The reader may verify that L(G2) = {a^k b^k | k ≥ 1}. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.
Formal grammars (3/3)
- Example: A simpler grammar that generates {a^k b^k | k ≥ 1} is the grammar G3:
  - G3: σ → S
  - S → aSb
  - S → ab
- A derivation of a^3 b^3 is
  - σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
- The reader may verify that L(G3) = {a^k b^k | k ≥ 1}.
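Because G3 is context free, membership in L(G3) can be decided by mirroring its two S-productions directly: a word is in the language if it is ab, or if it is a...b with a word of the language in the middle. A minimal recursive sketch; the function name in_L_G3 is illustrative only.

```python
def in_L_G3(w):
    """True if w is derivable in G3 (σ → S, S → aSb, S → ab)."""
    if w == "ab":                                   # S → ab
        return True
    if len(w) >= 4 and w[0] == "a" and w[-1] == "b":
        return in_L_G3(w[1:-1])                     # S → aSb: strip the outer a and b
    return False


print(in_L_G3("aaabbb"), in_L_G3("abab"))  # True False
```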
Type | Format of productions | Remarks | Class
0 | φAψ → φωψ | Unrestricted substitution rules | Contracting
1 | φAψ → φωψ, ω ≠ ε | Context sensitive | Noncontracting
2 | A → ω, ω ≠ ε | Context free | Noncontracting
3 | A → aB, A → a | Right linear | Regular
3 | A → Ba, A → a | Left linear | Regular
The four types of formal grammars
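The table can be read as a sequence of increasingly strict format checks. The sketch below returns the most restrictive type whose format every production fits. Assumptions: one character per symbol, "σ" as the start symbol, and a caller-supplied set of nonterminals; the function name grammar_type is hypothetical. It checks the literal formats in the table, so a merely noncontracting rule such as CB → BC in G1 is not accepted as φAψ → φωψ.

```python
def grammar_type(productions, nonterminals, start="σ"):
    """Return 3, 2, 1 or 0 for a list of (lhs, rhs) productions.

    The production start → ε (rhs == "") is tolerated, as the definitions allow.
    """
    def is_nt(sym):
        return sym in nonterminals

    def context_sensitive(lhs, rhs):
        # φAψ → φωψ, ω ≠ ε: try each nonterminal of lhs as the rewritten A.
        for i, sym in enumerate(lhs):
            if is_nt(sym):
                phi, psi = lhs[:i], lhs[i + 1:]
                if (len(rhs) > len(phi) + len(psi)
                        and rhs.startswith(phi) and rhs.endswith(psi)):
                    return True
        return False

    def context_free(lhs, rhs):          # A → ω, ω ≠ ε
        return len(lhs) == 1 and is_nt(lhs) and rhs != ""

    def right_linear(lhs, rhs):          # A → aB or A → a
        return context_free(lhs, rhs) and not is_nt(rhs[0]) and (
            len(rhs) == 1 or (len(rhs) == 2 and is_nt(rhs[1])))

    def left_linear(lhs, rhs):           # A → Ba or A → a
        return context_free(lhs, rhs) and not is_nt(rhs[-1]) and (
            len(rhs) == 1 or (len(rhs) == 2 and is_nt(rhs[0])))

    rules = [(l, r) for l, r in productions if not (l == start and r == "")]
    if all(right_linear(l, r) for l, r in rules) or all(left_linear(l, r) for l, r in rules):
        return 3
    if all(context_free(l, r) for l, r in rules):
        return 2
    if all(context_sensitive(l, r) for l, r in rules):
        return 1
    return 0


G3 = [("σ", "S"), ("S", "aSb"), ("S", "ab")]
print(grammar_type(G3, {"σ", "S"}))  # 2
```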
Context-Sensitive Grammars (Type 1) and Unrestricted Grammars (Type 0)
- Definition: A context-sensitive grammar G = (N, T, P, σ) is a formal grammar in which all productions are of the form
  - φAψ → φωψ, ω ≠ ε
- The grammar may also contain the production σ → ε. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language.
Context-Free Grammars (Type 2)
- Definition: A context-free grammar G = (N, T, P, σ) is a formal grammar in which all productions are of the form
  - A → ω, where A ∈ N ∪ {σ} and ω ∈ (N ∪ T)* − {ε}
- The grammar may also contain the production σ → ε. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.
Regular Grammars (Type 3) (1/2)
- Definition: A production of the form
  - A → aB or A → a
- is called a right linear production. A production of the form
  - A → Ba or A → a
- is a left linear production. In both forms, A ∈ N ∪ {σ}, B ∈ N and a ∈ T. A formal grammar is right linear if it contains only right linear productions, and is left linear if it contains only left linear productions; in either case it may also contain the production σ → ε. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.
Regular Grammars (Type 3) (2/2)
- Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:
G1 (left linear) | G2 (right linear)
σ → B1 | σ → 1B
σ → 1 | σ → 1
A → B1 | A → 1B
B → A0 | B → 0A
A → 1 | A → 1
- The reader may verify that
  - L(G1) = (10)*1 = 1(01)* = L(G2)
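A right linear grammar reads directly as a nondeterministic finite automaton: A → aB is a transition from state A to state B on a, and A → a is a transition on a into an accepting state. Reversing every right side turns a left linear grammar for L into a right linear grammar for the reversal of L. The sketch below uses both facts to check the claimed equality on all binary strings of length at most 8. The extra state name "F", the one-character encoding and the helper names are assumptions made for illustration.

```python
import re
from itertools import product

FINAL = "F"  # hypothetical accepting state, not a grammar symbol

G1_LEFT = [("σ", "B1"), ("σ", "1"), ("A", "B1"), ("B", "A0"), ("A", "1")]
G2_RIGHT = [("σ", "1B"), ("σ", "1"), ("A", "1B"), ("B", "0A"), ("A", "1")]


def right_linear_to_nfa(productions):
    """A → aB gives A --a--> B; A → a gives A --a--> FINAL."""
    delta = {}
    for lhs, rhs in productions:
        target = rhs[1] if len(rhs) == 2 else FINAL
        delta.setdefault((lhs, rhs[0]), set()).add(target)
    return delta


def accepts(delta, start, word):
    states = {start}
    for ch in word:
        states = set().union(*(delta.get((q, ch), set()) for q in states))
    return FINAL in states


def reverse_grammar(productions):
    """Reverse every right side: left linear for L becomes right linear for L reversed."""
    return [(lhs, rhs[::-1]) for lhs, rhs in productions]


nfa_g2 = right_linear_to_nfa(G2_RIGHT)
nfa_g1_rev = right_linear_to_nfa(reverse_grammar(G1_LEFT))

for n in range(9):
    for w in map("".join, product("01", repeat=n)):
        expected = re.fullmatch(r"(10)*1", w) is not None
        assert accepts(nfa_g2, "σ", w) == expected
        assert accepts(nfa_g1_rev, "σ", w[::-1]) == expected
print("L(G1) and L(G2) agree with (10)*1 up to length 8")
```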
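Ambiguity (1/2)
- Example: Consider the context-free grammar
  - G: σ → S
  - S → SS
  - S → ab
- The sentence ababab has two leftmost derivations:
  - σ ⇒ S ⇒ SS ⇒ abS ⇒ abSS ⇒ ababS ⇒ ababab
  - σ ⇒ S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ ababab
- We see that the derivations correspond to different tree diagrams: the first groups the sentence as ab(abab), the second as (abab)ab. The grammar G is therefore ambiguous with respect to the sentence ababab. If the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.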
Ambiguity (2/2)
- Definition A context-free grammar is ambiguous
if and only if it generates some sentence by two
or more distinct leftmost derivations.
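Each parse tree corresponds to exactly one leftmost derivation, so counting parse trees is a way to test the definition. The sketch below counts the trees rooted at S for the example grammar S → SS | ab (the unit step σ → S does not change the count) by trying every split point for S → SS. The function name count_s_trees is hypothetical.

```python
from functools import lru_cache


def count_s_trees(word):
    """Number of parse trees (equivalently, leftmost derivations) of word from S."""
    @lru_cache(maxsize=None)
    def count(i, j):
        total = 1 if word[i:j] == "ab" else 0      # S → ab
        for k in range(i + 1, j):                  # S → SS: word[i:k] and word[k:j]
            total += count(i, k) * count(k, j)
        return total

    return count(0, len(word))


print(count_s_trees("ababab"))  # 2, so G is ambiguous
```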
Fig. 4.1. Position of parser in compiler model.
Syntax Error Handling (1/2)
- Probable errors:
  - lexical, such as misspelling an identifier, keyword, or operator
  - syntactic, such as an arithmetic expression with unbalanced parentheses
  - semantic, such as an operator applied to an incompatible operand
  - logical, such as an infinitely recursive call
Syntax Error Handling (2/2)
- The error handler in a parser has simple-to-state goals:
  - It should report the presence of errors clearly and accurately.
  - It should recover from each error quickly enough to be able to detect subsequent errors.
  - It should not significantly slow down the processing of correct programs.
Error-Recovery Strategies
- panic mode
- phrase level
- error productions
- global correction
Example 4.2
- The grammar with the following productions defines simple arithmetic expressions:
  - expr → expr op expr
  - expr → ( expr )
  - expr → - expr
  - expr → id
  - op → +
  - op → -
  - op → *
  - op → /
  - op → ↑
Notational Conventions (1/2)
- 1. These symbols are terminals:
  - i) Lower-case letters early in the alphabet such as a, b, c.
  - ii) Operator symbols such as +, -, etc.
  - iii) Punctuation symbols such as parentheses, comma, etc.
  - iv) The digits 0, 1, . . . , 9.
  - v) Boldface strings such as id or if.
- 2. These symbols are nonterminals:
  - i) Upper-case letters early in the alphabet such as A, B, C.
  - ii) The letter S, which, when it appears, is usually the start symbol.
  - iii) Lower-case italic names such as expr or stmt.
- 3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
Notational Conventions (2/2)
- 4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z, represent strings of terminals.
- 5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
- 6. If A → α1, A → α2, . . . , A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | . . . | αk. We call α1, α2, . . . , αk the alternatives for A.
- 7. Unless otherwise stated, the left side of the first production is the start symbol.
Derivations
- We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 ⇒ α2 ⇒ . . . ⇒ αn, we say α1 derives αn. The symbol ⇒ means "derives in one step." Often we wish to say "derives in zero or more steps." For this purpose we can use the symbol ⇒*. Thus,
  - 1. α ⇒* α for any string α, and
  - 2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
Fig. 4.3. Building the parse tree from derivation (4.4).
Derivation (4.4): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
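Derivation (4.4) always rewrites the leftmost nonterminal, so it is fully determined by which alternative is chosen at each step. The sketch below replays it. It assumes the expression grammar E → E + E | E * E | ( E ) | - E | id with alternatives numbered 0 to 4 in that order; only the productions -E, (E), E + E and id are exercised here. The helper names are hypothetical.

```python
# Assumed expression grammar: E → E + E | E * E | ( E ) | - E | id (alternatives 0..4).
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]}


def leftmost_step(form, choice):
    """Replace the leftmost nonterminal of form by its alternative number `choice`.

    Raises StopIteration if form contains no nonterminal.
    """
    i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
    return form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]


def derive(choices, start="E"):
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return ["".join(f) for f in forms]


# -E, (E), E + E, id, id reproduces derivation (4.4).
print(" ⇒ ".join(derive([3, 2, 0, 4, 4])))
# E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
```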
Eliminating Ambiguity
- Ambiguous grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
- Equivalent unambiguous grammar, in which each else is matched with the closest unmatched then:
  stmt → matched_stmt
       | unmatched_stmt
  matched_stmt → if expr then matched_stmt else matched_stmt
               | other
  unmatched_stmt → if expr then stmt
                 | if expr then matched_stmt else unmatched_stmt
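The effect of the rewrite can be checked by counting parse trees for a nested if statement with a single else. The sketch below counts them with a memoized top-down search that works for grammars without left recursion or ε-productions, which holds for both grammars here. For simplicity it treats expr as a single terminal token; that simplification and the helper names are assumptions.

```python
from functools import lru_cache

# Right sides are lists of symbols; a symbol with no productions is a terminal.
AMBIGUOUS = {
    "stmt": [["if", "expr", "then", "stmt"],
             ["if", "expr", "then", "stmt", "else", "stmt"],
             ["other"]],
}
UNAMBIGUOUS = {
    "stmt": [["matched_stmt"], ["unmatched_stmt"]],
    "matched_stmt": [["if", "expr", "then", "matched_stmt", "else", "matched_stmt"],
                     ["other"]],
    "unmatched_stmt": [["if", "expr", "then", "stmt"],
                       ["if", "expr", "then", "matched_stmt", "else", "unmatched_stmt"]],
}


def count_parse_trees(grammar, start, tokens):
    """Count parse trees of tokens; assumes no left recursion and no ε-productions."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def spans(symbol, i):
        """Map each end position j to the number of trees deriving tokens[i:j] from symbol."""
        if symbol not in grammar:                  # terminal: match one token
            return {i + 1: 1} if i < len(tokens) and tokens[i] == symbol else {}
        result = {}
        for rhs in grammar[symbol]:
            partial = {i: 1}                       # ways to derive a prefix of rhs
            for sym in rhs:
                extended = {}
                for pos, ways in partial.items():
                    for end, n in spans(sym, pos).items():
                        extended[end] = extended.get(end, 0) + ways * n
                partial = extended
            for end, ways in partial.items():
                result[end] = result.get(end, 0) + ways
        return result

    return spans(start, 0).get(len(tokens), 0)


sentence = "if expr then if expr then other else other".split()
print(count_parse_trees(AMBIGUOUS, "stmt", sentence))    # 2
print(count_parse_trees(UNAMBIGUOUS, "stmt", sentence))  # 1
```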
Elimination of Left Recursion
- No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
  - A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . . | βn
- where no βi begins with an A. Then, we replace the A-productions by
  - A → β1A' | β2A' | . . . | βnA'
  - A' → α1A' | α2A' | . . . | αmA' | ε
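The grouping and replacement above translate directly into code. In this sketch right sides are tuples of symbols, () stands for ε, and the new nonterminal is named by appending a prime; it assumes no αi is empty (no production A → A). The function name is hypothetical. Applied to E → E + T | T it yields the E and E' productions of grammar (4.11).

```python
def eliminate_immediate_left_recursion(A, alternatives):
    """Rewrite A → Aα1 | ... | Aαm | β1 | ... | βn into the A and A' productions."""
    A_prime = A + "'"
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != A]
    if not alphas:                         # A is not immediately left recursive
        return {A: list(alternatives)}
    return {
        A: [beta + (A_prime,) for beta in betas],                # A → βi A'
        A_prime: [alpha + (A_prime,) for alpha in alphas] + [()],  # A' → αj A' | ε
    }


# E → E + T | T   becomes   E → T E'  and  E' → + T E' | ε
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```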
Left Factoring
- In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become
  - A → αA'
  - A' → β1 | β2
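A single left-factoring step can be sketched as follows, assuming all given alternatives share the prefix α and taking α to be their longest common prefix. Right sides are tuples of symbols and () stands for ε; the helper names are hypothetical. On the dangling-else productions it produces the same shape as S → iEtSS', S' → eS | ε.

```python
def common_prefix(sequences):
    """Longest prefix shared by all sequences."""
    prefix = []
    for column in zip(*sequences):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(A, alternatives):
    """A → αβ1 | αβ2 | ... becomes A → αA' and A' → β1 | β2 | ..."""
    alpha = common_prefix(alternatives)
    if not alpha:
        return {A: list(alternatives)}
    A_prime = A + "'"
    return {A: [alpha + (A_prime,)],
            A_prime: [rhs[len(alpha):] for rhs in alternatives]}


stmt_alts = [("if", "expr", "then", "stmt"),
             ("if", "expr", "then", "stmt", "else", "stmt")]
print(left_factor("stmt", stmt_alts))
```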
- Example 4.12.
- The language L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
Fig. 4.9. Steps in top-down parse (panels (a), (b), (c)).
Fig. 4.10. Transition diagrams for grammar (4.11).
(Grammar 4.11)
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
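Grammar (4.11) has no left recursion and is left-factored, so one token of lookahead is enough to pick a production, and each transition diagram becomes one procedure. A minimal recursive-descent recognizer sketch; tokens are assumed to be pre-split strings such as "id", "+", "(", and the names ParseError and parse_grammar_4_11 are hypothetical.

```python
class ParseError(Exception):
    pass


def parse_grammar_4_11(tokens):
    """Return True if tokens form a sentence of grammar (4.11), else False."""
    toks = list(tokens) + ["$"]   # "$" marks the end of the input
    pos = 0

    def lookahead():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise ParseError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def E():          # E → T E'
        T()
        E_prime()

    def E_prime():    # E' → + T E' | ε
        if lookahead() == "+":
            match("+")
            T()
            E_prime()

    def T():          # T → F T'
        F()
        T_prime()

    def T_prime():    # T' → * F T' | ε
        if lookahead() == "*":
            match("*")
            F()
            T_prime()

    def F():          # F → ( E ) | id
        if lookahead() == "(":
            match("(")
            E()
            match(")")
        else:
            match("id")

    try:
        E()
        match("$")
        return True
    except ParseError:
        return False


print(parse_grammar_4_11("id + id * id".split()))  # True
print(parse_grammar_4_11("id + * id".split()))     # False
```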
Fig. 4.11. Simplified transition diagrams.
Fig. 4.12. Simplified transition diagrams for arithmetic expressions.
Fig. 4.13. Model of a nonrecursive predictive parser.
Nonrecursive Predictive Parsing
- The parser considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine its action:
- 1. If X = a = $, the parser halts and announces successful completion of parsing.
- 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
- 3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
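The three rules can be sketched as a loop over an explicit stack. The table below encodes the entries of the parsing table M for grammar (4.11) (Fig. 4.15), with [] standing for an ε right side; the dictionary layout and the names M, ParseError and predictive_parse are assumptions for illustration. On id + id * id the output sequence matches the OUTPUT column of Fig. 4.16. Errors simply raise an exception here instead of calling a recovery routine.

```python
# Parsing table M for grammar (4.11); a missing key is an error entry.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


class ParseError(Exception):
    pass


def predictive_parse(tokens, start="E"):
    """Table-driven predictive parse; returns the productions output, in order."""
    stack = ["$", start]             # the end of the list is the top of the stack
    inp = list(tokens) + ["$"]
    ip = 0
    output = []
    while True:
        X, a = stack[-1], inp[ip]
        if X == a == "$":             # rule 1: halt and announce success
            return output
        if X not in NONTERMINALS:     # X is a terminal or $
            if X != a:
                raise ParseError(f"expected {X}, found {a}")
            stack.pop()               # rule 2: pop X and advance the input pointer
            ip += 1
        elif (X, a) in M:             # rule 3: replace X by the right side of M[X, a]
            rhs = M[X, a]
            stack.pop()
            stack.extend(reversed(rhs))   # push W, V, U so that U ends up on top
            output.append(f"{X} -> {''.join(rhs) or 'ε'}")
        else:                         # error entry; a full parser would try to recover
            raise ParseError(f"no entry M[{X}, {a}]")


for production in predictive_parse("id + id * id".split()):
    print(production)
```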
Fig. 4.15. Parsing table M for grammar (4.11).
NONTERMINAL | id | + | * | ( | ) | $
E | E → TE' | | | E → TE' | |
E' | | E' → +TE' | | | E' → ε | E' → ε
T | T → FT' | | | T → FT' | |
T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε
F | F → id | | | F → (E) | |
Fig. 4.16. Moves made by predictive parser on input id + id * id.
STACK | INPUT | OUTPUT
$E | id + id * id $ |
$E'T | id + id * id $ | E → TE'
$E'T'F | id + id * id $ | T → FT'
$E'T'id | id + id * id $ | F → id
$E'T' | + id * id $ |
$E' | + id * id $ | T' → ε
$E'T+ | + id * id $ | E' → +TE'
$E'T | id * id $ |
$E'T'F | id * id $ | T → FT'
$E'T'id | id * id $ | F → id
$E'T' | * id $ |
$E'T'F* | * id $ | T' → *FT'
$E'T'F | id $ |
$E'T'id | id $ | F → id
$E'T' | $ |
$E' | $ | T' → ε
$ | $ | E' → ε
Fig. 4.17. Parsing table M for grammar (4.13).
NONTERMINAL | a | b | e | i | t | $
S | S → a | | | S → iEtSS' | |
S' | | | S' → ε, S' → eS | | | S' → ε
E | | E → b | | | |
(Grammar 4.13) S → iEtS | iEtSeS | a, E → b. The table rows use its left-factored form S → iEtSS' | a, S' → eS | ε, E → b.
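The double entry M[S', e] can be reproduced mechanically. The sketch below computes FIRST and FOLLOW sets by fixed-point iteration and fills the table with the usual LL(1) rules: A → α goes into M[A, a] for each terminal a in FIRST(α), and also into M[A, b] for each b in FOLLOW(A) when α can derive ε. It runs on the left-factored form of grammar (4.13); the dictionary encoding, the "ε" marker and the function names are assumptions made for illustration.

```python
EPS = "ε"

# Left-factored grammar (4.13); each right side is a list of symbols, [] means ε.
GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}


def first_of(symbols, first):
    """FIRST of a string of grammar symbols; contains EPS if the string can derive ε."""
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})            # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def first_sets(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                           # iterate until nothing new is added
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                f = first_of(rhs, first)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
    return first


def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    rest = first_of(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:          # B can end an expansion of A
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


def ll1_table(grammar, start):
    first = first_sets(grammar)
    follow = follow_sets(grammar, first, start)
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of(rhs, first)
            lookaheads = (f - {EPS}) | (follow[A] if EPS in f else set())
            for a in sorted(lookaheads):
                table.setdefault((A, a), []).append(rhs)
    return table


table = ll1_table(GRAMMAR, "S")
conflicts = {key: rhss for key, rhss in table.items() if len(rhss) > 1}
print(conflicts)  # {("S'", 'e'): [['e', 'S'], []]}
```

The only multiply defined entry is M[S', e], which is why grammar (4.13) is not LL(1).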
Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.
NONTERMINAL | id | + | * | ( | ) | $
E | E → TE' | | | E → TE' | synch | synch
E' | | E' → +TE' | | | E' → ε | E' → ε
T | T → FT' | synch | | T → FT' | synch | synch
T' | | T' → ε | T' → *FT' | | T' → ε | T' → ε
F | F → id | synch | synch | F → (E) | synch | synch
Fig. 4.19. Parsing and error recovery moves made by predictive parser.
STACK | INPUT | REMARK
$E | ) id * + id $ | error, skip )
$E | id * + id $ | id is in FIRST(E)
$E'T | id * + id $ |
$E'T'F | id * + id $ |
$E'T'id | id * + id $ |
$E'T' | * + id $ |
$E'T'F* | * + id $ |
$E'T'F | + id $ | error, M[F, +] = synch
$E'T' | + id $ | F has been popped
$E' | + id $ |
$E'T+ | + id $ |
$E'T | id $ |
$E'T'F | id $ |
$E'T'id | id $ |
$E'T' | $ |
$E' | $ |
$ | $ |
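The recovery moves follow a panic-mode policy driven by the table of Fig. 4.18: a blank entry skips the input symbol, a synch entry pops the nonterminal, and an unmatched terminal on the stack is popped. The sketch below implements that policy for grammar (4.11). One refinement is an assumption of this sketch: when the start symbol is the only symbol above $, a synch entry skips the input symbol instead of popping, so that the first move agrees with the "error, skip )" remark. The remark strings and names are illustrative.

```python
# Table of Fig. 4.15 plus the synch entries of Fig. 4.18; [] encodes ε.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
SYNCH = {("E", ")"), ("E", "$"), ("T", "+"), ("T", ")"), ("T", "$"),
         ("F", "+"), ("F", "*"), ("F", ")"), ("F", "$")}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse_with_recovery(tokens, start="E"):
    """Predictive parse with panic-mode recovery; returns the list of error remarks."""
    stack = ["$", start]
    inp = list(tokens) + ["$"]
    ip = 0
    remarks = []
    while True:
        X, a = stack[-1], inp[ip]
        if X == "$" and a == "$":
            return remarks
        if X == "$":                               # leftover input: skip it
            remarks.append(f"error, skip {a}")
            ip += 1
        elif X not in NONTERMINALS:                # terminal on top of the stack
            if X == a:
                stack.pop()
                ip += 1
            else:                                  # pop the unmatched terminal
                remarks.append(f"error, pop unmatched {X}")
                stack.pop()
        elif (X, a) in M:
            stack.pop()
            stack.extend(reversed(M[X, a]))
        elif (X, a) in SYNCH and len(stack) > 2:   # synch: pop X and resume
            remarks.append(f"error, M[{X}, {a}] = synch")
            stack.pop()
        elif a == "$":                             # blank entry at end of input: pop X
            remarks.append(f"error, pop {X}")
            stack.pop()
        else:                                      # blank entry: skip the input symbol
            remarks.append(f"error, skip {a}")
            ip += 1


print(parse_with_recovery(") id * + id".split()))
# ['error, skip )', 'error, M[F, +] = synch']
```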