Chapter 4. Syntax Analysis (1)

Provided by: borameCs
1
Chapter 4. Syntax Analysis (1)
2
Application of a production φAψ → φωψ in a
derivation step α_i ⇒ α_{i+1}
3
Formal grammars (1/3)
  • Example: Let G1 have N = {A, B, C}, T = {a, b,
    c}, and the set of productions
  • Σ → A        CB → BC
  • A → aABC     bB → bb
  • A → abC      bC → bc
  •              cC → cc
  • The reader should verify that the word
    a^k b^k c^k is in L(G1) for all k ≥ 1 and that only
    these words are in L(G1). That is,
  • L(G1) = { a^k b^k c^k | k ≥ 1 }.
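One way to check the claim about L(G1) for small k is to enumerate derivations mechanically. The sketch below is only an illustration: it assumes the productions are Σ → A, A → aABC | abC, CB → BC, bB → bb, bC → bc, cC → cc, and the function name `terminal_words` is hypothetical. The length bound is safe here only because no production of G1 shortens a sentential form.

```python
from collections import deque

# Productions of G1 as (left side, right side) pairs; "Σ" is the start symbol.
G1 = [("Σ", "A"), ("A", "aABC"), ("A", "abC"), ("CB", "BC"),
      ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def terminal_words(productions, terminals="abc", start="Σ", max_len=9):
    """Breadth-first search over sentential forms of length <= max_len."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)          # only terminals left: a word of the language
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:           # try every occurrence of the left side
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return sorted(words, key=len)

print(terminal_words(G1))  # ['abc', 'aabbcc', 'aaabbbccc']
```

The same search would not be reliable for a contracting grammar, because a long sentential form could still shrink to a short word.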

4
Formal grammars (2/3)
  • Example: Grammar G2 is a modification of G1:
  • G2: Σ → A        CB → BC
  •     A → aABC     bB → bb
  •     A → abC      bC → b
  • The reader may verify that L(G2) = { a^k b^k | k ≥
    1 }. Note that the last rule, bC → b, erases all
    the C's from the derivation, and that only this
    production removes the nonterminal C from
    sentential forms.

5
Formal grammars (3/3)
  • Example: A simpler grammar that generates { a^k b^k |
    k ≥ 1 } is the grammar G3:
  • G3: Σ → S
  •     S → aSb
  •     S → ab
  • A derivation of a^3 b^3 is
  • Σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
  • The reader may verify that L(G3) = { a^k b^k | k ≥
    1 }.
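The derivation pattern for G3 can be written out for any k. This is a small sketch, assuming the productions Σ → S, S → aSb and S → ab; the function name `derivation` is made up for illustration.

```python
def derivation(k):
    """Sentential forms of the G3 derivation of a^k b^k (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    forms = ["Σ", "S"]                      # Σ → S
    for i in range(1, k):                   # apply S → aSb, k - 1 times
        forms.append("a" * i + "S" + "b" * i)
    forms.append("a" * k + "b" * k)         # finish with S → ab
    return forms

print(" ⇒ ".join(derivation(3)))  # Σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
```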

6
Type | Format of Productions             | Remarks
0    | φAψ → φωψ                         | Unrestricted substitution rules (contracting)
1    | φAψ → φωψ, ω ≠ ε;  Σ → ε          | Context sensitive (noncontracting)
2    | A → ω, ω ≠ ε;  Σ → ε              | Context free (noncontracting)
3    | A → aB, A → a;  Σ → ε             | Right linear (regular, noncontracting)
3    | A → Ba, A → a;  Σ → ε             | Left linear (regular, noncontracting)
The four types of formal grammars
7
Context-Sensitive Grammars (Type 1)
Unrestricted Grammars (Type 0)
  • Definition: A context-sensitive grammar G =
    (N, T, P, Σ) is a formal grammar in which all
    productions are of the form
  • φAψ → φωψ, ω ≠ ε
  • The grammar may also contain the production Σ
    → ε. If G is a context-sensitive (type 1) grammar,
    then L(G) is a context-sensitive (type 1) language.

8
Context-Free Grammars (Type2)
  • Definition: A context-free grammar G = (N, T, P, Σ)
    is a formal grammar in which all productions are
    of the form
  • A → ω
  • The grammar may also contain the production Σ
    → ε. If G is a context-free (type 2) grammar, then
    L(G) is a context-free (type 2) language.

Here A ∈ N ∪ {Σ} and ω ∈ (N ∪ T)* - {ε}.
9
Regular Grammars (Type3) (1/2)
  • Definition: A production of the form
  • A → aB or A → a
  • is called a right linear production. A
    production of the form
  • A → Ba or A → a
  • is a left linear production. A formal grammar is
    right linear if it contains only right linear
    productions, and is left linear if it contains
    only left linear productions; either kind may
    also contain the production Σ → ε. Left and right
    linear grammars are also known as regular
    grammars. If G is a regular (type 3) grammar, then
    L(G) is a regular (type 3) language.

Here A ∈ N ∪ {Σ}, B ∈ N, and a ∈ T.
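The definition of right and left linear productions can be turned into a mechanical check. The sketch below assumes, purely for illustration, that nonterminals in N are single upper-case Latin letters, that "Σ" is the start symbol, and that ε is written as an empty string; all function names are hypothetical.

```python
def in_N(sym):
    """Assumption: N consists of single upper-case Latin letters."""
    return len(sym) == 1 and sym.isascii() and sym.isupper()

def is_right_linear(lhs, rhs):
    """True for A → aB, A → a, or Σ → ε (rhs == '')."""
    if not (in_N(lhs) or lhs == "Σ"):
        return False
    if rhs == "":
        return lhs == "Σ"
    a, rest = rhs[0], rhs[1:]
    return not in_N(a) and a != "Σ" and (rest == "" or in_N(rest))

def is_left_linear(lhs, rhs):
    """A → Ba is the mirror image of A → aB, so check the reversed right side."""
    return is_right_linear(lhs, rhs[::-1])

def grammar_kind(productions):
    if all(is_right_linear(l, r) for l, r in productions):
        return "right linear"
    if all(is_left_linear(l, r) for l, r in productions):
        return "left linear"
    return "not regular"

print(grammar_kind([("A", "1B"), ("B", "0A"), ("A", "1")]))  # right linear
print(grammar_kind([("A", "B1"), ("B", "A0"), ("A", "1")]))  # left linear
print(grammar_kind([("S", "aSb"), ("S", "ab")]))             # not regular
```

A grammar that mixes right linear and left linear productions is reported as not regular here, in line with the definition, which requires all productions to be of one kind.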
10
Regular Grammars (Type3) (2/2)
  • Example: A left linear grammar G1 and a right
    linear grammar G2 have productions as follows:
  • G1: Σ → B1   Σ → 1   A → B1   B → A0   A → 1
  • G2: Σ → 1B   Σ → 1   A → 1B   B → 0A   A → 1
  • The reader may verify that
  • L(G1) = { (10)^n 1 | n ≥ 0 } = { 1 (01)^n | n ≥ 0 } = L(G2)
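The two descriptions of the language in this example, (10)^n 1 and 1 (01)^n with n ≥ 0, can be compared by brute force on short strings. The sketch below uses Python's `re` module with the regular expressions `(10)*1` and `1(01)*`, which are my transcription of those two forms; it is a finite sanity check, not a proof.

```python
import re
from itertools import product

left = re.compile(r"(10)*1")    # (10)^n 1
right = re.compile(r"1(01)*")   # 1 (01)^n

def same_language_up_to(max_len):
    """Compare the two patterns on every binary string of length <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if bool(left.fullmatch(w)) != bool(right.fullmatch(w)):
                return False
    return True

print(same_language_up_to(12))  # True
```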
11
Ambiguity (1/2)
  • Example: Consider the context-free grammar
  • G: Σ → S
  •    S → SS
  •    S → ab
  • The sentence ababab has derivations that
    correspond to different tree diagrams. The grammar
    G is ambiguous with respect to the sentence ababab:
    if the tree diagrams were used as the basis for
    assigning meaning to the derived string, mistaken
    interpretation could result.

12
Ambiguity (2/2)
  • Definition A context-free grammar is ambiguous
    if and only if it generates some sentence by two
    or more distinct leftmost derivations.
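For the grammar S → SS | ab from the preceding example, the number of distinct parse trees (equivalently, distinct leftmost derivations) of (ab)^n can be counted with a short recurrence. This is an illustrative sketch; the function name `parse_trees` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_trees(n):
    """Number of parse trees for (ab)^n under S → SS | ab, n >= 1."""
    if n == 1:
        return 1                      # only S → ab applies
    # S → SS: the left S derives (ab)^k, the right S derives (ab)^(n-k)
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))

print([parse_trees(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
```

The value 2 for n = 3 corresponds to the two tree diagrams for ababab, so the grammar is ambiguous by the definition above.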

13
Fig. 4.1. Position of parser in compiler model.
14
Syntax Error Handling (1/2)
  • Probable Errors
  • lexical, such as misspelling an identifier,
    keyword, or operator
  • syntactic, such as an arithmetic expression with
    unbalanced parentheses
  • semantic, such as an operator applied to an
    incompatible operand
  • logical, such as an infinitely recursive call

15
Syntax Error Handling (2/2)
  • The error handler in a parser has simple-to-state
    goals
  • It should report the presence of errors clearly
    and accurately.
  • It should recover from each error quickly enough
    to be able to detect subsequent errors.
  • It should not significantly slow down the
    processing of correct programs.

16
Error-Recovery Strategies
  • panic mode
  • phrase level
  • error productions
  • global correction

17
Example 4.2
  • The grammar with the following productions
    defines simple arithmetic expressions.

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
18
Notational Conventions (1/2)
  • 1. These symbols are terminals
  • i) Lower-case letters early in the alphabet such
    as a, b, c.
  • ii) Operator symbols such as +, -, etc.
  • iii) Punctuation symbols such as parentheses,
    comma, etc.
  • iv) The digits 0, 1, . . . , 9.
  • v) Boldface strings such as id or if.
  • 2. These symbols are nonterminals
  • i) Upper-case letters early in the alphabet such
    as A, B, C.
  • ii) The letter S, which, when it appears, is
    usually the start symbol.
  • iii) Lower-case italic names such as expr or
    stmt.
  • 3. Upper-case letters late in the alphabet, such
    as X, Y, Z, represent grammar symbols, that is,
    either nonterminals or terminals.

19
Notational Conventions (2/2)
  • 4. Lower-case letters late in the alphabet,
    chiefly u, v, . . . , z, represent strings of
    terminals.
  • 5. Lower-case Greek letters, α, β, γ, for
    example, represent strings of grammar symbols.
    Thus, a generic production could be written as
    A → α, indicating that there is a single
    nonterminal A on the left of the arrow (the left
    side of the production) and a string of grammar
    symbols α to the right of the arrow (the right
    side of the production).
  • 6. If A → α1, A → α2, . . . , A → αk are all
    productions with A on the left (we call them
    A-productions), we may write A → α1 | α2 | . . .
    | αk. We call α1, α2, . . . , αk the alternatives
    for A.
  • 7. Unless otherwise stated, the left side of the
    first production is the start symbol.

20
Derivations
  • We say that αAβ ⇒ αγβ if A → γ is a production
    and α and β are arbitrary strings of grammar
    symbols. If
  • α1 ⇒ α2 ⇒ . . . ⇒ αn, we say α1 derives αn. The
    symbol ⇒ means "derives in one step." Often we
    wish to say "derives in zero or more steps." For
    this purpose we can use the symbol ⇒*. Thus,
  • 1. α ⇒* α for any string α, and
  • 2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
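The one-step relation ⇒ and its reflexive-transitive closure ⇒* can be sketched as a search over sentential forms. The example grammar E → E+E | E*E | (E) | -E | id is an assumption chosen for illustration, and the names `one_step` and `derives` are hypothetical. The length bound in the search is valid only because this grammar has no ε-productions, so forms never shrink.

```python
from collections import deque

# Assumed example grammar (no ε-productions): E → E+E | E*E | (E) | -E | id
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"),
                 ("(", "E", ")"), ("-", "E"), ("id",)]}

def one_step(form, grammar):
    """Yield every β with form ⇒ β: rewrite one nonterminal occurrence."""
    for i, sym in enumerate(form):
        for rhs in grammar.get(sym, ()):
            yield form[:i] + rhs + form[i + 1:]

def derives(source, target, grammar):
    """Decide source ⇒* target by searching forms no longer than target."""
    source, target = tuple(source), tuple(target)
    seen, queue = {source}, deque([source])
    while queue:
        form = queue.popleft()
        if form == target:            # zero steps are allowed (rule 1)
            return True
        for nxt in one_step(form, grammar):   # extend by one step (rule 2)
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(derives(["E"], ["-", "(", "id", "+", "id", ")"], GRAMMAR))  # True
print(derives(["E"], ["id", "id"], GRAMMAR))                      # False
```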





21
Fig. 4.3. Building the parse tree from derivation
(4.4)
(Derivation 4.4) E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒
-(id+id)
22
Eliminating Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

stmt → matched_stmt
     | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt
23
Elimination of Left Recursion
  • No matter how many A-productions there are, we
    can eliminate immediate left recursion from them
    by the following technique. First, we group the
    A-productions as
  • A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . .
    | βn
  • where no βi begins with an A. Then, we replace the
    A-productions by
  • A → β1A' | β2A' | . . . | βnA'
  • A' → α1A' | α2A' | . . . | αmA' | ε

24
Left Factoring
  • In general, if A → αβ1 | αβ2 are two
    A-productions, and the input begins with a
    nonempty string derived from α, we do not know
    whether to expand A to αβ1 or to αβ2. However,
    we may defer the decision by expanding A to αA'.
    Then, after seeing the input derived from α, we
    expand A' to β1 or to β2. That is,
    left-factored, the original productions become
  • A → αA'        A' → β1 | β2
  • Example 4.12.
  • The language L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
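The left-factoring step described on this slide can be sketched as a single round of factoring. The representation (lists of symbols, empty list for ε), the function names, and the primed names for new nonterminals are assumptions made for illustration; a full algorithm would repeat the step until no two alternatives share a prefix.

```python
def common_prefix(seqs):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor_once(A, alternatives):
    """One round of left factoring; alternatives are lists of symbols."""
    groups = {}
    for alt in alternatives:              # group alternatives by first symbol
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {A: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[A].extend(group)       # nothing to factor
            continue
        alpha = common_prefix(group)
        new = A + "'" * len(result)       # assumed fresh names A', A'', ...
        result[A].append(alpha + [new])                    # A  → α A'
        result[new] = [alt[len(alpha):] for alt in group]  # A' → β1 | β2
    return result

# S → iEtS | iEtSeS | a   becomes   S → iEtSS' | a,  S' → ε | eS
print(left_factor_once("S", [list("iEtS"), list("iEtSeS"), ["a"]]))
```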

25
Fig. 4.9. Steps in top-down parse, panels (a), (b), (c).
26
Fig. 4.10. Transition diagrams for grammar (4.11).
(Grammar 4.11)
E  → TE'
E' → +TE' | ε
T  → FT'
T' → *FT' | ε
F  → (E) | id
27
Fig. 4.11. Simplified transition diagrams.
28
Fig. 4.12. Simplified transition diagrams for
arithmetic expressions.
29
Fig. 4.13. Model of a nonrecursive predictive
parser.
30
Nonrecursive Predictive Parsing
  • Let X be the symbol on top of the stack and a the
    current input symbol; $ marks the bottom of the
    stack and the end of the input.
  • 1. If X = a = $, the parser halts and announces
    successful completion of parsing.
  • 2. If X = a ≠ $, the parser pops X off the stack
    and advances the input pointer to the next input
    symbol.
  • 3. If X is a nonterminal, the program consults
    entry M[X, a] of the parsing table M. This entry
    will be either an X-production of the grammar or
    an error entry. If, for example, M[X, a] = {X →
    UVW}, the parser replaces X on top of the stack
    by WVU (with U on top). As output, we shall
    assume that the parser just prints the production
    used; any other code could be executed here. If
    M[X, a] = error, the parser calls an error
    recovery routine.
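The three cases above can be sketched as a table-driven loop. This is a minimal illustration, not a definitive implementation: the table is written by hand for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, a missing key stands for an error entry, and the function and variable names are assumptions.

```python
# Parsing table M for the expression grammar; a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", "$"): [],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    stack = ["$", start]                 # top of stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":                # case 1: successful completion
            return output
        if X not in NONTERMINALS:        # X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                  # case 2: match and advance input
            pos += 1
        elif (X, a) in TABLE:            # case 3: expand by M[X, a]
            rhs = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs))  # push W, V, U so that U is on top
            output.append(f"{X} → {' '.join(rhs) or 'ε'}")
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

Here the error recovery routine is replaced by raising `SyntaxError`, which is a simplification for the sketch.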

31
Fig. 4.15. Parsing table M for grammar (4.11).
NONTERMINAL | INPUT SYMBOL
            | id      | +         | *         | (       | )      | $
E           | E → TE' |           |           | E → TE' |        |
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' |           |           | T → FT' |        |
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  |           |           | F → (E) |        |
32
Fig. 4.16. Moves made by predictive parser on
input id + id * id.
STACK    | INPUT         | OUTPUT
$E       | id + id * id$ |
$E'T     | id + id * id$ | E → TE'
$E'T'F   | id + id * id$ | T → FT'
$E'T'id  | id + id * id$ | F → id
$E'T'    |    + id * id$ |
$E'      |    + id * id$ | T' → ε
$E'T+    |    + id * id$ | E' → +TE'
$E'T     |      id * id$ |
$E'T'F   |      id * id$ | T → FT'
$E'T'id  |      id * id$ | F → id
$E'T'    |         * id$ |
$E'T'F*  |         * id$ | T' → *FT'
$E'T'F   |           id$ |
$E'T'id  |           id$ | F → id
$E'T'    |             $ |
$E'      |             $ | T' → ε
$        |             $ | E' → ε
33
Fig. 4.17. Parsing table M for grammar (4.13).
NONTERMINAL | INPUT SYMBOL
            | a     | b     | e               | i          | t | $
S           | S → a |       |                 | S → iEtSS' |   |
S'          |       |       | S' → ε, S' → eS |            |   | S' → ε
E           |       | E → b |                 |            |   |
(Grammar 4.13)
S → iEtS | iEtSeS | a
E → b
34
Fig. 4.18. Synchronizing tokens added to parsing
table of Fig. 4.15.
NONTERMINAL | INPUT SYMBOL
            | id      | +         | *         | (       | )      | $
E           | E → TE' |           |           | E → TE' | synch  | synch
E'          |         | E' → +TE' |           |         | E' → ε | E' → ε
T           | T → FT' | synch     |           | T → FT' | synch  | synch
T'          |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F           | F → id  | synch     | synch     | F → (E) | synch  | synch
35
Fig. 4.19. Parsing and error recovery moves made
by predictive parser.
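STACK    | INPUT        | OUTPUT
$E       | ) id * + id$ | error, skip )
$E       |   id * + id$ | id is in FIRST(E)
$E'T     |   id * + id$ |
$E'T'F   |   id * + id$ |
$E'T'id  |   id * + id$ |
$E'T'    |      * + id$ |
$E'T'F*  |      * + id$ |
$E'T'F   |        + id$ | error, M[F, +] = synch
$E'T'    |        + id$ | F has been popped
$E'      |        + id$ |
$E'T+    |        + id$ |
$E'T     |          id$ |
$E'T'F   |          id$ |
$E'T'id  |          id$ |
$E'T'    |            $ |
$E'      |            $ |
$        |            $ |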
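The recovery moves in this figure can be approximated with a small extension of a table-driven parser. The rules used below are assumptions, since the slides show only the table and the trace: a blank entry skips the input symbol, a synch entry pops the nonterminal, a synch entry for a nonterminal that sits directly on $ skips the input symbol instead (this is what the first row of the trace requires), and an unmatched terminal on the stack is popped. All names are hypothetical.

```python
SYNCH = "synch"
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery; return the list of error remarks."""
    stack, tokens = ["$", start], list(tokens) + ["$"]
    pos, remarks = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == a == "$":
            return remarks
        if X == "$":                          # stack exhausted, input remains
            remarks.append(f"error, skip {a}")
            pos += 1
        elif X not in NONTERMINALS:           # terminal on top of the stack
            if X == a:
                pos += 1                      # match: advance the input
            else:
                remarks.append(f"error, pop {X}")
            stack.pop()
        else:
            entry = TABLE.get((X, a))
            if isinstance(entry, list):       # ordinary expansion
                stack.pop()
                stack.extend(reversed(entry))
            elif a != "$" and (entry is None or len(stack) == 2):
                remarks.append(f"error, skip {a}")   # blank entry, or synch
                pos += 1                             # for a lone nonterminal
            else:
                remarks.append(f"error, M[{X}, {a}] = {entry}, pop {X}")
                stack.pop()

print(parse_with_recovery([")", "id", "*", "+", "id"]))
```

On the input ) id * + id this sketch reports two errors, skipping ) and popping F on +, which are the two error moves shown in the trace.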