1
Programming Language Concepts (CIS 635)
  • Elsa L Gunter
  • 4303 GITC
  • NJIT, http://www.cs.njit.edu/elsa/635-spring2004

2
Attribute Grammars
  • Attribute grammars add to BNF grammars an
    additional field with a function describing the
    meaning (attributes) of the construct described
    by the BNF rule
  • Attributes can be used to describe an interpreter
    or even a simple compiler
  • Usually used to describe abstract syntax trees to
    be generated

3
Example Attribute Grammar
  • <Sum> ::= 0
  • value(<Sum>) = 0
  • <Sum> ::= 1
  • value(<Sum>) = 1
  • <Sum> ::= <Sum> + <Sum>
  • value(<Sum1>) = value(<Sum2>) + value(<Sum3>)
  • <Sum> ::= ( <Sum> )
  • value(<Sum1>) = value(<Sum2>)
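
As a rough illustration of a synthesized attribute, the sketch below evaluates value bottom-up over a hand-built tree for the <Sum> grammar. The struct Sum, the node kinds and the function sum_value are names assumed for this example only, and the + rule is read as integer addition.

```c
#include <assert.h>
#include <stddef.h>

/* One node kind per production of the <Sum> grammar (illustrative names). */
enum sum_kind { SUM_ZERO, SUM_ONE, SUM_PLUS, SUM_PAREN };

struct Sum {
    enum sum_kind kind;
    const struct Sum *left;   /* used by SUM_PLUS and SUM_PAREN */
    const struct Sum *right;  /* used by SUM_PLUS only */
};

/* Synthesized attribute: a node's value is computed from its children. */
int sum_value(const struct Sum *s)
{
    assert(s != NULL);
    switch (s->kind) {
    case SUM_ZERO:  return 0;
    case SUM_ONE:   return 1;
    case SUM_PLUS:  return sum_value(s->left) + sum_value(s->right);
    case SUM_PAREN: return sum_value(s->left);
    }
    return 0; /* not reached for well-formed trees */
}
```

Calling sum_value on the tree for (1 + 1) + 0 visits the leaves first and combines their values on the way back up, which is the direction in which synthesized attributes flow.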

4
Attribute Grammars
  • An inherited attribute describes the meaning of
    the nonterminals on the right in the rule in
    terms of the nonterminal on the left
  • A synthesized attribute describes the meaning of
    the left-hand nonterminal in terms of the
    nonterminals on the right

5
YACC
  • The input to YACC (a parser generator for C) is
    basically an attribute grammar with only
    synthesized attributes (inherited attributes can
    be handled using global variables (typically
    tables))
  • ML-YACC is a version of YACC that produces SML
    code instead of C code

6
Parsing Programs
  • Parsing is the process of tracing or constructing
    a parse tree for a given input string
  • Process usually broken into two phases:
  • Lexer: generating tokens from string or character
    stream
  • Parser: generating parse tree from token list or
    stream
  • Lexer called from parser

7
Recursive Descent Parsing
  • Recursive descent parsers are a class of parsers
    derived fairly directly from a BNF grammar
  • A recursive descent parser traces out a parse
    tree in top-down order; it is a top-down parser

8
Recursive Descent Parsing
  • Each nonterminal in the grammar has a subprogram
    associated with it; the subprogram parses all
    phrases that the nonterminal can generate
  • Each nonterminal in right-hand side of a rule
    corresponds to a recursive call to the
    associated subprogram

9
Recursive Descent Parsing
  • Each subprogram must be able to decide how to
    begin parsing by looking at the left-most
    character in the string to be parsed
  • May do so directly, or indirectly by calling
    another parsing subprogram
  • Recursive descent parsers, like other top-down
    parsers, cannot be built from left-recursive
    grammars

10
Sample Grammar
  • <expr> ::= <term> | <term> + <expr>
  •          | <term> - <expr>
  • <term> ::= <factor> | <factor> * <term>
  •          | <factor> / <term>
  • <factor> ::= <id> | ( <expr> )

11
Tokens as SML Datatypes
  • + - * / ( ) <id>
  • Becomes an SML datatype
  • datatype token =
  • Id_token of string
  • | Left_parenthesis | Right_parenthesis
  • | Times_token | Divide_token
  • | Plus_token | Minus_token
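
For comparison, here is a minimal C sketch of the same token set together with a lexer that produces one token per call. The names token_kind, struct token and next_token, the 32-character limit on identifiers, and the extra TOK_END and TOK_ERROR kinds are assumptions of this sketch and do not appear in the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_ID, TOK_LPAREN, TOK_RPAREN, TOK_TIMES, TOK_DIVIDE,
                  TOK_PLUS, TOK_MINUS, TOK_END, TOK_ERROR };

struct token {
    enum token_kind kind;
    char text[32];   /* identifier spelling; empty for other tokens */
};

/* Read one token starting at *cursor and advance *cursor past it. */
struct token next_token(const char **cursor)
{
    struct token t = { TOK_END, "" };
    const char *p;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (isspace((unsigned char)*p)) p++;        /* skip blanks */

    if (*p == '\0') {
        t.kind = TOK_END;                          /* end of input */
    } else if (isalpha((unsigned char)*p)) {
        t.kind = TOK_ID;
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof t.text - 1) t.text[n++] = *p;  /* truncate long names */
            p++;
        }
        t.text[n] = '\0';
    } else {
        switch (*p++) {
        case '(': t.kind = TOK_LPAREN; break;
        case ')': t.kind = TOK_RPAREN; break;
        case '*': t.kind = TOK_TIMES;  break;
        case '/': t.kind = TOK_DIVIDE; break;
        case '+': t.kind = TOK_PLUS;   break;
        case '-': t.kind = TOK_MINUS;  break;
        default:  t.kind = TOK_ERROR;  break;      /* unknown character */
        }
    }
    *cursor = p;
    return t;
}
```

A parser would call next_token repeatedly on the same cursor; each call consumes exactly one token, so no token list has to be built in advance.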

12
Parse Trees as Datatypes
  • <expr> ::= <term> | <term> + <expr>
  •          | <term> - <expr>
  • datatype Expr =
  • Term_as_Expr of Term
  • | Plus_Expr of (Term * Expr)
  • | Minus_Expr of (Term * Expr)

13
Parse Trees as Datatypes
  • <term> ::= <factor> | <factor> * <term>
  •          | <factor> / <term>
  • and Term =
  • Factor_as_Term of Factor
  • | Mult_Term of (Factor * Term)
  • | Div_Term of (Factor * Term)

14
Parse Trees as Datatypes
  • <factor> ::= <id> | ( <expr> )
  • and Factor =
  • Id_as_Factor of string
  • | Parenthesized_Expr_as_Factor of Expr

15
Parsing Lists of Tokens
  • Will create three mutually recursive functions
  • expr : token list -> (Expr * token list)
  • term : token list -> (Term * token list)
  • factor : token list -> (Factor * token list)
  • Each parses what it can and gives back parse and
    remaining tokens

16
Parsing an Expression
  • <expr> ::= <term> [ ( + | - ) <expr> ]
  • fun expr tokens =
  • (case term tokens
  • of ( term_parse , tokens_after_term) =>
  • (case tokens_after_term
  • of ( Plus_token :: tokens_after_plus)
    =>

18
Parsing a Plus Expression
  • <expr> ::= <term> [ ( + | - ) <expr> ]
  • fun expr tokens =
  • (case term tokens
  • of ( term_parse , tokens_after_term) =>
  • (case tokens_after_term
  • of ( Plus_token :: tokens_after_plus)
    =>

19
Parsing a Plus Expression
  • <expr> ::= <term> + <expr>
  • (case expr tokens_after_plus
  • of ( expr_parse , tokens_after_expr) =>
  • ( Plus_Expr ( term_parse , expr_parse ),
  • tokens_after_expr))

21
Building Plus Expression Parse Tree
  • <expr> ::= <term> + <expr>
  • (case expr tokens_after_plus
  • of ( expr_parse , tokens_after_expr) =>
  • ( Plus_Expr ( term_parse , expr_parse ),
  • tokens_after_expr))

22
Parsing a Minus Expression
  • <expr> ::= <term> - <expr>
  • | ( Minus_token :: tokens_after_minus) =>
  • (case expr tokens_after_minus
  • of ( expr_parse , tokens_after_expr) =>
  • ( Minus_Expr ( term_parse , expr_parse ),
  • tokens_after_expr))

24
Parsing an Expression as a Term
  • <expr> ::= <term>
  • | _ => (Term_as_Expr term_parse ,
    tokens_after_term)))
  • Code for term is same except for replacing
    addition with multiplication and subtraction with
    division

25
Parsing Factor as Id
  • <factor> ::= <id>
  • and factor (Id_token id_name :: tokens) =
  • ( Id_as_Factor id_name, tokens)

26
Parsing Factor as Parenthesized Expression
  • <factor> ::= ( <expr> )
  • | factor ( Left_parenthesis :: tokens) =
  • (case expr tokens
  • of ( expr_parse , tokens_after_expr) =>

27
Parsing Factor as Parenthesized Expression
  • <factor> ::= ( <expr> )
  • (case tokens_after_expr
  • of Right_parenthesis :: tokens_after_rparen =>
  • ( Parenthesized_Expr_as_Factor expr_parse ,
    tokens_after_rparen)))
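
The three mutually recursive functions can be sketched in C along the following lines. This is only an approximation of the SML code under several assumptions: tokens sit in an array terminated by an END token, identifiers are single letters, nodes come from a small fixed pool, the wrapper constructors Term_as_Expr and Factor_as_Term are left out, and a NULL tree stands for a failed parse instead of a match failure. All type and function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum tok { ID, LPAREN, RPAREN, TIMES, DIVIDE, PLUS, MINUS, END };
struct token { enum tok kind; char name; };   /* name is used by ID only */

/* Tree node: label is an identifier letter, an operator, or '('. */
struct node { char label; struct node *left, *right; };

/* A parse plus the tokens that remain; tree == NULL means "no parse". */
struct result { struct node *tree; const struct token *rest; };

static struct node pool[64];   /* fixed pool keeps the sketch free of malloc */
static size_t pool_used;

static struct node *mk(char label, struct node *left, struct node *right)
{
    struct node *n;
    assert(pool_used < sizeof pool / sizeof pool[0]);
    n = &pool[pool_used++];
    n->label = label; n->left = left; n->right = right;
    return n;
}

struct result expr(const struct token *ts);

/* <factor> ::= <id> | ( <expr> ) */
struct result factor(const struct token *ts)
{
    struct result r = { NULL, ts };
    if (ts->kind == ID) {
        r.tree = mk(ts->name, NULL, NULL);
        r.rest = ts + 1;
    } else if (ts->kind == LPAREN) {
        struct result e = expr(ts + 1);
        if (e.tree != NULL && e.rest->kind == RPAREN) {
            r.tree = mk('(', e.tree, NULL);
            r.rest = e.rest + 1;
        }
    }
    return r;
}

/* <term> ::= <factor> | <factor> * <term> | <factor> / <term> */
struct result term(const struct token *ts)
{
    struct result f = factor(ts);
    if (f.tree != NULL && (f.rest->kind == TIMES || f.rest->kind == DIVIDE)) {
        struct result t = term(f.rest + 1);
        if (t.tree != NULL)
            t.tree = mk(f.rest->kind == TIMES ? '*' : '/', f.tree, t.tree);
        return t;
    }
    return f;
}

/* <expr> ::= <term> | <term> + <expr> | <term> - <expr> */
struct result expr(const struct token *ts)
{
    struct result t = term(ts);
    if (t.tree != NULL && (t.rest->kind == PLUS || t.rest->kind == MINUS)) {
        struct result e = expr(t.rest + 1);
        if (e.tree != NULL)
            e.tree = mk(t.rest->kind == PLUS ? '+' : '-', t.tree, e.tree);
        return e;
    }
    return t;
}
```

As in the SML version, each function returns both what it parsed and the tokens it did not consume, so the caller decides what to do with the remainder.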

28
( a + b ) * c - d
  • expr [Left_parenthesis, Id_token "a", Plus_token,
    Id_token "b", Right_parenthesis, Times_token,
    Id_token "c", Minus_token, Id_token "d"];

29
( a + b ) * c - d
  • val it = (Minus_Expr (Mult_Term
    (Parenthesized_Expr_as_Factor
    (Plus_Expr (Factor_as_Term
    (Id_as_Factor "a"), Term_as_Expr
    (Factor_as_Term (Id_as_Factor "b")))),
    Factor_as_Term (Id_as_Factor "c")),
    Term_as_Expr (Factor_as_Term (Id_as_Factor
    "d"))),[]) : Expr * token list

30
( a + b ) * c - d
  • Parse tree (children indented under their parent):
    <expr>
      <term>
        <factor>
          (
          <expr>
            <term>
              <factor>
                <id> a
            +
            <expr>
              <term>
                <factor>
                  <id> b
          )
        *
        <term>
          <factor>
            <id> c
      -
      <expr>
        <term>
          <factor>
            <id> d

31
a + b * c - d
  • expr [Id_token "a", Plus_token, Id_token "b",
    Times_token, Id_token "c", Minus_token,
    Id_token "d"];
  • val it = (Plus_Expr (Factor_as_Term
    (Id_as_Factor "a"), Minus_Expr
    (Mult_Term (Id_as_Factor "b",
    Factor_as_Term (Id_as_Factor "c")),
    Term_as_Expr (Factor_as_Term (Id_as_Factor
    "d")))),[]) : Expr * token list

32
a + b * c - d
  • Parse tree (children indented under their parent):
    <expr>
      <term>
        <factor>
          <id> a
      +
      <expr>
        <term>
          <factor>
            <id> b
          *
          <term>
            <factor>
              <id> c
        -
        <expr>
          <term>
            <factor>
              <id> d

33
( a + b * c - d
  • expr [Left_parenthesis, Id_token "a", Plus_token,
    Id_token "b", Times_token, Id_token "c",
    Minus_token, Id_token "d"];
  • uncaught exception nonexhaustive match failure
    raised at: arith_exp.sml:94.12
  • Can't parse because it was expecting a right
    parenthesis but it got to the end without
    finding one

34
a + b ) * c - d )
  • expr [Id_token "a", Plus_token, Id_token "b",
    Right_parenthesis, Times_token, Id_token "c",
    Minus_token, Id_token "d"];
  • val it = (Plus_Expr (Factor_as_Term
    (Id_as_Factor "a"), Term_as_Expr
    (Factor_as_Term (Id_as_Factor "b"))),
    [Right_parenthesis,Times_token,Id_token
    "c",Minus_token,Id_token "d"]) : Expr * token
    list

35
Error Cases?
  • What if factor doesn't find an id token or a left
    parenthesis when it starts?
  • What if it doesn't find a right parenthesis after
    the expression?

36
Streams in Place of Lists
  • More realistically, we don't want to create the
    entire list of tokens before we can start parsing
  • We want to generate one token at a time and use
    it to make one step in parsing
  • Will use
  • (token option * (unit -> token option))
  • in place of token list
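
A rough C counterpart of this pair is a struct holding the current token (if any) together with a function that produces the next one on demand. Everything named below (token_stream, next_from_string, stream_open, stream_advance) is assumed for the sketch, and tokens are reduced to single characters to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* A stream is the current token (if any) plus a function that yields the
   next one. has_token == 0 plays the role of NONE. */
struct token_stream {
    int has_token;
    char token;
    int (*next)(void *state, char *out);  /* returns 0 when input is exhausted */
    void *state;
};

/* Example generator: hands out the characters of a string one at a time. */
static int next_from_string(void *state, char *out)
{
    const char **cursor = state;
    if (**cursor == '\0') return 0;
    *out = *(*cursor)++;
    return 1;
}

/* Build a stream over *cursor and load its first token. */
struct token_stream stream_open(const char **cursor)
{
    struct token_stream s;
    assert(cursor != NULL && *cursor != NULL);
    s.next = next_from_string;
    s.state = cursor;
    s.token = '\0';
    s.has_token = s.next(s.state, &s.token);
    return s;
}

/* Counterpart of (tokens(), tokens): fetch one more token on demand. */
void stream_advance(struct token_stream *s)
{
    s->has_token = s->next(s->state, &s->token);
}
```

The parser only ever looks at the current token and asks for the next one when it has used it, so lexing and parsing proceed in step.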

37
Parsing an Expression
  • <expr> ::= <term> [ ( + | - ) <expr> ]
  • fun expr tokens =
  • (case term tokens
  • of ( SOME term_parse ,
  • tokens_after_term) =>
  • (case tokens_after_term
  • of ( SOME Plus_token,
  • tokens_after_plus) =>

38
Parsing a Plus Expression
  • <expr> ::= <term> + <expr>
  • fun expr tokens =
  • (case term tokens
  • of ( SOME term_parse ,
  • tokens_after_term) =>
  • (case tokens_after_term
  • of ( SOME Plus_token ,
  • tokens_after_plus) =>

39
Parsing a Plus Expression
  • <expr> ::= <term> + <expr>
  • (case expr (tokens_after_plus(),
    tokens_after_plus)
  • of ( SOME expr_parse,
  • tokens_after_expr) =>
  • ( SOME ( Plus_Expr (term_parse,
  • expr_parse)),
  • tokens_after_expr)

41
Building Plus Expression Parse Tree
  • <expr> ::= <term> + <expr>
  • (case expr (tokens_after_plus(),
    tokens_after_plus)
  • of ( SOME expr_parse,
  • tokens_after_expr) =>
  • ( SOME ( Plus_Expr ( term_parse,
  • expr_parse)),
  • tokens_after_expr)

42
What If No Expression After Plus
  • <expr> ::= <term> + <expr>
  • | ( NONE ,rem_tokens) =>
  • ( NONE , rem_tokens))
  • Code for Minus_token is almost identical

43
What If No Plus or Minus
  • <expr> ::= <term>
  • | _ => ( SOME (Term_as_Expr term_parse) ,
  • tokens_after_term))

44
What if No Term
  • <expr> ::= <term> [ ( + | - ) <expr> ]
  • | ( NONE , rem_tokens) =>
  • ( NONE , rem_tokens))
  • Code for term is same as for expr except for
    replacing addition with multiplication and
    subtraction with division

45
Parsing Factor as Id
  • <factor> ::= <id>
  • and factor (SOME (Id_token id_name) ,
  • tokens) =
  • (SOME (Id_as_Factor id_name),
  • (tokens(), tokens))

46
Parsing Factor as Parenthesized Expression
  • <factor> ::= ( <expr> )
  • | factor (SOME Left_parenthesis ,
  • tokens) =
  • (case expr (tokens(), tokens)
  • of (SOME expr_parse,
  • tokens_after_expr) =>

47
Parsing Factor as Parenthesized Expression
  • <factor> ::= ( <expr> )
  • (case tokens_after_expr
  • of ( SOME Right_parenthesis ,
  • tokens_after_rparen ) =>
  • (SOME (Parenthesized_Expr_as_Factor
  • expr_parse),
    (tokens_after_rparen(),tokens_after_rparen))

48
What if No Right Parenthesis
  • <factor> ::= ( <expr> )
  • | _ => (NONE, tokens_after_expr))

49
What If No Expression After Left Parenthesis
  • <factor> ::= ( <expr> )
  • | ( NONE , rem_tokens) =>
  • ( NONE , rem_tokens))

50
What If No Id or Left Parenthesis
  • <factor> ::= <id> | ( <expr> )
  • | factor tokens = (NONE, tokens)

51
Parsing Factor as Id
  • <factor> ::= <id>
  • and factor (SOME (Id_token id_name) ,
  • tokens) =
  • ( true , (tokens(), tokens))

52
Parsing - in C
  • Assume global variable nextToken that holds
    the latest token removed from token stream
  • Assume subroutine lex( ) to analyze the character
    stream, find the next token at the head of that
    stream and update nextToken with that token
  • Assume subroutine error( ) to raise an exception

53
Parsing expr in C
  • <expr> ::= <term> [ ( + | - ) <expr> ]
  • void expr ( ) {
  • term ( );
  • if (nextToken == PLUS_CODE) {
  • lex ( );
  • expr ( ); }
  • else if (nextToken == MINUS_CODE) {
  • lex ( );
  • expr ( ); } }

54
SML Code
  • fun expr tokens =
  • (case term tokens
  • of ( true , tokens_after_term) =>
  • (case tokens_after_term
  • of (SOME Plus_token,tokens_after_plus) =>
  • (case expr (tokens_after_plus(),
    tokens_after_plus)
  • of ( true , tokens_after_expr) =>
  • ( true , tokens_after_expr)

55
Parsing expr in C (optimized)
  • <expr> ::= <term> { ( + | - ) <term> }
  • void expr ( ) {
  • term( );
  • while (nextToken == PLUS_CODE ||
  • nextToken == MINUS_CODE) {
  • lex ( );
  • term ( ); } }

56
Parsing factor in C
  • <factor> ::= <id>
  • void factor ( ) {
  • if (nextToken == ID_CODE)
  • lex ( );

57
Parsing factor in C
  • <factor> ::= ( <expr> )
  • else if (nextToken ==
  • LEFT_PAREN_CODE) {
  • lex ( );
  • expr ( );
  • if (nextToken ==
  • RIGHT_PAREN_CODE)
  • lex ( );

58
Comparable SML Code
  • | factor (SOME Left_parenthesis , tokens) =
  • (case expr (tokens(), tokens)
  • of ( true , tokens_after_expr) =>
  • (case tokens_after_expr
  • of ( SOME Right_parenthesis ,
  • tokens_after_rparen ) =>
  • ( true , (tokens_after_rparen(),
  • tokens_after_rparen))

59
Parsing factor in C
  • else
  • error ( );
  • /* Right parenthesis missing */ }
  • else
  • error ( );
  • /* Neither <id> nor ( was found at start */ }
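
Put together, the C fragments above might look like the following self-contained recognizer. It is a sketch under stated assumptions: the token codes, the character-level lex( ), and the use of setjmp/longjmp as a stand-in for "raise an exception" in error( ) are choices made here, term( ) is written in the same loop form as the optimized expr( ), and the wrapper function recognizes is invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <setjmp.h>
#include <stddef.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *input;   /* remaining characters */
static int nextToken;       /* latest token taken from the stream */
static jmp_buf on_error;    /* where error() jumps to */

/* Find the next token at the head of the input and store it in nextToken. */
static void lex(void)
{
    while (isspace((unsigned char)*input)) input++;
    if (*input == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
    case '+': nextToken = PLUS_CODE;        break;
    case '-': nextToken = MINUS_CODE;       break;
    case '*': nextToken = TIMES_CODE;       break;
    case '/': nextToken = DIVIDE_CODE;      break;
    case '(': nextToken = LEFT_PAREN_CODE;  break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE;         break;
    }
}

/* C has no exceptions; a non-local jump abandons the parse instead. */
static void error(void) { longjmp(on_error, 1); }

static void expr(void);

static void factor(void)
{
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();               /* right parenthesis missing */
    } else {
        error();                    /* neither <id> nor ( at start */
    }
}

static void term(void)
{
    factor();
    while (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) { lex(); factor(); }
}

static void expr(void)
{
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) { lex(); term(); }
}

/* Returns 1 if the whole string is an <expr>, 0 otherwise. */
int recognizes(const char *text)
{
    assert(text != NULL);
    input = text;
    if (setjmp(on_error)) return 0;
    lex();
    expr();
    return nextToken == END_CODE;
}
```

Like the boolean SML version, this only accepts or rejects the input; it builds no parse tree. Checking for END_CODE at the end is what rejects inputs such as a + b ) * c - d, where a valid prefix is followed by leftover tokens.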

60
Error cases in SML
  • (* No right parenthesis *)
  • | _ => ( false , tokens_after_expr))
  • (* No expression found *)
  • | ( false , rem_tokens) =>
  • ( false , rem_tokens))
  • (* Neither <id> nor left parenthesis found *)
  • | factor tokens = ( false , tokens)

61
Lexers: Simple Parsers
  • Lexers are parsers driven by regular grammars
  • Use character codes and arithmetic comparisons
    rather than case analysis to determine syntactic
    category for each character
  • Often some semantic action must be taken
  • Compute a number or build a string and record it
    in a symbol table

62
Example
  • <pos> ::= <digit> <pos> | <digit>
  • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • fun digit c =
  • (case Char.ord c
  • of n => if n >= Char.ord #"0" andalso n <= Char.ord #"9"
  • then SOME (n - Char.ord #"0")
  • else NONE)

63
Example
  • fun pos [] = NONE
  • | pos (c::cs) =
  • (case digit c
  • of SOME m =>
  • (case pos cs
  • of SOME(p, n) => SOME(10*p, m*p + n)
  • | NONE => SOME(10,m))
  • | NONE => NONE)
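
The same two functions can be sketched in C as follows. The struct pos_result and its field names are assumptions of this sketch; place corresponds to the first component of the SML pair (the power of ten for the digits read so far) and value to the second. Overflow for very long digit strings is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a decimal digit, or -1 if c is not one; uses only
   comparisons and arithmetic on character codes. */
int digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Result of <pos>: ok is 0 when no digit was found; place is 10^k for
   the k digits read; value is the number they denote. */
struct pos_result { int ok; long place; long value; };

/* <pos> ::= <digit> <pos> | <digit>, applied to the front of s. */
struct pos_result pos(const char *s)
{
    struct pos_result r = { 0, 0, 0 };
    struct pos_result rest;
    int m;

    assert(s != NULL);
    m = digit(*s);
    if (m < 0) return r;             /* no leading digit: no parse */
    rest = pos(s + 1);
    r.ok = 1;
    if (rest.ok) {                   /* <digit> <pos> */
        r.place = 10 * rest.place;
        r.value = m * rest.place + rest.value;
    } else {                         /* a single <digit> */
        r.place = 10;
        r.value = m;
    }
    return r;
}
```

The semantic action here is the arithmetic on place and value: each level of the recursion scales its own digit by the place value reported for the digits to its right.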

64
Problems for Recursive-Descent Parsing
  • Left Recursion
  • A ::= A w
  • translates to a subroutine that loops forever
  • Indirect Left Recursion
  • A ::= B w
  • B ::= A v
  • causes the same problem
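
To see why, here is a small C sketch of the naive translation of a left-recursive rule, using <A> ::= <A> b | b as the example. The depth limit MAX_DEPTH is an artificial cut-off added only so that the demonstration terminates; without it the recursion would never stop. All names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 1000      /* arbitrary cut-off so the demonstration terminates */

static const char *cursor;  /* next unread character */
static int depth;           /* current nesting of parse_A calls */
static int cut_off;         /* set when MAX_DEPTH was reached */

/* Naive translation of <A> ::= <A> b | b: the first action is a call
   to parse_A itself, made before any input has been consumed, so the
   second alternative is never tried. */
static int parse_A(void)
{
    int ok;
    if (depth >= MAX_DEPTH) { cut_off = 1; return 0; }
    depth++;
    ok = parse_A() && *cursor == 'b';
    if (ok) cursor++;
    depth--;
    return ok;
}

/* Runs the naive parser; returns 1 if it hit the cut-off without
   consuming a single character of the input. */
int left_recursion_runs_away(const char *text)
{
    assert(text != NULL);
    cursor = text;
    depth = 0;
    cut_off = 0;
    (void)parse_A();
    return cut_off && cursor == text;
}
```

Every nested call starts in exactly the same state as its caller, with the same input position, which is why no amount of recursion makes progress.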

65
Problems for Recursive-Descent Parsing
  • Parser must always be able to choose the next
    action based only on the very next token
  • Pairwise disjointedness test: can we always
    determine which rule (in the non-extended BNF) to
    choose based on just the first token?

66
Pairwise Disjointedness Test
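  • For each rule
  • A ::= y
  • Calculate
  • $\mathrm{FIRST}(y) = \{ a \mid y \Rightarrow^{*} a w \} \cup \{ \varepsilon \} \text{ if } y \Rightarrow^{*} \varepsilon$
  • For each pair of rules A ::= y and A ::= z,
    require $\mathrm{FIRST}(y) \cap \mathrm{FIRST}(z) = \emptyset$
  • Test too strong: can't handle
  • <expr> ::= <term> [ ( + | - ) <expr> ]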

67
Example
  • Grammar
  • <S> ::= <A> a <B> b
  • <A> ::= <A> b | b
  • <B> ::= a <B> | a
  • FIRST (<A> b) = { b }
  • FIRST (b) = { b }
  • Rules for <A> not pairwise disjoint
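
The test can be mechanized for small grammars. The C sketch below assumes a deliberately simple encoding that is not part of the slides: each rule is a string "X=rhs", upper-case letters are nonterminals, lower-case letters are terminals, and no right-hand side is empty (so the ε case of FIRST never arises). FIRST sets are bitmasks computed by iterating to a fixed point.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* FIRST of a single symbol as a bitmask over the terminals 'a'..'z'. */
static unsigned long first_of(char sym, const unsigned long first_nt[26])
{
    if (islower((unsigned char)sym)) return 1UL << (sym - 'a');
    return first_nt[sym - 'A'];
}

/* Fixed-point computation of FIRST for every nonterminal. Because no
   right-hand side is empty, FIRST(rhs) is FIRST of its first symbol. */
void compute_first(const char *const rules[], size_t n,
                   unsigned long first_nt[26])
{
    int changed = 1;
    size_t i;
    memset(first_nt, 0, 26 * sizeof first_nt[0]);
    while (changed) {
        changed = 0;
        for (i = 0; i < n; i++) {
            unsigned long *set = &first_nt[rules[i][0] - 'A'];
            unsigned long grown = *set | first_of(rules[i][2], first_nt);
            if (grown != *set) { *set = grown; changed = 1; }
        }
    }
}

/* Returns 1 if every pair of rules with the same left-hand side has
   disjoint FIRST sets, 0 otherwise. */
int pairwise_disjoint(const char *const rules[], size_t n)
{
    unsigned long first_nt[26];
    size_t i, j;
    assert(rules != NULL);
    compute_first(rules, n, first_nt);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (rules[i][0] == rules[j][0] &&
                (first_of(rules[i][2], first_nt) &
                 first_of(rules[j][2], first_nt)))
                return 0;
    return 1;
}
```

Encoding the grammar above as "S=AaBb", "A=Ab", "A=b", "B=aB", "B=a" makes pairwise_disjoint return 0, since both <A> rules have b in their FIRST sets (and both <B> rules have a).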