CS 363 Comparative Programming Languages - PowerPoint PPT Presentation

About This Presentation
Title:

CS 363 Comparative Programming Languages

Description:

The General Problem of Describing Syntax. Formal Methods of Describing Syntax ... A metalanguage is a language used to describe another language. ... – PowerPoint PPT presentation

Slides: 81
Provided by: tjh5
Learn more at: https://www.tjhsst.edu

Transcript and Presenter's Notes


1
CS 363 Comparative Programming Languages
  • Lecture 3: Syntax Notation

2
Topics
  • The General Problem of Describing Syntax
  • Formal Methods of Describing Syntax
  • Context-Free Grammars, BNF
  • Parse Trees

3
Introduction
  • Who must use a language definition?
  • Language designers
  • Implementors
  • Programmers (the users of the language)
  • Syntax: the form or structure of the
    expressions, statements, and program units
    (our focus today)
  • Semantics: the meaning of the expressions,
    statements, and program units

4
What is a language?
  • Alphabet (Σ): a finite set of basic syntactic
    elements (characters, tokens)
  • The Σ of C includes while, for, identifiers,
    integers, …
  • Sentence: a finite sequence of elements in Σ; it can
    be λ, the empty string (some texts use ε for the
    empty string)
  • A legal C program is a single sentence in that
    language
  • Language: a (possibly infinite) set of sentences
    over some alphabet; it can be ∅, the empty
    language
  • The set of all legal C programs defines the
    language

5
Suppose Σ = {a, b, c}. Some languages over Σ could
be
  • {aa, ab, ac, bb, bc, cc}
  • {ab, abc, abcc, abccc, . . .}
  • {λ}, where λ (ε) is the empty string (length
    0)
  • {a, b, c, λ}

6
Recognizing Languages
  • Typically the task of a compiler
  • Find the tokens (Σ) in the input
  • See if the tokens are in an appropriate order
  • Determine what that token ordering means
  • All of this must be formally specified

7
A Typical Compiler Architecture
[Figure: source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → Code Generator; all phases share the Symbol Table.]
8
Token
  • Lexeme: an indivisible string in an input language
  • e.g., while, (, main
  • Token: a (possibly infinite) set of lexemes
    defining an atomic element with a defined meaning
  • while_token = {while}
  • identifier_token = {main, x, …}
  • Tokens are often describable using a pattern.
  • The language of tokens is regular.

9
Lexical Analysis
  • Break the input string of characters into tokens:
  • while (a < limit) a = a + 1;
  • becomes  while  (  a  <  limit  )  a  =  a  +  1  ;
  • Remove white space and comments
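As a rough illustration of the scanner's job, here is a minimal regex-based tokenizer in Python for the while-loop fragment. The token class names (KEYWORD, IDENT, INT, OP, PUNCT) and their patterns are assumptions chosen to cover only this fragment, not a full C lexer; each pattern is a regular expression, in line with the point that the language of tokens is regular.

```python
import re

# Illustrative token classes for this one C-like fragment (not a full C lexer).
# Order matters: earlier alternatives win when several match at the same spot.
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),         # discarded
    ("COMMENT", r"/\*.*?\*/"),      # discarded (C-style block comment)
    ("KEYWORD", r"\bwhile\b"),      # reserved word, checked before identifiers
    ("IDENT", r"[A-Za-z_]\w*"),
    ("INT", r"\d+"),
    ("OP", r"[<=+]"),
    ("PUNCT", r"[();]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping white space and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("WHITESPACE", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("while (a < limit) a = a + 1; /* loop */"))
```

Putting KEYWORD ahead of IDENT is what makes `while` a keyword while `whilex` still scans as a single identifier.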

10
Describing Language Syntax
  • Enumeration: list all the possible legal
    token orderings
  • Formal approaches to describing syntax
  • Recognizers (used in compilers): is the given
    sentence in the language?
  • Generators: generate the sentences of a language

11
Metalanguages for Describing Syntax
  • A metalanguage is a language used to describe
    another language.
  • Abstractions are used to represent classes of
    syntactic structures--they act like syntactic
    variables (also called nonterminal symbols)
  • Context-free grammars define a class of languages
    called context-free languages
  • Context-Free Grammars (Noam Chomsky, mid-1950s)
  • Backus-Naur Form, or BNF (1959; invented by John
    Backus to describe ALGOL 58)

12
Backus-Naur Form (BNF)
  • <while_stmt> → while ( <logic_expr> ) <stmt>
  • This is a rule describing the structure of a
    while statement
  • Non-terminals are placeholders for other rules:
    <while_stmt>, <logic_expr>, <stmt>
  • Tokens (terminal symbols) are part of the
    language alphabet

13
BNF Examples
  • Vt = {+, -, 0..9}, Vn = {<L>, <D>}, s = <L>
  • <L> → <L> + <D> | <L> - <D> | <D>
  • <D> → 0 | 1 | … | 9
  • Vt = {(, )}, Vn = {<L>}, s = <L>
  • <L> → ( <L> ) <L>
  • <L> → λ

Recursion: in both grammars <L> appears on the right-hand side of its own productions.
14
BNF Examples
  • Vt = {a, b, c, d, ;, =, +, -, const}, Vn = {<program>,
    <stmts>, <stmt>, <var>, <expr>, <term>}
  • <program> → <stmts>
  • <stmts> → <stmt> | <stmt> ; <stmts>
  • <stmt> → <var> = <expr>
  • <var> → a | b | c | d
  • <expr> → <term> + <term> | <term> - <term>
  • <term> → <var> | const

15
Applying BNF rules
  • Definition: given a string α A β and a production
    A → γ, we can replace A with γ
  • α A β ⇒ α γ β is a single-step derivation.
  • (α, β, and γ are strings of zero or more
    terminals/non-terminals)
  • Examples:
  • <L> - <D> ⇒ <L> - <D> - <D> using <L> → <L> - <D>
  • ( <L> ) ( <L> ) ⇒ ( ( <L> ) <L> ) ( <L> )
    using <L> → ( <L> ) <L>

16
Derivations
  • Definition: a sequence of rule applications
  • w0 ⇒ w1 ⇒ … ⇒ wn
  • is a derivation of wn from w0 (w0 ⇒* wn)
  • Example:
      <L>             production <L> → ( <L> ) <L>
      ⇒ ( <L> ) <L>   production <L> → λ
      ⇒ ( ) <L>       production <L> → λ
      ⇒ ( )
  • <L> ⇒* ( )
  • Each wi is referred to as a sentential form; it may
    contain non-terminal symbols.

17
Derivation
  • A sentence is a sentential form that has only
    terminal symbols
  • A leftmost derivation is one in which the
    leftmost nonterminal in each sentential form is
    the one that is expanded
  • A rightmost derivation always expands the
    rightmost nonterminal
  • A derivation may be neither leftmost nor rightmost

18
Derivation of (())()
<L>                          production <L> → ( <L> ) <L>
⇒ ( <L> ) <L>                production <L> → ( <L> ) <L>
⇒ ( <L> ) ( <L> ) <L>        production <L> → λ
⇒ ( <L> ) ( <L> )            production <L> → ( <L> ) <L>
⇒ ( ( <L> ) <L> ) ( <L> )    production <L> → λ
⇒ ( ( ) <L> ) ( <L> )        production <L> → λ
⇒ ( ( ) <L> ) ( )            production <L> → λ
⇒ ( ( ) ) ( )
Grammar: <L> → ( <L> ) <L> | λ
<L> ⇒* ( ( ) ) ( )
19
Same String, Leftmost Derivation
<L>                          production <L> → ( <L> ) <L>
⇒ ( <L> ) <L>                production <L> → ( <L> ) <L>
⇒ ( ( <L> ) <L> ) <L>        production <L> → λ
⇒ ( ( ) <L> ) <L>            production <L> → λ
⇒ ( ( ) ) <L>                production <L> → ( <L> ) <L>
⇒ ( ( ) ) ( <L> ) <L>        production <L> → λ
⇒ ( ( ) ) ( ) <L>            production <L> → λ
⇒ ( ( ) ) ( )
Grammar: <L> → ( <L> ) <L> | λ
<L> ⇒* ( ( ) ) ( ) (leftmost)
20
Same String, Rightmost Derivation
<L>                          production <L> → ( <L> ) <L>
⇒ ( <L> ) <L>                production <L> → ( <L> ) <L>
⇒ ( <L> ) ( <L> ) <L>        production <L> → λ
⇒ ( <L> ) ( <L> )            production <L> → λ
⇒ ( <L> ) ( )                production <L> → ( <L> ) <L>
⇒ ( ( <L> ) <L> ) ( )        production <L> → λ
⇒ ( ( <L> ) ) ( )            production <L> → λ
⇒ ( ( ) ) ( )
Grammar: <L> → ( <L> ) <L> | λ
<L> ⇒* ( ( ) ) ( ) (rightmost)
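The leftmost and rightmost derivations above differ only in which <L> gets replaced at each step. A minimal Python sketch makes that concrete: sentential forms are lists of symbols, <L> is written as "L", and the two productions carry the hypothetical labels "recurse" and "empty". Feeding it the production sequence from the two derivations above reproduces both columns of sentential forms.

```python
# Productions of  <L> -> ( <L> ) <L> | λ ; the non-terminal <L> is written "L".
# The labels "recurse" and "empty" are hypothetical names for the two rules.
PRODUCTIONS = {
    "recurse": ["(", "L", ")", "L"],  # <L> -> ( <L> ) <L>
    "empty": [],                      # <L> -> λ
}

def derive(choices, leftmost=True):
    """Apply one production per entry of `choices`, always to the leftmost
    (or, with leftmost=False, the rightmost) L. Returns every sentential form."""
    form = ["L"]                      # start symbol
    forms = ["L"]
    for choice in choices:
        if leftmost:
            i = form.index("L")
        else:
            i = len(form) - 1 - form[::-1].index("L")
        # One derivation step: alpha A beta => alpha gamma beta
        form = form[:i] + PRODUCTIONS[choice] + form[i + 1:]
        forms.append(" ".join(form) if form else "λ")
    return forms

steps = ["recurse", "recurse", "empty", "empty", "recurse", "empty", "empty"]
for line in derive(steps):
    print(line)
```

The same sequence of production choices happens to work for both orders here; only the position of each replacement changes.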
21
  • L(G), the language generated by grammar G, is
    { w ∈ Vt* | s ⇒* w } for start symbol s
  • Both ( ) and ( ( ) ) ( ) are in L(G) for the previous
    grammar.
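The definition of L(G) also suggests a generator in the sense of the earlier "Generators" bullet: expand sentential forms until only terminals remain. The sketch below does this breadth-first for the parentheses grammar, expanding the leftmost L and pruning forms with more than `max_len` terminals (the length bound is an added assumption so the enumeration terminates).

```python
from collections import deque

def sentences_up_to(max_len):
    """Enumerate every sentence of L(G) with at most max_len terminals for
    <L> -> ( <L> ) <L> | λ, by breadth-first expansion of the leftmost L."""
    start = ("L",)
    queue = deque([start])
    seen = {start}
    found = set()
    while queue:
        form = queue.popleft()
        if "L" not in form:                 # only terminals left: a sentence
            found.add("".join(form))
            continue
        i = form.index("L")                 # expand the leftmost non-terminal
        for rhs in (("(", "L", ")", "L"), ()):
            new = form[:i] + rhs + form[i + 1:]
            n_terminals = sum(1 for sym in new if sym != "L")
            if n_terminals <= max_len and new not in seen:   # prune long forms
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

print(sentences_up_to(4))   # ['', '()', '(())', '()()']
```

The empty string appears first because <L> → λ makes λ a member of L(G).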

22
Parse Trees
  • The parse tree for some string in some language
    is defined by the grammar G as follows
  • The root is the start symbol of G
  • The leaves are terminals or λ. When visited from
    left to right, the leaves form the input string
  • The interior nodes are non-terminals of G
  • For every non-terminal A in the tree with
    children B1 … Bk, there is some production
    A → B1 … Bk
  • If a string is in the given language, a parse
    tree for it must exist.

23
Parse Tree for (())()
[Figure: parse tree for ( ( ) ) ( ), shown beside its derivation. The root L has children ( L ) L; each of those two L nodes again has children ( L ) L, and all four of their L children derive λ.]
24
Ambiguity
  • A grammar is ambiguous if there are at least two
    parse trees (or leftmost derivations) for some
    string in the language
  • E → E + E
  • E → E * E
  • E → 0 | … | 9

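[Figure: two parse trees for 2 + 3 * 4. In one, + is at the root, grouping 2 + (3 * 4); in the other, * is at the root, grouping (2 + 3) * 4.]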
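Why ambiguity matters: if the two trees are evaluated bottom-up, they give different values for the same string. In this sketch each tree is a nested tuple (an encoding chosen for illustration), where a digit string is a leaf and `(left, op, right)` is an interior E node.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a parse tree: a digit string is a leaf (E -> 0..9); a tuple
    (left, op, right) is an interior node for E -> E + E or E -> E * E."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

# The two parse trees the ambiguous grammar allows for 2 + 3 * 4
plus_at_root = ("2", "+", ("3", "*", "4"))    # 2 + (3 * 4)
times_at_root = (("2", "+", "3"), "*", "4")   # (2 + 3) * 4

print(evaluate(plus_at_root), evaluate(times_at_root))   # 14 20
```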
25
An Unambiguous Expression Grammar
  • Grammars can be written that enforce operator precedence
  • <expr> → <expr> + <term> | <term>
  • <term> → <term> * <c> | <c>
  • <c> → 0 | 1 | … | 9

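[Figure: the only parse tree for 2 + 3 * 4 under this grammar: <expr> → <expr> + <term>, where the left <expr> derives 2 and the <term> derives 3 * 4, so * is grouped below +.]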
26
Formal Methods of Describing Syntax
  • Operator associativity can also be indicated by a
    grammar
  • <expr> → <expr> + <expr> | const (ambiguous)
  • <expr> → <expr> + const | const (unambiguous)

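[Figure: parse tree for const + const + const under the unambiguous rule. Because the rule is left recursive, the tree leans left and the leftmost + is grouped first (left associativity).]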
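With +, both groupings give the same value, so the effect of associativity is easier to see with subtraction, swapped in here purely for illustration. The sketch below evaluates a list of constants the way a left-recursive rule `<expr> → <expr> - const | const` groups them, next to a hypothetical right-recursive rule `<expr> → const - <expr> | const` that is not on the slide.

```python
from functools import reduce

def eval_left(consts):
    """<expr> -> <expr> - const | const  (left recursive):
    the tree leans left, so c1 - c2 - c3 means (c1 - c2) - c3."""
    return reduce(lambda acc, c: acc - c, consts)

def eval_right(consts):
    """Hypothetical right-recursive variant <expr> -> const - <expr> | const:
    the tree leans right, so c1 - c2 - c3 means c1 - (c2 - c3)."""
    if len(consts) == 1:
        return consts[0]
    return consts[0] - eval_right(consts[1:])

print(eval_left([8, 3, 2]), eval_right([8, 3, 2]))   # 3 7
```

A left fold mirrors the left-leaning tree, which is why left recursion in the grammar yields left associativity.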
27
EBNF
  • Extended BNF
  • Shorthand for BNF
  • Optional parts are placed in brackets [ ]
  • <proc_call> → ident [ ( <expr_list> ) ]
  • Put alternative parts of RHSs in parentheses and
    separate them with vertical bars
  • <term> → <term> (+ | -) const
  • Put repetitions (0 or more) in braces { }
  • <ident> → letter {letter | digit}
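EBNF braces map directly onto loops in a hand-written recognizer. The sketch below checks `<ident> → letter {letter | digit}`; it assumes "letter" means an ASCII letter and "digit" an ASCII digit, which the slide does not specify.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: "letter" means an ASCII letter
DIGITS = set(string.digits)

def is_ident(s):
    """Recognize <ident> -> letter {letter | digit}."""
    if not s or s[0] not in LETTERS:          # the mandatory leading letter
        return False
    i = 1
    while i < len(s) and (s[i] in LETTERS or s[i] in DIGITS):   # {letter | digit}
        i += 1
    return i == len(s)                        # every character must be consumed

print(is_ident("x1"), is_ident("1x"))   # True False
```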

28
BNF and EBNF
  • BNF
  • <expr> → <expr> + <term>
          | <expr> - <term>
          | <term>
  • <term> → <term> * <factor>
          | <term> / <factor>
          | <factor>
  • EBNF
  • <expr> → <term> {(+ | -) <term>}
  • <term> → <factor> {(* | /) <factor>}
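The EBNF form is convenient for hand-written parsers: each `{…}` becomes a while loop, and the left-recursive BNF rules need no special handling. The sketch below evaluates integer expressions with these two rules. It assumes `<factor>` is a non-negative integer literal (the slide leaves `<factor>` undefined), and it treats `/` as integer (floor) division.

```python
import re

def eval_ebnf(text):
    """Evaluate integer arithmetic with the EBNF rules
         <expr> -> <term> {(+ | -) <term>}
         <term> -> <factor> {(* | /) <factor>}
    Assumption: <factor> is a non-negative integer literal; / is floor division."""
    tokens = re.findall(r"\d+|[-+*/]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(advance())

    def term():
        value = factor()
        while peek() in ("*", "/"):          # {(* | /) <factor>}
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):          # {(+ | -) <term>}
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(eval_ebnf("2 + 3 * 4"), eval_ebnf("8 - 3 - 2"))   # 14 3
```

Because each loop folds results from left to right, the loops keep the left associativity of the BNF version, and calling `term()` from `expr()` keeps * and / binding tighter than + and -.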

29
Lexical and Syntax Analysis
  • If a string is in a language, a parse tree can be
    derived for that string
  • Problem: we need to go from a string of
    characters (input file) to a legal parse tree to
    show that the string is in the language.
  • From the introduction: compilers, interpreters,
    hybrid approaches
  • Our focus: top-down parsing

30
Parsing
  • Take a sequence of tokens and produce a parse tree
  • Two general algorithms (methods): top-down and
    bottom-up
  • Algorithms are derived from the CFG
  • Note: we can't always derive an algorithm from a
    CFG

31
Top Down
Grammar: <L> → ( <L> ) <L> | λ
String: ( ( ) ) ( )
[Figure: the parse tree starts as the start symbol L alone.]
32
Top Down
[Figure: the root L is expanded with L → ( L ) L, giving children (, L, ), L.]
33
Top Down
[Figure: the root's first L child is expanded with L → ( L ) L.]
34
Top Down
[Figure: the innermost L is expanded with L → λ.]
35
Top Down
[Figure: the next L is expanded with L → λ, completing the subtree for ( ( ) ).]
36
Top Down
[Figure: the root's last L is expanded with L → ( L ) L.]
37
Top Down
[Figure: the L inside the second pair of parentheses is expanded with L → λ.]
38
Top Down
[Figure: the final L is expanded with L → λ; the leaves now read ( ( ) ) ( ), completing the parse tree.]
39
Writing a recursive descent parser
  • One procedure for each non-terminal.
  • Use the next token (lookahead) to choose which
    production for that non-terminal to mimic
  • For a non-terminal X, call procedure X()
  • For a terminal X, call match(X)
  • match(symbol)
      if (symbol == lookahead)
        lookahead = next_token();
      else error();
  • Function next_token() gets the next token from
    the lexical analyzer; it must be called once before
    the first procedure call, to get the first lookahead.

40
Simplified RDP Example
  • L → ( L ) L | λ
  • L()
      if (lookahead == '(')        /* L → ( L ) L */
        match('('); L(); match(')'); L();
      else return;                 /* L → λ */
  • main()
      lookahead = next_token();
      L();
      if (lookahead != EOF) error();
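A runnable Python version of this parser follows the pseudocode closely: one method per non-terminal, `match()` for terminals, and a `lookahead` field. The stand-in lexer (one token per non-blank character) and the "$" end-of-input marker are assumptions for the sketch.

```python
class ParseError(Exception):
    pass

class BalancedParser:
    """Recursive descent parser for L -> ( L ) L | λ: one method per
    non-terminal, with self.lookahead holding the next token."""

    END = "$"   # hypothetical end-of-input marker

    def __init__(self, text):
        # Stand-in lexer: every non-blank character is one token.
        self._tokens = iter([c for c in text if not c.isspace()] + [self.END])
        self.lookahead = self.next_token()   # called once before the first L()

    def next_token(self):
        return next(self._tokens)

    def match(self, symbol):
        if self.lookahead == symbol:
            self.lookahead = self.next_token()
        else:
            raise ParseError(f"expected {symbol!r}, found {self.lookahead!r}")

    def L(self):
        if self.lookahead == "(":   # L -> ( L ) L
            self.match("(")
            self.L()
            self.match(")")
            self.L()
        # any other lookahead: L -> λ, so consume nothing and return

def accepts(text):
    """True if the whole input is derivable from L."""
    parser = BalancedParser(text)
    try:
        parser.L()
    except ParseError:
        return False
    return parser.lookahead == BalancedParser.END   # no tokens may be left over

print(accepts("(())()"), accepts("(()"), accepts("())"))   # True False False
```

The final end-of-input check matters: without it, `())` would be accepted after parsing only its `()` prefix. Very deeply nested input can exceed Python's default recursion limit.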

41
Tracing the Recursive Descent Parse
call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 1, (
42
Tracing the Recursive Descent Parse
call L()
  call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 2, (
43
Tracing the Recursive Descent Parse
call L()
  call L()
    call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 3, )
44
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 3, )
45
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 4, )
46
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 4, )
47
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 4, )
48
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 5, (
49
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
    call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 6, )
50
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
    call L() - return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: token 6, )
51
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
    call L() - return
    call L()
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: end of input
52
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
    call L() - return
    call L() - return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: end of input
53
Tracing the Recursive Descent Parse
call L()
  call L()
    call L() - return
    call L() - return
  return
  call L()
    call L() - return
    call L() - return
  return
return
[Figure: parse tree for ( ( ) ) ( )]
String: ( ( ) ) ( )    lookahead: end of input
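To reproduce a trace like this mechanically, the parser can log each call, indented by recursion depth. The sketch below uses the trace's own conventions ("call L() - return" for an L that takes L → λ, and a separate "return" for one that finished ( L ) L); the "$" end marker and the list-of-strings log are assumptions.

```python
def trace_parse(text):
    """Recursive descent for L -> ( L ) L | λ that records each call and
    return, indented by call depth, in the style of the trace slides."""
    tokens = [c for c in text if not c.isspace()] + ["$"]   # "$" marks end of input
    pos = 0
    log = []

    def L(depth):
        nonlocal pos
        indent = "  " * depth
        if tokens[pos] == "(":                 # L -> ( L ) L
            log.append(f"{indent}call L()")
            pos += 1                           # match('(')
            L(depth + 1)
            if tokens[pos] != ")":
                raise SyntaxError(f"expected ')' at token {pos}")
            pos += 1                           # match(')')
            L(depth + 1)
            log.append(f"{indent}return")
        else:                                  # L -> λ: return at once
            log.append(f"{indent}call L() - return")

    L(0)
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
    return log

for line in trace_parse("( ( ) ) ( )"):
    print(line)
```

For ( ( ) ) ( ) this logs seven calls, matching the last trace slide.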
54
Simplified RDP Example
  • L → ( L ) L | λ (the parser from slide 40)

The body of the function for a given non-terminal
mimics the productions: when the lookahead is (, L()
follows L → ( L ) L by calling match('('), L(),
match(')'), L(); otherwise it follows L → λ and
simply returns.
55
Another Grammar
  • A → a B
  • A → b
  • A → c B B
  • B → a B
  • B → b A
  • A()
      if (lookahead == a)
        lookahead = next_token(); B();
      else if (lookahead == b)
        lookahead = next_token();
      else if (lookahead == c)
        lookahead = next_token(); B(); B();
      else error();
  • B()
      if (lookahead == a)
        lookahead = next_token(); B();
      else if (lookahead == b)
        lookahead = next_token(); A();
      else error();

Key: finding the set of symbols (lookahead) that
indicate which production to use!
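A runnable Python counterpart of A() and B() is shown below. Here each lookahead symbol (a, b, or c) picks exactly one production. As assumptions for the sketch, each character is one token, "$" marks end of input, and `advance()` stands in for `lookahead = next_token()`.

```python
def parse_AB(text):
    """Recursive descent for
         A -> a B | b | c B B
         B -> a B | b A
    Returns True if the whole string is derivable from A."""
    tokens = list(text) + ["$"]   # one character per token; "$" marks end of input
    pos = 0

    def advance():                # plays the role of lookahead = next_token()
        nonlocal pos
        pos += 1

    def A():
        la = tokens[pos]
        if la == "a":             # A -> a B
            advance()
            B()
        elif la == "b":           # A -> b
            advance()
        elif la == "c":           # A -> c B B
            advance()
            B()
            B()
        else:
            raise SyntaxError(f"A: unexpected {la!r}")

    def B():
        la = tokens[pos]
        if la == "a":             # B -> a B
            advance()
            B()
        elif la == "b":           # B -> b A
            advance()
            A()
        else:
            raise SyntaxError(f"B: unexpected {la!r}")

    try:
        A()
    except SyntaxError:
        return False
    return tokens[pos] == "$"     # the whole input must be consumed

print(parse_AB("abb"), parse_AB("cbbbb"), parse_AB("ab"))   # True True False
```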
56
How do we find the lookaheads?
  • Lookahead sets can be computed for some grammars
    from FIRST() sets
  • lookahead(A → α) = FIRST(α)
  • For this to work for a given grammar, the
    lookahead sets for a given non-terminal must be
    disjoint.

57
FIRST Sets
  • FIRST(α) is the set of all terminal symbols that
    can begin some sentential form derived from α
  • FIRST(α) = { a ∈ Vt | α ⇒* aβ }
      ∪ {λ} if α ⇒* λ
  • Example
  • <stmt> → simple | begin <stmts> end
  • FIRST(<stmt>) = {simple, begin}

Remember, α is a string of zero or more
terminals and nonterminals
58
Computing FIRST sets
  • Initially FIRST(A) is empty
  • For productions A → aβ, where a ∈ Vt
  • Add a to FIRST(A)
  • For productions A → λ
  • Add λ to FIRST(A)
  • For productions A → αBβ, where α ⇒* λ and NOT
    (B ⇒* λ)
  • Add FIRST(αB) to FIRST(A)
  • For productions A → α, where α ⇒* λ
  • Add FIRST(α) and λ to FIRST(A)
  • Repeat until no FIRST set changes

59
  • To compute FIRST across strings of terminals and
    non-terminals:
  • FIRST(λ) = {λ}
  • FIRST(Aα) = {A} if A is a terminal
  •           = (FIRST(A) − {λ}) ∪ FIRST(α) if A ⇒* λ
  •           = FIRST(A) otherwise
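These rules can be sketched in Python as a fixed-point iteration: compute FIRST of each right-hand side with the string cases above, add it to FIRST of the left-hand side, and repeat until nothing grows. The grammar encoding (a dict from non-terminal to a list of right-hand sides, with `[]` for λ) is an assumption made for the sketch.

```python
LAMBDA = "λ"   # stands for the empty string in FIRST sets

def first_of_string(symbols, first, terminals):
    """FIRST of a string of grammar symbols, following the slide's cases."""
    result = set()
    for sym in symbols:
        if sym in terminals:            # FIRST(a α) = {a} for a terminal a
            result.add(sym)
            return result
        result |= first[sym] - {LAMBDA}
        if LAMBDA not in first[sym]:    # sym cannot vanish, so stop here
            return result
    result.add(LAMBDA)                  # every symbol can derive λ (or string is empty)
    return result

def first_sets(grammar):
    """grammar maps each non-terminal to a list of right-hand sides (lists of
    symbols, [] meaning λ); any symbol that is not a key is a terminal."""
    terminals = {s for rhss in grammar.values() for rhs in rhss for s in rhs} - set(grammar)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until no FIRST set grows
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_string(rhs, first, terminals)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# Example 1 from the slides
example1 = {
    "S": [["a", "S", "e"], ["B"]],
    "B": [["b", "B", "e"], ["C"]],
    "C": [["c", "C", "e"], ["d"]],
}
print(first_sets(example1))
```

The repetition is needed because of recursive productions such as S → S T S, where FIRST(S) is not yet complete the first time it is used.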


60
Example 1
  • S → a S e
  • S → B
  • B → b B e
  • B → C
  • C → c C e
  • C → d
  • FIRST(C)
  • FIRST(B)
  • FIRST(S)

61
Example 1
  • S → a S e
  • S → B
  • B → b B e
  • B → C
  • C → c C e
  • C → d
  • FIRST(C) = {c, d}
  • FIRST(B) = {b, c, d}
  • FIRST(S) = {a, b, c, d}

62
Example 2
  • P → i | c | n T S
  • Q → P | a S | b S c S T
  • R → b | λ
  • S → c | R n | λ
  • T → R S q
  • FIRST(P)
  • FIRST(Q)
  • FIRST(R)
  • FIRST(S)
  • FIRST(T)

63
Example 2
  • P → i | c | n T S
  • Q → P | a S | b S c S T
  • R → b | λ
  • S → c | R n | λ
  • T → R S q
  • FIRST(P) = {i, c, n}
  • FIRST(Q) = {i, c, n, a, b}
  • FIRST(R) = {b, λ}
  • FIRST(S) = {c, b, n, λ}
  • FIRST(T) = {b, c, n, q}

64
Example 3
  • S → a S e | S T S
  • T → R S e | Q
  • R → r S r | λ
  • Q → S T | λ
  • FIRST(S)
  • FIRST(R)
  • FIRST(T)
  • FIRST(Q)

65
Example 3
  • S → a S e | S T S
  • T → R S e | Q
  • R → r S r | λ
  • Q → S T | λ
  • FIRST(S) = {a}
  • FIRST(R) = {r, λ}
  • FIRST(T) = {r, a, λ}
  • FIRST(Q) = {a, λ}

66
Bottom up Parsing (shift/reduce, LR)
  • Less intuitive but more efficient than top-down
  • Two actions
  • Shift: move the next token from the input to the
    parse tree forest
  • Reduce: merge 0 or more parse trees under a
    single parent.

67
Bottom Up
Grammar: <L> → ( <L> ) <L> | λ
String: ( ( ) ) ( )
[Figure: the parse tree forest is empty; no input has been read yet.]
68
Bottom Up
[Figure: forest roots, left to right: (]
Shift (
69
Bottom Up
[Figure: forest roots, left to right: ( (]
Shift (
70
Bottom Up
[Figure: forest roots, left to right: ( ( L, where the new L has the single child λ]
Reduce L → λ
71
Bottom Up
[Figure: forest roots, left to right: ( ( L )]
Shift )
72
Bottom Up
[Figure: forest roots, left to right: ( ( L ) L, where the new L has the single child λ]
Reduce L → λ
73
Bottom Up
[Figure: forest roots, left to right: ( L, where the new L has children ( L ) L]
Reduce L → ( L ) L
74
Bottom Up
[Figure: forest roots, left to right: ( L )]
Shift )
75
Bottom Up
[Figure: forest roots, left to right: ( L ) (]
Shift (
76
Bottom Up
[Figure: forest roots, left to right: ( L ) ( L, where the new L has the single child λ]
Reduce L → λ
77
Bottom Up
[Figure: forest roots, left to right: ( L ) ( L )]
Shift )
78
Bottom Up
[Figure: forest roots, left to right: ( L ) ( L ) L, where the new L has the single child λ]
Reduce L → λ
79
Bottom Up
[Figure: forest roots, left to right: ( L ) L, where the new L has children ( L ) L]
Reduce L → ( L ) L
80
Bottom Up
[Figure: a single tree rooted at L: the complete parse tree for ( ( ) ) ( )]
Reduce L → ( L ) L
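The shift/reduce sequence traced above can be reproduced with a small stack-based loop. This is a sketch with decision rules hand-written for this one grammar, assuming a "$" end marker. It is not a general LR parser, which would instead consult a table built from the grammar.

```python
def shift_reduce(text):
    """Shift-reduce parse of L -> ( L ) L | λ, returning the list of actions.
    The decision rules are hand-written for this one grammar (not an LR table)."""
    tokens = [c for c in text if not c.isspace()] + ["$"]   # "$" marks end of input
    stack, actions, pos = [], [], 0
    while True:
        la = tokens[pos]
        top = stack[-1] if stack else None
        if top in (None, "(", ")"):
            if la == "(":                        # begin another ( L ) L
                stack.append(la)
                pos += 1
                actions.append("shift (")
            else:                                # ) or end follows: L -> λ
                stack.append("L")
                actions.append("reduce L -> λ")
        elif stack[-4:] == ["(", "L", ")", "L"]:
            del stack[-4:]                       # merge four trees under one L
            stack.append("L")
            actions.append("reduce L -> ( L ) L")
        elif stack[-2:] == ["(", "L"] and la == ")":
            stack.append(la)
            pos += 1
            actions.append("shift )")
        elif stack == ["L"] and la == "$":
            actions.append("accept")
            return actions
        else:
            raise SyntaxError(f"unexpected {la!r} at token {pos}")

for action in shift_reduce("(())()"):
    print(action)
```

Apart from the final "accept", the actions it prints for ( ( ) ) ( ) match the slide sequence: shift, shift, reduce by λ, shift, and so on, ending with two reductions by L → ( L ) L.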