Title: CS 2130
1CS 2130
- Lecture 13
- Formal Language Concepts
2Where are we going?
- Compilers
- or
- How does a computer program translate one program
into another?
3Compilers
4Parts of Compilers
Analysis (Front End)
(0. Preprocessor)
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
Synthesis (Back End)
4. Code Generation
5. Optimization
5Sidebar...HTML
- What is an HTML file?
- What does a browser do?
- Not all analysis leads to code
6For today...
- Assume that we have broken a program up into pieces called tokens
- a b c
- How does the compiler determine if this statement is legal or not?
- To answer we need to first answer another question...
[Diagram callouts: Token, Tokens]
7What is a grammar?
- Or to put it another way...
8What grammar a is?
9Speech
- How does natural language work when two persons communicate?
- One person speaks a language using known words arranged in a certain order
- The other person knows the meaning of the words and the rules of the arrangements and derives meaning
10A Sentence
- The quick brown fox jumps over the lazy dog.
11A Sentence
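- The quick brown fox jumps over the lazy dog.
The (article) quick (adjective) brown (adjective) fox (noun) jumps (verb) over (preposition) the (article) lazy (adjective) dog (noun)
subject: The quick brown fox
predicate: jumps over the lazy dog
prepositional phrase: over the lazy dog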
12Diagrammatically
[Sentence diagram showing: fox, jumps, over, dog]
Remember in the 3rd grade when you said, "We'll never see diagramming sentences again?"
13Natural Language
- Rules of grammar specify legal syntax
- Rules are typically very complex with numerous
exceptions and special cases
14Computer Language
- Still use a grammar to specify syntax
- Grammar is much simpler than natural language
15Grammar
- Sentential forms
- Noam Chomsky, MIT, 60s and 70s
- Chomsky Type 2 (context-free) Grammar
- Backus-Naur Form (BNF)
- Algol
16Example
- <sentence> ::= <noun-phrase><verb-phrase>
- <noun-phrase> ::= <cmplx-noun> | <cmplx-noun><prep-phrase>
- <verb-phrase> ::= <cmplx-verb> | <cmplx-verb><prep-phrase>
- <prep-phrase> ::= <prep><cmplx-noun>
- <cmplx-noun> ::= <article><noun>
- <cmplx-verb> ::= <verb> | <verb><noun-phrase>
- <article> ::= a | the
- <noun> ::= boy | girl | flower
- <verb> ::= touches | likes | sees
- <prep> ::= with
17Example
Terminal Symbols (highlighted in the grammar above): a, the, boy, girl, flower, touches, likes, sees, with
18Example
<Non-terminal Symbols> (highlighted in the grammar above): <sentence>, <noun-phrase>, <verb-phrase>, <prep-phrase>, <cmplx-noun>, <cmplx-verb>, <article>, <noun>, <verb>, <prep>
19Example
<Start Symbol> (highlighted in the grammar above): <sentence>
20Example
Rules (highlighted in the grammar above): every line of the grammar is a rule, for example <prep-phrase> ::= <prep><cmplx-noun>
21Note
- <noun> ::= boy | girl | flower
- Equivalent to
- <noun> ::= boy
- <noun> ::= girl
- <noun> ::= flower
22TypicalDerivation
- <sentence> => <noun-phrase><verb-phrase>
- => <cmplx-noun><verb-phrase>
- => <article><noun><verb-phrase>
- => a <noun><verb-phrase>
- => a boy <verb-phrase>
- => a boy <cmplx-verb>
- => a boy <verb>
- => a boy sees
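A derivation like this one can be replayed mechanically: at each step, replace the leftmost non-terminal with the right side of a chosen rule. The sketch below assumes symbols written as `<...>` are non-terminals; the function name `leftmost_derivation` and the `STEPS` list are hypothetical, and the code does not check the chosen rules against the grammar.

```python
def leftmost_derivation(start, steps):
    """Apply (non_terminal, replacement) steps to the leftmost non-terminal.

    Returns every sentential form as a string, beginning with the start symbol.
    """
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        # Find the leftmost non-terminal (symbols written as <...>).
        idx = next(i for i, s in enumerate(form) if s.startswith("<"))
        if form[idx] != lhs:
            raise ValueError(f"expected {lhs}, but leftmost non-terminal is {form[idx]}")
        form = form[:idx] + list(rhs) + form[idx + 1:]
        history.append(" ".join(form))
    return history


# The rule chosen at each step of the derivation of "a boy sees".
STEPS = [
    ("<sentence>", ["<noun-phrase>", "<verb-phrase>"]),
    ("<noun-phrase>", ["<cmplx-noun>"]),
    ("<cmplx-noun>", ["<article>", "<noun>"]),
    ("<article>", ["a"]),
    ("<noun>", ["boy"]),
    ("<verb-phrase>", ["<cmplx-verb>"]),
    ("<cmplx-verb>", ["<verb>"]),
    ("<verb>", ["sees"]),
]

for sentential_form in leftmost_derivation("<sentence>", STEPS):
    print(sentential_form)
```

Every printed line is a sentential form; the last one contains only terminal symbols.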
23BNF Grammar
- Terminal Symbols
- Non-Terminal Symbols
- Start Symbol (non-terminal)
- Rules
- Use grammar to parse statements into n-ary tree
- Each statement becomes a tree
- Terminal symbols correspond to leaf nodes
- Non-terminal symbols correspond to internal nodes
- Start symbol corresponds to root node
24Example Expression Grammar
- <expr> ::= <expr> + <term> | <term>
- <term> ::= <term> * <factor> | <factor>
- <factor> ::= ( <expr> ) | <num>
- <num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Non-terminal symbols: <...>
Terminal symbols: + * ( ) 0 1 2 3 4 5 6 7 8 9
Rules: the four lines above
Start symbol: the left-hand side of the first rule, or a non-terminal symbol that never appears on a right side
25Grammars
- <expr> ::= <expr> + <term> | <term>
- <term> ::= <term> * <factor> | <factor>
- <factor> ::= '(' <expr> ')' | <num>
- <num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- This grammar will parse arithmetic expressions involving + and * which use parentheses for grouping and numbers from 0 to 9
- What if a given "sentence" can't be parsed?
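One way to see both outcomes (a sentence that parses and one that does not) is a small recursive-descent parser for this grammar. This is a sketch under assumptions, not the parsing method the lecture prescribes: the left-recursive rules are rewritten as loops, since a direct recursive translation of `<expr> ::= <expr> + <term>` would never terminate, and the tree is a simplified nested tuple rather than a full parse tree. All function names are hypothetical.

```python
DIGITS = set("0123456789")


def tokenize(text):
    """Split the input into single-character tokens, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def parse(text):
    """Return a nested-tuple tree, or raise SyntaxError for an illegal sentence."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        # <expr> ::= <expr> + <term> | <term>, written as a loop
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    def term():
        # <term> ::= <term> * <factor> | <factor>, written as a loop
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node

    def factor():
        # <factor> ::= '(' <expr> ')' | <num>
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in DIGITS:
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```

A sentence that cannot be parsed, such as `1 + + 2` or the two-digit `12`, raises `SyntaxError` in this sketch, which corresponds to the compiler reporting a syntax error.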
26Parse Tree(Syntax Tree)
- <expr> ::= <expr> + <term> | <term>
- <term> ::= <term> * <factor> | <factor>
- <factor> ::= '(' <expr> ')' | <num>
- <num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1 + 2 * 3
<expr>
  <expr>
    <term>
      <factor>
        <num>
          1
  +
  <term>
    <term>
      <factor>
        <num>
          2
    *
    <factor>
      <num>
        3
27Note
- All symbols were used
- Tree has all valid leaf nodes that are terminal symbols
- Said to be a "Successful parse"
- Sentence "1 + 2 * 3" was syntactically correct
28Parse Tree(Syntax Tree)
1 + 2 * 3
    +
   / \
  1   *
     / \
    2   3
29Binary Tree!
1 + 2 * 3
    +
   / \
  1   *
     / \
    2   3
30Traversals?
1 + 2 * 3
Preorder, In order, Post order, Breadth-first, Depth-first
[Tree: + at the root with children 1 and *; * has children 2 and 3]
31Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
[Tree: + at the root with children 1 and *; * has children 2 and 3]
32Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
Preorder returns prefix notation similar to that used in Lisp or Scheme: (+ 1 (* 2 3))
[Tree: + at the root with children 1 and *; * has children 2 and 3]
33Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
Preorder can also be used to reproduce the original tree
[Tree: + at the root with children 1 and *; * has children 2 and 3]
34Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
In order returns the original expression.
[Tree: + at the root with children 1 and *; * has children 2 and 3]
35What is the value of
- 1 + 2 * 3
- a.) (1 + 2) * 3 = 9
- b.) 1 + (2 * 3) = 7
- Why?
36Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
Post order returns postfix notation, otherwise known as Reverse Polish Notation or RPN
[Tree: + at the root with children 1 and *; * has children 2 and 3]
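The three traversals can be sketched in a few lines of Python. The tuple encoding `(operator, left, right)` and the function names are assumptions made for this illustration.

```python
# The simplified tree for 1 + 2 * 3: (operator, left, right); leaves are ints.
TREE = ("+", 1, ("*", 2, 3))


def preorder(node):
    """Visit the node, then the left subtree, then the right subtree."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return [op] + preorder(left) + preorder(right)


def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return inorder(left) + [op] + inorder(right)


def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return postorder(left) + postorder(right) + [op]


def to_lisp(node):
    """Preorder with explicit parentheses, in the style of Lisp or Scheme."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    return f"({op} {to_lisp(left)} {to_lisp(right)})"


print(" ".join(preorder(TREE)))   # + 1 * 2 3
print(" ".join(inorder(TREE)))    # 1 + 2 * 3
print(" ".join(postorder(TREE)))  # 1 2 3 * +
print(to_lisp(TREE))              # (+ 1 (* 2 3))
```

Note that this plain in-order traversal drops grouping: a tree built from `(1 + 2) * 3` would also print as `1 + 2 * 3` unless parentheses are emitted explicitly.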
37Note
- Both the preorder and postorder traversals (which generated prefix and postfix notation) followed the "normal" rules of precedence because the grammar was set up to do so.
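To make the point concrete, the sketch below evaluates two trees over the same tokens 1, 2, 3, + and *. The first has the shape the expression grammar produces; the second is the shape a grammar with the opposite grouping would produce. The tuple encoding and the names `evaluate`, `GRAMMAR_TREE`, and `OTHER_TREE` are hypothetical.

```python
def evaluate(node):
    """Evaluate a tree of (operator, left, right) tuples with int leaves."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


GRAMMAR_TREE = ("+", 1, ("*", 2, 3))  # shape given by the expression grammar
OTHER_TREE = ("*", ("+", 1, 2), 3)    # shape if + were grouped first

print(evaluate(GRAMMAR_TREE))  # 7
print(evaluate(OTHER_TREE))    # 9
```

The evaluator itself knows nothing about precedence; the answer depends only on the shape of the tree, and that shape is decided by the grammar.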
38Historical Note
- Hewlett-Packard Calculators typically use the RPN or postfix notation system
- The following sequence of keystrokes would calculate as follows
- 1 <enter>
- 2 <enter>
- 3
- *
- +
39RPN Stack Calculators
stack
0
0
0
1
(Display)
After pressing 1
40RPN Stack Calculators
stack
0
0
1
1
(Display)
After pressing <ENTER>
41RPN Stack Calculators
stack
0
0
1
2
(Display)
After pressing 2
42RPN Stack Calculators
stack
0
1
2
2
(Display)
After pressing <ENTER>
43RPN Stack Calculators
stack
0
1
2
3
(Display)
After pressing 3
44RPN Stack Calculators
stack
0
0
1
6
(Display)
After pressing *
45RPN Stack Calculators
stack
0
0
0
7
(Display)
After pressing +
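The stack behavior in the preceding slides can be approximated with a short postfix evaluator. This is a simplified sketch: it uses an unbounded Python list rather than the calculator's fixed set of registers, it does not model the <ENTER> key (tokens are simply separated by spaces), and the name `eval_rpn` is hypothetical.

```python
def eval_rpn(tokens):
    """Evaluate postfix tokens using a stack; supports + and * on integers."""
    stack = []
    for tok in tokens:
        if tok in ("+", "*"):
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_rpn("1 2 3 * +".split()))  # 7
```

After `*` the top of the stack holds 6 and after `+` it holds 7, matching the last two displays in the slide sequence.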
46The preceding sequence brought to you by
47Parse Tree
- Introduces
- Syntax
- Semantics
- Order of Operations
- Must be built-in to grammar
48Bad Grammar!
- <expr> ::= <expr> + <expr>
- | <expr> * <expr>
- | '(' <expr> ')'
- | 0 | 1 | 2 | 3 | 4
- | 5 | 6 | 7 | 8 | 9
49<expr> ::= <expr> + <expr> | <expr> * <expr> | '(' <expr> ')' | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Problem?
50<expr> ::= <expr> + <expr> | <expr> * <expr> | '(' <expr> ')' | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
What about?
51<expr> ::= <expr> + <expr> | <expr> * <expr> | '(' <expr> ')' | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Ambiguous!
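Ambiguity can be checked by counting how many distinct parse trees a sentence has under this single-rule grammar. The sketch below is an illustrative brute-force count over token ranges (with memoization), not a technique from the lecture; `count_parses` is a hypothetical name.

```python
from functools import lru_cache

DIGITS = set("0123456789")


def count_parses(text):
    """Count the parse trees of text under the single-rule <expr> grammar."""
    tokens = tuple(ch for ch in text if not ch.isspace())

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways tokens[i:j] can be derived from <expr>."""
        n = 0
        # <expr> ::= 0 | 1 | ... | 9
        if j - i == 1 and tokens[i] in DIGITS:
            n += 1
        # <expr> ::= '(' <expr> ')'
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)
        # <expr> ::= <expr> + <expr> | <expr> * <expr>, split at each operator
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


print(count_parses("1 + 2 * 3"))    # 2
print(count_parses("(1 + 2) * 3"))  # 1
```

A count above 1 means the grammar allows more than one tree for the same sentence: here one tree groups `2 * 3` first and the other groups `1 + 2` first. With parentheses, only one tree remains.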
52Ambiguous grammars have their place.
53Metasymbols
- | means OR (alternation)
- ::= means "is defined as"
- <...> means a non-terminal
- (...) means grouping
- '...' means the enclosed metasymbol is a terminal
- There are others...
- ...and there is no "standard" set of symbols.
54Metasymbol Examples
- <num> ::= 0|1|2|3|4|5|6|7|8|9
- <signed num> ::= + <num> | - <num>
- <signed num> ::= (+|-) <num>
- Using pipe in Unix
- foo '|' bar '|' baz
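The two ways of writing `<signed num>` (alternation of whole right sides versus grouping) describe the same strings. As a rough illustration, the sketch below expresses each form as a Python regular expression and checks that they agree. Regular expressions are a different notation from BNF, and the pattern names and `is_signed_num` are hypothetical.

```python
import re

# Regular-expression counterparts of the two <signed num> rules.
WITH_ALTERNATION = re.compile(r"\+[0-9]|-[0-9]")  # + <num> | - <num>
WITH_GROUPING = re.compile(r"(\+|-)[0-9]")        # (+|-) <num>


def is_signed_num(text, pattern=WITH_GROUPING):
    """True if the whole text is a sign followed by a single digit."""
    return pattern.fullmatch(text) is not None


for sample in ["+7", "-0", "7", "-12", "+"]:
    print(sample, is_signed_num(sample, WITH_ALTERNATION), is_signed_num(sample))
```

In the regular expression, `+` must be escaped because it is itself a metasymbol there, which is the same problem the quote marks solve in `foo '|' bar '|' baz`.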
55Questions?