Title: Grammars
1Grammars
2Formal Definitions
- A symbol is a character. It represents an
abstract entity that has no inherent meaning.
- Examples
- a, A, 3, , - ,
3Formal Definitions
- An alphabet is a finite set of symbols.
- Examples
- A = {a, b, c}
- B = {0, 1}
4Formal Definitions
- A string (or word) is a finite sequence of
symbols from a given alphabet.
- Examples
- Σ = {0, 1} is an alphabet
- 0, 1, 11010, 101, 111 are strings from Σ
- A = {a, b, c, d} is an alphabet
- bad, cab, dab, d, aaaaa are strings from A
5Formal Definitions
- A language is a set of strings from an alphabet.
- The set can be finite or infinite.
- Examples
- A = {0, 1}
- L1 = {00, 01, 10, 11}
- L2 = {010, 0110, 01110, 011110, ...}
6Formal Definitions
- A grammar is a quadruple G = (V, Σ, R, S) where
- 1) V is a finite set of variables
(non-terminals),
- 2) Σ is a finite set of terminals, disjoint from
V,
- 3) R is a finite set of rules. The left side
of each rule is a string of one or more elements
from V ∪ Σ and the right side is a string of 0
or more elements from V ∪ Σ,
- 4) S is an element of V and is called the start
symbol.
7Formal Definitions
- Example grammar
- G = (V, Σ, R, S)
- V = {S, A}
- Σ = {a, b}
- R = {S → aA, A → bA, A → a}
8Derivations
- R = {S → aA, A → bA, A → a}
- A derivation is a sequence of replacements,
beginning with the start symbol, in which a
substring matching the left side of a rule is
replaced with the string on the right side of
that rule
- S ⇒ aA
- ⇒ abA
- ⇒ abbA
- ⇒ abba
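The derivation above can be replayed as plain string rewriting. The following is a minimal sketch, assuming upper-case letters stand for variables and lower-case letters for terminals; the helper names `apply_rule` and `derive` are made up for this illustration.

```python
# Rules of the example grammar: S -> aA, A -> bA, A -> a.

def apply_rule(form, lhs, rhs):
    """Replace the leftmost occurrence of lhs in the sentential form with rhs."""
    if lhs not in form:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form.replace(lhs, rhs, 1)

def derive(start, steps):
    """Return every sentential form produced by applying the steps in order."""
    forms = [start]
    for lhs, rhs in steps:
        forms.append(apply_rule(forms[-1], lhs, rhs))
    return forms

derivation = derive("S", [("S", "aA"), ("A", "bA"), ("A", "bA"), ("A", "a")])
print(" => ".join(derivation))  # S => aA => abA => abbA => abba
```

Each step names the rule to apply, so a different choice of steps (for example, applying A → a immediately) yields a different string of the same language.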
9Derivations
- What strings can be generated from the following
grammar?
- S → aBa
- B → aBa
- B → b
10Formal Definitions
- The language generated by a grammar is the set of
all strings of terminal symbols which are
derivable from S in 0 or more steps.
- What is the language generated by this grammar?
- S → a
- S → aB
- B → aB
- B → a
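One way to explore the question on this slide is to enumerate short strings by brute force. The sketch below is an illustration only: the function `generate` is hypothetical, it assumes every rule has a single variable (an upper-case letter) on its left side, and it assumes no rule shortens a sentential form, so that forms longer than `max_len` can be discarded.

```python
from collections import deque

def generate(rules, start="S", max_len=5):
    """Collect all terminal strings of length <= max_len derivable from start.

    rules maps a variable (upper-case letter) to a list of right-hand sides.
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable, if there is one.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            found.add(form)          # only terminals left: a string of the language
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))

rules = {"S": ["a", "aB"], "B": ["aB", "a"]}
print(generate(rules))  # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']

earlier = {"S": ["aBa"], "B": ["aBa", "b"]}
print(generate(earlier, max_len=7))  # ['aba', 'aabaa', 'aaabaaa']
```

The output suggests that the grammar on this slide generates the strings of one or more a's, and that the grammar on the earlier Derivations slide generates n a's, one b, then n a's. An enumeration up to a length bound is evidence, not a proof.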
11Kleene Closure
- Let S be a set of strings. S* is called the
Kleene closure of S and represents the set of all
concatenations of 0 or more strings in S.
- Examples (ε denotes the empty string)
- 1* = {ε, 1, 11, 111, 1111, ...}
- (01)* = {ε, 01, 0101, 010101, ...}
- (0 + 1)* = set of all possible strings
of 0s and 1s (+ means union)
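The Kleene closure is infinite, but a bounded prefix of it is easy to compute. The sketch below uses a hypothetical helper `kleene` that concatenates at most `max_parts` strings from the set; the empty string appears as `''`.

```python
from itertools import product

def kleene(strings, max_parts):
    """Concatenations of 0..max_parts strings drawn from `strings`."""
    result = set()
    for n in range(max_parts + 1):
        # product(..., repeat=0) yields one empty tuple, giving the empty string.
        for parts in product(strings, repeat=n):
            result.add("".join(parts))
    return sorted(result, key=lambda s: (len(s), s))

print(kleene({"1"}, 4))       # ['', '1', '11', '111', '1111']
print(kleene({"01"}, 3))      # ['', '01', '0101', '010101']
print(kleene({"0", "1"}, 2))  # ['', '0', '1', '00', '01', '10', '11']
```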
12Formal Definitions
- A grammar G = (V, Σ, R, S) is right-linear if all
rules are of the form
- A → xB
- A → x
- where A, B ∈ V and x ∈ Σ*
13Right-linear Grammar
- G = (V, Σ, R, S)
- V = {S, B}
- Σ = {a, b}
- R = {S → aS, S → B, B → bB, B → ε}
- What language is generated?
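A right-linear grammar can be run directly as a recognizer: the current variable plays the role of a state, and each rule consumes the terminals on its right side. The sketch below is one possible encoding, not a standard API; the function `accepts` and the triple format `(variable, terminals, next_variable_or_None)` are assumptions made for this example, with the empty string standing for ε.

```python
def accepts(rules, start, word):
    """Decide whether a right-linear grammar derives `word`."""
    seen = set()
    stack = [(start, 0)]              # (current variable, characters consumed)
    while stack:
        var, pos = stack.pop()
        if (var, pos) in seen:
            continue
        seen.add((var, pos))
        for lhs, x, nxt in rules:
            if lhs != var or not word.startswith(x, pos):
                continue
            if nxt is None:           # rule A -> x ends the derivation
                if pos + len(x) == len(word):
                    return True
            else:                     # rule A -> xB continues from B
                stack.append((nxt, pos + len(x)))
    return False

# S -> aS, S -> B, B -> bB, B -> (empty string)
rules = [("S", "a", "S"), ("S", "", "B"), ("B", "b", "B"), ("B", "", None)]
for w in ["", "aabbb", "bb", "aba", "ba"]:
    print(repr(w), accepts(rules, "S", w))
```

The first three words are accepted and the last two are rejected, which is consistent with the language being any number of a's followed by any number of b's.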
14Formal Definitions
- A grammar G = (V, Σ, R, S) is left-linear if all
rules are of the form
- A → Bx
- A → x
- where A, B ∈ V and x ∈ Σ*
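The two definitions can be checked mechanically. In the sketch below, every symbol is a single character, a rule is a `(left, right)` pair of strings, and x may be any string of terminals including the empty string (an assumption, but one the earlier right-linear example appears to need for its rule S → B). The function names are invented for this illustration.

```python
def is_right_linear(rules, variables):
    """Every rule is A -> xB or A -> x, with x containing terminals only."""
    for lhs, rhs in rules:
        if lhs not in variables:
            return False
        # Strip one trailing variable, if present; the rest must be terminals.
        body = rhs[:-1] if rhs and rhs[-1] in variables else rhs
        if any(sym in variables for sym in body):
            return False
    return True

def is_left_linear(rules, variables):
    """Every rule is A -> Bx or A -> x, with x containing terminals only."""
    for lhs, rhs in rules:
        if lhs not in variables:
            return False
        # Strip one leading variable, if present; the rest must be terminals.
        body = rhs[1:] if rhs and rhs[0] in variables else rhs
        if any(sym in variables for sym in body):
            return False
    return True

right = [("S", "aS"), ("S", "B"), ("B", "bB"), ("B", "")]
left = [("S", "Sa"), ("S", "b")]      # hypothetical left-linear grammar
print(is_right_linear(right, {"S", "B"}), is_left_linear(right, {"S", "B"}))  # True False
print(is_right_linear(left, {"S"}), is_left_linear(left, {"S"}))              # False True
```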
15Formal Definitions
- A regular grammar is one that is either right-
or left-linear.
- Let Q be a finite set and let Σ be a finite set
of symbols. Also let δ be a function from Q × Σ
to Q, let q0 be a state in Q and let A be a
subset of Q. We call each element of Q a state, δ
the transition function, q0 the initial state and
A the set of accepting states. Then a
deterministic finite automaton (DFA) is a 5-tuple
<Q, Σ, q0, δ, A>.
- Every regular grammar is equivalent to a DFA:
there is a DFA that accepts exactly the language
the grammar generates.
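The 5-tuple translates almost directly into code. The sketch below is an illustration under stated assumptions: the transition function is stored as a dictionary keyed by (state, symbol), and the particular automaton (state names `q0`, `q1`, `dead`) is a hypothetical DFA for strings of a's followed by b's, not one given in the slides.

```python
def run_dfa(delta, q0, accepting, word):
    """Return True if the DFA ends in an accepting state after reading word."""
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]   # exactly one next state: deterministic
    return state in accepting

Q = {"q0", "q1", "dead"}                 # states
SIGMA = {"a", "b"}                       # input symbols
DELTA = {                                # transition function from Q x SIGMA to Q
    ("q0", "a"): "q0",     ("q0", "b"): "q1",
    ("q1", "a"): "dead",   ("q1", "b"): "q1",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
A = {"q0", "q1"}                         # accepting states

for w in ["", "aabb", "bbb", "aba"]:
    print(repr(w), run_dfa(DELTA, "q0", A, w))
```

The `dead` state makes the transition function total, as the definition requires: every (state, symbol) pair has a next state.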
16Language Definition
- Recognition: a machine is constructed that reads
a string and pronounces whether the string is in
the language or not. (Compiler)
- Generation: a device is created to generate
strings that belong to the language. (Grammar)
17Chomsky Hierarchy
- Noam Chomsky (1950s) described 4 classes of
grammars
- 1) Type 0: unrestricted grammars
- 2) Type 1: context-sensitive grammars
- 3) Type 2: context-free grammars
- 4) Type 3: regular grammars
18Grammars
- Context-free and regular grammars have
applications in computing
- Context-free grammar: each rule or production
has a left side consisting of a single
non-terminal
19Backus-Naur form (BNF)
- BNF was used to describe programming language
syntax and is similar to Chomsky's context-free
grammars
- A meta-language is a language used to describe
another language
- BNF is a meta-language for computer languages
20BNF
- Consists of nonterminal symbols, terminal symbols
(lexemes and tokens), and rules or productions
- <if-stmt> → if <logical-expr> then <stmt>
- <if-stmt> → if <logical-expr> then <stmt>
else <stmt>
- The two rules can be combined with the or symbol |
- <if-stmt> → if <logical-expr> then <stmt>
  | if <logical-expr> then <stmt>
    else <stmt>
21A Small Grammar
- <program> → begin <stmt_list> end
- <stmt_list> → <stmt>
  | <stmt> <stmt_list>
- <stmt> → <var> = <expression>
- <var> → A | B | C
- <expression> → <var> + <var>
  | <var> - <var>
  | <var>
22A Derivation
- <program> ⇒ begin <stmt_list> end
- ⇒ begin <stmt> end
- ⇒ begin <var> = <expression> end
- ⇒ begin A = <expression> end
- ⇒ begin A = <var> + <var> end
- ⇒ begin A = B + <var> end
- ⇒ begin A = B + C end
23Terms
- Each of the strings in a derivation is called a
sentential form.
- If the leftmost non-terminal is always the one
selected for replacement, the derivation is a
leftmost derivation.
- Derivations can be leftmost, rightmost, or
neither
- Derivation order has no effect on the language
generated by the grammar
24Derivations Yield Parse Trees
- <program> ⇒ begin <stmt_list> end
- ⇒ begin <stmt> end
- ⇒ begin <var> = <expression> end
- ⇒ begin A = <expression> end
- ⇒ begin A = <var> + <var> end
- ⇒ begin A = B + <var> end
- ⇒ begin A = B + C end
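Parse tree (children are indented under their parent; several
symbols on one line form a chain of single children):
<program>
  begin
  <stmt_list> <stmt>
    <var> A
    =
    <expression>
      <var> B
      +
      <var> C
  end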
25Parse Trees
- Parse trees describe the hierarchical structure
of the sentences of the language they define.
- A grammar that generates a sentence for which
there are two or more distinct parse trees is
ambiguous.
26An Ambiguous Grammar
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <expr>
  | <expr> * <expr>
  | ( <expr> )
  | <id>
27Two Parse Trees Same Sentence
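Sentence: A = B + C * A
(children are indented under their parent; several symbols on
one line form a chain of single children)
Parse tree 1 (+ at the top):
<assign>
  <id> A
  =
  <expr>
    <expr> <id> B
    +
    <expr>
      <expr> <id> C
      *
      <expr> <id> A
Parse tree 2 (* at the top):
<assign>
  <id> A
  =
  <expr>
    <expr>
      <expr> <id> B
      +
      <expr> <id> C
    *
    <expr> <id> A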
28Derivation 1
- <assign> ⇒ <id> = <expr>
- ⇒ A = <expr>
- ⇒ A = <expr> + <expr>
- ⇒ A = <id> + <expr>
- ⇒ A = B + <expr>
- ⇒ A = B + <expr> * <expr>
- ⇒ A = B + <id> * <expr>
- ⇒ A = B + C * <expr>
- ⇒ A = B + C * <id>
- ⇒ A = B + C * A
29Derivation 2
- <assign> ⇒ <id> = <expr>
- ⇒ A = <expr>
- ⇒ A = <expr> * <expr>
- ⇒ A = <expr> + <expr> * <expr>
- ⇒ A = <id> + <expr> * <expr>
- ⇒ A = B + <expr> * <expr>
- ⇒ A = B + <id> * <expr>
- ⇒ A = B + C * <expr>
- ⇒ A = B + C * <id>
- ⇒ A = B + C * A
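The two leftmost derivations correspond to two different parse trees, and that count can be checked by machine. The sketch below counts the parse trees of <expr> for a token list, assuming the two binary operators of the ambiguous grammar are + and * and that identifiers are A, B and C; the function `count_parse_trees` is a hypothetical helper written for this illustration.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of <expr> for tokens under the ambiguous grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from <expr>."""
        total = 0
        if j - i == 1 and tokens[i] in ("A", "B", "C"):
            total += 1                                  # <expr> -> <id>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):                 # <expr> -> <expr> op <expr>
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("B + C * A".split()))      # 2
print(count_parse_trees("( B + C ) * A".split()))  # 1
```

A count greater than 1 for some sentence is exactly what makes the grammar ambiguous; explicit parentheses leave only one tree.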
30Ambiguity
- Parse trees are used to determine the semantics
of a sentence
- Ambiguous grammars lead to semantic ambiguity;
this is intolerable in a computer language
- Often, ambiguity in a grammar can be removed
31Unambiguous Grammar
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → ( <expr> ) | <id>
- This grammar makes multiplication take precedence
over addition
32Associativity of Operators
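Parse tree for A = B + C + A (children are indented under their
parent; several symbols on one line form a chain of single children):
<assign>
  <id> A
  =
  <expr>
    <expr>
      <expr> <term> <factor> <id> B
      +
      <term> <factor> <id> C
    +
    <term> <factor> <id> A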
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → ( <expr> ) | <id>
- Addition operators associate from left to right
33BNF
- A BNF rule that has its left-hand side appearing
at the beginning of its right-hand side is left
recursive.
- Left recursion specifies left associativity
- Right recursion is usually used for associating
exponentiation operators
- <factor> → <exp> ** <factor> | <exp>
- <exp> → ( <expr> ) | <id>
34Ambiguous If Grammar
- <stmt> → <if_stmt>
- <if_stmt> → if <logic_expr> then <stmt>
  | if <logic_expr> then <stmt>
    else <stmt>
- Consider the sentential form
- if <logic_expr> then if <logic_expr> then
<stmt> else <stmt>
35Parse Trees for an If Statement
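Parse tree 1 (else belongs to the outer if):
<if_stmt>
  if <logic_expr> then <stmt> else <stmt>
    the first <stmt> expands to <if_stmt>:
      if <logic_expr> then <stmt>
Parse tree 2 (else belongs to the inner if):
<if_stmt>
  if <logic_expr> then <stmt>
    the <stmt> expands to <if_stmt>:
      if <logic_expr> then <stmt> else <stmt>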
36Unambiguous Grammar for If Statements
- <stmt> → <matched> | <unmatched>
- <matched> → if <logic_expr> then <matched> else
<matched>
  | any non-if statement
- <unmatched> → if <logic_expr> then <stmt>
  | if <logic_expr> then
    <matched> else <unmatched>
37Extended BNF (EBNF)
- Optional parts are denoted by brackets [ ]
- <selection> → if ( <expr> ) <stmt> [ else <stmt> ]
- Braces { } are used to indicate that the enclosed
part can be repeated indefinitely or left out
- <ident_list> → <identifier> { , <identifier> }
- Multiple-choice options are put in parentheses
and separated by the or operator |
- <for_stmt> → for <var> := <expr> (to | downto)
<expr> do <stmt>
38BNF vs EBNF for Expressions
- BNF
- <expr> → <expr> + <term>
  | <expr> - <term>
  | <term>
- <term> → <term> * <factor>
  | <term> / <factor>
  | <factor>
- EBNF
- <expr> → <term> { (+ | -) <term> }
- <term> → <factor> { (* | /) <factor> }
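The EBNF form maps naturally onto a hand-written recursive-descent parser: each rule becomes a function and each repeated part becomes a loop. The sketch below is one possible rendering, assuming the operators are + - * /, that tokens are separated by spaces, and that <factor> is defined as ( <expr> ) | <id> with identifiers A, B and C, as on the earlier Unambiguous Grammar slide. The function names and the tuple representation of the tree are choices made for this example.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> { (+ | -) <term> }"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)      # fold to the left: left associativity
    return node, pos

def parse_term(tokens, pos):
    """<term> -> <factor> { (* | /) <factor> }"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(tokens, pos):
    """<factor> -> ( <expr> ) | <id>   (assumed from the earlier slide)"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tokens[pos] in ("A", "B", "C"):
        return tokens[pos], pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")

def parse(text):
    """Parse a space-separated expression and return its tree as nested tuples."""
    tokens = text.split()
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("B + C * A"))   # ('+', 'B', ('*', 'C', 'A'))
print(parse("B - C - A"))   # ('-', ('-', 'B', 'C'), 'A')
```

The loops replace the left recursion of the BNF rules, which a naive recursive-descent parser could not follow without recursing forever, while still grouping operands from left to right.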