Title: Syntax and Semantics
Chapter 3 Describing Syntax and Semantics
- Syntax and Semantics
- The syntax of a programming language is the part of the language definition that says how programs look: their form and structure, such as expressions, statements, and so on.
- The semantics of a programming language is the part of the language definition that says what programs do: the behavior and meaning of those expressions and statements.
- For example
- The syntax of a C if statement is
- if (<expr>) <statement>
- The semantics of this statement form: if the current value of the expression <expr> is true, the embedded statement <statement> is selected for execution.
2. A Grammar Example for English
2.1 <A> for an article (a, the); we express our definition as
    <A> → a | the
2.2 <N> for a noun (dog, cat, or rat)
    <N> → dog | cat | rat
2.3 <NP> for a noun phrase (an article followed by a noun)
    <NP> → <A> <N>
2.4 <V> for a verb (loves, hates, or eats)
    <V> → loves | hates | eats
2.5 <S> for a sentence (a noun phrase, followed by a verb, followed by another noun phrase)
    <S> → <NP> <V> <NP>
2.6 Grammar defining a small subset of unpunctuated English
    <S> → <NP> <V> <NP>
    <NP> → <A> <N>
    <V> → loves | hates | eats
    <N> → dog | cat | rat
    <A> → a | the
How does such a grammar define a language? Think of the grammar as a set of rules that say how to build a tree.
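The idea that the grammar is a set of tree-building rules can be sketched in Python. This is only an illustration: the dictionary layout, the tuple representation of trees, and the function names `trees` and `leaves` are assumptions made for this sketch, not part of the slides. It builds every tree the English grammar allows and reads the sentence off the leaves.

```python
from itertools import product

# The grammar of section 2.6: each non-terminal maps to its alternatives,
# and each alternative is a list of symbols.
GRAMMAR = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
    "<A>": [["a"], ["the"]],
}

def trees(symbol):
    """Yield every parse tree rooted at symbol, as (symbol, children)."""
    if symbol not in GRAMMAR:          # a terminal is a leaf
        yield symbol
        return
    for alternative in GRAMMAR[symbol]:
        # Combine every choice of subtree for each right-hand-side symbol.
        for children in product(*(list(trees(s)) for s in alternative)):
            yield (symbol, list(children))

def leaves(tree):
    """Read the sentence off a tree: its leaves from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]

sentences = [" ".join(leaves(t)) for t in trees("<S>")]
print(len(sentences))   # 108
print(sentences[0])     # a dog loves a dog
```

The enumeration terminates only because this grammar has no recursive rules; a recursive grammar describes infinitely many trees.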
3 The Basic Concepts of Describing Syntax
3.1 Sentences (statements): the strings of a language. The syntax rules of the language specify which strings of characters are in the language.
3.2 Lexemes: the small units (identifiers, literals, operators, and special words).
3.3 Token: a category of the language's lexemes.
3.4 For example
    Statement: index = 2 * count + 17;
    Lexemes   Tokens
    index     identifier
    =         equal_sign
    2         int_literal
    *         mult_op
    count     identifier
    +         plus_op
    17        int_literal
    ;         semicolon
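The split into lexemes and tokens can be sketched with Python's `re` module, assuming the example statement is `index = 2 * count + 17;`. The regular expressions below (what counts as an identifier or an integer literal) and the names `TOKEN_SPEC`, `MASTER` and `tokenize` are assumptions for illustration; the slides give only the token names.

```python
import re

# Token categories from the example table, tried in this order.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "skip":      # white space is not a lexeme
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:8}{token}")
```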
4 Formal Methods of Describing Syntax
4.1 A Grammar Example for a Programming Language
    <exp> → <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
    expression: a
    expression: a + b
    expression: a + b * c
    expression: ( ( a + b ) * c )
4.2 Parse trees: hierarchical syntactic structures
Recursive grammar: the parse tree for ( ( a + b ) * c ), with children indented under their parent
    <exp>
      (
      <exp>
        <exp>
          (
          <exp>
            <exp>
              a
            +
            <exp>
              b
          )
        *
        <exp>
          c
      )
4.3 Backus-Naur Form and Context-Free Grammars
John Backus and Noam Chomsky, in separate work in the middle to late 1950s, proposed a method (grammar) for describing syntax.
    <S> → <NP> <V> <NP>
    <NP> → <A> <N>
    <V> → loves | hates | eats
    <N> → dog | cat | rat
    <A> → a | the
Start symbol: <S>
Production: each rule, such as <NP> → <A> <N>
Non-terminal symbols: <S>, <NP>, <V>, <N>, <A>
Tokens (lexemes), the terminal symbols: loves, hates, eats, dog, cat, rat, a, the
The grammar has four important parts:
- Non-terminal symbols are strings enclosed in angle brackets, such as <NP>. The non-terminal symbols of a grammar often correspond to different kinds of language constructs.
- The grammar designates one of the non-terminal symbols as the root of the parse tree: the start symbol.
- A production (rule) consists of a left-hand side (LHS), which is a single non-terminal symbol; an arrow (→); and a right-hand side (RHS), which is a sequence of one or more things, each of which can be either a token or a non-terminal symbol. The special symbol | is used to separate alternative right-hand sides.
- Tokens and lexemes are the terminal symbols (the smallest units of syntax).
The example: the grammar for a simple language of expressions with three variables.
    <exp> → <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
This grammar can be written in a different way, without the | notation:
    <exp> → <exp> + <exp>
    <exp> → <exp> * <exp>
    <exp> → ( <exp> )
    <exp> → a
    <exp> → b
    <exp> → c
The special non-terminal symbol <empty> generates an empty string: a string of no tokens. An if-then statement with an optional else part might be defined like this:
    <if-stmt> → if <expr> then <stmt> <else-part>
    <else-part> → else <stmt> | <empty>
Context-free Grammars
These grammars are context-free grammars: the children of a node in the parse tree depend only on that node's non-terminal symbol; they do not depend on the context of neighboring nodes in the tree.
Metalanguage: a language that is used to describe another language. BNF is a metalanguage.
4.4 An example for writing grammars
    float a;
    boolean (bool) b, c, d;
    int e = 1, f, g;
Step 1: Divide the problem into smaller pieces.
    The major components: the type name, the list of variables, the final semicolon
    Non-terminal symbols: <var-dec>, <type-name>, <declarator-list>
    <var-dec> → <type-name> <declarator-list> ;
Step 2: List the primitive types
    <type-name> → boolean (bool) | byte | short | int | long | char | float | double
Step 3: Define the declarator list
    A declarator list is either a single declarator, or a declarator followed by a comma, followed by a (smaller) declarator list.
    <declarator-list> → <declarator> | <declarator> , <declarator-list>
Step 4: Define a declarator
    It is a variable name, optionally followed by an equal sign and an expression.
    <declarator> → <variable-name> | <variable-name> = <expr>
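The four steps can be checked with a small recognizer that has one function per non-terminal. This is a sketch under stated assumptions: the slides do not define <variable-name> or <expr>, so here a variable name is taken to be an identifier and <expr> is restricted to a single name or integer literal. The function names (`is_var_dec`, `tokens_of`) are hypothetical, and only `boolean` (not `bool`) is listed as a type name.

```python
import re

TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}
NAME = r"[A-Za-z_]\w*"

def tokens_of(text):
    # Assumed lexical rules: names, integer literals, and any other
    # non-space character as a one-character token.
    return re.findall(NAME + r"|\d+|\S", text)

def is_var_dec(text):
    """Check text against <var-dec> → <type-name> <declarator-list> ;"""
    toks = tokens_of(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def declarator():
        # <declarator> → <variable-name> | <variable-name> = <expr>
        nonlocal pos
        if peek() is None or not re.fullmatch(NAME, peek()):
            return False
        pos += 1
        if peek() == "=":
            pos += 1
            # Assumption: <expr> is a single name or integer literal.
            if peek() is None or not re.fullmatch(r"\w+", peek()):
                return False
            pos += 1
        return True

    def declarator_list():
        # <declarator-list> → <declarator> | <declarator> , <declarator-list>
        nonlocal pos
        if not declarator():
            return False
        if peek() == ",":
            pos += 1
            return declarator_list()
        return True

    if peek() not in TYPE_NAMES:
        return False
    pos += 1
    # The final semicolon must be the last token.
    return declarator_list() and peek() == ";" and pos == len(toks) - 1

print(is_var_dec("int e = 1, f, g;"))   # True
print(is_var_dec("int e = , f;"))       # False
```

The recursive call in `declarator_list` mirrors the "(smaller) declarator list" of Step 3 directly.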
- 4.5 Examples
- A Grammar for a Small Language
- <program> → begin <stmt_list> end
- <stmt_list> → <stmt> | <stmt> ; <stmt_list>
- <stmt> → <var> = <expression>
- <var> → A | B | C
- <expression> → <var> + <var> | <var> - <var> | <var>
- Remarks
- Only one kind of statement: assignment.
- A program is delimited by begin and end.
- Its body is a list of assignment statements separated by semicolons.
- An expression is a single variable, the sum of two variables, or the difference of two variables.
- Only three variable names: A, B, C.
Leftmost derivations
    <program> => begin <stmt_list> end
              => begin <stmt> ; <stmt_list> end
              => begin <var> = <expression> ; <stmt_list> end
              => begin A = <expression> ; <stmt_list> end
              => begin A = <var> + <var> ; <stmt_list> end
              => begin A = B + <var> ; <stmt_list> end
              => begin A = B + C ; <stmt_list> end
              => begin A = B + C ; <stmt> end
              => begin A = B + C ; <var> = <expression> end
              => begin A = B + C ; B = <expression> end
              => begin A = B + C ; B = <var> end
              => begin A = B + C ; B = C end
In this derivation, the replaced non-terminal is always the leftmost non-terminal. (A derivation may instead always replace the rightmost non-terminal, or follow an order that is neither leftmost nor rightmost.)
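A leftmost derivation can be replayed mechanically: at each step, find the leftmost non-terminal and replace it with one of its right-hand sides. The sketch below assumes the small-language grammar uses `;`, `=`, `+` and `-` as its terminals. The function name `leftmost_derivation` and the encoding of each step as an alternative index are choices made for this illustration.

```python
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Apply one production per step to the leftmost non-terminal.

    choices[i] is the index of the alternative used at step i.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that is a non-terminal.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps

# Alternative indices that derive: begin A = B + C ; B = C end
steps = leftmost_derivation("<program>", [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2])
print("\n=> ".join(steps))
```

Changing the `next(...)` search to pick the last non-terminal instead of the first would turn this into a rightmost derivation, with a different order of choices.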
A Grammar for Simple Assignment Statements
    <assign> → <id> = <expr>
    <id> → A | B | C
    <expr> → <id> + <expr>
           | <id> * <expr>
           | ( <expr> )
           | <id>
A = B * ( A + C ) is generated by the leftmost derivation
    <assign> => <id> = <expr>
             => A = <expr>
             => A = <id> * <expr>
             => A = B * <expr>
             => A = B * ( <expr> )
             => A = B * ( <id> + <expr> )
             => A = B * ( A + <expr> )
             => A = B * ( A + <id> )
             => A = B * ( A + C )
[Figure: A parse tree for the simple statement A = B * (A + C)]
- Ambiguity: A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
- An ambiguous grammar for simple assignment statements
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <expr>
-        | <expr> * <expr>
-        | ( <expr> )
-        | <id>
- The statement A = B + C * A has two distinct parse trees.
- By using operator precedence and associativity of operators, we can resolve the ambiguity.
[Figure: Two distinct parse trees for the same sentence, A = B + C * A]
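The ambiguity can be confirmed by brute force: try every way of splitting the expression at an operator and collect each tree that results. The sketch assumes the ambiguous rules are <expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>, and it represents trees as nested tuples; the function name `expr_trees` is hypothetical.

```python
def expr_trees(tokens):
    """Return every parse tree of tokens under the ambiguous <expr> rules."""
    tokens = tuple(tokens)
    results = []
    if len(tokens) == 1 and tokens[0] in ("A", "B", "C"):
        results.append(tokens[0])                       # <expr> → <id>
    if len(tokens) >= 3 and tokens[0] == "(" and tokens[-1] == ")":
        for inner in expr_trees(tokens[1:-1]):          # <expr> → ( <expr> )
            results.append(("()", inner))
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):                           # <expr> → <expr> op <expr>
            for left in expr_trees(tokens[:i]):
                for right in expr_trees(tokens[i + 1:]):
                    results.append((tok, left, right))
    return results

found = expr_trees("B + C * A".split())
for tree in found:
    print(tree)
# ('+', 'B', ('*', 'C', 'A'))
# ('*', ('+', 'B', 'C'), 'A')
```

In the first tree the multiplication is lower, so it is evaluated first; in the second the addition is lower. The grammar alone gives no reason to prefer one over the other.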
5.1 Operator Precedence
An operator that is generated lower in the parse tree (and therefore must be evaluated first) has precedence over an operator produced higher up in the tree.
The left parse tree indicates that the multiplication operator has precedence over the addition operator.
The right parse tree indicates that the addition operator has precedence over the multiplication operator.
A grammar can be written to place different operators (such as addition and multiplication) in a fixed higher-to-lower precedence ordering.
An unambiguous grammar for simple assignment statements
    <assign> → <id> = <expr>
    <id> → A | B | C
    <expr> → <expr> + <term>
           | <term>
    <term> → <term> * <factor>
           | <factor>
    <factor> → ( <expr> )
             | <id>
Derive the statement A = B + C * A by using this grammar.
The leftmost derivation
    <assign> => <id> = <expr>
             => A = <expr>
             => A = <expr> + <term>
             => A = <term> + <term>
             => A = <factor> + <term>
             => A = <id> + <term>
             => A = B + <term>
             => A = B + <term> * <factor>
             => A = B + <factor> * <factor>
             => A = B + <id> * <factor>
             => A = B + C * <factor>
             => A = B + C * <id>
             => A = B + C * A
The rightmost derivation
    <assign> => <id> = <expr>
             => <id> = <expr> + <term>
             => <id> = <expr> + <term> * <factor>
             => <id> = <expr> + <term> * <id>
             => <id> = <expr> + <term> * A
             => <id> = <expr> + <factor> * A
             => <id> = <expr> + <id> * A
             => <id> = <expr> + C * A
             => <id> = <term> + C * A
             => <id> = <factor> + C * A
             => <id> = <id> + C * A
             => <id> = B + C * A
             => A = B + C * A
[Figure: The unique parse tree for A = B + C * A using an unambiguous grammar]
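The layered <expr>, <term>, <factor> rules translate into a small recursive-descent parser with one function per non-terminal. This is a sketch, not the only way to implement the grammar. It assumes the operators are `=`, `+` and `*`, and that tokens are separated by spaces. Because a recursive function cannot call itself first without looping forever, each left-recursive rule is written as a loop that builds the same left-leaning tree. The result is a simplified operator tree (nested tuples) rather than the full parse tree, and the name `parse_assign` is hypothetical.

```python
def parse_assign(text):
    """Parse '<id> = <expr>' and return a simplified tree as nested tuples."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def ident():                       # <id> → A | B | C
        tok = take()
        if tok not in ("A", "B", "C"):
            raise SyntaxError(f"expected an <id>, found {tok!r}")
        return tok

    def factor():                      # <factor> → ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            return inner
        return ident()

    def term():                        # <term> → <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def expr():                        # <expr> → <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    target = ident()
    take("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assign("A = B + C * A"))   # ('=', 'A', ('+', 'B', ('*', 'C', 'A')))
```

Only one tree comes back, with the multiplication nested below the addition, because `*` can only be introduced inside a <term>.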
5.2 Associativity of Operators
When an expression contains two or more adjacent occurrences of operators with equal precedence, the parse tree must place those occurrences in the proper hierarchical order.
Example: Derive the statement A = B + C + A by using the grammar (leftmost derivation) and generate its parse tree. The parse tree shows that the left addition is lower than the right addition (addition is left associative): A = (B + C) + A
The leftmost derivation
    <assign> => <id> = <expr>
             => A = <expr>
             => A = <expr> + <term>
             => A = <expr> + <term> + <term>
             => A = <term> + <term> + <term>
             => A = <factor> + <term> + <term>
             => A = <id> + <term> + <term>
             => A = B + <term> + <term>
             => A = B + <factor> + <term>
             => A = B + <id> + <term>
             => A = B + C + <term>
             => A = B + C + <factor>
             => A = B + C + <id>
             => A = B + C + A
[Figure: A parse tree for A = (B + C) + A illustrating the left associativity of addition, built from the leftmost derivation]
The rightmost derivation
    <assign> => <id> = <expr>
             => <id> = <expr> + <term>
             => <id> = <expr> + <factor>
             => <id> = <expr> + <id>
             => <id> = <expr> + A
             => <id> = <expr> + <term> + A
             => <id> = <expr> + <factor> + A
             => <id> = <expr> + <id> + A
             => <id> = <expr> + C + A
             => <id> = <term> + C + A
             => <id> = <factor> + C + A
             => <id> = <id> + C + A
             => <id> = B + C + A
             => A = B + C + A
[Figure: A parse tree for A = (B + C) + A illustrating the left associativity of addition, built from the rightmost derivation]
- Left recursive: a BNF rule is left recursive when its LHS also appears at the beginning of its RHS.
An unambiguous grammar for simple assignment statements
    <assign> → <id> = <expr>
    <id> → A | B | C
    <expr> → <expr> + <term>
           | <term>                (left associative)
    <term> → <term> * <factor>
           | <factor>              (left associative)
    <factor> → ( <expr> )
             | <id>
- Right recursive: a BNF rule is right recursive when its LHS also appears at the end of its RHS.
Rules for exponentiation as a right-associative operator
    <factor> → <exp> ** <factor>
             | <exp>               (right associative)
    <exp> → ( <exp> )
          | <id>
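Right recursion maps directly onto a recursive function: everything after the operator is handed to the same function again, so the rightmost operator ends up lowest in the tree. The sketch below assumes the exponentiation operator is written `**` and limits <exp> to a plain name to stay short; `parse_power` and `evaluate` are hypothetical names.

```python
def parse_power(tokens):
    """<factor> → <exp> ** <factor> | <exp>, with <exp> limited to a name."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if rest and rest[0] == "**":
        # Right recursion: everything after ** is parsed as one <factor>.
        return ("**", head, parse_power(rest[1:]))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return head

def evaluate(tree, env):
    """Evaluate a tree from parse_power using the values in env."""
    if isinstance(tree, str):
        return env[tree]
    _, base, exponent = tree
    return evaluate(base, env) ** evaluate(exponent, env)

tree = parse_power("A ** B ** C".split())
print(tree)                                      # ('**', 'A', ('**', 'B', 'C'))
print(evaluate(tree, {"A": 2, "B": 3, "C": 2}))  # 512, that is 2 ** (3 ** 2)
```

A left-associative reading would group the same input as (2 ** 3) ** 2, which is 64, so the shape of the rule changes the meaning of the expression.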
5.3 An unambiguous grammar for if-then-else
Rules for if-then-else
    <if_stmt> → if <logic_expr> then <stmt>
              | if <logic_expr> then <stmt> else <stmt>
Deriving
    if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
by the rules above
[Figure: Two distinct parse trees for the same sentential form]
The unambiguous grammar for if-then-else
Rules for if-then-else
    <stmt> → <matched> | <unmatched>
    <matched> → if <logic_expr> then <matched> else <matched>
              | any non-if statement
    <unmatched> → if <logic_expr> then <stmt>
                | if <logic_expr> then <matched> else <unmatched>
Deriving
    if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
by the rules above
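The effect of the matched/unmatched rules is that an else always pairs with the nearest preceding if that does not yet have one. The sketch below does not implement those rules literally. It is a greedy recursive parser that arrives at the same pairing, under the assumption that conditions and non-if statements are single tokens; the names `c1`, `c2`, `s1`, `s2` and `parse_stmt` are hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement at tokens[pos]; return (tree, next position)."""
    def at(i):
        return tokens[i] if i < len(tokens) else None

    if at(pos) is None:
        raise SyntaxError("unexpected end of input")
    if at(pos) != "if":
        return at(pos), pos + 1                  # any non-if statement
    condition = at(pos + 1)
    if condition is None or at(pos + 2) != "then":
        raise SyntaxError("expected: if <logic_expr> then")
    then_part, pos = parse_stmt(tokens, pos + 3)
    if at(pos) == "else":
        # The else is consumed by the innermost if that is still open.
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", condition, then_part, else_part), pos
    return ("if", condition, then_part), pos

tree, end = parse_stmt("if c1 then if c2 then s1 else s2".split())
print(tree)   # ('if', 'c1', ('if', 'c2', 's1', 's2'))
```

The else ends up inside the inner if, so the outer if has no else part, which is the single reading the unambiguous grammar allows.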