Title: Syntax and Semantics
- Lexeme
  - lowest-level syntactic unit in the language
- Token
  - a language category for the lexemes
- Syntax
  - form or structure of expressions or statements for a given language
  - for instance, in Java, the form of a while loop is: while (<bool_expr>) <stmt>
- Semantics
  - meaning of the expressions
- Language
  - group of words that can be combined and the rules for combining those words
- Sentence
  - a legal statement in the language
Example (in Java): index = 2 * count + 17;

Lexeme   Token
index    identifier
=        assignment_operator
2        integer_literal
*        mult_operator
count    identifier
+        addition_operator
17       integer_literal
;        semicolon
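To make the lexeme/token distinction concrete, here is a minimal Python sketch of a lexer for the example statement. The token names come from the table above; the regular expressions and the function name `lex` are our own assumptions, not part of any real Java compiler.

```python
import re

# Token categories from the table above; the patterns are illustrative choices.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("integer_literal", r"\d+"),
    ("assignment_operator", r"="),
    ("mult_operator", r"\*"),
    ("addition_operator", r"\+"),
    ("semicolon", r";"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> list[tuple[str, str]]:
    """Split source into (lexeme, token) pairs, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "whitespace":
            # lastgroup is the name of the alternative that matched: the token
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in lex("index = 2 * count + 17;"):
    print(f"{lexeme:6} {token}")
```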
Languages
- Language Recognizer
  - given a sentence, is it in the given language?
- Language Generator
  - given a language, create legal and meaningful sentences
- We can build a language recognizer if we already have a language generator
- Grammar
  - a description of a language
  - can be used for generation or, given the grammar, a language recognizer (known as a parser) can be created
- We classify languages into four nested categories (the Chomsky hierarchy)
  - Regular
  - Context-Free
  - Context-Sensitive
  - Recursively Enumerable
- Here, we are interested in context-free grammars
  - these generate exactly the context-free languages (the languages a pushdown automaton can recognize)
  - the syntax of most programming languages can be described almost entirely by a context-free grammar; rules such as "declare before use" need more (see Attribute Grammars below), and natural languages, in general, are not strictly context-free
- You will study the other languages in more detail in 485
BNF (Backus-Naur Form)
- Equivalent in power to context-free grammars: BNF describes exactly the context-free languages
- BNF is a notation (or a meta-language) used to specify the grammar of a language
- The BNF can then be used for language generation or recognition
- BNF uses rules to map non-terminal symbols (abstractions such as <stmt>) into sequences of other non-terminals and terminals (lexemes and tokens)
- We define a BNF grammar as
  - G = {alphabet, rules, <start>}
  - the alphabet consists of the symbols used in the rules
    - both terminal symbols and non-terminal symbols; non-terminal symbols are placed in < >
  - rules map from a non-terminal to other elements in the alphabet
    - for instance, a rule might say <A> → a<B> | b<A>
    - rules can be recursive, as shown above, where an <A> can be applied to generate a terminal (b) and another <A>
  - <start> is a non-terminal which is the starting point for a language generator and must be on the left-hand side of at least one rule
Examples of Grammar Rules
<program> → begin <stmt_list> end
Notice the use of both terminals and non-terminals on the right side.

Recursion is used as necessary:
<ident_list> → ident | ident , <ident_list>
The symbol | means "or", so an <ident_list> can be a single ident, or an ident, a comma, and more of an <ident_list>.

Other examples are
<assign> → <var> = <expression>
<if_stmt> → if <logical_expr> then <stmt>
          | if <logical_expr> then <stmt> else <stmt>
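The recursive rule for <ident_list> translates directly into a recursive function. This is a minimal recognizer sketch, assuming the input has already been split into tokens and every identifier lexeme has been reduced to the token `ident`; the function names are hypothetical.

```python
def parse_ident_list(tokens: list[str], pos: int = 0) -> int:
    """Recognize <ident_list> starting at pos; return the position just past it."""
    # Both alternatives begin with ident.
    if pos >= len(tokens) or tokens[pos] != "ident":
        raise SyntaxError(f"expected ident at position {pos}")
    pos += 1
    # Second alternative: ident , <ident_list>  (the recursive case).
    if pos < len(tokens) and tokens[pos] == ",":
        return parse_ident_list(tokens, pos + 1)
    # First alternative: a single ident.
    return pos


def is_ident_list(tokens: list[str]) -> bool:
    """True when the whole token list is one <ident_list>."""
    try:
        return parse_ident_list(tokens) == len(tokens)
    except SyntaxError:
        return False


print(is_ident_list(["ident", ",", "ident", ",", "ident"]))  # True
print(is_ident_list(["ident", ","]))  # False: a comma must be followed by more of the list
```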
Example Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<var> → A | B | C | D
<expr> → <var> + <var> | <var> - <var> | <var>

This grammar can be used to generate a program (granted, a program that will only consist of assignment statements), or we can use the grammar to build a recognizer that checks whether a given program is syntactically valid in this language.

A derivation is a generation, starting at the <start> symbol (in this case, <program>) and applying rules until all non-terminal symbols have been removed from the generated sentence. Such a sentence will be a legal sentence in the language.
A derivation from the grammar
<program> ⇒ begin <stmt_list> end
          ⇒ begin <stmt> ; <stmt_list> end
          ⇒ begin <var> = <expr> ; <stmt_list> end
          ⇒ begin A = <expr> ; <stmt_list> end
          ⇒ begin A = <var> + <var> ; <stmt_list> end
          ⇒ begin A = B + <var> ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt> end
          ⇒ begin A = B + C ; <var> = <expr> end
          ⇒ begin A = B + C ; B = <expr> end
          ⇒ begin A = B + C ; B = <var> end
          ⇒ begin A = B + C ; B = C end
So, the program begin A = B + C ; B = C end is legal.

This is a leftmost derivation: at each step, the rule is applied to the leftmost non-terminal symbol (which is why we finished the first assignment statement before expanding the second <stmt>).
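A leftmost derivation is mechanical enough to automate. The sketch below stores the example grammar as a Python dictionary (an assumed representation) and applies a list of chosen alternatives, always to the leftmost non-terminal; `SLIDE_CHOICES` reproduces the derivation above.

```python
# Example grammar; each alternative is a list of symbols.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["A"], ["B"], ["C"], ["D"]],
    "<expr>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def leftmost_derivation(start: str, choices: list[int]) -> list[str]:
    """Apply the chosen alternative numbers, always to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Alternative indices used, in order, by the derivation on this slide.
SLIDE_CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
for line in leftmost_derivation("<program>", SLIDE_CHOICES):
    print(line)
```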
Parse Trees
- A hierarchical structure displaying the derivation of an expression in some grammar
  - Leaf nodes are terminals, non-leaf nodes are non-terminals
- Parser
  - Process which takes a sentence and breaks it into its component parts, deriving a parse tree
  - If the parser cannot generate a parse tree, then the sentence is not legal
  - If two or more distinct parse trees can be generated for the same sentence, then the grammar is ambiguous
Grammar and Parse Tree
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>

[Figure: parse tree for the derivation below]
<assign> ⇒ <id> = <expr>
         ⇒ A = <expr>
         ⇒ A = <id> * <expr>
         ⇒ A = B * <expr>
         ⇒ A = B * ( <expr> )
         ⇒ A = B * ( <id> + <expr> )
         ⇒ A = B * ( A + <expr> )
         ⇒ A = B * ( A + <id> )
         ⇒ A = B * ( A + C )
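A parser recovers the same parse tree from the sentence. This hypothetical recursive-descent sketch for the grammar above returns the tree as nested tuples (label first, then children, with terminals as plain strings); reading its leaves left to right gives back the sentence, and an unbalanced parenthesis is rejected.

```python
def parse_assign(tokens: list[str]) -> tuple:
    """Parse <assign> -> <id> = <expr>; return the parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(allowed):
        nonlocal pos
        token = peek()
        if token not in allowed:
            raise SyntaxError(f"expected one of {sorted(allowed)}, got {token!r}")
        pos += 1
        return token

    def parse_id():  # <id> -> A | B | C
        return ("<id>", expect({"A", "B", "C"}))

    def parse_expr():  # <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
        if peek() == "(":
            expect({"("})
            inner = parse_expr()
            expect({")"})
            return ("<expr>", "(", inner, ")")
        ident = parse_id()
        if peek() in ("+", "*"):
            op = expect({"+", "*"})
            return ("<expr>", ident, op, parse_expr())
        return ("<expr>", ident)

    tree = ("<assign>", parse_id(), expect({"="}), parse_expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


def leaves(tree) -> list[str]:
    """Terminals at the leaves, left to right: the derived sentence."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


def show(tree, depth: int = 0) -> None:
    """Print one node per line, indented by depth."""
    label = tree if isinstance(tree, str) else tree[0]
    print("  " * depth + label)
    if not isinstance(tree, str):
        for child in tree[1:]:
            show(child, depth + 1)


show(parse_assign("A = B * ( A + C )".split()))
```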
An Ambiguous Grammar
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
- With this grammar, the sentence
  - A = B + A * C
  - has two distinct parse trees
  - see next slide
- The reason this is important is that the second parse tree represents an interpretation of the expression where + has higher precedence than *, which would give us an incorrect answer
Two Parse Trees for A = B + A * C
[Figure: two parse trees for A = B + A * C. Left: the top <expr> applies + and its right operand expands to A * C. Right: the top <expr> applies * and its left operand expands to B + A.]
The lower down the operator in the parse tree, the higher its precedence. So on the left, * has a higher precedence than + (which is as it should be), but the tree on the right is incorrect, essentially being A = (B + A) * C even though there are no parentheses specified to alter the precedence.
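One way to see the ambiguity is to count parse trees. The sketch below (function names are ours) counts, by dynamic programming over token spans, how many distinct <expr> trees the ambiguous grammar allows; the right-hand side B + A * C gets two, matching the two trees described here.

```python
from functools import lru_cache


def count_parse_trees(tokens: list[str]) -> int:
    """Number of distinct <expr> parse trees for tokens under the ambiguous grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:  # trees deriving tokens[i:j]
        total = 0
        if j - i == 1 and tokens[i] in {"A", "B", "C"}:
            total += 1  # <expr> -> <id>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)  # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):  # <expr> -> <expr> op <expr>, op at position k
            if tokens[k] in {"+", "*"}:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("B + A * C".split()))  # 2: the grammar is ambiguous
print(count_parse_trees("( B + A ) * C".split()))  # 1: parentheses fix the grouping
```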
An Unambiguous Grammar
- <assign> → <id> = <expr>
- <id> → A | B | C
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → ( <expr> ) | <id>
Here, we force operator precedence by making a multiplication occur lower in the tree, deriving it through an additional rule (<term>); ( ), which has the highest precedence, requires the most derivation steps to reach.
Assume we wanted to add another operator, unary minus (as in -5). How would we add it? What about adding ** (exponent)?
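The precedence this grammar enforces shows up if we parse with it and print the grouping it implies. This is a minimal recursive-descent sketch (the function name is ours); because a recursive-descent function cannot call itself first on a left-recursive rule, the left recursion in <expr> and <term> is implemented as a loop that builds the same left-leaning tree, an implementation choice rather than part of the grammar.

```python
def fully_parenthesize(sentence: str) -> str:
    """Parse with <expr>/<term>/<factor> and return the grouping as a string."""
    tokens = sentence.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():  # <expr> -> <expr> + <term> | <term>
        left = term()
        while peek() == "+":
            advance()
            left = f"({left} + {term()})"
        return left

    def term():  # <term> -> <term> * <factor> | <factor>
        left = factor()
        while peek() == "*":
            advance()
            left = f"({left} * {factor()})"
        return left

    def factor():  # <factor> -> ( <expr> ) | <id>
        if peek() == "(":
            advance()
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return inner
        if peek() in ("A", "B", "C"):
            return advance()
        raise SyntaxError(f"unexpected {peek()!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(fully_parenthesize("B + A * C"))  # (B + (A * C))
```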
Derivation and Parse Tree of the Unambiguous Grammar
[Figure: parse tree for A = B + C * A under the unambiguous grammar]
<assign> ⇒ <id> = <expr>
         ⇒ A = <expr>
         ⇒ A = <expr> + <term>
         ⇒ A = <term> + <term>
         ⇒ A = <factor> + <term>
         ⇒ A = <id> + <term>
         ⇒ A = B + <term>
         ⇒ A = B + <term> * <factor>
         ⇒ A = B + <factor> * <factor>
         ⇒ A = B + <id> * <factor>
         ⇒ A = B + C * <factor>
         ⇒ A = B + C * <id>
         ⇒ A = B + C * A
Associativity
- We maintain operator precedence through additional rules, whereby higher-precedence operators appear after the application of more rules
- Should we worry about associativity? Notice that A + B + C = C + A + B; should we force them to generate the same parse tree?
  - It doesn't seem worthwhile, and yet if A, B and C are floats instead of ints, then A + B + C may not equal C + A + B (each intermediate floating-point sum is rounded, so float addition is not associative), so the grouping should be preserved
- How?
  - We will require that rules in our BNF be left recursive for left-associative operators and right recursive for right-associative operators
  - Left recursive means that the rule's left-hand-side non-terminal appears as the leftmost symbol of its right-hand side (right recursive: as the rightmost symbol)
  - <expr> → <expr> + <term> is left recursive, and
  - <factor> → <exp> ** <factor> is right recursive
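To see why the grouping matters, the sketch below folds a list of operands to the left (the shape a left-recursive rule produces) and to the right (the shape a right-recursive rule produces). The helper names are hypothetical; the float values are the familiar 0.1, 0.2, 0.3 example with IEEE 754 doubles.

```python
import operator
from functools import reduce


def fold_left(op, operands):
    """((a op b) op c) op ...: the grouping a left-recursive rule produces."""
    return reduce(op, operands)


def fold_right(op, operands):
    """a op (b op (c op ...)): the grouping a right-recursive rule produces."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


print(fold_left(operator.sub, [8, 4, 2]), fold_right(operator.sub, [8, 4, 2]))  # 2 6
print(fold_left(operator.add, [0.1, 0.2, 0.3]), fold_right(operator.add, [0.1, 0.2, 0.3]))
# 0.6000000000000001 0.6
print(fold_right(operator.pow, [2, 3, 2]))  # 512, the usual right-associative reading of 2 ** 3 ** 2
```

For subtraction the two groupings differ even on integers, which is why the grammar, not the evaluator, has to decide.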
Ambiguous If-Then-Else
- <if_stmt> → if <logical_expr> then <stmt>
            | if <logical_expr> then <stmt> else <stmt>
- Since a <stmt> could be another <if_stmt>, we could generate
  - if X > 0 then if Y > 0 then X = 0 else X = Y
- The problem is that this is ambiguous
  - Is the else the alternative to the first then or the second then (that is, which condition does the else get attached to)?
- We could use explicit delimiters to remove the ambiguity, but it is better to create an unambiguous grammar
Unambiguous If-Then-Else
- <stmt> → <matched> | <unmatched>
- <matched> → if <logical_expr> then <matched> else <matched>
            | any non-if statement
- <unmatched> → if <logical_expr> then <stmt>
              | if <logical_expr> then <matched> else <unmatched>
Here, an if-then whose then-clause is an if-then-else is allowed, but an if-then-else whose then-clause contains an unmatched if-then is not allowed (the then-clause before an else must be matched, which means either another complete if-then-else or a non-if statement). In this way, any else clause is always associated with the nearest preceding unmatched then clause. Most languages follow this grammar, or require explicit delimiters.
Extended BNF Grammars
Here, we revise the <expr> portion of our grammar to illustrate how much easier it is to notate the grammar using EBNF.
- 3 common extensions to BNF
  - [ ] - used to denote optional elements (saves some space so that we don't have to enumerate options as separate alternatives)
  - { } - used to indicate 0 or more instances
  - ( ) - for a list of choices, separated by |
- These extensions are added to a BNF grammar for convenience, allowing us to shorten the grammar
- BNF
  - <expr> → <expr> + <term>
           | <expr> - <term>
           | <term>
  - <term> → <term> * <factor>
           | <term> / <factor>
           | <factor>
  - <factor> → <exp> ** <factor>
             | <exp>
  - <exp> → ( <expr> ) | <id>
- EBNF
  - <expr> → [<expr> (+ | -)] <term>
  - <term> → [<term> (* | /)] <factor>
  - <factor> → <exp> [** <factor>]
  - <exp> → ( <expr> ) | <id>
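The EBNF form maps directly onto code. In this hypothetical evaluator sketch, <expr> and <term> are coded in the equivalent brace form <term> {(+ | -) <term>} as loops, which keeps + - * / left-associative, while <factor> → <exp> [** <factor>] recurses on its right, which makes ** right-associative. Variable values come from an assumed `env` dictionary.

```python
def evaluate(sentence: str, env: dict[str, float]) -> float:
    """Evaluate a space-separated expression using the EBNF grammar above."""
    tokens = sentence.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():  # <term> {(+ | -) <term>}: the loop makes + and - left-associative
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():  # <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():  # <exp> [** <factor>]: recursing on the right makes ** right-associative
        base = exp()
        if peek() == "**":
            advance()
            return base ** factor()
        return base

    def exp():  # ( <expr> ) | <id>
        if peek() == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return value
        if peek() in env:
            return env[advance()]
        raise SyntaxError(f"unexpected {peek()!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


env = {"A": 8, "B": 4, "C": 2}
print(evaluate("A - B - C", env))    # 2, i.e. (A - B) - C
print(evaluate("C ** C ** B", env))  # 65536, i.e. C ** (C ** B)
```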
Attribute Grammars
- It is not possible to describe all aspects of a language solely with a BNF grammar
- BNF grammars cannot express static semantics (that is, rules that the language dictates for a program to be syntactically correct, which can be checked before the program runs)
- For example
  - Making sure that the number of parameters in a function call matches the number of parameters in the function header
  - Making sure in an assignment statement that the left-hand side's type matches (or is compatible with) the type of the value computed by the right-hand side's expression
- Attribute grammars are added to BNF grammars to handle these gaps
  - We will add attributes and predicate functions to the BNF grammar rules: the attributes store information such as variable type or number of parameters, and the predicates test that the attributes match accordingly
Attribute Grammar Features
- Synthesized attributes
  - information passed up the parse tree
- Inherited attributes
  - information passed down the parse tree
- Semantic functions and predicates
  - semantic functions compute attribute values for a grammar rule; predicates compare attributes (for example, a synthesized attribute against an inherited one)
  - if any predicate fails its test, then we have a syntax (static semantic) error
- Intrinsic attributes
  - leaf node attributes derived when a rule generates a terminal
  - for instance, if <type> → int, then the attribute for the declared variable receives its intrinsic value (whatever value we use to denote that the variable is an int)
Example: Deriving Identifiers
- In some languages, identifier names are limited
- Here, we look at a rule in the style of Pascal dialects, where an identifier name must start with a letter or _, consist of _, letters and digits, and be less than or equal to 31 characters in length
- Our BNF rules for deriving an identifier are
  - <identifier> → _ | <letter> | _<id> | <letter><id>
  - <id> → _ | <letter> | <digit> | _<id> | <letter><id> | <digit><id>
- We enhance our grammar with the synthesized attribute length
  - <id> → _ | <letter> | <digit>
    - <id>.length ← 1
  - <id> → _<id>[2] | <letter><id>[2] | <digit><id>[2]
    - <id>.length ← <id>[2].length + 1
  - <identifier> → _ | <letter>
    - <identifier>.length ← 1
  - <identifier> → _<id> | <letter><id>
    - <identifier>.length ← <id>.length + 1
  - Predicate: <identifier>.length <= 31
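The attributed rules can be followed almost line for line. This sketch synthesizes the length attribute bottom-up through <id> and then applies the predicate; it assumes ASCII letters and digits, and the function names are our own.

```python
import string

FIRST_OK = set(string.ascii_letters + "_")
REST_OK = FIRST_OK | set(string.digits)


def identifier_length(name: str) -> int:
    """Synthesize <identifier>.length; raise SyntaxError if name is not derivable."""

    def id_length(i: int) -> int:
        # <id> -> _ | <letter> | <digit> | _<id> | <letter><id> | <digit><id>
        if name[i] not in REST_OK:
            raise SyntaxError(f"illegal character {name[i]!r}")
        if i == len(name) - 1:
            return 1  # <id>.length <- 1
        return id_length(i + 1) + 1  # <id>.length <- <id>[2].length + 1

    if not name or name[0] not in FIRST_OK:
        raise SyntaxError("an identifier must start with a letter or _")
    if len(name) == 1:
        return 1  # <identifier> -> _ | <letter>
    return id_length(1) + 1  # <identifier> -> _<id> | <letter><id>


def is_valid_identifier(name: str) -> bool:
    """Derivable from <identifier> and satisfying the length predicate."""
    try:
        return identifier_length(name) <= 31
    except SyntaxError:
        return False


print(identifier_length("count2"), is_valid_identifier("a" * 32))  # 6 False
```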
Example: Assignment Statement
- Our grammar now becomes
- 1. Syntax rule: <assign> → <var> = <expr>
  - Semantic rule: <expr>.expected_type ← <var>.actual_type
- 2. Syntax rule: <expr> → <var>[2] + <var>[3]
  - Semantic rule: <expr>.actual_type ← if (<var>[2].actual_type = int) and (<var>[3].actual_type = int) then int else real
  - Predicate: <expr>.actual_type = <expr>.expected_type
- 3. Syntax rule: <expr> → <var>
  - Semantic rule: <expr>.actual_type ← <var>.actual_type
  - Predicate: <expr>.actual_type = <expr>.expected_type
- 4. Syntax rule: <var> → A | B | C
  - Semantic rule: <var>.actual_type ← look-up(<var>.string)
- Attributes
  - Actual_Type
    - synthesized for <var> and <expr>; stores the type
  - Expected_Type
    - inherited for <expr>, based on the <var> type
  - LHS_Type
    - synthesized for <assign>
  - Env
    - inherited for <assign>, <expr> and <var>, carrying the reference to the symbol table
Example
[Figure: parse tree for A = A + B: <assign> over <var> (A), = and <expr>; <expr> over <var>[2] (A), + and <var>[3] (B); expected_type is passed down to <expr>, actual_type is passed up from each <var>]
Assume A is a float (real) and B is an int:
- <var>.actual_type = real
- <var>[2].actual_type = real
- <var>[3].actual_type = int
- <expr>.actual_type = real (derived from <var>[2] and <var>[3] through the semantic rule)
- <expr>.expected_type = real (inherited from <assign>, which gets it from <var>)
- <expr>.expected_type = <expr>.actual_type, so the predicate is satisfied: no syntax error
Flow of attributes in the tree:
1. <var>.actual_type ← look-up(A)
2. <expr>.expected_type ← <var>.actual_type (inherited from the parent)
3. <var>[2].actual_type ← look-up(A) and <var>[3].actual_type ← look-up(B)
4. <expr>.actual_type ← real, because <var>[3] is not an int
5. the predicate checks <expr>.actual_type = <expr>.expected_type
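Here is a minimal sketch of rules 1 to 4, assuming a Python dictionary stands in for the symbol table behind look-up (the Env attribute) and types are the strings "int" and "real". It returns the two type attributes of <expr> and raises an error when the predicate fails; function and parameter names are hypothetical.

```python
def check_assign(lhs: str, rhs: list[str], symbol_table: dict[str, str]) -> tuple[str, str]:
    """Return (<expr>.expected_type, <expr>.actual_type) for lhs = rhs.

    rhs is either [var] (rule 3) or [var2, "+", var3] (rule 2).
    """

    def var_type(name: str) -> str:  # rule 4: <var>.actual_type <- look-up(<var>.string)
        return symbol_table[name]

    expected = var_type(lhs)  # rule 1: <expr>.expected_type <- <var>.actual_type
    if len(rhs) == 3 and rhs[1] == "+":
        # rule 2: int only if both operands are int, otherwise real
        t2, t3 = var_type(rhs[0]), var_type(rhs[2])
        actual = "int" if t2 == "int" and t3 == "int" else "real"
    elif len(rhs) == 1:
        actual = var_type(rhs[0])  # rule 3
    else:
        raise SyntaxError("not an <expr>")
    if actual != expected:  # the predicate of rules 2 and 3
        raise TypeError(f"expected {expected}, found {actual}")
    return expected, actual


table = {"A": "real", "B": "int", "C": "int"}
print(check_assign("A", ["A", "+", "B"], table))  # ('real', 'real'): predicate satisfied
```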
Dynamic Semantics
- Describing the meaning of a program, or of a statement or group of statements
- Describing the syntax of a language or of a piece of code is relatively easy; how do we describe the meaning behind code?
  - We could express it in English (e.g., through comments), but this is too informal and perhaps too incomplete/imprecise
- What if we want to use the semantics to make sure that the program does what is intended? This is known as verification. We would need more formal methods of defining semantics for this, so we turn to
  - Operational Semantics
    - how the statement will be executed
  - Axiomatic Semantics
    - what results to expect from the statement
  - Denotational Semantics
    - a functional way of mapping the effects of a statement
Operational Semantics
- This can be thought of as tracing through a program to see what effects an instruction will have
  - implemented as an interpreter, compiler or assembler
  - that is, how will the computer execute this instruction?
- This is simply a mechanistic description of the statement and does not necessarily help us understand the statement
Example: the C for-loop
    for (expr1; expr2; expr3) stmt
becomes
          expr1;
    loop: if expr2 == 0 goto out
          stmt
          expr3;
          goto loop
    out:
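Operational semantics can be made concrete with a tiny abstract machine that executes the lowered form. In this sketch the instruction names exec, if_zero_goto, goto and label are invented for illustration; the C loop for (i = 0; i < 5; i++) total += i; is written in the lowered shape above and run with a program counter.

```python
def run(program, state):
    """Execute a lowered program, mutating and returning state."""
    labels = {ins[1]: k for k, ins in enumerate(program) if ins[0] == "label"}
    pc = 0  # program counter
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "exec":
            ins[1](state)
        elif ins[0] == "if_zero_goto" and ins[1](state) == 0:
            pc = labels[ins[2]]
            continue
        elif ins[0] == "goto":
            pc = labels[ins[1]]
            continue
        pc += 1  # labels and untaken branches fall through
    return state


loop_program = [
    ("exec", lambda s: s.update(i=0)),                        # expr1: i = 0
    ("label", "loop"),                                        # loop:
    ("if_zero_goto", lambda s: int(s["i"] < 5), "out"),       # if expr2 == 0 goto out
    ("exec", lambda s: s.update(total=s["total"] + s["i"])),  # stmt: total += i
    ("exec", lambda s: s.update(i=s["i"] + 1)),               # expr3: i++
    ("goto", "loop"),                                         # goto loop
    ("label", "out"),                                         # out:
]
print(run(loop_program, {"total": 0}))  # {'total': 10, 'i': 5}
```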
Axiomatic Semantics
- Used mainly to prove correctness of code
- Each statement in the language has associated assertions: what we expect to be true before and after the statement executes
  - We list these assertions as pre- and post-conditions that specify how the machine state changes (changes to variables)
  - Given the state of the machine prior to executing a statement, we can then determine what must be true afterward
- The basic form of an axiomatic semantic is {P} S {Q}
  - This is interpreted as
    - if P is true before S executes, then Q is true after S (provided S terminates)
  - We must now define how to determine Q given P and S
Weakest pre-condition
- We will start with a given post-condition and derive the weakest pre-condition
  - We work backwards mainly because we will start with an overall goal in mind for the given statement or program
- We want to derive the weakest pre-condition for a given post-condition because this is the least restrictive pre-condition that will guarantee the post-condition
  - Weakest means most general: what is the greatest range of values for a given variable such that the result will be true?
- For example, consider the assignment statement
  - sum = 2 * x + 1
  - with post-condition sum > 1
  - Possible pre-conditions are x > 10, x > 50 and x > 1000
  - But the weakest pre-condition is x > 0 (2 * x + 1 > 1 holds exactly when x > 0)
Assignment Statement Rule
- We will use the following notation for an assignment statement's axiomatic rule
  - {Q[x → E]} x = E {Q}
- This is read as follows
  - If Q is to be true after the assignment, then Q[x → E] must be true prior
  - The notation Q[x → E] means to replace all instances of x in Q with E
- Examples
  - a = b / 2 - 1 {a < 10}
    - We replace a in a < 10 with b / 2 - 1 and solve for b: Q[a → E] is b / 2 - 1 < 10, or b < 22
    - So we have {b < 22} a = b / 2 - 1 {a < 10}; that is, if b < 22 prior to the assignment statement, then a will be less than 10 afterward
  - x = 2 * y - 3 {x > 25}
    - the pre-condition is 2 * y - 3 > 25, or y > 14
  - c = d * e - 4 {c > 0}
    - the pre-condition is d * e - 4 > 0, or d * e > 4; it is tempting to rewrite this as d > 4 / e (or e > 4 / d, with d != 0 and e != 0), but dividing by e is only valid when e > 0 (a negative e flips the inequality), so d * e > 4 is the form to keep
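The rule can also be checked semantically: evaluating Q in the state the assignment would produce is the same as evaluating Q[x → E] in the current state. This sketch (the helper `wp_assign` is our own) builds pre-conditions that way and compares them with the hand-derived ones over a range of integers, which is a spot check rather than a proof.

```python
def wp_assign(var, expr, post):
    """Pre-condition Q[var -> expr] for the assignment var = expr.

    A state is a dict of variable values; expr and post are functions of a state.
    """
    return lambda state: post({**state, var: expr(state)})


# {b < 22} a = b / 2 - 1 {a < 10}
pre_a = wp_assign("a", lambda s: s["b"] / 2 - 1, lambda s: s["a"] < 10)
print(all(pre_a({"b": b}) == (b < 22) for b in range(-100, 100)))  # True

# {y > 14} x = 2 * y - 3 {x > 25}
pre_x = wp_assign("x", lambda s: 2 * s["y"] - 3, lambda s: s["x"] > 25)
print(all(pre_x({"y": y}) == (y > 14) for y in range(-100, 100)))  # True

# {d * e > 4} c = d * e - 4 {c > 0}; the rewrite d > 4 / e disagrees when e < 0
pre_c = wp_assign("c", lambda s: s["d"] * s["e"] - 4, lambda s: s["c"] > 0)
print(pre_c({"d": -5, "e": -1}), -5 > 4 / -1)  # True False
```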
Sequences
- In general, a series of statements S1, S2, S3, ..., Sn can be expressed as
  - {P} S1 {Q1}, {Q1} S2 {Q2}, {Q2} S3 {Q3}, ..., {Qn-1} Sn {Q}
  - This can be simplified to {P} S1; S2; S3; ...; Sn {Q}
- Therefore, we can combine rules to show the axiomatic semantics of a block of code
- Example
  - y = 3 * x + 1; x = y + 3;
  - If our post-condition is x < 10, then our pre-condition between the two statements is y + 3 < 10, or y < 7, and our pre-condition before the first statement is 3 * x + 1 < 7, or x < 2
  - If x < 2 before the first statement, then x < 10 after the second statement
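Sequences compose the same way, working backwards from the last statement. A short sketch, repeating the hypothetical `wp_assign` helper so the block stands alone, computes both pre-conditions of the two-statement example and spot-checks them on integers:

```python
def wp_assign(var, expr, post):
    """Pre-condition Q[var -> expr] for var = expr (states are dicts)."""
    return lambda state: post({**state, var: expr(state)})


def wp_sequence(statements, post):
    """wp(S1; ...; Sn, Q) = wp(S1, wp(S2, ... wp(Sn, Q))): work from the last statement back."""
    for var, expr in reversed(statements):
        post = wp_assign(var, expr, post)
    return post


program = [("y", lambda s: 3 * s["x"] + 1), ("x", lambda s: s["y"] + 3)]
post = lambda s: s["x"] < 10

between = wp_sequence(program[1:], post)  # should be y < 7
before = wp_sequence(program, post)       # should be x < 2
print(all(between({"y": y}) == (y < 7) for y in range(-50, 50)))  # True
print(all(before({"x": x}) == (x < 2) for x in range(-50, 50)))   # True
```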
Rule of Consequence
- In the previous example, we had a sequence of two assignment statements, but this works in general with any number of statements of any kind
- The rule of consequence is shown as follows

      {P'} S {Q'},   P => P',   Q' => Q
      ----------------------------------
                 {P} S {Q}

  - The rule means that if P implies P' and Q' implies Q, and we have already proven that {P'} S {Q'} is true, then we can infer that {P} S {Q} is also true
  - Notice that P => P' means that P' is a weaker condition than P, whereas Q' => Q means that Q' is a stronger condition than Q
  - This allows us to take a proven rule, weaken its post-condition and strengthen its pre-condition
(=> means implies)
Selection Axiomatic Semantic
- Given a statement if (B) S1 else S2
- The semantic rule is

      {B and P} S1 {Q},   {(!B) and P} S2 {Q}
      ---------------------------------------
             {P} if (B) S1 else S2 {Q}

  - if Q is our post-condition, then we have two pre-conditions: if the if statement's condition is true (B), then B and P; if it is false (!B), then !B and P; so we must derive a P that gives the same post-condition whether B or !B is true
- Example
  - if (x > 0) y--; else y++;
  - Suppose the post-condition is y > 0
  - the pre-condition for the if-clause is y > 1
  - the pre-condition for the else-clause is y > -1
  - the condition y > 1 is subsumed by the condition y > -1 (that is, if y > 1 is true, then y > -1 must also be true)
  - So, we select y > 1 as our weakest pre-condition that does not depend on x (the exact weakest pre-condition is (x > 0 and y > 1) or (x <= 0 and y > -1))
  - we cannot use y > -1 because, if x > 0 and y = 1, our post-condition is not true
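A brute-force check over a small grid of integers (a spot check, not a proof; the helper names are ours) confirms each claim in the selection example:

```python
def run_if(x: int, y: int) -> int:
    """if (x > 0) y--; else y++;  returns the final y."""
    return y - 1 if x > 0 else y + 1


def post(y: int) -> bool:
    """Post-condition y > 0."""
    return y > 0


def exact_wp(x: int, y: int) -> bool:
    """(B and wp(S1, Q)) or (!B and wp(S2, Q))."""
    return (x > 0 and y > 1) or (x <= 0 and y > -1)


grid = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
print(all(post(run_if(x, y)) for x, y in grid if y > 1))   # True: y > 1 is enough
print(all(post(run_if(x, y)) for x, y in grid if y > -1))  # False: fails at x = 1, y = 1
print(all(post(run_if(x, y)) == exact_wp(x, y) for x, y in grid))  # True
```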
Logical Pretest Loops
- Our pre-test loop looks like this
  - while (B) S
- We must derive a pre-condition that is true prior to the loop whether B is true or not, but the pre-condition must also remain true if B is true and S is executed; that is, it must be true prior to each loop iteration
- To derive P, we create I, a loop invariant
  - The invariant will always be true both before and after each loop iteration
  - The pre-condition must imply I, and the post-condition must follow from I and (not B)
  - In rule form: if {I and B} S {I}, then {I} while (B) S {I and (not B)}
- NOTE: determining a loop invariant is difficult and does not necessarily seem to help us understand the loop
  - For these reasons, we will go over an example, but not cover this in any more detail
While-Loop Example
- Our loop is
  - while (y != x) y++;
- Our post-condition is y = x
  - the post-condition states that the condition (y != x) is false
- The pre-condition must establish the loop invariant; it need not require (y != x), since the loop may execute zero times
- What is the invariant? We need to select something that will be true both before and after each loop iteration
  - notice that y may initially be less than x, and each iteration adds 1 to y, so that y < x or y = x after each loop iteration
  - we cannot have y > x before the loop, because this would be an infinite loop and the post-condition would never be reached; since the post-condition must be true, y > x cannot be true beforehand
  - since either y < x or y = x will be true, our loop invariant is y <= x
- I and (not B) is y <= x and y = x, which gives the post-condition y = x
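The invariant can be spot-checked by running the loop with assertions placed where the proof says I must hold. This sketch assumes integer x and y and only starts from states that satisfy the pre-condition y <= x (starting with y > x would never terminate):

```python
def run_with_invariant(x: int, y: int) -> int:
    """while (y != x) y++;  asserting the invariant I: y <= x where the proof uses it."""
    assert y <= x, "pre-condition: I must hold before the loop"
    while y != x:
        assert y <= x  # I holds at the start of the iteration (together with B: y != x)
        y += 1
        assert y <= x  # I still holds at the end of the iteration
    assert y == x  # I and (not B) give the post-condition
    return y


print(all(run_with_invariant(x, y) == x for x in range(-10, 10) for y in range(-10, x + 1)))  # True
```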
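Two More Loop Examples
while s > 1 do s = s / 2 end {s = 1}
What is the weakest precondition (wp)?
- Let's apply the loop one time: if s > 1, then for s = 1 afterward we would have {s = 2} s = s / 2 {s = 1}, so our wp is s = 2
- For 2 iterations, we would then have {s = 4} s = s / 2 {s = 2}, so our wp is s = 4
- For 3 iterations, we would then have {s = 8} s = s / 2 {s = 4}, so our wp is s = 8
- We can now derive the invariant as being "s is a non-negative power of 2", or s = 2^n for n >= 0
- A precondition then is s > 1 and s = 2^n, but this is not the weakest: s = 1 (n = 0) also works, because the loop body never runs. With real division, s = 2^n for n >= 0 is itself the weakest precondition; if / is integer division, every s >= 1 ends with s = 1, so s >= 1 is weaker still

The following code computes z = x * y, assuming y is positive:
    z = 0; n = y;
    while (n > 0) { z = z + x; n--; }
So, our post-condition is {z = x * y and y > 0 and n = 0}, where n = 0 comes from NOT B (n <= 0) together with n >= 0, and z = x * y is the result we want. I, our loop invariant, is not merely y > 0, however. If we analyze each loop iteration for z and n, we find that z = x * (y - n) and n >= 0; combined with NOT B this gives n = 0, hence z = x * y.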
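Both loops can be spot-checked in Python. The sketch below uses `fractions.Fraction` for exact real division (an assumption about what / means on the slide) and `//` for integer division, and runs the multiplication loop with its invariant asserted after every iteration; function names are hypothetical.

```python
from fractions import Fraction


def halve_until_one(s, real_division: bool):
    """while s > 1 do s = s / 2 end, with exact real or integer division."""
    while s > 1:
        s = s / 2 if real_division else s // 2
    return s


powers_of_two = {2 ** n for n in range(11)}  # 2^0 .. 2^10
# Real division: exactly the powers of two (n >= 0) finish with s = 1.
real_ok = {s for s in range(1, 1025) if halve_until_one(Fraction(s), True) == 1}
print(real_ok == powers_of_two)  # True
# Integer division: every s >= 1 finishes with s = 1.
print(all(halve_until_one(s, False) == 1 for s in range(1, 1025)))  # True


def multiply(x: int, y: int) -> int:
    """z = 0; n = y; while (n > 0) { z = z + x; n--; }, assuming y > 0."""
    z, n = 0, y
    assert z == x * (y - n) and n >= 0  # invariant holds on entry
    while n > 0:
        z, n = z + x, n - 1
        assert z == x * (y - n) and n >= 0  # invariant holds after each iteration
    # I and NOT B: n >= 0 and n <= 0, so n == 0 and z == x * y
    return z


print(all(multiply(x, y) == x * y for x in range(-5, 6) for y in range(1, 11)))  # True
```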
Program Proofs
- As you can see from the last example, finding an invariant is not necessarily easy
  - the invariant, combined with the loop's terminating condition, must be strong enough to imply the post-condition, but also weak enough to describe what happens during each loop iteration
  - in using axiomatic semantics for a loop, proving that the loop actually terminates is often skipped
  - in such a case, the axiomatic description offers only partial correctness (if the loop terminates, the post-condition holds) rather than total correctness (the loop terminates and the post-condition holds)
- By combining these axiomatic rules, we can prove the correctness of an entire program
  - consider the example below, which swaps two variable values
  - our post-condition requires that the two variable values have in fact been swapped; now we will prove it
- {P} t = x; x = y; y = t; {x = B AND y = A}
  - {P} t = x {P1}, {P1} x = y {P2}, {P2} y = t {x = B AND y = A}
  - P2 is x = B AND t = A
  - P1 is y = B AND t = A
  - P is y = B AND x = A
  - and since x = A AND y = B => y = B AND x = A, we have
    {x = A AND y = B} t = x; x = y; y = t; {x = B AND y = A}
The chapter offers a more complete example if you are interested.
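Because every assignment in the swap copies one variable into another, Q[x → E] reduces to renaming. This small symbolic sketch reproduces P2, P1 and P; its representation (a set of (variable, value) facts read as a conjunction of equalities) is our own simplification and only handles variable-to-variable assignments.

```python
def wp_copy(target: str, source: str, post: frozenset) -> frozenset:
    """Q[target -> source] for the assignment target = source.

    post is a set of (variable, symbolic_value) facts, read as a conjunction of
    equalities variable == symbolic_value.
    """
    return frozenset((source if var == target else var, value) for var, value in post)


post = frozenset({("x", "B"), ("y", "A")})  # x = B AND y = A
p2 = wp_copy("y", "t", post)  # before y = t
p1 = wp_copy("x", "y", p2)    # before x = y
p = wp_copy("t", "x", p1)     # before t = x
print(sorted(p2))  # [('t', 'A'), ('x', 'B')]
print(sorted(p1))  # [('t', 'A'), ('y', 'B')]
print(sorted(p))   # [('x', 'A'), ('y', 'B')]
```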
Denotational Semantics
- This form of semantics is a more rigorous method of describing the meaning of a program than our previous approaches
- Denotational semantics is based on recursive function theory
  - That is, we derive a function that defines the effects of an instruction
  - Because the function is recursive, this tends to be a very difficult topic, probably the hardest thing when studying programming languages
- In essence, the function will map from an instance of a mathematical object (the state of the machine) onto another mathematical object
  - so this tells us what happens to the machine after applying an instruction (or program)
- We will look at an example of a recursive function and then apply the idea to 3 types of instructions
Simple Examples
- We define the value of a binary number
  - <bin_num> → 0 | 1 | <bin_num>0 | <bin_num>1
  - that is, a binary number is a 0, a 1, or, recursively, a binary number followed by a 0 or a binary number followed by a 1
- The function must map from a binary number derived from the above grammar rule to a mathematical object (an integer value): Mbin(<bin_num>)
  - Mbin(0) = 0
  - Mbin(1) = 1
  - Mbin(<bin_num>0) = 2 * Mbin(<bin_num>)
  - Mbin(<bin_num>1) = 2 * Mbin(<bin_num>) + 1
  - Mbin(101) = 2 * Mbin(10) + 1 = 2 * (2 * Mbin(1)) + 1 = 2 * 2 * 1 + 1 = 5
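The four Mbin equations translate into one recursive Python function with one case per equation (the name `m_bin` is ours); a quick comparison against Python's own binary formatting spot-checks it on small numbers.

```python
def m_bin(bin_num: str) -> int:
    """Denotation of a <bin_num> string, following the four equations."""
    if bin_num == "0":
        return 0  # Mbin(0) = 0
    if bin_num == "1":
        return 1  # Mbin(1) = 1
    if len(bin_num) < 2 or bin_num[-1] not in "01":
        raise ValueError(f"not a <bin_num>: {bin_num!r}")
    prefix, last = bin_num[:-1], bin_num[-1]
    if last == "0":
        return 2 * m_bin(prefix)  # Mbin(<bin_num>0) = 2 * Mbin(<bin_num>)
    return 2 * m_bin(prefix) + 1  # Mbin(<bin_num>1) = 2 * Mbin(<bin_num>) + 1


print(m_bin("101"))  # 5
print(all(m_bin(format(n, "b")) == n for n in range(100)))  # True
```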
Expressions:
    Me(E, s) = if VARMAP(i, s) = undef for some i in E
               then error
               else E', where E' is the result of evaluating E after setting each variable i in E to VARMAP(i, s)
Expression E, in state s, is an error if some variable i in E is undefined; otherwise its value is E', the value of evaluating E by applying each variable i and operator in E using the VARMAP (symbol table) currently in s.
Assignment and Loop
Assignment Statements:
    Ma(x = E, s) = if Me(E, s) = error
                   then error
                   else s' = {<i1,v1'>, <i2,v2'>, ..., <in,vn'>}, where for j = 1, 2, ..., n,
                        vj' = VARMAP(ij, s)   if ij <> x
                            = Me(E, s)        if ij = x
Here, the state of the machine is an error if there is an error when evaluating E in s; otherwise the state of the machine is modified so that x is now equal to the value of E, but all other variables in s remain the same.

If B, when evaluated in machine state s, is undefined, then we have an error; otherwise, if B evaluates to false, the state remains s; otherwise the state becomes the state after L is executed, that is, whatever the function Msl(L, s) returns, and the loop is then applied again to that new state. Since L is some unspecified statement list, we don't know exactly what will happen.
    Ml(while B do L, s) = if Mb(B, s) = undef
                          then error
                          else if Mb(B, s) = false
                          then s
                          else if Msl(L, s) = error
                          then error
                          else Ml(while B do L, Msl(L, s))
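Finally, the three denotational functions can be transcribed into Python as a sketch. The assumptions are loud ones: a state is a dict from variable names to values (a missing key means undef), expressions are nested tuples such as ("+", "total", "i"), a loop body is a list of assignments only, and a single sentinel object stands for error. Mb is simply Me applied to a comparison.

```python
import operator


class _Error:
    """The single value standing for 'error' in the equations."""

    def __repr__(self) -> str:
        return "error"


ERROR = _Error()
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, "!=": operator.ne}


def m_e(expr, s):
    """Me(E, s): error if a variable in E is undefined in s, otherwise E's value."""
    if isinstance(expr, str):  # a variable i: VARMAP(i, s)
        return s.get(expr, ERROR)
    if isinstance(expr, int):  # a literal
        return expr
    op, left, right = expr
    lv, rv = m_e(left, s), m_e(right, s)
    if lv is ERROR or rv is ERROR:
        return ERROR
    return OPS[op](lv, rv)


def m_a(var, expr, s):
    """Ma(x = E, s): error if Me(E, s) is error, else s with only x changed."""
    value = m_e(expr, s)
    if value is ERROR:
        return ERROR
    return {**s, var: value}  # every other VARMAP(ij, s) is copied unchanged


def m_sl(body, s):
    """Msl for a statement list that contains only assignments (a simplification)."""
    for var, expr in body:
        s = m_a(var, expr, s)
        if s is ERROR:
            return ERROR
    return s


def m_l(cond, body, s):
    """Ml(while B do L, s), recursive exactly like the equation."""
    b = m_e(cond, s)  # Mb(B, s)
    if b is ERROR:
        return ERROR
    if not b:
        return s
    after = m_sl(body, s)
    if after is ERROR:
        return ERROR
    return m_l(cond, body, after)


body = [("total", ("+", "total", "i")), ("i", ("+", "i", 1))]
print(m_l(("<", "i", 5), body, {"i": 0, "total": 0}))    # {'i': 5, 'total': 10}
print(m_l(("<", "i", "n"), body, {"i": 0, "total": 0}))  # error (n is undefined)
```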