Syntax and Semantics presentation

About This Presentation

Transcript and Presenter's Notes

Title: Syntax and Semantics

1
Syntax and Semantics

Lexeme
lowest level syntactic unit in the language
Token
a language category for the lexemes

Syntax
form or structure of expressions or statements
for a given language
For instance, in Java, the form of a while loop
is
while(ltbool_expr) ltstmtgt
Semantics
meaning of the expressions
Language
group of words that can be combined and the rules
for combining those words
Sentence
a legal statement in the language

Example (in Java) index 2 count
17 Lexeme Token index
identifier assignment_operator
2 integer_literal
mult_operator count identifier
addition_operator 17
integer_literal
semicolon
2
Languages

Language Recognizer
given a sentence, is it in the given language?
Language Generator
given a language, create legal and meaningful
sentences
We can build a language recognizer if we already
have a language generator
Grammar
a description of a language - can be used for
generation or, given the grammar, a language
recognizer (known as a parser) can be created

We classify languages into one of four
categories
Regular
Context-Free
Context-Sensitive
Recursively Enumerable
Here, we are interested in the context-free
grammar
these include those which can be generated from a
language generator
all natural languages and programming languages
fall into this category
You will study the other languages in more detail
in 485

3
BNF (Backus Naur Form)

Equivalent to a context-free language
BNF is a notation (or a meta-language) used to
specify the grammar of a language
The BNF can then be used for language generation
or recognition
BNF uses rules to map non-terminal symbols
(tokens) into other non-terminals and terminals
(lexemes)
We define a BNF Grammar as
Galphabet, rules, ltstartgt
alphabet consists of those symbols used in the
rules
both terminal symbols and non-terminal symbols,
non-terminal symbols are placed in lt gt
rules map from a non-terminal to other elements
in the alphabet
for instance, a rule might say ltAgt ? altBgt bltAgt
rules can be recursive as shown above where an
ltAgt can be applied to generate a terminal (b) and
another ltAgt
ltstartgt is a non-terminal which is the starting
point for a language generator and must be on at
least 1 rules left hand side

4
Examples of Grammar Rules
ltprogramgt -gt begin ltstmt_listgt end Notice the
use of both terminals and non-terminals on the
right side Recursion is used as
necessary ltident_listgt -gt ident ident,
ltident_listgt The symbol means or so that an
ltident_listgt can be a single ident, or an ident,
a comma, and more of an ident_list Other
examples are ltassigngt -gt ltvargt
ltexpressiongt ltif_stmtgt -gt if ltlogical_exprgt then
ltstmtgt if ltlogical_exprgt then ltstmtgt else
ltstmtgt
5
Example Grammar
ltprogramgt -gt begin ltstmt_listgt end ltstmt_listgt -gt
ltstmtgt ltstmtgt ltstmt_listgt ltstmtgt -gt ltvargt
ltexprgt ltvargt -gt a b c d ltexprgt -gt ltvargt
ltvargt ltvargt - ltvargt ltvargt
This grammar can be used to generate a program
(granted, a program that will only consist of
assignment statements) or we can use the grammar
to generate a recognizer to recognize if a given
program is syntactically valid in this
language A derivation is a generation, starting
at the ltstartgt symbol (in this case, the start
symbol is ltprogramgt) and applying rules until all
non-terminal symbols have been removed from the
generated sentence. Such a sentence will be a
legal sentence in the language
6
A derivation from the grammar
ltprogramgt gt begin ltstmt_listgt end gt
begin ltstmtgt ltstmtgt_listgt end gt begin
ltvargt ltexpressiongt ltstmt_listgt end gt
begin A ltexpressiongt ltstmt_listgt end gt
begin A ltvargt ltvargt ltstmt_listgt end
gt begin A B ltvargt ltstmt_listgt end gt
begin A B C ltstmt_listgt end gt begin
A B C ltstmtgt end gt begin A B C
ltvargt ltexpressiongt end gt begin A B C B
ltexpressiongt end gt begin A B C B
ltvargt end gt begin A B C B C end So, the
program begin A B C B
C end is legal
We generate a leftmost derivation, where the next
rule applied is applied to the leftmost non-termin
al symbol (which is why we worked on the first
assignment statement before we generated the
second ltstmtgt)
7
Parse Trees

A hierarchical structure displaying the
derivation of an expression in some grammar
Leaf nodes are terminals, non-leaf nodes are
non-terminals
Parser
Process which takes a sentence and breaks it into
its component parts, deriving a parse tree
If the parser cannot generate a parse tree, then
the sentence is not legal
If the parser can generate two or more parse
trees for the same sentence, then the grammar is
ambiguous

8
Grammar and Parse Tree
ltassigngt ? ltidgt ltexprgt ltidgt ? A B C ltexprgt
? ltidgt ltexprgt ltidgt ltexprgt (ltexprgt)
ltidgt
ltassigngt ltidgt ltexprgt
A ltidgt ltexprgt B
( ltexprgt ) ltidgt
ltexprgt A ltidgt C
Parse tree for the derivation ltassigngt ? ltidgt
ltexprgt ? A ltexprgt ? A ltidgt ltexprgt ? A B
ltexprgt ? A B (ltexprgt) ? A B (ltidgt
ltexprgt) ? A B (A ltexprgt) ? A B (A
ltidgt ) ? A B (A C)
9
An Ambiguous Grammar

ltassigngt ? ltidgt ltexprgt
ltidgt ? A B C
ltexprgt ? ltexprgt ltexprgt ltexprgt
ltexprgt (ltexprgt) ltidgt
With this grammar, the sentence
ltassigngt ? A B A C
has two distinct parse trees
see next slide
The reason this is important is that the second
parse tree represents an interpretation of the
expression where has higher precedence than
which would give us an incorrect answer

10
Two Parse Trees for A B A C
ltassigngt ltidgt ltexprgt A ltexprgt
ltexprgt ltidgt ltexprgt ltexprgt B
ltidgt ltidgt A
C
The lower down the operator in the parse tree,
the higher its precedence so on the left, has a
higher precedence than (which is as it should
be) but the tree on the right is incorrect,
essentially being A (B A) C even though
there are no parentheses specified to alter the
precedence
11
An Unambiguous Grammar

ltassigngt ? ltidgt ltexprgt
ltidgt ? A B C
ltexprgt ? ltexprgt lttermgt lttermgt
lttermgt ? lttermgt ltfactorgt ltfactorgt
ltfactorgt ? (ltexprgt) ltidgt

Here, we force operator precedence by making
a multiplication occur lower in the tree by
deriving it through an additional rule ( ) having
the highest precedence requires the most
derivation to get to it
Assume we wanted to add another operator, unary
(as in -5), how would we add it? What about
adding (exponent)?
12
Derivation and Parse Tree of the Unambiguous
Grammar
ltassigngt ltidgt ltexprgt A
ltexprgt lttermgt lttermgt
lttermgt ltfactorgt ltfactorgt ltfactorgt
ltidgt ltidgt ltidgt
A B
C
ltassigngt ? ltidgt ltexprgt ? A ltexprgt ? A
ltexprgt lttermgt ? A lttermgt lttermgt ? A
ltfactorgt lttermgt ? A ltidgt lttermgt ? A B
lttermgt ? A B lttermgt ltfactorgt ? A B
ltfactorgt ltfactorgt ? A B ltidgt ltfactorgt ?
A B C ltfactorgt ? A B C ltidgt ? A B
C A
13
Associativity

We maintain operator precedence through
additional rules whereby higher precedence
operators appear after the application of more
rules
Should we worry about associativity? Notice that
A B C C A B, should we force them to
generate the same parse tree?
It doesnt seem worthwhile, and yet if A, B and C
are floats instead of ints, then A B C may
not equal C A B, so associativity should be
preserved
How?
We will require that all rules in our BNF be left
recursive for left associativity and right
recursive for right associativity
Left recursive means that a recursive
non-terminal must appear to the left of any other
non-terminals
ltexprgt ? ltexprgt lttermgt is left recursive and
ltfactorgt ? ltexprgt ltfactorgt is right recursive

14
Ambiguous If-Then-Else

ltif_stmtgt ? if ltlogical exprgt then ltstmtgt
if ltlogical exprgt then ltstmtgt else ltstmtgt
Since a ltstmtgt could be another ltif_stmtgt we
could generate
if X gt 0 then if Y gt 0 then X0 else XY
The problem is that this is ambiguous
Is the else the alternative to the first then or
the second then (that is, which condition does
the else get attached to?)
We could use to remove the ambiguity but it
is better to create an unambiguous grammar

15
Unambiguous If-Then-Else

ltstmtgt ? ltmatchedgt ltunmatchedgt
ltmatchedgt ? if ltlogical exprgt then ltmatchedgt
else ltmatchedgt any non-if statement
ltunmatchedgt ? if ltlogical exprgt then ltstmtgt
if ltlogical exprgt then ltmatchedgt else
ltunmatchedgt

Here, an if-then with a nested if-then-else is
allowed, but an if-then-else where the
then-clause contains an if-then is not allowed
(the then and else clauses must be matched, which
means either another if-then-else, or a non-if
statement In this way, any else clause is always
associated with the most recent then clause Most
languages follow this grammar, or require
explicit delimiters (like )
16
Extended BNF Grammars
Here, we revise our ltexprgt portion of the grammar
to illustrate how much easier it is to notate the
grammar using EBNF

3 common extensions to BNF
- used to denote optional elements (saves
some space so that we dont have to enumerate
options as separate possibilities)
- used to indicate 0 or more instances
( ) - for a list of choices
These extensions are added to a BNF Grammar for
convenience allowing us to shorten the grammar

BNF
ltexprgt ? ltexprgt lttermgt
ltexprgt - lttermgt
lttermgt
lttermgt ? lttermgt ltfactorgt
lttermgt / ltfactorgt
ltfactorgt
ltfactorgt ? ltexpgt ltfactorgt
ltexpgt
ltexpgt ? (ltexprgt) ltidgt
EBNF
ltexprgt ? ltexprgt ( -) lttermgt
lttermgt ? lttermgt ( /) ltfactorgt
ltfactorgt ? ltexpgt ltfactorgt
ltexpgt ? (ltexprgt) ltidgt

17
Attribute Grammars

It is not possible to describe all aspects of a
language solely with a BNF Grammar
BNF Grammar lacks static semantics (that is,
rules that the language dictates for a program to
be syntactically correct)
For example
Making sure that the number of parameters in a
function call match the number of parameters in
the function header
Making sure in an assignment statement that the
left hand side type matches (or is compatible
with) the value computed by the right hand sides
expression
Attribute grammars are added to BNF grammars to
handle these gaps
We will add attributes and predicate functions to
every BNF grammar rule the attributes will
store such information as variable type or number
of parameters and the predicates will test to
make sure that the attributes match accordingly

18
Attribute Grammar Features

Synthesized attributes
information passed up the parse tree
Inherited attributes
information passed down the parse tree
Semantic functions
rules or predicates associated with grammar rules
that compare synthesized attributes to inherited
attributes
if any predicate fails its test, then we have a
syntax error
Intrinsic attributes
leaf node attributes derived when a rule
generates a terminal
for instance, if lttypegt ? int, then the attribute
for the declared variable receives its intrinsic
value (whatever value we use to denote that the
variable is an int)

19
Example Deriving Identifiers

In some languages, identifier names are limited
Here, we look at Pascal where an identifier name
must start with a letter or _, consist of _,
letters and numbers, and be less than or equal to
31 characters in length
Our BNF rule for deriving an identifier is
ltidentifiergt ? _ltidgt ltlettergtltidgt
ltidgt ? _ ltlettergt ltdigitgt _ltidgt
ltlettergtltidgt ltdigitgtltidgt
We enhance our grammar with the attribute length
ltidentifiergt ? _ltidgt ltlettergtltidgt
ltidentifiergt.length 1
ltidgt ? _ ltlettergt ltdigitgt _ltidgt
ltlettergtltidgt ltdigitgtltidgt
ltidentifiergt.length ? ltidentifergt.length 1
Predicate ltidentifergt.length lt 31

20
Example Assignment Stmt

Our grammar now becomes
1. Syntax rule ltassigngt ? ltvargt ltexprgt
Semantic rule ltexprgt.expected_type ?
ltvargt.actual_type
2. Syntax rule ltexprgt ? ltvargt2 ltvargt3
Semantic rule ltexprgt.actual_type ? if
(ltvargt2.actual_type int) and
(ltvargt3.actual_type int) then int else real
Predicate ltexprgt.actual_type
ltexprgt.expected_type
3. Syntax rule ltexprgt ? ltvargt
Semantic rule ltexprgt.actual_type ?
ltvargt.actual_type
Predicate ltexprgt.actual_type
ltexprgt.expected_type
4. Syntax rule ltvargt ? A B C
Semantic rule ltvar.actual_type ?
look-up(ltvargt.string)

Attributes
Actual_Type
synthesized for ltvargt and ltexprgt, stores the type
Expected_Type
inherited for ltexprgt based on the ltvargt type
LHS_Type
synthesized for ltassigngt
Env
inherited for ltassigngt, ltexprgt, ltvargt carrying
the reference to the symbol table

21
Example
ltassigngt ltexprgt ltvargt ltvargt2
ltvargt3 A A B
expected_type
Assume A is a float and B is an
int ltvargt.actual_type float ltvargt2.actual_ty
pe float ltvargt3.actual_type
int ltexprgt.actual_type float (derived
from var2 and var3 through semantic
rule) ltexprgt.expected_type float
(inherited from ltassigngt which is
inherited from ltvargt) ltexprgt.expected_type
ltexprgt.actual_type, so predicate is
satisifed, no syntax error
actual_type
actual_type
actual_type
ltexprgt.expected_type ? inherited from
parent ltvargt1.actual_type ? lookup
(A) ltvargt2.actual_type ? lookup
(B) ltvargt1.actual_type ? ltvargt2.actual_type
ltexprgt.actual_type ? ltvargt1.actual_type ltexprgt.a
ctual_type ? ltexprgt.expected_type
22
Dynamic Semantics

Describing the meaning of a program or of a
statement or group of statements
Describing the syntax of a language or of a set
of code is relatively easy, how do we describe
the meaning behind code?
We could express it in English (e.g., through
comments) but this is too informal and perhaps
too incomplete/imprecise
What if we want to use the semantics to make sure
that the program does what is intended? This is
known as verification. We would need more formal
methods of defining semantics for this, so we
turn to
Operational Semantics
how the statement will be executed
Axiomatic Semantics
what results to expect from the statement
Denotational Semantics
functional way of mapping the affects of a
statement

23
Operational Semantics

This can be thought of as tracing through a
program to see what affects an instruction will
have
Implemented as an interpreter or compiler or
assembler
that is, how will the computer execute this
instruction?
This is simply a mechanistic description of the
statement and does not necessarily help us
understand the statement

Example C for-loop for(expr1 expr2 expr3)
stmt Becomes expr1 loop if expr2
0 goto out stmt expr 3 goto loop out
24
Axiomatic Semantics

Used mainly to prove correctness of code
Each statement in the language has associated
assertions what we expect to be true before and
after the statement executes
We list these assertions as pre- and
post-conditions that specify how the machine
changes (changes to variables)
Given the state of the machine prior to executing
a statement, we can then determine what must be
true afterward
The basic form of an axiomatic semantic is P
S Q
This is interpreted as
if P is true before S, then Q is true after S
We must now define how to determine Q given P and
S

25
Weakest pre-condition

We will start with a given post-condition and
derive the weakest pre-condition
We work backwards mainly because we will start
with an overall goal in mind for the given
statement or program
We want to derive the weakest pre-condition for a
given post-condition because this is the least
restrictive pre-condition that will guarantee
validity
Weakest means most general what is the greatest
range of values for a given variable such that
the result will be true?
For example, consider the assignment statement
sum 2x1
with post-condition sum gt 1
Possible pre-conditions are x gt 10, x gt 50
and x gt 1000
But the weakest pre-condition is x gt 0

26
Assignment Statement Rule

We will use the following notation for an
assignment statement axiomatic rule
Qx?E x E Q
This is read as follows
If Q is true after the assignment, then Qx?E is
true prior
The notation Qx?E means to replace all instances
of x in Q with E
Examples
ab/2-1 a lt 10
We replace a in a lt 10 with b / 2 1 and solve
for b, thus Qx?E is b / 2 1 lt 10 or b lt
22
So we have b lt 22 a b / 2 1 a lt 10
that is, if b lt 22 prior to the assignment
statement, then a will be less than 10 afterward
x 2 y 3 x gt 25
pre-condition is 2 y 3 gt 25 or y gt 14
c d e 4 c gt 0
pre-condition is d e 4 gt 0 or d e gt 4,
we might want to list this as d gt 4 / e or e gt
4 / d, or even d gt 4 / e d ! 0 e ! 0

27
Sequences

In general, a series of statements S1, S2, S3,
..., Sn can be expressed as
P S1 Q1 Q1 S2 Q2 Q2 S3 Q3 ...
Qn Sn Q
This can be simplified to P S1, S2, S3, ...,
SnQ
Therefore, we can combine rules to show the
axiomatic semantics of a block of code
Example
y 3 x 1 x y 3
If our post-condition is x lt 10 then our
pre-condition between the two statements is y3
lt 10 or y lt 7 and our pre-condition before the
first statement is 3 x 1 lt 7 or x lt 2
If x lt 2 before the first statement, then x lt 10
after the second statement

28
Rule of Consequence

In the previous example, we had a sequence of 2
assignment statements, but this works in general
with any number of statements of any kind
The rule of consequence is shown as follows
The rule means that if P implies P and Q implies
Q and we have already proven that P S Q is
true, then we can infer P S Q is also true
notice that P gt P means that P is a weaker
condition than P whereas Q gt Q means that Q is
a stronger condition than Q
this allows us to take a proven rule and weaken
its postcondition and strengthen its precondition

gt means implies
29
Selection Axiomatic Semantic

Given a statement if (B) S1 else S2
The semantic rule is B P S1 Q, (!B)
P S2 Q
if Q is our post-condition, then we have two
pre-conditions, if the if statements condition
is true (B) then B P, and if the if statements
condition is false (Not B) then !B P, so we
must derive P that will allow the same
post-condition no matter if B or !B is true
Example
if (x gt 0) y-- else y
Suppose the post-condition is y gt 0
the pre-condition for the if-clause is y gt 1
the pre-condition for the else-clause is y gt -1
the condition y gt 1 is subsumed by the
condition y gt -1 (that is, if y gt 1 is true,
then y gt -1 must also be true
So, we select y gt 1 as our weakest
pre-condition
we cannot use y gt -1 because, if x gt 0 and y
-1, our post-condition is not true

30
Logical Pretest Loops

Our pre-test loop looks like this
while (B) S
We must derive a pre-condition that is true prior
to the loop whether it B is true or not, but also
the pre-condition must remain true if B is true
and S is executed that is, P must be true prior
to each loop iteration
To derive P, we create I, a loop invariant
The invariant will always be true both before and
after each loop iteration
The pre-condition must include I and the
post-condition must include I and Not B
NOTE determine a loop invariant is difficult
and does not necessarily seem to help us
understand the loop
For these reasons, we will go over an example,
but not cover this in any more detail

31
While-Loop Example

Our loop is
while (y ! x) y
Our post-condition is y x
the post-condition states that the condition (y
! x)is false
The pre-condition must include the condition (y!
x) and the loop invariant
what is the invariant? We need to select
something that will be true both before and after
each loop iteration
notice that y initially will not equal x and then
we add 1 to x, so that y lt x or y x after each
loop iteration
we cannot have y gt x before the loop because this
would be an infinite loop and that would result
in the post-condition never being true since
the post-condition must be true, y gt x can not be
true beforehand
either y lt x or y x will be true, our
loop-invariant is y lt x

32
Two More Loop Examples
while s gt 1 do s s / 2 end s 1 What is the
weakest precondition? (wp) Lets apply the loop
one time if s gt 1 then for s 1 afterward,
we would have s s / 2 s 1 our wp
is s 2 for 2 iterations, we would then
have s s / 2 s 2 so our wp is s
4 for 3 iterations, we would then have s
s / 2 s 4 so our wp is s 8 We can now
derive the invariant as being s is a
non-negative power of 2 or s 2n for n gt 0
The following code computes z x y assuming y
is positive z0 ny while(ngt0)
zx n-- So, our
post-condition is zxy y gt 0
n0 where n0 is NOT B and zxy is P. I, our
loop invariant, is not merely y gt 0 however.
If we analyze each loop iteration for z and n,
we find that zx(y-n) and ngt0
A precondition then is s gt 1 and s 2n but
this is not the weakest, we can make it weaker by
using s gt 1
33
Program Proofs

As you can see by the last example, finding an
invariant is not necessarily easy
the invariant must include the loops terminating
condition but also be weak enough to describe
what happens during each loop iteration
in using axiomatic semantics for a loop, the
requirement that the invariant include the
terminating condition is often ignored
in such a case, the axiomatic description is
known as offering only partial correctness rather
than total correctness if the terminating
condition is met
By combining these axiomatic rules, we can prove
the correctness of an entire program
consider the example below which swaps two
variable values
our precondition requires that the two variable
values have in fact been swapped, now we will
prove it

P t x x y y t x B AND y A
P t x P1, P1 x y P2, P2 y t x
B AND y A
P2 is x B AND t A
P1 is y B AND t A
P is y B AND x A
and y B AND x A gt x A AND y B

x A AND y B t x x y y t x B AND
y A
The chapter offers a more complete example if you
are interested
34
Denotational Semantics

This form of semantics is a more rigorous method
of describing the meaning of a program than our
previous approaches
Denotational semantics is based on recursive
function theory
That is, derive a function that defines the
affects of an instruction
Because the function is recursive, this tends to
be a very difficult topic, probably the hardest
thing when studying programming languages
In essence, the function will map from an
instance of a mathematical object (the state of
the machine) onto another mathematic object
so this is telling us what happens to the machine
after applying an instruction (or program)
We will look at an example of a recursive
function and then apply the idea to 3 types of
instructions

35
Simple Examples

We define the value of a binary number
ltbin_numgt ? 0 1 ltbin_numgt0 ltbin_numgt1
that is, a binary number is a 0, a 1, or
recursively a binary number followed by a 0 or a
binary number followed by a 1
the function must map from a binary number
derived from the above grammar rule to a
mathematical object (an integer value)
Mbin(ltbin_num)
Mbin(0) 0
Mbin(1) 1
Mbin(ltbin_numgt0)2Mbin(ltbin_numgt)
Mbin(ltbin_numgt1)2Mbin(ltbin_numgt) 1
Mbin(101) 2Mbin(10) 1 2(2Mbin(1))1
2211 5

Expressions Me (E, s) if VARMAP(i, s)
undef for some i in E then error else
E, where E is the result of evaluating
evaluating E after setting each variable
i in E to VARMAP(i, s)
Expression E, in state s, is an error if some i
(variable) in E is undefined, otherwise it is
E value of evaluating E by applying each
variable I and operator in E using VARMAP (symbol
table) currently in s
36
Assignment and Loop
Assignment Statements Ma(x E, s) if
Me(E, s) error then error else s
lti1,v1gt,lti2,v2gt,...,ltin,vngt, where
for j 1, 2, ..., n, vj VARMAP(ij, s)
if ij ltgt x Me(E, s)
if ij x
Here, the state of the machine is an error if
there is an error when evaluating E in s,
otherwise the state of the machine is modified
where x is now equal to E, but all
other variables in s remain the same
If B, when evaluating given the state of the
machine s, is undefined, then we have an error,
otherwise if B evaluates to False, then the state
remains s, otherwise the state becomes the state
when L is executed, so the state of the machine
changes to be whatever the function M(L, s)
returns Since L is some non-specified
statement, we dont know exactly what will happen
Ml(while B do L, s) ? if Mb(B, s) undef
then error else if Mb(B, s) falsethen
s else if Msl(L, s) error
then error else
Ml(while B do L, Msl(L, s))

Write a Comment

User Comments (0)

About PowerShow.com

Syntax and Semantics PowerPoint PPT Presentation