Title: Chapter 3: Describing Syntax and Semantics
Objective
- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics
3.1 Introduction
- Who must use language definitions?
- Other language designers
- Implementers
- Programmers (the users of the language)
- Syntax: the form or structure of the expressions, statements, and program units.
- Semantics: the meaning of the expressions, statements, and program units.
3.2 Describing Syntax
- A sentence is a string of characters over some alphabet.
- A language is a set of sentences.
- A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin).
- A token is a category of lexemes (e.g., identifier).
- Formal approaches to describing syntax:
- Recognizers: used in compilers (see Chapter 4).
- Generators: what we'll study in this chapter.
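The lexeme/token distinction above can be sketched in a few lines of Python. This is only an illustration: the token categories, their regular expressions, and the name `tokenize` are assumptions made for the example, not part of the slides.

```python
import re

# Hypothetical token categories, chosen only for illustration.
TOKEN_SPEC = [
    ("keyword", r"\b(?:begin|end)\b"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("int_literal", r"\d+"),
    ("mult_op", r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs; characters matching no category are skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]


print(tokenize("begin sum * 2 end"))
```

Each pair couples a category (the token) with the actual characters found (the lexeme), so `sum` is a lexeme whose token is `identifier`.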
3.3 Formal Methods of Describing Syntax
- Context-Free Grammars
- Developed by Noam Chomsky in the mid-1950s
- Language generators, meant to describe the syntax of natural languages
- Define a class of languages called context-free languages.
3.3 Formal Methods of Describing Syntax (cont.)
- Backus-Naur Form (1959)
- Invented by John Backus to describe Algol 58.
- BNF is equivalent to context-free grammars.
- A metalanguage is a language used to describe another language.
- In BNF, abstractions are used to represent classes of syntactic structures. They act like syntactic variables (also called nonterminal symbols).
- <while_stmt> → while ( <logic_expr> ) <stmt>
- This is a rule; it describes the structure of a while statement.
3.3 Formal Methods of Describing Syntax (cont.)
- A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols.
- An abstraction (or nonterminal symbol) can have more than one RHS:
- <stmt> -> <single_stmt> | begin <stmt_list> end
- Syntactic lists are described using recursion:
- <ident_list> -> ident | ident, <ident_list>
- A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).
3.3 Formal Methods of Describing Syntax (cont.)
- An example grammar
- <program> -> <stmts>
- <stmts> -> <stmt> | <stmt> ; <stmts>
- <stmt> -> <var> = <expr>
- <var> -> a | b | c | d
- <expr> -> <term> + <term> | <term> - <term>
- <term> -> <var> | const
- An example derivation
- <program> => <stmts> => <stmt>
- => <var> = <expr> => a = <expr>
- => a = <term> + <term>
- => a = <var> + <term>
- => a = b + <term>
- => a = b + const
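The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the example grammar uses `;`, `=`, `+`, and `-` as terminals and `|` to separate alternatives. The dictionary layout and the name `leftmost_derivation` are illustrative choices, not something the slides define.

```python
# Example grammar encoded as nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the RHS alternative to apply.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


for line in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", line)
```

The last line printed is `=> a = b + const`, a sentence, because it contains only terminal symbols.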
3.3 Formal Methods of Describing Syntax (cont.)
- Every string of symbols in the derivation is a sentential form.
- A sentence is a sentential form that has only terminal symbols.
- A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.
- A derivation may be neither leftmost nor rightmost.
3.3 Formal Methods of Describing Syntax (cont.)
- A parse tree is a hierarchical representation of a derivation.
- A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees.
3.3 Formal Methods of Describing Syntax (cont.)
- A parse tree for the simple statement A = B * (A + C)
3.3 Formal Methods of Describing Syntax (cont.)
- An ambiguous expression grammar
- <expr> -> <expr> <op> <expr> | const
- If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity.
3.3 Formal Methods of Describing Syntax (cont.)
- Two distinct parse trees for the same sentence, A = B + C * A
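To see how quickly ambiguity grows, the sketch below counts the parse trees that the ambiguous rule `<expr> -> <expr> <op> <expr> | const` allows for a chain of operands. It assumes the operators in the sentence are fixed and only the tree shape varies. The function name is an illustrative choice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(num_operands):
    """Count parse trees for `const op const ... const` with `num_operands` operands."""
    if num_operands == 1:
        return 1  # the only option is <expr> -> const
    # Pick which operator sits at the root; the operands split around it.
    return sum(
        count_parse_trees(left) * count_parse_trees(num_operands - left)
        for left in range(1, num_operands)
    )


for n in range(1, 6):
    print(n, count_parse_trees(n))
```

With three operands the count is 2, which corresponds to two distinct trees for one sentence. With four operands it is already 5.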
3.3 Formal Methods of Describing Syntax (cont.)
- An unambiguous expression grammar
3.3 Formal Methods of Describing Syntax (cont.)
- Operator associativity can also be indicated by a grammar.
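As a rough illustration of how the shape of a rule fixes associativity, the sketch below evaluates a chain of subtractions under two rule shapes. Both rules, `<expr> -> <expr> - const | const` (left recursive) and `<expr> -> const - <expr> | const` (right recursive), are assumptions made for this example, since the slide's own grammar figure is not reproduced in the transcript.

```python
def eval_left(tokens):
    """Left-recursive shape <expr> -> <expr> - const | const, unrolled into a loop."""
    value = int(tokens[0])
    for i in range(1, len(tokens), 2):
        assert tokens[i] == "-"
        value -= int(tokens[i + 1])  # groups as ((a - b) - c)
    return value


def eval_right(tokens):
    """Right-recursive shape <expr> -> const - <expr> | const."""
    if len(tokens) == 1:
        return int(tokens[0])
    assert tokens[1] == "-"
    return int(tokens[0]) - eval_right(tokens[2:])  # groups as (a - (b - c))


tokens = "10 - 4 - 3".split()
print(eval_left(tokens), eval_right(tokens))
```

The same sentence yields 3 under the left-recursive rule and 9 under the right-recursive one, so the grammar, not the sentence, decides the grouping.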
3.3 Formal Methods of Describing Syntax (cont.)
- Extended BNF (just abbreviations)
- Optional parts are placed in brackets ([ ]):
- <proc_call> -> ident [ ( <expr_list> ) ]
- Put alternative parts of RHSs in parentheses and separate them with vertical bars:
- <term> -> <term> (+ | -) const
- Put repetitions (0 or more) in braces ({ }):
- <ident> -> letter {letter | digit}
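One way to read the EBNF repetition is as a loop. The following hedged sketch recognizes the rule `<ident> -> letter {letter | digit}`. The function name is hypothetical, and Python's `str.isalpha` and `str.isdigit` also accept non-ASCII letters and digits, which the slide does not address.

```python
def is_ident(text):
    """Recognize <ident> -> letter {letter | digit}."""
    if not text or not text[0].isalpha():
        return False  # the mandatory leading letter is missing
    for ch in text[1:]:  # the braces become a loop: zero or more repetitions
        if not (ch.isalpha() or ch.isdigit()):
            return False
    return True


print(is_ident("x1"), is_ident("1x"), is_ident("a_b"))
```

The optional bracket form would map to an `if` in the same style, and the parenthesized alternatives to a membership test.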
3.3 Formal Methods of Describing Syntax (cont.)
- Syntax Graphs
- Put the terminals in circles or ellipses and put the nonterminals in rectangles; connect them with lines with arrowheads (e.g., Pascal type declarations).
Attribute Grammar
- CFGs cannot describe all of the syntax of programming languages.
- Attribute grammars describe more of the structure of a programming language than CFGs can.
- An attribute grammar is an extension of a CFG.
- The extension allows certain language rules to be described, such as type compatibility.
Static semantics
- The static semantics of a language are nonsyntactic rules that define the legality of a program and can be analyzed at compile time.
- Consider a rule which states that a variable must be declared before it is referenced.
- Cannot be specified in a context-free grammar.
- Can be tested at compile time.
- Some rules can be specified in the grammar of a language, but would unnecessarily complicate the grammar.
- E.g., a rule in Java that states that a string literal cannot be assigned to a variable that was declared to be of type int.
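The declared-before-referenced rule is easy to check with a single pass once a program is reduced to a list of events. The sketch below assumes a hypothetical, simplified program representation of `("declare", name)` and `("use", name)` pairs. A real compiler would walk a syntax tree and track scopes.

```python
def undeclared_uses(statements):
    """Return the names that are referenced before any declaration, in order."""
    declared = set()
    errors = []
    for kind, name in statements:
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors


program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y")]
print(undeclared_uses(program))
```

The check needs no execution of the program, which is what makes the rule static.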
Basic Concepts
- Attribute grammars are grammars to which attributes, attribute computation functions, and predicate functions have been added.
- Attributes: associated with grammar symbols; they are similar to variables in the sense that they can have values assigned to them.
- Attribute computation functions (semantic functions): associated with grammar rules; they specify how attribute values are computed.
- Predicate functions: associated with grammar rules; they state some of the syntax and static semantic rules of the language.
Attribute grammars defined
- Def: An attribute grammar is a CFG with the following additions:
- For each grammar symbol x there is a set A(x) of attribute values.
- Each rule has a set of functions that define certain attributes of the nonterminals in the rule.
- Each rule has a set of predicates to check for attribute consistency.
Attribute Grammars (continued)
- Associated with each grammar symbol X is a set of attributes A(X).
- A(X) consists of two disjoint sets S(X) and I(X), called synthesized and inherited attributes.
- Synthesized attributes are used to pass semantic information up a parse tree.
- Inherited attributes pass semantic information down a parse tree.
- Let X0 → X1 ... Xn be a rule.
- Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes.
- Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes.
Attribute Grammars (continued)
- Example: expressions of the form id + id
- id's can be either int_type or real_type
- types of the two id's must be the same
- type of the expression must match its expected type
- BNF:
- <expr> → <var> + <var>
- <var> → id
- Attributes:
- actual_type: a synthesized attribute associated with the nonterminals <var> and <expr>
- expected_type: an inherited attribute associated with the nonterminal <expr>
Example
- Syntax rule: <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2]
- Semantic rule: <proc_name>[1].string == <proc_name>[2].string
- <assign> → <var> = <expr>
- <expr> → <var> + <var> | <var>
- <var> → A | B | C
An attribute grammar for simple assignment statements
- 1. Syntax rule: <assign> → <var> = <expr>
- Semantic rule: <expr>.expected_type ← <var>.actual_type
- 2. Syntax rule: <expr> → <var>[2] + <var>[3]
- Semantic rule: <expr>.actual_type ←
- if (<var>[2].actual_type = int) and
- (<var>[3].actual_type = int)
- then int
- else real
- end if
- Predicate: <expr>.actual_type == <expr>.expected_type
- 3. Syntax rule: <expr> → <var>
- Semantic rule: <expr>.actual_type ← <var>.actual_type
- Predicate: <expr>.actual_type == <expr>.expected_type
- 4. Syntax rule: <var> → A | B | C
- Semantic rule: <var>.actual_type ← look-up(<var>.string)
- The look-up function looks up a given name in the symbol table and returns the variable's type.
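The four rules above can be traced with a small Python sketch. It is a simplification under stated assumptions: the symbol table contents are invented for the example, the parse tree is flattened to a target name plus a list of one or two operand names, and the helper names are hypothetical.

```python
# Hypothetical symbol table; the slide's look-up function is modeled as a dict lookup.
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}


def lookup(name):
    """Rule 4: <var>.actual_type <- look-up(<var>.string)."""
    return SYMBOL_TABLE[name]


def expr_actual_type(operands):
    """Rules 2 and 3: synthesize <expr>.actual_type from its <var> children."""
    types = [lookup(name) for name in operands]
    if len(types) == 1:
        return types[0]  # rule 3: a single <var>
    return "int" if types == ["int", "int"] else "real"  # rule 2: <var> + <var>


def check_assignment(target, operands):
    """Rule 1 hands the target's type down as expected_type; the predicate compares."""
    expected_type = lookup(target)  # inherited by <expr>
    actual_type = expr_actual_type(operands)  # synthesized by <expr>
    return actual_type == expected_type


print(check_assignment("A", ["A", "B"]))  # A = A + B
print(check_assignment("A", ["A", "C"]))  # A = A + C
```

The first call satisfies the predicate (int on both sides). The second does not, because the expression's actual type is real while the expected type is int.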
Describing the Meaning of Programs: Dynamic Semantics
- There is no single widely acceptable notation or formalism for describing semantics.
- The dynamic semantics of a program is the meaning of its expressions, statements, and program units.
- Accurately describing semantics is essential so that...
- ...users writing a program can precisely understand how the various language constructs work.
- ...compilers will be implemented consistently with respect to one another.
- Three methods are used to describe semantics formally:
- Operational semantics
- Axiomatic semantics
- Denotational semantics
Operational Semantics
- Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement.
- The for loop in C...
- for (expr1; expr2; expr3) stmt
- ...might translate like this:
- expr1;
  LoopTop: if expr2 == 0 goto LoopEnd
  stmt
  expr3;
  goto LoopTop
  LoopEnd:
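The lower-level translation can be simulated on an explicit machine state. The sketch below is one possible rendering in Python, which has no goto, so `while True` and `break` stand in for the labels and jumps. The function name and the dictionary-based state are assumptions made for the example.

```python
def run_translated_for(expr1, expr2, expr3, stmt, state):
    """Execute the goto translation of `for (expr1; expr2; expr3) stmt` on `state`."""
    expr1(state)                 # expr1;
    while True:                  # LoopTop:
        if expr2(state) == 0:    #   if expr2 == 0 goto LoopEnd
            break
        stmt(state)              #   stmt
        expr3(state)             #   expr3;
                                 #   goto LoopTop
    return state                 # LoopEnd:


# Models the C loop: for (i = 0; i < 4; i++) total += i;
final = run_translated_for(
    expr1=lambda s: s.update(i=0),
    expr2=lambda s: int(s["i"] < 4),
    expr3=lambda s: s.update(i=s["i"] + 1),
    stmt=lambda s: s.update(total=s["total"] + s["i"]),
    state={"total": 0},
)
print(final)  # {'total': 6, 'i': 4}
```

The meaning of the loop is read off from the change in `state`, from `{'total': 0}` to `{'total': 6, 'i': 4}`.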
Operational Semantics (cont.)
- Evaluation of operational semantics
- Good if used informally (language manuals, etc.)
- Extremely complex if used formally (e.g., VDL)
2. Axiomatic Semantics
- Axiomatic semantics was defined in conjunction with the development of a method to prove the correctness of programs.
- Axiomatic semantics is based on mathematical logic. The logical expressions are called predicates, or assertions.
- An assertion immediately preceding a statement describes the constraints on the program variables at that point.
- An assertion immediately following a statement describes the new constraints on those variables after execution of the statement.
- These assertions are called the precondition and postcondition, respectively.
- Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition.
2. Axiomatic Semantics
- Based on formal logic (first-order predicate calculus)
- Original purpose: formal program verification
- Approach: define axioms or inference rules for each statement type in the language (to allow transformations of expressions to other expressions)
- The expressions are called assertions
- An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution
- An assertion following a statement is a postcondition
- A weakest precondition is the least restrictive precondition that will guarantee the postcondition
- Pre-post form: {P} statement {Q}
- An example: a = b + 1 {a > 1}
- One possible precondition: {b > 10}
- Weakest precondition: {b > 0}
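The weakest-precondition example can be sanity-checked by brute force over a small range of integers. This is a check on sample values, not a proof, and the range and helper names are arbitrary choices for the sketch.

```python
def postcondition_holds(b):
    """Run a = b + 1 and report whether the postcondition a > 1 holds."""
    a = b + 1
    return a > 1


def guaranteeing_inputs(candidates):
    """Return every starting value of b that leads to the postcondition."""
    return [b for b in candidates if postcondition_holds(b)]


candidates = range(-20, 21)
good = guaranteeing_inputs(candidates)
print(good == [b for b in candidates if b > 0])   # b > 0 matches exactly
print(all(b in good for b in candidates if b > 10))  # b > 10 is sufficient
```

On this range, `b > 0` picks out exactly the inputs that work, while `b > 10` picks out only some of them, which is why it is a valid precondition but not the weakest one.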
Program proof process
- The postcondition for the whole program is the desired result. Work back through the program to the first statement. If the precondition on the first statement is the same as the program spec, the program is correct.
- An axiom for assignment statements (x = E):
- {Q[x -> E]} x = E {Q}, where Q[x -> E] is Q with every occurrence of x replaced by E
- The Rule of Consequence:
- {P} S {Q}, P' => P, Q => Q'
  ----------------------------
  {P'} S {Q'}
- An inference rule for sequences
- For a sequence S1; S2:
- {P1} S1 {P2}
- {P2} S2 {P3}
- the inference rule is:
- {P1} S1 {P2}, {P2} S2 {P3}
  --------------------------
  {P1} S1; S2 {P3}
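The assignment axiom and the sequence rule can be mimicked by treating assertions as Python predicates over a state dictionary. This is a semantic sketch rather than symbolic substitution, and the helper names and the two-statement program are assumptions chosen for illustration.

```python
def wp_assign(var, expr, post):
    """Assignment axiom: the precondition is `post` with `var` replaced by `expr`."""
    return lambda state: post({**state, var: expr(state)})


def wp_sequence(transformers, post):
    """Sequence rule: push the postcondition back through the statements, last first."""
    for transformer in reversed(transformers):
        post = transformer(post)
    return post


# Hypothetical program: y = 3 * x + 1; x = y + 3   with postcondition {x < 10}
post = lambda s: s["x"] < 10
pre = wp_sequence(
    [
        lambda q: wp_assign("y", lambda s: 3 * s["x"] + 1, q),
        lambda q: wp_assign("x", lambda s: s["y"] + 3, q),
    ],
    post,
)

print(pre({"x": 1}), pre({"x": 2}))
```

Working backward by hand gives `{y < 7}` in the middle and `{x < 2}` at the start, and the computed predicate agrees: it holds for `x = 1` and fails for `x = 2`.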
3. Denotational Semantics
- Based on recursive function theory
- The most abstract semantics description method
- Originally developed by Scott and Strachey (1970)
- The process of building a denotational spec for a language (not necessarily easy):
- Define a mathematical object for each language entity
- Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
- The meaning of language constructs is defined by only the values of the program's variables
- The difference between denotational and operational semantics: in operational semantics, the state changes are defined by coded algorithms; in denotational semantics, they are defined by rigorous mathematical functions
- The state of a program is the values of all its current variables:
- s = {<i1, v1>, <i2, v2>, ..., <in, vn>}
- Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable:
- VARMAP(ij, s) = vj
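VARMAP translates almost directly into code when the state is kept as a collection of name/value pairs. In the sketch below, raising `KeyError` for an unknown name is an assumption of this example. The slides do not say what VARMAP returns for a variable that is not in the state.

```python
def varmap(name, state):
    """Return the value paired with `name` in `state`, a list of (name, value) pairs."""
    for identifier, value in state:
        if identifier == name:
            return value
    raise KeyError(name)  # assumption: unknown names are treated as an error


s = [("i1", 10), ("i2", 20), ("i3", 30)]
print(varmap("i2", s))  # 20
```

Because `varmap` only reads the pairs and returns a value, it behaves like the mathematical function on the slide: the same name and state always give the same result.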