Title: Language Translation Issues
1Language Translation Issues
2Text References
3Content
- Programming language syntax
- Stages in translation
- Formal translation models
4Syntax
- Syntax, defined as the arrangement of words as elements in a sentence to show their relationship, describes the sequence of symbols that make up valid programs.
- Syntax
- provides significant information needed for understanding a program and
- provides much-needed information toward the translation of a source program into an object program.
5Syntax
- Syntax alone is insufficient to unambiguously specify the structure of a statement.
X = 2.45 + 3.67
Was X declared? Was X declared as type real? Integer or real addition?
6Semantics
- We need more than just syntactic structures for the full description of a programming language.
- Semantics is concerned with
- the use of declarations, operations, sequence control, and referencing environments
7General Syntactic Criteria
- Readability
- understandable without any separate documentation
- Enhanced by such language features as
- natural statement formats
- structured statements
- liberal use of keywords and noise words,
- provision for embedded comments,
- unrestricted length identifiers
- mnemonic operator symbols
- free-field formats, and
- complete data declarations
- Enhanced if syntactic differences reflect underlying semantic differences
8General Syntactic Criteria
- Writeability
- Often in conflict with readability
- Default rules reduce redundancy when information is inferable from the context
- E.g., FORTRAN's implicit typing
9General Syntactic Criteria
- Ease of verifiability
- Ease of translation
- Lack of ambiguity
if e1 then if e2 then s1 else s2
Reading 1: if e1 then (if e2 then s1 else s2)
Reading 2: if e1 then (if e2 then s1) else s2
10Syntactic Elements of a Language
- Character set
- Identifiers
- Operator symbols
- Keywords and reserved words
- Noise words
- Comments
- Blanks (spaces)
- Delimiters and brackets
- Free- and fixed-field formats
- Expressions
- Statements
11Overall Program-Subprogram Structure
- Separate subprogram definitions
- Each is defined as a separate syntactic unit.
- Compiled separately and linked at load time
- Separate data definitions
- The class mechanism in OO languages
- Nested subprogram definitions
- Pascal
- Separate interface definitions
- Data descriptions separated from executable statements
- Unseparated subprogram definitions
12Stages in Translation
[Figure: stages in translation]
SOURCE PROGRAM RECOGNITION PHASES
Source program -> Lexical analysis -> Lexical tokens -> Syntactic analysis -> Parse tree -> Semantic analysis -> Intermediate code
OBJECT CODE GENERATION PHASES
Intermediate code -> Optimization -> Optimized intermediate code -> Code generation -> Object code -> Linking (with object code from other compilation) -> Executable code
Shared by the phases: Symbol table, Other tables
13Lexical Analysis
- Group the source program, a long undifferentiated sequence of symbols, into its elementary constituents
- identifiers, delimiters, operator symbols, numbers, keywords, noise words, blanks, comments, etc.
- The basic model used to design lexical analyzers is the finite-state automaton.
DO 10 I = 1,5    DO 10 I = 1.5    DO10I = 1.5
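The grouping step described above can be sketched with a small tokenizer. This is a minimal illustration, not a FORTRAN lexer: the token class names (NUMBER, IDENT, OP, DELIM, BLANK) are hypothetical, keywords such as DO are simply reported as identifiers, and blanks are treated as separators (which FORTRAN itself does not do).

```python
import re

# Hypothetical token classes for a toy language; the names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),        # integer or decimal constant
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),  # identifiers (keywords are not distinguished)
    ("OP", r"[+\-*/=]"),                 # operator symbols
    ("DELIM", r"[(),]"),                 # delimiters and brackets
    ("BLANK", r"\s+"),                   # blanks are recognized, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token_class, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "BLANK":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("DO 10 I = 1,5"))
    print(tokenize("DO 10 I = 1.5"))
```

With this sketch the two statements differ only in their last tokens: `1 , 5` yields three tokens, while `1.5` yields a single NUMBER token. That one-character difference is what changes the meaning of the FORTRAN statement.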
14Syntactic Analysis (Parsing)
- Larger program structures are identified
- statements,
- declarations,
- expressions
15Semantic Analysis
- The bridge between analysis and synthesis
- Common functions of semantic analyzers
- Symbol-table maintenance
- Insertion of implicit information
- Error detection
- Macro processing and compile-time operations
16Synthesis of the Object Program
- Optimization
- Code generation
- Linking and loading
17Formal Translation Models
- The formal definition of the syntax of a programming language is called a grammar.
- The two classes of grammars useful in compiler technology include
- the BNF grammar (or context-free grammar) and
- the regular grammar.
18BNF Grammars
- A sentence may be a simple declarative sentence or a simple interrogative sentence.
<sentence> ::= <declarative> | <interrogative>
<declarative> ::= <subject> <verb> <object>.
<subject> ::= <article> <noun>
<interrogative> ::= <auxiliary verb> <subject> <predicate>?
Backus-Naur form
19Syntax BNF Grammars
- A BNF grammar is composed of a finite set of BNF grammar rules, which define a language, in our case a programming language.
- Because syntax is concerned only with form rather than meaning, a language, considered syntactically, consists of a set of syntactically correct programs.
The home / ran / the girl.
Syntactically correct, but does not make
sense under nominal interpretations of these
words.
20Syntax BNF Grammars
- Natural language alone cannot completely describe the syntax rules of a programming language.
- A formal, mathematical set of rules is used to avoid the problems of using natural language.
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<conditional statement> ::= if <Boolean expression> then <statement> else <statement> | if <Boolean expression> then <statement>
<unsigned integer> ::= <digit> | <unsigned integer><digit>
21Parse Trees
- Given a grammar, we can use a single-replacement rule to generate strings in our language.
- For example, all balanced parentheses can be generated by the grammar
S → SS | (S) | ()
S ⇒ (S) ⇒ (SS) ⇒ (()S) ⇒ (()())
Each string to the right of a ⇒ is derived from the string to its left, and each string in the sequence is a sentential form.
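The single-replacement idea can be replayed mechanically. The sketch below assumes the balanced-parentheses grammar S → SS | (S) | () and always rewrites the leftmost S. The function name `derive` and the `RULES` table are hypothetical helpers introduced only for this illustration.

```python
# Right-hand sides allowed for the single nonterminal S (assumed grammar:
# S -> SS | (S) | ()).
RULES = {"S": ["SS", "(S)", "()"]}


def derive(start, choices):
    """Replace the leftmost S once per chosen right-hand side.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    forms = [start]
    current = start
    for rhs in choices:
        if rhs not in RULES["S"]:
            raise ValueError(f"{rhs!r} is not a right-hand side of S")
        index = current.find("S")
        if index < 0:
            raise ValueError("no nonterminal left to replace")
        current = current[:index] + rhs + current[index + 1:]
        forms.append(current)
    return forms


if __name__ == "__main__":
    print(" => ".join(derive("S", ["(S)", "SS", "()", "()"])))
```

Each element of the returned list is one sentential form. The last one contains no S, so it is a string of the language.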
22Parse Trees
- To determine if a given string represents a syntactically valid program in the language defined by a BNF grammar, we must use the grammar rules to construct a syntactic analysis or parse of the string.
- If the string can be successfully parsed, then it is in the language.
- If no way can be found of parsing the string with the given grammar rules, then the string is not in the language.
Grammar for simple assignment statements Parse
tree for an assignment statement
23
<assignment statement> ::= <variable> = <arithmetic expression>
<arithmetic expression> ::= <term> | <arithmetic expression> + <term> | <arithmetic expression> - <term>
<term> ::= <primary> | <term> * <primary> | <term> / <primary>
<primary> ::= <variable> | <number> | (<arithmetic expression>)
<variable> ::= <identifier> | <identifier>[<subscript list>]
<subscript list> ::= <arithmetic expression> | <subscript list>, <arithmetic expression>
Grammar for simple assignment statement
24
[Figure: parse tree for an assignment statement, with leaves W, Y, (, U, V, and ); the interior nodes are <assignment statement>, <arithmetic expression>, <term>, <primary>, <variable>, and <identifier>]
25Ambiguity
G1: S → SS | 0 | 1
G2: T → 0T | 1T | 0 | 1
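One way to see the difference between the two grammars is to count parse trees. The sketch below reads the grammars on this slide as G1: S → SS | 0 | 1 and G2: T → 0T | 1T | 0 | 1 (the separators are an assumed reconstruction). Both generate the nonempty binary strings, but G1 gives some strings several parse trees while G2 gives each string exactly one. The function names are hypothetical.

```python
from functools import lru_cache


def count_trees_g1(text):
    """Number of parse trees for text under G1: S -> SS | 0 | 1."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees deriving text[i:j] from S.
        total = 1 if j - i == 1 and text[i] in "01" else 0
        for k in range(i + 1, j):  # S -> SS, split between positions k-1 and k
            total += count(i, k) * count(k, j)
        return total

    return count(0, len(text)) if text else 0


def count_trees_g2(text):
    """Number of parse trees for text under G2: T -> 0T | 1T | 0 | 1."""

    def count(i):
        if text[i] not in "01":
            return 0
        if i == len(text) - 1:
            return 1          # T -> 0 or T -> 1
        return count(i + 1)   # T -> 0T or T -> 1T, only one rule applies

    return count(0) if text else 0


if __name__ == "__main__":
    for sample in ["0", "01", "001", "0011"]:
        print(sample, count_trees_g1(sample), count_trees_g2(sample))
```

A grammar is ambiguous when at least one string has more than one parse tree, so a count above 1 for G1 is enough to show its ambiguity.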
26Extension to BNF
27Syntax Charts
28Finite-State Automata
- The lexical analysis phase of the compiler breaks down the source program into a stream of tokens, such as identifiers, integers, IF, etc.
- A simple model, called a finite-state automaton (FSA), recognizes such tokens.
[Figure: FSA to recognize an odd number of 1s, with arcs labeled 0 and 1 and a state labeled A]
Input: 100101
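An FSA of this kind can be simulated with a transition table. The sketch below is one possible two-state machine for "odd number of 1s". State A is taken from the figure, while the second state name B, and the choice of A as start state and B as accepting state, are assumptions made for the illustration.

```python
# Transition table: (current state, input symbol) -> next state.
# A = even number of 1s seen so far (assumed start state),
# B = odd number of 1s seen so far (assumed accepting state).
TRANSITIONS = {
    ("A", "0"): "A",
    ("A", "1"): "B",
    ("B", "0"): "B",
    ("B", "1"): "A",
}


def accepts_odd_ones(text):
    """Run the FSA over text and report whether it ends in the accepting state."""
    state = "A"
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        state = TRANSITIONS[(state, symbol)]
    return state == "B"


if __name__ == "__main__":
    print(accepts_odd_ones("100101"))
```

On the input 100101 the machine visits A, B, B, B, A, A, B, so the string (which contains three 1s) is accepted.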
29Finite-State Automata
[Figure: FSA to recognize optionally signed integers (an optional sign followed by one or more digits), with arcs labeled digit]
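A recognizer for optionally signed integers can be written directly as a small state machine. This is a sketch under assumptions: the state names (start, sign, digits) are hypothetical, and an integer is taken to be an optional + or - followed by at least one decimal digit.

```python
DIGITS = "0123456789"


def accepts_signed_integer(text):
    """Return True if text is an optional sign followed by one or more digits."""
    state = "start"
    for ch in text:
        if state == "start" and ch in "+-":
            state = "sign"          # consumed the optional sign
        elif ch in DIGITS:
            state = "digits"        # from start, sign, or digits
        else:
            return False            # no transition defined: reject
    return state == "digits"        # only the digits state is accepting


if __name__ == "__main__":
    for sample in ["42", "-42", "+7", "+", "4-2", ""]:
        print(repr(sample), accepts_signed_integer(sample))
```

Because every state has at most one transition per input character, this machine is deterministic.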
30Finite-State Automata
- Deterministic FSA
- For each state and each input symbol, there is a unique transition to the same or a different state.
- Nondeterministic FSA
- Multiple arcs with the same label may leave a state, so there is a choice as to which way to go.
31Regular Grammars
- A regular grammar is a special case of a BNF grammar that turns out to be equivalent in power to the FSA.
- Form
<nonterminal> ::= <terminal><nonterminal> | <terminal>
A → 0A | 1A | 0
32Regular Expressions
- A third form of language definition, equivalent to the FSA and the regular grammar
- Defined recursively as
- Individual terminal symbols are regular expressions.
- If a and b are regular expressions, then so are a ∨ b, ab, (a), and a*.
- Nothing else is a regular expression.
- We can use regular expressions to represent any language defined by a regular grammar or FSA, although converting an FSA to a regular expression is not always obvious.
33Regular Expressions
34Regular Expressions
[Figure: converting (0 ∨ 1)*01(0 ∨ 1)* to an FSA, with arcs labeled 0 and 1]
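The equivalence between a regular expression and an FSA can be checked by brute force on short strings. The sketch below assumes the expression in the figure is (0 ∨ 1)*01(0 ∨ 1)*, that is, binary strings containing 01, written in Python's `re` syntax with | for ∨. The three-state deterministic FSA and its state names q0, q1, q2 are an illustration and not necessarily the machine drawn on the slide.

```python
import re
from itertools import product

# (0 v 1)*01(0 v 1)* in Python regular-expression syntax.
PATTERN = re.compile(r"(0|1)*01(0|1)*")

# Assumed deterministic FSA for "contains the substring 01":
# q0 = no useful prefix yet, q1 = last symbol was 0, q2 = 01 has been seen.
DFA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}


def regex_accepts(text):
    return PATTERN.fullmatch(text) is not None


def dfa_accepts(text):
    state = "q0"
    for symbol in text:
        state = DFA[(state, symbol)]
    return state == "q2"


def agree_up_to(max_len):
    """Compare both recognizers on every binary string of length <= max_len."""
    for length in range(max_len + 1):
        for symbols in product("01", repeat=length):
            text = "".join(symbols)
            if regex_accepts(text) != dfa_accepts(text):
                return False
    return True


if __name__ == "__main__":
    print(agree_up_to(8))
```

Exhaustive comparison on short strings is not a proof of equivalence, but it is a quick sanity check when converting between the two forms by hand.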
35Pushdown Automata
- A pushdown automaton (PDA) is equivalent to the BNF grammar.
- A PDA is an abstract model machine similar to the FSA.
- It has a finite set of states.
- In addition, it has a pushdown stack.
36Pushdown Automata
- Moves of the PDA
- An input symbol is read and the top symbol on the stack is read.
- Based upon both inputs, the machine enters a new state and writes zero or more symbols onto the pushdown stack.
- Acceptance of a string occurs if the stack is ever empty. (Alternatively, acceptance can be if the PDA is in a final state. Both models can be shown to be equivalent.)
37Pushdown Automata
- That PDAs are more powerful than FSAs can be seen by examining the recognition of a^n b^n.
- This language cannot be recognized by an FSA but can be easily recognized by a PDA.
- Simply stack the initial a symbols, and for each b, pop an a off the stack.
- If the end of input is reached at the same time that the stack becomes empty, the string is accepted.
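The stacking strategy just described can be sketched as a small recognizer. The state names are hypothetical, and the sketch assumes n ≥ 1, so the empty string is rejected. That choice is an assumption, not something the slides specify.

```python
def accepts_anbn(text):
    """Recognize a^n b^n (n >= 1, by assumption) with an explicit pushdown stack."""
    stack = []
    state = "reading_a"
    for ch in text:
        if state == "reading_a" and ch == "a":
            stack.append("a")       # stack each initial a
        elif ch == "b" and stack:
            state = "reading_b"     # once a b is seen, no more a's are allowed
            stack.pop()             # pop one a for each b
        else:
            return False            # a after b, b with empty stack, or bad symbol
    # Accept only if input ended exactly when the stack became empty.
    return state == "reading_b" and not stack


if __name__ == "__main__":
    for sample in ["ab", "aabb", "aab", "abb", "abab", ""]:
        print(repr(sample), accepts_anbn(sample))
```

The stack is what lets the machine remember an unbounded count of a symbols. An FSA has only finitely many states and so cannot do this for arbitrary n.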
38Efficient Parsing Algorithms
- From Chomsky's work, each type of formal grammar is closely related to a type of automaton,
- a simple abstract machine that is usually defined to be
- capable of reading an input tape containing a sequence of characters and
- producing an output tape containing another sequence of characters.
39Efficient Parsing Algorithms
- Problem
- Because a BNF grammar may be ambiguous, the automaton must be nondeterministic
- For programming language translation, a more restricted automaton that never has to guess, called a deterministic automaton, is needed.
- For unambiguous BNF grammars, straightforward parsing techniques have been discovered.
- Recursive descent parser
- LR grammars (left-to-right parsing algorithms) describe all BNF grammars recognized by deterministic pushdown automata.
- LR(1), SLR, LALR
40Recursive Descent Parsing
- Basic idea, for example
- <arithmetic expression> ::= <term> {[+ | -] <term>}*
- This states that we first recognize a <term>, and then as long as the next symbol is either + or -, we recognize another <term>.
- Assumption
- the variable nextchar always contains the first character of the respective nonterminal and
- the function getchar reads in a character, then
- we may directly rewrite the above extended BNF rule as a recursive procedure
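A procedure in this style might look like the sketch below. It keeps the names nextchar and getchar from the slide, but everything else is an assumption made for illustration: the `Parser` class, the `recognizes` helper, and in particular the simplified rule that a <term> is a single letter or digit (the full grammar would recurse into <primary>).

```python
class Parser:
    """Recursive descent recognizer for <term> {[+ | -] <term>}*."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.nextchar = ""
        self.getchar()  # prime nextchar with the first character

    def getchar(self):
        """Read the next non-blank character into nextchar ('' at end of input)."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos < len(self.text):
            self.nextchar = self.text[self.pos]
            self.pos += 1
        else:
            self.nextchar = ""

    def arithmetic_expression(self):
        self.term()                          # first recognize a <term>
        while self.nextchar in ("+", "-"):   # then, while the next symbol is + or -,
            self.getchar()
            self.term()                      # recognize another <term>

    def term(self):
        # Simplifying assumption: a term is one letter or digit.
        if self.nextchar.isalnum():
            self.getchar()
        else:
            raise SyntaxError(f"term expected, found {self.nextchar!r}")


def recognizes(text):
    """Return True if the whole text is an arithmetic expression."""
    parser = Parser(text)
    try:
        parser.arithmetic_expression()
    except SyntaxError:
        return False
    return parser.nextchar == ""  # all input must be consumed


if __name__ == "__main__":
    for sample in ["a+b-c", "x", "a+", "a b", ""]:
        print(repr(sample), recognizes(sample))
```

Each nonterminal becomes one procedure, and the repetition in the extended BNF rule becomes a while loop, so the parser never has to guess which rule to apply.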
42Semantic Modeling
- The following slides, which were selected from Sebesta's teaching material, describe operational semantics.
43Imperative or Operational Models
- Operational Semantics
- Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement
- To use operational semantics for a high-level language, a virtual machine is needed
- A hardware pure interpreter would be too expensive
- A software pure interpreter also has problems
- The detailed characteristics of the particular computer would make actions difficult to understand
- Such a semantic definition would be machine-dependent
44Imperative or Operational Models
- A better alternative: a complete computer simulation
- The process
- Build a translator (translates source code to the machine code of an idealized computer)
- Build a simulator for the idealized computer
- Evaluation of operational semantics
- Good if used informally
- Extremely complex if used formally (e.g., VDL)
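The translator-plus-simulator process can be illustrated on a very small scale. Everything in the sketch below is hypothetical: the source language (blank-separated assignments such as `x = 2 + 3`, evaluated strictly left to right with no operator precedence), the instruction names of the idealized stack machine (PUSH, LOAD, ADD, SUB, MUL, STORE), and the function names. The meaning of a program is then the memory state the simulator ends in.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def operand_instruction(token):
    """PUSH a non-negative integer constant, or LOAD a variable."""
    return ("PUSH", int(token)) if token.isdigit() else ("LOAD", token)


def translate(source):
    """Translate assignment statements into code for the idealized stack machine."""
    code = []
    for line in source.splitlines():
        if not line.strip():
            continue
        target, expression = (part.strip() for part in line.split("=", 1))
        tokens = expression.split()
        code.append(operand_instruction(tokens[0]))
        for i in range(1, len(tokens), 2):   # operator, operand pairs, left to right
            code.append(operand_instruction(tokens[i + 1]))
            code.append((OPS[tokens[i]],))
        code.append(("STORE", target))
    return code


def simulate(code):
    """Execute the machine code and return the final memory state."""
    memory, stack = {}, []
    for instruction in code:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op == "LOAD":
            stack.append(memory[instruction[1]])
        elif op == "STORE":
            memory[instruction[1]] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[op])
    return memory


if __name__ == "__main__":
    program = "x = 2 + 3\ny = x * 4"
    print(translate(program))
    print(simulate(translate(program)))
```

Because the idealized machine has only a stack and a memory, the effect of each statement on the machine state is easy to read off, which is the point of defining semantics this way.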