Title: Chapter 3: Language Translation Issues
1. Chapter 3: Language Translation Issues
- Programming language syntax
  - Key criteria concerning syntax
  - Basic syntactic concepts
  - Overall program-subprogram structure
- Stages in translation
  - Analysis of the source program
  - Synthesis of the object program
  - Bootstrapping
2. What is Syntax
The syntax of a programming language describes
the structure of programs without any
consideration of their meaning.
3. Key criteria concerning syntax
- Readability: a program is considered readable if the algorithm and data are apparent by inspection.
- Writeability: ease of writing the program.
- Verifiability: ability to prove program correctness (a very difficult issue).
- Translatability: ease of translating the program into executable form.
- Lack of ambiguity: the syntax should make ambiguous structures easy to avoid.
4. Basic syntactic concepts
- Character set: the alphabet of the language. Several different character sets are used: ASCII, EBCDIC, Unicode.
- Identifiers: strings of letters and digits, usually beginning with a letter.
- Operator symbols: for example +, -, *, /.
- Keywords or reserved words: used as a fixed part of the syntax of a statement.
5. Basic syntactic concepts
- Noise words: optional words inserted into statements to improve readability.
- Comments: used to improve readability and for documentation purposes. Comments are usually enclosed by special markers.
- Blanks: rules vary from language to language. Usually only significant in literal strings.
6. Basic syntactic concepts
- Delimiters: used to denote the beginning and the end of syntactic constructs.
- Expressions: functions that access data objects in a program and return a value.
- Statements: the sentences of the language; they describe a task to be performed.
7. Overall Program-Subprogram Structure
- Separate subprogram definitions: separate compilation, linked at load time, e.g. C/C++.
- Separate data definitions: the general approach in OOP.
- Nested subprogram definitions: subprogram definitions appear as declarations within the main program or other subprograms, e.g. Pascal.
8. Overall Program-Subprogram Structure
- Separate interface definitions: e.g. C/C++ header files.
- Data descriptions separated from executable statements: a centralized data division contains all data declarations, e.g. COBOL.
- Unseparated subprogram definitions: no syntactic distinction between main program statements and subprogram statements, e.g. BASIC.
9. Stages in Translation
- Analysis of the source program
- Synthesis of the object program
- Bootstrapping
10. Analysis of the source program
Lexical analysis (scanning): identifying the tokens of the programming language, that is, keywords, identifiers, constants and other symbols.
In the program
    void main() { printf("Hello World\n"); }
the tokens are void, main, (, ), {, printf, (, "Hello World\n", ), ;, }
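The scanning step above can be sketched with a small regular-expression tokenizer. This is only an illustration, not a full C scanner: it assumes the slide's program is the usual hello-world with braces and a semicolon, and the token category names and the three-word keyword list are hypothetical choices made for the example.

```python
import re

# Token categories, tried in order; keywords are listed before identifiers
# so that "void" is not classified as an identifier.
# The keyword list is a tiny illustrative subset, not the full C set.
TOKEN_SPEC = [
    ("STRING",     r'"(?:\\.|[^"\\])*"'),
    ("NUMBER",     r"\d+"),
    ("KEYWORD",    r"\b(?:void|int|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("SYMBOL",     r"[(){};,]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":   # blanks separate tokens but are not tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


program = 'void main() { printf("Hello World\\n"); }'
for kind, text in tokenize(program):
    print(kind, text)
```

The scanner only groups characters into tokens; it does not check that the tokens form a valid program. That is the job of the parser.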
11. Syntactic and semantic analysis
Syntactic analysis (parsing): determining the structure of the program, as defined by the language grammar.
Semantic analysis: assigning meaning to the syntactic structures.
Example: the declaration int variable1 means that storage (typically 4 bytes) is reserved for variable1 and that a specific set of operations can be used with variable1.
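To make "determining the structure of the program" concrete, here is a minimal recursive-descent parser sketch. The grammar is an assumption chosen for illustration (it is not taken from the slides): expr -> term ('+' term)*, term -> factor ('*' factor)*, factor -> NAME | NUMBER | '(' expr ')'. The tree is returned as nested tuples.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    # Simplified scanning: characters outside the grammar are silently ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():      # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():      # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():    # factor -> NAME | NUMBER | '(' expr ')'
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("B + C * D"))      # ('+', 'B', ('*', 'C', 'D'))
print(parse("(B + C) * D"))    # ('*', ('+', 'B', 'C'), 'D')
```

The two results differ only in shape: the grammar, not the order of the tokens, decides which operator ends up at the root of the tree.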
12. Basic semantic tasks
- Semantic analysis builds the bridge between analysis and synthesis.
- Basic semantic tasks:
  - Symbol table maintenance
  - Insertion of implicit information
  - Error detection
  - Macro processing
- Result: an internal representation suitable for code optimization and code generation.
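Three of the tasks listed above (symbol table maintenance, insertion of implicit information, error detection) can be sketched in a few lines. Everything here is hypothetical and chosen for illustration: the class and function names, the type sizes (assumed for a typical C implementation), and the int_to_double marker used to make an implicit conversion visible.

```python
# Sizes are assumptions for a typical C implementation, not a language rule.
TYPE_SIZES = {"int": 4, "double": 8}


class SemanticError(Exception):
    pass


class SymbolTable:
    """Maps each declared name to its type and storage size."""

    def __init__(self):
        self.entries = {}

    def declare(self, name, type_name):
        if name in self.entries:
            raise SemanticError(f"'{name}' is already declared")
        if type_name not in TYPE_SIZES:
            raise SemanticError(f"unknown type '{type_name}'")
        self.entries[name] = {"type": type_name, "size": TYPE_SIZES[type_name]}

    def lookup(self, name):
        # Error detection: using a name that was never declared.
        if name not in self.entries:
            raise SemanticError(f"'{name}' is used but not declared")
        return self.entries[name]


def check_add(table, left, right):
    """Return (result type, operands) for the expression left + right."""
    left_type = table.lookup(left)["type"]
    right_type = table.lookup(right)["type"]
    if left_type == right_type:
        return left_type, [left, right]
    # Insertion of implicit information: the int operand is converted to
    # double, and the conversion is written out explicitly.
    operands = [
        f"int_to_double({name})" if table.lookup(name)["type"] == "int" else name
        for name in (left, right)
    ]
    return "double", operands


table = SymbolTable()
table.declare("variable1", "int")
table.declare("rate", "double")
print(table.lookup("variable1"))
print(check_add(table, "variable1", "rate"))
```

A real compiler would attach this information to the syntax tree rather than to strings, but the division of work is the same.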
13. Synthesis of the object program
Three main steps:
- Optimization: removing redundant statements.
- Code generation: generating assembler commands with relative memory addresses for the separate program modules, which yields the object code of the program.
- Linking and loading: resolving the addresses, which yields the executable code of the program.
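The address-resolution part of linking can be illustrated with a toy two-pass linker. The module format below (size, defines, uses) is invented for this sketch and does not correspond to any real object-file format; it only shows how relative addresses become absolute once each module is given a base address.

```python
def link(modules):
    """Place modules one after another and resolve symbol references.

    Each module is a dict with (hypothetical format):
      "size":    number of memory cells the module occupies
      "defines": {symbol: offset relative to the start of the module}
      "uses":    list of (offset of the referencing cell, symbol referenced)
    Returns (absolute symbol addresses, {absolute cell address: resolved address}).
    """
    base = {}
    symbols = {}
    next_free = 0
    # Pass 1: give every module a base address and compute the absolute
    # address of every symbol it defines.
    for name, module in modules.items():
        base[name] = next_free
        for symbol, offset in module["defines"].items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol {symbol}")
            symbols[symbol] = next_free + offset
        next_free += module["size"]
    # Pass 2: patch every reference with the absolute address of its symbol.
    patches = {}
    for name, module in modules.items():
        for offset, symbol in module["uses"]:
            if symbol not in symbols:
                raise ValueError(f"unresolved symbol {symbol}")
            patches[base[name] + offset] = symbols[symbol]
    return symbols, patches


modules = {
    "main": {"size": 10, "defines": {"main": 0}, "uses": [(3, "printf")]},
    "lib":  {"size": 20, "defines": {"printf": 5}, "uses": []},
}
symbols, patches = link(modules)
print(symbols)   # {'main': 0, 'printf': 15}
print(patches)   # {3: 15}
```

Here the call at cell 3 of main refers to printf, which sits at offset 5 of a module loaded at address 10, so the reference is resolved to address 15.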
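14. Optimization example
Assembler code (not optimized):
    LOAD_R  B
    ADD_R   C
    STORE_R Temp1
    LOAD_R  Temp1
    ADD_R   D
    STORE_R Temp2
    LOAD_R  Temp2
    STORE_R A
Intermediate code:
    Temp1 = B + C
    Temp2 = Temp1 + D
    A = Temp2
The store/load pairs on Temp1 and Temp2 (shown in yellow on the slide) can be removed, because the value is still in the register when it is reloaded.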
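The removal of redundant statements from the assembler listing above can be sketched as a simple peephole pass. This is an illustration under a stated assumption: a temporary (any name starting with "Temp", a convention taken from the example) is never read again after the reload, so both the store and the reload can be dropped. The function name is hypothetical.

```python
def remove_redundant_pairs(code):
    """Drop 'STORE_R t' immediately followed by 'LOAD_R t' for a temporary t.

    Assumes each temporary is not read anywhere else, so the value that is
    already in the register never needs to reach memory.
    """
    result = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        if (op == "STORE_R" and arg.startswith("Temp")
                and i + 1 < len(code) and code[i + 1] == ("LOAD_R", arg)):
            i += 2          # skip both the store and the reload
            continue
        result.append(code[i])
        i += 1
    return result


code = [
    ("LOAD_R", "B"), ("ADD_R", "C"), ("STORE_R", "Temp1"),
    ("LOAD_R", "Temp1"), ("ADD_R", "D"), ("STORE_R", "Temp2"),
    ("LOAD_R", "Temp2"), ("STORE_R", "A"),
]
for op, arg in remove_redundant_pairs(code):
    print(op, arg)
```

For the listing above, the eight instructions shrink to four: LOAD_R B, ADD_R C, ADD_R D, STORE_R A.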
15. Bootstrapping
- The compiler for a given language can be written in the same language.
- The process starts from a program that translates some internal representation into assembler code.
- The programmer manually rewrites the compiler into the internal representation, using the algorithm that is encoded in the compiler.
- From there on, the internal representation is translated into assembler and then into machine language.