Title: Compiler Construction
1Compiler Construction
2Compilers are Translators
Source languages:
- C/C++
- Fortran
- Java
- PERL
- MATLAB
- Natural languages
- Command languages
translate into
Target forms:
- Machine code
- Virtual machine code
- Transformed code (C, Java, ...)
- Lower level commands
- Semantic Components
3Translation Mechanisms
- Compilation: translate a source program in one language into an executable program in another language, then produce results by executing the new program
- Examples: C, C++, FORTRAN
- Interpretation: read a source program and produce the results while understanding that program
- Examples: BASIC, LISP
- Case study: Java
- First, translate to Java bytecode
- Second, execute by interpretation (JVM) or compilation (JIT)
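The two mechanisms can be contrasted with a small Python sketch. The expression-tree format, the stack-machine instruction names (PUSH, LOAD, ADD, MUL), and all function names are assumptions made for illustration; none of them come from the slides.

```python
# An expression is a number, a variable name, or a tuple (op, left, right).

def interpret(node, env):
    """Interpretation: walk the source tree and produce the result directly."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def compile_expr(node):
    """Compilation: translate the tree into code for a hypothetical stack machine."""
    if isinstance(node, (int, float)):
        return [("PUSH", node)]
    if isinstance(node, str):
        return [("LOAD", node)]
    op, left, right = node
    last = ("ADD" if op == "+" else "MUL", None)
    return compile_expr(left) + compile_expr(right) + [last]


def run(code, env):
    """Execute the translated program to produce the results."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr == "ADD" else a * b)
    return stack.pop()


tree = ("+", "x", ("*", 2, "y"))
env = {"x": 1, "y": 3}
code = compile_expr(tree)
```

Here `interpret(tree, env)` and `run(code, env)` give the same value; the compiled path separates translation (done once) from execution (which can be repeated with different data).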
4Comparison of Compiler/Interpreter
[Figure: interpreter diagram with the labels Source Code, Data, and Results]
5Phases of A Modern Compiler
Source Program:
IF (a<b) THEN c=1*d
Lexical Analyzer
Token Sequence: IF, (, ID a, <, ID b, THEN, ID c, CONST 1, ID d
Syntax Analyzer
Syntax Tree (node labels): IF_stmt, cond_expr, <, a, b, list, assign_stmt, lhs, c, rhs, 1, d
Semantic Analyzer
3-Address Code:
GE a, b, L1
MULT 1, d, c
L1:
Code Optimizer
Optimized 3-Addr. Code:
GE a, b, L1
MOV d, c
L1:
Code Generation
Assembly Code:
loadi R1,a
cmpi R1,b
jge L1
loadi R1,d
storei R1,c
L1:
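The optimizer step in this example, where a multiplication by 1 becomes a plain move, can be sketched in Python. The tuple encoding of three-address instructions, the `LABEL` pseudo-instruction, and the function name `simplify` are assumptions for illustration; the operand order (source, source, destination) follows the slide's example.

```python
def simplify(code):
    """Rewrite multiplications by the constant 1 as moves; keep everything else."""
    out = []
    for instr in code:
        op, *args = instr
        if op == "MULT" and args[0] == "1":
            out.append(("MOV", args[1], args[2]))   # 1 * y -> y
        elif op == "MULT" and args[1] == "1":
            out.append(("MOV", args[0], args[2]))   # x * 1 -> x
        else:
            out.append(instr)
    return out


three_address = [
    ("GE", "a", "b", "L1"),      # jump to L1 if a >= b
    ("MULT", "1", "d", "c"),     # c = 1 * d
    ("LABEL", "L1"),
]
optimized = simplify(three_address)
```

This is only one algebraic simplification; a real optimizer applies many such rewrites and has to justify each one.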
6Lexical and Syntax Analysis
- Lexical Analysis
- Recognize tokens: the smallest meaningful units built from letters (characters)
- Analyze input (strings of characters) from the source
- Scan from left to right
- Report errors
- Syntax Analysis
- Group tokens into hierarchical groups
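Both steps can be sketched in Python for one statement form. The token categories (KW, ID, CONST, OP), the tiny grammar, the tuple-based tree, and the function names are assumptions for illustration, and the input is assumed to look like `IF (a<b) THEN c=1*d`.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<WS>\s+)|(?P<KW>(?:IF|THEN)\b)|(?P<ID>[A-Za-z_]\w*)"
    r"|(?P<CONST>\d+)|(?P<OP>[()<=*])"
)


def tokenize(text):
    """Lexical analysis: scan left to right and report unexpected characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "WS":              # whitespace separates tokens only
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Syntax analysis: group the flat token list into a nested tuple."""
    pos = 0

    def expect(value):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][1] != value:
            raise SyntaxError(f"expected {value!r} at token {pos}")
        pos += 1

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in ("ID", "CONST"):
            raise SyntaxError(f"expected identifier or constant at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = operand()
        while pos < len(tokens) and tokens[pos][1] == "*":
            expect("*")
            node = ("*", node, operand())
        return node

    expect("IF")
    expect("(")
    left = operand()
    expect("<")
    right = operand()
    expect(")")
    expect("THEN")
    target = operand()
    if target[0] != "ID":
        raise SyntaxError("assignment target must be an identifier")
    expect("=")
    value = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return ("IF_stmt", ("<", left, right), ("assign_stmt", target, value))


tokens = tokenize("IF (a<b) THEN c=1*d")
tree = parse(tokens)
```

The lexer knows nothing about statement structure, and the parser never looks at individual characters; that separation is what lets each stage be generated or replaced independently.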
7Semantic Analysis
- Determine the meaning using the structure
- Checks performed to ensure components fit together meaningfully
- Information is added
- Limited analysis to catch inconsistencies, e.g. type checking
- Put semantic meaning in the structure
- Produce IR (intermediate representation)
- Easier to generate machine code from IR
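Type checking, the example of a consistency check mentioned above, might look like the following Python sketch. The tuple node format, the type names `int` and `bool`, the rule that every constant is an integer, and the function name `check` are all assumptions for illustration.

```python
def check(node, types):
    """Return the type of a node, or raise TypeError if the parts do not fit."""
    kind = node[0]
    if kind == "CONST":
        return "int"                      # assumption: all constants are integers
    if kind == "ID":
        if node[1] not in types:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return types[node[1]]
    if kind in ("*", "<"):
        for operand in node[1:]:
            found = check(operand, types)
            if found != "int":
                raise TypeError(f"operand of {kind!r} has type {found}, expected int")
        return "bool" if kind == "<" else "int"
    if kind == "assign_stmt":
        target, value = check(node[1], types), check(node[2], types)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return None                       # statements have no type
    raise TypeError(f"unknown node kind {kind!r}")


assignment = ("assign_stmt", ("ID", "c"), ("*", ("CONST", "1"), ("ID", "d")))
declared = {"c": "int", "d": "int"}
result = check(assignment, declared)
```

The types computed here are the "information added" to the structure: later stages can rely on them instead of re-deriving them.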
8Code Optimization and Generation
- Code Optimization
- Modify the program representation so that the program:
- Runs faster
- Uses less memory
- In general, reduces the resources consumed
- Code Generation: produce target code
- Instruction selection
- Memory allocation
- Resource allocation: registers, processors, etc.
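Instruction selection can be sketched in Python as template expansion from three-address code to the pseudo-assembly used in the phases example earlier in the deck. The tuple encoding, the `LABEL` pseudo-instruction, and the function name `generate` are assumptions; register allocation is reduced to always using R1, which only works for a toy example like this one.

```python
def generate(code):
    """Expand each three-address instruction with a fixed assembly template."""
    asm = []
    for op, *args in code:
        if op == "GE":                     # branch to label if first >= second
            first, second, label = args
            asm += [f"loadi R1,{first}", f"cmpi R1,{second}", f"jge {label}"]
        elif op == "MOV":                  # copy source to destination via R1
            src, dst = args
            asm += [f"loadi R1,{src}", f"storei R1,{dst}"]
        elif op == "LABEL":
            asm.append(f"{args[0]}:")
        else:
            raise ValueError(f"no template for {op!r}")
    return asm


optimized = [("GE", "a", "b", "L1"), ("MOV", "d", "c"), ("LABEL", "L1")]
assembly = generate(optimized)
```

A real code generator chooses among several candidate templates per operation and assigns registers so that values stay in them across instructions.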
9Symbol Table Management
- Collect and maintain information about identifiers
- Attributes: storage, type, scope, number, ...
- Used by most compiler passes
- Phases that add information: lexical analysis, parsing, semantic analysis
- Phases that use information: code optimization, code generation
- Debuggers use some form of symbol table
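A minimal Python sketch of a scoped symbol table follows. The class name, method names, and attribute keys are hypothetical; real compilers often use more elaborate structures.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attributes."""

    def __init__(self):
        self.scopes = [{}]                 # the innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(attrs)

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise KeyError(f"undeclared identifier {name!r}")


table = SymbolTable()
table.declare("x", type="int")             # added by an early phase
table.enter_scope()
table.declare("x", type="float")           # shadows the outer x
inner_type = table.lookup("x")["type"]
table.exit_scope()
outer_type = table.lookup("x")["type"]
table.lookup("x")["storage"] = "R1"        # a later phase adds an attribute
```

Because every phase reads and writes the same entries, the table is how information travels from the early phases to the later ones.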
10Traditional Three-pass Compilation
[Figure: Source program -> Front End -> IR -> Middle End -> IR -> Back End -> machine code, with errors reported along the way]
- Code Optimization
- Analyzes IR and rewrites (or transforms) IR
- Primary goal is to reduce running time of the compiled code
- May also improve space, power consumption, ...
- Must preserve meaning of the code
- Measured by values of named variables
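One way to make "measured by values of named variables" concrete is to run the code before and after a transformation and compare the final variable values. The Python sketch below does this for a tuple-encoded three-address program; the encoding, the `LABEL` pseudo-instruction, and the function name `execute` are assumptions. Agreement on a few inputs is evidence, not a proof, that meaning is preserved.

```python
def execute(code, env):
    """Run tuple-encoded three-address code and return the final variable values."""
    env = dict(env)
    labels = {args[0]: i for i, (op, *args) in enumerate(code) if op == "LABEL"}

    def val(x):
        # An operand is either a named variable or an integer literal.
        return env[x] if x in env else int(x)

    pc = 0
    while pc < len(code):
        op, *args = code[pc]
        if op == "GE" and val(args[0]) >= val(args[1]):
            pc = labels[args[2]]           # branch taken
            continue
        if op == "MULT":
            env[args[2]] = val(args[0]) * val(args[1])
        elif op == "MOV":
            env[args[1]] = val(args[0])
        pc += 1
    return env


before = [("GE", "a", "b", "L1"), ("MULT", "1", "d", "c"), ("LABEL", "L1")]
after = [("GE", "a", "b", "L1"), ("MOV", "d", "c"), ("LABEL", "L1")]

inputs = [
    {"a": 1, "b": 2, "c": 0, "d": 5},      # branch not taken, c is assigned
    {"a": 3, "b": 2, "c": 0, "d": 5},      # branch taken, c is untouched
]
same = all(execute(before, env) == execute(after, env) for env in inputs)
```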
11Distinction Between Phases and Passes
- Passes: number of times through a program representation
- 1-pass, 2-pass, multi-pass compilation
- As languages become more complex, more passes are needed
- Phases: conceptual and sometimes physical stages
- Symbol table coordinates information between phases
- Phases are not completely separate
- Semantic phase may do things that syntax phase should do
- Interactions are possible
12Compiler Tools
- Automatic Generator
- Lexical Analysis: lex, flex
- Syntax Analysis: yacc, bison
- Semantic Analysis
- Code Optimization
- Code Generation
13Compilers vs. Language Design
- There is a strong mutual influence
- Hard-to-compile languages are hard to read
- Easy-to-compile languages lead to quality compilers, better code, smaller compilers, more reliable, cheaper, wider use, better diagnostics
- Example: dynamic typing
- Seems convenient because type declarations are not needed
- However, such languages are:
- hard to read because the type of an identifier is not known
- hard to compile because the compiler cannot make assumptions about the identifier's type
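Python, itself dynamically typed, gives a short illustration of the point. The function name `double` is hypothetical.

```python
def double(x):
    # Nothing here says what x is, so "+" may mean integer addition,
    # string concatenation, or list concatenation, decided only at run time.
    return x + x


results = [double(2), double("ab"), double([1])]
```

A reader cannot tell from the definition alone which operation `+` performs, and neither can a compiler, so it cannot commit to a single machine instruction for it ahead of time.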
14Compilers vs. Computer Architecture
- Complex instructions were available when programming at assembly level.
- RISC architecture became popular with the advent of high-level languages.
- Today, the development of new instruction set architectures (ISA) is heavily influenced by available compiler technology.