Title: Introduction to Compilation
1. Introduction to Compilation
2. What is a compiler?
- a program that translates an executable program
in one language into an executable program in
another language
- the compiler typically lowers the level of
abstraction of the program
- for optimizing compilers, we also expect the
program produced to be better, in some way, than
the original
3. Abstract view of compiler
- Implications
- recognize legal (and illegal) programs
- generate correct code
- manage storage of all variables and code
- need format for object (or assembly) code
4. Traditional decomposition of a compiler
- Implications
- intermediate language (IL)
- front end maps legal code into IL
- back end maps IL onto target machine
- simplify retargeting
- allows multiple front ends
- multiple phases => better code
- Front end is O(n) or O(n log n); back-end
problems are typically NP-complete
5. Advantage of the decomposition
6. Components of a Compiler
- Analysis
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Synthesis
- Intermediate Code Generation
- Code Optimization
- Code Generation
7. The Structure of a Compiler
- Front-end
- Lexical Analysis
- Parsing
- Semantic Analysis
- Intermediate Code Generation
- Back-end
- Optimization
- Code Generation
- The first 3, at least, can be understood by
analogy to how humans comprehend a natural
language.
8. Responsibilities of Front End
- recognize legal programs
- report errors
- produce IL
- preliminary storage map
- shape the code for the back end
- Much of front end construction can be automated
9. Responsibilities of Back-end
- code optimization (middle end)
- analyzes and changes IL
- goal is to reduce runtime
- must preserve values
- code generation
- translate IL into target machine code
- choose instructions for each IL operation
- decide what to keep in registers at each point
- ensure conformance with system interfaces
10. Lexical Analysis
- First step: recognize words.
- Smallest unit above letters
- Compiler is an interesting course.
- Note the:
- Capital "C" (start of sentence symbol)
- Blank " " (word separator)
- Period "." (end of sentence symbol)
11. More Lexical Analysis
- Lexical analysis is not trivial. Consider
- ????????????
- Programming languages are typically more cryptic
than English
- h->j = -12.345e-5
12. And More Lexical Analysis
- Lexical analyzer divides program text into
"words" or "tokens"
- if x == y then z = 1; else z = 2;
- Units:
- if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;
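The division into units above can be sketched with a small regular-expression scanner. This is only an illustration, not the scanner used in the course: the token class names (`INT`, `ID`, `OP`, `SEMI`, `KEYWORD`) and the `tokenize` function are hypothetical, and the input assumes the statement on the slide read `if x == y then z = 1; else z = 2;`.

```python
import re

# Hypothetical token classes for the toy statement on the slide.
TOKEN_SPEC = [
    ("INT",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("OP",   r"==|="),        # try the longer operator first
    ("SEMI", r";"),
    ("SKIP", r"\s+"),         # blanks separate words and are discarded
]
KEYWORDS = {"if", "then", "else"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers to the regex
        tokens.append((kind, lexeme))
    return tokens

units = [lexeme for _, lexeme in tokenize("if x == y then z = 1; else z = 2;")]
print(", ".join(units))
```

Keywords are matched as identifiers and reclassified afterwards, which keeps the regular expressions short; a real scanner generator would typically handle this with rule priorities instead.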
13. Parsing (syntax analysis)
- Once words are understood, the next step is to
understand sentence structure
- Parsing = Diagramming Sentences
- The diagram is a tree
14. Diagramming a Sentence
[Figure: sentence diagram (parse tree); surviving node labels: sentence, VP]
15. Parsing Programs
- Parsing program expressions is the same
- Consider:
- if x == y then z = 1; else z = 2;
- Diagrammed:
[Figure: parse tree of the if-then-else statement]
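The tree for this statement can be sketched with a tiny recursive-descent parser. This is a minimal illustration under stated assumptions: the grammar (`if ID == ID then ID = INT ; else ID = INT ;`), the function `parse_if`, and the tuple shapes of the tree nodes are all hypothetical, and the input is assumed to be already split into words.

```python
def parse_if(tokens):
    """Parse one if-then-else statement of the hypothetical toy grammar
    and return its tree as nested tuples."""
    pos = 0

    def expect(lexeme):
        # Consume exactly this word or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != lexeme:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {lexeme!r}, found {found!r}")
        pos += 1

    def take():
        # Consume the next word (an operand; not validated further here).
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def comparison():
        left = take()
        expect("==")
        right = take()
        return ("==", left, right)

    def assignment():
        target = take()
        expect("=")
        value = take()
        expect(";")
        return ("assign", target, value)

    expect("if")
    cond = comparison()
    expect("then")
    then_branch = assignment()
    expect("else")
    else_branch = assignment()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("if-then-else", cond, then_branch, else_branch)

tree = parse_if("if x == y then z = 1 ; else z = 2 ;".split())
print(tree)
```

The root of the tree is the whole statement and its three children are the condition and the two assignments; the keywords and semicolons guide the parse but do not appear in the tree.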
16. Semantic Analysis
- Once sentence structure is understood, we can try
to understand "meaning"
- But meaning is too hard for compilers
- Compilers perform limited analysis to catch
inconsistencies
- Some do more analysis to improve the performance
of the program
17. Semantic Analysis in Natural Language
- Example
- ????????????.
- ???????? ??,?? or ??? ?
- Even worse
- Jack said Jack left his assignment at home?
- How many Jacks are there?
- Which one left the assignment?
18. Semantic Analysis in Programming
- Programming languages define strict rules to
avoid such ambiguities
- This C++ code prints "4"; the inner definition is
used
- Illegal in Java.
    {
        int x = 3;
        {
            int x = 4;
            cout << x;
        }
    }
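The binding rule in this example (the innermost definition wins) can be sketched as a stack of scopes. This is a simplified model, not how any particular compiler implements it: the class `Scopes`, its method names, and the `allow_shadowing` flag are hypothetical, and the flag only loosely imitates Java's rejection of a nested redeclaration of a local variable.

```python
class Scopes:
    """A stack of dictionaries: one per open block, innermost last."""

    def __init__(self):
        self.stack = [{}]

    def enter(self):
        self.stack.append({})

    def leave(self):
        self.stack.pop()

    def declare(self, name, value, allow_shadowing=True):
        if name in self.stack[-1]:
            raise NameError(f"{name} already declared in this block")
        # Assumption: allow_shadowing=False roughly models the Java rule.
        if not allow_shadowing and any(name in s for s in self.stack[:-1]):
            raise NameError(f"{name} would shadow an enclosing declaration")
        self.stack[-1][name] = value

    def lookup(self, name):
        # Search from the innermost block outwards.
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not declared")

scopes = Scopes()
scopes.declare("x", 3)      # outer block: int x = 3
scopes.enter()
scopes.declare("x", 4)      # inner block: int x = 4
inner = scopes.lookup("x")  # the use inside the inner block
scopes.leave()
outer = scopes.lookup("x")  # after the inner block closes
print(inner, outer)
```

Looking up `x` inside the inner block finds the inner declaration first, and once the block is left the outer declaration becomes visible again.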
19. More Semantic Analysis
- Compilers perform many semantic checks besides
variable bindings
- Example:
- John loves her sister.
- A "type mismatch" between her and John; we know
they are different people
- Presumably John is male
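In a programming language, the analogous check compares the types of operands. The following is a minimal sketch, assuming a toy language with only `int` and `bool`; the function `type_of`, the tuple encoding of expressions, and the environment `env` are hypothetical.

```python
def type_of(expr, env):
    """Return "int" or "bool" for a tiny expression tree, or raise on an error.

    expr is a Python int or bool literal, a variable name (str),
    or a tuple (op, left, right).
    """
    if isinstance(expr, bool):   # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):    # a variable: look up its declared type
        if expr not in env:
            raise NameError(f"undeclared variable {expr!r}")
        return env[expr]
    op, left, right = expr
    lt, rt = type_of(left, env), type_of(right, env)
    if op in ("+", "-", "*"):
        if lt == rt == "int":
            return "int"
    elif op == "==":
        if lt == rt:
            return "bool"
    else:
        raise ValueError(f"unknown operator {op!r}")
    raise TypeError(f"type mismatch: {lt} {op} {rt}")

env = {"c": "int", "flag": "bool"}    # hypothetical declarations
ok = type_of(("-", "c", 7), env)      # int - int is fine
try:
    type_of(("+", "c", "flag"), env)  # int + bool is rejected
    mismatch = None
except TypeError as err:
    mismatch = str(err)
print(ok, "|", mismatch)
```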
20. Optimization
- No strong counterpart in English, but akin to
editing
- Automatically modify programs so that they
- Run faster
- Use less memory
- In general, conserve some resource
21. Optimization Example
- X = Y * 0 is the same as X = 0
- X = Y * 2 is the same as X = Y + Y
- Assume X and Y are integers
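These two rewrites can be sketched as a small simplifier over expression trees. This is an illustration only: the function `simplify` and the tuple encoding are hypothetical, and the rules assume integer operands that are free of side effects, since the first rule discards an operand and the second duplicates one.

```python
def simplify(expr):
    """Apply two algebraic rewrites bottom-up to a tuple-encoded expression.

    Assumes integer arithmetic and operands without side effects.
    """
    if not isinstance(expr, tuple):
        return expr                   # a variable name or an integer literal
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "*":
        if left == 0 or right == 0:
            return 0                  # Y * 0  ->  0
        if right == 2:
            return ("+", left, left)  # Y * 2  ->  Y + Y
        if left == 2:
            return ("+", right, right)
    return (op, left, right)

print(simplify(("*", "Y", 0)))
print(simplify(("*", "Y", 2)))
print(simplify(("+", ("*", "Y", 0), "Z")))
```

Because subtrees are simplified first, a rewrite deep in the tree can expose further opportunities higher up, which is one reason optimizers often run such rules repeatedly.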
22. Code Generation
- Produces assembly code (usually)
- A translation into another language
- Analogous to human translation
23. Intermediate Languages
- Many compilers perform translations between
successive intermediate forms
- All but first and last are intermediate languages
internal to the compiler
- Typically there is 1 IL
- ILs generally ordered in descending level of
abstraction
- Highest is source
- Lowest is assembly
24. Intermediate Languages (Cont.)
- ILs are useful because lower levels require
exposure of many features hidden by higher levels
- registers
- memory layout
- etc.
- It is hard to obtain all these hidden features
directly from the source input.
25. Example
- source line: a = bb+abs(c-7);
- a sequence of ASCII characters in a text file.
- The scanner groups characters into tokens:
- a  =  bb  +  abs  (  c  -  7  )  ;
- After scanning, we have the token sequence:
- Id(a) Asg Id(bb) Plus Id(abs) Lparen Id(c)
Minus IntLiteral(7) Rparen Semi
26. Example
- The parser groups these tokens into a parse tree
- Note: (, ) and ; disappear in the tree.
- The type checker resolves types and binds
declarations within scopes
- Finally, JVM code is generated for each node in
the tree (leaves first, then roots)
- iload 3 // push local 3 (bb)
- iload 2 // push local 2 (c)
- ldc 7 // push literal 7
- isub // compute c-7
- invokestatic java/lang/Math/abs(I)I
- iadd // compute bb+abs(c-7)
- istore 1 // store result into local 1 (a)
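The "leaves first, then roots" order is a post-order walk of the tree. The sketch below reproduces the instruction sequence above for `a = bb+abs(c-7)`. It is an illustration, not a real code generator: the function `gen`, the tuple encoding of the tree, and the tables `LOCALS` and `BINOPS` are hypothetical, the slot numbers are taken from the comments in the listing, and every call is assumed to be a static `int` to `int` method of `java.lang.Math`.

```python
LOCALS = {"a": 1, "c": 2, "bb": 3}     # slot numbers as in the listing's comments
BINOPS = {"+": "iadd", "-": "isub"}    # integer stack instructions

def gen(node, out):
    """Append stack-machine instructions for node to out (post-order walk)."""
    if isinstance(node, int):
        out.append(f"ldc {node}")                 # push a literal
    elif isinstance(node, str):
        out.append(f"iload {LOCALS[node]}")       # push a local variable
    elif node[0] == "call":
        _, name, arg = node
        gen(arg, out)                             # argument first
        # Assumption: every call targets java.lang.Math with signature (I)I.
        out.append(f"invokestatic java/lang/Math/{name}(I)I")
    elif node[0] == "assign":
        _, target, value = node
        gen(value, out)                           # right-hand side first
        out.append(f"istore {LOCALS[target]}")
    else:
        op, left, right = node
        gen(left, out)                            # children before the operator
        gen(right, out)
        out.append(BINOPS[op])
    return out

tree = ("assign", "a", ("+", "bb", ("call", "abs", ("-", "c", 7))))
code = gen(tree, [])
print("\n".join(code))
```

Each operator is emitted only after the code for its operands, so the operands are already on the stack when the instruction executes.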
29. Issues
- Compiling is almost this simple, but there are
many pitfalls.
- Example: How are erroneous programs handled?
- Language design has big impact on compiler
- Determines what is easy and hard to compile
- Course theme: many trade-offs in language design
30. Compilers Today
- The overall structure of almost every compiler
adheres to this outline
- The proportions have changed since FORTRAN
- Early: lexing and parsing were the most complex
and expensive phases
- Today: optimization dominates all other phases;
lexing and parsing are cheap
31. Applications of Compilation Techniques
- Editor
- Interpreter
- Debugger
- Word Processing (TeX, Word)
- VLSI Design (VHDL, Verilog)
- Pattern Recognition
32. Trends in Compilation
- Compilation for speed is less interesting. But:
- scientific programs
- advanced processors (Digital Signal Processors,
advanced speculative architectures)
- Ideas from compilation used for improving code
reliability:
- memory safety
- detecting data races
- ...