Title: Compilers Modern Compiler Design
1Compilers: Modern Compiler Design
Interpretation and Code Generation
NCYU C. H. Wang
2Overview
3Interpretation
- An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.
- Two varieties
- Recursive
- Iterative
4Interpretation
- Recursive interpretation
- operates directly on the AST (attribute grammar)
- simple to write
- thorough error checks
- very slow: roughly 1000x slower than compiled code
- Iterative interpretation
- operates on intermediate code
- good error checking
- slow: roughly 100x slower than compiled code
5Recursive Interpretation
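A minimal sketch of recursive interpretation, assuming a toy AST made of nested tuples. The node shapes and the name `evaluate` are illustrative assumptions, not taken from the slides.

```python
# Recursive interpretation sketch: one call per AST node, recursing into
# the children. Assumed node shapes (hypothetical, for illustration only):
#   ("const", value) | ("var", name) | (op, left, right) with op in + - * /
def evaluate(node, env):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        if node[1] not in env:                 # thorough run-time check
            raise NameError(f"undefined variable {node[1]!r}")
        return env[node[1]]
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    if kind == "/":
        if right == 0:
            raise ZeroDivisionError("division by zero in interpreted program")
        return left / right
    raise ValueError(f"unknown node kind {kind!r}")

# b*b - 4*a*c with a = 1, b = 5, c = 2
ast = ("-", ("*", ("var", "b"), ("var", "b")),
            ("*", ("const", 4), ("*", ("var", "a"), ("var", "c"))))
result = evaluate(ast, {"a": 1, "b": 5, "c": 2})
```

Every node visit pays for a Python call and a chain of comparisons, which hints at why this style is simple to write but slow.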
6Self-identifying data
- must handle user-defined data types
- value: pointer to type descriptor plus array of subvalues
- example: complex number with re = 3.0 and im = 4.0
7Complex number representation
8Iterative interpretation
- Operates on threaded AST
- Active node pointer
- Flat loop over a case statement
9Sketch of the main loop
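A possible shape for such a main loop, sketched under assumptions: the `Node` class, the `thread` helper and the operand stack are illustrative choices, not the code from the original slide.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    kind: str                            # "const", "var", "add", "sub" or "mul"
    arg: Any = None                      # constant value or variable name
    successor: Optional["Node"] = None   # thread to the next node to execute

def thread(nodes):
    """Link nodes in execution order and return the first one."""
    for current, following in zip(nodes, nodes[1:]):
        current.successor = following
    return nodes[0]

def run(first, env):
    stack = []
    active = first                       # the active node pointer
    while active is not None:            # flat loop ...
        kind = active.kind               # ... over a case statement
        if kind == "const":
            stack.append(active.arg)
        elif kind == "var":
            stack.append(env[active.arg])
        elif kind in ("add", "sub", "mul"):
            right, left = stack.pop(), stack.pop()
            if kind == "add":
                stack.append(left + right)
            elif kind == "sub":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown node kind {kind!r}")
        active = active.successor        # follow the thread
    return stack.pop()

# b*b - 4*a*c, threaded in post-order
program = thread([Node("var", "b"), Node("var", "b"), Node("mul"),
                  Node("const", 4), Node("var", "a"), Node("var", "c"),
                  Node("mul"), Node("mul"), Node("sub")])
value = run(program, {"a": 1, "b": 5, "c": 2})
```

No recursion is involved: the successor links encode the evaluation order, so the interpreter only advances one pointer per step.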
10Example for demo compiler
11Code Generation
- Compilation produces object code from the intermediate code tree through a process called code generation
- Tree rewriting
- Replace nodes and subtrees of the AST by target code segments
- Produce a linear sequence of instructions from the rewritten AST
12Example of code generation
13Machine instructions
- Load_Addr M[Ri], C, Rd
- Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes
- Load_Byte (M+Ro)[Ri], C, Rd
- Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above
14Two sample instructions with their ASTs
15Code generation
- Main issues
- Code selection: which template?
- Register allocation: too few registers!
- Instruction ordering
- Optimal code generation is NP-complete
- Consider small parts of the AST
- Simplify target machine
- Use conventions
16Object code sequence
- Load_Byte (b+Rd)[Rc], 4, Rt
- Load_Addr 9[Rt], 2, Ra
17Trivial code generation
18Code for (7*(1+5))
19Partial evaluation
20New Code
21Simple code generation
- Consider one AST node at a time
- Two simplistic target machines
- Pure register machine
- Pure stack machine
(Figure: memory layout of the stack machine; SP marks the top of the stack, BP the base of the variables.)
22Pure stack machine
23Example of p := p + 5
- Push_Local p
- Push_Const 5
- Add_Top2
- Store_Local p
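The effect of these four instructions can be checked with a small simulator. This is a sketch only: the tuple encoding of instructions and the dictionary standing in for the local variables are assumptions made for illustration.

```python
# Minimal pure stack machine simulator (illustrative encoding).
def run_stack_machine(program, local_vars):
    stack = []
    for opcode, *operands in program:
        if opcode == "Push_Local":
            stack.append(local_vars[operands[0]])
        elif opcode == "Push_Const":
            stack.append(operands[0])
        elif opcode in ("Add_Top2", "Sub_Top2", "Mul_Top2"):
            right = stack.pop()          # top of stack is the right operand
            left = stack.pop()
            if opcode == "Add_Top2":
                stack.append(left + right)
            elif opcode == "Sub_Top2":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif opcode == "Store_Local":
            local_vars[operands[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {opcode!r}")
    return stack

program = [("Push_Local", "p"), ("Push_Const", 5),
           ("Add_Top2",), ("Store_Local", "p")]
frame = {"p": 37}
leftover = run_stack_machine(program, frame)
```

After the run, `frame["p"]` holds the old value plus 5 and the stack is empty again, which is the invariant a statement is expected to preserve.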
24Pure register machine
25Example of p := p + 5
- Load_Mem p, R1
- Load_Const 5, R2
- Add_Reg R2, R1
- Store_Reg R1, p
26Simple code generation for a stack machine
27The ASTs for the stack machine instructions
28The AST for b*b - 4*(a*c) rewritten
29Simple code generation for a stack machine (demo)
- example: b*b - 4*a*c
- threaded AST
(Figure: threaded AST for the expression, with leaves b, b, 4, a and c.)
30Simple code generation for a stack machine (demo)
- example: b*b - 4*a*c
- threaded AST
(Figure: the same threaded AST with each node annotated by its instruction: Sub_Top2 at the root, Mul_Top2 at the three multiplication nodes, Push_Local at b, b, a and c, and Push_Const at 4.)
31Simple code generation for a stack machine (demo)
- example: b*b - 4*a*c
- rewritten AST
Push_Local b
Push_Local b
Mul_Top2
Push_Const 4
Push_Local a
Push_Local c
Mul_Top2
Mul_Top2
Sub_Top2
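The instruction sequence for b*b - 4*a*c falls out of a depth-first (post-order) walk of the AST. A minimal sketch, assuming a tuple-based AST and a `generate` helper that are not part of the slides:

```python
# Depth-first code generation for a pure stack machine (sketch).
# Assumed node shapes: ("const", v) | ("var", name) | (op, left, right).
TOP2 = {"+": "Add_Top2", "-": "Sub_Top2", "*": "Mul_Top2"}

def generate(node, code):
    kind = node[0]
    if kind == "const":
        code.append(f"Push_Const {node[1]}")
    elif kind == "var":
        code.append(f"Push_Local {node[1]}")
    else:
        generate(node[1], code)          # left operand first
        generate(node[2], code)          # then the right operand
        code.append(TOP2[kind])          # operator consumes the top two
    return code

ast = ("-", ("*", ("var", "b"), ("var", "b")),
            ("*", ("const", 4), ("*", ("var", "a"), ("var", "c"))))
code = generate(ast, [])
```

Each node is considered exactly once, which is what makes this the simplest form of code generation.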
32Depth-first code generation
33Stack configurations
34Simple code generation for a register machine
- The ASTs for the register machine instructions
35Code generation with register allocation
36Code generation with register numbering
37Register machine code for b*b - 4*(a*c)
38Register contents
39Weighted register allocation
- It is advantageous to generate the code for the child that requires the most registers first
- Weight
- The number of registers required by a node
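A sketch of how weights can drive the evaluation order. Two things are assumed here rather than taken from the slides: the usual weight rule (a leaf needs 1 register; an operator needs one more than its children if they weigh the same, otherwise the larger of the two), and that enough registers are available, so no spilling is handled.

```python
OPS = {"+": "Add_Reg", "-": "Sub_Reg", "*": "Mul_Reg"}

def weight(node):
    """Number of registers needed to evaluate node (assumed weight rule)."""
    if node[0] in ("const", "var"):
        return 1
    left, right = weight(node[1]), weight(node[2])
    return left + 1 if left == right else max(left, right)

def generate(node, regs, code):
    """Leave the value of node in regs[0]; regs[1:] are free scratch registers."""
    if node[0] == "const":
        code.append(f"Load_Const {node[1]}, {regs[0]}")
    elif node[0] == "var":
        code.append(f"Load_Mem {node[1]}, {regs[0]}")
    elif weight(node[1]) >= weight(node[2]):
        generate(node[1], regs, code)              # heavier (left) child first
        generate(node[2], regs[1:], code)          # right child into regs[1]
        code.append(f"{OPS[node[0]]} {regs[1]}, {regs[0]}")
    else:
        # Right child is heavier: evaluate it first into regs[1],
        # letting it use regs[0] as scratch, then the left child into regs[0].
        generate(node[2], [regs[1], regs[0]] + regs[2:], code)
        generate(node[1], [regs[0]] + regs[2:], code)
        code.append(f"{OPS[node[0]]} {regs[1]}, {regs[0]}")
    return code

ast = ("-", ("*", ("var", "b"), ("var", "b")),
            ("*", ("const", 4), ("*", ("var", "a"), ("var", "c"))))
code = generate(ast, ["R1", "R2", "R3"], [])
```

For this tree the root has weight 3, so three registers suffice; with fewer, the sketch would fail with an IndexError instead of spilling.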
40Register weight of a node
41AST for b*b - 4*(a*c) with register weights
42Weighted register machine code
43Example
- Parameter number N: 2, 3, 1
- Stored weight: 4, 2, 1
- Registers occupied when starting parameter N: 0, 1, 2
- Maximum per parameter: 4, 3, 3
- Overall maximum: 4
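The numbers in this example can be reproduced with a few lines. The sketch assumes that parameters are evaluated in order of decreasing weight and that each finished parameter keeps one register occupied; `schedule` is a hypothetical helper name.

```python
def schedule(weights):
    """weights maps parameter number -> register weight of its expression."""
    # Heaviest parameter first.
    order = sorted(weights, key=lambda n: weights[n], reverse=True)
    # When parameter k (0-based position) starts, k registers are already held.
    per_param = [weights[n] + occupied for occupied, n in enumerate(order)]
    return order, per_param, max(per_param)

order, per_param, overall = schedule({1: 1, 2: 4, 3: 2})
```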
44Example Tree representation
45Register spilling
- Too few registers?
- Spill registers to memory, to be retrieved later
- Heuristic: select a subtree that uses all registers, and replace it by a temporary
- example
- b*b - 4*a*c
- 2 registers
(Figure: AST for b*b - 4*a*c annotated with register weights.)
46Register spilling
Load_Mem b, R1
Load_Mem b, R2
Mul_Reg R2, R1
Store_Mem R1, T1
Load_Mem a, R1
Load_Mem c, R2
Mul_Reg R2, R1
Load_Const 4, R2
Mul_Reg R1, R2
Load_Mem T1, R1
Sub_Reg R2, R1
47Another example
(Figure: a second AST annotated with register weights.)
48Algorithm
49Machines with register-memory operations
- An instruction
- Add_Mem X, R1
- Adds the contents of memory location X to R1
50Register-weighted tree for a memory-register
machine
51Code generation for basic blocks
- Finding the optimal rewriting of the AST with the available instruction templates is NP-complete.
- Three techniques
- Basic blocks
- Bottom-up tree rewriting
- Register allocation by graph coloring
52Basic block
- Improve the quality of the code emitted by simple code generation
- Consider multiple AST nodes at a time
- Generate code for maximal basic blocks that cannot be extended by including adjacent AST nodes
- Basic block: a part of the control graph that contains no splits (jumps) or combines (labels)
53Example of basic block
- A basic block consists of expressions and assignments
- Fixed sequence (;) limits code generation
- An AST is too restrictive
54From AST to dependency graph
- AST for the simple basic block
55Simple algorithm to convert AST to a data
dependency graph
- Replace arcs by downwards arrows (upwards for destination under assignment)
- Insert data dependencies from use of V to preceding assignment to V
- Insert data dependencies from the assignment to a variable V to the previous assignment to V
- Add roots to the graph (output variables)
- Remove ;-nodes and connecting arrows
56Simple data dependency graph
57Cleaned-up graph
58Exercise
int n n a1 x (bc) n n
n1 y (bc) n
Convert the above code into a data dependency graph
59Answer
60Common subexpression elimination
- Simple example
- x = a*a + 2*a*b + b*b
- y = a*a - 2*a*b + b*b
- Three common subexpressions
- double quads = a*a + b*b
- double cross_prod = 2*a*b
- x = quads + cross_prod
- y = quads - cross_prod
61Common subexpression
- Equal subexpressions in a basic block are not necessarily common subexpressions
- x = a*a + 2*a*b + b*b
- a = b = 0
- y = a*a - 2*a*b + b*b
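One way to tell textually equal subexpressions from truly common ones is to tag every variable with the number of assignments it has received so far. The sketch below is an assumption-laden illustration (tuple ASTs, a `common_subexpressions` helper, only maximal repeated subtrees reported), not the algorithm from the slides.

```python
def var(name):
    return ("var", name)

def const(value):
    return ("const", value)

def common_subexpressions(block):
    """block is a list of (target, expr); returns keys of repeated subtrees."""
    version = {}      # variable name -> number of assignments seen so far
    seen = set()
    common = []

    def key(expr):
        if expr[0] == "const":
            return expr
        if expr[0] == "var":                      # same name, same version
            return ("var", expr[1], version.get(expr[1], 0))
        return (expr[0], key(expr[1]), key(expr[2]))

    def visit(expr):
        if expr[0] in ("const", "var"):
            return
        k = key(expr)
        if k in seen:                             # computed before, operands unchanged
            if k not in common:
                common.append(k)
            return                                # do not report its parts again
        seen.add(k)
        visit(expr[1])
        visit(expr[2])

    for target, expr in block:
        visit(expr)
        version[target] = version.get(target, 0) + 1
    return common

def quad(sign):
    """a*a <sign> 2*a*b + b*b"""
    return ("+", (sign, ("*", var("a"), var("a")),
                        ("*", ("*", const(2), var("a")), var("b"))),
                 ("*", var("b"), var("b")))

plain = common_subexpressions([("x", quad("+")), ("y", quad("-"))])
# a = b = 0 between the two statements assigns b, then a
broken = common_subexpressions([("x", quad("+")), ("b", const(0)),
                                ("a", const(0)), ("y", quad("-"))])
```

In the first block a*a, 2*a*b and b*b are found twice; in the second the intervening assignments change the versions of a and b, so nothing is shared.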
62Common subexpression example (1/3)
63Common subexpression example (2/3)
64Common subexpression example (3/3)
65From dependency graph to code
- Rewrite nodes with machine instruction templates, and linearize the result
- Instruction ordering: ladder sequences
- Register allocation: graph coloring
66Linearization of the data dependency graph
- Example
- ((a+b)*c) - d
- Definition of a ladder sequence
- Each root node is a ladder sequence
- A ladder sequence S ending in operator node N can be extended with the left operand of N
- If operator N is commutative then S may also be extended with the right operand of N
Load_Mem a, R1
Add_Mem b, R1
Mul_Mem c, R1
Sub_Mem d, R1
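For an expression such as ((a+b)*c) - d the ladder runs from the root down the left operands, and the code is emitted from the bottom of the ladder upwards. A minimal sketch, assuming tuple ASTs whose right operands are all memory variables; `ladder_code` is a hypothetical name, and commutative extension to the right operand is left out.

```python
MEM_OPS = {"+": "Add_Mem", "-": "Sub_Mem", "*": "Mul_Mem"}

def ladder_code(tree, reg="R1"):
    # Follow the left operands from the root, collecting the operator nodes.
    ladder = []
    node = tree
    while node[0] in MEM_OPS:
        ladder.append(node)
        node = node[1]
    if node[0] != "var":
        raise ValueError("ladder must end in a memory variable")
    code = [f"Load_Mem {node[1]}, {reg}"]        # bottom of the ladder
    for op_node in reversed(ladder):             # climb back up to the root
        right = op_node[2]
        if right[0] != "var":
            raise ValueError("right operands must be memory variables here")
        code.append(f"{MEM_OPS[op_node[0]]} {right[1]}, {reg}")
    return code

tree = ("-", ("*", ("+", ("var", "a"), ("var", "b")), ("var", "c")), ("var", "d"))
code = ladder_code(tree)
```

The whole sequence uses a single ladder register, which is why ladder sequences are attractive units for instruction ordering.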
67Code generated for a given ladder sequence
Load_Mem b, R1
Add_Reg I1, R1
Add_Mem c, R1
Store_Reg R1, x
68Heuristic ordering algorithm
- To delay the issues of register allocation, use pseudo-registers during the linearization
1. Select a ladder sequence S with no more than one incoming dependency
2. Introduce temporary (pseudo-) registers for non-leaf operands, which become additional roots
3. Generate code for S, using R1 as the ladder register
4. Remove S from the graph
5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code
69Example of linearization
70The code for y, *, +
- Load_Reg X1, R1
- Add_Const 1, R1
- Mult_Mem d, R1
- Store_Reg R1, y
71Remove the ladder sequence y, *, +
72The code for x, +, +, *
- Load_Reg X1, R1
- Mult_Reg X1, R1
- Add_Mem b, R1
- Add_Mem c, R1
- Store_Reg R1, x
73The Last step
- Load_Mem a, R1
- Add_Const 1, R1
- Load_Reg R1, X1
74The results of code generation
75Exercise
- Generate code for the following dependency graph
(Figure: dependency graph with roots x and y.)
76Answers
(Figure: answer, using registers R2, R3 and R4.)
77Register allocation for the linearized code
- Map the pseudo-registers to memory locations or
real registers
gcc compiler
78Code optimization in the presence of pointers
- Pointers cause two different problems for the dependency graph
- a = x * y
- *p = 3
- b = x * y
- a = *p * q
- b = 3
- c = *p * q
x * y is not a common subexpression if p happens to point to x or y
*p * q is not a common subexpression if p happens to point to b
79Example (1/4)
- Assignment under a pointer
80Example (2/4)
Data dependency graph with an assignment under a
pointer
81Example (3/4)
Cleaned-up graph
82Example (4/4)
Target code
83BURS code generation
- In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized.
- Machines often have several hundred different machine instructions, often each with ten or more addressing modes, and it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.
84BURS code generation
- Simple instruction patterns (1/2)
85BURS code generation
- Simple instruction patterns (2/2)
86Example Input tree
87Naïve rewrite
- Its cost is 17 units
- 1 + 3 + 4 + 1 + 4 + 3 + 1 = 17
88Code resulting
89Top-down largest-fit rewrite
90Discussions
- How do we find all possible rewrites, and how do we represent them? It will be clear that we do not fancy listing them all!
- How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?
91Bottom-up pattern matching
92Outline code for bottom-up pattern matching
93Label set resulting
94Instruction selection by dynamic programming
- Bottom-up pattern matching with costs
5->reg 6->reg 7.1 8.1
Instruction selection
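The idea of attaching a cheapest cost to every node, bottom-up, can be sketched with a tiny pattern set. Everything concrete below is an assumption made for illustration: four hypothetical patterns (load a constant into a register for 1 unit, load memory for 3, add memory to a register for 3, add two registers for 1), numbered 1 to 4, and a tuple-based tree.

```python
# Leaf kind -> (pattern number, cost of loading the leaf into a register).
LEAF_RULES = {"cst": (1, 1), "mem": (2, 3)}
# (pattern number, operator, left nonterminal, right nonterminal, cost)
OP_RULES = [(3, "+", "reg", "mem", 3),
            (4, "+", "reg", "reg", 1)]

def label(node):
    """Return {nonterminal: (cheapest cost, pattern)} for node, bottom-up."""
    if node[0] in LEAF_RULES:
        rule, cost = LEAF_RULES[node[0]]
        return {node[0]: (0, None), "reg": (cost, rule)}
    left, right = label(node[1]), label(node[2])
    best = {}
    for rule, op, left_nt, right_nt, cost in OP_RULES:
        if op == node[0] and left_nt in left and right_nt in right:
            total = left[left_nt][0] + right[right_nt][0] + cost
            if "reg" not in best or total < best["reg"][0]:
                best["reg"] = (total, rule)      # keep only the cheapest way
    return best

both_constants = label(("+", ("cst", 8), ("cst", 1)))
constant_and_memory = label(("+", ("cst", 8), ("mem", "a")))
```

Because each node keeps only the cheapest rewrite per result type, the work is linear in the size of the tree, and the chosen patterns can be read off top-down afterwards.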
95Cost evaluation
- Lower
- 5->reg @7
- 6->reg @8 (1+3+4)
- Higher
- 6->reg @12 (1+7+4)
- 8->reg @9 (1+3+5)
- Top (?)
- Exercise
96Code generation by bottom-up matching
97Code generation by bottom-up matching, using
commutativity
98Pattern matching and instruction selection
combined
- Two basic operands
- State S1
- ->cst @0
- 1->reg @1
- State S2
- ->mem @0
- 2->reg @3
99States of the BURS
100Creating the cost-conscious next-state table
- The triplet , S1, S1 => S3
- S3
- 4->reg @3 (1+1+1)
- , S1, S2 => S5
- S5
- 3->reg @ 1+0+3 = 4
- 4->reg @ 1+3+1 = 5
- Exercise , S1, S5
- Exercise , S1, S2
- 5->reg @ 1+0+6 = 7 (4)
- 6->reg @ 1+3+4 = 8
- 7.1 @ 0+3+0 = 3 (0)
- 8.1 @ 0+3+0 = 3 (0)
101Cost conscious next table
102Code generation using cost-conscious next-state
table
103Register allocation by graph coloring
- Procedure-wide register allocation
- Only live variables require register storage
- Two variables (values) interfere when their live ranges overlap
Dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead
104A program segment for live analysis
105Live range of the variables
106Graph coloring
- NP-complete problem
- Heuristic: color easy nodes last
- Find node N with lowest degree
- Remove N from the graph
- Color the simplified graph
- Set color of N to the first color that is not used by any of N's neighbors
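The heuristic above can be sketched iteratively: removing nodes onto a stack and coloring them in reverse order is equivalent to the recursive formulation. The dictionary-of-sets graph encoding, the tie-breaking by node name and the use of None for a node that would have to be spilled are assumptions of this sketch.

```python
def color_graph(graph, num_colors):
    """graph maps each node to the set of its neighbors (undirected)."""
    work = {node: set(adj) for node, adj in graph.items()}
    removed = []
    while work:
        # Find the node with the lowest degree in the remaining graph.
        node = min(work, key=lambda n: (len(work[n]), n))
        removed.append(node)
        for neighbor in work[node]:
            work[neighbor].discard(node)
        del work[node]
    colors = {}
    for node in reversed(removed):       # easy nodes are colored last
        used = {colors[n] for n in graph[node] if n in colors}
        free = [c for c in range(num_colors) if c not in used]
        colors[node] = free[0] if free else None   # None: no register left
    return colors

# Interference graph: a, b, c interfere pairwise; d interferes only with a.
interference = {"a": {"b", "c", "d"}, "b": {"a", "c"},
                "c": {"a", "b"}, "d": {"a"}}
colors = color_graph(interference, 3)
```

With three colors (registers) every node of this small graph receives a color different from all its neighbors.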
107Coloring process
3 registers
108Preprocessing the intermediate code
- Preprocessing of expressions
- char lower_case_from_capital(char ch) {
- return ch + ('a' - 'A');
- }
- Constant expression evaluation
- char lower_case_from_capital(char ch) {
- return ch + 32;
- }
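Constant expression evaluation amounts to a bottom-up pass that replaces operator nodes whose operands are both constants. A minimal sketch on an assumed tuple AST; `fold` is a hypothetical helper, not code from the slides.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return node with all constant subexpressions evaluated."""
    if node[0] in ("const", "var"):
        return node
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", OPS[node[0]](left[1], right[1]))
    return (node[0], left, right)

# ch + ('a' - 'A'), with character constants represented by their codes
expr = ("+", ("var", "ch"), ("-", ("const", ord("a")), ("const", ord("A"))))
folded = fold(expr)
```

The subtraction disappears at compile time and only the addition of 32 is left for run time.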
109Arithmetic simplification
- Transformations that replace an operation by a simpler one are called strength reductions.
- Operations that can be removed completely are called null sequences.
110Some transformations for arithmetic simplification
111Preprocessing of if-statements and goto statements
- When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination.
- If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.
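Both rules can be sketched on a list of statements. The statement encoding (tuples tagged "assign", "if", "goto", "return", "label") is an assumption, and the sketch conservatively treats every label as a possible jump target.

```python
def eliminate(block):
    """Return block without statically dead statements."""
    result = []
    reachable = True
    for stmt in block:
        if stmt[0] == "label":
            reachable = True             # a label may be jumped to
        if not reachable:
            continue                     # code after goto/return: dead
        if stmt[0] == "if" and stmt[1][0] == "const":
            # Constant condition: keep only the branch that will run.
            taken = stmt[2] if stmt[1][1] else stmt[3]
            result.extend(eliminate(taken))
        elif stmt[0] == "if":
            result.append(("if", stmt[1], eliminate(stmt[2]), eliminate(stmt[3])))
        else:
            result.append(stmt)
        if result and result[-1][0] in ("goto", "return"):
            reachable = False
    return result

block = [("assign", "x", 1),
         ("if", ("const", 0), [("assign", "y", 1)], [("assign", "y", 2)]),
         ("return", "x"),
         ("assign", "z", 3)]
cleaned = eliminate(block)
```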
112Stack representations
113Stack representations (details)
(Figure: stack representation of an if-statement, with entries IF, condition, ELSE and FI.)
114Preprocessing of routines
115In-lining result
Advanced examples
int n = 3; printf("square = %d\n", n*n);
=> int n = 3; printf("square = %d\n", 3*3);
=> int n = 3; printf("square = %d\n", 9);
Load_par "square = %d\n"
Load_par 9
Call printf
116Cloning
- Example
- double power_series(int n, double a[], double x) {
- double result = 0.0;
- int p;
- for (p = 0; p < n; p++) result += a[p] * (x ** p);
- return result;
- }
- Is called with x set to 1.0
double power_series(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p < n; p++) result += a[p] * (1.0 ** p);
    return result;
}
double power_series(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p < n; p++) result += a[p];
    return result;
}
117Postprocessing the target code
- Stupid instruction sequences
- Load_Reg R1, R2
- Load_Reg R2, R1
- or
- Store_Reg R1, n
- Load_Mem n, R1
118Creating replacement patterns
- Example
- Load_Reg Ra, Rb; Load_Reg Rc, Rd
- | Ra = Rd, Rb = Rc => Load_Reg Ra, Rb
- Load_Const 1, Ra; Add_Reg Rb, Rc
- | Ra = Rb, is_last_use(Rb) => Increment Rc
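The two replacement patterns can be tried out in a small peephole pass. This is a sketch under assumptions: instructions are tuples, `is_last_use` is approximated by checking that the register is not mentioned in any later instruction, and an extra guard (not in the pattern above) rejects the case where the added register is also the target.

```python
def mentioned_later(code, start, register):
    """Conservative stand-in for is_last_use: any later mention counts as a use."""
    return any(register in instr[1:] for instr in code[start:])

def peephole(code):
    out = []
    i = 0
    while i < len(code):
        first = code[i]
        second = code[i + 1] if i + 1 < len(code) else None
        # Load_Reg Ra,Rb; Load_Reg Rb,Ra  =>  Load_Reg Ra,Rb
        if (second and first[0] == "Load_Reg" and second[0] == "Load_Reg"
                and first[1] == second[2] and first[2] == second[1]):
            out.append(first)
            i += 2
            continue
        # Load_Const 1,Ra; Add_Reg Ra,Rc with Ra dead afterwards  =>  Increment Rc
        if (second and first[0] == "Load_Const" and first[1] == 1
                and second[0] == "Add_Reg" and first[2] == second[1]
                and second[1] != second[2]          # extra safety guard
                and not mentioned_later(code, i + 2, second[1])):
            out.append(("Increment", second[2]))
            i += 2
            continue
        out.append(first)
        i += 1
    return out

code = [("Load_Reg", "R1", "R2"), ("Load_Reg", "R2", "R1"),
        ("Load_Const", 1, "R3"), ("Add_Reg", "R3", "R4"),
        ("Store_Reg", "R4", "n")]
optimized = peephole(code)
```

A single left-to-right pass is used here; a production peephole optimizer would typically rescan, since one replacement can expose another.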
119Locating and replacing instructions
- Multiple pattern matching
- Using FSA
- Dotted items
120Homework
- Study sections
- 4.2.13 Machine code generation
- 4.3 Assemblers, linkers and loaders