Title: MIT 6.035 Introduction to Compilation
- Martin Rinard
- Laboratory for Computer Science
- Massachusetts Institute of Technology
Programming Language Dilemma
- Stored program computer
- How to instruct computer what to do?
- Need a program that computer can execute
- Must be written in machine language
- Unproductive to code in machine language
- Design a higher level language
- Implement higher level language
- Alternative 1: Interpreter
- Alternative 2: Compiler
Compilation As Translation
Starting Point: Source Program in Some Programming Language
    | Compiler
    v
Ending Point: Generated Program in Machine Language
Starting Point
- Standard imperative language (Java, C, C++)
- State
- Variables
- Structures
- Arrays
- Computation
- Expressions (arithmetic, logical, etc.)
- Assignment statements
- Control flow (conditionals, loops)
- Procedures
Ending Point: State (SPARC)
- Memory (32 bit addresses, byte addressable)
- 32 Integer Registers
- %g0-%g7 global registers
- %g0 reads as 0, writes have no effect
- %o0-%o7 output registers
- %l0-%l7 local registers
- %i0-%i7 input registers
- Condition Codes
- Indicate results of integer operations
- Used in branch instructions
Ending Point: Computation (SPARC)
- ld [<addr>], <reg>
- st <reg>, [<addr>]
- <binary op> <src1>, <src2>, <dst>
- <comp op> <src1>, <src2>
- <branch op> <address>
- Conditional
- Unconditional
Exploring Compiler Behavior
- Start with sample input programs
- Compile to assembler using cc -S
- Try to match up source code with generated code
- State Translation
- Variables (global, local, parameters)
- Structures and arrays
- Computation Translation
- Expression evaluation and assignment
- Flow of control constructs
- Procedure call and return
Implementing Global Variables
- Allocate memory location for variable
- When program accesses variable, compiler generates load and store instructions
Example: int a, b, c;
Memory: a, b, and c each occupy their own location
Implementing Global Variables
int a, b, c;
proc() { a = b + c; }
    sethi %hi(b), %l0
    or    %l0, %lo(b), %l0
    ld    [%l0+0], %l2       ! load value of b into %l2
    sethi %hi(c), %l0
    or    %l0, %lo(c), %l0
    ld    [%l0+0], %l1       ! load value of c into %l1
    add   %l2, %l1, %l1      ! add %l2 and %l1
    sethi %hi(a), %l0
    or    %l0, %lo(a), %l0
    st    %l1, [%l0+0]       ! store result into a

    .align 8                 ! allocate storage for a, b, and c
    .common a,4,4
    .common b,4,4
    .common c,4,4
Sethi Instruction
- Encode parts of address in instruction stream
- Machine code format of sethi instruction (32 bits, most significant bits first):
  00 | 5 bit <reg> | 100 | 22 bit <immediate>
- Effect of sethi instruction
- Replace top 22 bits of <reg> with <immediate>
- Set bottom 10 bits of <reg> to zero
- Example of a general theme: store constant values in immediate fields of instructions
Implementing Local Variables
- Concept of procedure call stack
- Each procedure invocation has state
- Local variables
- Return address
- Stores state in frame
- New frame allocated for each call
- Frames usually allocated on call stack
- Call stack allocated at top of memory
- Call stack grows down
Implementing Local Variables
- Frame pointer register points to current frame
- Decreases at calls (stack grows down)
- Increases on returns
- Local variables allocated in frame
Example: proc() { int a, b, c; }
Memory (stack):
  Frame for caller of proc
  Frame for invocation of proc: holds the local variables a, b, c of proc and the return addr
Registers: fp points to the frame for the invocation of proc
Implementing Local Variables
proc() { int a, b, c; a = b + c; }
    ld    [%fp-12], %l0
    ld    [%fp-16], %l1
    add   %l0, %l1, %l0
    st    %l0, [%fp-8]
Note: %fp is the same register as %i6; it points to the frame for the procedure
Implementing Structures
- Structures contain several fields
- Each structure typically stored in a contiguous block of memory
typedef struct { int x, y, z; } foo;
foo *p;
Memory: p points to a contiguous block holding x, y, z (x at the lowest address)
Implementing Structures
typedef struct { int x, y, z; } foo;
proc() { foo *f; f->x = f->y + f->z; }
    ld    [%fp-8], %l0
    add   %l0, 4, %l0        ! compute address of f->y
    ld    [%l0+0], %l2       ! load f->y
    ld    [%fp-8], %l0
    add   %l0, 8, %l0        ! compute address of f->z
    ld    [%l0+0], %l1       ! load f->z
    add   %l2, %l1, %l1      ! add values
    ld    [%fp-8], %l0
    st    %l1, [%l0+0]       ! store into f->x
Optimized Version
typedef struct { int x, y, z; } foo;
proc() { foo *f; f->x = f->y + f->z; }
    ld    [%fp-4], %o0       ! load f once
    ld    [%o0+4], %o1       ! f->y
    ld    [%o0+8], %o2       ! f->z
    add   %o1, %o2, %o1
    st    %o1, [%o0]         ! f->x
Alignment, Padding, and Packing
- Machines often have alignment requirements
- Integers (4 bytes) must start at 4-byte aligned address (bottom 2 bits 0)
- Shorts (2 bytes) must start at 2-byte aligned address (bottom bit 0)
- Alignment requirements raise issues
- Padding between fields to ensure alignment
- Field packing to minimize memory usage
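Padding and Packing Example
typedef struct { int w; char x; int y; char z; } foo;
foo *p;
p points to the start of the block (w) in both layouts
- Naïve layout (16 bytes): w | x + 3 bytes padding | y | z + 3 bytes padding
- Packed layout (12 bytes, 4 byte savings): w | x, z + 2 bytes padding | y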
Implementing Arrays
- Allocate memory locations for array elements
- Elements stored contiguously
int a[4];
Memory: a[0], a[1], a[2], a[3] at consecutive locations (a[0] at the lowest address)
Implementing Arrays
int a[4];
proc() { int i, j; i = a[j]; }
    ld    [%fp-12], %l0      ! load j
    sll   %l0, 2, %l0        ! 4 * j
    sethi %hi(a), %l1
    or    %l1, %lo(a), %l1   ! address of a[0]
    add   %l0, %l1, %l0      ! compute address of a[j]
    ld    [%l0+0], %l0       ! load a[j] into %l0
    st    %l0, [%fp-8]       ! store %l0 into i
address of a[j] = address of a[0] + (4 * j) = a + (4 * j)
Expression Evaluation
- Evaluate subexpressions, combine to get value of outer expression
- Must always have values of operands in registers
- Final result placed in register
Implementing Expression Evaluation
int x;
proc() { int a, b; a = 3; b = 2; x = (a + b) - (a | b); }
    mov   3, %l0
    st    %l0, [%fp-8]
    mov   2, %l0
    st    %l0, [%fp-12]      ! initialize a and b
    ld    [%fp-8], %l0
    ld    [%fp-12], %l1      ! load a and b
    add   %l0, %l1, %l2      ! add a and b into %l2
    ld    [%fp-8], %l0
    ld    [%fp-12], %l1      ! load a and b
    or    %l0, %l1, %l1      ! or a and b into %l1
    sub   %l2, %l1, %l1      ! compute %l1 = %l2 - %l1
    sethi %hi(x), %l0
    or    %l0, %lo(x), %l0   ! load address of x
    st    %l1, [%l0+0]       ! store result in x
Implementation Issues
- Generating a linear sequence of instructions to compute a nested expression
- Allocate storage for temporary values
- Typically registers, but the machine has a limited number of registers
- May need to store temporaries in memory
- Expression evaluation order affects number of values you need to keep around
- In many cases, may be able to statically compute value of subexpressions
Optimized Implementation
int x;
proc() { int a, b; a = 3; b = 2; x = (a + b) - (a | b); }
    or    %g0, 2, %g2        ! (3 + 2) - (3 | 2) = 2, computed at compile time
    sethi %hi(x), %g1
    st    %g2, [%g1+%lo(x)]
Flow of Control
- Convert structured flow of control to branch statements
- Two pervasive shapes
if C then A else B:
    Code to evaluate C (branch to B if C is false)
    Code to execute A (then branch over B)
    Code to execute B
    Code after if statement
while C A:
    Code to evaluate C (branch past the loop if C is false)
    Code to execute A (then branch back to evaluate C)
    Code after while statement
Conditional Example
int a, b;
proc() { if (a) b = 0; else b = 1; }
    sethi %hi(a), %g1
    ld    [%g1+%lo(a)], %g1
    cmp   %g1, 0
    be    .L1
    sethi %hi(b), %g1
    ba    .L2
    st    %g0, [%g1+%lo(b)]
.L1:
    or    %g0, 1, %g2
    st    %g2, [%g1+%lo(b)]
.L2:
    retl
    nop
Optimized Conditional Example
int a, b;
proc() { if (a) b = 0; else b = 1; }
    sethi %hi(a), %g1
    ld    [%g1+%lo(a)], %g1
    cmp   %g1, 0
    be    .L1
    sethi %hi(b), %g1
    retl
    st    %g0, [%g1+%lo(b)]
.L1:
    or    %g0, 1, %g2
    retl
    st    %g2, [%g1+%lo(b)]
Apparent Anomaly in Code
int a, b;
proc() { if (a) b = 0; else b = 1; }
    sethi %hi(a), %g1
    ld    [%g1+%lo(a)], %g1
    cmp   %g1, 0
    be    .L1
    sethi %hi(b), %g1
    ba    .L2                  ! branch over else part
    st    %g0, [%g1+%lo(b)]    ! store value into b for then part
.L1:
    or    %g0, 1, %g2
    st    %g2, [%g1+%lo(b)]
.L2:
    retl
    nop
- Branch appears before store
- Why will b get correct value?
Concept of Branch Delay Slots
- In SPARC architecture, the instruction after a branch executes even if the branch is taken!
    be    .L1
    sethi %hi(b), %g1        ! executes even if the branch is taken!
- Why do this?
- It improved the performance of the initial version of the processor
- Compiler could handle the complexity
- What if there is no instruction to execute? nop!
Instruction Scheduling
- Branch delay slots are a special case of instruction scheduling
- Instruction scheduling packs instructions together for concurrent/pipelined execution
- Sophisticated part of compilation
- Moves work from hardware to compiler
- Illustrates rarity of direct assembly coding
- Required for IA-64 to work well
Implementation of Loops
- Initialize i to 0 and n to 10
- Load i and n
- Branch to end if i >= n
- Compute address of a[i]
- Load a[i]
- Increment value
- Store back into a[i]
- Increment i
- Branch back to top if i < n
int a[10];
proc() { int n = 10; int i = 0; while (i < n) { a[i]++; i++; } }
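Implementation
int a[10];
proc() { int n = 10; int i = 0; while (i < n) { a[i]++; i++; } }
    mov   10, %l0
    st    %l0, [%fp-8]         ! n = 10
    mov   0, %l0
    st    %l0, [%fp-12]        ! i = 0
    ld    [%fp-12], %l1
    ld    [%fp-8], %l0
    cmp   %l1, %l0
    bge   .L18                 ! branch to end if i >= n
    nop
.L16:
    ld    [%fp-12], %l0
    sll   %l0, 2, %l0
    sethi %hi(a), %l1
    or    %l1, %lo(a), %l1
    add   %l0, %l1, %l0
    st    %l0, [%fp-16]        ! save address of a[i]
    ld    [%fp-16], %l0
    ld    [%l0+0], %l0         ! load a[i]
    add   %l0, 1, %l1          ! increment value
    ld    [%fp-16], %l0
    st    %l1, [%l0+0]         ! store back into a[i]
    ld    [%fp-12], %l0
    add   %l0, 1, %l0
    st    %l0, [%fp-12]        ! increment i
    ld    [%fp-12], %l1
    ld    [%fp-8], %l0
    cmp   %l1, %l0
    bl    .L16                 ! branch back to top if i < n
    nop
.L18: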
Optimizations
- Keep i and address of a[i] in registers
- Compute address of a[0] before loop body, increment by 4 in loop body
- Don't store n in memory or register, just use 10 whenever you see it
- Omit initial branch at top of loop
Optimized Implementation
int a[10];
proc() { int n = 10; int i = 0; while (i < n) { a[i]++; i++; } }
    sethi %hi(a), %g1
    add   %g1, %lo(a), %g1     ! load base address of a
    or    %g0, 0, %g2          ! init i
    ld    [%g1], %g3           ! load a[i]
.L900000106:
    add   %g3, 1, %g3
    st    %g3, [%g1]           ! increment and store a[i]
    add   %g2, 1, %g2
    add   %g1, 4, %g1          ! update i and ptr to a[i]
    cmp   %g2, 10
    bl,a  .L900000106          ! loop back
    ld    [%g1], %g3           ! load a[i] (annulled delay slot: runs only if the branch is taken)
.L77000006:
Broader View
- Compilation is a specific instance of language processing and translation
- Technical world is littered with small languages
- Scripting languages
- Configuration languages
- Domain-specific languages
- Language processing is a crucial skill that you can apply in many areas to improve productivity
- Key aspects
- Developer representation (text)
- Internal representation (data structures)
- Parsing, analysis, transformation, code generation
- Studying compilers gives you the skills you need to do language processing
Summary
- Compiler responsibilities
- Data layout and access
- Global and local variables, parameters
- Structures, arrays, and objects
- Expression evaluation
- Flow of control
- Procedure and method calls
- Hide low-level machine complexities
- Optimizations
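More Optimizations
int a[10];
proc() { int n = 10; int i = 0; while (i < n) { a[i]++; i++; } }
    sethi %hi(a), %g1
    add   %g1, %lo(a), %g1     ! load base address of a
    or    %g0, 0, %g2          ! init i
    ld    [%g1], %g3           ! load a[i]
.L900000106:
    add   %g3, 1, %g3
    st    %g3, [%g1]           ! increment and store a[i]
    add   %g2, 1, %g2
    add   %g1, 4, %g1          ! update i and ptr to a[i]
    cmp   %g1, a+40            ! schematic: compare the pointer with a+40 instead of i with 10
    bl,a  .L900000106          ! loop back
    ld    [%g1], %g3           ! load a[i]
.L77000006:
- Once the loop test uses the pointer, i (%g2) is no longer needed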
Procedure Call
- Protocol between caller and callee
- Heavily architecture dependent
- SPARC concepts
- Caller actions
- Store parameters in %o0-%o5
- Jump to callee, storing PC in %o7
- Get return result in %o0
- Callee actions
- Get parameters in %i0-%i5, PC from caller in %i7
- Put return result in %i0
- Use PC from caller to return back to caller
Register Windows
- Parameter issue
- Caller puts parameters in %o0-%o5
- Callee expects parameters in %i0-%i5
- Return result issue
- Callee puts return result in %i0
- Caller expects result in %o0
- Why? Register windows!
- Conceptually, have an overlapping stack of register windows
Visual Register Windows
Globals %g0-%g7 are shared by all windows.
Prev:     %i0-%i7  %l0-%l7  %o0-%o7
Current:                    %i0-%i7  %l0-%l7  %o0-%o7
Next:                                         %i0-%i7  %l0-%l7  %o0-%o7
- Have current window
- %i0-%i7 of current window are same as %o0-%o7 of previous window
- %o0-%o7 of current window are same as %i0-%i7 of next window
Register Window Instructions
- save %sp, <num>, %sp
- Pushes current window on stack
- Allocates new window (%o0-%o7 become %i0-%i7, new %l0-%l7 and %o0-%o7)
- Sets %sp (%o6) in new window to %sp in old window plus <num>
- Note that %o6 in old window becomes %i6 in new window (%sp becomes %fp)
- restore
- Pops current window
- (%i0-%i7 become %o0-%o7)
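Stack and Frame Pointers
save %sp, -12, %sp   (In practice, need at least 12+64 bytes to leave space for register window saves)
Memory after the save:
- Old register window: %fp (%i6) and %sp (%o6)
- New register window: %fp (%i6) is the old %sp; %sp (%o6) is 12 bytes lower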
Procedure Call Example
int n;
bar() { foo(n); }
    sethi %hi(n), %l0
    or    %l0, %lo(n), %l0
    ld    [%l0+0], %l0       ! load n
    mov   %l0, %o0           ! set up parameter
    call  foo                ! call foo (stores PC of call instruction into %o7)
    nop
Procedure Example
int foo(int n) { return n + 1; }
    save  %sp, -104, %sp     ! standard prologue: new register window, frame
    st    %i0, [%fp+68]      ! store param
    ld    [%fp+68], %l0
    add   %l0, 1, %l0
    st    %l0, [%fp-4]       ! compute result
    ba    .L13
    nop
.L13:
    ld    [%fp-4], %l0
    mov   %l0, %i0           ! load result
    jmp   %i7+8              ! standard epilogue: return to caller (past the call and its delay slot)
    restore                  ! restore register window, frame
Optimized Leaf Procedure
- Punt register windows completely
- Just compute in window from caller
int foo(int n) { return n + 1; }
    jmp   %o7+8              ! return to caller
    add   %o0, 1, %o0        ! compute result (delay slot)
Complex Design
- Need for separate compilation
- Must be standard call/return protocol
- Need for performance
- Parameters/return value passed in registers
- Supports efficient caller/callee linkage
- Protocol supports tailored code generation
- Caller does not set up register window, frame pointer, or stack pointer for callee
- Enables leaf procedure optimizations
- Compiler hides all this from programmer!
Objects and Inheritance
- Object consists of
- State (fields of object)
- Behavior (methods)
- Inheritance
- Augment base class with new fields
- Augment behavior of base class with new methods
- Override some methods of base class
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
Key Consideration: Subtyping
- If B inherits from A, then
- Anywhere program declares object of type A
- Must be able to execute with object of type B
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
Implementing Object State (Single Inheritance Only)
- Extension of structure approach
- Store fields in contiguous block of memory
- Fields of extending class stored after fields of base class
- Key property: fields from base class stored at same offset in all objects that inherit from base class
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
Memory layout of a B object (lowest address first): y, z (fields of A), then w, x (fields of B)
Virtual Function Calls (Single Inheritance Only)
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
void p(A a) { a.f(); }
A a = new A(); p(a);   // p() calls f() in class A
B b = new B(); p(b);   // p() calls f() in class B
- Different executions of this call site may invoke different methods
- Invoked method depends on class of object
Vtable Approach
- Each object has reference to a vtable
- One vtable per class
- Vtable contains pointers to methods for all objects of its class
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
void p(A a) { a.f(); }
A object (lowest address first): vtable ptr (to vtable for A objects), y, z
B object (lowest address first): vtable ptr (to vtable for B objects), y, z, w, x
Virtual Function Calls
- Calling convention stays same
- But call site determines address of the invoked method dynamically
class A { public int y, z; public int f() { return y + z; } }
class B extends A { public int w, x; public int f() { return w - x; } }
void p(A a) { a.f(); }
Generated Code at Call Site
    ld    [%i0], %o1         ! load vtable ptr
    ld    [%o1], %o1         ! load function ptr
    call  %o1, 0             ! indirect call
    nop