Title: Code Generation
1 Code Generation
Token Stream -> Parser -> Intermediate Code -> Optimization -> Object Code
- A more elaborate example
- A very different variation on 3-address code
- More powerful symbol table
The example from this lecture will be available on the class web site as codegen.zip
2 Program Structure
- CCompile: Reads the file. Runs Lex and Yacc. Outputs to three address code.
- CThreeAddressCode: The three address code and symbol tables.
  - CSymbolTable
- CCodeGenerator: Converts three address code to object code.
  - CRegisterAssignment
- CEmulator: Emulates the object code.
3 The parts (in compiler.cpp)
// The three address code (intermediate code)
// we will be generating
CThreeAddressCode tac;

// The compiler we will use
CCompile compiler;

// The code generator we will use
CCodeGenerator generator;

// The code emulator we will use
CEmulator emulator;

// Object code
vector<Byte> code;
4 The compilation process (compiler.cpp)
// Compile
compiler.Compile(file, tac);

// Display the results
cout << "Intermediate code" << endl;
tac.Display();

// Generate code
cout << endl;
cout << "Generated code" << endl;
generator.Generate(tac, code);

// Emulate the code
CSymbolTable::Symbol *f = tac.GlobalSymbolTable()->FindFunction("main");
if(f != NULL)
    emulator.Emulate(code, ((CSymbolTable::FunctionSymbol *)f)->m_codeAddr);
else
    cout << "Function main is not defined" << endl;
5 Our 3-address code
- We've been limited in our 3-address code design up till now for several reasons.
  - 3AC feeds a code generator. We've not built one yet, so we don't know exactly what we need.
  - The simple example was based on only a trivial function symbol table.
  - We wanted to execute the 3AC, a somewhat artificial phenomenon.
I've written a variation on functional (the miniature language) that generates code and uses a more powerful and easier to use 3-address code.
6 Symbol Table
class CSymbolTable
{
public:
    CSymbolTable(void);
    ~CSymbolTable(void);

    // Symbol types
    enum STypes {Constant, Function, Temp, Parameter, Label};

    // Value types
    enum VTypes {Float, Int, Double};
    ...

This is the symbol table class. It can contain constants, functions, temporaries, parameters, and labels.
7 Symbol Types
- Constants
  - Constant values. Can be Float, Int, Double
- Functions
  - A function with all of its parameter definitions
  - A function has its own symbol table as well
- Temporaries
  - Temporary values for a function
- Parameters
  - The parameters to a function
- Labels
  - Code labels we will jump to
8 Example
addr  label  op     a1    a2   a3
0     fac    -      x     1    t1
1            if!    t1    -    lbl1
2            -      x     1    t3
3            param  t3    -    -
4            call   fac   -    t4
5            *      x     t4   t5
6     lbl1   =      t5    -    t2
7            jmp    lbl2  -    -
8            =      1     -    t2
9     lbl2   ret    t2    -    -
10    main   param  4     -    -
11           call   fac   -    t1
12           ret    t1    -    -
Callouts on the slide: Constants (1, 4), Functions (fac, main), Temporaries (t1 to t5), Labels (lbl1, lbl2).
9 How we describe them...
// Superclass for all symbols
class Symbol
{
public:
    virtual ~Symbol() {}
    virtual STypes SymbolType() = 0;
};

// Describes a constant
class ConstantSymbol : public Symbol
{
public:
    virtual STypes SymbolType() {return Constant;}

    VTypes type;
    union
    {
        float  flt_value;
        int    int_value;
        double dbl_value;
    };
    int num;        // Just a numerical id
    int codeAddr;   // In generated code
};

These are nested inside CSymbolTable.

// All constants
std::list<ConstantSymbol> m_constants;
10 Creating a float constant
CSymbolTable::Symbol *CSymbolTable::CreateConstant(float value)
{
    // Does this constant already exist?
    for(list<ConstantSymbol>::iterator i=m_constants.begin(); i!=m_constants.end(); i++)
    {
        if(i->type == Float && i->flt_value == value)
            return &(*i);
    }

    // Does not, create a new one
    ConstantSymbol cs;
    cs.type = Float;
    cs.flt_value = value;
    m_constants.push_back(cs);
    return &(m_constants.back());
}

Don't return &cs. (Why?) Why must I use list and not vector?
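Both questions come down to pointer lifetime. The sketch below is a minimal illustration, assuming a stripped-down `Constant` struct and helper names (`AddConstant`, `FirstPointerSurvivesGrowth`) that are hypothetical stand-ins rather than the lecture's classes. It shows that a pointer to an element of a `std::list` stays valid while the list grows, which is what the symbol table relies on when it hands out `Symbol *` values.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for the lecture's ConstantSymbol.
struct Constant
{
    float value;
};

// Append a constant and return a pointer to the stored copy.
// Returning the address of the local c would dangle as soon as
// this function returns, so we take the address of the list's copy.
Constant *AddConstant(std::list<Constant> &table, float value)
{
    Constant c;
    c.value = value;
    table.push_back(c);       // the list stores its own copy
    return &table.back();     // address of that copy, not of c
}

// Hand out a pointer early, grow the list, and check the pointer
// still refers to the first element.
bool FirstPointerSurvivesGrowth(int extra)
{
    std::list<Constant> table;
    Constant *first = AddConstant(table, 1.0f);
    for(int i = 0; i < extra; i++)
        AddConstant(table, 2.0f + i);
    return first == &table.front() && first->value == 1.0f;
}
```

With a `std::vector`, `push_back` may reallocate the storage, so pointers handed out earlier can become invalid; `std::list` never moves existing elements on insertion.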
11 Temps, Parameters, Labels
class TempSymbol : public Symbol
{
public:
    virtual STypes SymbolType() {return Temp;}
    int num;
    VTypes type;
};

class ParameterSymbol : public Symbol
{
public:
    virtual STypes SymbolType() {return Parameter;}
    std::string name;
    int num;
    VTypes type;
};

class LabelSymbol : public Symbol
{
public:
    virtual STypes SymbolType() {return Label;}
    int addr;
    int num;
};

// Members of FunctionSymbol, not CSymbolTable
// Parameters
std::map<std::string, ParameterSymbol> m_parms;
// Temporaries
std::list<TempSymbol> m_temps;

// Member of CSymbolTable
std::list<LabelSymbol> m_labels;
12 Why we have the Symbol superclass
//
// A Three Address Code instruction
//
struct TAC
{
    std::string name;
    CSymbolTable::Symbol *a1;
    CSymbolTable::Symbol *a2;
    CSymbolTable::Symbol *a3;
};
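The point of the superclass is that each address in an instruction can hold any kind of symbol. The following is a simplified sketch, not the lecture's code: it keeps only two symbol kinds, and `Describe` and `DemoDescribe` are hypothetical helpers added for illustration. It shows one `Symbol *` type carrying different concrete symbols, with `SymbolType()` used to recover the kind at the point of use.

```cpp
#include <cassert>
#include <string>

enum STypes { Constant, Temp };

// Common base: lets one pointer type refer to any kind of symbol.
struct Symbol
{
    virtual ~Symbol() {}
    virtual STypes SymbolType() const = 0;
};

struct ConstantSymbol : Symbol
{
    float value;
    virtual STypes SymbolType() const { return Constant; }
};

struct TempSymbol : Symbol
{
    int num;
    virtual STypes SymbolType() const { return Temp; }
};

// One instruction: every operand is just a Symbol pointer (or null).
struct TAC
{
    std::string name;
    Symbol *a1;
    Symbol *a2;
    Symbol *a3;
};

// Recover the concrete kind at the point of use.
std::string Describe(const Symbol *s)
{
    if(s == 0)
        return "-";
    switch(s->SymbolType())
    {
    case Constant: return "constant";
    case Temp:     return "temp";
    }
    return "unknown";
}

// Demo: an instruction whose operands are a constant, nothing, and a temp.
std::string DemoDescribe()
{
    ConstantSymbol one;
    one.value = 1.0f;
    TempSymbol t2;
    t2.num = 2;

    TAC tac;
    tac.name = "example";
    tac.a1 = &one;
    tac.a2 = 0;
    tac.a3 = &t2;
    return Describe(tac.a1) + " " + Describe(tac.a2) + " " + Describe(tac.a3);
}
```

Without the shared base, `TAC` would need a separate field or a tagged union for every symbol kind.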
13 Creating labels
CSymbolTable::Symbol *CSymbolTable::CreateLabel()
{
    LabelSymbol ls;
    ls.addr = 0;    // Till we know...
    ls.num = (int)m_labels.size() + 1;
    m_labels.push_back(ls);
    return &m_labels.back();
}
14 Functions
struct FunctionSymbol : Symbol
{
    FunctionSymbol();
    virtual ~FunctionSymbol();
    virtual STypes SymbolType() {return Function;}

    // Start and end address
    int m_start;
    int m_end;      // First location after ret

    // Function name
    std::string m_name;

    // Parameters
    std::map<std::string, ParameterSymbol> m_parms;

    // Temporaries
    std::list<TempSymbol> m_temps;
};
15
CSymbolTable::Symbol *CSymbolTable::CreateFunction(std::string name)
{
    // Does this function already exist?
    map<string, FunctionSymbol>::iterator i = m_functions.find(name);
    if(i != m_functions.end())
    {
        cout << "Function " << name << " already defined!" << endl;
        return &(i->second);
    }

    // Create the entry in the symbol table
    m_functions[name] = FunctionSymbol();
    FunctionSymbol *fs = &(m_functions[name]);
    fs->Initialize();
    fs->m_name = name;
    return fs;
}

CSymbolTable::Symbol *CSymbolTable::FindFunction(std::string name)
{
    map<string, FunctionSymbol>::iterator i = m_functions.find(name);
    if(i == m_functions.end())
    {
        cout << "Function " << name << " not defined!" << endl;
        return NULL;
    }

    return &(i->second);
}
16 Functions
TempSymbol *CreateTemp(CSymbolTable::VTypes type)
{
    TempSymbol ts;
    ts.num = (int)m_temps.size() + 1;
    ts.type = type;
    m_temps.push_back(ts);
    return &m_temps.back();
}

ParameterSymbol *CreateParameter(std::string name, CSymbolTable::VTypes type)
{
    ParameterSymbol ts;
    ts.num = (int)m_parms.size();
    ts.name = name;
    ts.type = type;
    m_parms[name] = ts;
    return &m_parms[name];
}

ParameterSymbol *ParameterAddress(std::string name)
{
    std::map<std::string, ParameterSymbol>::iterator p = m_parms.find(name);
    if(p == m_parms.end())
        return NULL;
    return &(p->second);
}
17 All of the way to Yacc
%union {
    int   intval;
    float fval;
    char *sval;
    void *pval;     // How we'll refer to symbols!
}

F : FLOAT_VALUE   {$$ = CreateConstantFloat($1);}
18 The path...
In functions.h:

void *CreateConstantFloat(float value);

In compile.cpp:

extern "C" void *CreateConstantFloat(float value)
{
    return compile.GlobalSymbolTable()->CreateConstant(value);
}

In SymbolTable.cpp:

CSymbolTable::Symbol *CSymbolTable::CreateConstant(float value)
{
    // Does this constant already exist?
    for(list<ConstantSymbol>::iterator i=m_constants.begin(); i!=m_constants.end(); i++)
    {
        if(i->type == Float && i->flt_value == value)
            return &(*i);
    }

    // Does not, create a new one
    ConstantSymbol cs;
    cs.type = Float;
    cs.flt_value = value;
    m_constants.push_back(cs);
    return &(m_constants.back());
}
19 When I display the 3AC
void CCompile::DisplayAddress(CSymbolTable::Symbol *a1)
{
    if(a1 == NULL)
    {
        cout << "\t-";
        return;
    }

    switch(a1->SymbolType())
    {
    case CSymbolTable::Constant:
        {
            CSymbolTable::ConstantSymbol *cs = (CSymbolTable::ConstantSymbol *)a1;
            switch(cs->type)
            {
            case CSymbolTable::Float:
                cout << "\t" << cs->flt_value;
                break;
            }
        }
        break;
    ...
20 Simplified function calling and parameters
addr  label  op     a1    a2   a3
0     fac    -      x     1    t1
1            if!    t1    -    lbl1
2            -      x     1    t3
3            param  t3    -    -
4            call   fac   -    t4
5            *      x     t4   t5
6     lbl1   =      t5    -    t2
7            jmp    lbl2  -    -
8            =      1     -    t2
9     lbl2   ret    t2    -    -
10    main   param  4     -    -
11           call   fac   -    t1
12           ret    t1    -    -
Callouts on the slide: Using parameter (x as an operand), Passing parameter (param), Where to put result (third address of call).
No stack usage in this variation at all.
21 There's more...
Lots more on:
- How to implement a conditional (ternary operator)
- The various functions
See the code for examples.
22 Our machine language
- Registers
  - There are 16 general purpose registers, with R15 designated as the stack pointer (sp).
  - There are 8 floating point registers (64 bit) designated F0-F7.
- Addressing Modes
  - The value of a byte defines the addressing mode. These are the values:
    - 0x00-0x0f: Registers R0 to R15
    - 0x10-0x1f: Location pointed to by register plus immediate
    - 0x20-0x2f: Register plus immediate
    - 0x30-0x3f: Register indirect (location pointed to by register)
    - 0xf0: Immediate integer constant following instruction
    - 0xf1: Absolute address following instruction
  - If more than one item has content following the instruction, the content will be in the order of the instructions. Values of this type are designated A in the opcode definitions.
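The ranges above can be turned into a small decoder. This is a sketch only: the function names and the returned description strings are hypothetical, and the assumption that the low four bits select the register is inferred from the ranges (0x00-0x0f covering R0 to R15) and from listings such as `101f00ff  st 0(sp), r0`, where 0x1f reads as "location pointed to by R15 plus immediate".

```cpp
#include <cassert>
#include <string>

typedef unsigned char Byte;

// Classify an addressing-mode byte using the ranges listed above.
// The returned strings are illustrative labels, not names from the lecture code.
std::string AddressingMode(Byte b)
{
    if(b <= 0x0f) return "register";
    if(b <= 0x1f) return "memory at register plus immediate";
    if(b <= 0x2f) return "register plus immediate";
    if(b <= 0x3f) return "register indirect";
    if(b == 0xf0) return "immediate";
    if(b == 0xf1) return "absolute";
    return "unknown";
}

// Assumption: for the register-based modes the low four bits
// select one of R0 to R15.
int RegisterNumber(Byte b)
{
    return b & 0x0f;
}
```

For example, `AddressingMode(0x1f)` reports a memory operand and `RegisterNumber(0x1f)` gives 15, the stack pointer.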
23 Opcodes
24 The meat of it: code generation (CCodeGenerator)
void CCodeGenerator::Generate(CThreeAddressCode &tac, std::vector<Byte> &code)
{
    code.clear();
    m_code = &code;
    m_tac = &tac;

    //
    // Generate any constants
    //
    m_lastIC = 0;   // Prevents labels below

    list<CSymbolTable::ConstantSymbol> &constants = m_tac->GlobalSymbolTable()->Constants();
    int i=1;
    for(list<CSymbolTable::ConstantSymbol>::iterator c=constants.begin(); c!=constants.end(); c++, i++)
    {
        c->codeAddr = CodeAddr();
        c->num = i;
        switch(c->type)
        {
        case CSymbolTable::Float:
            Generate((Byte *)&(c->flt_value), 4);
            DisplayPC(c->codeAddr);
            cout << "con" << i << "\tfloat\t" << c->flt_value << endl;
            break;
        }
    }

Generated Code
000000  0000803f  con1  float  1
000004  00004040  con2  float  3
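The hex words in the listing are the raw bytes of each float in memory order. As a check, the sketch below (the helper name `FloatBytesLE` is hypothetical, and it assumes a 32-bit IEEE 754 `float`) prints the four bytes least significant first, which reproduces the `0000803f` and `00004040` shown for 1 and 3.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hex dump of a float's four bytes, least significant byte first.
// Assumes float is a 32-bit IEEE 754 value.
std::string FloatBytesLE(float value)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // copy the bit pattern safely

    char text[9];
    std::snprintf(text, sizeof text, "%02x%02x%02x%02x",
                  (unsigned)(bits & 0xff),
                  (unsigned)((bits >> 8) & 0xff),
                  (unsigned)((bits >> 16) & 0xff),
                  (unsigned)((bits >> 24) & 0xff));
    return text;
}
```

The byte extraction uses shifts rather than pointer casts, so the result does not depend on the byte order of the machine running the sketch.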
25 Generating functions
//
// Now find all of the functions
//
m_IC = 0;       // Intermediate code address
m_lastIC = 1;   // So we now get labels
while(m_IC < m_tac->CodeSize())
{
    // Find the function that is starting here
    // This should never fail, since we have functions one after another
    CSymbolTable::FunctionSymbol *func =
        (CSymbolTable::FunctionSymbol *)m_tac->GlobalSymbolTable()->FindLabel(m_IC);
    CodeGeneratorFunction(func);
    m_IC = func->m_end;
}

We generate code a function at a time.
26 Code generation for function
void CCodeGenerator::CodeGeneratorFunction(CSymbolTable::FunctionSymbol *function)
{
    // Space before the function in the display
    cout << endl;

    // Indicate to the function where it starts
    function->m_codeAddr = CodeAddr();

    //
    // Register assignment
    //
    int regCnt = 2;     // First available register
    int memCnt;
    CRegisterAssignment registerAssign;
    registerAssign.RegisterAssignment(function, regCnt, memCnt);
27 Code generation for function
    //
    // Create procedure beginning
    //
    uint frameSize = memCnt + regCnt * 4;

    // Set the offsets to the parameters as the frame + return value + return addr
    function->ParameterOffsets(frameSize + 8);

    // Make space for the local frame
    Opcode(sub, sp, sp, immediate, frameSize);

    //
    // Save registers
    //
    for(int r=0; r<regCnt; r++)
        Opcode(st, spp, r0 + r, none, r * 4);

    //
    // Parameter collector. This collects up
    // the parameters for a function call.
    //
    vector<CSymbolTable::Symbol *> parameters;

000008  fac  sub  sp, sp, 28
000016       st   0(sp), r0
000024       st   4(sp), r1
000032       st   8(sp), r2
000040       st   12(sp), r3
000048       st   16(sp), r4
000056       st   20(sp), r5
28 Code generation loop
for( ; m_IC < function->m_end; m_IC++)
{
    // Is there a label at this location?
    CSymbolTable::Symbol *sym = m_tac->GlobalSymbolTable()->FindLabel(m_IC);
    if(sym != NULL)
    {
        switch(sym->SymbolType())
        {
        case CSymbolTable::Label:
            {
                CSymbolTable::LabelSymbol *label = (CSymbolTable::LabelSymbol *)sym;
                label->codeAddr = CodeAddr();
                for(vector<uint>::iterator i=label->references.begin();
                    i != label->references.end(); i++)
                {
                    *((uint *)&(*m_code)[*i]) = CodeAddr();
                }
            }
            break;
        }
    }

    CThreeAddressCode::TAC &tac = m_tac->Code(m_IC);
29 Forward references
00000c  020f0ff0  fac   sub   sp, sp, 16
000014  101f00ff        st    0(sp), r0
00001c  101f01ff        st    4(sp), r1
000024  101f02ff        st    8(sp), r2
00002c  101f03ff        st    12(sp), r3
000034  11001fff        ld    r0, 24(sp)
00003c  1101f1ff        ld    r1, 0
000044  04020001        subf  r2, r0, r1
000048  13f002ff        beqz  0, r2
000050  1100f1ff        ld    r0, 4
000058  110300ff        ld    r3, r0
00005c  12f0ffff        br    0
000064  1100f1ff  lbl1  ld    r0, 8
00006c  110300ff        ld    r3, r0
000070  101f03ff  lbl2  st    20(sp), r3
000078  11001fff        ld    r0, 0(sp)
000080  11011fff        ld    r1, 4(sp)
000088  11021fff        ld    r2, 8(sp)
000090  11031fff        ld    r3, 12(sp)
000098  010f0ff0        add   sp, sp, 16
0000a0  121fffff        br    0(sp)
Forward reference targets: lbl1, lbl2
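The branches in the listing show a target of 0 because the label's address is not known when the branch is emitted. A common way to handle this is backpatching: remember where each address field was written and fill it in when the label is placed. The sketch below is a simplified illustration under stated assumptions: `Label`, `Emit`, `EmitBranch`, `PlaceLabel`, `ReadWord`, and `DemoForwardBranch` are hypothetical names, the opcode word is a stand-in value, and `unsigned int` is assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned char Byte;
typedef unsigned int uint;

// Hypothetical, simplified label: where it lands plus every
// address field that was emitted before the label was placed.
struct Label
{
    uint codeAddr;
    std::vector<uint> references;   // byte offsets of address fields to patch
    Label() : codeAddr(0) {}
};

// Append a 4-byte value to the code image.
void Emit(std::vector<Byte> &code, uint value)
{
    Byte raw[4];
    std::memcpy(raw, &value, 4);
    code.insert(code.end(), raw, raw + 4);
}

// Emit a branch whose target may not be known yet. The address
// field follows the 4-byte opcode word, so its offset is recorded.
void EmitBranch(std::vector<Byte> &code, Label &label)
{
    label.references.push_back((uint)code.size() + 4);
    Emit(code, 0xffffffffu);        // stand-in opcode word
    Emit(code, label.codeAddr);     // 0 until the label is placed
}

// Place the label at the current address and patch earlier references.
void PlaceLabel(std::vector<Byte> &code, Label &label)
{
    label.codeAddr = (uint)code.size();
    for(std::size_t i = 0; i < label.references.size(); i++)
        std::memcpy(&code[label.references[i]], &label.codeAddr, 4);
}

// Read back a 4-byte value, for checking.
uint ReadWord(const std::vector<Byte> &code, uint addr)
{
    uint value;
    std::memcpy(&value, &code[addr], 4);
    return value;
}

// Demo: branch forward over one filler word, then place the label.
// Returns the patched address field of the branch.
uint DemoForwardBranch()
{
    std::vector<Byte> code;
    Label target;
    EmitBranch(code, target);   // occupies bytes 0..7
    Emit(code, 0);              // filler, bytes 8..11
    PlaceLabel(code, target);   // label lands at byte 12
    return ReadWord(code, 4);
}
```

Backward references need no patching, since the label's address is already set when the branch is emitted.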
30 3AC to assembly
for( ; m_IC < function->m_end; m_IC++)
{
    TAC &tac = m_code[m_IC];
    if(tac.name == "-")
        Operator3(subf, tac);
    else
    ...
31 Operator3
void CCodeGenerator::Operator3(Opcodes op, CThreeAddressCode::TAC &tac)
{
    int r1 = LoadOperand(tac.a1, 0);
    int r2 = LoadOperand(tac.a2, 1);

    // Where will the result go?
    int r3 = TargetOperand(tac.a3, 0);
    Opcode(op, r3, r1, r2);
    TargetWrite(tac.a3, r3);
}
32 LoadOperand
int CCodeGenerator::LoadOperand(CSymbolTable::Symbol *op, int regToUse, int offset)
{
    // Operands can be from register or memory
    CSymbolTable::Symbol *s = op;
    switch(s->SymbolType())
    {
    case CSymbolTable::Constant:
        {
            // Load into the register to use
            uint addr = ((CSymbolTable::ConstantSymbol *)s)->codeAddr;
            Opcode(ld, regToUse, absolute, none, addr);
        }
        break;

    case CSymbolTable::Temp:
        {
            CSymbolTable::TempSymbol *ts = (CSymbolTable::TempSymbol *)s;
            if(ts->reg >= 0)
                regToUse = ts->reg;
            else
            {
                uint mem = ts->mem + offset;
                Opcode(ld, regToUse, spp, none, mem);
            }
        }
        break;

    case CSymbolTable::Parameter:
        ...
    }

    return regToUse;
}

Constants must be loaded into a register. Temps may be in a register already or have to be loaded into a register.
33 Parameter operands
    case CSymbolTable::Temp:
        {
            CSymbolTable::TempSymbol *ts = (CSymbolTable::TempSymbol *)s;
            if(ts->reg >= 0)
                regToUse = ts->reg;
            else
            {
                uint mem = ts->offset + offset;
                Opcode(ld, regToUse, spp, none, mem);
            }
        }
        break;

    case CSymbolTable::Parameter:
        {
            CSymbolTable::ParameterSymbol *ps = (CSymbolTable::ParameterSymbol *)s;
            uint mem = ps->offset + offset;
            Opcode(ld, regToUse, spp, none, mem);
        }
        break;
34 Stack usage: parameters and temps
Stack layout:
SP+32   x
SP+28   y
SP+24   Return value
SP+20   Return address
SP+16   t1
SP+12   t2
SP+8    t3
SP+4    r1
SP      r2   (stack pointer)

Temps:
CSymbolTable::TempSymbol *ts = (CSymbolTable::TempSymbol *)s;
uint mem = ts->offset + offset;
Opcode(ld, regToUse, spp, none, mem);

Parameters:
CSymbolTable::ParameterSymbol *ps = (CSymbolTable::ParameterSymbol *)s;
uint mem = ps->offset + offset;
Opcode(ld, regToUse, spp, none, mem);
35 Recall...
//
// Create procedure beginning
//
uint frameSize = memCnt + regCnt * 4;

// Set the offsets to the parameters as the frame + return value + return addr
function->ParameterOffsets(frameSize + 8);

In CSymbolTable::FunctionSymbol:

void ParameterOffsets(uint offset)
{
    for(Parm p=m_parms.begin(); p != m_parms.end(); p++)
    {
        p->second.offset = offset;
        offset += p->second.Size();
    }
}
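The frame arithmetic can be checked with a small worked example. This sketch uses free functions with hypothetical signatures (the lecture's `ParameterOffsets` is a member that walks `m_parms`); the parameter sizes are passed in as a plain vector for illustration. With three 4-byte temps in memory and two saved registers, the frame is 20 bytes, the return address sits at SP+20, the return value at SP+24, and the parameters start at SP+28, matching the stack picture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned int uint;

// Size of the local frame: temps kept in memory plus saved registers.
uint FrameSize(uint memCnt, uint regCnt)
{
    return memCnt + regCnt * 4;
}

// Offsets of the parameters from sp. They sit above the local frame,
// the return address (4 bytes), and the return value (4 bytes).
std::vector<uint> ParameterOffsets(uint frameSize, const std::vector<uint> &sizes)
{
    std::vector<uint> offsets;
    uint offset = frameSize + 8;
    for(std::size_t i = 0; i < sizes.size(); i++)
    {
        offsets.push_back(offset);
        offset += sizes[i];
    }
    return offsets;
}
```

For instance, `ParameterOffsets(FrameSize(12, 2), std::vector<uint>(2, 4))` yields the offsets 28 and 32.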
36 Now, function calls...
- Collect up the parameters
- Make space for the parameters, return value, and return address
- Put the parameters on the stack
- Set the return address
- Jump to the function
- Clear the stack
37 Collecting up the parameters: param code generation
else if(tac.name == "param")
{
    parameters.push_back(tac.a1);
}

vector<CSymbolTable::Symbol *> parameters;
38 Function call
else if(tac.name == "call")
{
    GenerateCall(tac, parameters);
}
39 CCodeGenerator::GenerateCall()
void CCodeGenerator::GenerateCall(CThreeAddressCode::TAC &tac,
                                  vector<CSymbolTable::Symbol *> &parms)
{
    // Find the function description
    CSymbolTable::FunctionSymbol *function = (CSymbolTable::FunctionSymbol *)tac.a1;

    // Where will the result go?
    int reg = TargetOperand(tac.a3, 0);
40 Setting up the stack
    // For parameters, return address, and result
    uint callsize = 8 + (uint)parms.size() * 4;

    // Save space for the parameters, result, and return address
    Opcode(sub, sp, sp, immediate, callsize);

    //
    // Put the parameters onto the stack
    //
    uint addr = 8;      // First parameter address
    for(std::vector<CSymbolTable::Symbol *>::iterator p=parms.begin(); p != parms.end(); p++)
    {
        int reg = LoadOperand(*p, 0, callsize);     // Note the offset!
        Opcode(st, spp, reg, none, addr);
        addr += 4;
    }

0000c8  020f0ff0  sub  sp, sp, 12
0000d0  1100f1ff  ld   r0, 8
0000d8  101f00ff  st   8(sp), r0
41 Return address and branch
    // Save the return address
    uint retAddr = CodeAddr() + 20;
    Opcode(ld, r0, immediate, none, retAddr);
    Opcode(st, spin, r0, none);

    // Call the function
    Opcode(br, immediate, none, none, function->m_codeAddr);

0000e0  1100f0ff  ld  r0, 244
0000e8  103f00ff  st  r15, r0
0000ec  12f0ffff  br  12
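The constant 20 is the size of the three-instruction call sequence itself. Reading the sizes off the listing above (8 bytes for the `ld` with an immediate, 4 for the `st`, 8 for the `br` with an immediate), the return address is the first byte after the branch. The sketch below just spells out that arithmetic; the constant and function names are hypothetical.

```cpp
#include <cassert>

typedef unsigned int uint;

// Instruction sizes in bytes, as read off the listing above.
const uint kLoadImmediateSize = 8;   // ld r0, <retAddr>: opcode word + immediate
const uint kStoreIndirectSize = 4;   // st r15, r0: opcode word only
const uint kBranchSize        = 8;   // br <function>: opcode word + address

// Address of the first instruction after the call sequence,
// given the address where the sequence starts.
uint ReturnAddress(uint codeAddr)
{
    return codeAddr + kLoadImmediateSize + kStoreIndirectSize + kBranchSize;
}
```

Starting at 0x0000e0 (224), this gives 244, which is 0x0000f4, the address of the instruction that follows the branch.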
42 Return value and cleaning up the stack
    // Put the return value into result register
    Opcode(ld, reg, spp, none, 4);

    // Clean up the stack
    Opcode(add, sp, sp, immediate, callsize);

    if(reg == r0)
        TargetWrite(tac.a3, reg);

0000f4  11021fff  ld   r2, 4(sp)
0000fc  010f0ff0  add  sp, sp, 12
000104  101f02ff  st   16(sp), r2
43 Function return
000104  101f02ff  st   16(sp), r2
00010c  11001fff  ld   r0, 0(sp)
000114  11011fff  ld   r1, 4(sp)
00011c  11021fff  ld   r2, 8(sp)
000124  010f0ff0  add  sp, sp, 12
00012c  121fffff  br   0(sp)

else if(tac.name == "ret")
{
    int r1 = LoadOperand(tac.a1, 0);

    // Set the return value
    uint retval = frameSize + 4;
    Opcode(st, spp, r1, none, retval);

    // Restore saved registers
    for(int r=0; r<regCnt; r++)
    {
        uint offset = r * 4;
        Opcode(ld, r, spp, none, offset);
    }

    // Clear the stack
    Opcode(add, sp, sp, immediate, frameSize);

    // Return from function
    Opcode(br, spp, none, none, 0);
}
44 jmp
else if(tac.name == "jmp")
{
    CSymbolTable::LabelSymbol *ls = (CSymbolTable::LabelSymbol *)tac.a1;

    // Save this reference so we can set the address later if this is
    // a forward reference.
    ls->references.push_back(CodeAddr() + 4);
    Opcode(br, immediate, none, none, ls->codeAddr);
}

00005c  12f0ffff        br  0
000060  00000000
000064  1100f1ff  lbl1  ld  r0, 8
000068  08000000
00006c  110300ff        ld  r3, r0
000070  101f03ff  lbl2  st  20(sp), r3
45 if!
else if(tac.name == "if!")
{
    int reg = LoadOperand(tac.a1, 0);
    CSymbolTable::LabelSymbol *ls = (CSymbolTable::LabelSymbol *)tac.a3;

    // Save this reference so we can set the address later if this is
    // a forward reference.
    ls->references.push_back(CodeAddr() + 4);
    Opcode(beqz, immediate, reg, none, ls->codeAddr);
}

000048  13f002ff        beqz  0, r2
000050  1100f1ff        ld    r0, 4
000058  110300ff        ld    r3, r0
00005c  12f0ffff        br    0
000064  1100f1ff  lbl1  ld    r0, 8
00006c  110300ff        ld    r3, r0
000070  101f03ff  lbl2  st    20(sp), r3
46 Register Assignment (dummy version)
void CRegisterAssignment::RegisterAssignment(CSymbolTable::FunctionSymbol *function,
                                             int &regCnt, int &memCnt)
{
    std::list<CSymbolTable::TempSymbol> &temps = function->m_temps;

    //
    // For now I'm doing a simple assignment of the first temps to
    // registers, all remaining to memory.
    //
    memCnt = 0;     // First available memory location
    for(std::list<CSymbolTable::TempSymbol>::iterator t=temps.begin(); t!=temps.end(); t++)
    {
        // I'm limiting to 6 registers here
        // I'm also assuming all temps fit in the registers and are 4 bytes
        if(regCnt < 6)
        {
            t->reg = regCnt;
            t->offset = -1;
            regCnt++;
        }
        else
        {
            // Goes in memory
            t->reg = -1;
            t->offset = memCnt;
            memCnt += 4;
        }
    }
}
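To see what this assignment produces, here is a self-contained sketch of the same policy. The `Temp` struct, the `regLimit` parameter, and the demo functions are hypothetical stand-ins for the lecture's classes. Starting from register 2 with a limit of 6, the first four temps land in r2 to r5 and every later temp gets a 4-byte slot in memory.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for the lecture's TempSymbol.
struct Temp
{
    int reg;      // register number, or -1 if the temp lives in memory
    int offset;   // memory offset, or -1 if the temp lives in a register
    Temp() : reg(-1), offset(-1) {}
};

// First temps go to registers until the limit is reached; the rest
// go to memory, 4 bytes each. regCnt and memCnt are updated in place.
void AssignRegisters(std::list<Temp> &temps, int &regCnt, int &memCnt, int regLimit = 6)
{
    memCnt = 0;     // first available memory location
    for(std::list<Temp>::iterator t = temps.begin(); t != temps.end(); ++t)
    {
        if(regCnt < regLimit)
        {
            t->reg = regCnt;
            t->offset = -1;
            regCnt++;
        }
        else
        {
            t->reg = -1;
            t->offset = memCnt;
            memCnt += 4;
        }
    }
}

// Demo: assign the given number of temps starting from register 2
// and return the number of bytes of memory used.
int DemoMemoryUsed(int tempCount)
{
    std::list<Temp> temps(tempCount);
    int regCnt = 2;
    int memCnt = 0;
    AssignRegisters(temps, regCnt, memCnt);
    return memCnt;
}

// Demo: register count after assigning the given number of temps.
int DemoRegistersUsed(int tempCount)
{
    std::list<Temp> temps(tempCount);
    int regCnt = 2;
    int memCnt = 0;
    AssignRegisters(temps, regCnt, memCnt);
    return regCnt;
}
```

With five temps this gives regCnt 6 and memCnt 4, so the frame size memCnt + regCnt * 4 is 28, matching the `sub sp, sp, 28` in the earlier listing.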