Title: Overview of Compiling
1Overview of Compiling
- Basics of Compilation
- Main Components
2Structure of a Compiler
[Figure: Source code → Front End → Back End → Target code]
- FRONT END: Determine and represent structure of input program
- Ensure that it is well-formed, report errors
- BACK END: Generate corresponding object code for the target architecture
- Optimize code according to compiler flags
3Major Modules in Open64
[Figure: Open64 module diagram. Labels in the figure:
front ends: gfec, gfecc, f90;
Inliner; Local IPA and Main IPA (-IPA); LNO (-O3); Main opt (-O2/O3); CG;
WHIRL2 C/Fortran (.w2c.c, .w2c.h, .w2f.f);
Lower to High IR; Lower I/O (only for f90); Lower MP (-mp, only for OpenMP); Lower Mid W; Lower all (-O0);
IR levels: Very high WHIRL, High WHIRL, Mid WHIRL, Low WHIRL;
file types .B and .I; -phase woff; "Take either path"]
4The Back End (BE)
[Figure: Back End (Optimizer, Code Generator) producing object code]
- Main purpose of Back End (BE) is to generate target machine code
- Match IR with hardware features
- Translation details are machine-specific
- But there are typical problems and strategies for overcoming them for classes of architectures
- BE decides where to store data objects in program
- Object code usually makes calls to a run-time library
- Handles common actions, improves efficiency
- Part of compiler design is to decide what should be handled by the run-time system
5Three-Address Code
[Figure: IR]
- 3-address code is a popular IR for the interface between Front End and Back End
- General form: Instruction argument1 argument2 result
- Two-address or one-address code has also been used
- With one address, the single argument is also the result location
- e.g. a++ increments contents of a single location
- these can save memory
- Complex instructions are broken down into several simpler ones to generate 3-address code.
- Compiler generates temporary variables as needed
a = b + c * d becomes
t1 ← c * d
a ← b + t1
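The breakdown of a complex statement into simpler instructions with compiler-generated temporaries can be sketched as follows. This is a minimal illustration, not how any particular compiler does it: it borrows Python's own `ast` module as a stand-in front end, handles only the four basic binary operators, and the function name `to_three_address` and the tuple layout (instruction, argument1, argument2, result) are assumptions made for this example.

```python
import ast
from itertools import count

# Map Python AST operator nodes to printable symbols (only the four basic ops).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_three_address(statement):
    """Translate 'target = expression' into (op, arg1, arg2, result) tuples."""
    tree = ast.parse(statement).body[0]
    temps = count(1)  # numbering for generated temporaries t1, t2, ...
    code = []

    def operand(node):
        # Names and constants are used directly as operands.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        # A nested binary operation gets its own instruction and a fresh temporary.
        left, right = operand(node.left), operand(node.right)
        temp = f"t{next(temps)}"
        code.append((OPS[type(node.op)], left, right, temp))
        return temp

    value = tree.value
    target = tree.targets[0].id
    if isinstance(value, ast.BinOp):
        # The outermost operation writes straight into the assignment target.
        left, right = operand(value.left), operand(value.right)
        code.append((OPS[type(value.op)], left, right, target))
    else:
        code.append(("copy", operand(value), None, target))
    return code


for op, arg1, arg2, result in to_three_address("a = b + c * d"):
    print(f"{result} <- {arg1} {op} {arg2}")
```

The inner multiplication is emitted first into a temporary, and the outer addition then uses that temporary as one of its two arguments.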
6Translation Into Machine Code
[Figure: Code Generator]
- Many translations are straightforward. But there are some tricky problems too
- How do we deal with operations where the operands have different data types?
- How do we figure out where variables in different storage classes should be stored?
- How is high-level control flow (loops, switches, ...) realized?
- How are calls to procedures and functions implemented?
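As one illustration of the control-flow question, a while loop is commonly realized with labels and jumps. The sketch below is an assumption-laden toy: the instruction spellings `if_false ... goto` and `goto`, the label naming scheme, and the function `lower_while` are all invented for this example and do not come from any real instruction set.

```python
def lower_while(condition_code, condition_value, body, label="L1"):
    """Lower 'while condition: body' into a flat list of instructions.

    condition_code: instructions that recompute the loop condition.
    condition_value: name of the value holding the condition result.
    body: already-lowered instructions of the loop body.
    """
    top, done = f"{label}_top", f"{label}_done"
    return (
        [f"{top}:"]                                       # loop entry label
        + condition_code                                  # re-evaluate the condition each time
        + [f"if_false {condition_value} goto {done}"]     # leave the loop when it fails
        + body
        + [f"goto {top}", f"{done}:"]                     # jump back, then the exit label
    )


for line in lower_while(["t1 <- i < n"], "t1", ["i <- i + 1"]):
    print(line)
```

The condition is placed after the entry label so that it is re-evaluated on every iteration.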
7Selecting Instructions
[Figure: Code Generator]
- Select instructions for each operation
- Should be as efficient as possible
- Difficulty of selection depends on machine instructions available.
- Order of instructions may also affect efficiency of target code
- But finding an optimal order is not feasible in general.
- We initially generate code in the order produced by intermediate code generation
- Some recent architectures require the compiler to generate bundles of instructions for concurrent execution
8The Back End (BE)
[Figure: Optimizer, IR, SYMBOL TABLE]
- Most of the work goes into optimization
- This is complex and there are many trade-offs and very hard problems
- For this reason, back end is usually hand-coded
- Some major optimization goals
- Improve selection of instructions
- Instruction scheduling (reordering for pipelines and other hardware features)
- Assign data to registers
The challenge: these are interrelated. Moreover, finding optimal solutions is generally intractable.
9Instruction Selection and Optimization
- Goal is to produce fast, efficient code
- Take advantage of features of target machine instruction set, such as variety of addressing modes
- Usually dealt with as a pattern matching problem
- Patterns in IR input to back end, ad hoc approach
- Advent of RISC instruction sets simplified this greatly
- Doing this well occupied compiler writers in the 1970s
10Low-Level Optimization
[Figure: Optimizer]
- Improve selection of instructions
- eliminate redundant operations
- choose most efficient instructions
- reorder (schedule) instructions
- Allocate registers
- keep most important (most heavily used) data in registers
- optimize lifetime of data in registers
11Example
[Figure: Optimizer]
- a = b + c; d = a + e
- LDI b, Ri
- LDI c, Rj
- ADDI Ri, Rj
- STI Rj, a
- LDI a, Ri
- LDI e, Rj
- ADDI Ri, Rj
- STI Rj, d
- Ri, Rj are registers
- But we can avoid reloading a, and also avoid storing it if a is not needed later.
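The redundant reload in this example can be caught by a small pass over straight-line code. The sketch below assumes the instruction meanings suggested by the listing (LDI loads a variable into a register, ADDI adds the first register into the second, STI stores a register to a variable), adds a hypothetical register-to-register `MOV` that is not in the original listing, and ignores aliasing and branches.

```python
def remove_redundant_loads(code):
    """Replace loads of values that are already known to sit in a register.

    code: list of (opcode, source, destination) tuples for one basic block.
    """
    held = {}    # variable name -> register known to hold its current value
    result = []
    for opcode, src, dst in code:
        if opcode == "LDI" and src in held:
            # Value is already in a register: copy it instead of touching memory.
            if held[src] != dst:
                result.append(("MOV", held[src], dst))  # MOV is hypothetical
        else:
            result.append((opcode, src, dst))

        if opcode in ("LDI", "ADDI", "MOV"):
            # Register dst was overwritten: forget variables that lived there.
            held = {var: reg for var, reg in held.items() if reg != dst}
        if opcode == "LDI":
            held[src] = dst      # dst now holds variable src
        elif opcode == "STI":
            held[dst] = src      # register src holds the value just stored to dst
    return result


example = [
    ("LDI", "b", "Ri"), ("LDI", "c", "Rj"), ("ADDI", "Ri", "Rj"), ("STI", "Rj", "a"),
    ("LDI", "a", "Ri"), ("LDI", "e", "Rj"), ("ADDI", "Ri", "Rj"), ("STI", "Rj", "d"),
]
for instruction in remove_redundant_loads(example):
    print(*instruction)
```

Here the second load of a becomes a register copy from Rj, so the value is never fetched back from memory. Removing the store as well would need the extra knowledge that a is not used later.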
12Register Optimization
[Figure: Optimizer]
- Registers provide particularly fast access, thus code compiled with data in registers will execute faster.
- Instructions with operands in registers take up less space, thus good use of registers also saves memory.
- So it is important to use registers well when generating code.
The problem is to manage a limited set of resources
13Register Optimization
[Figure: Optimizer]
- Goal in register allocation is to hold as many operands as possible in registers.
- Save data in memory only when we run out of registers.
- Allocation strategies also aim to reuse values in registers when possible.
- During register allocation, we select values that will reside in registers at a point in the program.
- Register assignment is an NP-complete problem, so finding an optimal solution is not practical in general.
- There are some popular strategies.
Compilers approximate solutions to NP-Complete problems
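One way to approximate a solution is a greedy pass over live ranges, in the spirit of linear-scan allocation. The sketch below is a simplification made for illustration: the live ranges are supplied by hand as (start, end) positions, the function name `allocate_registers` is an assumption, and when no register is free it simply spills the newcomer rather than choosing a victim more carefully.

```python
def allocate_registers(live_ranges, registers):
    """Greedy register allocation over live ranges.

    live_ranges: dict name -> (start, end) instruction positions.
    registers: list of available register names.
    Returns (assignment, spilled): name -> register, and names kept in memory.
    """
    free = list(registers)
    active = []                  # (end, name) for values currently in registers
    assignment, spilled = {}, []
    by_start = sorted(live_ranges.items(), key=lambda item: item[1][0])
    for name, (start, end) in by_start:
        # Reuse registers whose values are dead before this value becomes live.
        for old_end, old_name in list(active):
            if old_end < start:
                active.remove((old_end, old_name))
                free.append(assignment[old_name])
        if free:
            assignment[name] = free.pop(0)
            active.append((end, name))
        else:
            spilled.append(name)  # out of registers: keep this value in memory
    return assignment, spilled


ranges = {"p": (1, 4), "q": (2, 3), "r": (2, 3), "s": (5, 6)}
print(allocate_registers(ranges, ["R1", "R2"]))
```

With two registers, p and q get registers, r is spilled because both are busy, and s reuses a register once p is dead.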
14Why are Optimizations hard?
[Figure: Optimizer]
- The next problem is that instruction selection and register allocation are not independent problems
- p = w * 2
- q = p + r
- s = w * 2
- Register optimization suggests we remove p (and w) from registers as soon as possible.
- But we need the same value later. If we keep p in a register, we don't have to recompute it.
- So to save an instruction, we need an additional register.
- This kind of trade-off is typical!
15Instruction Scheduling
- Modern machines have multiple functional units
- Need to avoid hardware stalls and interlocks
- Use all functional units productively
- Reordering can modify lifetime of variables (thus perhaps changing the register allocation)
- Optimal scheduling is also NP-Complete
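Because optimal scheduling is intractable, schedulers rely on heuristics; greedy list scheduling is a common one. The sketch below is a toy version under stated assumptions: latencies and dependences are given by hand, every functional unit can run any instruction, ties are broken by program order rather than by a real priority function, and the dependence graph is acyclic.

```python
def list_schedule(instructions, units=2):
    """Greedy list scheduling for a machine with several functional units.

    instructions: dict name -> (latency, [names it depends on]), in program order.
    Returns dict name -> cycle in which the instruction is issued.
    """
    issue = {}
    cycle = 0
    while len(issue) < len(instructions):
        issued_now = 0
        for name, (latency, deps) in instructions.items():
            if name in issue or issued_now == units:
                continue
            # Ready once every dependence has been issued and its result is available.
            ready = all(
                dep in issue and issue[dep] + instructions[dep][0] <= cycle
                for dep in deps
            )
            if ready:
                issue[name] = cycle
                issued_now += 1
        cycle += 1
    return issue


program = {
    "load_b": (2, []),
    "load_c": (2, []),
    "add": (1, ["load_b", "load_c"]),
    "load_e": (2, []),
    "store_a": (1, ["add"]),
}
print(list_schedule(program))
```

In this run the independent load of e is moved ahead of the add, filling a cycle in which the add would otherwise stall waiting for its operands.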
16Setting Up Run Time Storage
[Figure: Code Generator, SYMBOL TABLE]
- a = b + c * 2
- MULI LOCc, 2, R1
- To generate the correct instructions, we need to know how to access a, b and c.
- Code generator uses information stored in symbol table in order to perform this translation
- However, the symbol table is not around at run time
17Setting Up Run Time Storage
[Figure: SYMBOL TABLE]
- At run time, memory is required to store the program's object code and its data objects.
- An assignment to a variable will modify the contents of the corresponding storage location.
- Intermediate representations use the symbol table as a means to refer to variables.
- Before code is generated, these references will be replaced by the memory locations
18Setting Up Run Time Storage
- So back end must also set up storage for program and its variables
- deal with different storage classes
- Adapts code to reflect the locations chosen
- usually relocatable (i.e. with offset, not absolute addresses)
- Compiler assumes contiguous memory
- Job of OS is to manage this
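Choosing relocatable locations amounts to giving each variable an offset within the area reserved for its storage class. The following sketch assumes a hypothetical symbol table that maps names to a storage class and a size in bytes; the function name `assign_offsets` is invented for this example, and alignment is ignored for brevity.

```python
def assign_offsets(symbol_table):
    """Give every symbol an offset within the area for its storage class.

    symbol_table: dict name -> (storage_class, size_in_bytes).
    Returns (layout, area_sizes): layout maps name -> (storage_class, offset),
    area_sizes maps storage_class -> total bytes reserved.
    """
    next_free = {}   # storage class -> next unused offset in that area
    layout = {}
    for name, (storage_class, size) in symbol_table.items():
        offset = next_free.get(storage_class, 0)
        layout[name] = (storage_class, offset)   # relative offset, not an absolute address
        next_free[storage_class] = offset + size
    return layout, next_free


symbols = {
    "count": ("static", 4),
    "total": ("static", 8),
    "i": ("local", 4),
    "x": ("local", 8),
}
layout, area_sizes = assign_offsets(symbols)
print(layout)
print(area_sizes)
```

Each storage class gets its own contiguous area, so references in the IR can be rewritten as (area, offset) pairs before code is emitted.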
19Preparing for Run Time
- Program consists of a collection of procedures.
- An invocation of a procedure results in its activation at run time.
- Compiler must generate code to
- begin and terminate execution of procedures
- pass arguments to, and return results from, the called routine
- ensure proper return to calling procedure and restore its environment
- A call stack is used to save local data, pass arguments and results, and save state of caller
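The role of the call stack can be illustrated with a toy model of activation records. This is only a sketch: the class `CallStack`, the record fields, and the symbolic return address "main+3" are assumptions for this example, and a real compiler emits machine instructions that manipulate a stack in memory rather than a Python list.

```python
class CallStack:
    """A toy call stack of activation records (stack frames)."""

    def __init__(self):
        self.frames = []

    def call(self, procedure, arguments, return_address):
        # Activation: push a record holding arguments, locals, and caller state.
        self.frames.append({
            "procedure": procedure,
            "arguments": dict(arguments),
            "locals": {},
            "return_address": return_address,
        })

    def ret(self, result=None):
        # Termination: pop the record, report where to resume and the result.
        frame = self.frames.pop()
        return frame["return_address"], result


stack = CallStack()
stack.call("main", {}, return_address=None)
stack.call("square", {"x": 7}, return_address="main+3")
frame = stack.frames[-1]
frame["locals"]["y"] = frame["arguments"]["x"] ** 2   # local data lives in the frame
resume_at, value = stack.ret(frame["locals"]["y"])
print(resume_at, value, len(stack.frames))
```

After the return, only the caller's record remains on the stack, which is what restoring its environment means in this model.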
20Setting Up Run Time Storage
- Each storage class is considered separately when reserving memory
- What is known at end of compile time:
- size of object code,
- amount of storage required for some data objects,
- storage class of each data object.
21Run-Time Memory Allocation
- Here is one possible organization of memory
22Run Time System
- Target code would be too large if all operations were coded entirely in machine code.
- So compilers usually have a run-time library
- Perform functions that are always carried out the same way, e.g. initialization routines and termination code
- Save space by performing repetitive non-trivial functions, e.g. input and output
- Interrupts
- Back end generates calls to run-time library routines
23Role of the Run-time System
- Memory management services
- Allocate
- In the heap or in an activation record (stack frame)
- Deallocate
- Collect garbage
- Run-time type checking
- Error processing
- Interface to the operating system
- Input and output
- Support of parallelism
- Parallel thread initiation
- Communication and synchronization
24Where do Optimizations Occur?
- Back end translates intermediate code to machine-like code
- Called lowering
- Optimizations are performed
- In practice lowering may occur several times, interleaved with the optimizations
- Some optimizations require information that may later be lost
- In other words, they may depend on a certain kind of IR
- Some optimizations may be repeated
- This is the process we are going to start looking at in more detail
25Outlook
- Next, we will extend our description of a compiler's structure
- A more realistic description
- Shows central role of optimizations
- Other topics we will look into a bit more
- Intermediate Representation
- Symbol Tables
- Preparing for Execution
- Memory management and handling procedure calls
26Run-Time Stack
- Begin with result and actual parameters, other data of known size, then fields whose size may not be fixed at compile time.
[Figure: activation record layout. Annotations: "may be filled by caller", "needed when procedure ends", "unless in fixed storage area", "size initially unknown"]
27Run Time Stack
- Local variables can be saved on stack
- So can temporaries
- Global and static variables outlive a procedure activation
- so they must be stored separately
- a fixed area is usually reserved for them
- Dynamic variables require a different storage area
- the heap
28Summary Code Generation
- Code generation translates intermediate code into a form like target machine code
- Organizes memory usage
- Optimization is essential for modern architectures.
- Peephole optimizations try to improve target code in a small region of consecutive instructions.
- Register allocation, instruction selection and instruction scheduling