Title: 4. Processing the intermediate code
4. Processing the intermediate code
- From Chapter 4 of Modern Compiler Design, by Dick Grune et al.
4.0 Background
- The AST still bears very much the traces of the source language and the programming paradigm it belongs to: higher-level constructs are still represented by nodes and subtrees.
- The next step in processing the AST is the transformation to intermediate code, IC generation.
- IC generation serves to reduce the set of specific node types to a small set of general concepts that can be implemented easily on actual machines.
- IC generation finds the language-characteristic nodes and subtrees in the AST and rewrites them into subtrees that employ only a small number of features, each of which corresponds rather closely to a set of machine instructions.
- The resulting tree should probably be called an intermediate code tree.
4.0 Background
- The standard IC tree features
- expressions (including assignments), routine calls, procedure headings, return statements, and conditional and unconditional jumps.
- Administrative features
- memory allocation for global variables,
- activation record allocation, and
- module linkage information.
- IC generation
- increases the size of the AST, but
- reduces its conceptual complexity.
(Figure: roadmap — material covered in this chapter versus material deferred to Chapters 6 through 9.)
4.0 Background
- Roadmap
- 4. Processing the intermediate code
- 4.1 Interpretation
- 4.2 Code generation
- 4.3 Assemblers, linkers, and loaders
- A sobering thought
- whatever the processing method, writing the run-time system and the library routines used by the programs will be a substantial part of the work.
- Little advice can be given on this; most of it is just coding, and usually there is much of it.
4.1 Interpretation
- The simplest way to have the actions expressed by the source program performed is to process the AST using an interpreter.
- An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.
- Two varieties of interpreter
- Recursive: works directly on the AST and requires less preprocessing.
- Iterative: works on a linearized version of the AST but requires more preprocessing.
4.1.1 Recursive interpretation
- A recursive interpreter has an interpreting routine for each node type in the AST.
- Such an interpreting routine calls other similar routines for its children; it essentially does what the language definition manual says.
- This architecture is possible because the meaning of a given language construct is defined as a function of the meanings of its components.
- For example, the meaning of an if-statement is defined in terms of its condition, then part, and else part.
4.1.1 Recursive interpretation
- An important ingredient in a recursive interpreter is the uniform self-identifying data representation.
- The interpreter has to manipulate data values defined in the program being interpreted, but the types and sizes of these values are not known at the time the interpreter is written.
- This makes it necessary to implement these values in the interpreter as variable-size records that specify the type of the run-time value, its size, and the run-time value itself (a sketch follows).
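- A minimal sketch in C of such a self-identifying value record; the type names and fields below are illustrative, not the book's code:

    #include <stdlib.h>

    /* Illustrative type tags; a real interpreter would cover all source-language types. */
    enum value_type { VT_INT, VT_REAL, VT_ARRAY, VT_RECORD };

    /* A run-time value: its type, its size, and the value itself. */
    struct value {
        enum value_type type;   /* type of the run-time value   */
        size_t size;            /* size of the payload in bytes */
        char payload[];         /* the run-time value itself    */
    };

    /* Allocate a value record large enough for a payload of 'size' bytes. */
    struct value *new_value(enum value_type type, size_t size) {
        struct value *v = malloc(sizeof(struct value) + size);
        if (v != NULL) {
            v->type = type;
            v->size = size;
        }
        return v;
    }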
4.1.1 Recursive interpretation
- Another important feature is the status indicator.
- It is used to direct the flow of control.
- Its primary component is the mode of operation of the interpreter:
- an enumeration value, like Normal mode, indicating sequential flow of control, but
- other values are available to indicate jumps, exceptions, function returns, etc.
- Its second component is a value supplying further information about non-sequential flow of control:
- Return mode, Exception mode, Jump mode (a sketch follows).
4.1.1 Recursive interpretation
- Each interpreting routine checks the status indicator after each call to another routine, to see how to carry on.
- If the mode is Normal mode, the routine carries on normally.
- Otherwise, it checks whether the mode is one it should handle:
- if it is, it does so, but
- if it is not, the routine returns immediately, to let one of the parent routines handle the mode.
PROCEDURE Elaborate return with expression statement (Rwe node):
    SET Result TO Evaluate expression (Rwe node .expression);
    IF Status .mode /= Normal mode: RETURN;
    SET Status .mode TO Return mode;
    SET Status .value TO Result;
4.1.1 Recursive interpretation
- Variables, named constants, and other named entities are handled by entering them into the symbol table, in the way they are described in the manual.
- It is useful to attach additional data to the entry.
- E.g., if the manual entry for the declaration of a variable V of type T states that room should be allocated for it on the stack,
- we allocate the required room on the heap and enter into the symbol table under the name V a record with the following fields (sketched in C after this list):
- a pointer to the name V,
- the file name and line number of its declaration,
- an indication of the kind of declarable (variable, constant, field selector, etc.),
- a pointer to the type T,
- a pointer to the newly allocated room for the value of V,
- a bit telling whether or not V has been initialized, if known,
- one or more scope- and stack-related pointers, depending on the language,
- perhaps other data, depending on the language.
4.1.1 Recursive interpretation
- A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping.
- It is not the architecture of choice for a heavy-duty interpreter.
- A secondary advantage: it can help the language designer to debug the design of the language and its description.
- Disadvantages
- Speed of execution
- may be a factor of 1000 or more lower than what could be achieved with a compiler;
- can be improved by doing judicious memoization.
- Lack of static context checking
- If needed, full static context checking can be achieved by doing attribute evaluation before starting the interpretation.
4.1.2 Iterative interpretation
- The structure of an iterative interpreter consists of a flat loop over a case statement which contains a code segment for each node type.
- The code segment for a given node type implements the semantics of that node type, as described in the language definition manual.
- It requires a fully annotated and threaded AST, and
- maintains an active-node pointer, which points to the node to be interpreted, the active node.
- It repeatedly runs the code segment for the node pointed at by the active-node pointer;
- this code sets the active-node pointer to another node, its successor, thus leading the interpreter to that node (see the skeleton below).
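- A skeleton of that flat loop in C; the node layout and node kinds are illustrative only:

    #include <stddef.h>

    struct node {
        enum { NODE_CONSTANT, NODE_ADD, NODE_IF /* ... one kind per node type ... */ } kind;
        struct node *successor;            /* thread: the default next node */
        /* ... operands and annotations ... */
    };

    void interpret(struct node *active_node) {
        while (active_node != NULL) {
            struct node *next = active_node->successor;   /* default: follow the thread */
            switch (active_node->kind) {    /* one code segment per node type */
            case NODE_CONSTANT:
                /* push the constant onto the interpreter's value stack */
                break;
            case NODE_ADD:
                /* pop two values, push their sum */
                break;
            case NODE_IF:
                /* pop the condition; choose the then- or else-thread as 'next' */
                break;
            default:
                break;
            }
            active_node = next;             /* the code segment has chosen the successor */
        }
    }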
4.1.2 Iterative interpretation
- An iterative interpreter possesses much more information about run-time events inside a program than a compiled program does, but less than a recursive interpreter.
- A recursive interpreter can maintain arbitrary information for a variable by storing it in the symbol table, whereas an iterative interpreter only has a value at a given address.
- Remedy: a shadow memory parallel to the memory array maintained by the interpreter.
- Each byte in the shadow memory has 256 possible values, for example: this byte is uninitialized, this byte is a non-first byte of a pointer, this byte belongs to a read-only array, this byte is part of the routine call linkage, etc.
4.1.2 Iterative interpretation
- The shadow data can be used for interpreter-time checking (a sketch follows), for example
- to detect the use of uninitialized memory,
- incorrectly aligned data accesses, and
- overwriting of read-only and system data.
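- A sketch in C of such shadow-memory checks; the shadow values and memory layout are illustrative only:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MEM_SIZE 65536

    enum shadow { SH_UNINITIALIZED, SH_DATA, SH_READ_ONLY, SH_CALL_LINKAGE };

    static uint8_t memory[MEM_SIZE];   /* the memory array maintained by the interpreter */
    static uint8_t shadow[MEM_SIZE];   /* one shadow byte per memory byte                */

    /* Check performed on every read of a memory byte. */
    uint8_t checked_read(size_t addr) {
        if (shadow[addr] == SH_UNINITIALIZED)
            fprintf(stderr, "read of uninitialized memory at address %zu\n", addr);
        return memory[addr];
    }

    /* Check performed on every write of a memory byte. */
    void checked_write(size_t addr, uint8_t byte) {
        if (shadow[addr] == SH_READ_ONLY || shadow[addr] == SH_CALL_LINKAGE) {
            fprintf(stderr, "overwriting protected memory at address %zu\n", addr);
            exit(1);
        }
        memory[addr] = byte;
        shadow[addr] = SH_DATA;        /* the byte is now initialized ordinary data */
    }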
4.1.2 Iterative interpretation
- Some iterative interpreters store the AST in a single array, because this is
- easier to write to a file,
- a more compact representation, and
- attractive for historical and conceptual reasons.
4.1.2 Iterative interpretation
- Iterative interpreters are usually somewhat easier to construct than recursive interpreters;
- they are much faster but yield fewer run-time diagnostics.
- Iterative interpreters are much easier to construct than compilers, and
- they yield far superior run-time diagnostics.
- They are much slower than compiled code
- between 100 and 1000 times slower, but an optimized interpreter can reduce the loss to perhaps a factor of 30 or less.
- Advantages
- increased portability
- increased security, as exploited for example in Java
4.2 Code generation
- Compilation produces object code from the intermediate code tree through a process called code generation.
- Basic concept
- the systematic replacement of nodes and subtrees of the AST by target code segments, in a way that preserves the semantics, followed by
- a linearization phase, producing a linear sequence of instructions from the rewritten AST.
- The replacement process is called tree rewriting.
- The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments.
(Figure: example of tree rewriting — subtrees of the AST for a memory access are matched and replaced by the instruction templates Load_Byte and Load_Address, with registers Ra, Rc, Rd, Rt and the constants 9, 2, and 4 as parameters.)
4.2 Code generation
- Three main issues in code generation
- Code selection
- Which part of the AST will be rewritten with which template, using which substitutions for the instruction parameters?
- Register allocation
- What computational results are kept in registers? Note that it is not certain that there will be enough registers for all values used and results obtained.
- Instruction ordering
- Which part of the code is produced first and which later?
4.2 Code generation
- Optimal code generation is NP-complete.
- Compromise by restricting the problem:
- consider only small parts of the AST at a time;
- assume that the target machine is simpler than it actually is, by disregarding some of its complicated features;
- limit the possibilities in the three issues by having conventions for their use.
4.2 Code generation
- Preprocessing: AST node patterns are replaced by other (better) AST node patterns,
- Code generation proper: AST node patterns are replaced by target code sequences, and
- Postprocessing: target code sequences are replaced by other (better) target code sequences, using peephole optimization.
4.2.1 Avoiding code generation altogether
- The AST of a source program P combined with an interpreter already constitutes an executable program, much like a compiled program.
- This is a good way to do rapid prototyping, if the interpreter is available.
4.2.2 The starting point
- Classes of nodes in an intermediate code tree
- Administration
- for example, declarations, module structure indications, etc.;
- the code needed is minimal and almost trivial.
- Flow-of-control
- for example, if-then, multi-way choice from case statements, computed gotos, function calls, exception handling, method application, Prolog rule selection, RPC, etc.
- Expressions
- many of the nodes for which code is to be generated belong to expressions.
- Techniques for code generation
- trivial,
- simple, and
- advanced.
4.2.3 Trivial code generation
- There is a strong relationship between iterative interpretation (II) and code generation (CG):
- an iterative interpreter contains code segments performing the actions required by the nodes in the AST;
- a code generator generates code segments performing the actions required by the nodes in the AST (a sketch follows);
- the active-node pointer is replaced by the machine instruction pointer.
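- A sketch in C of the idea: where the interpreter would execute a code segment, the trivial code generator prints it. The node layout and the emitted push/pop runtime routines are illustrative, not the book's code:

    #include <stdio.h>

    struct expr {
        int kind;                    /* '+', '*', or 0 for a constant */
        int value;                   /* the constant, if kind == 0    */
        struct expr *left, *right;   /* the operands, otherwise       */
    };

    /* Emit C code (using an assumed push/pop runtime) instead of executing it. */
    void generate_code(struct expr *e) {
        if (e->kind == 0) {
            printf("push(%d);\n", e->value);                 /* segment for a constant  */
        } else {
            generate_code(e->left);
            generate_code(e->right);
            printf("{int r = pop(); int l = pop(); push(l %c r);}\n",
                   e->kind);                                 /* segment for an operator */
        }
    }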
4.2.3 Trivial code generation
- At first sight it may seem pointless to compile an expression in C to code in C, and the code obtained is inefficient, but still several points have been made:
- compilation has taken place in a real sense;
- the code generator was obtained with minimal effort;
- the process can be repeated for much more complicated source languages.
- Two improvements
- threaded code
- partial evaluation
4.2.3.1 Threaded code
- The code of Fig. 4.13 is very repetitive; the idea is to pack the code segments into routines, possibly with parameters.
- The resulting list of routine calls is called threaded code.
4.2.3.1 Threaded code
- The advantage of threaded code is that it is small.
- It is mainly used in process control and embedded systems, to control hardware with limited processing power, for example palmtops and telephones.
- If the ultimate in code size reduction is desired, the routines can be numbered and the list of calls can be replaced by an array of routine numbers (sketched below).
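- A small runnable C sketch of threaded code in its most compact form, an array of routine numbers driven by a tiny dispatcher; the routine bodies are stand-ins:

    #include <stdio.h>

    void push_const(int c) { printf("push %d\n", c); }   /* stand-in bodies */
    void add_top2(void)    { printf("add\n"); }
    void print_top(void)   { printf("print\n"); }

    enum routine { R_PUSH_CONST, R_ADD_TOP2, R_PRINT_TOP, R_STOP };

    /* The "code" is just an array of routine numbers (plus inline parameters). */
    void run(const int *code) {
        for (int pc = 0; ; ) {
            switch (code[pc++]) {
            case R_PUSH_CONST: push_const(code[pc++]); break;
            case R_ADD_TOP2:   add_top2();             break;
            case R_PRINT_TOP:  print_top();            break;
            case R_STOP:       return;
            }
        }
    }

    int main(void) {
        /* threaded code for: print 7 + 5 */
        const int program[] = { R_PUSH_CONST, 7, R_PUSH_CONST, 5,
                                R_ADD_TOP2, R_PRINT_TOP, R_STOP };
        run(program);
        return 0;
    }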
4.2.3.2 Partial evaluation
- The process of performing part of a computation while generating code for the rest of the computation is called partial evaluation (a classic illustration follows this list).
- It is a very general and powerful technique for program simplification and optimization.
- Many researchers believe that
- many of the existing optimization techniques are special cases of partial evaluation,
- and that better knowledge of it would allow us to obtain very powerful optimizers,
- thus simplifying compilation, program generation, and even program design.
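- A classic illustration (ours, not the book's example): specializing x^n when the exponent n is known at code-generation time. The part of the computation that depends only on n is performed now; code is generated only for the part that still depends on x:

    #include <stdio.h>

    /* Emit a C routine computing x to the (known) power n. */
    void generate_power(int n) {
        printf("double power_%d(double x) {\n", n);
        printf("    double result = 1.0;\n");
        for (int i = 0; i < n; i++)          /* this loop runs at generation time */
            printf("    result *= x;\n");    /* only the multiplications remain   */
        printf("    return result;\n}\n");
    }

    int main(void) {
        generate_power(3);   /* emits a specialized routine computing x*x*x */
        return 0;
    }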
4.2.4 Simple code generation
- Two machine types are considered
- the pure stack machine and the pure register machine.
- A pure stack machine
- uses a stack to store and manipulate values;
- it has no registers.
- It has two types of instructions:
- those that move or copy values between the top of the stack and elsewhere, and
- those that do operations on the top element or elements of the stack.
- Two important data administration pointers:
- the stack pointer, SP, and
- the base pointer, BP.
4.2.4 Simple code generation
- The code for p = p + 5 on the stack machine:
Push_Local p    // Push value of p-th local onto the stack
Push_Const 5    // Push the value 5 onto the stack
Add_Top2        // Add the top two elements
Store_Local p   // Pop and store the result back in the p-th local
4.2.4 Simple code generation
- A pure register machine has
- a memory to store values in,
- a set of registers to perform operations on, and
- two sets of instructions.
- One set contains instructions to copy values between the memory and a register.
- The other set performs operations on the values in two registers and leaves the result in one of them.
4.2.4 Simple code generation
- The code for p = p + 5 on a register-memory machine would be
Load_Mem   p,R1
Load_Const 5,R2
Add_Reg    R2,R1
Store_Reg  R1,p
4.2.4.1 Simple code generation for a stack machine
Push_Local b
Push_Local b
Mult_Top2
Push_Const 4
Push_Local a
Push_Local c
Mult_Top2
Mult_Top2
Subtr_Top2
4.2.4.2 Simple code generation for a register machine
- Much of what was said about code generation for the stack machine applies to the register machine as well.
- The ASTs of the machine instructions from Fig. 4.22.
4.2.4.2 Simple code generation for a register machine
- Use depth-first code generation again, but we have to contend with registers this time.
- Method
- Arrange that, in the evaluation of each node in the expression tree,
- the result of the expression is expected in a given register, the target register,
- and that a given set of auxiliary registers is available to help get it there.
4.2.4.2 Simple code generation for a register machine
- Actually no set manipulation is necessary in this case; the set can be implemented as a stack of registers.
- We pick the top of the register stack as the target register, which leaves us the rest of the stack as the auxiliary register set (a sketch follows).
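- A sketch in C of depth-first code generation with such a register stack; the node layout is illustrative, weights and spilling are ignored, and the instruction names follow the register machine used in the text:

    #include <stdio.h>

    struct expr {
        int is_leaf;                  /* 1: a variable, 0: an operator */
        const char *name;             /* variable name, if a leaf      */
        char op;                      /* '+' or '*', if an operator    */
        struct expr *left, *right;
    };

    static const char *reg_stack[] = { "R1", "R2", "R3", "R4" };

    /* Generate code for e; the result ends up in reg_stack[top], the target
       register, and reg_stack[top+1..] serve as the auxiliary registers.   */
    void gen(const struct expr *e, int top) {
        if (e->is_leaf) {
            printf("Load_Mem %s,%s\n", e->name, reg_stack[top]);
        } else {
            gen(e->left, top);        /* left operand into the target register */
            gen(e->right, top + 1);   /* right operand into the next register  */
            printf("%s_Reg %s,%s\n", e->op == '+' ? "Add" : "Mult",
                   reg_stack[top + 1], reg_stack[top]);
        }
    }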
4.2.4.2 Simple code generation for a register machine
- Weighted register allocation
- Motivating example
- We call the number of registers required by a node its weight.
- The weight of a subtree can be determined simply by a depth-first prescan (sketched below).
- If the left subtree is heavier, we compile it first;
- the same applies, vice versa, to the right subtree if that is heavier.
- This technique is sometimes called Sethi-Ullman numbering.
- Generalization to operations with n operands (see pp. 311-314).
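- A sketch in C of the weight computation for binary operators (one common formulation; the node layout is illustrative):

    #include <stddef.h>

    struct expr { struct expr *left, *right; };   /* NULL children for a leaf */

    int weight(const struct expr *e) {
        if (e->left == NULL && e->right == NULL)
            return 1;                          /* a leaf needs one register            */
        int wl = weight(e->left);
        int wr = weight(e->right);
        if (wl == wr)
            return wl + 1;                     /* equal weights: one extra register    */
        return wl > wr ? wl : wr;              /* otherwise: compile the heavier first */
    }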
(Figure: example expression trees requiring 3 and 4 registers, respectively.)
4.2.4.2 Simple code generation for a register machine
- Spilling registers
- Problem: the expression to be translated may require more registers than we have.
- Solution: one or more values have to be moved from registers to memory locations, to be retrieved later; this is the register spilling technique.
- A simple method
- Consider the tree for a very complicated expression whose top region has a weight higher than the number of registers we have.
- Detach some of the subtrees and store their values in temporary variables.
- This leaves us with a set of expressions assigned to temporary variables, for which we can generate code since we now have enough registers.
4.2.4.3 Compilation on the stack / compilation by symbolic interpretation
- Compilation by symbolic interpretation can be employed as a full code generation technique
- by extending the approximate stack representation.
- Compilation by symbolic interpretation uses the same technique but keeps the representation exact.
- It uses a register and variable descriptor, or regvar descriptor.
4.2.5 Code generation for basic blocks
- As explained previously, instruction selection, register allocation, and instruction ordering are intertwined, and
- finding the optimal rewriting of the AST with the available instruction templates is NP-complete.
- We present here three techniques that each address part of the problem.
- Basic blocks: mainly concerned with optimization, instruction selection, and instruction ordering in a limited part of the AST (4.2.5).
- Bottom-up tree rewriting: shows how a very good instruction selector can be generated automatically for very general instruction sets and cost functions, under the assumption that enough registers are available (4.2.6).
- Register allocation by graph coloring: explains a good and very general heuristic for register allocation (4.2.7).
4.2.5 Code generation for basic blocks
- The idea of the basic block is used in code generation.
- Basic block
- a part of the control graph containing no splits (jumps) or combines (labels).
- We usually consider only maximal basic blocks: basic blocks which cannot be extended by including adjacent nodes without violating the definition of a basic block.
- In imperative languages, basic blocks consist exclusively of expressions and assignments, which follow each other sequentially.
- In practice, this is also true for functional and logic languages.
4.2.5 Code generation for basic blocks
- The effect of an assignment in a basic block
- may be local to the block: the resulting value is not used anywhere else and the variable is dead at the end of the basic block, or
- it may be non-local, in which case the variable is an output variable of the basic block.
- In general, simpler means, for example the scope rules of C, suffice to determine whether a variable is local or non-local.
- If we do not have this information, we have to assume that all variables are live at basic block end.
4.2.5 Code generation for basic blocks
- We will now look at one way to generate code for a basic block.
- First, convert the AST and the control graph implied in it into a dependency graph, a dag.
- Then rewrite the dependency graph to code.
- Use the code in Fig. 4.41 as an example. We assume that
- n is local and dead at the end, and
- x and y are live at block exit.
4.2.5.1 From AST to dependency graph
- The threaded AST is not the most appropriate basis for code generation:
- control flow graphs are more restrictive than necessary;
- only the data dependencies have to be obeyed.
- It is easier to generate good code from a data dependency graph than from a control flow graph.
4.2.5.1 From AST to dependency graph
- Two main sources of data dependencies in the AST of a basic block
- data flow inside expressions, and
- data flow from values assigned to variables to the uses of these variables in further code.
- A third source of data dependencies
- concerns pointers (4.2.5.3).
- Three observations
- The order of evaluation of the operations in an expression is immaterial, as long as the data dependencies inside the expression are respected.
- If the value of a variable V is used more than once in a basic block, the order of these uses is immaterial, as long as each use comes after the assignment it depends on and before the next assignment to V.
- The order in which the assignments to variables are executed is immaterial, as long as all assignments to a specific variable V are executed in sequential, left-to-right order.
4.2.5.1 From AST to dependency graph
- The previous observations give us a simple algorithm to convert the AST of a basic block into a data dependency graph.
- Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows:
- for an assignment node, the arrow points to the destination;
- the others point from the parent nodes downward.
- Insert an arrow from each variable V used as an operand to the assignment that set its value, or to the beginning of the basic block if V was an input variable.
- Insert an arrow from each assignment to a variable V to the previous assignment to V, if present.
- Designate the nodes that describe the output values as roots of the graph.
- Remove the ;-nodes together with their arrows.
(Figures: the data dependency graph of the example basic block after each step of the algorithm — replacing the AST arcs by data dependency arrows, inserting arrows from each variable used as an operand to the assignment that set its value, inserting arrows between successive assignments to the same variable, designating the output values as roots, and removing the ;-nodes.)
4.2.5.1 From AST to dependency graph
- An assignment in the data dependency graph just passes on the value and can be short-circuited.
- Also, we can eliminate from the graph all nodes not reachable through at least one of the roots.
4.2.5.1 From AST to dependency graph
- Fig. 4.44 has the property that it specifies the semantics of the basic block precisely:
- all required nodes and data dependencies are present and no node or data dependency is superfluous.
- Two techniques help in converting the data dependency graph into efficient machine instructions:
- common sub-expression elimination, and
- the triple representation of the dependency graph.
4.2.5.1 From AST to dependency graph: Common sub-expression elimination
(Figure: code fragments with textually similar expressions — e.g. a*a + 2*a*b + b*b occurring twice, and the address computations in a[i] and b[i], i.e. *(a+4*i) and *(b+4*i) — that are not necessarily common sub-expressions.)
4.2.5.1 From AST to dependency graph: Common sub-expression elimination
- Once we have the data dependency graph, finding the common sub-expressions is simple.
- Rule: two nodes that have their operands, operator, and dependencies in common can be combined into one node.
- Detecting that two or more nodes in a graph are the same is usually implemented by storing some representation of each node in a hash table.
- If the hash value of a node depends on its operands, its operator, and its dependencies, common nodes will hash to the same value (a sketch follows).
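- A sketch in C of that hashing scheme; the node layout (operands as indices of earlier nodes, standing in for the dependencies) is illustrative:

    #include <stddef.h>

    #define TABLE_SIZE 1024

    struct dnode {                   /* a node in the data dependency graph */
        char op;
        int left, right;             /* indices of the operand nodes        */
        struct dnode *next;          /* hash-chain link                     */
    };

    static struct dnode *table[TABLE_SIZE];

    static unsigned hash(char op, int left, int right) {
        return ((unsigned)op * 31u + (unsigned)left * 17u + (unsigned)right) % TABLE_SIZE;
    }

    /* Return an existing identical node if there is one; otherwise insert n. */
    struct dnode *find_or_insert(struct dnode *n) {
        unsigned h = hash(n->op, n->left, n->right);
        for (struct dnode *p = table[h]; p != NULL; p = p->next)
            if (p->op == n->op && p->left == n->left && p->right == n->right)
                return p;            /* common sub-expression found: reuse it */
        n->next = table[h];
        table[h] = n;
        return n;
    }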
4.2.5.1 From AST to dependency graph: The triple representation of the data dependency graph
- Traditionally, data dependency graphs are implemented as arrays of triples.
- A triple is a record with three fields representing an operator with its two operands, corresponding to an operator node in the data dependency graph (sketched below).
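- A sketch in C of such a triple; the exact field layout is illustrative:

    enum operand_kind { OPD_CONST, OPD_VAR, OPD_TRIPLE };

    struct operand {
        enum operand_kind kind;
        int value;            /* constant value, variable number, or index of an earlier triple */
    };

    struct triple {
        char op;              /* the operator, e.g. '+', '*', '=' */
        struct operand opd1;
        struct operand opd2;
    };

    /* Example: x = a*a + b could be stored roughly as
       0: (*, a, a)    1: (+, triple 0, b)    2: (=, x, triple 1)   */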
4.2.5.2 From dependency graph to code
- Generating instructions from a data dependency graph is very similar to doing so from an AST:
- the nodes are rewritten by machine instruction templates and the result is linearized.
- Main difference: the former allows more leeway than the latter.
- We assume a register-memory machine is used.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- In the absence of other ordering criteria, two orderings suggest themselves
- early evaluation: code for a node is issued as soon as the code for all its operands has been issued;
- late evaluation: code for a node is issued as late as possible.
- Early evaluation ordering tends to require more registers than late evaluation ordering,
- since it creates values as soon as possible, which may be long before they are used, and the values have to be kept in registers in the meantime.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- Available ladder sequences
- An available ladder sequence starts at a root node and continues along left operands, but may continue along the right operand for commutative operators; it may stop anywhere, but must stop at a leaf.
- Code generated for a given ladder sequence
- starts at its last node, by loading a leaf variable if the sequence ends in a leaf, or an intermediate value if the sequence ends earlier;
- working backwards along the sequence, code is generated for each of the operation nodes;
- finally the resulting value is stored as indicated in the root node.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- Simple heuristic ordering algorithm combining the identification of ladder sequences with late evaluation:
- 1. Find an acceptable ladder sequence S that has the property that none of its nodes has more than one incoming dependency.
- 2. If any operand of a node N in S is not a leaf but another node M, associate a new pseudo-register R with M if it does not have one already;
- use R as the operand in the code generated for N and make M an additional root of the dependency graph.
- 3. Generate code for the ladder sequence S, using R1 as the ladder register.
- 4. Remove the ladder sequence S from the data dependency graph.
- 5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code.
Two available ladder sequences without multiple incoming dependencies; generate the rightmost one first:
Load_Reg  X1,R1
Add_Const 1,R1
Mult_Mem  d,R1
Store_Reg R1,y

Generate the next available ladder sequence:
Load_Reg  X1,R1
Mult_Reg  X1,R1
Add_Mem   b,R1
Add_Mem   c,R1
Store_Reg R1,x
Generate the remaining ladder sequence:
Load_Mem  a,R1
Add_Const 1,R1
Load_Reg  R1,X1
4.2.5.2 From dependency graph to code: Register allocation for the linearized code
- One thing remains to be done:
- the pseudo-registers have to be mapped onto real registers or, failing that, onto memory locations.
- A simple method
- Map the pseudo-registers onto real registers in order of appearance; when we run out of registers, we map the remaining ones onto memory locations.
- For a machine with at least two registers, R1 and R2, the resulting code is shown in Fig. 4.54.
(Figure 4.54: the resulting code; note the stupid instructions generated.)
4.2.5.2 From dependency graph to code: Register allocation for the linearized code
- Ways to deal with the stupid instructions generated:
- improve the code generation algorithm,
- do register tracking (4.2.4.3), and
- do peephole optimization (4.2.12).
4.2.5.3 Code optimization in the presence of pointers
4.2.6 BURS code generation and dynamic programming
- We consider here machines with a great variety of instructions, rather than the simple ones considered before.
4.2.6 BURS code generation and dynamic programming
- Two main problems identified
- How do we find all possible rewrites, and how do we represent them?
- Solved by a bottom-up rewriting system, BURS.
- How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?
- Solved by a form of dynamic programming.
4.2.6 BURS code generation and dynamic programming
- In BURS, the code is generated in three scans over the input tree:
- an instruction-collecting scan, bottom-up, which identifies the possible instructions for each node by pattern matching
- by a post-order recursive visit;
- an instruction-selection scan, top-down, which selects at each node one instruction out of the possible instructions collected during the previous scan (this is the most interesting scan)
- by a pre-order recursive visit;
- a code-generating scan, bottom-up, which emits the instructions in the correct linearized order
- by a post-order recursive visit.
4.2.6 BURS code generation and dynamic programming
- Four variants of the instruction-selection scan
- using item sets (4.2.6.1),
- using a tree automaton (4.2.6.2),
- using dynamic programming (4.2.6.3), and
- combining the above three into an efficient bottom-up scan (4.2.6.4).
4.2.6.1 Bottom-up pattern matching
- The algorithm for bottom-up pattern matching is a tree version of the lexical analysis algorithm from Section 2.1.6.1.