Title: 4. Processing the intermediate code
4. Processing the intermediate code
- From Chapter 4 of Modern Compiler Design, by Dick Grune et al.
4.0 Background
- The AST still bears very much the traces of the source language and the programming paradigm it belongs to: higher-level constructs are still represented by nodes and subtrees.
- The next step in processing the AST is the transformation to intermediate code, IC generation.
- IC generation serves to reduce the set of specific node types to a small set of general concepts that can be implemented easily on actual machines.
- IC generation finds the language-characteristic nodes and subtrees in the AST and rewrites them into subtrees that employ only a small number of features, each of which corresponds rather closely to a set of machine instructions.
- The resulting tree should probably be called an intermediate code tree.
4.0 Background
- The standard IC tree features
- expressions (including assignments), routine calls, procedure headings, return statements, and conditional and unconditional jumps.
- Administrative features
- memory allocation for global variables,
- activation record allocation, and
- module linkage information.
- IC generation
- increases the size of the AST, but
- reduces its conceptual complexity.
(Figure: roadmap — material covered in this chapter versus material deferred to Chapters 6 through 9.)
4.0 Background
- Roadmap
- 4. Processing the intermediate code
- 4.1 Interpretation
- 4.2 Code generation
- 4.3 Assemblers, linkers, and loaders
- A sobering thought
- whatever the processing method, writing the run-time system and the library routines used by the programs will be a substantial part of the work.
- Little advice can be given on this; most of it is just coding, and usually there is much of it.
4.1 Interpretation
- The simplest way to have the actions expressed by the source program performed is to process the AST using an interpreter.
- An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.
- Two varieties of interpreter
- Recursive: works directly on the AST and requires less preprocessing.
- Iterative: works on a linearized version of the AST but requires more preprocessing.
4.1.1 Recursive interpretation
- A recursive interpreter has an interpreting routine for each node type in the AST.
- Such an interpreting routine calls other similar routines for its children; it essentially does what the language definition manual says.
- This architecture is possible because the meaning of a given language construct is defined as a function of the meanings of its components.
- For example, the meaning of an if-statement is defined in terms of its condition, then part, and else part.
4.1.1 Recursive interpretation
- An important ingredient in a recursive interpreter is the uniform self-identifying data representation.
- The interpreter has to manipulate data values defined in the program being interpreted, but the types and sizes of these values are not known at the time the interpreter is written.
- This makes it necessary to implement these values in the interpreter as variable-size records that specify the type of the run-time value, its size, and the run-time value itself (a sketch follows).
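- A minimal sketch in C of such a self-identifying value record; the type names and fields below are illustrative, not the book's code:

    #include <stdlib.h>

    /* Illustrative type tags; a real interpreter would cover all source-language types. */
    enum value_type { VT_INT, VT_REAL, VT_ARRAY, VT_RECORD };

    /* A run-time value: its type, its size, and the value itself. */
    struct value {
        enum value_type type;   /* type of the run-time value   */
        size_t size;            /* size of the payload in bytes */
        char payload[];         /* the run-time value itself    */
    };

    /* Allocate a value record large enough for a payload of 'size' bytes. */
    struct value *new_value(enum value_type type, size_t size) {
        struct value *v = malloc(sizeof(struct value) + size);
        if (v != NULL) {
            v->type = type;
            v->size = size;
        }
        return v;
    }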
4.1.1 Recursive interpretation
- Another important feature is the status indicator.
- It is used to direct the flow of control.
- Its primary component is the mode of operation of the interpreter:
- an enumeration value, like Normal mode, indicating sequential flow of control, but
- other values are available to indicate jumps, exceptions, function returns, etc.
- Its second component is a value supplying further information about non-sequential flow of control:
- Return mode, Exception mode, Jump mode (a sketch follows).
4.1.1 Recursive interpretation
- Each interpreting routine checks the status indicator after each call to another routine, to see how to carry on.
- If the mode is Normal mode, the routine carries on normally.
- Otherwise, it checks whether the mode is one it should handle:
- if it is, it does so, but
- if it is not, the routine returns immediately, to let one of the parent routines handle the mode.
PROCEDURE Elaborate return with expression statement (Rwe node):
    SET Result TO Evaluate expression (Rwe node .expression);
    IF Status .mode /= Normal mode: RETURN;
    SET Status .mode TO Return mode;
    SET Status .value TO Result;
4.1.1 Recursive interpretation
- Variables, named constants, and other named entities are handled by entering them into the symbol table, in the way they are described in the manual.
- It is useful to attach additional data to the entry.
- E.g., if the manual entry for the declaration of a variable V of type T states that room should be allocated for it on the stack,
- we allocate the required room on the heap and enter into the symbol table under the name V a record with the following fields (sketched in C after this list):
- a pointer to the name V,
- the file name and line number of its declaration,
- an indication of the kind of declarable (variable, constant, field selector, etc.),
- a pointer to the type T,
- a pointer to the newly allocated room for the value of V,
- a bit telling whether or not V has been initialized, if known,
- one or more scope- and stack-related pointers, depending on the language,
- perhaps other data, depending on the language.
4.1.1 Recursive interpretation
- A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping.
- It is not the architecture of choice for a heavy-duty interpreter.
- A secondary advantage: it can help the language designer to debug the design of the language and its description.
- Disadvantages
- Speed of execution
- may be a factor of 1000 or more lower than what could be achieved with a compiler;
- can be improved by doing judicious memoization.
- Lack of static context checking
- If needed, full static context checking can be achieved by doing attribute evaluation before starting the interpretation.
4.1.2 Iterative interpretation
- The structure of an iterative interpreter consists of a flat loop over a case statement which contains a code segment for each node type.
- The code segment for a given node type implements the semantics of that node type, as described in the language definition manual.
- It requires a fully annotated and threaded AST, and
- maintains an active-node pointer, which points to the node to be interpreted, the active node.
- It repeatedly runs the code segment for the node pointed at by the active-node pointer;
- this code sets the active-node pointer to another node, its successor, thus leading the interpreter to that node (see the skeleton below).
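- A skeleton of that flat loop in C; the node layout and node kinds are illustrative only:

    #include <stddef.h>

    struct node {
        enum { NODE_CONSTANT, NODE_ADD, NODE_IF /* ... one kind per node type ... */ } kind;
        struct node *successor;            /* thread: the default next node */
        /* ... operands and annotations ... */
    };

    void interpret(struct node *active_node) {
        while (active_node != NULL) {
            struct node *next = active_node->successor;   /* default: follow the thread */
            switch (active_node->kind) {    /* one code segment per node type */
            case NODE_CONSTANT:
                /* push the constant onto the interpreter's value stack */
                break;
            case NODE_ADD:
                /* pop two values, push their sum */
                break;
            case NODE_IF:
                /* pop the condition; choose the then- or else-thread as 'next' */
                break;
            default:
                break;
            }
            active_node = next;             /* the code segment has chosen the successor */
        }
    }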
4.1.2 Iterative interpretation
- An iterative interpreter possesses much more information about run-time events inside a program than a compiled program does, but less than a recursive interpreter.
- A recursive interpreter can maintain arbitrary information for a variable by storing it in the symbol table, whereas an iterative interpreter only has a value at a given address.
- Remedy: a shadow memory parallel to the memory array maintained by the interpreter.
- Each byte in the shadow memory has 256 possible values, for example: this byte is uninitialized, this byte is a non-first byte of a pointer, this byte belongs to a read-only array, this byte is part of the routine call linkage, etc.
4.1.2 Iterative interpretation
- The shadow data can be used for interpreter-time checking (a sketch follows), for example
- to detect the use of uninitialized memory,
- incorrectly aligned data accesses, and
- overwriting of read-only and system data.
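- A sketch in C of such shadow-memory checks; the shadow values and memory layout are illustrative only:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MEM_SIZE 65536

    enum shadow { SH_UNINITIALIZED, SH_DATA, SH_READ_ONLY, SH_CALL_LINKAGE };

    static uint8_t memory[MEM_SIZE];   /* the memory array maintained by the interpreter */
    static uint8_t shadow[MEM_SIZE];   /* one shadow byte per memory byte                */

    /* Check performed on every read of a memory byte. */
    uint8_t checked_read(size_t addr) {
        if (shadow[addr] == SH_UNINITIALIZED)
            fprintf(stderr, "read of uninitialized memory at address %zu\n", addr);
        return memory[addr];
    }

    /* Check performed on every write of a memory byte. */
    void checked_write(size_t addr, uint8_t byte) {
        if (shadow[addr] == SH_READ_ONLY || shadow[addr] == SH_CALL_LINKAGE) {
            fprintf(stderr, "overwriting protected memory at address %zu\n", addr);
            exit(1);
        }
        memory[addr] = byte;
        shadow[addr] = SH_DATA;        /* the byte is now initialized ordinary data */
    }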
4.1.2 Iterative interpretation
- Some iterative interpreters store the AST in a single array, because this is
- easier to write to a file,
- a more compact representation, and
- attractive for historical and conceptual reasons.
4.1.2 Iterative interpretation
- Iterative interpreters are usually somewhat easier to construct than recursive interpreters;
- they are much faster but yield fewer run-time diagnostics.
- Iterative interpreters are much easier to construct than compilers, and
- they yield far superior run-time diagnostics.
- They are much slower than compiled code
- between 100 and 1000 times slower, but an optimized interpreter can reduce the loss to perhaps a factor of 30 or less.
- Advantages
- increased portability
- increased security, as exploited for example in Java
4.2 Code generation
- Compilation produces object code from the intermediate code tree through a process called code generation.
- Basic concept
- the systematic replacement of nodes and subtrees of the AST by target code segments, in a way that preserves the semantics, followed by
- a linearization phase, producing a linear sequence of instructions from the rewritten AST.
- The replacement process is called tree rewriting.
- The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments.
(Figure: example of tree rewriting — subtrees of the AST for a memory access are matched and replaced by the instruction templates Load_Byte and Load_Address, with registers Ra, Rc, Rd, Rt and the constants 9, 2, and 4 as parameters.)
4.2 Code generation
- Three main issues in code generation
- Code selection
- Which part of the AST will be rewritten with which template, using which substitutions for the instruction parameters?
- Register allocation
- What computational results are kept in registers? Note that it is not certain that there will be enough registers for all values used and results obtained.
- Instruction ordering
- Which part of the code is produced first and which later?
4.2 Code generation
- Optimal code generation is NP-complete.
- Compromise by restricting the problem:
- consider only small parts of the AST at a time;
- assume that the target machine is simpler than it actually is, by disregarding some of its complicated features;
- limit the possibilities in the three issues by having conventions for their use.
4.2 Code generation
- Preprocessing: AST node patterns are replaced by other (better) AST node patterns,
- Code generation proper: AST node patterns are replaced by target code sequences, and
- Postprocessing: target code sequences are replaced by other (better) target code sequences, using peephole optimization.
4.2.1 Avoiding code generation altogether
- The AST of a source program P combined with an interpreter already constitutes an executable program, much like a compiled program.
- This is a good way to do rapid prototyping, if the interpreter is available.
4.2.2 The starting point
- Classes of nodes in an intermediate code tree
- Administration
- for example, declarations, module structure indications, etc.;
- the code needed is minimal and almost trivial.
- Flow-of-control
- for example, if-then, multi-way choice from case statements, computed gotos, function calls, exception handling, method application, Prolog rule selection, RPC, etc.
- Expressions
- many of the nodes for which code is to be generated belong to expressions.
- Techniques for code generation
- trivial,
- simple, and
- advanced.
4.2.3 Trivial code generation
- There is a strong relationship between iterative interpretation (II) and code generation (CG):
- an iterative interpreter contains code segments performing the actions required by the nodes in the AST;
- a code generator generates code segments performing the actions required by the nodes in the AST (a sketch follows);
- the active-node pointer is replaced by the machine instruction pointer.
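- A sketch in C of the idea: where the interpreter would execute a code segment, the trivial code generator prints it. The node layout and the emitted push/pop runtime routines are illustrative, not the book's code:

    #include <stdio.h>

    struct expr {
        int kind;                    /* '+', '*', or 0 for a constant */
        int value;                   /* the constant, if kind == 0    */
        struct expr *left, *right;   /* the operands, otherwise       */
    };

    /* Emit C code (using an assumed push/pop runtime) instead of executing it. */
    void generate_code(struct expr *e) {
        if (e->kind == 0) {
            printf("push(%d);\n", e->value);                 /* segment for a constant  */
        } else {
            generate_code(e->left);
            generate_code(e->right);
            printf("{int r = pop(); int l = pop(); push(l %c r);}\n",
                   e->kind);                                 /* segment for an operator */
        }
    }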
4.2.3 Trivial code generation
- At first sight it may seem pointless to compile an expression in C to code in C, and the code obtained is inefficient, but still several points have been made:
- compilation has taken place in a real sense;
- the code generator was obtained with minimal effort;
- the process can be repeated for much more complicated source languages.
- Two improvements
- threaded code
- partial evaluation
4.2.3.1 Threaded code
- The code of Fig. 4.13 is very repetitive; the idea is to pack the code segments into routines, possibly with parameters.
- The resulting list of routine calls is called threaded code.
4.2.3.1 Threaded code
- The advantage of threaded code is that it is small.
- It is mainly used in process control and embedded systems, to control hardware with limited processing power, for example palmtops and telephones.
- If the ultimate in code size reduction is desired, the routines can be numbered and the list of calls can be replaced by an array of routine numbers (sketched below).
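- A small runnable C sketch of threaded code in its most compact form, an array of routine numbers driven by a tiny dispatcher; the routine bodies are stand-ins:

    #include <stdio.h>

    void push_const(int c) { printf("push %d\n", c); }   /* stand-in bodies */
    void add_top2(void)    { printf("add\n"); }
    void print_top(void)   { printf("print\n"); }

    enum routine { R_PUSH_CONST, R_ADD_TOP2, R_PRINT_TOP, R_STOP };

    /* The "code" is just an array of routine numbers (plus inline parameters). */
    void run(const int *code) {
        for (int pc = 0; ; ) {
            switch (code[pc++]) {
            case R_PUSH_CONST: push_const(code[pc++]); break;
            case R_ADD_TOP2:   add_top2();             break;
            case R_PRINT_TOP:  print_top();            break;
            case R_STOP:       return;
            }
        }
    }

    int main(void) {
        /* threaded code for: print 7 + 5 */
        const int program[] = { R_PUSH_CONST, 7, R_PUSH_CONST, 5,
                                R_ADD_TOP2, R_PRINT_TOP, R_STOP };
        run(program);
        return 0;
    }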
4.2.3.2 Partial evaluation
- The process of performing part of a computation while generating code for the rest of the computation is called partial evaluation (a classic illustration follows this list).
- It is a very general and powerful technique for program simplification and optimization.
- Many researchers believe that
- many of the existing optimization techniques are special cases of partial evaluation,
- and that better knowledge of it would allow us to obtain very powerful optimizers,
- thus simplifying compilation, program generation, and even program design.
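- A classic illustration (ours, not the book's example): specializing x^n when the exponent n is known at code-generation time. The part of the computation that depends only on n is performed now; code is generated only for the part that still depends on x:

    #include <stdio.h>

    /* Emit a C routine computing x to the (known) power n. */
    void generate_power(int n) {
        printf("double power_%d(double x) {\n", n);
        printf("    double result = 1.0;\n");
        for (int i = 0; i < n; i++)          /* this loop runs at generation time */
            printf("    result *= x;\n");    /* only the multiplications remain   */
        printf("    return result;\n}\n");
    }

    int main(void) {
        generate_power(3);   /* emits a specialized routine computing x*x*x */
        return 0;
    }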
4.2.4 Simple code generation
- Two machine types are considered
- the pure stack machine and the pure register machine.
- A pure stack machine
- uses a stack to store and manipulate values;
- it has no registers.
- It has two types of instructions:
- those that move or copy values between the top of the stack and elsewhere, and
- those that do operations on the top element or elements of the stack.
- Two important data administration pointers:
- the stack pointer, SP, and
- the base pointer, BP.
4.2.4 Simple code generation
- The code for p = p + 5 on the stack machine:
Push_Local p    // Push value of p-th local onto the stack
Push_Const 5    // Push the value 5 onto the stack
Add_Top2        // Add the top two elements
Store_Local p   // Pop and store the result back in the p-th local
4.2.4 Simple code generation
- A pure register machine has
- a memory to store values in,
- a set of registers to perform operations on, and
- two sets of instructions.
- One set contains instructions to copy values between the memory and a register.
- The other set performs operations on the values in two registers and leaves the result in one of them.
4.2.4 Simple code generation
- The code for p = p + 5 on a register-memory machine would be
Load_Mem   p,R1
Load_Const 5,R2
Add_Reg    R2,R1
Store_Reg  R1,p
4.2.4.1 Simple code generation for a stack machine
Push_Local b
Push_Local b
Mult_Top2
Push_Const 4
Push_Local a
Push_Local c
Mult_Top2
Mult_Top2
Subtr_Top2
4.2.4.2 Simple code generation for a register machine
- Much of what was said about code generation for the stack machine applies to the register machine as well.
- The ASTs of the machine instructions from Fig. 4.22.
4.2.4.2 Simple code generation for a register machine
- Use depth-first code generation again, but we have to contend with registers this time.
- Method
- Arrange that, in the evaluation of each node in the expression tree,
- the result of the expression is expected in a given register, the target register,
- and that a given set of auxiliary registers is available to help get it there.
4.2.4.2 Simple code generation for a register machine
- Actually no set manipulation is necessary in this case; the set can be implemented as a stack of registers.
- We pick the top of the register stack as the target register, which leaves us the rest of the stack as the auxiliary register set (a sketch follows).
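- A sketch in C of depth-first code generation with such a register stack; the node layout is illustrative, weights and spilling are ignored, and the instruction names follow the register machine used in the text:

    #include <stdio.h>

    struct expr {
        int is_leaf;                  /* 1: a variable, 0: an operator */
        const char *name;             /* variable name, if a leaf      */
        char op;                      /* '+' or '*', if an operator    */
        struct expr *left, *right;
    };

    static const char *reg_stack[] = { "R1", "R2", "R3", "R4" };

    /* Generate code for e; the result ends up in reg_stack[top], the target
       register, and reg_stack[top+1..] serve as the auxiliary registers.   */
    void gen(const struct expr *e, int top) {
        if (e->is_leaf) {
            printf("Load_Mem %s,%s\n", e->name, reg_stack[top]);
        } else {
            gen(e->left, top);        /* left operand into the target register */
            gen(e->right, top + 1);   /* right operand into the next register  */
            printf("%s_Reg %s,%s\n", e->op == '+' ? "Add" : "Mult",
                   reg_stack[top + 1], reg_stack[top]);
        }
    }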
4.2.4.2 Simple code generation for a register machine
- Weighted register allocation
- Motivating example
- We call the number of registers required by a node its weight.
- The weight of a subtree can be determined simply by a depth-first prescan (sketched below).
- If the left subtree is heavier, we compile it first;
- the same applies, vice versa, to the right subtree if that is heavier.
- This technique is sometimes called Sethi-Ullman numbering.
- Generalization to operations with n operands (see pp. 311-314).
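- A sketch in C of the weight computation for binary operators (one common formulation; the node layout is illustrative):

    #include <stddef.h>

    struct expr { struct expr *left, *right; };   /* NULL children for a leaf */

    int weight(const struct expr *e) {
        if (e->left == NULL && e->right == NULL)
            return 1;                          /* a leaf needs one register            */
        int wl = weight(e->left);
        int wr = weight(e->right);
        if (wl == wr)
            return wl + 1;                     /* equal weights: one extra register    */
        return wl > wr ? wl : wr;              /* otherwise: compile the heavier first */
    }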
(Figure: example expression trees requiring 3 and 4 registers, respectively.)
4.2.4.2 Simple code generation for a register machine
- Spilling registers
- Problem: the expression to be translated may require more registers than we have.
- Solution: one or more values have to be moved from registers to memory locations, to be retrieved later; this is the register spilling technique.
- A simple method
- Consider the tree for a very complicated expression whose top region has a weight higher than the number of registers we have.
- Detach some of the subtrees and store their values in temporary variables.
- This leaves us with a set of expressions assigned to temporary variables, for which we can generate code since we now have enough registers.
4.2.4.3 Compilation on the stack / compilation by symbolic interpretation
- Compilation by symbolic interpretation can be employed as a full code generation technique
- by extending the approximate stack representation.
- Compilation by symbolic interpretation uses the same technique but keeps the representation exact.
- It uses a register and variable descriptor, or regvar descriptor.
4.2.5 Code generation for basic blocks
- As explained previously, instruction selection, register allocation, and instruction ordering are intertwined, and
- finding the optimal rewriting of the AST with the available instruction templates is NP-complete.
- We present here three techniques that each address part of the problem.
- Basic blocks: mainly concerned with optimization, instruction selection, and instruction ordering in a limited part of the AST (4.2.5).
- Bottom-up tree rewriting: shows how a very good instruction selector can be generated automatically for very general instruction sets and cost functions, under the assumption that enough registers are available (4.2.6).
- Register allocation by graph coloring: explains a good and very general heuristic for register allocation (4.2.7).
4.2.5 Code generation for basic blocks
- The idea of the basic block is used in code generation.
- Basic block
- a part of the control graph containing no splits (jumps) or combines (labels).
- We usually consider only maximal basic blocks: basic blocks which cannot be extended by including adjacent nodes without violating the definition of a basic block.
- In imperative languages, basic blocks consist exclusively of expressions and assignments, which follow each other sequentially.
- In practice, this is also true for functional and logic languages.
4.2.5 Code generation for basic blocks
- The effect of an assignment in a basic block
- may be local to the block: the resulting value is not used anywhere else and the variable is dead at the end of the basic block, or
- it may be non-local, in which case the variable is an output variable of the basic block.
- In general, simpler means, for example the scope rules of C, suffice to determine whether a variable is local or non-local.
- If we do not have this information, we have to assume that all variables are live at basic block end.
4.2.5 Code generation for basic blocks
- We will now look at one way to generate code for a basic block.
- First, convert the AST and the control graph implied in it into a dependency graph, a dag.
- Then rewrite the dependency graph to code.
- Use the code in Fig. 4.41 as an example. We assume that
- n is local and dead at the end, and
- x and y are live at block exit.
4.2.5.1 From AST to dependency graph
- The threaded AST is not the most appropriate basis for code generation:
- control flow graphs are more restrictive than necessary;
- only the data dependencies have to be obeyed.
- It is easier to generate good code from a data dependency graph than from a control flow graph.
4.2.5.1 From AST to dependency graph
- Two main sources of data dependencies in the AST of a basic block
- data flow inside expressions, and
- data flow from values assigned to variables to the uses of these variables in further code.
- A third source of data dependencies
- concerns pointers (4.2.5.3).
- Three observations
- The order of evaluation of the operations in an expression is immaterial, as long as the data dependencies inside the expression are respected.
- If the value of a variable V is used more than once in a basic block, the order of these uses is immaterial, as long as each use comes after the assignment it depends on and before the next assignment to V.
- The order in which the assignments to variables are executed is immaterial, as long as all assignments to a specific variable V are executed in sequential, left-to-right order.
4.2.5.1 From AST to dependency graph
- The previous observations give us a simple algorithm to convert the AST of a basic block into a data dependency graph.
- Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows:
- for an assignment node, the arrow points to the destination;
- the others point from the parent nodes downward.
- Insert an arrow from each variable V used as an operand to the assignment that set its value, or to the beginning of the basic block if V was an input variable.
- Insert an arrow from each assignment to a variable V to the previous assignment to V, if present.
- Designate the nodes that describe the output values as roots of the graph.
- Remove the ;-nodes together with their arrows.
(Figures: the data dependency graph of the example basic block after each step of the algorithm — replacing the AST arcs by data dependency arrows, inserting arrows from each variable used as an operand to the assignment that set its value, inserting arrows between successive assignments to the same variable, designating the output values as roots, and removing the ;-nodes.)
4.2.5.1 From AST to dependency graph
- An assignment in the data dependency graph just passes on the value and can be short-circuited.
- Also, we can eliminate from the graph all nodes not reachable through at least one of the roots.
4.2.5.1 From AST to dependency graph
- Fig. 4.44 has the property that it specifies the semantics of the basic block precisely:
- all required nodes and data dependencies are present and no node or data dependency is superfluous.
- Two techniques help in converting the data dependency graph into efficient machine instructions:
- common sub-expression elimination, and
- the triple representation of the dependency graph.
4.2.5.1 From AST to dependency graph: Common sub-expression elimination
(Figure: code fragments with textually similar expressions — e.g. a*a + 2*a*b + b*b occurring twice, and the address computations in a[i] and b[i], i.e. *(a+4*i) and *(b+4*i) — that are not necessarily common sub-expressions.)
4.2.5.1 From AST to dependency graph: Common sub-expression elimination
- Once we have the data dependency graph, finding the common sub-expressions is simple.
- Rule: two nodes that have their operands, operator, and dependencies in common can be combined into one node.
- Detecting that two or more nodes in a graph are the same is usually implemented by storing some representation of each node in a hash table.
- If the hash value of a node depends on its operands, its operator, and its dependencies, common nodes will hash to the same value (a sketch follows).
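- A sketch in C of that hashing scheme; the node layout (operands as indices of earlier nodes, standing in for the dependencies) is illustrative:

    #include <stddef.h>

    #define TABLE_SIZE 1024

    struct dnode {                   /* a node in the data dependency graph */
        char op;
        int left, right;             /* indices of the operand nodes        */
        struct dnode *next;          /* hash-chain link                     */
    };

    static struct dnode *table[TABLE_SIZE];

    static unsigned hash(char op, int left, int right) {
        return ((unsigned)op * 31u + (unsigned)left * 17u + (unsigned)right) % TABLE_SIZE;
    }

    /* Return an existing identical node if there is one; otherwise insert n. */
    struct dnode *find_or_insert(struct dnode *n) {
        unsigned h = hash(n->op, n->left, n->right);
        for (struct dnode *p = table[h]; p != NULL; p = p->next)
            if (p->op == n->op && p->left == n->left && p->right == n->right)
                return p;            /* common sub-expression found: reuse it */
        n->next = table[h];
        table[h] = n;
        return n;
    }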
4.2.5.1 From AST to dependency graph: The triple representation of the data dependency graph
- Traditionally, data dependency graphs are implemented as arrays of triples.
- A triple is a record with three fields representing an operator with its two operands, corresponding to an operator node in the data dependency graph (sketched below).
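- A sketch in C of such a triple; the exact field layout is illustrative:

    enum operand_kind { OPD_CONST, OPD_VAR, OPD_TRIPLE };

    struct operand {
        enum operand_kind kind;
        int value;            /* constant value, variable number, or index of an earlier triple */
    };

    struct triple {
        char op;              /* the operator, e.g. '+', '*', '=' */
        struct operand opd1;
        struct operand opd2;
    };

    /* Example: x = a*a + b could be stored roughly as
       0: (*, a, a)    1: (+, triple 0, b)    2: (=, x, triple 1)   */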
4.2.5.2 From dependency graph to code
- Generating instructions from a data dependency graph is very similar to doing so from an AST:
- the nodes are rewritten by machine instruction templates and the result is linearized.
- Main difference: the former allows more leeway than the latter.
- We assume a register-memory machine is used.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- In the absence of other ordering criteria, two orderings suggest themselves
- early evaluation: code for a node is issued as soon as the code for all its operands has been issued;
- late evaluation: code for a node is issued as late as possible.
- Early evaluation ordering tends to require more registers than late evaluation ordering,
- since it creates values as soon as possible, which may be long before they are used, and the values have to be kept in registers in the meantime.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- Available ladder sequences
- An available ladder sequence starts at a root node and continues along left operands, but may continue along the right operand for commutative operators; it may stop anywhere, but must stop at a leaf.
- Code generated for a given ladder sequence
- starts at its last node, by loading a leaf variable if the sequence ends in a leaf, or an intermediate value if the sequence ends earlier;
- working backwards along the sequence, code is generated for each of the operation nodes;
- finally the resulting value is stored as indicated in the root node.
4.2.5.2 From dependency graph to code: Linearization of the data dependency graph
- Simple heuristic ordering algorithm combining the identification of ladder sequences with late evaluation:
- 1. Find an acceptable ladder sequence S that has the property that none of its nodes has more than one incoming dependency.
- 2. If any operand of a node N in S is not a leaf but another node M, associate a new pseudo-register R with M if it does not have one already;
- use R as the operand in the code generated for N and make M an additional root of the dependency graph.
- 3. Generate code for the ladder sequence S, using R1 as the ladder register.
- 4. Remove the ladder sequence S from the data dependency graph.
- 5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code.
Two available ladder sequences without multiple incoming dependencies; generate the rightmost one first:
Load_Reg  X1,R1
Add_Const 1,R1
Mult_Mem  d,R1
Store_Reg R1,y

Generate the next available ladder sequence:
Load_Reg  X1,R1
Mult_Reg  X1,R1
Add_Mem   b,R1
Add_Mem   c,R1
Store_Reg R1,x
Generate the remaining ladder sequence:
Load_Mem  a,R1
Add_Const 1,R1
Load_Reg  R1,X1
4.2.5.2 From dependency graph to code: Register allocation for the linearized code
- One thing remains to be done:
- the pseudo-registers have to be mapped onto real registers or, failing that, onto memory locations.
- A simple method
- Map the pseudo-registers onto real registers in order of appearance; when we run out of registers, we map the remaining ones onto memory locations.
- For a machine with at least two registers, R1 and R2, the resulting code is shown in Fig. 4.54.
(Figure 4.54: the resulting code; note the stupid instructions generated.)
4.2.5.2 From dependency graph to code: Register allocation for the linearized code
- Ways to deal with the stupid instructions generated:
- improve the code generation algorithm,
- do register tracking (4.2.4.3), and
- do peephole optimization (4.2.12).
4.2.5.3 Code optimization in the presence of pointers
4.2.6 BURS code generation and dynamic programming
- We consider here machines with a great variety of instructions, rather than the simple ones considered before.
4.2.6 BURS code generation and dynamic programming
- Two main problems identified
- How do we find all possible rewrites, and how do we represent them?
- Solved by a bottom-up rewriting system, BURS.
- How do we find the best/cheapest rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?
- Solved by a form of dynamic programming.
4.2.6 BURS code generation and dynamic programming
- In BURS, the code is generated in three scans over the input tree:
- an instruction-collecting scan, bottom-up, which identifies the possible instructions for each node by pattern matching
- by a post-order recursive visit;
- an instruction-selection scan, top-down, which selects at each node one instruction out of the possible instructions collected during the previous scan (this is the most interesting scan)
- by a pre-order recursive visit;
- a code-generating scan, bottom-up, which emits the instructions in the correct linearized order
- by a post-order recursive visit.
4.2.6 BURS code generation and dynamic programming
- Four variants of the instruction-selection scan
- using item sets (4.2.6.1),
- using a tree automaton (4.2.6.2),
- using dynamic programming (4.2.6.3), and
- combining the above three into an efficient bottom-up scan (4.2.6.4).
4.2.6.1 Bottom-up pattern matching
- The algorithm for bottom-up pattern matching is a tree version of the lexical analysis algorithm from Section 2.1.6.1.