Title: Code Optimisation
1Code Optimisation
- systematically reduce the disparity between
  - code produced by the compiler
  - code produced by careful hand translation
- simpler techniques
  - reorder and merge code fragments
- more advanced techniques
  - replace the source program's algorithm with a more efficient one
- heavy optimisation can double or triple compilation time (the cost of optimisation)
2Code Optimisation
- where to draw the line between useful and wasteful optimisation
  - a few straightforward techniques can produce most of the required effect at low cost
  - examples are constant folding, deferred storage, global register allocation and code motion out of loops
- sub-optimisation vs true optimisation
- sub-optimisation can be considered as
  - every technique relating to the translation of a single code fragment in isolation
  - e.g. expression re-ordering, local register allocation
3Code Optimisation
- true optimisation
  - most language definitions state that the programmer cannot rely on the order in which operations are performed
  - even a simple translator can impose any convenient order it wishes
  - takes account of the control context within which a code fragment operates
    - the effect of code fragments executed before or after it
  - e.g. constant folding, global register allocation, loop optimisation, peephole optimisation
4Code Optimisation
- an argument against code optimisation is that compilers which optimise produce invalid translations
  - sometimes unavoidable, and requires language redefinition to clarify that restrictions apply
  - frequently arises because of the optimisation process itself
    - e.g. changing the order of assignment statements may no longer produce the correct output
  - happens when a particular optimisation technique can only validly be used given certain properties of the object code
- the compiler writer must verify that each optimisation is valid
5Code Optimisation
- net effect
  - the total change produced by executing the program
  - ignores the detailed mechanism by which that change is brought about
- optimisation mechanisms
  - produce an object program with the same net effect as the source
  - may not produce it by exactly the same means
- all optimisations consider the net effect of small fragments, building up to entire procedures
6Code Optimisations
- produce a translation which preserves the net effect
  - it may not follow the precise details of the algorithm
- such a translation depends on recognising the essential order of events
  - derived from the initial order specified by the program
- essential order
  - shows the absence of order between many computations
  - e.g. events 1, 2 and 3 can occur in any order before event 4
7Code Optimisation
- the absence of essential order between many events makes optimisation possible
  - the computation can be re-organised freely
  - provided the actual order of events obeys the constraints of the essential order
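The idea of an essential order can be sketched as a set of "must come before" constraints. The following is only an illustration, assuming the example of events 1, 2 and 3 preceding event 4; the names `ESSENTIAL_ORDER` and `obeys_essential_order` are hypothetical.

```python
# Essential order written as (before, after) pairs.
# Assumed example: events 1, 2 and 3 must all happen before event 4.
ESSENTIAL_ORDER = {(1, 4), (2, 4), (3, 4)}


def obeys_essential_order(schedule, constraints=ESSENTIAL_ORDER):
    """Return True if every 'before' event occurs earlier than its 'after' event."""
    position = {event: index for index, event in enumerate(schedule)}
    return all(position[a] < position[b] for a, b in constraints)


print(obeys_essential_order([3, 1, 2, 4]))  # True: 1, 2, 3 in any order, then 4
print(obeys_essential_order([1, 4, 2, 3]))  # False: 4 runs before 2 and 3
```

Any schedule accepted by this check is a legal re-organisation in the sense of the slide; an optimiser is free to pick whichever accepted schedule is cheapest.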
8Code Optimisation
- basic block: the smallest unit of code which can be optimised
  - entered at the beginning and doesn't contain any intermediate labels or function calls
  - order of events is obvious and linear
- when control leaves the basic block
  - as long as the desired effect is achieved, the user has no way of telling whether the code was optimised
  - code can be generated in any way that minimises run-time effort
9Code Optimisation
- because the initial order of events within a block is linear
  - simple optimisation techniques can be used which note the effect of assignment statements in the symbol table descriptors of variables
- end of block: a labelled statement or a procedure call
  - control can arrive at a label in different ways, so the simple optimisations must be abandoned
  - the effects of a procedure call may also affect optimisation
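A rough sketch of how a linear instruction list might be cut into basic blocks. The conventions here are assumptions made for the illustration, not part of the slides: a label is a line ending in `:`, and any opcode starting with `jump` or `call` ends the current block. The function name `split_basic_blocks` is hypothetical.

```python
def split_basic_blocks(instructions):
    """Split a linear instruction list into basic blocks.

    Assumed conventions: a label is a line ending in ':', and an opcode
    starting with 'jump' or 'call' is the last instruction of its block.
    """
    blocks, current = [], []
    for line in instructions:
        opcode = line.split()[0]
        if line.endswith(":") and current:
            # Control can arrive at a label from elsewhere: start a new block.
            blocks.append(current)
            current = []
        current.append(line)
        if opcode.startswith(("jump", "call")):
            # Control leaves the block here.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


code = ["stoz , b", "load 1, u", "jumplt 1, e",
        "loadn 1, 192", "store 1, x",
        "e:", "load 1, x", "call f", "store 1, y"]
for block in split_basic_blocks(code):
    print(block)
```

Within each printed block the order of events is linear, so the simple symbol-table techniques apply; anything remembered about variables has to be dropped at each block boundary.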
10Code Optimisation
- with extra effort
  - can analyse the ways control is passed between basic blocks
  - discover the regions of the source program
  - reveal fragments which can be moved between blocks, increasing the range and effectiveness of optimisation
11Code Optimisation
- constant folding, deferred storage, global register allocation, redundant code elimination
  - each marks the symbol table descriptor of a variable to show
    - the effect of the evaluation of expressions
    - the execution of assignment and other statements
  - each mechanism must be abandoned at the end of a block unless control flow within regions has been fully analysed
12Code Optimisation
- a possible sub-optimisation technique
  - evaluate at compile time fragments of expressions containing only constants
  - reduces the size of the object program and increases its speed
  - it is reasonable for the user to write an expression of constants instead of just a number, to increase readability of the source
  - similar to constant folding
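Compile-time evaluation of constant-only fragments can be sketched on a small expression tree. The tuple representation `(operator, left, right)`, the variable name `days` and the function name `fold` are assumptions made for this illustration.

```python
import operator

# Operators the sketch knows how to evaluate at "compile time".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace any subtree whose operands are all constants by its value."""
    if not isinstance(node, tuple):
        return node  # leaf: an integer constant or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


# days * (60 * 60 * 24): readable in the source, a single constant in the object code
tree = ("*", "days", ("*", ("*", 60, 60), 24))
print(fold(tree))  # ('*', 'days', 86400)
```

The programmer keeps the readable `60 * 60 * 24`, and the generated code multiplies by 86400 directly.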
13Code Optimisation
- constant folding
  - the translator keeps a record, in the symbol table descriptor, of variables assigned a constant value within a basic block
  - can evaluate fragments of the program at compile time
- example
    b := 0; a := 3; if u >= b then { x := b + 64*a; y := x + 2^a }
14Code Optimisation
Initial code              After constant folding
   stoz   , b                stoz   , b      / b = 0 /
   loadn  1, 3               loadn  1, 3
   store  1, a               store  1, a     / a = 3 /
   load   1, u               load   1, u
   skipge 1, b               jumplt 1, e     / using b = 0 /
   jump   , e                loadn  1, 192   / 0 + 64*3 = 192 /
   load   1, a               store  1, x     / x = 192 /
   multn  1, 64              loadn  1, 200   / 192 + 2^3 = 200 /
   add    1, b               store  1, y     / y = 200 /
   store  1, x            e: ...             / now forget values of x and y /
   loadn  1, 2
   ashift 1, a
   add    1, x
   store  1, y
e: ...
15Code Optimisation
- initial code
  - 14 instructions
  - executes either 6 or 13 of these
- optimised code
  - 9 instructions
  - executes only 5 or 9 of these (all 9 when the jump to e is not taken)
- constant folding can be carried out on the fly
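Constant folding "on the fly" can be sketched by keeping a table of known constant values while walking the assignments of a block. This is a simplified illustration: the tuple expression trees, the `known` dictionary standing in for the symbol table descriptors, and the names `evaluate` and `fold_block` are all assumptions. It follows the straight-line path through the example, reading it as `x := b + 64*a` and `y := x + 2^a`.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "^": operator.pow}


def evaluate(expr, known):
    """Fold expr using the constants in `known`; return an int or a residual tree."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return known.get(expr, expr)  # substitute a variable whose value is known
    op, left, right = expr
    left, right = evaluate(left, known), evaluate(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


def fold_block(assignments):
    """Fold a list of (target, expression) assignments from one basic block."""
    known, folded = {}, []
    for target, expr in assignments:
        value = evaluate(expr, known)
        if isinstance(value, int):
            known[target] = value      # record the constant in the "descriptor"
        else:
            known.pop(target, None)    # the value is no longer a known constant
        folded.append((target, value))
    return folded, known


block = [("b", 0), ("a", 3),
         ("x", ("+", "b", ("*", 64, "a"))),
         ("y", ("+", "x", ("^", 2, "a")))]
folded, known = fold_block(block)
print(folded)  # [('b', 0), ('a', 3), ('x', 192), ('y', 200)]
```

The `known` table must be thrown away at the end of the block, which corresponds to the "now forget values of x and y" comment at label e.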
16Code Optimisation
- deferred storage
  - holds values in registers rather than in the store
  - reduces the number of store accesses
    - LOADr rx, ry requires a single store access (to fetch the instruction)
    - LOAD rx, y requires an extra access to read y
  - the RHS of an assignment statement which assigns a value to a variable is evaluated normally
    - but the STORE instruction is not generated until as late as possible
  - may make it possible to reduce store accesses and eliminate some instructions
17Code Optimisation
- deferred storage
  - marks the symbol table entry of the variable as well
    - to show that its value is currently held in a register rather than in its allocated store location
  - makes use of global allocation of values to registers
- global register allocation
  - the difficulty in implementing a register allocation mechanism is deallocation
    - there are more candidate values than available registers
18Code Optimisation
- no easy solution to the deallocation problem
  - dumping should be determined by the future history of the code
  - may not matter on a multiple-register machine, as values can be loaded with one instruction
- in general, when global register allocation is used
  - the descriptor for every variable whose value is in a register must state the fact
  - the translator must keep a table of registers in use
    - to be used when deallocating registers
    - and for generating all deferred STORE instructions
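The bookkeeping for deferred storage might look like the sketch below. It is deliberately minimal: the class `DeferredStores`, its methods, and the emitted `store r, v` text are assumptions made for the illustration, and register deallocation is not modelled.

```python
class DeferredStores:
    """Remember which variables currently live only in a register."""

    def __init__(self):
        # variable name -> register number (plays the role of the table of registers in use)
        self.in_register = {}
        self.code = []

    def assign(self, variable, register):
        """Note that `register` holds the new value of `variable`; emit no STORE yet."""
        self.in_register[variable] = register

    def flush(self):
        """Emit every deferred STORE, for example at the end of a basic block."""
        for variable, register in self.in_register.items():
            self.code.append(f"store {register}, {variable}")
        self.in_register.clear()


gen = DeferredStores()
gen.assign("x", 1)  # x := ... evaluated into register 1
gen.assign("x", 2)  # x assigned again: the first value never reaches the store
gen.assign("y", 3)
gen.flush()
print(gen.code)  # ['store 2, x', 'store 3, y']
```

Three assignments produce only two STORE instructions, because the first value of `x` is overwritten before any store is generated.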
19Code Optimisation
- Redundant code elimination
  - when an expression tree contains two identical nodes
    - it may be worthwhile to evaluate the node once and use the value twice during expression evaluation
  - the syntax analyser (SA) keeps a table of nodes and looks up each node in it as the tree is created
    - so it can recognise common nodes
    - each node now contains a count recording the number of references to that node
  - the same technique can be used to recognise nodes which recur in the program as a whole
20Code Optimisation
- can then evaluate once and eliminate the redundant evaluation code
- must be careful when an expression contains function calls or accesses data structures
  - without knowing the side effects of either, the code should not be considered suitable for single evaluation
- very effective when combined with global register allocation
  - particularly for avoiding recalculation of the addresses of array elements, which can be an expensive operation
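A node table with reference counts can be sketched as follows. The tuple representation of expression trees and the name `build_node_table` are assumptions for this illustration, and the sketch assumes the expressions contain no function calls or other side effects.

```python
def build_node_table(expr, table=None):
    """Enter every operator node of `expr` in a table, counting references to it."""
    if table is None:
        table = {}
    if isinstance(expr, tuple):
        op, left, right = expr
        build_node_table(left, table)
        build_node_table(right, table)
        # Identical subtrees are equal tuples, so they share one table entry.
        table[expr] = table.get(expr, 0) + 1
    return table


# (a + b) * (a + b)
tree = ("*", ("+", "a", "b"), ("+", "a", "b"))
table = build_node_table(tree)
common = [node for node, count in table.items() if count > 1]
print(common)  # [('+', 'a', 'b')]
```

Nodes with a count above one are the candidates for evaluating once and reusing the value, ideally held in a register.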
21Code Optimisation
- Peephole Optimisation
  - after all optimisations, the object program is a linear sequence of instructions
  - attempt to detect obvious local inefficiencies
  - a last resort after all other methods
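One possible peephole rule, shown only as a sketch: a `load r, v` that immediately follows `store r, v` is redundant, since the value is already in the register. The rule, the instruction spelling and the name `peephole` are assumptions for the illustration, and the rule is only safe if the load is not the target of a jump.

```python
def peephole(code):
    """Drop a 'load r, v' that directly follows 'store r, v'."""
    result = []
    for instruction in code:
        if result and instruction.startswith("load "):
            previous = result[-1]
            same_operands = previous[len("store "):] == instruction[len("load "):]
            if previous.startswith("store ") and same_operands:
                continue  # redundant reload: register already holds the value
        result.append(instruction)
    return result


print(peephole(["store 1, x", "load 1, x", "add 1, y"]))
# ['store 1, x', 'add 1, y']
```

A real peephole pass would hold a table of such patterns and slide a small window over the whole instruction sequence.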
- Loops and Code Motion
  - loops are easily identified as the parts of the code executed many times
  - optimising a loop can therefore save a lot of time
  - optimising a loop within a loop has the same effect
  - optimisation of nested loops always starts at the inner loop
22Code Optimisation
- first step: loop-invariant code
  - remove from the loop code which calculates quantities that don't alter during execution of the loop
  - detection of loop invariance depends on the essential order
    - a fragment is invariant if it has no predecessors in the essential order and would have no predecessors if placed at the end of the loop
- Strength reduction
  - often profitable to replace strong operators with weaker ones which are cheaper
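The loop-invariance test on this slide can be approximated very crudely: a statement is a candidate for moving out of the loop if none of its operands is assigned anywhere in the loop body. The sketch below is a simplification of the essential-order test, assuming each target is assigned once and ignoring chains of invariants; the `(target, operands)` representation and the name `loop_invariants` are hypothetical.

```python
def loop_invariants(body):
    """Return targets of statements whose operands are never assigned in the loop.

    `body` is a list of (target, operands) pairs describing one loop body.
    """
    assigned = {target for target, _ in body}
    return [target for target, operands in body
            if not any(operand in assigned for operand in operands)]


# t := a * b;  s := s + t;  i := i + 1
body = [("t", ["a", "b"]), ("s", ["s", "t"]), ("i", ["i"])]
print(loop_invariants(body))  # ['t']
```

Here `t` depends only on values set outside the loop, so its calculation could be done once before the loop starts.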
23Code Optimisation
- e.g. x*2 -> x+x, x^2 -> x*x
- can reduce the number of register accesses
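Strength reduction can be sketched as a rewrite over expression trees. The two rules used here, multiplication by two becoming an addition and squaring becoming a multiplication, are assumed examples; the tuple representation and the name `reduce_strength` are hypothetical. Duplicating the operand is only valid when it has no side effects.

```python
def reduce_strength(node):
    """Rewrite x*2 as x+x and x^2 as x*x in a tuple-based expression tree."""
    if not isinstance(node, tuple):
        return node  # leaf: constant or variable name
    op, left, right = node
    left, right = reduce_strength(left), reduce_strength(right)
    if op == "*" and right == 2:
        return ("+", left, left)   # multiply by 2 -> add
    if op == "^" and right == 2:
        return ("*", left, left)   # square -> multiply
    return (op, left, right)


print(reduce_strength(("^", ("*", "x", 2), 2)))
# ('*', ('+', 'x', 'x'), ('+', 'x', 'x'))
```

The repeated `x + x` subtree in the result is exactly the kind of common node that redundant code elimination would then evaluate only once.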
- hardware design and optimisation
  - some advances make optimisation more or less necessary
  - very small microprocessors may require very compact code
    - it may be better to use an interpreter working on a very compact representation of the source
24Code Optimisation
- very large machines require that optimisation takes account of the peculiarities of the machine
  - e.g. the CRAY-1, a vector processing machine, requires the compiler to generate vector processing instructions
- some machines have special instructions which reduce the need for optimisation
  - instructions to calculate the addresses of array elements
  - hardware that takes over global register allocation by including an associatively addressed naming store or cache memory
25Code Optimisation
- optimisation increases the efficiency of the object program
- it also increases the cost of development
  - by increasing compilation time
  - by imposing restrictions on the behaviour of object programs which can't be checked by the compiler
- no compiler could check all conceivable optimisations
- use simple techniques to achieve powerful optimisation at a fraction of the cost
26Code Optimisation
- discovers the essential order of computation within and between basic blocks
- reorders and merges code fragments within the blocks
- moves code between the blocks
- can increase the number of computations done at compile time
- can reduce the number of store accesses required
- increase the speed of execution by moving code