Peephole Optimization - PowerPoint PPT Presentation

About This Presentation

Slides: 23

Provided by: robert948

Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes

Title: Peephole Optimization

1
Peephole Optimization

Final pass over generated code
Examine a few consecutive instructions (2 to 4)
See if an obvious replacement is possible
Store/load pairs:
MOV eax -> mema
MOV mema -> eax
Can eliminate the second instruction without needing any global knowledge of mema
Use algebraic identities
Special-case individual instructions
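The store/load rule needs only a two-instruction window. Below is a minimal sketch, assuming a hypothetical encoding in which each instruction is a `(label, opcode, source, destination)` tuple. The source comes first, matching the `MOV eax -> mema` notation. The sketch also assumes `mema` is ordinary memory, not a volatile or memory-mapped location. `drop_reloads` is an illustrative name, not part of any real compiler.

```python
def drop_reloads(code):
    """Drop 'MOV mem -> reg' when it directly follows 'MOV reg -> mem'.

    Only a two-instruction window is examined. A reload that carries a
    label is kept, because control may jump straight to it while the
    register holds some other value.
    """
    out = []
    for label, op, src, dst in code:
        if out and label is None and op == "MOV":
            _, prev_op, prev_src, prev_dst = out[-1]
            if prev_op == "MOV" and (prev_src, prev_dst) == (dst, src):
                continue  # the register already holds the value just stored
        out.append((label, op, src, dst))
    return out


code = [
    (None, "MOV", "eax", "mema"),  # store eax into mema
    (None, "MOV", "mema", "eax"),  # reload: redundant
    (None, "ADD", "1", "eax"),
]
print(drop_reloads(code))  # [(None, 'MOV', 'eax', 'mema'), (None, 'ADD', '1', 'eax')]
```

The label check is the only "global" fact the rule uses. It is also why the rule stays safe without any knowledge of what else touches `mema`.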

2
Algebraic identities

Worth recognizing single instructions with a constant operand:
A * 2 = A + A
A * 1 = A
A * 0 = 0
A / 1 = A
More delicate with floating-point (e.g. A * 0 is NaN, not 0, when A is NaN or infinite)
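The identities can be applied as a table lookup on the operator and the constant. This is a sketch only: the tuple forms returned (`("+", a, a)`, `("copy", a)`, `("const", 0)`) are a hypothetical representation chosen for illustration, and the rules are meant for integer operands only.

```python
def simplify(op, a, k):
    """Apply the integer identities to 'a op k', where k is a constant.

    Returns a replacement as a small tuple, or None if no identity applies.
    Integer-only: for floats, a * 0 is NaN when a is NaN or infinite.
    """
    if op == "*" and k == 2:
        return ("+", a, a)   # A * 2 = A + A
    if op == "*" and k == 1:
        return ("copy", a)   # A * 1 = A
    if op == "*" and k == 0:
        return ("const", 0)  # A * 0 = 0
    if op == "/" and k == 1:
        return ("copy", a)   # A / 1 = A
    return None              # no identity applies


print(simplify("*", "A", 2))  # ('+', 'A', 'A')
print(simplify("*", "A", 3))  # None
```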

3
Is this ever helpful?

Why would anyone write X * 1?
Why bother to correct such obvious junk code?
In fact, one might write
#define MAX_TASKS 1
...
a = b * MAX_TASKS;
Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

4
Replace Multiply by Shift

A = A * 4
Can be replaced by a 2-bit left shift (signed or unsigned)
But must worry about overflow if the language requires it to be detected
A = A / 4
If unsigned, can replace with a (logical) shift right
If signed, arithmetic shift right is a well-known problem
Language may allow it anyway (traditional C)
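Here is a sketch of the replacement decision, assuming 32-bit registers modeled with a mask. `shift_for` is a hypothetical helper name. It encodes the cases above: multiplication by 2**n becomes a left shift for both signed and unsigned operands. Division becomes a logical right shift only when the operand is unsigned.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def shift_for(op, k, unsigned):
    """Return ("SHL", n) or ("SHR", n) to replace 'A op k' when k == 2**n, else None."""
    if k <= 0 or k & (k - 1):
        return None            # k is not a power of two
    n = k.bit_length() - 1
    if op == "*":
        return ("SHL", n)      # low 32 bits agree for signed and unsigned
    if op == "/" and unsigned:
        return ("SHR", n)      # logical shift: exact for unsigned operands
    return None                # signed divide: arithmetic shift rounds down


# Spot-check the equivalences on some unsigned 32-bit values.
for a in (0, 1, 7, 12345, 0x7FFFFFFF, 0xFFFFFFFF):
    assert ((a * 4) & MASK32) == ((a << 2) & MASK32)  # wraps identically
    assert a // 4 == (a >> 2)
```

The masked comparison shows that shifting wraps exactly like multiplication. That is precisely the problem when the language demands that overflow be detected: the shift alone will not report it.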

5
Addition chains for multiplication

If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into sums and differences of powers of two can be effective
X * 125 = X * 128 - X * 4 + X
Two shifts, one subtract and one add, which may be faster than one multiply
Note the similarity with the efficient (repeated-squaring) exponentiation method
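One common way to find such a decomposition is the non-adjacent form (NAF), a signed-digit binary encoding. It uses the fewest nonzero +/- power-of-two terms. It is shown here as an illustration; the slides do not say which method a particular compiler uses. The function names are hypothetical.

```python
def signed_power_terms(k):
    """Write k > 0 as a sum of +/- powers of two (non-adjacent form).

    Returns (sign, shift) pairs, lowest shift first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    terms, shift = [], 0
    while k:
        if k & 1:
            sign = 2 - (k & 3)      # +1 if k % 4 == 1, -1 if k % 4 == 3
            terms.append((sign, shift))
            k -= sign               # clears the low bit, possibly carrying upward
        k >>= 1
        shift += 1
    return terms


def multiply_by_constant(x, k):
    """Compute x * k using only shifts, adds and subtracts."""
    result = 0
    for sign, shift in signed_power_terms(k):
        term = x << shift           # a shift by 0 is free
        result = result + term if sign > 0 else result - term
    return result


print(signed_power_terms(125))      # [(1, 0), (-1, 2), (1, 7)]
print(multiply_by_constant(3, 125))  # 375
```

For 125 this yields X + (-X * 4) + X * 128, which matches the slide's two shifts, one subtract and one add. The digit-by-digit scan is the same one that drives exponentiation by repeated squaring.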

6
The Right Shift problem

Arithmetic right shift:
shift right and use the sign bit to fill the most significant bits
-5          111111...1111111011
SAR by 1    111111...1111111101
which is -3, not -2
In most languages, -5 / 2 = -2
Prior to C99, implementations were allowed to
truncate towards or away from zero if either
operand was negative
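The mismatch is easy to reproduce, because Python's `>>` on integers is already an arithmetic shift (it rounds toward minus infinity). The sketch also shows a standard fix that is not on the slides. It adds 2**n - 1 to a negative dividend before shifting, so the result rounds toward zero as C99 requires. All function names are hypothetical.

```python
def sar(x, n):
    """Arithmetic shift right: Python's >> fills with the sign bit."""
    return x >> n


def trunc_div(x, d):
    """Integer division rounding toward zero, as C99 specifies (d != 0)."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def div_pow2(x, n):
    """x / 2**n rounded toward zero, using a shift plus a bias for negative x."""
    if x < 0:
        x += (1 << n) - 1   # bias so the floor-rounding shift rounds toward zero
    return x >> n


print(format(-5 & 0xFFFFFFFF, "032b"))                  # 32-bit pattern of -5
print(sar(-5, 1), trunc_div(-5, 2), div_pow2(-5, 1))    # -3 -2 -2
```

The bias costs an extra test (or a sign-derived mask) and an add. This is why a signed divide by 4 is not simply "SAR by 2".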

7
Folding Jumps to Jumps

A jump to an unconditional jump can copy the
target address
JNE lab1 ... lab1: JMP lab2
Can be replaced by JNE lab2
As a result, lab1 may become dead (unreferenced)
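Here is a sketch of jump-to-jump folding. It assumes a hypothetical encoding where each instruction is a `(label, opcode, operand)` tuple with `label` possibly `None`, and it assumes an illustrative set of jump opcodes. The rule follows chains of unconditional jumps and stops if a chain loops back on itself.

```python
JUMPS = {"JMP", "JNE", "JE", "JNL"}  # assumption: jump opcodes of interest


def fold_jumps_to_jumps(code):
    """Retarget any jump whose target label holds an unconditional JMP."""
    at = {label: (op, arg) for label, op, arg in code if label}
    out = []
    for label, op, arg in code:
        seen = set()
        while op in JUMPS and arg in at and at[arg][0] == "JMP" and arg not in seen:
            seen.add(arg)             # guards against JMP cycles
            arg = at[arg][1]
        out.append((label, op, arg))
    return out


code = [
    (None, "JNE", "lab1"),
    (None, "ADD", "eax, 1"),
    ("lab1", "JMP", "lab2"),
    ("lab2", "RET", None),
]
print(fold_jumps_to_jumps(code)[0])  # (None, 'JNE', 'lab2')
```

The instruction at `lab1` is left in place. Whether it is now dead depends on other references and on fall-through, which is a separate, later check.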

8
Jump to Return

A jump to a return can be replaced by a return
JMP lab1 ... lab1: RET
Can be replaced by RET
lab1 may become dead code
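A matching sketch for the jump-to-return rule uses the same hypothetical `(label, opcode, operand)` encoding. Only unconditional `JMP` qualifies: x86 has no conditional return instruction, so a conditional jump to a `RET` must stay a jump.

```python
def fold_jump_to_return(code):
    """Replace 'JMP lab' by 'RET' when the instruction at lab is a RET."""
    at = {label: op for label, op, _ in code if label}
    return [
        (label, "RET", None) if op == "JMP" and at.get(arg) == "RET" else (label, op, arg)
        for label, op, arg in code
    ]


code = [
    (None, "JNE", "lab1"),   # conditional: left alone
    (None, "JMP", "lab1"),   # becomes RET
    ("lab1", "RET", None),
]
print(fold_jump_to_return(code)[1])  # (None, 'RET', None)
```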

9
Tail Recursion Elimination

A subprogram is tail-recursive if the last computation is a call to itself
function last (lis : list_type) return list_type is
begin
   if lis.next = null then
      return lis;
   else
      return last (lis.next);
   end if;
end last;
Recursive call can be replaced with
   lis := lis.next;
   goto start;   -- start is an added label at the top of the body
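The same function can be written both ways in Python. CPython does not eliminate tail calls itself, so the transformation is written out by hand. The `Node` class and `build` helper are hypothetical stand-ins for `list_type`. The loop's back edge plays the role of `goto start`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity equality: avoids recursive field comparison
class Node:
    value: int
    next: Optional["Node"] = None


def last(lis: Node) -> Node:
    """Recursive version: the call is the last computation (a tail call)."""
    if lis.next is None:
        return lis
    return last(lis.next)


def last_loop(lis: Node) -> Node:
    """The same function after tail-recursion elimination."""
    while True:                # start:
        if lis.next is None:
            return lis
        lis = lis.next         # lis := lis.next; goto start


def build(n: int) -> Node:
    """Hypothetical helper: a list holding 0, 1, ..., n - 1 (n >= 1)."""
    head = None
    for v in reversed(range(n)):
        head = Node(v, head)
    return head


print(last(build(5)).value, last_loop(build(5)).value)  # 4 4
```

The loop version runs in constant stack space. With CPython's default recursion limit of 1000, `last(build(10_000))` raises `RecursionError`, while `last_loop(build(10_000))` returns the final node.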

10
Advantages of tail recursion elimination

Saves time: an assignment and a jump are faster than a call with one parameter
Saves stack space: converts linear stack usage to constant usage
In languages with no loops, this may be a required optimization (it is specified in the Scheme standard)

11
Tail-recursion elimination at the Instruction
Level

Consider the sequence on the x86
CALL func
RET
CALL pushes the return point on the stack, the RET in the body of func removes it, and the RET in the caller then returns
Can generate instead: JMP func
Now the RET in func returns to the original caller, because there is a single return address on the stack
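The stack argument can be checked with a toy interpreter that models only CALL, JMP and RET over a return-address stack. Everything here is hypothetical: the `(label, opcode, operand)` encoding, and the rule that any other opcode counts as straight-line work whose operand gets logged. The sketch also assumes `func` takes no stack-passed arguments.

```python
def run(code, entry):
    """Execute a toy program; return the work log and the deepest stack reached."""
    where = {label: i for i, (label, _, _) in enumerate(code) if label}
    stack = ["original caller"]  # return address pushed by whoever called us
    pc, deepest, log = where[entry], len(stack), []
    while True:
        _, op, arg = code[pc]
        if op == "CALL":
            stack.append(pc + 1)  # push the return point
            deepest = max(deepest, len(stack))
            pc = where[arg]
        elif op == "JMP":
            pc = where[arg]       # nothing is pushed
        elif op == "RET":
            target = stack.pop()
            if target == "original caller":
                return log, deepest
            pc = target
        else:
            log.append(arg)       # straight-line work
            pc += 1


FUNC = [("func", "WORK", "body of func"), (None, "RET", None)]
with_call = [("g", "CALL", "func"), (None, "RET", None)] + FUNC
with_jmp = [("g", "JMP", "func")] + FUNC

print(run(with_call, "g"))  # (['body of func'], 2)
print(run(with_jmp, "g"))   # (['body of func'], 1)
```

Both versions do the same work and return to the same place. The `JMP` version never needs a second return address.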

12
Peephole optimization in the REALIA COBOL compiler

Full compiler for Standard COBOL, targeted to the IBM PC
Now distributed by Computer Associates
Runs in 150K bytes, but must be able to handle very large programs that run on mainframes
No global optimization possible: multiple linear passes over the code, no global data structures, no flow graph
Multiple peephole optimizations; the compiler iterates until the code is stable. Each pass scans the code backwards to minimize address recomputations

13
Typical COBOL code: control structures and
perform blocks.

Process-Balance.
    if Balance is negative then
        perform Send-Bill
    else
        perform Record-Credit
    end-if.
Send-Bill.
    ...
Record-Credit.
    ...

14
Simple assembly: perform is equivalent to call

Pb:     cmp balance, 0
        jnl L1
        call Sb
        jmp L2
L1:     call Rc
L2:     ret
Sb:     ...
        ret
Rc:     ...
        ret

15
Fold jump to return statement

Pb:     cmp balance, 0
        jnl L1
        call Sb
        jmp L2      -- jump to return
L1:     call Rc
L2:     ret
Sb:     ...
        ret
Rc:     ...
        ret

16
Corresponding Assembly

Pb:     cmp balance, 0
        jnl L1      -- jump to unconditional jump
        call Sb
        ret         -- folded
L1:     jmp Rc      -- will become useless
L2:     ret
Sb:     ...
        ret
Rc:     ...
        ret

17
Code following a jump is unreachable

Pb:     cmp balance, 0
        jnl Rc      -- folded
        jmp Sb
        ret         -- unreachable
L1:     jmp Rc      -- unreachable
Sb:     ...
        ret
Rc:     ...
        ret

18
Jump to following instruction is a no-op

Pb:     cmp balance, 0
        jnl Rc
        jmp Sb      -- jump to next instruction
Sb:     ...
        ret
Rc:     ...
        ret

19
Final code

Pb:     cmp balance, 0
        jnl Rc
Sb:     ...
        ret
Rc:     ...
        ret
Final code is as efficient as inlining.
All transformations are local. Each optimization may yield further optimization opportunities.
Iterate until there is no further change.
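The whole Process-Balance walk-through can be replayed as a fixpoint loop over five local rules. This is a minimal sketch, not the Realia implementation. It scans forward rather than backward and uses a hypothetical `(label, opcode, operand)` encoding. It also assumes that `Pb`, `Sb` and `Rc` are entry points (the `KEEP` set) that must never be treated as dead. Unreachable code is detected very simply: anything after a `JMP` or `RET` is dead until the next referenced label.

```python
JUMPS = {"JMP", "JNL"}   # jump opcodes used in this example
KEEP = {"Pb", "Sb", "Rc"}  # assumption: externally visible entry points


def at_label(code):
    return {lab: (op, arg) for lab, op, arg in code if lab}


def fold_jump_to_jump(code):
    at, out = at_label(code), []
    for lab, op, arg in code:
        seen = set()
        while op in JUMPS and arg in at and at[arg][0] == "JMP" and arg not in seen:
            seen.add(arg)
            arg = at[arg][1]
        out.append((lab, op, arg))
    return out


def fold_jump_to_return(code):
    at = at_label(code)
    return [(lab, "RET", None) if op == "JMP" and at.get(arg, (None, None))[0] == "RET"
            else (lab, op, arg) for lab, op, arg in code]


def call_ret_to_jmp(code):
    out = list(code)
    for i in range(len(code) - 1):
        lab, op, arg = code[i]
        if op == "CALL" and code[i + 1][1] == "RET":
            out[i] = (lab, "JMP", arg)  # the RET stays until it becomes unreachable
    return out


def drop_jump_to_next(code):
    return [ins for i, ins in enumerate(code)
            if not (ins[1] == "JMP" and ins[0] is None
                    and i + 1 < len(code) and code[i + 1][0] == ins[2])]


def drop_unreachable(code):
    live = KEEP | {arg for _, op, arg in code if op in JUMPS or op == "CALL"}
    out, dead = [], False
    for lab, op, arg in code:
        if lab is not None and lab in live:
            dead = False                # a referenced label can be reached
        if not dead:
            out.append((lab, op, arg))
            dead = op in ("JMP", "RET")  # nothing falls through these
    return out


def optimize(code):
    passes = (fold_jump_to_jump, fold_jump_to_return, call_ret_to_jmp,
              drop_jump_to_next, drop_unreachable)
    while True:                         # iterate until the code is stable
        new = code
        for p in passes:
            new = p(new)
        if new == code:
            return new
        code = new


process_balance = [
    ("Pb", "CMP", "balance, 0"), (None, "JNL", "L1"),
    (None, "CALL", "Sb"), (None, "JMP", "L2"),
    ("L1", "CALL", "Rc"), ("L2", "RET", None),
    ("Sb", "...", None), (None, "RET", None),  # Send-Bill body
    ("Rc", "...", None), (None, "RET", None),  # Record-Credit body
]
for ins in optimize(process_balance):
    print(ins)
```

Four rounds reach the slide's final code: `Pb: cmp; jnl Rc; Sb: ... ret; Rc: ... ret`. Each rule looks at one instruction and its immediate neighbor or label target, yet their combination removes every call and jump the naive translation introduced.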

20
Arcane tricks

Consider a typical maximum computation:
if A > B then C := A; else C := B; end if;
For simplicity, assume all values are unsigned and all are in registers

21
Eliminating the jump from max on x86

Simple-minded asm code:
        CMP A, B
        JNAE L1
        MOV A -> C
        JMP L2
L1:     MOV B -> C
L2:
One jump in either case

22
Computing max without jumps on x86

Architecture-specific trick: use the subtract-with-borrow instruction and the carry flag
        CMP A, B        -- CF = 1 if B > A, CF = 0 if A >= B
        SBB eax, eax    -- all 1's if B > A, all 0's if A >= B
        MOV eax -> C
        NOT C           -- all 0's if B > A, all 1's if A >= B
        AND B -> eax    -- B if B > A, 0 if A >= B
        AND A -> C      -- 0 if B > A, A if A >= B
        OR eax -> C     -- B if B > A, A if A >= B
More instructions, but NO JUMPS
Superoptimizer: exhaustive search of instruction patterns to uncover similar tricks
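The branch-free sequence can be checked instruction by instruction with 32-bit unsigned arithmetic modeled by a mask. This is a simulation sketch: each Python line mirrors one instruction above. `max_branchless` is a hypothetical name, and the model assumes unsigned operands.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit unsigned registers


def max_branchless(a, b):
    """Unsigned max(a, b) following the CMP/SBB/MOV/NOT/AND/AND/OR sequence."""
    cf = 1 if a < b else 0   # CMP A, B     : borrow (CF = 1) iff B > A
    eax = (0 - cf) & MASK    # SBB eax, eax : eax - eax - CF, all 1's or all 0's
    c = eax                  # MOV eax -> C
    c = ~c & MASK            # NOT C
    eax = b & eax            # AND B -> eax : B if B > A, else 0
    c = a & c                # AND A -> C   : 0 if B > A, else A
    c = eax | c              # OR eax -> C  : B if B > A, else A
    return c


print(max_branchless(3, 9), max_branchless(9, 3), max_branchless(7, 7))  # 9 9 7
```

The only data-dependent step is the carry flag, which is turned into a full-width mask. That mask then selects between A and B with plain logic operations.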