Title: Peephole Optimization
1 Peephole Optimization
- Final pass over generated code
- Examine a few consecutive instructions (2 to 4)
- See if an obvious replacement is possible
- Store/load pairs:
    MOV eax => mema
    MOV mema => eax
- Can eliminate the second instruction without needing any global knowledge of mema
- Use algebraic identities
- Special-case individual instructions
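The store/load case can be sketched in a few lines of Python. This is only an illustration: the (opcode, source, destination) tuple form and the name drop_redundant_loads are assumptions made for the example, and it assumes no jump targets the second instruction of the pair.

```python
# Hypothetical instruction form: (opcode, source, destination) tuples.
def drop_redundant_loads(code):
    """Drop a MOV that just undoes the MOV immediately before it."""
    out = []
    for ins in code:
        if out:
            prev = out[-1]
            # prev copies X => Y and ins copies Y => X: X already holds the value.
            if (prev[0] == "MOV" and ins[0] == "MOV"
                    and prev[1] == ins[2] and prev[2] == ins[1]):
                continue
        out.append(ins)
    return out

code = [
    ("MOV", "eax", "mema"),   # store
    ("MOV", "mema", "eax"),   # reload of the value just stored
    ("ADD", "ebx", "eax"),
]
optimized = drop_redundant_loads(code)
```

The check looks only at two adjacent instructions, which is why nothing needs to be known about other uses of mema.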
2 Algebraic identities
- Worth recognizing single instructions with a constant operand
- A * 2 = A + A
- A * 1 = A
- A * 0 = 0
- A / 1 = A
- More delicate with floating-point
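A minimal sketch of these rewrites for integer operands follows. The function name simplify and the tuple encoding of expressions are hypothetical, chosen only for the example. The last lines show one reason the floating-point case is delicate: infinity times zero is NaN, not zero.

```python
import math

def simplify(op, a, const):
    """Rewrite 'a op const' using the integer identities listed above."""
    if op == "*" and const == 2:
        return ("+", a, a)      # A * 2 = A + A
    if op == "*" and const == 1:
        return a                # A * 1 = A
    if op == "*" and const == 0:
        return 0                # A * 0 = 0 (integers only)
    if op == "/" and const == 1:
        return a                # A / 1 = A
    return (op, a, const)       # no identity applies

# With floats, A * 0 = 0 fails for infinities (and NaN).
inf_times_zero = float("inf") * 0
is_nan = math.isnan(inf_times_zero)
```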
3 Is this ever helpful?
- Why would anyone write X * 1?
- Why bother to correct such obvious junk code?
- In fact one might write
    #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;
- Also, seemingly redundant code can be produced by other optimizations. This is an important effect.
4 Replace Multiply by Shift
- A := A * 4
- Can be replaced by a 2-bit left shift (signed/unsigned)
- But must worry about overflow if the language does
- A := A / 4
- If unsigned, can replace with shift right
- But shift right arithmetic is a well-known problem
- Language may allow it anyway (traditional C)
5 Addition chains for multiplication
- If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective
- X * 125 = X * 128 - X * 4 + X
- Two shifts, one subtract and one add, which may be faster than one multiply
- Note similarity with efficient exponentiation method
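One way to find such a decomposition mechanically is to write the constant with digits +1, 0 and -1 (the non-adjacent form), so that runs of 1 bits become one add and one subtract. The sketch below is an assumption about how a compiler might do this, not the method of any particular compiler; both function names are made up for the example.

```python
def signed_power_terms(c):
    """Split a positive constant into (sign, shift) terms whose signed
    powers of two sum to c."""
    terms = []
    shift = 0
    while c > 0:
        if c & 1:
            digit = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((digit, shift))
            c -= digit               # clears the low bit, may carry upward
        c >>= 1
        shift += 1
    return terms

def multiply_by_constant(x, c):
    """Compute x * c using only shifts, adds and subtracts."""
    total = 0
    for sign, shift in signed_power_terms(c):
        if sign > 0:
            total += x << shift
        else:
            total -= x << shift
    return total

terms_125 = signed_power_terms(125)   # x - (x << 2) + (x << 7)
```

For 125 this yields the three terms from the slide: plus x, minus x shifted by 2, plus x shifted by 7.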
6 The Right Shift problem
- Arithmetic right shift: shift right and use the sign bit to fill the most significant bits
- -5 = 111111...1111111011
- SAR => 111111...1111111101
- which is -3, not -2
- In most languages -5/2 = -2
- Prior to C99, implementations were allowed to truncate towards or away from zero if either operand was negative
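The mismatch is easy to reproduce in Python, whose >> on integers behaves like an arithmetic shift. The helper names below are hypothetical. The last function shows one commonly used correction, which is not described in the slides: add 2^k - 1 to a negative dividend before shifting, so that the result truncates toward zero.

```python
def trunc_div(x, d):
    """Division that truncates toward zero, as in C99 and most languages."""
    q = abs(x) // d
    return -q if x < 0 else q

def sar(x, k):
    """Arithmetic shift right: rounds toward minus infinity."""
    return x >> k

def shift_divide(x, k):
    """Divide by 2**k with a shift, biasing negative values first so the
    result agrees with truncating division."""
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k

shifted = sar(-5, 1)           # -3
divided = trunc_div(-5, 2)     # -2
corrected = shift_divide(-5, 1)
```

For non-negative values the plain shift and the division already agree, which is why the unsigned case needs no correction.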
7 Folding Jumps to Jumps
- A jump to an unconditional jump can copy the target address
          JNE lab1
          ...
    lab1: JMP lab2
- Can be replaced by JNE lab2
- As a result, lab1 may become dead (unreferenced)
8 Jump to Return
- A jump to a return can be replaced by a return
          JMP lab1
          ...
    lab1: RET
- Can be replaced by RET
- lab1 may become dead code
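Both jump foldings fit in one small function. In this sketch the (label, opcode, operand) triples and the name fold_jumps are assumptions for illustration; every opcode starting with "J" is treated as a jump, and only one level of indirection is folded per call, so longer chains need repeated passes.

```python
def fold_jumps(code):
    """code is a list of (label or None, opcode, operand or None) triples."""
    at_label = {label: (op, arg) for label, op, arg in code if label}
    out = []
    for label, op, arg in code:
        if op.startswith("J") and arg in at_label:
            target_op, target_arg = at_label[arg]
            if target_op == "JMP":
                arg = target_arg           # jump to jump: copy the target
            elif target_op == "RET" and op == "JMP":
                op, arg = "RET", None      # unconditional jump to return
        out.append((label, op, arg))
    return out

code = [
    (None, "JNE", "lab1"),
    (None, "JMP", "lab3"),
    ("lab1", "JMP", "lab2"),
    ("lab2", "NOP", None),
    ("lab3", "RET", None),
]
folded = fold_jumps(code)
```

After folding, nothing refers to lab1 or lab3 any more, which is what lets a later pass treat them as dead.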
9 Tail Recursion Elimination
- A subprogram is tail-recursive if the last computation is a call to itself
    function last (lis : list_type) return list_type is
    begin
       if lis.next = null then return lis;
       else return last (lis.next);
       end if;
    end;
- Recursive call can be replaced with
    lis := lis.next;
    goto start;   -- added label
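The same transformation can be written out by hand in Python, which does not eliminate tail calls itself. The Node class and both function names are hypothetical stand-ins for the list type and the function last above; the loop plays the role of the added label and the goto.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def last_recursive(lis):
    """Direct transcription: the recursive call is the last computation."""
    if lis.next is None:
        return lis
    return last_recursive(lis.next)

def last_iterative(lis):
    """After elimination: reassign the parameter and jump back to the top."""
    while True:                # plays the role of the 'start' label
        if lis.next is None:
            return lis
        lis = lis.next         # lis := lis.next; then loop, i.e. goto start

# Build a long list iteratively, far deeper than Python's default recursion limit.
head = Node(0)
tail = head
for i in range(1, 5000):
    tail.next = Node(i)
    tail = tail.next

last_node = last_iterative(head)
```

The iterative version uses a constant amount of stack no matter how long the list is, while the recursive one uses one frame per element.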
10 Advantages of tail recursion elimination
- Saves time: an assignment and jump is faster than a call with one parameter
- Saves stack space: converts linear stack usage to constant usage.
- In languages with no loops, this may be a required optimization: specified in Scheme standard.
11 Tail-recursion elimination at the Instruction Level
- Consider the sequence on the x86
    CALL func
    RET
- CALL pushes return point on stack, RET in body of func removes it, RET in caller returns
- Can generate instead
    JMP func
- Now RET in func returns to original caller, because single return address on stack
12 Peephole optimization in the REALIA COBOL compiler
- Full compiler for Standard COBOL, targeted to the IBM PC.
- Now distributed by Computer Associates
- Runs in 150K bytes, but must be able to handle very large programs that run on mainframes
- No global optimization possible: multiple linear passes over code, no global data structures, no flow graph.
- Multiple peephole optimizations, compiler iterates until code is stable. Each pass scans code backwards to minimize address recomputations
13 Typical COBOL code: control structures and perform blocks
    Process-Balance.
        if Balance is negative then
            perform Send-Bill
        else
            perform Record-Credit
        end-if.
    Send-Bill.
        ...
    Record-Credit.
        ...
14 Simple Assembly: perform equivalent to call
    Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret
15 Fold jump to return statement
    Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2       -- jump to return
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret
16 Corresponding Assembly
    Pb: cmp balance, 0
        jnl L1       -- jump to unconditional jump
        call Sb
        ret          -- folded
    L1: jmp Rc       -- will become useless
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret
17 Code following a jump is unreachable
    Pb: cmp balance, 0
        jnl Rc       -- folded
        jmp Sb
        ret          -- unreachable
    L1: jmp Rc       -- unreachable
    Sb: ...
        ret
    Rc: ...
        ret
18 Jump to following instruction is a noop
    Pb: cmp balance, 0
        jnl Rc
        jmp Sb       -- jump to next instruction
    Sb: ...
        ret
    Rc: ...
        ret
19 Final code
    Pb: cmp balance, 0
        jnl Rc
    Sb: ...
        ret
    Rc: ...
        ret
- Final code as efficient as inlining.
- All transformations are local. Each optimization may yield further optimization opportunities
- Iterate till no further change
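The whole Pb/Sb/Rc walk-through can be replayed with a toy optimizer. Everything here is an assumption made for illustration: the (label, opcode, operand) triples, the WORK placeholder standing in for the "..." bodies, and a forward scan rather than the backward scan REALIA used. It is a sketch of the iterate-until-stable idea, not a reconstruction of the real compiler.

```python
UNCONDITIONAL = ("JMP", "RET")

def one_pass(code):
    """One linear pass of purely local rewrites over (label, op, arg) triples."""
    at = {lab: (op, arg) for lab, op, arg in code if lab}
    refs = {arg for _, op, arg in code if op in ("JMP", "JNL", "CALL")}
    out = []
    for i, (lab, op, arg) in enumerate(code):
        nxt = code[i + 1] if i + 1 < len(code) else (None, None, None)
        # Code after an unconditional transfer is dead unless something targets it.
        if out and out[-1][1] in UNCONDITIONAL and lab not in refs:
            continue
        target_op = at.get(arg, (None, None))[0]
        if op in ("JMP", "JNL") and target_op == "JMP":
            arg = at[arg][1]              # jump to jump
        elif op == "JMP" and target_op == "RET":
            op, arg = "RET", None         # jump to return
        elif op == "CALL" and nxt[1] == "RET":
            op = "JMP"                    # call followed by return
        if op == "JMP" and nxt[0] == arg and lab not in refs:
            continue                      # jump to the following instruction
        out.append((lab, op, arg))
    return out

def optimize(code):
    """Repeat the pass until the code stops changing."""
    while True:
        new = one_pass(code)
        if new == code:
            return code
        code = new

program = [
    ("Pb", "CMP", "balance, 0"),
    (None, "JNL", "L1"),
    (None, "CALL", "Sb"),
    (None, "JMP", "L2"),
    ("L1", "CALL", "Rc"),
    ("L2", "RET", None),
    ("Sb", "WORK", None),
    (None, "RET", None),
    ("Rc", "WORK", None),
    (None, "RET", None),
]
final = optimize(program)
```

No single pass reaches the final code. Each rewrite exposes the next one, which is why the loop in optimize is needed.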
20 Arcane tricks
- Consider typical maximum computation
    if A > B then
       C := A;
    else
       C := B;
    end if;
- For simplicity assume all unsigned, and all in registers
21 Eliminating max jump on x86
- Simple-minded asm code
        CMP A, B
        JNAE L1
        MOV A => C
        JMP L2
    L1: MOV B => C
    L2:
- One jump in either case
22 Computing max without jumps on x86
- Architecture-specific trick: use subtract with borrow instruction and carry flag
    CMP A, B        -- CF = 1 if B > A, CF = 0 if A >= B
    SBB eax, eax    -- all 1's if B > A, all 0's if A >= B
    MOV eax, C
    NOT C           -- all 0's if B > A, all 1's if A >= B
    AND B => eax    -- B if B > A, 0 if A >= B
    AND A => C      -- 0 if B > A, A if A >= B
    OR eax => C     -- B if B > A, A if A >= B
- More instructions, but NO JUMPS
- Supercompiler: exhaustive search of instruction patterns to uncover similar tricks
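The mask logic can be checked outside assembly. The sketch below models 32-bit unsigned registers with Python integers; the name branchless_max and the way the borrow is extracted are assumptions for the example. The borrow stands in for the carry flag set by CMP.

```python
MASK32 = 0xFFFFFFFF

def branchless_max(a, b):
    """Unsigned 32-bit max built from masks, following the SBB sequence."""
    carry = ((a - b) >> 32) & 1    # borrow of a - b: 1 if b > a, else 0
    eax = (-carry) & MASK32        # SBB eax, eax: all 1's if b > a, else 0
    c = ~eax & MASK32              # MOV eax, C then NOT C
    eax &= b                       # b if b > a, else 0
    c &= a                         # 0 if b > a, else a
    return c | eax                 # OR merges the two halves

samples = [0, 1, 2, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
results_match = all(branchless_max(a, b) == max(a, b)
                    for a in samples for b in samples)
```

Exactly one of the two masked values is non-zero unless both inputs are zero, so the final OR selects the larger operand without any comparison-driven jump.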