Peephole Optimization - PowerPoint PPT Presentation

About This Presentation
Title:

Peephole Optimization

Slides: 23
Provided by: robert948
Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes

Title: Peephole Optimization


1
Peephole Optimization
  • Final pass over generated code
  • Examine a few consecutive instructions: 2 to 4
  • See if an obvious replacement is possible:
    store/load pairs
  • MOV eax => mema
    MOV mema => eax
  • Can eliminate the second instruction without
    needing any global knowledge of mema
  • Use algebraic identities
  • Special-case individual instructions
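
The store/load case above can be sketched as a single linear pass. This is only an illustration: the `(opcode, source, destination)` tuple format, the `REGISTERS` set, and the function name are assumptions made for the example, not part of the slides.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx"}  # assumed register names

def drop_reload_after_store(code):
    """Remove `MOV mem => reg` when it directly follows `MOV reg => mem`.

    Each instruction is a hypothetical (opcode, source, destination) tuple.
    """
    out = []
    for instr in code:
        if out:
            p_op, p_src, p_dst = out[-1]
            op, src, dst = instr
            store_then_load = (
                p_op == op == "MOV"
                and p_src in REGISTERS and p_dst not in REGISTERS
                and src == p_dst and dst == p_src
            )
            if store_then_load:
                continue  # the register already holds the value just stored
        out.append(instr)
    return out

code = [("MOV", "eax", "mema"), ("MOV", "mema", "eax"), ("ADD", "ebx", "eax")]
print(drop_reload_after_store(code))
```

The sketch assumes the load is not itself a jump target and that mema is ordinary memory (not volatile or memory-mapped); a real pass would have to check both before deleting the instruction.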

2
Algebraic identities
  • worth recognizing single instructions with a
    constant operand
  • A * 2 = A + A
  • A * 1 = A
  • A * 0 = 0
  • A / 1 = A
  • More delicate with floating-point
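
A minimal sketch of how these identities might be applied to one instruction with a constant operand. The `simplify` function and its tuple representation of expressions are hypothetical; the last line illustrates why the floating-point case needs more care.

```python
import math

def simplify(op, a, c):
    """Simplify integer `a op c` for a constant c; return the rewritten expression."""
    if op == "*" and c == 2:
        return ("+", a, a)      # A * 2 = A + A
    if op == "*" and c == 1:
        return a                # A * 1 = A
    if op == "*" and c == 0:
        return 0                # A * 0 = 0
    if op == "/" and c == 1:
        return a                # A / 1 = A
    return (op, a, c)           # no identity applies

print(simplify("*", "A", 2))
print(simplify("/", "A", 1))
# Floating point is more delicate: infinity * 0 is NaN, not 0.
print(math.isnan(math.inf * 0.0))
```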

3
Is this ever helpful?
  • Why would anyone write X * 1?
  • Why bother to correct such obvious junk code?
  • In fact one might write
  • #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;
  • Also, seemingly redundant code can be produced by
    other optimizations. This is an important effect.

4
Replace Multiply by Shift
  • A := A * 4;
  • Can be replaced by 2-bit left shift
    (signed/unsigned)
  • But must worry about overflow if the language
    does (i.e., if it checks for overflow)
  • A := A / 4;
  • If unsigned, can replace with shift right
  • But shift right arithmetic is a well-known
    problem
  • Language may allow it anyway (traditional C)

5
Addition chains for multiplication
  • If multiply is very slow (or on a machine with no
    multiply instruction like the original SPARC),
    decomposing a constant operand into sum of powers
    of two can be effective
  • x * 125 = x * 128 - x * 4 + x
  • two shifts, one subtract and one add, which may
    be faster than one multiply
  • Note similarity with efficient exponentiation
    method
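
The decomposition above, written out as a small sketch. `times_125` follows the slide's formula directly; `multiply_by_shifts` is a hypothetical generalization that adds one shifted copy per 1 bit of the constant, and its loop has the same shape as binary (square-and-multiply) exponentiation.

```python
def times_125(x):
    """x * 125 as x * 128 - x * 4 + x: two shifts, one subtract, one add."""
    return (x << 7) - (x << 2) + x

def multiply_by_shifts(x, c):
    """x * c for a non-negative constant c, adding one shifted copy per 1 bit of c."""
    total = 0
    shift = 0
    while c:
        if c & 1:
            total += x << shift
        c >>= 1
        shift += 1
    return total

print(times_125(8))
print(multiply_by_shifts(7, 125))
```

Note that 125 has six 1 bits, so the plain sum of powers of two needs more operations than the slide's version, which uses a subtraction. Python integers do not overflow, so this sketch says nothing about overflow behaviour in fixed-width registers.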

6
The Right Shift problem
  • Arithmetic Right shift
  • shift right and use sign bit to fill most
    significant bits
  • -5 = 111111...1111111011
  • SAR: 111111...1111111101
  • which is -3, not -2
  • in most languages -5/2 = -2
  • Prior to C99, implementations were allowed to
    truncate towards or away from zero if either
    operand was negative
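
The mismatch can be reproduced in a few lines, since Python's `>>` on integers behaves like an arithmetic right shift. `trunc_div` models truncating division; `div_pow2_by_shift` shows one commonly used correction (adding a bias to negative values before shifting). The correction is an addition for illustration and does not come from the slides.

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero, as C99 requires."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def div_pow2_by_shift(x, n):
    """x / 2**n, truncating toward zero, using an arithmetic right shift."""
    bias = (1 << n) - 1 if x < 0 else 0   # nudge negative values up first
    return (x + bias) >> n

print(-5 >> 1)                   # plain arithmetic shift
print(trunc_div(-5, 2))          # truncating division
print(div_pow2_by_shift(-5, 1))  # shift with bias
```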

7
Folding Jumps to Jumps
  • A jump to an unconditional jump can copy the
    target address
  • JNE lab1
    ...
    lab1: JMP lab2
  • Can be replaced by JNE lab2
  • As a result, lab1 may become dead (unreferenced)
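
A rough sketch of jump folding over a list of instructions. The `(label, opcode, operand)` tuples, the `JUMPS` set, and the function names are assumptions for the example. It follows chains of unconditional jumps and stops if it detects a cycle.

```python
JUMPS = {"JMP", "JNE", "JE"}  # assumed opcodes whose operand is a label

def fold_jumps(code):
    """Retarget every jump whose destination is an unconditional JMP."""
    # Map each label that sits on an unconditional JMP to that JMP's target.
    forward = {lab: arg for lab, op, arg in code if lab and op == "JMP"}

    def final_target(lab):
        seen = set()
        while lab in forward and lab not in seen:  # guard against jump cycles
            seen.add(lab)
            lab = forward[lab]
        return lab

    return [(lab, op, final_target(arg) if op in JUMPS else arg)
            for lab, op, arg in code]

code = [
    (None, "JNE", "lab1"),
    (None, "ADD", "x"),
    ("lab1", "JMP", "lab2"),
    ("lab2", "RET", None),
]
print(fold_jumps(code)[0])
```

After folding, nothing refers to lab1 any more, so a later pass is free to discard the label.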

8
Jump to Return
  • A jump to a return can be replaced by a return
  • JMP lab1
    ...
    lab1: RET
  • Can be replaced by RET
  • lab1 may become dead code

9
Tail Recursion Elimination
  • A subprogram is tail-recursive if the last
    computation is a call to itself
  • function last (lis : list_type) return list_type is
    begin
       if lis.next = null then return lis;
       else return last (lis.next);
       end if;
    end;
  • Recursive call can be replaced with
  • lis := lis.next;
    goto start;   -- added label

10
Advantages of tail recursion elimination
  • saves time: an assignment and jump are faster
    than a call with one parameter
  • saves stack space: converts linear stack usage to
    constant usage.
  • In languages with no loops, this may be a
    required optimization: specified in the Scheme
    standard.
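
The `last` function and its transformed version can be sketched side by side. The `Node` class is a hypothetical stand-in for the list type; the `while True` loop plays the role of the added `start` label.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def last_recursive(lis):
    if lis.next is None:
        return lis
    return last_recursive(lis.next)     # tail call

def last_iterative(lis):
    while True:                         # plays the role of the added `start` label
        if lis.next is None:
            return lis
        lis = lis.next                  # lis := lis.next; goto start

# Build a 10,000-node list iteratively.
head = None
for v in range(10_000, 0, -1):
    head = Node(v, head)
print(last_iterative(head).value)
```

CPython does not perform this elimination itself, so under its default recursion limit `last_recursive` would fail on the long list, while the loop version uses constant stack space.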

11
Tail-recursion elimination at the Instruction
Level
  • Consider the sequence on the x86
  • CALL func
    RET
  • CALL pushes return point on stack, RET in body
    of func removes it, RET in caller returns
  • Can generate instead JMP func
  • Now RET in func returns to original caller,
    because single return address on stack

12
Peephole optimization in the REALIA COBOL compiler
  • Full compiler for Standard COBOL, targeted to the
    IBM PC.
  • Now distributed by Computer Associates
  • Runs in 150K bytes, but must be able to handle
    very large programs that run on mainframes
  • No global optimization possible: multiple linear
    passes over code, no global data structures, no
    flow graph.
  • Multiple peephole optimizations, compiler
    iterates until code is stable. Each pass scans
    code backwards to minimize address recomputations

13
Typical COBOL code: control structures and
perform blocks.
  • Process-Balance.
        if Balance is negative then
            perform Send-Bill
        else
            perform Record-Credit
        end-if.
    Send-Bill.
        ...
    Record-Credit.
        ...

14
Simple Assembly: perform equivalent to call
  • Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

15
Fold jump to return statement
  • Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2        -- jump to return
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

16
Corresponding Assembly
  • Pb: cmp balance, 0
        jnl L1        -- jump to unconditional jump
        call Sb
        ret           -- folded
    L1: jmp Rc        -- will become useless
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

17
Code following a jump is unreachable
  • Pb: cmp balance, 0
        jnl Rc        -- folded
        jmp Sb
        ret           -- unreachable
    L1: jmp Rc        -- unreachable
    Sb: ...
        ret
    Rc: ...
        ret

18
Jump to following instruction is a noop
  • Pb: cmp balance, 0
        jnl Rc
        jmp Sb        -- jump to next instruction
    Sb: ...
        ret
    Rc: ...
        ret

19
Final code
  • Pb: cmp balance, 0
        jnl Rc
    Sb: ...
        ret
    Rc: ...
        ret
  • Final code as efficient as inlining.
  • All transformations are local. Each optimization
    may yield further optimization opportunities
  • Iterate till no further change
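
The whole sequence can be imitated with a toy optimizer that applies four local rules repeatedly until the code stops changing. This is a sketch under assumptions, not the REALIA implementation: the `(label, opcode, operand)` tuples, the rule functions, and the `keep` set of externally visible labels are all invented for the example.

```python
JUMPS = {"jmp", "jnl"}            # opcodes whose operand is a label
REFS = JUMPS | {"call"}           # opcodes that reference a label
NO_FALLTHROUGH = {"jmp", "ret"}   # control never continues past these

def fold_jumps(code, keep):
    at = {lab: (op, arg) for lab, op, arg in code if lab}
    out = []
    for lab, op, arg in code:
        t_op, t_arg = at.get(arg, (None, None)) if op in JUMPS else (None, None)
        if t_op == "jmp" and t_arg != arg:
            arg = t_arg                    # jump to unconditional jump
        elif op == "jmp" and t_op == "ret":
            op, arg = "ret", None          # jump to return
        out.append((lab, op, arg))
    return out

def call_ret_to_jmp(code, keep):
    out = list(code)
    for i in range(len(out) - 1):
        if out[i][1] == "call" and out[i + 1][1] == "ret":
            out[i] = (out[i][0], "jmp", out[i][2])   # CALL f; RET => JMP f
    return out

def drop_unreachable(code, keep):
    used = set(keep) | {arg for _, op, arg in code if op in REFS}
    out, live = [], True
    for lab, op, arg in code:
        if lab in used:
            live = True                    # a referenced label is reachable
        elif lab is not None:
            lab = None                     # nobody refers to this label
        if live:
            out.append((lab, op, arg))
            live = op not in NO_FALLTHROUGH
    return out

def drop_jump_to_next(code, keep):
    out = []
    for i, (lab, op, arg) in enumerate(code):
        nxt = code[i + 1][0] if i + 1 < len(code) else None
        if op == "jmp" and lab is None and arg == nxt:
            continue                       # jump to the following instruction
        out.append((lab, op, arg))
    return out

RULES = [fold_jumps, call_ret_to_jmp, drop_unreachable, drop_jump_to_next]

def optimize(code, keep):
    while True:                            # iterate till no further change
        new = code
        for rule in RULES:
            new = rule(new, keep)
        if new == code:
            return code
        code = new

program = [
    ("Pb", "cmp", "balance, 0"), (None, "jnl", "L1"),
    (None, "call", "Sb"), (None, "jmp", "L2"),
    ("L1", "call", "Rc"), ("L2", "ret", None),
    ("Sb", "...", None), (None, "ret", None),
    ("Rc", "...", None), (None, "ret", None),
]
for lab, op, arg in optimize(program, keep={"Pb", "Sb", "Rc"}):
    print(f"{(lab + ':') if lab else '':4} {op} {arg or ''}")
```

Each rule looks only at one instruction, its neighbour, or its jump target, and no flow graph is built. The sketch scans forwards rather than backwards and makes no claim about termination on arbitrary input.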

20
Arcane tricks
  • Consider typical maximum computation
  • if A >= B then
        C := A;
    else
        C := B;
    end if;
  • For simplicity assume all unsigned, and all in
    registers

21
Eliminating max jump on x86
  • Simple-minded asm code
  • CMP A, B
    JNAE L1
    MOV A => C
    JMP L2
    L1: MOV B => C
    L2:
  • One jump in either case

22
Computing max without jumps on X86
  • Architecture-specific trick: use subtract with
    borrow instruction and carry flag
  • CMP A, B        -- CF = 1 if B > A, CF = 0 if A >= B
    SBB eax, eax    -- all 1's if B > A, all 0's if A >= B
    MOV eax => C
    NOT C           -- all 0's if B > A, all 1's if A >= B
    AND B => eax    -- B if B > A, 0 if A >= B
    AND A => C      -- 0 if B > A, A if A >= B
    OR eax => C     -- B if B > A, A if A >= B
  • More instructions, but NO JUMPS
  • Supercompiler: exhaustive search of instruction
    patterns to uncover similar tricks
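
The instruction sequence can be checked by simulating it step by step. The sketch below models 32-bit unsigned registers with a mask; the function name and `MASK` constant are assumptions, and the Python comparison only stands in for the carry flag that CMP sets in hardware.

```python
MASK = 0xFFFFFFFF  # model 32-bit unsigned registers

def branchless_max(a, b):
    """Simulate the CMP/SBB/AND/OR sequence for unsigned 32-bit a and b."""
    cf = int(a < b)            # CMP A, B: carry flag is 1 when B > A
    eax = (0 - cf) & MASK      # SBB eax, eax: all 1's if CF = 1, else all 0's
    c = eax                    # MOV eax => C
    c = ~c & MASK              # NOT C
    eax = b & eax              # AND B => eax: B if B > A, else 0
    c = a & c                  # AND A => C:   A if A >= B, else 0
    c = eax | c                # OR eax => C
    return c

print(branchless_max(5, 9))
print(branchless_max(9, 5))
```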