Scalar Optimizations - PowerPoint PPT Presentation
1
Scalar Optimizations
  • CS640
  • Lecture 6

2
Roadmap
  • Last two lectures
  • Iterative data flow analysis
  • SSA
  • Today: selected optimizations using these
    techniques
  • Constant propagation
  • Copy propagation
  • Code motion for loop invariants
  • Partial redundancy elimination

3
Constant Propagation
  • s: x ← C  // for some constant C
  • u: a statement that uses x
  • If statement s is the only definition of x
    reaching statement u, we can replace x with
    the constant C
  • Saves a register
  • Enables constant folding and dead code elimination
  • Can potentially remove conditional branches
  • What if more than one definition reaches u?
  • Data-flow analysis across basic blocks
  • Replacement is iterative
  • One replacement may trigger more opportunities

4
Using Dataflow Equations
  • ConstIn(b): pairs of ⟨variable, value⟩ that the
    compiler can prove to hold on entry to block b
  • One ⟨variable, value⟩ pair for each variable
  • ⟨x,C⟩ ∈ ConstIn(b): variable x is guaranteed to
    have the constant value C on entry to block b
  • ⟨x,NAC⟩: x is guaranteed not to be a constant
  • ⟨x,UNDEF⟩: we know nothing assertive about x
  • ConstIn(b) = ∧ ConstOut(j) for block j ∈ Pred(b)
  • Meet operation for the pairs
  • ⟨x,c⟩ ∧ ⟨x,c⟩ = ⟨x,c⟩
  • ⟨x,c1⟩ ∧ ⟨x,c2⟩ = ⟨x,NAC⟩  (c1 ≠ c2)
  • ⟨x,c⟩ ∧ ⟨x,NAC⟩ = ⟨x,NAC⟩
  • ⟨x,c⟩ ∧ ⟨x,UNDEF⟩ = ⟨x,c⟩
  • ⟨x,UNDEF⟩ ∧ ⟨x,NAC⟩ = ⟨x,NAC⟩
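The meet rules above can be sketched in Python. This is an illustrative sketch, not code from the lecture; the names `meet` and `meet_maps` are hypothetical.

```python
# Constant-propagation lattice meet. UNDEF and NAC are sentinel strings;
# any other value is a concrete constant.
UNDEF = "UNDEF"   # no information yet (top)
NAC = "NAC"       # not a constant (bottom)

def meet(v1, v2):
    """Combine two lattice values for the same variable at a join point."""
    if v1 == UNDEF:
        return v2
    if v2 == UNDEF:
        return v1
    if v1 == NAC or v2 == NAC:
        return NAC
    return v1 if v1 == v2 else NAC

def meet_maps(m1, m2):
    """ConstIn(b): pairwise meet of two predecessors' ConstOut maps."""
    return {x: meet(m1.get(x, UNDEF), m2.get(x, UNDEF))
            for x in set(m1) | set(m2)}
```

With more than two predecessors, `meet_maps` is simply folded over all of them.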

5
Using Dataflow Equations
  • ConstOut(b): pairs of ⟨variable, value⟩ on exit
    from block b
  • Initialized to ConstIn(b) and modified by
    processing each statement s in block b in order
  • s is a simple copy x ← y: the value of y decides x
  • s is a computation x ← y op z: the values of y and
    z decide x
  • ⟨x, c1 op c2⟩ ∈ ConstOut if ⟨y,c1⟩ and
    ⟨z,c2⟩ ∈ ConstOut
  • ⟨x,NAC⟩ ∈ ConstOut if either ⟨y,NAC⟩ or
    ⟨z,NAC⟩ ∈ ConstOut
  • ⟨x,UNDEF⟩ ∈ ConstOut otherwise
  • s is a function call or assignment via a pointer:
    ⟨x,NAC⟩ ∈ ConstOut
  • An optimization opportunity exists only for x s.t.
    ⟨x,C⟩ ∈ ConstIn(b) for some constant C
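These transfer rules can be sketched as a small Python function. The statement encoding and the name `transfer` are my own for illustration, not the lecture's.

```python
# Per-statement transfer step that turns ConstIn(b) into ConstOut(b).
UNDEF, NAC = "UNDEF", "NAC"

def transfer(stmt, env):
    """stmt is ('copy', x, y), ('op', x, op, y, z), or ('call', x)."""
    env = dict(env)
    kind = stmt[0]
    if kind == "copy":                       # x <- y: x takes y's value
        _, x, y = stmt
        env[x] = env.get(y, UNDEF)
    elif kind == "op":                       # x <- y op z
        _, x, op, y, z = stmt
        vy, vz = env.get(y, UNDEF), env.get(z, UNDEF)
        if vy == NAC or vz == NAC:
            env[x] = NAC
        elif vy == UNDEF or vz == UNDEF:
            env[x] = UNDEF
        else:
            env[x] = op(vy, vz)              # fold the constant
    else:                                    # call / store through a pointer
        _, x = stmt
        env[x] = NAC
    return env
```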

6
Example
[CFG figure: entry with ⟨X,UNDEF⟩, ⟨Y,UNDEF⟩ splits into one branch
assigning X ← 2; Y ← 3 (giving ⟨X,2⟩, ⟨Y,3⟩) and another assigning
X ← 3; Y ← 2 (giving ⟨X,3⟩, ⟨Y,2⟩). The merge yields ⟨X,NAC⟩, ⟨Y,NAC⟩,
and Z ← X + Y then gives ⟨X,NAC⟩, ⟨Y,NAC⟩, ⟨Z,NAC⟩.]
7
Constant Propagation w/ SSA
  • For statements xi ← C, for some constant C,
    replace all uses of xi with C
  • For xi ← φ(C,C,...,C), all arguments the same
    constant C, replace the statement with xi ← C
  • Iterate
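A minimal sketch of this SSA iteration in Python, assuming a toy encoding of definitions; the encoding and the function name `ssa_const_prop` are hypothetical.

```python
# A definition is ('const', C), ('phi', [operand names]), or ('other',).
def ssa_const_prop(defs):
    """defs: map from SSA name to its defining statement.
    Returns the names proven constant, with their values."""
    const = {}
    changed = True
    while changed:                 # iterate: each replacement may enable more
        changed = False
        for x, d in defs.items():
            if x in const:
                continue
            if d[0] == "const":
                const[x] = d[1]
                changed = True
            elif d[0] == "phi":
                vals = [const.get(a) for a in d[1]]
                # phi(C, C, ..., C) collapses to the constant C
                if all(v is not None for v in vals) and len(set(vals)) == 1:
                    const[x] = vals[0]
                    changed = True
    return const
```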

8
Example SSA
[CFG figure: a loop shown in both original and SSA form. The constants
a1 ← 3 and d1 ← 2 flow into the loop header's φ-functions
d3 ← φ(d2,d1) and a3 ← φ(a2,a1); inside the loop, f1, g1, and f2 are
computed, conditional branches (T/F edges) select paths, and the bottom
block computes f3 ← φ(f2,f1) and d2 ← 2.]
9
Example SSA
[CFG figure: the same SSA program after one round of propagation; the
known constants a1 = 3 and d1 = 2 have been substituted into their uses,
including the φ-function arguments.]
10
Example SSA
This may continue for a few steps ...
11
Scalar Optimizations
  • Constant propagation
  • Copy propagation
  • Code motion for loop invariants
  • Partial redundancy elimination

12
Copy Propagation
b ← a; c ← 4b; if (c > b)
d ← b + 2
e ← a + b
  • Idea: use v for u wherever possible after the
    copy statement u ← v
  • Benefits
  • Can create dead code
  • Can enable algebraic simplifications

13
Using Dataflow Analysis
  • Finding copies in blocks can be represented by a
    dataflow analysis framework similar to the one
    for constant propagation
  • A pair ⟨u,v⟩ indicates that the value is copied
    from v to u
  • Data-flow direction?
  • Forward analysis
  • Meet operator?
  • CopyIn(b) = ∩ CopyOut(j) for every predecessor j
    of b
  • Transfer function?
  • CopyOut(b) is computed from CopyIn(b) by
    processing each operation in b
  • Similar to constant propagation
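A sketch of the meet and transfer steps for copy propagation, with a hypothetical statement encoding (`('copy', u, v)` for u ← v, `('def', x)` for any other definition of x).

```python
def transfer_copies(stmts, copy_in):
    """CopyOut: CopyIn with pairs killed by each definition and a new
    pair added for each plain copy. copy_in is a set of (u, v) pairs."""
    pairs = set(copy_in)
    for s in stmts:
        defined = s[1]
        # a redefinition of x invalidates pairs mentioning x on either side
        pairs = {(u, v) for (u, v) in pairs if defined not in (u, v)}
        if s[0] == "copy":
            pairs.add((s[1], s[2]))
    return pairs

def copy_in(preds_out):
    """Meet: intersection over all predecessors' CopyOut sets."""
    out = None
    for p in preds_out:
        out = set(p) if out is None else out & set(p)
    return out or set()
```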

14
Example 1
CopyIn(b) CopyOut(b)

[Figure: each block shown before ⟹ after rewriting, with its copy sets.
b ← a; c ← 4b; if (c > b)  ⟹  b ← a; c ← 4a; if (c > a)    CopyOut = {⟨b,a⟩}
d ← b + 2                  ⟹  d ← a + 2                     CopyIn = {⟨b,a⟩}
e ← a + b                  ⟹  e ← a + a                     CopyIn = {⟨b,a⟩}]
15
Example 2
CopyIn(b) CopyOut(b)

[CFG figure: a five-block example annotated with CopyIn/CopyOut sets.
The copies d ← c and g ← e generate the pairs ⟨d,c⟩ and ⟨g,e⟩; both
pairs flow through the middle blocks, but the last block's c ← 2 kills
⟨d,c⟩, so only ⟨g,e⟩ survives along that path.]
16
Example 2
CopyIn(b) CopyOut(b)

[CFG figure: the same example after copy propagation; wherever ⟨d,c⟩ or
⟨g,e⟩ reaches, uses of d are replaced by c and uses of g by e.]
17
Scalar Optimizations
  • Constant propagation
  • Copy propagation
  • Code motion for loop invariants
  • Partial redundancy elimination

18
Loop Invariants Code Motion
  • A loop invariant expression is a computation
    whose value does not change as long as control
    stays in the loop
  • Code motion is the optimization that finds loop
    invariants and moves them out of the loop
  • while (i < limit - 2)
  • ⟹
  • t ← limit - 2
  • while (i < t)

19
Part 1 Detecting Loop Invariants
  • Mark invariant all statements whose operands
    either are constants or have all reaching
    definitions outside the loop
  • How do we know this?
  • Iterate until there are no more invariants to
    mark
  • Iteratively mark all statements whose operands
    are constants, have all reaching definitions
    outside the loop, or have only invariant
    reaching definitions
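The iterative marking can be sketched as a fixed-point loop; the data layout (`reaching` mapping each operand to the indices of its in-loop definitions) is an assumption for illustration.

```python
def mark_invariants(loop_stmts, reaching):
    """loop_stmts: list of (target, [operands]) inside the loop.
    reaching: operand -> list of in-loop statement indices defining it
              ([] means all reaching definitions are outside the loop).
    Returns the set of statement indices marked invariant."""
    invariant = set()
    changed = True
    while changed:
        changed = False
        for i, (_, operands) in enumerate(loop_stmts):
            if i in invariant:
                continue
            def ok(op):
                # constant / defined only outside / single invariant def
                defs = reaching.get(op, [])
                return defs == [] or (len(defs) == 1 and defs[0] in invariant)
            if all(ok(op) for op in operands):
                invariant.add(i)
                changed = True
    return invariant
```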

20
Loop Invariants
Source loop nest:
  do i = 1, 100
    k = i * (n*2)
    do j = i, 100
      a[i,j] = 100*n + 10*k + j
    end
  end

[CFG figure: the nest lowered to three-address code. Outer body:
t1 ← n * 2; k ← i * t1; j ← i. Inner body: t2 ← 100 * n; t3 ← 10 * k;
t4 ← t2 + t3; t5 ← t4 + j; j ← j + 1, guarded by j < 100. Outer latch:
i ← i + 1, guarded by i < 100.]
21
Loop Invariants SSA
Same loop nest in SSA form:

[CFG figure: i1 ← 1; outer header i2 ← φ(i1, i3), test i2 < 100; outer
body t10 ← n1 * 2; k1 ← i2 * t10; j1 ← i2; inner header j2 ← φ(j1, j3),
test j2 < 100; inner body t20 ← 100 * n1; t30 ← 10 * k1;
t40 ← t20 + t30; t50 ← t40 + j2; j3 ← j2 + 1; outer latch i3 ← i2 + 1.]
22
Part 2 Code Motion
  • An invariant statement s: x ← y op z can
    sometimes be moved out of the loop
  • The code can be moved to just before the loop
    header
  • It will dominate the whole loop after code motion
  • Three conditions (following slides)

[Figure: before/after diagram; the invariant statement moves from the
loop body into a block placed just before the loop header.]
23
Code Motion
  • Condition 1: To move invariant t ← x op y, either
    the block containing this invariant must
    dominate all loop exits, or t must not be
    live out at any loop exit

[Figure: x ← 1; a loop in which one path assigns x ← 2 (the invariant)
and another updates u ← u + 1, followed by v ← v + 1 and the test
v < 20; after the loop, j ← x. Violation of Condition 1: the block with
x ← 2 does not dominate the loop exit, yet x is live out at it.]
24
Code Motion
  • Condition 2: To move invariant t ← x op y, it must
    be the only definition of t in the loop

[Figure: x ← 1; a loop containing both x ← 3 and x ← 2 (the invariant),
plus u ← u + 1; v ← v + 1; test v < 20; after the loop, j ← x.
Violation of Condition 2: x is defined twice inside the loop.]
25
Code Motion
  • Condition 3: To move invariant t ← x op y, no use
    of t in the loop may be reached by any other
    definition of t

[Figure: x ← 1; a loop containing k ← x; u ← u + 1 and then x ← 2 (the
invariant), with v ← v + 1 and the test v < 20. Violation of
Condition 3: the use k ← x is also reached by the definition x ← 1 from
outside the loop.]
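The three conditions can be checked mechanically once dominator, liveness, and reaching-definition information is available; the dictionary layout below is a hypothetical stand-in for those analyses, not the lecture's representation.

```python
def can_hoist(stmt_block, target, loop):
    """Toy check of the three safety conditions for hoisting t <- x op y.
    stmt_block: block containing the invariant; target: the variable t."""
    # Condition 1: the block dominates every exit, or t is dead at every exit
    cond1 = all(stmt_block in loop["dom"][e] for e in loop["exits"]) \
        or all(target not in loop["live_out"][e] for e in loop["exits"])
    # Condition 2: this is the only definition of t in the loop
    cond2 = loop["defs_in_loop"][target] == 1
    # Condition 3: every use of t in the loop is reached only by this def
    cond3 = all(reached == {stmt_block}
                for reached in loop["reaching_defs_of_uses"][target])
    return cond1 and cond2 and cond3
```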
26
Code Motion Example
[CFG figure: the SSA loop nest from slide 21. Assuming t1 is not live
outside the loop nest, the statement t10 ← n1 * 2 is invariant and all
three conditions are met, so it can be hoisted out of the outer loop.]
27
Code Motion Example
[CFG figure: after hoisting, t10 ← n1 * 2 sits beside i1 ← 1, ahead of
the outer-loop header. In the inner loop, t20 ← 100 * n1, t30 ← 10 * k1,
and t40 ← t20 + t30 are invariant and meet all conditions, assuming t2,
t3, and t4 are not live outside the loop nest.]
28
Code Motion Example
[CFG figure: the inner-loop invariants hoisted as well; every invariant
statement that met all conditions has moved ahead of its loop.]
29
Scalar Optimizations
  • Constant propagation
  • Copy propagation
  • Code motion for loop invariants
  • Partial redundancy elimination

30
Redundant Expressions
  • Expression E is redundant at point p if
  • On every path to p, E has been evaluated before
    reaching p and none of the constituent values of
    E has been redefined after the evaluation
  • Expression E is partially redundant at point p if
  • E is redundant along some but not all paths to p
  • To optimize: insert code to make it fully
    redundant

31
Loop Invariants
  • Loop-invariant expressions are partially
    redundant
  • Available on all loop iterations except the
    very first one
  • Code motion works by making the expression fully
    redundant

[Figure: two evaluations of a ← b + c on separate paths meeting at a
third a ← b + c make the third redundant; when only one incoming path
computes it, the third evaluation is only partially redundant.]
32
Partial Redundancy Elimination
  • Uses standard data-flow techniques to figure out
    where to move the code
  • Subsumes classical global common sub-expression
    elimination and code motion of loop invariants
  • Used by many optimizing compilers
  • Traditionally applied to lexically equivalent
    expressions
  • With SSA support, applied to values as well

33
Partial Redundancy Elimination
  • May add a block to deal with critical edges
  • Critical edge edge leading from a block with
    more than one successor to a block with more than
    one predecessor

[Figure: one predecessor computes t ← d + e; a ← t while another
computes a ← d + e, and the successor computes c ← d + e. Inserting
t ← d + e on the critical edge (after splitting it with a new block)
lets the successor use c ← t.]
34
Partial Redundancy Elimination
  • Code duplication to deal with redundancy

[Figure: block B4's successor computing c ← d + e is handled by
duplicating B4: on the path where a ← d + e; t ← a is available, the
copy of B4 uses c ← t, while the other copy keeps c ← d + e.]

Can we find a way to deal with redundancy in general?
35
Lazy Code Motion
  • Redundancy: common expressions, loop-invariant
    expressions, partially redundant expressions
  • Desirable properties
  • All redundant computations of expressions that
    can be eliminated without code duplication are
    eliminated
  • The optimized program does not perform any
    computation that is not in the original program
    execution
  • Expressions are computed at the latest possible
    time

36
Lazy Code Motion
  • Solve four data-flow problems that reveal the
    limits of code motion
  • AVAIL: available expressions
  • ANTI: anticipated expressions
  • EARLIEST: earliest placement for expressions
  • LATER: expressions that can be postponed
  • Compute INSERT and DELETE sets from the
    data-flow solutions for each basic block
  • These define how to move expressions between
    basic blocks

37
Lazy Code Motion
[CFG figure, blocks B1–B9: B1: z ← a; if (x > 3); B2: z ← x + y;
if (y < 5); a z < 7 test in the branching region B3–B6; b ← x + y in a
lower block near B7; B9: c ← x + y; then Exit.
Can we make this better?]
38
Lazy Code Motion
[CFG figure: the same CFG with the expression x + y marked at two
placement points, one below B1's branch and one just ahead of the block
computing b ← x + y. Placing the computation at these points ensures
our conditions.]
39
Local Information
  • For each block b, compute the local sets
  • DEExpr an expression is downward-exposed
    (locally generated) if it is computed in b and
    its operands are not modified after its last
    computation
  • UEExpr an expression is upward-exposed if it is
    computed in b and its operands are not modified
    before its first computation
  • NotKilled an expression is not killed if none of
    its operands is modified in b
  • f b d
  • a b c
  • d a e

DEExpr a e, b c UEExpr b d, b c
NotKilled b c
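Computing the three local sets for a block like this one can be sketched as follows; the single-operator expression encoding `(x, y)` and the function name are assumptions for illustration.

```python
def local_sets(block):
    """block: list of (target, (x, y)) meaning target <- x op y.
    Returns (DEExpr, UEExpr, NotKilled) over the block's expressions."""
    exprs = {e for (_, e) in block}
    ue, de = set(), set()
    defined = set()
    for tgt, (x, y) in block:           # forward: upward-exposed
        if x not in defined and y not in defined:
            ue.add((x, y))              # operands unmodified before this use
        defined.add(tgt)
    defined = set()
    for tgt, e in reversed(block):      # backward: downward-exposed
        x, y = e
        if x not in defined and y not in defined:
            de.add(e)                   # operands unmodified after last use
        defined.add(tgt)
    all_defined = {t for (t, _) in block}
    notkilled = {e for e in exprs
                 if e[0] not in all_defined and e[1] not in all_defined}
    return de, ue, notkilled
```

On the slide's block (f ← b + d; a ← b + c; d ← a + e) this reproduces the three sets shown above.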
40
Local Information
  • What do they imply?
  • e ∈ DEExpr(b) ⟹ evaluating e at the end of
    b produces the same result as evaluating it at
    its original position in b
  • e ∈ UEExpr(b) ⟹ evaluating e at the entry
    of b produces the same result as evaluating it at
    its original position in b
  • e ∈ NotKilled(b) ⟹ evaluating e at
    either the start or the end of b produces the same
    result as evaluating it at its original position
  • f ← b + d
  • a ← b + c
  • d ← a + e

DEExpr = {a + e, b + c}   UEExpr = {b + d, b + c}
NotKilled = {b + c}
41
Global Information
  • Availability
  • AvailIn(n0) = Ø
  • AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x), b ≠ n0
  • AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))
  • Initialize AvailIn and AvailOut to the set of all
    expressions for every block except the entry
    block n0
  • Interpreting Avail sets
  • e ∈ AvailOut(b) ⟹ evaluating e at the end of b
    produces the same value for e as its most recent
    evaluation, whether that evaluation is inside b
    or not
  • AvailOut tells the compiler how far forward e can
    move
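The Avail equations solve by a standard fixed-point iteration. A sketch over a hand-built CFG; the block encoding (`preds`, `de`, `notkilled` per block) is an assumption for illustration.

```python
def available(blocks, entry, all_exprs):
    """blocks: name -> {'preds': [...], 'de': set, 'notkilled': set}.
    Returns (AvailIn, AvailOut) maps after reaching the fixed point."""
    avail_out = {b: set(all_exprs) for b in blocks}   # optimistic init
    avail_in = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, info in blocks.items():
            if b == entry:
                new_in = set()                         # AvailIn(n0) = empty
            else:
                new_in = set(all_exprs)
                for p in info["preds"]:                # meet: intersection
                    new_in &= avail_out[p]
            new_out = info["de"] | (new_in & info["notkilled"])
            if new_in != avail_in[b] or new_out != avail_out[b]:
                avail_in[b], avail_out[b] = new_in, new_out
                changed = True
    return avail_in, avail_out
```

The backward Ant equations on the next slide have the same shape with `succ` in place of `pred` and UEExpr in place of DEExpr.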

42
Global Information
  • Anticipability
  • Expression e is anticipated at a point p if e is
    certain to be evaluated along every computation
    path leaving p before any recomputation of e's
    operands
  • AntOut(nf) = Ø
  • AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x), b ≠ nf
  • AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))
  • Initialize AntOut to the set of all expressions
    for every block except the exit block nf
  • Interpreting Ant sets
  • e ∈ AntIn(b) ⟹ evaluating e at the start of b
    produces the same value for e as evaluating it at
    its original position (later than the start of b),
    with no additional overhead
  • AntIn tells the compiler how far backward e can
    move

43
Example
[CFG figure: the running example (B1–B9) repeated for reference, before
annotation. B1: z ← a; if (x > 3); B2: z ← x + y; if (y < 5); a z < 7
test among B3–B6; b ← x + y near B7; B9: c ← x + y; then Exit.]
44
Example Avail
  • AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x)
  • AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))


[CFG figure: the running example annotated with Avail sets. x + y
becomes available at the exit of B2 (z ← x + y) and remains available
along the paths below it that leave x and y unmodified; blocks reached
only through B1's other branch have empty Avail sets.]
45
Example Ant
  • AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x)
  • AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))

[CFG figure: the running example annotated with Ant sets. x + y is
anticipated at the points from which every outgoing path evaluates
x + y (the region feeding b ← x + y and c ← x + y) before x or y is
redefined.]

46
Example Avail and Ant


Interesting spots: anticipated but not available

[CFG figure: the Avail and Ant annotations overlaid on the running
example. The points where x + y is anticipated but not available are
the candidate placement points for the computation.]
47
Example EARLIEST
  • EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i)
    ∩ ¬(NotKilled(i) ∩ AntOut(i))

[CFG figure: the running example with the EARLIEST placements of x + y
marked on the edges where it is anticipated but neither available nor
movable any further upward.]
48
Placing Expressions
  • Earliest placement
  • For an edge ⟨i,j⟩ in the CFG, an expression e is in
    EARLIEST(i,j) if and only if the computation can
    legally move to ⟨i,j⟩ and cannot move to any
    earlier edge
  • EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i)
    ∩ ¬(NotKilled(i) ∩ AntOut(i))
  • e ∈ AntIn(j): we can move e to the start of block
    j without introducing unnecessary computation
  • e ∉ AvailOut(i): no previous computation of e is
    available on exit from i; if one were, it would
    make the computation on ⟨i,j⟩ redundant
  • e ∈ Killed(i) ∪ ¬AntOut(i): we cannot move e
    further upward
  • e ∈ Killed(i): e cannot be moved to an edge ⟨x,i⟩
    with the same value
  • e ∉ AntOut(i): there is another path starting
    with an edge ⟨i,x⟩ along which e is not evaluated
    with the same value
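The EARLIEST equation is pure set algebra. A sketch for one CFG edge; it takes Killed(i) directly (i.e. the complement of NotKilled(i)), and the function name is illustrative.

```python
def earliest(ant_in_j, avail_out_i, killed_i, ant_out_i, all_exprs):
    """EARLIEST(i,j) = AntIn(j) & ~AvailOut(i) & (Killed(i) | ~AntOut(i)).
    All arguments are sets of expression names drawn from all_exprs."""
    not_avail = all_exprs - avail_out_i          # not already computed above
    blocked_above = killed_i | (all_exprs - ant_out_i)  # cannot move higher
    return ant_in_j & not_avail & blocked_above
```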

49
Placing Expressions
  • Earliest placement
  • For an edge ⟨i,j⟩ in the CFG, an expression e is in
    EARLIEST(i,j) if and only if the computation can
    legally move to ⟨i,j⟩ and cannot move to any
    earlier edge
  • EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i)
    ∩ ¬(NotKilled(i) ∩ AntOut(i))
  • EARLIEST(n0,j) = AntIn(j) ∩ ¬AvailOut(n0)
  • We can never move e above the entry point n0, so
    the last term is dropped
  • n0 must be the dummy entry point

50
Postpone Evaluations
  • We want to delay the evaluation of expressions as
    long as possible
  • Motivation: shorter live ranges save registers
  • There is a limit to this delay
  • Not past a use of the expression
  • Not so far that we end up recomputing an
    expression that was already evaluated

51
Placing Expressions
  • Later (than earliest) placement
  • An expression e is in LaterIn(k) if evaluation of
    e can be moved through the entry of k without
    losing any benefit
  • e ∈ LaterIn(k) if and only if every path that
    reaches k includes an edge ⟨p,q⟩ s.t. e ∈
    EARLIEST(p,q), and the path from q to k neither
    kills e nor uses e
  • LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j), j ≠ n0
  • LaterIn(n0) = Ø
  • LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩
    ¬UEExpr(i)), i ∈ pred(j)
  • An expression e is in LATER(i,j) if evaluation of
    e can be moved (postponed) to CFG edge ⟨i,j⟩
  • e ∈ LATER(i,j) if ⟨i,j⟩ is its earliest
    placement, or if it can be moved to the entry of i
    and there is no evaluation (use) of e in block i

52
Example LATER
  • LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j), j ≠ n0
  • LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩
    ¬UEExpr(i)), i ∈ pred(j)

[CFG figure: the running example annotated with LATER/LaterIn; the
EARLIEST placements of x + y are postponed along edges until just ahead
of the blocks that actually use x + y.]
53
Rewriting Code
  • Insert set for each CFG edge
  • The computations that LCM should insert on that
    edge
  • Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  • e ∈ Insert(i,j) means an evaluation of e should
    be added between block i and block j
  • Three possible places to add it
  • Delete set for each block
  • The computations that LCM should delete from that
    block
  • Delete(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
  • The first computation in i is redundant
54
Example INSERT DELETE
  • Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  • Delete(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0

[CFG figure: the running example annotated with the LATER sets that
feed the INSERT/DELETE computation.]
55
Rewriting Code
  • Insert set for each CFG edge
  • The computations that LCM should insert on that
    edge
  • Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  • If i has only one successor, insert the
    computations at the end of i
  • If j has only one predecessor, insert the
    computations at the entry of j
  • Otherwise, split the edge and insert the
    computations in a new block between i and j
  • Delete set for each block
  • The computations that LCM should delete from that
    block
  • Delete(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
  • The first computation in i is redundant; remove it

56
Inserting Code
  • Evaluation placement for x ∈ INSERT(i,j)
  • Three cases
  • |succs(i)| = 1 ⟹ insert at the end of i
  • |succs(i)| > 1 but |preds(j)| = 1 ⟹ insert at
    the start of j
  • |succs(i)| > 1 and |preds(j)| > 1 ⟹ create a new
    block on ⟨i,j⟩ for x
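The INSERT/DELETE sets and the three-way placement choice can be sketched directly from the equations; the function names are illustrative.

```python
def insert_set(later_ij, later_in_j):
    """Insert(i,j) = LATER(i,j) & ~LaterIn(j)"""
    return later_ij - later_in_j

def delete_set(ue_i, later_in_i):
    """Delete(i) = UEExpr(i) & ~LaterIn(i), for i != n0"""
    return ue_i - later_in_i

def placement(n_succs_i, n_preds_j):
    """Choose where the inserted evaluations go on edge (i, j)."""
    if n_succs_i == 1:
        return "end of i"
    if n_preds_j == 1:
        return "start of j"
    return "new block on (i, j)"   # split the critical edge
```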

57
Example INSERT DELETE
  • Insert(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  • Delete(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0

[CFG figure: the running example with the final INSERT point marked;
x + y is inserted on the edge entering the block just ahead of
b ← x + y.]
58
Example Rewriting
[CFG figure: the rewritten program. B2 becomes t1 ← x + y; z ← t1;
if (y < 5); the inserted block computes t1 ← x + y ahead of the block
with b ← t1; and B9's use becomes c ← t1. If we put a t1 ← x + y
assignment in B5, x + y becomes fully redundant in B9.]
59
Lazy Code Motion
  • Step 1: identify the limits of code motion
  • Available expressions
  • Anticipated expressions
  • Step 2: move expression evaluation up
  • Later evaluations may become redundant
  • Step 3: move expression evaluation down
  • Delay evaluation to minimize register lifetimes
  • Step 4: rewrite the code

60
Lazy Code Motion
  • A powerful algorithm
  • Finds different forms of redundancy in a unified
    framework
  • Subsumes loop-invariant code motion and common
    subexpression elimination
  • Data-flow analysis
  • Composes several simple data-flow analyses to
    produce a powerful result

61
Summary
  • Scalar optimizations
  • Constant propagation
  • Copy propagation
  • Code motion for loop invariants
  • Partial redundancy elimination
  • Advanced topics
  • EAC Ch10.4: combining multiple optimizations and
    choosing an order; other objectives
  • Dragon Ch9.7-9.8: region-based data-flow analysis

62
Next Lecture (after the midterm)
  • Topic: register allocation
  • References
  • Dragon Ch8.8
  • EAC Ch13