Control Flow Analysis/Opti II: Loop Detection, Unrolling

Provided by: scottm3

1
Control Flow Analysis/Opti II: Loop Detection, Unrolling
  • EECS 483 Lecture 17
  • University of Michigan
  • Wednesday, November 10, 2004

2
From Last Time: Natural Loops
  • Cycle suitable for optimization
  • Discuss opti later
  • 2 properties
  • Single entry point called the header
  • Header dominates all blocks in the loop
  • There must be a way to iterate the loop (i.e., at least
    1 path back to the header from within the loop),
    called a backedge
  • Backedge detection
  • Edge x → y where the target (y) dominates the
    source (x)
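Given dominator sets, backedge detection reduces to one membership test per edge. A minimal Python sketch, assuming blocks are named by strings and `dom` maps each block to the set of blocks that dominate it; the data is the CFG and dominator sets from the Backedge Example (E = Entry, X = Exit). `find_backedges` is an illustrative name, not from the lecture.

```python
def find_backedges(edges, dom):
    """Return every edge x -> y whose target y dominates its source x.

    edges: list of (x, y) pairs
    dom:   dict mapping each block to the set of blocks that dominate it
    """
    return [(x, y) for (x, y) in edges if y in dom[x]]


# Edges and dominator sets of the Backedge Example CFG (E = Entry, X = Exit).
edges = [("E", "1"), ("1", "2"), ("2", "3"), ("2", "6"), ("3", "4"), ("3", "5"),
         ("4", "3"), ("4", "5"), ("5", "3"), ("5", "6"), ("6", "2"), ("6", "X")]
dom = {"E": {"E"}, "1": {"E", "1"}, "2": {"E", "1", "2"},
       "3": {"E", "1", "2", "3"}, "4": {"E", "1", "2", "3", "4"},
       "5": {"E", "1", "2", "3", "5"}, "6": {"E", "1", "2", "6"}}
print(find_backedges(edges, dom))  # [('4', '3'), ('5', '3'), ('6', '2')]
```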

3
Backedge Example
Edge     BE? (target dominates source)
E → 1    No
1 → 2    No
2 → 3    No
2 → 6    No
3 → 4    No
3 → 5    No
4 → 3    Yes
4 → 5    No
5 → 3    Yes
5 → 6    No
6 → 2    Yes
6 → X    No
Dominator sets for the CFG (Entry, BB1-BB6, Exit):
dom(1) = {E, 1}
dom(2) = {E, 1, 2}
dom(3) = {E, 1, 2, 3}
dom(4) = {E, 1, 2, 3, 4}
dom(5) = {E, 1, 2, 3, 5}
dom(6) = {E, 1, 2, 6}
In this example, every backedge goes from a higher-numbered BB to
a lower-numbered BB; it is not always this easy!
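The dominator sets above can be computed with the classic iterative dataflow formulation: dom(entry) = {entry}, and dom(n) = {n} plus the intersection of dom(p) over all predecessors p, repeated until nothing changes. A minimal Python sketch, assuming the CFG is a dict from each block to its successor list; `dominators` is a hypothetical helper name.

```python
def dominators(succ, entry):
    """Iterative dominators: dom(n) = {n} union (intersection of dom(p) over preds p)."""
    nodes = set(succ) | {t for ts in succ.values() for t in ts}
    preds = {n: set() for n in nodes}
    for src, ts in succ.items():
        for t in ts:
            preds[t].add(src)
    # Start every non-entry block at "all blocks" and shrink to the fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = (set.intersection(*pred_doms) if pred_doms else set()) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# The Backedge Example CFG (E = Entry, X = Exit).
succ = {"E": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
        "4": ["3", "5"], "5": ["3", "6"], "6": ["2", "X"]}
doms = dominators(succ, "E")
print(sorted(doms["5"]))  # ['1', '2', '3', '5', 'E']
```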
4
Loop Detection
  • Identify all backedges using dominance info
  • Each backedge (x → y) defines a loop
  • Loop header is the backedge target (y)
  • LoopBB: basic blocks that comprise the loop
  • All predecessor blocks of x for which control can
    reach x without going through y are in the loop
  • Merge loops with the same header
  • I.e., a loop with 2 continues
  • LoopBackedge = LoopBackedge1 ∪ LoopBackedge2
  • LoopBB = LoopBB1 ∪ LoopBB2
  • Important property
  • Header dominates all LoopBB
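The LoopBB rule (every block that reaches x without passing through y) is a backward walk from x that stops at the header, and merging is a union keyed by header. A minimal Python sketch, assuming a predecessor map with string block names; the test data is the CFG from the Loop Detection Example. Function names are illustrative.

```python
def natural_loop(preds, x, y):
    """LoopBB for backedge x -> y: y plus every block reaching x without passing y."""
    body = {y, x}
    stack = [x] if x != y else []
    while stack:
        n = stack.pop()
        for p in preds.get(n, ()):
            if p not in body:  # y is already in body, so the walk stops there
                body.add(p)
                stack.append(p)
    return body


def detect_loops(preds, backedges):
    """Merge loops with the same header: header -> (LoopBB, list of backedges)."""
    loops = {}
    for x, y in backedges:
        blocks, bes = loops.setdefault(y, (set(), []))
        blocks |= natural_loop(preds, x, y)  # in-place union
        bes.append((x, y))
    return loops


preds = {"1": ["E"], "2": ["1", "6"], "3": ["2", "4", "5"], "4": ["3"],
         "5": ["3", "4"], "6": ["2", "5"], "X": ["6"]}
loops = detect_loops(preds, [("6", "2"), ("4", "3"), ("5", "3")])
for header, (blocks, bes) in sorted(loops.items()):
    print(header, sorted(blocks), bes)
# 2 ['2', '3', '4', '5', '6'] [('6', '2')]
# 3 ['3', '4', '5'] [('4', '3'), ('5', '3')]
```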

5
Loop Detection Example
  • Loop detection: 3 steps
  • Identify backedges
  • Compute LoopBB
  • Merge loops with the same header

Dominators: dom(1) = {E, 1}, dom(2) = {E, 1, 2}, dom(3) = {E, 1, 2, 3},
dom(4) = {E, 1, 2, 3, 4}, dom(5) = {E, 1, 2, 3, 5}, dom(6) = {E, 1, 2, 6}
Loop1 defined by 6 → 2: LoopBB = {2, 3, 4, 5, 6}
Loop2 defined by 4 → 3: LoopBB = {3, 4}
Loop3 defined by 5 → 3: LoopBB = {3, 4, 5}
Merge loops 2 and 3: LoopBB = {3, 4, 5}, Backedges = {4 → 3, 5 → 3}
6
Class Problem
Find the loops. What are the header(s)? What are
the backedge(s)?
[Figure: CFG with blocks Entry, BB1-BB7, Exit]
7
Important Parts of a Loop
  • Header, LoopBB
  • Backedges, BackedgeBB
  • Exitedges, ExitBB
  • For each LoopBB, examine each outgoing edge
  • If the edge is to a BB not in LoopBB, then it's an
    exit
  • Preheader (Preloop)
  • New block before the header (falls through to the
    header)
  • Whenever you invoke the loop, the preheader is executed
  • Whenever you iterate the loop, the preheader is NOT
    executed
  • All edges entering the header
  • Backedges: no change; all others: retarget to the
    preheader
  • Postheader (Postloop) - analogous
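Exit edges and preheader insertion are both simple edge rewrites. A minimal Python sketch, assuming the successor-map CFG of the Backedge Example and a hypothetical preheader block named `pre3` for the loop headed by BB3; only the exit rule and the retargeting rule come from the slide.

```python
def exit_edges(succ, loop_bb):
    """Edges from a block in LoopBB to a block outside LoopBB."""
    return [(b, s) for b in sorted(loop_bb) for s in succ.get(b, ()) if s not in loop_bb]


def insert_preheader(succ, header, backedges, pre):
    """Return a new CFG with block `pre` placed in front of `header`.

    Backedges into the header are left alone; every other edge into the
    header is retargeted to `pre`, which falls through to the header.
    """
    new_succ = {
        src: [pre if (t == header and (src, t) not in backedges) else t for t in ts]
        for src, ts in succ.items()
    }
    new_succ[pre] = [header]
    return new_succ


succ = {"E": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
        "4": ["3", "5"], "5": ["3", "6"], "6": ["2", "X"]}
inner = {"3", "4", "5"}
print(exit_edges(succ, inner))  # [('5', '6')]
cfg = insert_preheader(succ, "3", {("4", "3"), ("5", "3")}, "pre3")
print(cfg["2"], cfg["pre3"])  # ['pre3', '6'] ['3']
```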

8
ExitBB/Preheader Example
Note: the preheader for the blue loop is contained in the
yellow loop
[Figure: the example CFG before and after preheader insertion, with preheaders Pre1 and Pre2, the blue and yellow loops, and an exit BB labeled]
9
Characteristics of a Loop
  • Nesting (generally within a procedure scope)
  • Inner loop: loop with no loops contained within it
  • Outer loop: loop contained within no other loops
  • Nesting depth
  • depth(outer loop) = 1
  • depth = depth(parent or containing loop) + 1
  • Trip count (average trip count)
  • How many times (on average) does the loop iterate?
  • for (I = 0; I < 100; I++) → trip count = 100
  • Avg trip count = weight(header) /
    weight(preheader)
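Once loops with the same header are merged, two natural loops are either disjoint or one is nested inside the other, so a loop's depth is the number of loop bodies that contain its header, and an inner loop is one whose body contains no other loop's header. A minimal Python sketch under that assumption, using the two loops from the Loop Detection Example; names are illustrative.

```python
def nesting_depths(loops):
    """loops: header -> set of LoopBB (same-header loops already merged).

    Depth = number of loop bodies containing the header (an outer loop counts only itself).
    """
    return {h: sum(1 for body in loops.values() if h in body) for h in loops}


def inner_loops(loops):
    """Loops whose body contains no other loop's header."""
    return [h for h, body in loops.items()
            if not any(other != h and other in body for other in loops)]


loops = {"2": {"2", "3", "4", "5", "6"}, "3": {"3", "4", "5"}}
print(nesting_depths(loops), inner_loops(loops))  # {'2': 1, '3': 2} ['3']
```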

10
Trip Count Calculation Example
Calculate the trip counts for all the loops in
the graph.
[Figure: profile-weighted CFG with blocks Entry, BB1-BB6, Exit]
Blue loop:
  w(header) = w(BB3) = 1240 + 60 + 700 = 2000
  w(preheader) = w(BB2) = 60 (why not 100???)
  avg trip count = 2000/60 = 33.3
Yellow loop:
  w(header) = w(BB2) = 80 + 20 = 100
  w(preheader) = w(BB1) = 20
  avg trip count = 100/20 = 5
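The slide's arithmetic, written out as a quick check of avg trip count = weight(header) / weight(preheader), using the weights quoted on the slide (`avg_trip_count` is an illustrative name):

```python
def avg_trip_count(w_header, w_preheader):
    """Average trip count = weight(header) / weight(preheader)."""
    return w_header / w_preheader


blue = avg_trip_count(1240 + 60 + 700, 60)   # w(BB3) / 60, as on the slide
yellow = avg_trip_count(80 + 20, 20)         # w(BB2) / w(BB1)
print(f"blue: {blue:.1f}, yellow: {yellow:.1f}")  # blue: 33.3, yellow: 5.0
```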
11
Loop Induction Variables
  • Induction variables are variables such that every
    time they change value, they are
    incremented/decremented by some constant
  • Basic induction variable: induction variable
    whose only assignments within a loop are of the
    form j = j +/- C, where C is a constant
  • Primary induction variable: basic induction
    variable that controls the loop execution (for
    i = 0; i < 100; i++), i (the virtual register holding i)
    is the primary induction variable
  • Derived induction variable: variable that is a
    linear function of a basic induction variable
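A minimal Python sketch of the basic and derived definitions, assuming a toy three-address IR in which each instruction is a hypothetical tuple (dest, op, src1, src2) with integer constants. It only recognizes derived variables computed directly from a basic one, and the loop body is a made-up example, not the class problem.

```python
def _defs_by_reg(loop_body):
    """Group the loop's instructions by destination register."""
    defs = {}
    for dest, op, a, b in loop_body:
        defs.setdefault(dest, []).append((op, a, b))
    return defs


def basic_ivs(loop_body):
    """Registers whose every in-loop assignment is v = v + C or v = v - C."""
    return {
        v for v, ds in _defs_by_reg(loop_body).items()
        if all(op in ("+", "-") and a == v and isinstance(b, int) for op, a, b in ds)
    }


def derived_ivs(loop_body, basic):
    """Registers assigned exactly once as b * C, b + C or b - C for a basic IV b."""
    return {
        v for v, ds in _defs_by_reg(loop_body).items()
        if v not in basic and len(ds) == 1
        and ds[0][0] in ("+", "-", "*") and ds[0][1] in basic
        and isinstance(ds[0][2], int)
    }


# Hypothetical loop body: r2 = r1 * 4; r3 = load(r2); r1 = r1 + 1
body = [("r2", "*", "r1", 4), ("r3", "load", "r2", None), ("r1", "+", "r1", 1)]
basic = basic_ivs(body)
print(basic, derived_ivs(body, basic))  # {'r1'} {'r2'}
```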

12
Class Problem
Identify the basic, primary, and
derived induction variables in this loop.
      r1 = 0
      r7 = &A
Loop: r2 = r1 * 4
      r4 = r7 + 3
      r7 = r7 + 1
      r1 = load(r2)
      r3 = load(r4)
      r9 = r1 * r3
      r10 = r9 >> 4
      store (r10, r2)
      r1 = r1 + 4
      blt r1 100 Loop
13
Reducible Flow Graphs
  • A flow graph is reducible if and only if we can
    partition the edges into 2 disjoint groups, often
    called forward and back edges, with the following
    properties:
  • The forward edges form an acyclic graph in which
    every node can be reached from the Entry
  • The back edges consist only of edges whose
    destinations dominate their sources
  • More simply: take a CFG and remove all the
    backedges (x → y where y dominates x); you should
    have a connected, acyclic graph
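The "more simply" test translates directly: drop every edge whose target dominates its source, then check that a DFS from Entry finds no cycle and reaches every block. A minimal Python sketch, assuming dominator sets are supplied; the first graph is the Backedge Example CFG and the second models the bb1/bb2/bb3 graph of the irreducible example. `is_reducible` is an illustrative name.

```python
def is_reducible(succ, entry, dom):
    """Reducible iff the CFG minus its backedges is acyclic and reaches every block."""
    nodes = set(succ) | {t for ts in succ.values() for t in ts}
    # Forward edges: drop every x -> y where y dominates x.
    forward = {n: [t for t in ts if t not in dom[n]] for n, ts in succ.items()}

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the DFS path / finished
    color = {n: WHITE for n in nodes}

    def has_cycle(n):
        color[n] = GRAY
        for t in forward.get(n, ()):
            if color[t] == GRAY or (color[t] == WHITE and has_cycle(t)):
                return True
        color[n] = BLACK
        return False

    if has_cycle(entry):
        return False
    return all(c == BLACK for c in color.values())  # every block reached


# Backedge Example CFG with its dominator sets (E = Entry, X = Exit).
succ = {"E": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
        "4": ["3", "5"], "5": ["3", "6"], "6": ["2", "X"]}
dom = {"E": {"E"}, "1": {"E", "1"}, "2": {"E", "1", "2"},
       "3": {"E", "1", "2", "3"}, "4": {"E", "1", "2", "3", "4"},
       "5": {"E", "1", "2", "3", "5"}, "6": {"E", "1", "2", "6"}}
# bb1 branches to bb2 and bb3, which branch to each other.
irr_succ = {"bb1": ["bb2", "bb3"], "bb2": ["bb3"], "bb3": ["bb2"]}
irr_dom = {"bb1": {"bb1"}, "bb2": {"bb1", "bb2"}, "bb3": {"bb1", "bb3"}}
print(is_reducible(succ, "E", dom), is_reducible(irr_succ, "bb1", irr_dom))  # True False
```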

14
Irreducible Flow Graph Example
In C/C++, it's not possible to create an
irreducible flow graph without using gotos.
Cyclic graphs that are NOT natural loops
cannot be optimized by the compiler.
L1: x = x + 1
    if (x) {
L2:   y = y + 1
      if (y > 10) goto L3
    } else {
L3:   z = z + 1
      if (z > 0) goto L2
    }
[Figure: bb1 branches to bb2 and bb3, and bb2 and bb3 branch to each other. Non-reducible!]
15
Loop Unrolling
  • Most renowned control flow opti
  • Replicate the body of a loop N-1 times (giving N
    total copies)
  • Loop unrolled N times or Nx unrolled
  • Enable overlap of operations from different
    iterations
  • Increase potential for ILP (instruction level
    parallelism)
  • 3 variants
  • Unroll multiple of known trip count
  • Unroll with remainder loop
  • While loop unroll

Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
16
Loop Unroll Type 1
Counted loop: all parms known
r2 is the loop variable. Increment is 1, initial
value is 0, final value is 100, trip count is 100.
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Unrolled 2x; remove the branch from the first N-1 iterations:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Remove the r2 increments from the first N-1 iterations and update the last increment:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r1 = MEM[r2 + 1]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 2
      blt r2 100 Loop
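To check that a Type 1 unroll preserves behavior, the loop body can be abstracted as a call that receives the index of its load. A minimal Python sketch, assuming a hypothetical callback `body(i)` that stands for the four-instruction body with MEM[r2 + 0] replaced by MEM[i]; it mirrors the 2x unrolled version above.

```python
def counted_loop(body):
    """Original loop: r2 runs 0..99, bottom-tested like 'blt r2 100 Loop'."""
    r2 = 0
    while True:
        body(r2 + 0)
        r2 = r2 + 1
        if not r2 < 100:
            break


def counted_loop_unrolled_2x(body):
    """Type 1 unroll by N = 2: legal because the trip count (100) is a known multiple of 2."""
    r2 = 0
    while True:
        body(r2 + 0)  # copy 1: its increment and branch are removed
        body(r2 + 1)  # copy 2: the offset +1 stands in for the removed increment
        r2 = r2 + 2   # the last increment becomes N * increment
        if not r2 < 100:
            break


a, b = [], []
counted_loop(a.append)
counted_loop_unrolled_2x(b.append)
print(a == b, len(b))  # True 100
```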
17
Loop Unroll Type 2
Counted loop: some parms unknown
r2 is the loop variable. Increment is ?, initial
value is ?, final value is ?, trip count is ?
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + X
      blt r2 Y Loop
Compute the remainder:
tc = final - initial
tc = tc / increment
rem = tc % N
fin = rem * increment
Remainder loop followed by the unrolled loop:
RemLoop: r1 = MEM[r2 + 0]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r2 = r2 + X
         blt r2 fin RemLoop
Loop:    r1 = MEM[r2 + 0]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r1 = MEM[r2 + X]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r2 = r2 + (N * X)
         blt r2 Y Loop
The remainder loop executes the leftover iterations.
The unrolled loop is the same as in Type 1 and is guaranteed
to execute a multiple of N times.
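The same idea for Type 2: compute the remainder from the runtime parameters, run it in a separate loop, then run the 2x body. A minimal Python sketch, assuming a positive increment `x` and N = 2; it rounds the trip count up so that `final` need not be hit exactly, and offsets `fin` by `initial` so a nonzero start value also works. `body` is a hypothetical callback standing for the loop body.

```python
def loop(body, initial, final, x):
    """Original loop with parameters unknown at compile time (assumes x > 0)."""
    r2 = initial
    while r2 < final:
        body(r2)
        r2 += x


def loop_unrolled_2x(body, initial, final, x):
    """Type 2 unroll by N = 2: a remainder loop, then a loop that runs a multiple of N times."""
    n = 2
    tc = max(0, (final - initial + x - 1) // x)  # trip count, rounded up
    rem = tc % n
    fin = initial + rem * x
    r2 = initial
    while r2 < fin:      # RemLoop: top-tested, so it is skipped when rem == 0
        body(r2)
        r2 += x
    while r2 < final:    # unrolled loop: same shape as Type 1
        body(r2)
        body(r2 + x)
        r2 += n * x


a, b = [], []
loop(a.append, 3, 20, 2)
loop_unrolled_2x(b.append, 3, 20, 2)
print(a == b)  # True
```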
18
Loop Unroll Type 3
Non-counted loop: some parms unknown
(pointer chasing, loop var modified in a
strange way, etc.)
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop
Unrolled 2x:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      beq r2 0 Exit
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop
Exit:
Just duplicate the body; none of the loop
branches can be removed. Instead, they are
converted into conditional breaks.
Can apply this to any loop!
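For Type 3 the exit test cannot be removed, so each copy keeps its own check. A minimal Python sketch of the pointer-chasing loop, assuming `mem` is a dict standing in for memory, 0 terminates the list, and `body` is a hypothetical callback for the rest of the loop body.

```python
def chase(mem, r2, body):
    """Original non-counted loop: follow 'next' pointers until 0 is loaded."""
    while True:
        body(r2)
        r2 = mem[r2]      # r2 = MEM[r2 + 0]
        if r2 == 0:       # bne r2 0 Loop falls through
            break


def chase_unrolled_2x(mem, r2, body):
    """Type 3 unroll by 2: every loop branch stays, the first as a conditional break."""
    while True:
        body(r2)
        r2 = mem[r2]
        if r2 == 0:       # beq r2 0 Exit
            break
        body(r2)
        r2 = mem[r2]
        if r2 == 0:       # bne r2 0 Loop falls through to Exit
            break


mem = {5: 9, 9: 2, 2: 7, 7: 0}
a, b = [], []
chase(mem, 5, a.append)
chase_unrolled_2x(mem, 5, b.append)
print(a == b, b)  # True [5, 9, 2, 7]
```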
19
Loop Unroll Summary
  • Goals
  • Reduce number of executed branches inside loop
  • Note: Type 1/Type 2 only
  • Enable the overlapped execution of multiple
    iterations
  • Reorder instructions between iterations
  • Enable dataflow optimization across iterations
  • Type 1 is the most effective
  • All intermediate branches removed, least code
    expansion
  • Only applicable to a small fraction of loops

20
Loop Unroll Summary (2)
  • Type 2 is almost as effective
  • All intermediate branches removed
  • Remainder loop is required since trip count not
    known at compile time
  • Need to make sure we don't spend much time in the
    remainder loop
  • Type 3 can be effective
  • No branches eliminated
  • But iteration overlap still possible
  • Always applicable (most loops fall into this
    category!)
  • Use average trip count to guide unroll amount

21
Class Problem
Unroll both the outer loop and inner loop 2x.
Apply the most aggressive style of unrolling that
you can, e.g., Type 1 if possible, else Type 2,
else Type 3.
for (i = 0; i < 100; i++) {
  j = i;
  while (j < 100)
    A[j++]--;
  B[i] = 0;
}
22
Loop Peeling
  • Unravel first P iterations of a loop
  • Enable overlap of instructions from the peeled
    iterations with preheader instructions
  • Increase potential for ILP
  • Enables further optimization of main body

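Original:
Preheader
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
After peeling iteration 1:
Preheader
Iteration 1:
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      bge r2 100 Done    (more iters?)
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Done: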
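Peeling keeps the loop intact and adds a straight-line copy of iteration 1 that exits early when there are no more iterations. A minimal Python sketch mirroring the code above, assuming a hypothetical `body` callback for the loop body and the slide's bottom-tested loop.

```python
def original(body, r2):
    """Bottom-tested loop: 'blt r2 100 Loop'."""
    while True:
        body(r2)
        r2 += 1
        if not r2 < 100:
            break


def peeled(body, r2):
    # Iteration 1, peeled into straight-line code after the preheader.
    body(r2)
    r2 += 1
    if r2 >= 100:         # bge r2 100 Done ("more iters?")
        return
    while True:           # the unchanged loop runs the remaining iterations
        body(r2)
        r2 += 1
        if not r2 < 100:
            break


a, b = [], []
original(a.append, 0)
peeled(b.append, 0)
print(a == b)  # True
```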
23
Control Flow Opti for Acyclic Code
  • Rather simple transformations with these goals:
  • Reduce the number of dynamic branches
  • Make larger basic blocks
  • Reduce code size
  • Classic control flow optimizations
  • Branch to unconditional branch
  • Unconditional branch to branch
  • Branch to next basic block
  • Basic block merging
  • Branch to same target
  • Branch target expansion
  • Unreachable code elimination

24
Acyclic Control Flow Optimizations (1)
1. Branch to unconditional branch
Before:
L1: if (a < b) goto L2
. . .
L2: goto L3
After:
L1: if (a < b) goto L3
. . .
L2: goto L3      → may be deleted
2. Unconditional branch to branch
Before:
L1: goto L2
. . .
L2: if (a < b) goto L3
L4:
After:
L1: if (a < b) goto L3
    goto L4
. . .
L2: if (a < b) goto L3      → may be deleted
L4:
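Transformation 1 amounts to following chains of labels whose instruction is just a `goto`. A minimal Python sketch over a toy instruction list, assuming hypothetical tuples ('br', cond, L), ('goto', L) and ('stuff', text), each paired with its label or None; the cycle guard stops on goto loops. It does not delete `L2: goto L3`, which is only removable if nothing else still reaches it.

```python
def thread_target(target, goto_at):
    """Follow labels whose instruction is just 'goto L'; stop if the chain loops."""
    seen = set()
    while target in goto_at and target not in seen:
        seen.add(target)
        target = goto_at[target]
    return target


def branch_to_uncond_branch(code):
    """Retarget each conditional branch past any chain of unconditional gotos."""
    goto_at = {lbl: ins[1] for lbl, ins in code if lbl is not None and ins[0] == "goto"}
    out = []
    for lbl, ins in code:
        if ins[0] == "br":
            ins = ("br", ins[1], thread_target(ins[2], goto_at))
        out.append((lbl, ins))
    return out


code = [("L1", ("br", "a < b", "L2")),
        (None, ("stuff", "...")),
        ("L2", ("goto", "L3")),
        ("L3", ("stuff", "..."))]
result = branch_to_uncond_branch(code)
print(result[0])  # ('L1', ('br', 'a < b', 'L3'))
```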
25
Acyclic Control Flow Optimizations (2)
3. Branch to next basic block
Branch is unnecessary.
Before:
BB1: . . .
     L1: if (a < b) goto L2
BB2: L2: . . .
After:
BB1: . . .
     L1:
BB2: L2: . . .
4. Basic block merging
Merge BBs when there is a single edge between them.
Before:
BB1: . . .
     L1:
BB2: L2: . . .
After:
BB1: . . .
     L1:
     L2: . . .
26
Acyclic Control Flow Optimizations (3)
5. Branch to same target
Before:
. . .
L1: if (a < b) goto L2
    goto L2
After:
. . .
L1: goto L2
6. Branch target expansion
Before:
BB1: stuff1
     L1: goto L2
. . .
BB2: L2: stuff2
     . . .
After:
BB1: stuff1
     L1: stuff2
     . . .
. . .
BB2: L2: stuff2
     . . .
What about expanding a conditional branch? Almost the same.
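Transformations 3 and 5 are peephole rewrites on adjacent instructions, repeated until nothing changes (rule 5 can expose rule 3). A minimal Python sketch, assuming a toy list of (label or None, instr) pairs with hypothetical instructions ('br', cond, L), ('goto', L), ('stuff', text) and ('nop',), and assuming branch conditions have no side effects, so dropping a branch is safe.

```python
def simplify_once(code):
    out, i = [], 0
    while i < len(code):
        lbl, ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # 5. Branch to same target: 'if (c) goto L' followed by an unlabeled 'goto L'.
        if ins[0] == "br" and nxt is not None and nxt[0] is None and nxt[1] == ("goto", ins[2]):
            out.append((lbl, ("goto", ins[2])))
            i += 2
            continue
        # 3. Branch to next basic block: the target label is on the very next instruction.
        if ins[0] in ("br", "goto") and nxt is not None and nxt[0] == ins[-1]:
            if lbl is not None:
                out.append((lbl, ("nop",)))  # keep the label for other branches
            i += 1
            continue
        out.append((lbl, ins))
        i += 1
    return out


def simplify(code):
    """Apply simplify_once until the code stops changing."""
    while True:
        new = simplify_once(code)
        if new == code:
            return new
        code = new


code = [(None, ("stuff", "s")),
        ("L1", ("br", "a < b", "L2")),
        (None, ("goto", "L2")),
        ("L2", ("stuff", "t"))]
print(simplify(code))  # [(None, ('stuff', 's')), ('L1', ('nop',)), ('L2', ('stuff', 't'))]
```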
27
Unreachable Code Elimination
Algorithm
Mark procedure entry BB visited
to_visit = procedure entry BB
while (to_visit not empty)
  current = to_visit.pop()
  for (each successor block of current)
    if (successor not visited)
      Mark successor as visited
      to_visit += successor
Eliminate all unvisited blocks
[Figure: CFG with blocks entry, bb1-bb5]
Which BB(s) can be deleted?
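The worklist algorithm in Python. A minimal sketch, assuming a successor-map CFG; the graph below is a made-up example, not the one in the figure. The `not in visited` check ensures each block is pushed once, so the loop terminates even when the CFG has cycles.

```python
def eliminate_unreachable(succ, entry):
    """Keep only the blocks reachable from the procedure entry."""
    visited = {entry}
    to_visit = [entry]
    while to_visit:
        current = to_visit.pop()
        for s in succ.get(current, ()):
            if s not in visited:
                visited.add(s)
                to_visit.append(s)
    return {b: ts for b, ts in succ.items() if b in visited}


# Hypothetical CFG: nothing reachable from entry leads to bb4 or bb5.
succ = {"entry": ["bb1"], "bb1": ["bb2", "bb3"], "bb2": ["bb1"],
        "bb3": [], "bb4": ["bb5"], "bb5": ["bb3"]}
kept = eliminate_unreachable(succ, "entry")
print(sorted(set(succ) - set(kept)))  # ['bb4', 'bb5']
```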
28
Class Problem
Maximally optimize the control flow of this code.
L1: if (a < b) goto L11
L2: goto L7
L3: goto L4
L4: stuff4
L5: if (c < d) goto L15
L6: goto L2
L7: if (c < d) goto L13
L8: goto L12
L9: stuff 9
L10: if (a < c) goto L3
L11: goto L9
L12: goto L2
L13: stuff 13
L14: if (e < f) goto L11
L15: stuff 15
L16: rts
29
Profile-based Control Flow Optimization: Trace Selection
  • Trace: linear collection of basic blocks that
    tend to execute in sequence
  • Likely control flow path
  • Acyclic (outer backedge ok)
  • Side entrance: branch into the middle of a trace
  • Side exit: branch out of the middle of a trace

[Figure: profile-weighted CFG with blocks BB1-BB6 and edge counts]
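The slides define traces but not how to pick them. One simple greedy heuristic, shown here as an assumption rather than the lecture's method: seed a trace at the hottest unplaced block and keep following the most frequent outgoing edge to an unplaced block. A minimal Python sketch with made-up weights; real trace selectors usually add further conditions, such as minimum branch probabilities.

```python
def select_traces(block_weight, edge_weight):
    """Greedy trace selection over profile weights (illustrative heuristic)."""
    succ = {}
    for (src, dst), w in edge_weight.items():
        succ.setdefault(src, []).append((w, dst))
    visited, traces = set(), []
    # Seed each new trace at the hottest block not yet placed in a trace.
    for seed in sorted(block_weight, key=block_weight.get, reverse=True):
        if seed in visited:
            continue
        trace, cur = [seed], seed
        visited.add(seed)
        # Grow along the most frequent edge to an unplaced block;
        # refusing already-placed blocks keeps every trace acyclic.
        while True:
            cands = [(w, d) for w, d in succ.get(cur, []) if d not in visited]
            if not cands:
                break
            _, cur = max(cands)
            trace.append(cur)
            visited.add(cur)
        traces.append(trace)
    return traces


block_weight = {"A": 100, "B": 90, "C": 10, "D": 100}
edge_weight = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
traces = select_traces(block_weight, edge_weight)
print(traces)  # [['A', 'B', 'D'], ['C']]
```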
30
Linearizing a Trace
[Figure: a trace through BB1-BB6 laid out linearly, annotated with its entry count, side exits, side entrances, entry/exit count, and exit count]
31
Intelligent Trace Layout for Icache Performance
  • Intraprocedural code placement
  • Procedure positioning
  • Procedure splitting
[Figure: procedure view vs. trace view; blocks BB1-BB6 grouped into trace 1, trace 2, trace 3, and the rest]