Title: Control Flow Analysis/Opti II: Loop Detection, Unrolling
1 Control Flow Analysis/Opti II: Loop Detection, Unrolling
- EECS 483 Lecture 17
- University of Michigan
- Wednesday, November 10, 2004
2 From Last Time: Natural Loops
- Cycle suitable for optimization
- Discuss opti later
- 2 properties
- Single entry point called the header
- Header dominates all blocks in the loop
- Must be one way to iterate the loop (i.e., at least 1 path back to the header from within the loop), called a backedge
- Backedge detection
- Edge x -> y where the target (y) dominates the source (x)
3 Backedge Example
Edge      Backedge? (target dominates source)
E -> 1    No
1 -> 2    No
2 -> 3    No
2 -> 6    No
3 -> 4    No
3 -> 5    No
4 -> 3    Yes
4 -> 5    No
5 -> 3    Yes
5 -> 6    No
6 -> 2    Yes
6 -> X    No
Dominator sets for the CFG (Entry, BB1-BB6, Exit):
dom(1) = {E, 1}
dom(2) = {E, 1, 2}
dom(3) = {E, 1, 2, 3}
dom(4) = {E, 1, 2, 3, 4}
dom(5) = {E, 1, 2, 3, 5}
dom(6) = {E, 1, 2, 6}
In this example, every backedge goes from a higher-numbered BB to a lower-numbered BB; it is not always this easy!
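The backedge test above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the edge list is read off the example CFG, and the function names `dominators` and `backedges` are hypothetical. It uses the simple iterative dominator computation and then flags every edge whose target dominates its source.

```python
# Edges of the example CFG ("E" = Entry, "X" = Exit).
EDGES = [("E", 1), (1, 2), (2, 3), (2, 6), (3, 4), (3, 5),
         (4, 3), (4, 5), (5, 3), (5, 6), (6, 2), (6, "X")]

def dominators(edges, entry):
    """Iteratively compute dom(n) for every node n (simple set-based version)."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        preds[dst].add(src)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = (set.intersection(*pred_doms) | {n}) if pred_doms else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def backedges(edges, dom):
    """An edge x -> y is a backedge when the target y dominates the source x."""
    return [(x, y) for x, y in edges if y in dom[x]]

DOM = dominators(EDGES, "E")
BACKEDGES = backedges(EDGES, DOM)   # the three "Yes" rows of the table
```

Because the test relies only on dominance, it does not depend on how the blocks happen to be numbered.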
4 Loop Detection
- Identify all backedges using dominance info
- Each backedge (x -> y) defines a loop
- Loop header is the backedge target (y)
- LoopBB: basic blocks that comprise the loop
- All predecessor blocks of x for which control can reach x without going through y are in the loop
- Merge loops with the same header
- I.e., a loop with 2 continues
- LoopBackedge = LoopBackedge1 ∪ LoopBackedge2
- LoopBB = LoopBB1 ∪ LoopBB2
- Important property
- Header dominates all LoopBB
5 Loop Detection Example
- Loop detection: 3 steps
- Identify backedges
- Compute LoopBB
- Merge loops with the same header
Dominator sets for the CFG (Entry, BB1-BB6, Exit):
dom(1) = {E, 1}
dom(2) = {E, 1, 2}
dom(3) = {E, 1, 2, 3}
dom(4) = {E, 1, 2, 3, 4}
dom(5) = {E, 1, 2, 3, 5}
dom(6) = {E, 1, 2, 6}
Loop1: defined by 6 -> 2, LoopBB = {2, 3, 4, 5, 6}
Loop2: defined by 4 -> 3, LoopBB = {3, 4}
Loop3: defined by 5 -> 3, LoopBB = {3, 4, 5}
Merge loops 2 and 3: LoopBB = {3, 4, 5}, Backedges = 4 -> 3, 5 -> 3
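The "Compute LoopBB" and "Merge" steps can be sketched as follows. This is a minimal Python illustration under stated assumptions: the edge list and backedges are taken from the example CFG, and the helper names `loop_blocks` and `merge_by_header` are hypothetical. Starting from the backedge source x, it walks predecessors backwards and never expands the header y, so only blocks that reach x without passing through y are collected.

```python
# Example CFG and its backedges ("E" = Entry, "X" = Exit).
EDGES = [("E", 1), (1, 2), (2, 3), (2, 6), (3, 4), (3, 5),
         (4, 3), (4, 5), (5, 3), (5, 6), (6, 2), (6, "X")]
BACKEDGES = [(6, 2), (4, 3), (5, 3)]

def loop_blocks(edges, backedge):
    """LoopBB for one backedge x -> y: y, x, and every block that can
    reach x without going through y."""
    x, y = backedge
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    body = {y, x}                       # y is pre-marked, so it is never expanded
    worklist = [x] if x != y else []
    while worklist:
        n = worklist.pop()
        for p in preds.get(n, ()):
            if p not in body:
                body.add(p)
                worklist.append(p)
    return body

def merge_by_header(edges, backedges):
    """Union the backedges and LoopBB of all loops that share a header."""
    loops = {}
    for be in backedges:
        header = be[1]
        loop = loops.setdefault(header, {"backedges": [], "blocks": set()})
        loop["backedges"].append(be)
        loop["blocks"] |= loop_blocks(edges, be)
    return loops

LOOPS = merge_by_header(EDGES, BACKEDGES)
```

Keying the result by header makes the merge step a plain set union, which matches the LoopBB1/LoopBB2 rule from the previous slide.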
6 Class Problem
Find the loops. What are the header(s)? What are the backedge(s)?
[Figure: CFG with Entry, BB1-BB7, Exit]
7 Important Parts of a Loop
- Header, LoopBB
- Backedges, BackedgeBB
- Exitedges, ExitBB
- For each LoopBB, examine each outgoing edge
- If the edge is to a BB not in LoopBB, then it's an exit
- Preheader (Preloop)
- New block before the header (falls through to header)
- Whenever you invoke the loop, preheader executed
- Whenever you iterate the loop, preheader NOT executed
- All edges entering header
- Backedges: no change; all others: retarget to preheader
- Postheader (Postloop): analogous
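A short Python sketch of the exit-edge and preheader rules above. It assumes the CFG is a plain list of (source, target) edges (the six-block example CFG is reused here), and the function names and the preheader name "Pre2" are hypothetical choices for illustration.

```python
EDGES = [("E", 1), (1, 2), (2, 3), (2, 6), (3, 4), (3, 5),
         (4, 3), (4, 5), (5, 3), (5, 6), (6, 2), (6, "X")]

def exit_edges(edges, loop_bb):
    """Edges that leave the loop: source in LoopBB, target not in LoopBB."""
    return [(s, d) for s, d in edges if s in loop_bb and d not in loop_bb]

def add_preheader(edges, header, loop_bb, name):
    """Insert a preheader block `name` in front of `header`.
    Backedges (sources inside the loop) are left alone; every other edge
    entering the header is retargeted to the preheader."""
    new_edges = []
    for s, d in edges:
        if d == header and s not in loop_bb:
            new_edges.append((s, name))     # loop entry edge: retarget
        else:
            new_edges.append((s, d))        # backedge or unrelated edge
    new_edges.append((name, header))        # preheader falls through to header
    return new_edges

INNER = {3, 4, 5}                           # LoopBB of the loop headed by BB3
INNER_EXITS = exit_edges(EDGES, INNER)
WITH_PRE = add_preheader(EDGES, 3, INNER, "Pre2")
```

After the transformation the header's only predecessors are the preheader and the backedge sources, so code placed in the preheader runs once per loop invocation rather than once per iteration.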
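8 ExitBB/Preheader Example
[Figure: the CFG (Entry, BB1-BB6, Exit) before and after preheader insertion; Pre1 is inserted after BB1, ahead of BB2, and Pre2 is inserted after BB2, ahead of BB3]
Exit BB: blue loop: BB6; yellow loop: Exit
Note: the preheader for the blue loop is contained in the yellow loop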
9 Characteristics of a Loop
- Nesting (generally within a procedure scope)
- Inner loop: loop with no loops contained within it
- Outer loop: loop contained within no other loops
- Nesting depth
- depth(outer loop) = 1
- depth = depth(parent or containing loop) + 1
- Trip count (average trip count)
- How many times (on average) does the loop iterate
- for (I=0; I<100; I++) -> trip count = 100
- Avg trip count = weight(header) / weight(preheader)
10 Trip Count Calculation Example
Calculate the trip counts for all the loops in the graph
[Figure: CFG with Entry, BB1-BB6, Exit, annotated with edge execution weights]
Blue loop:
w(header) = w(BB3) = 1240 + 60 + 700 = 2000
w(preheader) = w(BB2) = 60 (why not 100???)
avg trip count = 2000/60 = 33.3
Yellow loop:
w(header) = w(BB2) = 80 + 20 = 100
w(preheader) = w(BB1) = 20
avg trip count = 100/20 = 5
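The arithmetic above can be checked with a small Python sketch. It assumes the header weight is the sum of the weights on all edges entering the header, and the preheader weight is the sum over the entering edges that are not backedges; the function name `avg_trip_count` is hypothetical. The numbers are the ones used in the slide's calculation.

```python
def avg_trip_count(entry_weights, backedge_weights):
    """Average trip count = weight(header) / weight(preheader)."""
    preheader = sum(entry_weights)                # times the loop is invoked
    header = preheader + sum(backedge_weights)    # times the header executes
    return header / preheader

# Blue loop: entered 60 times; backedges taken 1240 and 700 times.
blue = avg_trip_count([60], [1240, 700])
# Yellow loop: entered 20 times; backedge taken 80 times.
yellow = avg_trip_count([20], [80])
```

Written this way, the reason the blue loop's preheader weight is 60 and not 100 is visible: only the executions of BB2 that actually enter the loop count as invocations.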
11 Loop Induction Variables
- Induction variables are variables such that every time they change value, they are incremented/decremented by some constant
- Basic induction variable: induction variable whose only assignments within a loop are of the form j = j +/- C, where C is a constant
- Primary induction variable: basic induction variable that controls the loop execution; in for (i=0; i<100; i++), i (the virtual register holding i) is the primary induction variable
- Derived induction variable: variable that is a linear function of a basic induction variable
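The definitions above can be turned into a simple pattern match. The following Python sketch is a simplification under loud assumptions: the loop body is a hypothetical list of (dest, op, src1, src2) tuples invented for illustration (it is not one of the lecture's examples), constants are Python ints, and a derived induction variable is recognized only when it has a single definition of the form basic_iv op constant.

```python
# Hypothetical loop body in a made-up tuple IR: (dest, op, src1, src2).
LOOP = [
    ("r2", "mul", "r1", 4),       # r2 = r1 * 4
    ("r3", "load", "r2", None),   # r3 = load(r2)
    ("r4", "add", "r3", "r5"),    # r4 = r3 + r5
    ("r1", "add", "r1", 1),       # r1 = r1 + 1
]

def definitions(loop):
    """Group the loop's assignments by destination register."""
    defs = {}
    for dest, op, a, b in loop:
        defs.setdefault(dest, []).append((op, a, b))
    return defs

def basic_ivs(loop):
    """Registers whose only assignments are v = v +/- constant."""
    return {v for v, ds in definitions(loop).items()
            if all(op in ("add", "sub") and a == v and isinstance(b, int)
                   for op, a, b in ds)}

def derived_ivs(loop, basics):
    """Registers defined once as a linear function of a basic IV."""
    out = set()
    for v, ds in definitions(loop).items():
        if v in basics or len(ds) != 1:
            continue
        op, a, b = ds[0]
        if op in ("add", "sub", "mul") and a in basics and isinstance(b, int):
            out.add(v)
    return out

BASIC = basic_ivs(LOOP)
DERIVED = derived_ivs(LOOP, BASIC)
```

A primary induction variable would additionally be a basic one that appears in the loop's exit branch; that check is omitted here because the sketch does not model branches.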
12 Class Problem
Identify the basic, primary, and derived induction variables in this loop.
      r1 = 0
      r7 = &A
Loop: r2 = r1 * 4
      r4 = r7 + 3
      r7 = r7 + 1
      r1 = load(r2)
      r3 = load(r4)
      r9 = r1 * r3
      r10 = r9 >> 4
      store (r10, r2)
      r1 = r1 + 4
      blt r1 100 Loop
13 Reducible Flow Graphs
- A flow graph is reducible if and only if we can partition the edges into 2 disjoint groups, often called forward and back edges, with the following properties
- The forward edges form an acyclic graph in which every node can be reached from the Entry
- The back edges consist only of edges whose destinations dominate their sources
- More simply: take a CFG, remove all the backedges (x -> y where y dominates x), and you should have a connected, acyclic graph
14 Irreducible Flow Graph Example
In C/C++, it's not possible to create an irreducible flow graph without using gotos
Cyclic graphs that are NOT natural loops cannot be optimized by the compiler
L1: x = x + 1
    if (x) {
L2:     y = y + 1
        if (y > 10) goto L3
    } else {
L3:     z = z + 1
        if (z > 0) goto L2
    }
[Figure: CFG with bb1, bb2, bb3, labeled "Non-reducible!"]
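The "remove the backedges and check for an acyclic graph" test can be sketched in Python. This is a minimal illustration, assuming every node is reachable from the entry; the function names are hypothetical, and the three-block edge list is one reading of the goto example (bb1 branches to bb2 or bb3, and bb2 and bb3 branch to each other).

```python
def dominators(edges, entry):
    """Simple iterative dominator sets, as used for backedge detection."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = (set.intersection(*pred_doms) | {n}) if pred_doms else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def is_reducible(edges, entry):
    """Drop every edge whose target dominates its source, then check that
    what is left is acyclic (Kahn's topological-sort algorithm)."""
    dom = dominators(edges, entry)
    forward = [(s, d) for s, d in edges if d not in dom[s]]
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, d in forward:
        indegree[d] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    ordered = 0
    while ready:
        n = ready.pop()
        ordered += 1
        for s, d in forward:
            if s == n:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
    return ordered == len(nodes)    # a leftover node means a leftover cycle

IRREDUCIBLE = [("bb1", "bb2"), ("bb1", "bb3"), ("bb2", "bb3"), ("bb3", "bb2")]
REDUCIBLE = [("E", 1), (1, 2), (2, 3), (2, 6), (3, 4), (3, 5),
             (4, 3), (4, 5), (5, 3), (5, 6), (6, 2), (6, "X")]
```

In the three-block graph neither bb2 nor bb3 dominates the other, so no edge qualifies as a backedge and the bb2/bb3 cycle survives the removal step.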
15 Loop Unrolling
- Most renowned control flow opti
- Replicate the body of a loop N-1 times (giving N total copies)
- Loop unrolled N times or Nx unrolled
- Enable overlap of operations from different iterations
- Increase potential for ILP (instruction level parallelism)
- 3 variants
- Unroll multiple of known trip count
- Unroll with remainder loop
- While loop unroll
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
16 Loop Unroll Type 1
Counted loop: all parms known
r2 is the loop variable; increment is 1, initial value is 0, final value is 100, trip count is 100
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Unrolled 2x:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r1 = MEM[r2 + 1]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 2
      blt r2 100 Loop
Remove r2 increments from first N-1 iterations and update last increment
Remove branch from first N-1 iterations
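To make the effect of Type 1 unrolling concrete, here is a small Python model of the loop and its 2x unrolled form. It is only a sketch: memory is modeled as a Python list, the output address r3 is assumed not to overlap the input range, the operators follow the register code as reconstructed in these notes, and the function names are hypothetical. Each function returns the number of loop branches executed.

```python
def run_original(mem, r3, r5):
    """Original loop: trip count 100, one loop branch per element."""
    r2, branches = 0, 0
    while True:
        r1 = mem[r2 + 0]
        r4 = r1 * r5
        r6 = r4 << 2
        mem[r3 + 0] = r6
        r2 = r2 + 1
        branches += 1
        if not r2 < 100:          # blt r2 100 Loop
            break
    return branches

def run_unrolled_2x(mem, r3, r5):
    """Unrolled 2x: second copy uses offset 1, one increment by 2, one branch."""
    r2, branches = 0, 0
    while True:
        r1 = mem[r2 + 0]          # copy 1
        r4 = r1 * r5
        r6 = r4 << 2
        mem[r3 + 0] = r6
        r1 = mem[r2 + 1]          # copy 2: increment folded into the offset
        r4 = r1 * r5
        r6 = r4 << 2
        mem[r3 + 0] = r6
        r2 = r2 + 2
        branches += 1
        if not r2 < 100:          # blt r2 100 Loop
            break
    return branches

mem_a, mem_b = list(range(101)), list(range(101))
branches_original = run_original(mem_a, 100, 3)
branches_unrolled = run_unrolled_2x(mem_b, 100, 3)
```

Both versions leave memory in the same state, while the unrolled version executes half as many loop branches; this only works because 100 is a multiple of the unroll factor.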
17 Loop Unroll Type 2
Counted loop: some parms unknown
r2 is the loop variable; increment is ?, initial value is ?, final value is ?, trip count is ?
tc = final - initial
tc = tc / increment
rem = tc % N
fin = rem * increment
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + X
      blt r2 Y Loop
Remainder loop followed by the unrolled loop (N = 2):
RemLoop: r1 = MEM[r2 + 0]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r2 = r2 + X
         blt r2 fin RemLoop
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r1 = MEM[r2 + X]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + (N*X)
      blt r2 Y Loop
Remainder loop executes the leftover iterations
Unrolled loop same as Type 1, and is guaranteed to execute a multiple of N times
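The remainder-loop scheme can be sketched in Python as follows. Several things here are assumptions made for the illustration: the increment is positive, final - initial is a non-negative multiple of the increment, the remainder bound is offset by the initial value (fin = initial + rem * increment) so that a nonzero start works, and top-tested `while` loops are used so that a zero remainder needs no extra guard. The function name and the `body` callback are hypothetical.

```python
def unrolled_with_remainder(initial, final, increment, n, body):
    """Run body(r2) for r2 = initial, initial + increment, ... < final,
    using a remainder loop followed by a loop unrolled n times."""
    tc = (final - initial) // increment    # trip count, computed at run time
    rem = tc % n                           # leftover iterations
    fin = initial + rem * increment        # bound for the remainder loop

    r2 = initial
    while r2 < fin:                        # remainder loop: rem iterations
        body(r2)
        r2 += increment

    while r2 < final:                      # unrolled loop: tc - rem iterations
        for k in range(n):                 # stands in for n textual copies
            body(r2 + k * increment)       # copy k uses offset k * increment
        r2 += n * increment                # single combined increment

visited = []
unrolled_with_remainder(3, 33, 3, 4, visited.append)
```

Running the leftover iterations first means the unrolled loop always starts with a trip count that is a multiple of n, so its intermediate branches can be dropped exactly as in Type 1.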
18 Loop Unroll Type 3
Non-counted loop: some parms unknown
Pointer chasing, loop var modified in a strange way, etc.
Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop
Unrolled 2x:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      beq r2 0 Exit
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop
Exit:
Just duplicate the body; none of the loop branches can be removed. Instead they are converted into conditional breaks
Can apply this to any loop!
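A Python sketch of Type 3 unrolling on a pointer-chasing loop. The linked structure is modeled as a hypothetical dictionary mapping each node to its successor, with 0 as the null pointer, and the loop body is reduced to recording the visited node; both choices are assumptions made to keep the example short.

```python
def walk(next_ptr, start):
    """Original bottom-tested loop: body, advance pointer, loop while non-null."""
    visited, r2 = [], start
    while True:
        visited.append(r2)        # loop body
        r2 = next_ptr[r2]         # r2 = MEM[r2 + 0]
        if r2 == 0:               # bne r2 0 Loop (falls out when r2 is 0)
            break
    return visited

def walk_unrolled_2x(next_ptr, start):
    """Unrolled 2x: the first copy's loop branch becomes a conditional break."""
    visited, r2 = [], start
    while True:
        visited.append(r2)        # copy 1
        r2 = next_ptr[r2]
        if r2 == 0:               # beq r2 0 Exit
            break
        visited.append(r2)        # copy 2
        r2 = next_ptr[r2]
        if r2 == 0:               # bne r2 0 Loop
            break
    return visited

CHAIN = {1: 4, 4: 2, 2: 0}        # 1 -> 4 -> 2 -> null
```

Both functions visit the same nodes in the same order for any chain length, odd or even, because every copy keeps its own exit test.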
19 Loop Unroll Summary
- Goals
- Reduce number of executed branches inside loop
- Note: Type 1/Type 2 only
- Enable the overlapped execution of multiple iterations
- Reorder instructions between iterations
- Enable dataflow optimization across iterations
- Type 1 is the most effective
- All intermediate branches removed, least code expansion
- Only applicable to a small fraction of loops
20 Loop Unroll Summary (2)
- Type 2 is almost as effective
- All intermediate branches removed
- Remainder loop is required since trip count not known at compile time
- Need to make sure you don't spend much time in the remainder loop
- Type 3 can be effective
- No branches eliminated
- But iteration overlap still possible
- Always applicable (most loops fall into this category!)
- Use average trip count to guide unroll amount
21 Class Problem
Unroll both the outer loop and inner loop 2x. Apply the most aggressive style unrolling that you can, e.g., Type 1 if possible, else Type 2, else Type 3
for (i=0; i<100; i++) {
    j = i;
    while (j < 100) {
        A[j]--;
    }
    B[i] = 0;
}
22 Loop Peeling
- Unravel first P iterations of a loop
- Enable overlap of instructions from the peeled iterations with preheader instructions
- Increase potential for ILP
- Enables further optimization of main body
Original:
Preheader
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
After peeling one iteration:
Preheader
      r1 = MEM[r2 + 0]        (Iteration 1)
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      bge r2 100 Done         (More iters?)
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Done:
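Peeling one iteration can be modeled in Python as below. This is a sketch under assumptions: the loop bound is the length of the input list rather than the constant 100, results are appended to a list instead of being stored to a fixed address, the input is non-empty (the original loop is bottom-tested, so it always runs at least once), and the function names are hypothetical.

```python
def run_loop(values, r5):
    """Original bottom-tested loop over a non-empty list."""
    out, r2 = [], 0
    while True:
        out.append((values[r2] * r5) << 2)
        r2 += 1
        if not r2 < len(values):      # blt r2 N Loop
            break
    return out

def run_peeled(values, r5):
    """Same loop with the first iteration peeled into straight-line code."""
    out, r2 = [], 0
    out.append((values[r2] * r5) << 2)    # Iteration 1 (peeled copy)
    r2 += 1
    if r2 >= len(values):                 # bge r2 N Done: more iters?
        return out
    while True:                           # remaining iterations
        out.append((values[r2] * r5) << 2)
        r2 += 1
        if not r2 < len(values):
            break
    return out
```

The peeled copy sits directly after the preheader code, so a scheduler is free to interleave the two, and the guard branch keeps a single-iteration loop correct.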
23 Control Flow Opti for Acyclic Code
- Rather simple transformations with these goals
- Reduce the number of dynamic branches
- Make larger basic blocks
- Reduce code size
- Classic control flow optimizations
- Branch to unconditional branch
- Unconditional branch to branch
- Branch to next basic block
- Basic block merging
- Branch to same target
- Branch target expansion
- Unreachable code elimination
24 Acyclic Control Flow Optimizations (1)
1. Branch to unconditional branch
Before:
L1: if (a < b) goto L2
    . . .
L2: goto L3
After:
L1: if (a < b) goto L3
    . . .
L2: goto L3    <- may be deleted
2. Unconditional branch to branch
Before:
L1: goto L2
    . . .
L2: if (a < b) goto L3
L4:
After:
L1: if (a < b) goto L3
    goto L4
    . . .
L2: if (a < b) goto L3    <- may be deleted
L4:
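The first transformation (branch to unconditional branch) can be sketched in Python on a toy representation. Everything about the representation is an assumption for illustration: code is a dictionary from label to a tuple, where ("goto", target) is an unconditional branch, ("if", condition, target) is a conditional branch, and anything else is ordinary code; the function name `thread_jumps` is hypothetical.

```python
def thread_jumps(code):
    """Retarget every branch whose target is an unconditional goto so that
    it jumps straight to the final destination."""
    def final_target(label):
        seen = set()
        # Follow chains of gotos; `seen` guards against goto cycles.
        while code.get(label, ("stuff",))[0] == "goto" and label not in seen:
            seen.add(label)
            label = code[label][1]
        return label

    result = {}
    for label, ins in code.items():
        if ins[0] == "goto":
            result[label] = ("goto", final_target(ins[1]))
        elif ins[0] == "if":
            result[label] = ("if", ins[1], final_target(ins[2]))
        else:
            result[label] = ins
    return result

BEFORE = {
    "L1": ("if", "a < b", "L2"),
    "L2": ("goto", "L3"),
    "L3": ("stuff",),
}
AFTER = thread_jumps(BEFORE)
```

The sketch leaves L2 in place; whether it may be deleted depends on whether anything else still reaches it, which is what unreachable code elimination decides afterwards.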
25 Acyclic Control Flow Optimizations (2)
3. Branch to next basic block
Branch is unnecessary
Before:
BB1:     . . .
     L1: if (a < b) goto L2
BB2: L2: . . .
After:
BB1:     . . .
     L1:
BB2: L2: . . .
4. Basic block merging
Merge BBs when single edge between
Before:
BB1:     . . .
     L1:
BB2: L2: . . .
After:
BB1:     . . .
     L1: L2: . . .
26 Acyclic Control Flow Optimizations (3)
5. Branch to same target
Before:
    . . .
L1: if (a < b) goto L2
    goto L2
After:
    . . .
L1: goto L2
6. Branch target expansion
Before:
BB1:     stuff1
     L1: goto L2
         . . .
BB2: L2: stuff2
         . . .
After:
BB1:     stuff1
     L1: stuff2
         . . .
         . . .
BB2: L2: stuff2
         . . .
What about expanding a conditional branch? Almost the same
27 Unreachable Code Elimination
Algorithm:
Mark procedure entry BB visited
to_visit = procedure entry BB
while (to_visit not empty) {
    current = to_visit.pop()
    for (each successor block of current) {
        if (successor not yet visited) {
            Mark successor as visited
            to_visit += successor
        }
    }
}
Eliminate all unvisited blocks
[Figure: CFG with entry, bb1-bb5]
Which BB(s) can be deleted?
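The worklist algorithm can be written directly in Python. This is a minimal sketch: the CFG is a dictionary from block name to successor list, the function names are hypothetical, and the example graph is invented for illustration (it is not the graph from the slide's figure). A successor is only queued the first time it is seen, which is what keeps the loop from running forever on cyclic graphs.

```python
def reachable(succs, entry):
    """Blocks reachable from the procedure entry (worklist traversal)."""
    visited = {entry}
    to_visit = [entry]
    while to_visit:
        current = to_visit.pop()
        for successor in succs.get(current, ()):
            if successor not in visited:      # visit each block only once
                visited.add(successor)
                to_visit.append(successor)
    return visited

def eliminate_unreachable(succs, entry):
    """Return a copy of the CFG without the unvisited blocks."""
    keep = reachable(succs, entry)
    return {block: list(targets)
            for block, targets in succs.items() if block in keep}

# Hypothetical CFG: bb4 and bb5 only reach each other and bb3.
CFG = {
    "entry": ["bb1"],
    "bb1": ["bb2", "bb3"],
    "bb2": ["bb1"],
    "bb3": [],
    "bb4": ["bb3", "bb5"],
    "bb5": ["bb4"],
}
PRUNED = eliminate_unreachable(CFG, "entry")
```

Note that bb4 and bb5 form a cycle and each has a predecessor, yet both are removed: what matters is reachability from the entry, not whether a block has incoming edges.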
28 Class Problem
Maximally optimize the control flow of this code
L1:  if (a < b) goto L11
L2:  goto L7
L3:  goto L4
L4:  stuff4
L5:  if (c < d) goto L15
L6:  goto L2
L7:  if (c < d) goto L13
L8:  goto L12
L9:  stuff9
L10: if (a < c) goto L3
L11: goto L9
L12: goto L2
L13: stuff13
L14: if (e < f) goto L11
L15: stuff15
L16: rts
29 Profile-based Control Flow Optimization: Trace Selection
- Trace: linear collection of basic blocks that tend to execute in sequence
- Likely control flow path
- Acyclic (outer backedge ok)
- Side entrance: branch into the middle of a trace
- Side exit: branch out of the middle of a trace
[Figure: CFG with BB1-BB6 annotated with profile weights]
30 Linearizing a Trace
[Figure: the blocks BB1-BB6 laid out linearly, annotated with an entry count of 10, side exits of 20 and 10, side entrances of 20 and 10, an entry/exit count of 90, and an exit count of 10]
31 Intelligent Trace Layout for Icache Performance
Intraprocedural code placement
Procedure positioning
Procedure splitting
[Figure: trace view (trace 1, trace 2, trace 3, the rest) alongside the procedure view of BB1-BB6]