Title: CS 201 Compiler Construction
1. CS 201 Compiler Construction
Lecture 14: Software Pipelining and Circular Scheduling
2. Motivation
- Trace scheduling uncovers ILP in acyclic segments of code; another technique is needed to exploit ILP across loop iterations.
- Loop Unrolling
  - Unrolling a loop converts ILP across loop iterations to ILP within a single iteration that can be exploited using trace scheduling.
  - The drawback is growth in code size.
- Software Pipelining
  - converts ILP across loop iterations to ILP within a single iteration without significant growth in code size.
3. Software Pipelining
- Software Pipelining
  - converts ILP across loop iterations to ILP within a single iteration without significant growth in code size.
  - a single iteration of the transformed loop contains a single occurrence of each instruction; this is why code growth is less than with unrolling.
  - a loop iteration so constructed brings instances of statements from different iterations of the original loop into the same loop iteration.
5. Software Pipelining Contd.
Schedule of n overlapped iterations (one row per time step):
    Ld1
    Ld2     Add1
    Ld3     Add2     St1
    ..      Add3     St2
    ..               St3
    ..      ..
    Ldn-2   ..
    Ldn-1   Addn-2   ..
    Ldn     Addn-1   Stn-2
            Addn     Stn-1
                     Stn
The same schedule expressed as a loop:
    Prologue:             Ld1
                          Ld2      Add1
    Loop (i = 2, n-1):    Ldi+1    Addi     Sti-1
    Epilogue:                      Addn     Stn-1
                                            Stn
Prologue + Epilogue = 2 iterations; Loop = n-2 iterations.
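The prologue / loop / epilogue split can be reproduced mechanically. The following is a minimal sketch, assuming a three-stage iteration (Ld, Add, St) in which each stage of an iteration runs one time step after the previous stage and n is at least the number of stages. The function names `pipelined_schedule` and `split_schedule` are illustrative and do not come from the lecture.

```python
def pipelined_schedule(n, stages=("Ld", "Add", "St")):
    """Return the rows of the overlapped schedule for n iterations.

    Row t (0-based) holds stage s of iteration t - s + 1, for every
    stage whose iteration number falls in 1..n.
    """
    depth = len(stages)
    rows = []
    for t in range(n + depth - 1):
        row = []
        for s, name in enumerate(stages):
            iteration = t - s + 1
            if 1 <= iteration <= n:
                row.append(f"{name}{iteration}")
        rows.append(row)
    return rows


def split_schedule(rows, depth):
    """Split rows into prologue, steady-state loop rows, and epilogue.

    Assumes at least one row contains all `depth` stages (n >= depth).
    """
    full = [r for r in rows if len(r) == depth]
    first = rows.index(full[0])
    prologue = rows[:first]
    epilogue = rows[first + len(full):]
    return prologue, full, epilogue


rows = pipelined_schedule(5)
prologue, loop, epilogue = split_schedule(rows, 3)
for row in rows:
    print(" ".join(row))
```

For n = 5 the steady-state part has n-2 = 3 rows, each containing one Ld, one Add and one St from three different iterations. The partial rows before and after it together hold the instructions of 2 full iterations.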
6. Circular Scheduling
- An algorithm for software pipelining that is suitable for scalar architectures
  - Limited amount of ILP can be exploited
  - Limited number of registers are available
- Assumption: register allocation has already been done
- Approach
  - Identify idle slots in the instruction schedule and try to fill them by propagating instructions across loop iterations.
  - Continue to do the above as long as the schedule continues to improve.
  - If register allocation needs to be modified to allow instruction motion, then do so.
7. Circular Scheduling Contd.
- Construct a DAG for the loop body.
- Moving an instruction from a later iteration to an earlier iteration corresponds to moving an instruction from the top of the DAG to the bottom of the DAG.
- An instruction moved from the top of the loop to the bottom is called a circled instruction.
- If each instruction can only circle once: circled instructions form the prologue, the remaining instructions form the epilogue, and the loop is executed N-1 times (the original loop runs N iterations).
8. Circular Scheduling Contd.
[Figure: structure of the transformed loop, with labels I1, I2, I3]
- Prologue: circled instructions
- Loop: N-1 iterations
- Epilogue: non-circled instructions
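The prologue / loop / epilogue structure can be checked with a small model. The sketch below assumes the circled instructions are a prefix of the original loop body and tags every instruction instance with its original iteration number. The names `circular_transform` and `original_stream`, and the instruction labels, are hypothetical.

```python
def original_stream(body, n):
    """Instruction instances executed by the original loop (n iterations)."""
    return [(ins, i) for i in range(1, n + 1) for ins in body]


def circular_transform(body, circled, n):
    """Instruction instances executed by the transformed loop.

    body    -- instructions of the original loop body, in order
    circled -- instructions moved from the top of the body to the bottom
               (assumed here to be a prefix of body)
    n       -- iteration count of the original loop (n >= 1)
    """
    k = len(circled)
    assert body[:k] == circled
    rest = body[k:]
    stream = [(ins, 1) for ins in circled]           # prologue
    for i in range(1, n):                            # loop runs n-1 times
        stream += [(ins, i) for ins in rest]         # rest of iteration i
        stream += [(ins, i + 1) for ins in circled]  # circled, iteration i+1
    stream += [(ins, n) for ins in rest]             # epilogue
    return stream


body = ["I1", "I2", "I3"]
assert circular_transform(body, ["I1"], 4) == original_stream(body, 4)
```

In this model circling alone does not change the dynamic instruction stream; it only moves the loop boundary. Any gain comes from rescheduling the new loop body, in which instructions of two different iterations sit side by side.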
9. Circular Scheduling Contd.
[Figure: Ramp-Up / Ramp-Down Effect, with panels labeled Before and After]
10. Circular Scheduling Contd.
Source loop:
    for (i = 0; i < N; i = i + 1)
        X[i] = X[i] + C
Generated code:
    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- loop body
    Loop: F4 ← 0(R3)
          R3 ← R3 + 1
          F6 ← F4 + F8
          BNE R3, R2, Loop
          <delay>
          -1(R3) ← F6
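11. Circular Scheduling Contd.
After circular scheduling:
    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- prologue
          R3 ← R3 + 1
          BEQ R3, R2, Lend
          F4 ← -1(R3)
    -- loop body
    Loop: F6 ← F4 + F8
          F4 ← 0(R3)
          R3 ← R3 + 1
          BNE R3, R2, Loop
          -2(R3) ← F6
    -- epilogue
    Lend: F6 ← F4 + F8
          <delay>
          -1(R3) ← F6
Before circular scheduling (the circled instructions are the load F4 ← 0(R3) and the increment R3 ← R3 + 1, which form the prologue above):
    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- loop body
    Loop: F4 ← 0(R3)
          R3 ← R3 + 1
          F6 ← F4 + F8
          BNE R3, R2, Loop
          <delay>
          -1(R3) ← F6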
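One way to convince oneself that the transformed code computes the same result is to simulate both versions. The sketch below is a hand translation into Python under two assumptions that are not stated on the slide: the instruction following a branch (its delay slot) executes whether or not the branch is taken, and N is at least 1. The epilogue store here targets the last element X[N-1], i.e. offset -1 from the final value of R3. The function names are illustrative.

```python
def original_loop(x, c):
    """X[i] = X[i] + C, following the original instruction order."""
    x = list(x)
    n = len(x)              # R2 <- N
    r3 = 0                  # R3 <- 0
    while True:
        f4 = x[r3]          # F4 <- 0(R3)
        r3 += 1             # R3 <- R3 + 1
        f6 = f4 + c         # F6 <- F4 + F8
        x[r3 - 1] = f6      # -1(R3) <- F6 (delay slot of BNE)
        if r3 == n:         # BNE R3, R2, Loop falls through
            break
    return x


def pipelined_loop(x, c):
    """Same computation after circling the load and the increment."""
    x = list(x)
    n = len(x)
    r3 = 0
    # prologue: circled instructions of the first iteration
    r3 += 1                 # R3 <- R3 + 1
    f4 = x[r3 - 1]          # F4 <- -1(R3) (delay slot of BEQ)
    if r3 != n:             # BEQ R3, R2, Lend not taken
        while True:
            f6 = f4 + c     # add for iteration i
            f4 = x[r3]      # load for iteration i+1
            r3 += 1
            x[r3 - 2] = f6  # store for iteration i (delay slot of BNE)
            if r3 == n:
                break
    # epilogue: non-circled instructions of the last iteration
    f6 = f4 + c
    x[r3 - 1] = f6
    return x


data = [1, 2, 3, 4]
assert pipelined_loop(data, 10) == original_loop(data, 10) == [11, 12, 13, 14]
```

The loop body of `pipelined_loop` runs N-1 times, and the add at its top uses a value loaded in the previous trip around the loop, which is what removes the wait between the add and the store.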
12. Algorithm
1. Apply basic block scheduling to the loop; if no stalls are present, use the schedule; otherwise continue.
2. If the loop has no procedure calls or if-statements, then perform circular scheduling; otherwise give up.
3. Select one of the root nodes of the DAG for circling; choose one on the longest path (simple heuristic).
4. Rebuild the DAG assuming circling has been performed.
5. If no stalls are present, use the current schedule; else if there are more stalls than before, use the previous schedule; else repeat steps 3 and 4 to remove additional stalls.
6. Create the prologue and epilogue, and alter the number of times the loop body is executed.
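Steps 1 and 3 to 5 can be sketched on a toy machine model. Everything in the block below is an assumption made for illustration: a single-issue processor, dependences derived only from register read and write sets (memory references of different iterations are assumed independent), a list scheduler that prefers the instruction with the longest latency-weighted path to a leaf, stalls across the loop back edge ignored, and each instruction circled at most once. The names `Instr`, `build_dag`, `heights`, `list_schedule` and `circular_schedule` are hypothetical.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name writes reads latency")


def build_dag(body):
    """Edges (i, j) -> minimum cycle distance, from register dependences."""
    edges = {}
    for j, b in enumerate(body):
        for i in range(j):
            a = body[i]
            w = 0
            if a.writes & b.reads:                           # true dependence
                w = max(w, a.latency)
            if (a.reads & b.writes) or (a.writes & b.writes):  # anti / output
                w = max(w, 1)
            if w:
                edges[(i, j)] = w
    return edges


def heights(body, edges):
    """Longest latency-weighted path from each node to a leaf."""
    h = [ins.latency for ins in body]
    for i in reversed(range(len(body))):
        for (a, b), w in edges.items():
            if a == i:
                h[i] = max(h[i], w + h[b])
    return h


def list_schedule(body):
    """Single-issue list scheduling; returns (issue order, stall count)."""
    edges = build_dag(body)
    h = heights(body, edges)
    start, order, stalls, cycle = {}, [], 0, 0
    while len(start) < len(body):
        ready = [j for j in range(len(body)) if j not in start and
                 all(a in start and start[a] + w <= cycle
                     for (a, b), w in edges.items() if b == j)]
        if ready:
            j = max(ready, key=lambda k: h[k])
            start[j] = cycle
            order.append(body[j].name)
        else:
            stalls += 1                  # idle slot
        cycle += 1
    return order, stalls


def circular_schedule(body):
    """Circle roots while the stall count does not get worse."""
    best = list_schedule(body)                        # step 1
    current, circled = list(body), []
    while best[1] > 0:
        edges = build_dag(current)
        h = heights(current, edges)
        roots = [j for j in range(len(current))
                 if current[j].name not in circled
                 and not any(b == j for (_, b) in edges)]
        if not roots:
            break
        r = max(roots, key=lambda k: h[k])            # step 3
        candidate = current[:r] + current[r + 1:] + [current[r]]
        result = list_schedule(candidate)             # step 4
        if result[1] > best[1]:                       # step 5: got worse
            break
        circled.append(current[r].name)
        current, best = candidate, result
    return best, circled


body = [Instr("ld", {"F4"}, {"R3"}, 2), Instr("inc", {"R3"}, {"R3"}, 1),
        Instr("add", {"F6"}, {"F4", "F8"}, 2), Instr("st", set(), {"F6", "R3"}, 1)]
print(list_schedule(body))
print(circular_schedule(body))
```

With these assumed latencies the original body has one idle slot between the add and the store, and circling the load removes it. The instruction order produced by the toy scheduler need not match the order in the lecture's transformed code, since the model does not adjust address offsets.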
13. Register Renaming
- Since register allocation is done prior to circular scheduling, dependences due to register usage may inhibit code motion.
- Solution: perform register renaming during circular scheduling.
14. Register Renaming Contd.
- Identify registers that are not live at the beginning and the end of the basic block; these registers form the pool of temporary registers available for temporary usage during renaming.
- Ignore dependences due to reuse of registers while building the DAG.
- Pick an instruction to schedule.
- If the instruction uses a temporary register, replace that register by the new register (from the pool) that was assigned when the Def corresponding to the Use was processed. If this is the last use, then put the register back in the available pool.
15. Register Renaming Contd.
- If the instruction defines a temporary register, a new register is chosen from the available pool of registers.
- Repeat the above steps till the basic block has been scheduled.
- To avoid running out of registers, given two candidate instructions, select first an instruction that does not need a new register or frees up a temporary register.
- If renaming fails, give up and use the previous schedule.
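The renaming steps can be sketched as a walk over an already chosen schedule. The block below is a simplified illustration, not the lecture's implementation: instructions are `(dest, srcs)` pairs, the schedule is given as a permutation of original indices, each use is linked to its Def by scanning the original order, and the candidate-selection heuristic is left out. The function name `rename` and the register names are hypothetical.

```python
def rename(block, order, temporaries):
    """Rename temporary registers along a schedule.

    block       -- list of (dest, srcs) in original program order
    order       -- scheduled order, as indices into block
    temporaries -- registers not live at block entry or exit
    Returns the renamed instructions in scheduled order, or None if the
    pool runs out of registers (renaming fails).
    """
    # Link every use of a temporary to the Def that reaches it.
    last_def, use_def, uses_left = {}, {}, {}
    for i, (dest, srcs) in enumerate(block):
        for r in srcs:
            if r in temporaries and r in last_def:
                d = last_def[r]
                use_def[(i, r)] = d
                uses_left[d] = uses_left.get(d, 0) + 1
        if dest in temporaries:
            last_def[dest] = i

    pool = sorted(temporaries)   # available temporary registers
    assigned = {}                # Def index -> register chosen for it
    out = []
    for i in order:
        dest, srcs = block[i]
        new_srcs = []
        for r in srcs:
            d = use_def.get((i, r))
            if d is None:
                new_srcs.append(r)            # not a temporary
                continue
            new_srcs.append(assigned[d])      # register chosen at the Def
            uses_left[d] -= 1
            if uses_left[d] == 0:
                pool.append(assigned[d])      # last use: back to the pool
        new_dest = dest
        if dest in temporaries:
            if not pool:
                return None                   # renaming fails
            new_dest = pool.pop(0)
            assigned[i] = new_dest
        out.append((new_dest, tuple(new_srcs)))
    return out


# F0 is reused by two independent load/use pairs in the original order.
block = [("F0", ("R1",)), ("R2", ("F0",)), ("F0", ("R3",)), ("R4", ("F0",))]
print(rename(block, [0, 2, 1, 3], {"F0", "F1"}))
```

Scheduling both loads first is only legal because the second load receives a different register, F1, from the pool. With a pool containing only F0 the same schedule cannot be renamed, `rename` returns None, and the previous schedule would be kept.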