CS 201 Compiler Construction

Provided by: defau764
Learn more at: http://www.cs.ucr.edu
1
CS 201 Compiler Construction
Lecture 14: Software Pipelining, Circular Scheduling
2
Motivation
  • Trace Scheduling uncovers ILP in acyclic segments
    of code; another technique is needed to exploit
    ILP across loop iterations.
  • 1. Loop Unrolling: unrolling a loop converts
    ILP across loop iterations to ILP within a single
    iteration that can be exploited using trace
    scheduling.
  • Drawback: growth in code size.
  • 2. Software Pipelining: converts ILP across loop
    iterations to ILP within a single iteration
    without significant growth in code size.

3
Software Pipelining
  • Software Pipelining converts ILP across loop
    iterations to ILP within a single iteration
    without significant growth in code size.
  • A single iteration of the transformed loop
    contains a single occurrence of each instruction;
    this is why code growth is less than with unrolling.
  • Each iteration of the transformed loop brings
    together instances of statements from different
    iterations of the original loop.

4
Software Pipelining Contd..
5
Software Pipelining Contd..
Overlapped execution of n iterations (one row per time step):

    Ld1
    Ld2    Add1
    Ld3    Add2    St1
    ..     Add3    St2
    ..     ..      St3
    ..     ..      ..
    Ldn-2  ..      ..
    Ldn-1  Addn-2  ..
    Ldn    Addn-1  Stn-2
           Addn    Stn-1
                   Stn

Software-pipelined loop:

    Prologue:           Ld1
                        Ld2    Add1
    Loop (i = 2,n-1):   Ldi+1  Addi   Sti-1
    Epilogue:                  Addn   Stn-1
                                      Stn

Prologue + Epilogue = 2 iterations; Loop = n-2 iterations.
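
The prologue / loop / epilogue split above can be checked mechanically. The following is a minimal Python sketch, assuming each iteration is just the three steps Ld, Add, St issued in consecutive rows; the function names `pipelined_schedule` and `overlapped_schedule` are illustrative and do not come from the lecture.

```python
def pipelined_schedule(n):
    """Rows of the software-pipelined schedule for n >= 2 original iterations."""
    prologue = [["Ld1"], ["Ld2", "Add1"]]
    # Loop body for i = 2 .. n-1: one Ld, one Add, one St, each from a different iteration.
    loop = [[f"Ld{i + 1}", f"Add{i}", f"St{i - 1}"] for i in range(2, n)]
    epilogue = [[f"Add{n}", f"St{n - 1}"], [f"St{n}"]]
    return prologue, loop, epilogue


def overlapped_schedule(n):
    """Reference schedule: iteration k issues Ld in row k-1, Add in row k, St in row k+1."""
    rows = []
    for r in range(n + 2):
        row = []
        if r + 1 <= n:
            row.append(f"Ld{r + 1}")
        if 1 <= r <= n:
            row.append(f"Add{r}")
        if 1 <= r - 1 <= n:
            row.append(f"St{r - 1}")
        rows.append(row)
    return rows


prologue, loop, epilogue = pipelined_schedule(6)
# The three pieces, concatenated, reproduce the overlapped schedule row by row.
assert prologue + loop + epilogue == overlapped_schedule(6)
print(len(loop))  # number of loop iterations, n - 2
```

The loop body contains each of Ld, Add and St exactly once, which is the reason the code growth stays small compared with unrolling.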
6
Circular Scheduling
  • An algorithm for Software Pipelining that is
    suitable for scalar architectures:
  • Only a limited amount of ILP can be exploited.
  • Only a limited number of registers is available.
  • Assumption: register allocation has already been
    done.
  • Approach:
  • Identify idle slots in the instruction schedule
    and try to fill them by propagating instructions
    across loop iterations.
  • Continue to do the above as long as the schedule
    continues to improve.
  • If register allocation needs to be modified to
    allow instruction motion, then do so.

7
Circular Scheduling Contd..
  • Construct a DAG for the loop body.
  • Moving an instruction from a later iteration to
    an earlier iteration corresponds to moving the
    instruction from the top of the DAG to the bottom
    of the DAG.
  • An instruction moved from the top of the loop to
    the bottom is called a circled instruction.
  • If each instruction can circle only once: the
    circled instructions form the prologue, the
    remaining instructions form the epilogue, and
    the loop is executed N-1 times.

(Figure label: N iterations)
8
Circular Scheduling Contd..
Figure: loop after circling, with instruction labels I1, I2, I3.
  • Prologue: circled instructions
  • Loop: N-1 iterations
  • Epilogue: non-circled instructions
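
As a rough illustration of how circling reshapes a loop, the Python sketch below splits a loop body into prologue, new loop body and epilogue, then replays the result. It assumes the circled instructions are roots of the DAG (so they can legally sit at the top of the body) and that each instruction circles at most once; `split_loop` and `run` are hypothetical helper names, and the labels I1, I2, I3 are only placeholders.

```python
def split_loop(body, circled):
    """Split a loop body given the set of circled (root) instructions."""
    prologue = [op for op in body if op in circled]
    rest = [op for op in body if op not in circled]
    # New body: rest of iteration i, followed by the circled ops of iteration i+1.
    new_body = rest + prologue
    epilogue = rest
    return prologue, new_body, epilogue


def run(body, circled, n):
    """Trace of (instruction, original iteration) pairs for n original iterations."""
    prologue, new_body, epilogue = split_loop(body, circled)
    trace = [(op, 1) for op in prologue]
    for i in range(1, n):  # the transformed loop runs N-1 times
        trace += [(op, i + 1 if op in circled else i) for op in new_body]
    trace += [(op, n) for op in epilogue]
    return trace


body = ["I1", "I2", "I3"]
original = [(op, i) for i in range(1, 5) for op in body]
# Before any rescheduling, circling only re-brackets the instruction stream.
assert run(body, {"I1"}, 4) == original
```

The benefit comes afterwards: once the new body is rescheduled, instructions from iteration i+1 can fill idle slots left by iteration i.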
9
Circular Scheduling Contd..
Ramp-Up / Ramp-Down Effect

(Figure panels: Before, After.)
10
Circular Scheduling Contd..
    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- loop body
    Loop: F4 ← 0(R3)
          R3 ← R3+1
          F6 ← F4+F8
          BNE R3,R2,Loop
          <delay>
          -1(R3) ← F6

  • for (i=0; i<N; i=i+1)
      X[i] = X[i] + C

11
Circular Scheduling Contd..
After circular scheduling:

    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- prologue
          R3 ← R3+1
          BEQ R3,R2,Lend
          F4 ← -1(R3)
    -- loop body
    Loop: F6 ← F4+F8
          F4 ← 0(R3)
          R3 ← R3+1
          BNE R3,R2,Loop
          -2(R3) ← F6
    -- epilogue
    Lend: F6 ← F4+F8
          <delay>
          -2(R3) ← F6

Before (original loop):

    -- initialization
          F8 ← C
          R3 ← 0
          R2 ← N
    -- loop body
    Loop: F4 ← 0(R3)
          R3 ← R3+1
          F6 ← F4+F8
          BNE R3,R2,Loop
          <delay>
          -1(R3) ← F6

(Figure label: Circled instructions)
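
For readers who find the register-level listing hard to follow, here is a high-level Python sketch of the same idea for the loop that adds C to every element of X. It is not a translation of the assembly (branch delay slots and address offsets are not modeled); the variable names `f4` and `f6` merely echo the registers in the listing, and `add_constant_pipelined` is an illustrative name.

```python
def add_constant_pipelined(x, c):
    """Add c to every element, with the load circled into the previous iteration."""
    out = list(x)
    n = len(out)
    if n == 0:
        return out
    f4 = out[0]                # prologue: load for the first element
    for i in range(1, n):      # loop body runs N-1 times
        f6 = f4 + c            # add for element i-1
        f4 = out[i]            # load for element i (from the next original iteration)
        out[i - 1] = f6        # store for element i-1
    f6 = f4 + c                # epilogue: finish the last element
    out[n - 1] = f6
    return out


assert add_constant_pipelined([1, 2, 3], 10) == [11, 12, 13]
```

In the loop body the load no longer feeds the add of the same pass, so the add can start immediately and its latency is hidden behind the load and the index update.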
12
Algorithm
  • 1. Apply basic block scheduling to the loop; if no
    stalls are present, use the schedule, otherwise
    continue.
  • 2. If the loop has no procedure calls or
    if-statements, then perform circular scheduling;
    otherwise give up.
  • 3. Select one of the root nodes of the DAG for
    circling; choose one on the longest path (simple
    heuristic).
  • 4. Rebuild the DAG assuming circling has been
    performed.
  • 5. If no stalls are present, use the current
    schedule; else if there are more stalls than
    before, use the previous schedule; else repeat
    steps 3 & 4 to remove additional stalls.
  • 6. Create the prologue & epilogue and alter the
    number of times the loop body is executed.
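
The steps above can be sketched in Python under several simplifying assumptions that are not part of the lecture: a single-issue machine, a greedy list scheduler, register dependences only (no memory dependences, no branch), latencies chosen for illustration, and stalls across the loop back-edge ignored. The names `Op`, `build_dag`, `list_schedule` and `circular_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    dst: str          # register written ("" if none)
    srcs: tuple       # registers read
    latency: int = 1  # cycles until dst is usable


def build_dag(seq):
    """preds[j] = [(i, delay)]: op j must issue at least `delay` cycles after op i."""
    preds = {j: [] for j in range(len(seq))}
    for j, later in enumerate(seq):
        for i in range(j):
            earlier = seq[i]
            if earlier.dst and earlier.dst in later.srcs:
                preds[j].append((i, earlier.latency))   # true dependence
            elif later.dst and (later.dst in earlier.srcs or later.dst == earlier.dst):
                preds[j].append((i, 1))                 # anti/output dependence
    return preds


def path_lengths(seq, preds):
    """Longest latency-weighted path from each op to the end of the DAG."""
    length = [op.latency for op in seq]
    for j in reversed(range(len(seq))):
        for i, _ in preds[j]:
            length[i] = max(length[i], seq[i].latency + length[j])
    return length


def list_schedule(seq):
    """Greedy single-issue scheduling; returns (issue order, number of stall cycles)."""
    preds, issue, cycle, stalls = build_dag(seq), {}, 0, 0
    prio = path_lengths(seq, preds)
    while len(issue) < len(seq):
        ready = [j for j in range(len(seq)) if j not in issue and
                 all(i in issue and issue[i] + d <= cycle for i, d in preds[j])]
        if ready:
            issue[max(ready, key=lambda k: prio[k])] = cycle
        else:
            stalls += 1
        cycle += 1
    return [seq[j].name for j in sorted(issue, key=issue.get)], stalls


def circular_schedule(seq):
    seq, circled = list(seq), []
    best = list_schedule(seq)                       # step 1
    while best[1] > 0:
        preds = build_dag(seq)
        prio = path_lengths(seq, preds)
        roots = [j for j in range(len(seq))
                 if not preds[j] and seq[j].name not in circled]
        if not roots:                               # every root has circled once
            break
        r = max(roots, key=lambda k: prio[k])       # step 3: root on the longest path
        candidate = seq[:r] + seq[r + 1:] + [seq[r]]  # step 4: root moves to the bottom
        result = list_schedule(candidate)
        if result[1] > best[1]:                     # step 5: worse, keep previous schedule
            break
        circled.append(seq[r].name)
        seq, best = candidate, result
    return best[0], best[1], circled


body = [Op("load", "F4", ("R3",), 2), Op("inc", "R3", ("R3",)),
        Op("add", "F6", ("F4", "F8"), 2), Op("store", "", ("F6", "R3"))]
print(list_schedule(body))      # (['load', 'inc', 'add', 'store'], 1)
print(circular_schedule(body))  # (['add', 'inc', 'load', 'store'], 0, ['load'])
```

Step 2 (rejecting loops with calls or if-statements) and step 6 (emitting the prologue and epilogue) are left out of this sketch. Rebuilding the DAG after each circling is what exposes the new freedom: the circled load no longer feeds the add in the same pass of the body.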

13
Register Renaming
  • Since register allocation is done prior to
    circular scheduling, dependences due to register
    usage may inhibit code motion.
  • Solution: perform register renaming during
    circular scheduling.

14
Register Renaming Contd..
  • Identify registers that are not live at the
    beginning and the end of the basic block; these
    registers form the pool of temporary registers
    available for use during renaming.
  • Ignore dependences due to reuse of registers
    while building the DAG.
  • Pick an instruction to schedule.
  • If the instruction uses a temporary register,
    replace that register with the new register (from
    the pool) that was assigned when the Def
    corresponding to this Use was processed. If this
    is the last use, then put the register back in
    the available pool.

15
Register Renaming Contd..
  • If the instruction defines a temporary register,
    a new register is chosen from the available pool
    of registers.
  • Repeat the above steps until the basic block has
    been scheduled.
  • To avoid running out of registers, given two
    candidate instructions, select first the one
    that does not need a new register or that frees
    up a temporary register.
  • If renaming fails, give up and use the previous
    schedule.
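
The renaming steps can be sketched as follows in Python. This is an illustration under stated assumptions, not the lecture's implementation: instructions are `(dst, srcs)` tuples, the scheduled order is supplied by the caller and respects true dependences, and the candidate-selection heuristic from the last slide is not modeled. The function name `rename` and the register names in the example are hypothetical.

```python
def rename(program, order, temps, pool):
    """Rename temporaries while emitting `program` in the scheduled `order`.

    program: list of (dst, srcs) in original order; dst is "" if nothing is written.
    order:   indices into program, in scheduled order.
    temps:   registers not live at the start or end of the block.
    pool:    registers available for renaming.
    Returns the renamed instructions, or None if the pool runs out.
    """
    # Def-use chains from the ORIGINAL order: which Def feeds each Use.
    last_def, reaching, uses_left = {}, [], {}
    for i, (dst, srcs) in enumerate(program):
        row = []
        for s in srcs:
            d = last_def.get(s) if s in temps else None
            row.append(d)
            if d is not None:
                uses_left[d] = uses_left.get(d, 0) + 1
        reaching.append(row)
        if dst in temps:
            last_def[dst] = i

    free, assigned, out = list(pool), {}, []
    for i in order:
        dst, srcs = program[i]
        new_srcs = []
        for s, d in zip(srcs, reaching[i]):
            if d is None:
                new_srcs.append(s)            # not a temporary: keep as is
                continue
            new_srcs.append(assigned[d])      # register chosen at the matching Def
            uses_left[d] -= 1
            if uses_left[d] == 0:
                free.append(assigned[d])      # last use: back into the pool
        if dst in temps:
            if not free:
                return None                   # renaming failed: keep previous schedule
            dst = free.pop(0)
            assigned[i] = dst
            if uses_left.get(i, 0) == 0:
                free.append(dst)              # value is never used in this block
        out.append((dst, tuple(new_srcs)))
    return out


# Two load/store pairs that both use R1; the schedule hoists the second load.
program = [("R1", ("A",)), ("", ("R1",)), ("R1", ("C",)), ("", ("R1",))]
print(rename(program, [0, 2, 1, 3], {"R1"}, ["R1", "R5"]))
# [('R1', ('A',)), ('R5', ('C',)), ('', ('R1',)), ('', ('R5',))]
```

With a pool containing only R1, the same call returns None, which corresponds to the case where renaming fails and the previous schedule is used.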