Iterative Modulo Scheduling - PowerPoint PPT Presentation

Provided by: KimHaz

Title: Iterative Modulo Scheduling


1
Iterative Modulo Scheduling
  • CS 771 Optimizing Compilers
  • Fall 2005 Lecture 16

2
Homework 2
  • Min: 84, Max: 108

3
Project Proposals
  • 7 proposals for 14 people
  • 1-3 people per project
  • Jikes RVM: 5, GCC: 2
  • Topics
    • Security: 2
    • Fault tolerance: 1
    • Software test: 1
    • Compiler optimizations: 2
    • Architectural support: 1

4
Project Proposals Continued
  • Evaluation Criteria (30 points)
    • Overall idea: 8-10
    • Demonstrated research potential: 6-10
    • Document flow/clarity/style: 7-10
  • Deadlines
    • Final presentations: last class or exam slot
    • Final paper (10 pages, double-column): one day before grades are due

5
Midterm
  • Next Thursday (10/27) in class
  • Short answer and problems (just like homework)
  • Will review topics covered on Tuesday (10/25)
  • No LCM (high level concepts are fair game)

6
Upcoming Talks
  • Today at 3:30 in MEC 205
    • Mark Levoy, "Light field photography and videography"
  • Tomorrow at 2:00 in Newcomb Hall South
    • Mark Levoy, "Digital Michelangelo & Digital Forma Urbis Romae"
  • Tomorrow at 3:30
    • My CS 696 talk on "Solving Tomorrow's Computing Challenges using Symbiotic Optimization"

7
Last Time: Global Scheduling
  • Speculation
  • Superblock scheduling
  • Software pipelining
  • Modulo scheduling

8
Recall: Software Pipelining
  • Schedule (building block); successive copies of the building block start 2 cycles apart
  • Cycles for 50 iterations
    • first iteration: 5 cycles; each subsequent iteration: 2 additional cycles
    • 5 cycles + 49 × 2 cycles = 103 cycles
  • See [Rau 92] for general formula
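
The cycle count on this slide can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only; the function and parameter names are made up for illustration and are not taken from [Rau 92].

```python
def pipelined_cycles(iterations, first_iteration_cycles, interval):
    """Cycles needed when each iteration starts `interval` cycles after the previous one.

    The first iteration pays its full length; every later iteration adds
    only `interval` cycles because it overlaps with its predecessors.
    """
    if iterations <= 0:
        return 0
    return first_iteration_cycles + (iterations - 1) * interval


print(pipelined_cycles(50, 5, 2))  # 103, matching 5 + 49 * 2
```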

[Figure: software-pipelined schedule over cycles 0-10; each iteration issues loads (L) followed by a store (S), and successive iterations start 2 cycles apart.]
9
Modulo Scheduling
  • A very regular form of software pipelining
    • loop iterations use the same schedule
    • loop iterations are initiated at a constant rate
  • Advantages of modulo scheduling
    • high performance
      • high throughput (steady-state performance)
      • short schedule length (transient-state performance)
    • simple form of software pipelining
      • efficient scheduling algorithm
      • simple bounds on throughput and register requirements
    • compact code
      • no code replication with sufficient hardware support

10
Modulo Scheduling Concepts
[Figure: dependence graph and schedule of one iteration over cycles 0-7, divided into Stage 0 through Stage 3 (SC = SL/II). A trace of iterations 0-3 over time 0-12 shows each iteration's ld, -, and st operations starting one Initiation Interval (II) after the previous iteration. The Modulo Reservation Table (MRT) is shown alongside.]
11
Modulo Scheduling Concepts (cont.)
  • Initiation Interval (II)
    • number of cycles between the starts of two consecutive iterations
    • constant
  • Modulo Reservation Table (MRT)
    • compact table (II rows, 1 column per resource) to track resource usage
    • used to find a modulo schedule that satisfies the resource constraints of the machine
  • Prologue/Epilogue
    • period of cycles required to fill/drain the software pipeline
    • (Stage count - 1) × II cycles
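
To make the MRT concrete, here is a small Python sketch. It assumes one unit per resource and that an operation occupies its resource for a single cycle; the class and resource names ("mem", "alu") are hypothetical. An operation issued at time t occupies row t mod II, which is why two operations II cycles apart collide.

```python
class ModuloReservationTable:
    """II rows, one column per resource; a use at time t occupies row t % II."""

    def __init__(self, ii, resources):
        self.ii = ii
        self.table = [{r: None for r in resources} for _ in range(ii)]

    def is_free(self, time, resource):
        return self.table[time % self.ii][resource] is None

    def reserve(self, time, resource, op):
        """Record `op` in the table; return False if the slot is already taken."""
        if not self.is_free(time, resource):
            return False
        self.table[time % self.ii][resource] = op
        return True


def fill_or_drain_cycles(stage_count, ii):
    """Length of the prologue (or epilogue): (stage count - 1) * II."""
    return (stage_count - 1) * ii


mrt = ModuloReservationTable(ii=2, resources=["mem", "alu"])
print(mrt.reserve(0, "mem", "ld"))  # True
print(mrt.reserve(4, "mem", "st"))  # False: 4 % 2 == 0, the row already used by ld
print(mrt.reserve(5, "mem", "st"))  # True: row 1 is still free
print(fill_or_drain_cycles(4, 2))   # 6
```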

12
Iterative Modulo Scheduling Approach
  • Overview of algorithm [Rau, MICRO 92]
  • Compute lower bound on II (MII, for Minimum II)
    • due to resources (ResMII, for Resource MII)
    • due to latencies or recurrences (RecMII, for Recurrence MII)
  • Try to find a schedule with II = MII
  • If the attempt fails, try again with a larger II

13
Minimum II due to Resources (ResMII)
  • Dependence graph and reservation tables
  • Compute ResMII
    • max among all resources of ceiling(number of uses of the resource per iteration / number of available units of that resource)
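
The ResMII bound can be sketched in Python as follows. The resource names and counts in the example are hypothetical; they are chosen so that one resource is needed twice per iteration with a single unit available, which forces iterations to start at least 2 cycles apart.

```python
from math import ceil


def res_mii(uses_per_iteration, units_available):
    """Max over resources of ceil(uses per iteration / available units)."""
    return max(
        (ceil(uses / units_available[r]) for r, uses in uses_per_iteration.items()),
        default=1,
    )


uses = {"r0": 1, "r1": 2, "r2": 1, "r3": 1}   # hypothetical usage counts
units = {"r0": 1, "r1": 1, "r2": 1, "r3": 1}  # one unit of each resource
print(res_mii(uses, units))  # 2
```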

[Figure: dependence graph (operations a and b) and reservation tables plotted as time (t0-t3) against resource (r0-r3), overlaid for 1, 2, and 3 iterations. Due to resources, iterations cannot be initiated less than 2 cycles apart.]
14
Minimum II due to Resources (ResMII)
  • Dealing with alternatives
    • one operation may be executed on multiple functional units with different resource usages
    • Example: "move r2 to r1"
      • r1 = r2
      • r1 = r2 + 0
      • r1 = r2 × 1
  • Algorithm (approximation of ResMII)
    • order operations by increasing number of alternatives
    • for (each operation, in order) do
      • select alternative that yields the lowest ResMII
    • end
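
The greedy approximation above might look like this in Python. It is a sketch under simplifying assumptions: each alternative is a dictionary from resource name to the number of cycles it occupies that resource, ties are broken by taking the first alternative listed, and the operation and resource names are invented for the example.

```python
from math import ceil


def approx_res_mii(ops, units):
    """ops: {operation: [alternative, ...]}, alternative: {resource: cycles used}."""
    usage = {r: 0 for r in units}

    def mii_if_chosen(alt):
        # ResMII that would result from adding this alternative to the usage so far.
        return max(ceil((usage[r] + alt.get(r, 0)) / units[r]) for r in units)

    choice = {}
    # Operations with the fewest alternatives are the most constrained: place them first.
    for name, alts in sorted(ops.items(), key=lambda item: len(item[1])):
        best = min(alts, key=mii_if_chosen)
        for r, cycles in best.items():
            usage[r] += cycles
        choice[name] = best
    return max(ceil(usage[r] / units[r]) for r in units), choice


units = {"alu": 1, "mul": 1}
ops = {
    "move": [{"alu": 1}, {"mul": 1}],  # e.g. r1 = r2 + 0 on the ALU, or r1 = r2 * 1 on the multiplier
    "add1": [{"alu": 1}],
    "add2": [{"alu": 1}],
}
mii, chosen = approx_res_mii(ops, units)
print(mii, chosen["move"])  # 2 {'mul': 1}
```

Placing the two adds first leaves the ALU with two uses, so the move is steered to the multiplier and the bound stays at 2 instead of rising to 3.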

15
Recurrence Constraints
  • Dependences
    • Inter-iteration dependences
    • Intra-iteration dependences
    • Anti and output dependences are assumed to have been eliminated
  • Recurrence: an operation in one iteration depends on the same operation in a previous iteration
    • Direct or indirect
    • Data or control dependence
  • Distance: number of iterations separating the two dependent instructions (0 = same iteration)

16
Minimum II due to Recurrences (RecMII)
  • Compute Recurrence Minimum II (RecMII)
    • Delay(c) = sum of latencies along cycle c
    • Distance(c) = sum of dependence distances along cycle c
    • smallest RecMII for which RecMII × Distance(c) ≥ Delay(c), for all cycles c
    • equivalently, max among all cycles c of ceiling(Delay(c) / Distance(c))

[Figure: two examples, each showing a dependence graph and its schedule. Operations a and b form a cycle whose edges are labeled with latencies and dependence distances.]
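
A short Python sketch of the RecMII bound follows. It assumes the dependence cycles have already been enumerated and are passed in as lists of (latency, distance) edges; finding the cycles in a dependence graph is a separate step not shown here, and the numbers are hypothetical.

```python
from math import ceil


def rec_mii(cycles):
    """cycles: list of dependence cycles, each a list of (latency, distance) edges.

    Returns max over cycles of ceil(Delay(c) / Distance(c)), or 0 when
    there are no recurrences (no constraint from this bound).
    """
    bound = 0
    for edges in cycles:
        delay = sum(latency for latency, _ in edges)
        distance = sum(dist for _, dist in edges)
        if distance == 0:
            raise ValueError("a dependence cycle must span at least one iteration")
        bound = max(bound, ceil(delay / distance))
    return bound


# a -> b with latency 3 in the same iteration, b -> a with latency 1 across one iteration
print(rec_mii([[(3, 0), (1, 1)]]))                    # 4
print(rec_mii([[(3, 0), (1, 1)], [(2, 0), (3, 2)]]))  # 4, the second cycle only needs 3
```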
17
Effective Dependence Latency
  • What is the latency of a dependence?
    • def and use operation in same iteration: latency of operation
    • def operation in this iteration, use operation x iterations later: latency - x × II
  • Example

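[Figure: trace of iterations 0-3 over time 0-6, with ld operations and a dependence of distance 1.]
The effective latency between the two dependent operations within a single iteration is 4 - 1 × 2 = 2 (latency 4, distance 1, II = 2), because the dependence really spans one iteration.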
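
The same arithmetic as a tiny Python helper (the function name is made up for illustration):

```python
def effective_latency(latency, distance, ii):
    """Latency seen within one iteration's schedule for a dependence
    whose consumer runs `distance` iterations after its producer."""
    return latency - distance * ii


print(effective_latency(4, 1, 2))  # 2: the consumer's iteration starts II = 2 cycles later
print(effective_latency(4, 0, 2))  # 4: same iteration, the full operation latency applies
```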
18
Iterative Modulo Scheduler
[Figure: scheduling attempts at II = 4 and II = 6.]
  • greedy: on failure, increase II
  • iterative: unschedule conflicting ops and reschedule them later
19
Iterative Modulo Scheduler (cont.)
  • Algorithm
    II = minimum feasible initiation interval (MII)
    while (true) do
      initialize schedule and budget
      while (not all operations scheduled and budget > 0) do
        op = highest-priority operation
        min-time = earliest scheduling time of op
        max-time = min-time + II - 1
        time-slot = find time slot for op between min-time and max-time
        schedule op at time-slot, unscheduling all conflicting ops
        budget = budget - 1
      od
      if (scheduled all operations) then break fi
      II = II + 1
    od
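
The loop above can be sketched in Python. This is a simplified illustration, not the implementation from [Rau, MICRO 92]: it assumes each operation uses one unit of one resource for one cycle, that `ops` is already listed in priority order, and that the budget is a fixed multiple of the number of operations. All names and the example loop body are hypothetical.

```python
def try_schedule(ops, deps, units, ii, budget):
    """ops: {name: resource} in priority order; deps: (src, dst, latency, distance)."""
    if any(s == d and lat > dist * ii for s, d, lat, dist in deps):
        return None  # a self-recurrence cannot be satisfied at this II
    sched, last = {}, {}

    def row_users(res, t):
        return [o for o, s in sched.items() if ops[o] == res and s % ii == t % ii]

    while len(sched) < len(ops) and budget > 0:
        op = next(o for o in ops if o not in sched)  # highest-priority unscheduled op
        res = ops[op]
        min_t = max([0] + [sched[s] + lat - dist * ii
                           for s, d, lat, dist in deps if d == op and s in sched])
        free = [t for t in range(min_t, min_t + ii) if len(row_users(res, t)) < units[res]]
        if free:
            slot = free[0]
        else:
            # No free slot: force a slot (later than last time) and evict resource conflicts.
            slot = max(min_t, last[op] + 1) if op in last else min_t
            for o in row_users(res, slot):
                del sched[o]
        sched[op] = last[op] = slot
        # Evict any scheduled neighbor whose dependence with op is now violated.
        for s, d, lat, dist in deps:
            if op in (s, d) and s != d and s in sched and d in sched:
                if sched[d] - sched[s] < lat - dist * ii:
                    del sched[d if s == op else s]
        budget -= 1
    return sched if len(sched) == len(ops) else None


def modulo_schedule(ops, deps, units, mii, budget_ratio=3):
    ii = mii
    while True:
        sched = try_schedule(ops, deps, units, ii, budget_ratio * len(ops))
        if sched is not None:
            return ii, sched
        ii += 1  # attempt failed within budget: retry with a larger II


ops = {"ld": "mem", "add": "alu", "sub": "alu", "st": "mem"}
deps = [("ld", "add", 2, 0), ("add", "sub", 1, 0), ("sub", "st", 1, 0)]
print(modulo_schedule(ops, deps, {"mem": 1, "alu": 1}, mii=2))
# (2, {'ld': 0, 'add': 2, 'sub': 3, 'st': 5})
```

In the example the store cannot go at time 4 because row 4 mod 2 = 0 of the memory unit is held by the load, so it lands at time 5.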

20
Iterative Modulo Scheduler (cont.)
  • Benchmark example
    • 1327 loops from the Perfect Club (1002), SPEC-89 (298), and Livermore (27)
    • compiled for the Cydra 5 machine
  • Optimizations
    • load-store elimination
    • recurrence back-substitution
    • IF-conversion

21
What is IF-Conversion?
  • Many branches can be removed if we have
    architectural support for predication
  • Converts control dependences to data dependences

Before IF-conversion:
  B1: Inst 1; Inst 2; If (A) B2 else B3
  B2: Inst 3; Inst 4; Goto B4
  B3: Inst 5; Inst 6; Goto B4
  B4: Inst 7; Inst 8; Inst 9
After IF-conversion:
  Inst 1; Inst 2; (A) Inst 3; (A) Inst 4; (!A) Inst 5; (!A) Inst 6; Inst 7; Inst 8; Inst 9
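
As a rough illustration of the idea, the Python sketch below models the same computation twice: once with a branch, and once as straight-line code in which each guarded operation takes effect only when its predicate is true. The arithmetic stands in for the instructions of blocks B1-B4 and is invented for the example; real predication is a hardware feature, which Python's conditional expressions only imitate.

```python
def with_branch(a, x):
    x += 1          # B1
    if a:
        x *= 2      # B2, reached only when A holds
    else:
        x -= 3      # B3, reached only when A does not hold
    return x + 10   # B4


def if_converted(a, x):
    x += 1                      # B1
    p, not_p = a, not a         # predicates replace the branch
    x = x * 2 if p else x       # (A)  guarded operation from B2
    x = x - 3 if not_p else x   # (!A) guarded operation from B3
    return x + 10               # B4


print(with_branch(True, 4), if_converted(True, 4))    # 20 20
print(with_branch(False, 4), if_converted(False, 4))  # 12 12
```

The control dependence on the branch has become a data dependence on the predicates p and not_p.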
22
Code Generation for Modulo Schedules
  • Depending on the machine support, different code
    generation techniques are used
    • rotating registers
      • r1 becomes r2, r2 becomes r3, ..., r32 becomes r0
    • special branch operation and predicated operations
      • special branch manipulates predicates controlling the prologue and epilogue

23
No Support for Modulo Scheduling
  • Example
  • how many registers are needed?

[Figure: trace of overlapped iterations of a loop body with ld, -, and st operations.]
24
Minimum Unrolling due to Reg Renaming
  • Minimum unrolling for registers (Modulo Variable Expansion)
    • Kmin = max over all lifetimes i of ceiling((end_i - start_i + 1) / II)
    • previous example: Kmin
  • Code generation
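
The minimum unrolling factor can be computed as below. The sketch reads the slide's formula as ceiling((end - start + 1) / II), which is an assumption about the garbled original, and the lifetimes in the example are hypothetical rather than taken from the slide's figure.

```python
from math import ceil


def min_unroll(lifetimes, ii):
    """lifetimes: (start, end) cycle pairs of register values within one iteration.

    A value that stays live for more than II cycles overlaps the value
    produced by the next iteration, so it needs more than one register name.
    """
    return max((ceil((end - start + 1) / ii) for start, end in lifetimes), default=1)


print(min_unroll([(0, 5), (2, 3)], ii=2))  # 3: the first lifetime covers 6 cycles
```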

[Figure: code generation example; the loop body (ld, -, and st operations) is divided into stages A, B, C, and D.]
25
Loop Trip Count
  • Loop unrolled K times, has SC stages
    • will execute (SC - 1) + K × i iterations
  • Given a loop that is executed L times
    • preconditioning loop will execute M times
      • M = L if L < SC - 1
      • M = (L - (SC - 1)) mod K otherwise
    • modulo scheduled loop then executes (L - M) / K times (here K = 3, SC = 4)
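
The split between the preconditioning loop and the modulo scheduled loop can be sketched as follows. The sketch assumes the garbled operator in the formula for M is a modulo, and instead of the slide's (L - M) / K count it returns the i for which the pipelined loop runs (SC - 1) + K × i iterations; the function name is made up.

```python
def precondition_split(L, K, SC):
    """Return (M, i): M iterations in the preconditioning loop, and i such that
    the modulo scheduled loop covers the remaining (SC - 1) + K * i iterations.
    i is None when the trip count is too short to enter the pipelined loop."""
    if L < SC - 1:
        return L, None
    M = (L - (SC - 1)) % K
    i = (L - M - (SC - 1)) // K
    return M, i


print(precondition_split(2, K=3, SC=4))   # (2, None)
print(precondition_split(10, K=3, SC=4))  # (1, 2): 1 + 3 + 3 * 2 == 10
print(precondition_split(12, K=3, SC=4))  # (0, 3): 0 + 3 + 3 * 3 == 12
```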

26
Code without Preconditioning
  • Problems with preconditioning [Rau, MICRO 92]
    • preconditioned loop is not software pipelined
    • large performance loss if short trip count and large unrolling factors
    • code expansion is large
  • Preconditioned loop can be eliminated
    • code expansion is even larger

[Figure: code without preconditioning for stages A-D unrolled 3 times (A1-A3, B1-B3, C1-C3, D1-D3), with separate exit paths for trip counts of 1, 2, 3, 3i + 1, and 3i + 2.]
27
Rotating Registers
  • Hardware
    • Register + ICP = Physical Register
    • ICP (iteration control pointer) is decremented at the branch of the software pipelined loop
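
A minimal Python model of a rotating register file is shown below. It assumes the physical register is found by adding the ICP to the logical register number and wrapping around the file size; the wrap-around and the class and method names are assumptions made for the sketch.

```python
class RotatingRegisterFile:
    def __init__(self, size):
        self.phys = [None] * size
        self.icp = 0  # iteration control pointer

    def _index(self, logical):
        return (logical + self.icp) % len(self.phys)

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def loop_branch(self):
        """Model the loop-closing branch: decrement ICP, rotating the names."""
        self.icp = (self.icp - 1) % len(self.phys)


rf = RotatingRegisterFile(8)
rf.write(1, "value from iteration 0")
rf.loop_branch()
print(rf.read(2))  # value from iteration 0: what was r1 is now named r2
rf.write(1, "value from iteration 1")  # lands in a different physical register
print(rf.read(2))  # value from iteration 0 (not overwritten)
```

Each iteration can write the same logical register without clobbering the value from the previous iteration, which is why no unrolling is needed for register renaming.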
  • Example

[Figure: rotating register file contents as ICP is decremented from 6 to 2.]
28
Allocation of Rotating Registers (cont.)
  • Algorithm [Rau et al., PLDI 94]
    • place lifetimes in a 2-dimensional graph such that no lifetimes overlap
  • Once an allocation is found
    • set each def and use of an operand using the logical register name
    • note that over a lifetime, a new logical name is needed each time the branch is encountered

29
Rotating Registers / No Precondition
  • Code generation
  • no unrolling is needed.
  • preconditioning may still be needed to handle
    trip count smaller than SC
  • may replicate code to avoid preconditioning
  • Example

[Figure: code example built from overlapped stages A, B, C, and D.]
30
Branch Support for Modulo Scheduling
  • Special branch
    • LC: loop count
    • ESC: stages to drain pipe
    • ICP: for rotating registers
    • Pred: guards operations in loop

BRTOP:
  • if LC > 0: ICP--, Pred = 1, LC--, branch
  • else if ESC > 0: ICP--, Pred = 0, ESC--, branch
  • else: branch not taken
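
The BRTOP decision logic can be simulated in a few lines of Python. This is a sketch of the flowchart only; the dataclass and its field names are invented, and details such as which stage each predicate value guards are left out.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    lc: int        # loop count: iterations still to be started
    esc: int       # epilogue stage count: stages still needed to drain the pipe
    icp: int = 0   # iteration control pointer for rotating registers
    pred: int = 1  # predicate written for the newly started stage


def brtop(s):
    """Apply one BRTOP; return True if the branch back to the loop top is taken."""
    if s.lc > 0:
        s.icp -= 1
        s.pred = 1  # a new iteration starts: its operations are enabled
        s.lc -= 1
        return True
    if s.esc > 0:
        s.icp -= 1
        s.pred = 0  # draining: no new iteration, its operations are disabled
        s.esc -= 1
        return True
    return False


state = LoopState(lc=3, esc=2)
preds = []
while brtop(state):
    preds.append(state.pred)
print(preds)      # [1, 1, 1, 0, 0]
print(state.icp)  # -5: one rotation per taken branch
```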
31
Modulo Schedules for While Loops
  • First approach: no speculation across iterations

[Figure: while-loop schedule with stages A-D (ld, -, and st operations) and II = 6.]
32
Modulo Schedules for While Loops (cont.)
  • Second approach: with speculation across iterations
    • stage D is nonspeculative because of the store
    • all nonspeculative operations must be after the branch
    • search for a modulo schedule as for do-loops

[Figure: while-loop schedule with stages A-D and II = 2, with the speculative stages marked.]
33
Modulo Schedules for While Loops (cont.)
  • Code generation issues
    • remaining stages of speculatively initiated loop iterations are discarded
  • Examples
    • branch is at the end of stage C
    • branch is at the end of stage B

[Figure: two code layouts of stages A-D, one per example, marking the iterations that are kept and the discarded stages (x).]
34
Summary
  • Modulo scheduling: a regular form of software pipelining (SWP)
  • Generating a schedule
  • Allocating registers for the schedule
  • Effects of hardware support