Software Pipelining By Nagaraju Pothineni 2003CSY0001 - PowerPoint PPT Presentation

1
Software Pipelining
  • A term-paper presentation
  • By Nagaraju Pothineni, 2003CSY0001

2
Outline
  • Definition
  • Representation of the program
  • Data Dependency Graph (DDG)
  • Unconstrained scheduling
  • Estimating Initiation Interval
  • Modulo Scheduling
  • Kernel Recognition

3
Definition
  • Software pipelining: a compiler loop optimization
    technique that restructures the loop to achieve a faster
    execution rate by overlapping the execution of
    successive iterations.

4
Data Dependency Graph
  • Node: an operation
  • Arc: a dependency between two operations
  • Types of dependencies:
    • True dependencies (RAW, read after write)
    • Anti dependencies (WAR, write after read)
    • Output dependencies (WAW, write after write)
    • Control dependencies
  • Dependency vs. conflict

5
Data Dependency Graph
  • Program segment:
  • 1. a = b + c
  • 2. f = a - d
  • 3. b = c + d
  • 4. a = b / e

[Figure: DDG of the program segment, with one node per statement (1 to 4) and arcs labeled RAW, WAR, and WAW]

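The dependency types on the previous slide can be checked mechanically for this program segment. The sketch below is illustrative only: it assumes the operators lost from statements 1 and 3 are additions (only the variable names matter here), and `classify_dependencies` and `PROGRAM` are hypothetical names. It compares every ordered pair of statements, so it may also report dependencies that an intervening write would make redundant.

```python
# Each statement is (variable written, set of variables read).
# Assumption: statements 1 and 3 are "a = b + c" and "b = c + d".
PROGRAM = [
    ("a", {"b", "c"}),  # 1. a = b + c
    ("f", {"a", "d"}),  # 2. f = a - d
    ("b", {"c", "d"}),  # 3. b = c + d
    ("a", {"b", "e"}),  # 4. a = b / e
]


def classify_dependencies(statements):
    """Return (earlier, later, kind, variable) for each pairwise dependency.

    Statement numbers are 1-based, matching the slide.
    """
    deps = []
    for i, (write_i, reads_i) in enumerate(statements):
        for j in range(i + 1, len(statements)):
            write_j, reads_j = statements[j]
            if write_i in reads_j:      # later statement reads what i wrote
                deps.append((i + 1, j + 1, "RAW", write_i))
            if write_j in reads_i:      # later statement overwrites what i read
                deps.append((i + 1, j + 1, "WAR", write_j))
            if write_i == write_j:      # both statements write the same variable
                deps.append((i + 1, j + 1, "WAW", write_i))
    return deps


for dep in classify_dependencies(PROGRAM):
    print(dep)
```

Under these assumptions the arcs found are 1→2 (RAW on a), 1→3 (WAR on b), 1→4 (WAW on a), 2→4 (WAR on a), and 3→4 (RAW on b).
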
6
Data Dependency Graph (For Loops)
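  • Two categories of arcs:
    • Loop-independent arc
    • Loop-carried arc
  • In a DFG, all arcs are loop-independent arcs
  • Types of loops:
    • Doall loop (no loop-carried dependencies; iterations
      are independent)
    • Doacross loop (loop-carried dependencies allow only
      partial overlap of iterations)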

7
Data Dependency Graph (For Loops)
  • Representation of arcs in the loops
  • An arc a → b is annotated with a (diff, min) pair
  • diff indicates that the dependency is between a_m and
    b_(m+diff), that is, diff iterations apart
  • min indicates that if a_m is placed at time t, then
    b_(m+diff) can be placed no earlier than t + min
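
One way to read the (diff, min) annotation is as an inequality on placement times. In the sketch below, t(·) for placement time and σ(·) for an operation's offset within its own iteration are notation introduced here, not taken from the slides, and the second line assumes a regular pipeline in which iteration m starts at time m · II.

```latex
\begin{align*}
t(b_{m+\mathit{diff}}) &\ge t(a_m) + \mathit{min} \\
\sigma(b) + (m + \mathit{diff})\, II &\ge \sigma(a) + m\, II + \mathit{min} \\
\sigma(b) - \sigma(a) &\ge \mathit{min} - II \cdot \mathit{diff}
\end{align*}
```

The last line shows why a larger II relaxes every loop-carried arc (diff > 0) while leaving loop-independent arcs (diff = 0) unchanged.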

8
Scheduling
  • Operations from different iterations are
    scheduled together
  • No need to unroll the loop
  • Find the repeating pattern (the kernel), which becomes
    the new loop body
  • Pipeline: iterations execute in parallel

9
Greedy Scheduling
  • Assume no resource constraints
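  • Example 1

for (i = 1; i <= n; i++) {
    O1: a[i+2] = a[i] + 1
    O2: b[i]   = a[i+2] / 2
    O3: c[i]   = b[i] + 3
    O4: d[i]   = c[i]
}

DDG arcs, annotated (diff, min):
    1 → 1  (2,1)
    1 → 2  (0,1)
    2 → 3  (0,1)
    3 → 4  (0,1)

Schedule (rows are time steps, columns are iterations, entries are operation numbers):

Time  it1  it2  it3  it4  it5  it6  it7  it8  it9  it10
I1     1    1
I2     2    2    1    1
I3     3    3    2    2    1    1
I4     4    4    3    3    2    2    1    1      <- kernel
I5               4    4    3    3    2    2    1    1
I6                         4    4    3    3    2    2   ...
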
10
Initiation Interval
  • The new loop body contains all the operations in the
    original loop
  • Initiation Interval (II): the delay between the initiation
    of successive iterations of the new loop, equal to the
    length of the kernel
  • Span of the kernel: the number of iterations from the
    original loop that appear in the kernel
  • Effective Initiation Interval (EII): the average time one
    iteration takes to complete
  • EII = II / Iteration_ct
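
A small worked instance of the EII formula may help. It assumes that Iteration_ct counts the original iterations completed per pass through the kernel (the slide does not define it), and it uses a hypothetical kernel that is one time step long and holds two copies of every operation.

```latex
\[
EII = \frac{II}{\mathit{Iteration\_ct}} = \frac{1}{2} = 0.5 \ \text{time steps per iteration}
\]
```

Under that reading, an EII below 1 simply means more than one original iteration finishes per time step.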

11
Initiation Interval
  • The kernel does not start and stop as the original
    loop does
  • Prelude: instructions before the new loop
  • Postlude: instructions after the new loop
  • L^k = prelude, then K^m, then postlude
  • L: original loop (executed k times)
  • K: kernel (executed m times)
  • m = (k - n + 1) / Iteration_ct
  • n: span

12
Estimating II
  • Resource Constrained Lower bound
  • Dependency constrained Lower bound

13
Resource Constrained LB
  • II ≥ 4

14
Dependency constrained LB
  • Dependencies are transitive
  • A path p whose diff values sum to diff_p and whose
    min values sum to min_p is equivalent to a single arc
    annotated (diff_p, min_p)

15
Methods of computing II
  • Enumerating cycles
  • Shortest path algorithm
  • Iterative shortest path
  • Linear programming

16
Enumerating cycles
  • Find all the cycles c; the maximum of
    min_c / diff_c over those cycles gives II_dep
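
The cycle-enumeration bound can be sketched in a few lines. This is an illustrative sketch, not the presentation's algorithm: `ii_dep` and the arc tuples `(src, dst, diff, min)` are hypothetical names, cycles are found with a plain depth-first search (fine for small graphs), and the result is returned as an exact fraction rather than rounded up.

```python
from fractions import Fraction


def ii_dep(num_nodes, arcs):
    """Maximum of sum(min) / sum(diff) over all simple cycles.

    arcs is a list of (src, dst, diff, min) with nodes numbered 0..num_nodes-1.
    """
    out = {v: [] for v in range(num_nodes)}
    for src, dst, diff, mn in arcs:
        out[src].append((dst, diff, mn))

    best = Fraction(0)

    def dfs(start, node, diff_sum, min_sum, on_path):
        nonlocal best
        for nxt, diff, mn in out[node]:
            d, m = diff_sum + diff, min_sum + mn
            if nxt == start:
                if d == 0:
                    raise ValueError("cycle with zero total diff cannot be scheduled")
                best = max(best, Fraction(m, d))
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than start so each cycle is found once.
                dfs(start, nxt, d, m, on_path | {nxt})

    for s in range(num_nodes):
        dfs(s, s, 0, 0, {s})
    return best


# A four-operation chain with a self-arc (2,1) on the first operation.
chain = [(0, 0, 2, 1), (0, 1, 0, 1), (1, 2, 0, 1), (2, 3, 0, 1)]
print(ii_dep(4, chain))                    # 1/2
print(ii_dep(4, chain + [(2, 0, 1, 1)]))   # 3
```

The second call adds a hypothetical arc from the third operation back to the first, creating a cycle with total min 3 and total diff 1, which then dominates the bound.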

17
Shortest path algorithm
  • Find the transitive closure of dependency
    constraints of the graph
  • Uses Floyd's all-pairs shortest path algorithm

18
Transitive closure of a graph

19
Iterative shortest path
  • Assume an II and find the transitive closure
  • If the II is not adequate, increment it and try
    again
  • The transitive closure is found using path algebra
  • M = [m_IJ], where m_IJ gives the number of time
    steps by which I and J must be separated
  • An arc (diff, min) means the operations must be
    separated by at least min - II × diff
20
Iterative shortest path
  • M^2 gives the minimum number of time steps by which
    operations must be separated, considering paths of length 2
  • Similarly calculate M^i, where i is the maximum
    path length in the graph
  • For each m_IJ, take the largest (most constraining) of
    M_IJ, M^2_IJ, M^3_IJ, ...
  • If all m_II are non-positive, then the II is adequate
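
The iterative test can be sketched as follows. Rather than forming the matrix powers explicitly, this sketch swaps in a Floyd-Warshall style closure over (max, +), which yields the same most-constraining separations whenever no cycle has positive weight. The names `closure_for_ii` and `smallest_adequate_ii`, the arc tuple layout `(src, dst, diff, min)`, and the `max_ii` cutoff are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def closure_for_ii(num_ops, arcs, ii):
    """Largest required separation between every pair of operations for a trial II."""
    m = [[NEG_INF] * num_ops for _ in range(num_ops)]
    for src, dst, diff, mn in arcs:
        # An arc (diff, min) requires a separation of at least min - II * diff.
        m[src][dst] = max(m[src][dst], mn - ii * diff)
    for k in range(num_ops):
        for i in range(num_ops):
            for j in range(num_ops):
                # Paths through k: separations add, and the largest one binds.
                m[i][j] = max(m[i][j], m[i][k] + m[k][j])
    return m


def smallest_adequate_ii(num_ops, arcs, max_ii=64):
    """Increment II until every diagonal entry of the closure is non-positive."""
    for ii in range(1, max_ii + 1):
        m = closure_for_ii(num_ops, arcs, ii)
        if all(m[v][v] <= 0 for v in range(num_ops)):
            return ii, m
    raise ValueError("no adequate II found up to max_ii")


# Hypothetical three-operation cycle: 0 -> 1 -> 2 -> 0, the last arc loop-carried.
cycle = [(0, 1, 0, 1), (1, 2, 0, 1), (2, 0, 1, 1)]
ii, closure = smallest_adequate_ii(3, cycle)
print(ii)             # 3
print(closure[0][2])  # 2
```

For II = 1 or 2 the cycle has positive total weight (3 - II), so some diagonal entry is positive and the trial II is rejected; at II = 3 the diagonal is zero and the closure gives the relative positions.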

21
Iterative shortest path (Example)
[Figure: two worked cases, one with an incorrect II and one with an adequate II]

22
Linear Programming
  • For each arc a → b, write the inequality
  • M_a,b ≥ min - II × diff
  • Objective function: minimize II
  • Solve using LP
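
A small instance shows what such a linear program might look like. The sketch assumes that the separation M_{a,b} is expressed as a difference of start offsets σ_b − σ_a (notation introduced here), and it uses a hypothetical three-operation cycle x → y (0,1), y → z (0,1), z → x (1,1).

```latex
\begin{align*}
\text{minimize}\quad & II \\
\text{subject to}\quad
& \sigma_y - \sigma_x \ge 1 - II \cdot 0 \\
& \sigma_z - \sigma_y \ge 1 - II \cdot 0 \\
& \sigma_x - \sigma_z \ge 1 - II \cdot 1
\end{align*}
```

Adding the three constraints cancels every σ and leaves 0 ≥ 3 − II, so the optimum is II = 3, the same value the cycle ratio min/diff = 3/1 gives.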

23
Modulo Scheduling
  • Basic Scheduling Algorithm
  • Modulo scheduling via hierarchical reduction
  • Path Algebra
  • Predicated Modulo scheduling

24
Modulo Scheduling
  • Generate a flat schedule, taking into account
    resource conflicts and data dependencies
  • Every iteration uses an identical flat schedule
  • Regular pipelining
  • Each original iteration starts II time
    steps after the previous iteration
  • As a result, operations whose flat-schedule times are
    equal modulo II are scheduled together
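
The folding of a flat schedule into a kernel can be sketched as below. This is a minimal illustration assuming the flat schedule is already legal; `modulo_kernel` and the operation names are hypothetical, and resource checking is left out.

```python
def modulo_kernel(flat_schedule, ii):
    """Fold a flat schedule {operation: time} into a kernel with ii rows.

    Each kernel entry is (operation, stage): the operation runs in row
    time % ii and belongs to the iteration started `stage` kernel passes earlier.
    """
    kernel = [[] for _ in range(ii)]
    for op, t in sorted(flat_schedule.items(), key=lambda item: item[1]):
        kernel[t % ii].append((op, t // ii))
    return kernel


# Hypothetical flat schedule: four operations, one per time step.
flat = {"O1": 0, "O2": 1, "O3": 2, "O4": 3}
for row in modulo_kernel(flat, 2):
    print(row)
# [('O1', 0), ('O3', 1)]
# [('O2', 0), ('O4', 1)]
```

With II = 2, O1 of one iteration shares a row with O3 of the previous iteration, because their flat-schedule times 0 and 2 are equal modulo 2.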

25
Modulo Scheduling - Example
[Figure: a flat schedule and the resulting modulo schedule with II = 2]

26
Modulo scheduling via Hierarchical reduction
  • DDG is modified.
  • Nodes are strongly connected components of
    original DDG
  • Draw an arc between two nodes if there is an
    edge in the original DDG between the two sets of
    nodes
  • Each strongly connected component is scheduled
    using modulo scheduling
  • Apply List scheduling to modified DDG
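
The graph-reduction step can be sketched as follows. The code only builds the reduced DDG; scheduling each component and list-scheduling the result are not shown. It uses Kosaraju's two-pass algorithm as one possible way to find the components, and `strongly_connected_components`, `reduce_ddg`, and the sample graph are hypothetical names and data. The recursion is adequate for small graphs only.

```python
def strongly_connected_components(nodes, edges):
    """Map each node to a representative of its strongly connected component."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    order, seen = [], set()

    def visit(v):  # first pass: record nodes in order of DFS completion
        seen.add(v)
        for w in succ[v]:
            if w not in seen:
                visit(w)
        order.append(v)

    for v in nodes:
        if v not in seen:
            visit(v)

    comp = {}

    def assign(v, root):  # second pass: walk the reversed graph
        comp[v] = root
        for w in pred[v]:
            if w not in comp:
                assign(w, root)

    for v in reversed(order):
        if v not in comp:
            assign(v, v)
    return comp


def reduce_ddg(nodes, edges):
    """Collapse each component to one node; keep only arcs between components."""
    comp = strongly_connected_components(nodes, edges)
    reduced_nodes = {}
    for v in nodes:
        reduced_nodes.setdefault(comp[v], []).append(v)
    reduced_edges = {(comp[a], comp[b]) for a, b in edges if comp[a] != comp[b]}
    return reduced_nodes, reduced_edges


# Hypothetical DDG: operations 1 and 2 form a cycle, 3 and 4 hang off it.
groups, arcs = reduce_ddg([1, 2, 3, 4], [(1, 2), (2, 1), (2, 3), (3, 4)])
print(groups)  # {1: [1, 2], 3: [3], 4: [4]}
print(arcs)
```

The reduced graph is acyclic by construction, which is what makes list scheduling applicable to it.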

27
Modulo scheduling via Hierarchical reduction - Example

28
Path Algebra
  • Mathematical formulation of modulo scheduling
  • Construct the matrix M = [m_IJ], where m_IJ represents
    the position of O_J relative to O_I
  • If the chosen II is feasible, generate the flat
    schedule from the matrix; otherwise increment II
    and try again
  • Limitation: Resource constraints are considered

29
Predicated Modulo Scheduling
  • Schedules loops containing predicates
  • Resources must be available for the operations on
    every branch of each decision
  • Requires hardware support

30
Kernel Recognition
  • Unroll the loop and note dependencies
  • Schedule operations as early as possible
  • Find a block which is repeating
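
The three steps can be sketched on the four-operation loop from the greedy-scheduling example (arcs 1→1 (2,1), 1→2 (0,1), 2→3 (0,1), 3→4 (0,1)). This is an illustrative sketch under stated assumptions: no resource constraints, operations listed in an order compatible with the loop-independent arcs, iterations and time steps numbered from 0, and hypothetical function names. The pattern search is a simple heuristic that takes the first block of rows that repeats with a constant iteration shift.

```python
def asap_rows(ops, arcs, iterations):
    """Unroll the loop and place every operation as early as possible.

    arcs: (src, dst, diff, min). Returns one sorted list of
    (operation, iteration) pairs per time step.
    """
    start = {}
    for it in range(iterations):
        for op in ops:
            t = 0
            for src, dst, diff, mn in arcs:
                if dst == op and it - diff >= 0:
                    t = max(t, start[(src, it - diff)] + mn)
            start[(op, it)] = t
    rows = [[] for _ in range(max(start.values()) + 1)]
    for (op, it), t in start.items():
        rows[t].append((op, it))
    return [sorted(row) for row in rows]


def normalize(row):
    """Express a row relative to the lowest iteration number it contains."""
    if not row:
        return 0, []
    base = min(it for _, it in row)
    return base, [(op, it - base) for op, it in row]


def find_kernel(rows):
    """Return (first_row, length, iterations_per_pass) of a repeating block."""
    for length in range(1, len(rows) // 2 + 1):
        for first in range(len(rows) - 2 * length + 1):
            shifts = set()
            for t in range(first, first + length):
                base0, pattern0 = normalize(rows[t])
                base1, pattern1 = normalize(rows[t + length])
                if pattern0 != pattern1:
                    break
                shifts.add(base1 - base0)
            else:
                if len(shifts) == 1 and min(shifts) > 0:
                    return first, length, shifts.pop()
    return None


ARCS = [(1, 1, 2, 1), (1, 2, 0, 1), (2, 3, 0, 1), (3, 4, 0, 1)]
rows = asap_rows([1, 2, 3, 4], ARCS, iterations=12)
print(find_kernel(rows))  # (3, 1, 2)
```

The result says the block starts at the fourth time step, is one row long (II = 1), and advances by two iterations per pass.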

31
References
  • Vicki H. Allan, Reese B. Jones, Randall M. Lee, and
    Stephen J. Allan. Software Pipelining. ACM Computing
    Surveys, September 1995.

32
Thank You