Enhanced Pipeline Scheduling - PowerPoint PPT Presentation

Provided by: JCK7
Title: Enhanced Pipeline Scheduling


1
Enhanced Pipeline Scheduling
2
Overview of Enhanced Pipeline Scheduling
  • Compiler Scheduling for ILP
  • Basic Idea of EPS and a Generic Example
  • Aggressive DAG scheduling techniques

3
Review of Compiler Instruction Scheduling
  • Extract independent instructions from sequential
    code and group them for parallel execution
  • We expect them to be executed in parallel by H/W
  • Scheduling within a basic block (BB) is not enough to keep
    the H/W busy
  • Need advanced techniques that schedule beyond BBs
  • Classified into two categories based on code type
  • Acyclic code: global DAG scheduling
  • Cyclic code: software pipelining
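
As a rough illustration of grouping independent instructions for parallel execution, here is a minimal sketch of greedy list scheduling for a single basic block. The function name `list_schedule`, the instruction names, the dependence table, and the unit-latency, fixed-issue-width machine are all assumptions made for this example; they are not taken from the slides.

```python
def list_schedule(instrs, deps, width):
    """Greedy cycle-by-cycle scheduling of one basic block.

    instrs: instruction names in program order
    deps:   {instr: set of instrs that must complete first}
    width:  instructions the (hypothetical) hardware issues per cycle
    Every instruction is assumed to take one cycle.
    """
    done = set()
    remaining = list(instrs)
    schedule = []
    while remaining:
        # Ready = all predecessors completed in an earlier cycle.
        ready = [i for i in remaining if deps.get(i, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependence in basic block")
        group = ready[:width]          # issue at most `width` per cycle
        schedule.append(group)
        done |= set(group)
        remaining = [i for i in remaining if i not in group]
    return schedule

# Hypothetical block: c needs a and b, d needs c, e needs a.
block = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "d": {"c"}, "e": {"a"}}
print(list_schedule(block, deps, width=2))
```

With only five instructions and a dependence chain a, c, d, the last cycle issues a single instruction on a 2-wide machine, which is the kind of idle slot that scheduling beyond BB boundaries tries to fill.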

4
DAG Scheduling
  • Schedule instructions beyond BB boundaries
  • Achieved via code motion (CM) across BBs
  • e.g., speculative CM, join CM, branch CM, ...
  • There have been many DAG scheduling techniques
  • trace scheduling, superblock scheduling,
    hyperblock scheduling, boosting, global
    scheduling, selective scheduling, ...

5
Software Pipelining
  • Schedule instructions beyond loop iteration
    boundaries
  • Iterations are overlapped in a pipelined fashion
  • prolog, kernel, and epilog
  • More efficient than unrolling-followed-by-DAG
    scheduling
  • There have been many software pipelining
    techniques...
  • However, there are only two practical techniques
  • modulo scheduling (MS)
  • enhanced pipeline scheduling (EPS)
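
To make the prolog, kernel, and epilog concrete, the following sketch tabulates which (iteration, stage) pairs execute together when iterations overlap. It assumes a new iteration starts every cycle and every stage takes one cycle; the helper names `pipeline_table` and `phase` are made up for this example, and the labels are only meaningful when there are at least as many iterations as stages.

```python
def pipeline_table(num_iters, num_stages):
    """Per cycle, list the (iteration, stage) pairs that run together.

    Assumes one new iteration starts per cycle and each stage
    takes exactly one cycle.
    """
    rows = []
    for cycle in range(num_iters + num_stages - 1):
        row = [(it, cycle - it)
               for it in range(num_iters)
               if 0 <= cycle - it < num_stages]
        rows.append(row)
    return rows

def phase(row, num_stages):
    """Label a cycle as prolog (filling), kernel (full), or epilog (draining)."""
    if len(row) == num_stages:
        return "kernel"
    started = any(stage == 0 for _, stage in row)
    return "prolog" if started else "epilog"

table = pipeline_table(num_iters=4, num_stages=3)
phases = [phase(row, 3) for row in table]
for cycle, (row, name) in enumerate(zip(table, phases)):
    print(cycle, name, row)
```

In the kernel cycles every stage is busy with a different iteration, which is where the overlap pays off.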

6
Enhanced Pipeline Scheduling (EPS)
  • A software pipelining technique based on global
    DAG scheduling, which is very different from MS
  • MS destroys the original loop and creates a new
    loop
  • EPS keeps the given loop and just repeats DAG
    scheduling on it.
  • When instructions are moved across the loop
    back-edge, the pipelining effect takes place.
    We call this cross-iteration code motion (CICM).
  • So, EPS simply defines DAGs in the loop body by
    cutting edges; these DAGs are then scheduled globally.
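
The effect of moving one instruction across the back-edge can be sketched in ordinary code. The loop below, its `+ 4` stride, the test against zero, and the names `original`, `after_cicm`, and `x_next` are assumptions chosen for illustration; the sketch only shows that the next iteration's address update can be hoisted into the current iteration (with a renamed variable and a copy) without changing the result.

```python
def original(mem, x):
    """Scan memory in steps of 4 until a zero word is loaded."""
    while True:
        x = x + 4
        y = mem[x]
        if y == 0:
            return x

def after_cicm(mem, x):
    """Same loop after the update for iteration n+1 crosses the back-edge."""
    x_next = x + 4            # bookkeeping: iteration 1's update on loop entry
    while True:
        x = x_next            # copy left behind by renaming
        x_next = x + 4        # iteration n+1's update, executed speculatively
        y = mem[x]            # independent of the line above, so the two
                              # could issue in the same cycle
        if y == 0:
            return x

# Hypothetical memory image: addresses 4, 8, 12 with a zero terminator.
mem = {4: 7, 8: 3, 12: 0}
print(original(mem, 0), after_cicm(mem, 0))
```

In the second version the update and the load inside the loop body belong to different iterations, which is the pipelining effect described above.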

7
A generic EPS example
[Figure: a generic loop shown after Stage 1, Stage 2, and Stage 3 of EPS. The loop body updates x using the constant 4, performs y = load(x), sets cc from a comparison of y with 0, branches with "if(!cc) goto loop", and ends with "store x @A". Labels mark instructions belonging to iterations n, n+1, and n+2, together with bookkeeping code and copies of x introduced by renaming.]
8
Advantages of EPS
  • EPS can schedule ANY loop
  • Loops with arbitrary control flow
  • Loops whose trip counts are not constant
  • e.g., pointer-chasing loops
  • Outer loops
  • Unstructured loops
  • This is due to its code-motion-based pipelining
  • Can achieve a tight, variable initiation interval (II)
    for multi-path loops
  • Particularly useful for optimizing
    integer code

9
Global DAG Scheduling
  • Any global scheduling technique can be used to
    schedule the DAGs in each stage of EPS, but
  • we use selective scheduling (the most aggressive), which provides:
  • All-path speculative code motion
  • Join code motion
  • Unification
  • Renaming
  • Forward substitution

10
Speculation, join code motion
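[Figure: two control-flow examples, one labeled "Speculation" and one labeled "Join code motion". Each shows instructions that define y, u, and z placed around an IF on cc.]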
11
Renaming
Before: ... add r1,1,r2 ... sub r3,2,r1 ...
After:  ... sub r3,2,r1' ... add r1,1,r2 ... mov r1',r1 ...
[Figure: a second renaming example built from add r1,2,r2, add r2,2,r3, and a mov that copies a renamed r2.]
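
The following sketch checks on a toy register machine that hoisting the sub above the add is safe only when its destination is renamed. It assumes the three-operand form "op src1, src2, dest"; the interpreter `run`, the fresh register name `r1p`, and the initial register values are hypothetical.

```python
def run(program, regs):
    """Execute (op, src1, src2, dest) tuples and return the final registers."""
    regs = dict(regs)

    def val(operand):
        # Register names are strings; anything else is an immediate.
        return regs[operand] if isinstance(operand, str) else operand

    for op, a, b, dest in program:
        if op == "add":
            regs[dest] = val(a) + val(b)
        elif op == "sub":
            regs[dest] = val(a) - val(b)
        elif op == "mov":               # second source is unused
            regs[dest] = val(a)
    return regs

init = {"r1": 10, "r3": 7}

before = [("add", "r1", 1, "r2"), ("sub", "r3", 2, "r1")]
# Plain reordering breaks the add, which must read the OLD r1.
naive = [("sub", "r3", 2, "r1"), ("add", "r1", 1, "r2")]
# Renaming: sub writes a fresh register; a mov restores r1 afterwards.
renamed = [("sub", "r3", 2, "r1p"),
           ("add", "r1", 1, "r2"),
           ("mov", "r1p", None, "r1")]

ref = run(before, init)
out = run(renamed, init)
print(ref, run(naive, init), out)
```

Apart from the extra register `r1p`, the renamed sequence leaves the same values in r1 and r2 as the original, while the naive reordering changes r2.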
12
Forward-substitution
Before: ... mov r1, r2 ... add r2,1,r3 ...
After:  ... add r1,1,r3 ... mov r1, r2 ...
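
A small check of the same transformation on a toy register machine, assuming operands are written "src..., dest" so that "mov r1, r2" copies r1 into r2. The interpreter `run` and the initial register values are hypothetical.

```python
def run(program, regs):
    """Execute (op, src1, src2, dest) tuples and return the final registers."""
    regs = dict(regs)

    def val(operand):
        # Register names are strings; anything else is an immediate.
        return regs[operand] if isinstance(operand, str) else operand

    for op, a, b, dest in program:
        if op == "add":
            regs[dest] = val(a) + val(b)
        elif op == "mov":               # second source is unused
            regs[dest] = val(a)
    return regs

init = {"r1": 4, "r2": 100}

before = [("mov", "r1", None, "r2"), ("add", "r2", 1, "r3")]
# Moving the add up unchanged would read the stale r2.
naive = [("add", "r2", 1, "r3"), ("mov", "r1", None, "r2")]
# Forward substitution: the add reads the mov's source instead.
substituted = [("add", "r1", 1, "r3"), ("mov", "r1", None, "r2")]

print(run(before, init), run(naive, init), run(substituted, init))
```

Replacing r2 by r1 in the add removes its dependence on the mov, so the add is free to move above it.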
13
Unification
  • Simplest form: moving an instruction from below a
    hammock to above the hammock
  • Selective scheduling can perform more sophisticated
    forms of unification

[Figure: hammock example with blocks A and B, involving a load into x and instructions that define y and z from x and y.]
14
Selective Scheduling
  • All these techniques are well merged into a
    single, powerful global scheduling algorithm
  • Can extract more useful parallel instructions
    (even w/o profiling)
  • When combined with EPS, it can maximize the
    scheduling power of EPS
  • References
  • "Parallelizing non-numerical code with selective
    scheduling and software pipelining," ACM TOPLAS,
    Nov. 1997
  • "Unroll-based copy elimination for EPS," IEEE TC,
    Sep. 2002
  • "Split-Path EPS," IEEE TPDS, May 2003