Enhanced Pipeline Scheduling

Provided by: JCK7

1
Enhanced Pipeline Scheduling
2
Overview of Enhanced Pipeline Scheduling

Compiler Scheduling for ILP
Basic Idea of EPS and a Generic Example
Aggressive DAG scheduling techniques

3
Review of Compiler Instruction Scheduling

Extract independent instructions from sequential code and group them for parallel execution
We expect the H/W to execute them in parallel
Scheduling within a basic block (BB) is not enough to keep the H/W busy
Need advanced techniques that schedule beyond BBs
Classified into two categories based on code type
Acyclic code: global DAG scheduling
Cyclic code: software pipelining
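
To make the first bullet concrete, here is a minimal sketch of grouping independent instructions inside one basic block. It is not the slides' algorithm: the `Instr` record, the `depends` test, and the `width` parameter (how many instructions the H/W can issue per cycle) are assumptions made for illustration, and every dependence is conservatively treated as requiring a later cycle.

```python
from collections import namedtuple

# Hypothetical instruction record: display text, written register, read registers.
Instr = namedtuple("Instr", ["text", "dest", "srcs"])

def depends(later, earlier):
    """True if `later` must stay after `earlier` (RAW, WAR, or WAW)."""
    raw = earlier.dest in later.srcs
    war = later.dest in earlier.srcs
    waw = later.dest == earlier.dest
    return raw or war or waw

def schedule_block(instrs, width):
    """Greedily pack instructions of one basic block into issue groups."""
    cycle_of = []   # cycle chosen for each instruction, in program order
    groups = []     # groups[c] = texts of the instructions issued in cycle c
    for i, ins in enumerate(instrs):
        earliest = 0
        for j in range(i):
            if depends(ins, instrs[j]):
                earliest = max(earliest, cycle_of[j] + 1)
        c = earliest
        while c < len(groups) and len(groups[c]) >= width:
            c += 1  # this cycle is full, try the next one
        while len(groups) <= c:
            groups.append([])
        groups[c].append(ins.text)
        cycle_of.append(c)
    return groups

block = [
    Instr("a = load(p)", "a", ("p",)),
    Instr("b = a + 1", "b", ("a",)),
    Instr("c = load(q)", "c", ("q",)),
    Instr("d = c + b", "d", ("c", "b")),
]
groups = schedule_block(block, width=2)
# groups == [["a = load(p)", "c = load(q)"], ["b = a + 1"], ["d = c + b"]]
```

Only the two loads are independent here, so a 2-wide machine is half idle in two of the three cycles. That is the shortage of parallelism within a BB that motivates scheduling beyond BB boundaries.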

4
DAG Scheduling

Schedule instructions beyond BB boundaries
Achieved via code motion (CM) across BBs
e.g., speculative CM, join CM, branch CM, ...
There have been many DAG scheduling techniques
trace scheduling, superblock scheduling, hyperblock scheduling, boosting, global scheduling, selective scheduling, ...

5
Software Pipelining

Schedule instructions beyond loop iteration boundaries
Iterations are overlapped in a pipelined fashion
prolog, kernel, and epilog
More efficient than unrolling followed by DAG scheduling
There have been many software pipelining techniques...
However, there are only two practical techniques
modulo scheduling (MS)
enhanced pipeline scheduling (EPS)
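
The prolog/kernel/epilog structure can be sketched on a toy loop. This is a generic, hand-written illustration of software pipelining, not the output of MS or EPS. The three-step body (load, compute, store) and the function names are assumptions. In the kernel, one pass performs the store of iteration i, the compute of iteration i+1, and the load of iteration i+2.

```python
def sequential(a):
    """Reference loop: each iteration runs load, compute, store in order."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i]        # step 1: load
        u = t * 2       # step 2: compute
        out[i] = u      # step 3: store
    return out

def pipelined(a):
    """Same loop with three iterations overlapped (needs at least 2 elements)."""
    n = len(a)
    if n < 2:
        raise ValueError("this sketch assumes at least two iterations")
    out = [0] * n
    # Prolog: fill the pipeline.
    t = a[0]            # load(0)
    u = t * 2           # compute(0)
    t = a[1]            # load(1)
    # Kernel: steady state, one step from each of three iterations.
    for i in range(n - 2):
        out[i] = u      # store(i)
        u = t * 2       # compute(i + 1)
        t = a[i + 2]    # load(i + 2)
    # Epilog: drain the pipeline.
    out[n - 2] = u      # store(n - 2)
    u = t * 2           # compute(n - 1)
    out[n - 1] = u      # store(n - 1)
    return out

data = [1, 2, 3, 4]
result = pipelined(data)   # [2, 4, 6, 8], same as sequential(data)
```

The three statements in the kernel belong to different iterations and do not depend on one another within a pass, so a wide machine could issue them together.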

6
Enhanced Pipeline Scheduling (EPS)

A software pipelining technique based on global DAG scheduling, which is very different from MS
MS destroys the original loop and creates a new loop
For a given loop, EPS just repeats DAG scheduling
When instructions are moved across the loop back-edge, the pipelining effect takes place
We call it cross-iteration code motion (CICM)
So, EPS simply defines DAGs in the loop body by cutting edges, and these DAGs are then scheduled globally
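
A rough sketch of one cross-iteration code motion, loosely modeled on the loop used in the slides' generic example. Several things are assumptions: the operators are read as `x = x + 4` and `cc = (y == 0)`, memory is modeled as a Python dict, and the name `x_next` stands in for a renamed copy of `x`. The sketch shows only the effect of moving the next iteration's update of `x` across the back-edge, with a bookkeeping copy at the loop entry. It is not the full EPS algorithm.

```python
def original_loop(mem, x):
    """loop: x = x + 4; y = load(x); cc = (y == 0); if (!cc) goto loop"""
    while True:
        x = x + 4
        y = mem[x]
        cc = (y == 0)
        if cc:
            break
    return x            # the value that `store x @A` would write

def after_cicm(mem, x):
    """Same loop after moving `x = x + 4` of iteration n+1 across the back-edge."""
    x_next = x + 4      # bookkeeping copy at loop entry (for iteration 1)
    while True:
        x = x_next      # copy introduced by renaming; x is live at the exit
        x_next = x + 4  # iteration n+1's update, now overlapped with iteration n
        y = mem[x]
        cc = (y == 0)
        if cc:
            break
    return x

memory = {4: 7, 8: 3, 12: 0}
final_original = original_loop(memory, 0)   # 12
final_pipelined = after_cicm(memory, 0)     # 12
```

In the transformed loop, `x_next = x + 4` no longer sits on the dependence chain leading to the load of the current iteration, so it can be issued alongside `y = mem[x]`. The extra update computed on the last pass is speculative and simply goes unused.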

7
A generic EPS example
[Figure: a loop whose body updates x by 4, executes y = load(x), computes cc from y and 0, and ends with if(!cc) goto loop; store x @A follows the loop. The loop is shown after Stage 1, Stage 2, and Stage 3 of EPS. In each stage, instructions of later iterations (iter. n+1, iter. n+2) are moved across the back-edge, with bookkeeping copies for iter. 1 and iter. 2 placed at the loop entry and renamed copies of x.]
8
Advantages of EPS

We can schedule ANY loop
Loops with arbitrary control flow
Loops whose trip counts are not constants, e.g., pointer-chasing loops
Outer loops
Unstructured loops
due to its code-motion-based pipelining
Can achieve a tight, variable II (initiation interval) for multi-path loops
Particularly useful for optimizing integer code

9
Global DAG Scheduling

We can use any global scheduling technique for scheduling the DAGs in each stage of EPS, but we use selective scheduling (the most aggressive)
All-path speculative code motion
Join code motion
Unification
Renaming
Forward substitution

10
Speculation, join code motion
[Figure: two before/after code fragments around a branch (IF cc ...), with assignments to y, u, and z. One is labeled Speculation and the other Join code motion.]
11
Renaming
Before: ... add r1,1,r2 ... sub r3,2,r1 ...
After: ... sub r3,2,r1' ... add r1,1,r2 ... mov r1',r1 ...
[Figure: a second before/after example on add r1,2,r2 followed by add r2,2,r3, in which a renamed register and a mov copy are introduced.]
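
The first renaming example can be checked with a tiny register-machine sketch. The tuple encoding, the `run` helper, and the initial register values are assumptions for illustration. The renamed register is written `r1'`, and operands are read as "sources first, destination last".

```python
def run(program, regs):
    """Execute (op, src_a, src_b, dest) tuples; sources are registers or constants."""
    regs = dict(regs)
    for op, a, b, dest in program:
        va = regs[a] if isinstance(a, str) else a
        vb = regs[b] if isinstance(b, str) else b
        if op == "add":
            regs[dest] = va + vb
        elif op == "sub":
            regs[dest] = va - vb
        elif op == "mov":
            regs[dest] = va     # second source is unused for mov
        else:
            raise ValueError("unknown op: " + op)
    return regs

# Original order: the sub overwrites r1, which the add still has to read
# (an anti-dependence), so the sub cannot simply move above the add.
before = [("add", "r1", 1, "r2"), ("sub", "r3", 2, "r1")]

# After renaming: the sub writes r1' instead, so it can move up, and a mov
# copies r1' back into r1 at the sub's original position.
after = [("sub", "r3", 2, "r1'"), ("add", "r1", 1, "r2"), ("mov", "r1'", None, "r1")]

init = {"r1": 10, "r3": 7}
state_before = run(before, init)   # r2 == 11, r1 == 5
state_after = run(after, init)     # r2 == 11, r1 == 5, plus r1' == 5
```

Both orders leave the same values in r1 and r2. The price of the extra scheduling freedom is the mov copy.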
12
Forward-substitution
Before: ... mov r1, r2 ... add r2,1,r3 ...
After: ... add r1,1,r3 ... mov r1, r2 ...
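
The transformation above can be sketched as a small function. The tuple encodings and the name `forward_substitute` are hypothetical, and the legality check covers only this two-instruction case: the moved instruction must not write either register of the copy.

```python
def forward_substitute(copy, use):
    """Move `use` above the copy `mov src, dst`, rewriting its reads of dst to src.

    `copy` is ("mov", src, dst); `use` is (op, src_a, src_b, dest).
    Returns the new two-instruction sequence, or None when the swap is unsafe.
    """
    op, src, dst = copy
    if op != "mov":
        return None
    u_op, u_a, u_b, u_dest = use
    if u_dest in (src, dst):
        return None     # the swap would change what the copy reads or leaves behind

    def substitute(operand):
        return src if operand == dst else operand

    return [(u_op, substitute(u_a), substitute(u_b), u_dest), copy]

moved = forward_substitute(("mov", "r1", "r2"), ("add", "r2", 1, "r3"))
# moved == [("add", "r1", 1, "r3"), ("mov", "r1", "r2")]

blocked = forward_substitute(("mov", "r1", "r2"), ("add", "r2", 1, "r1"))
# blocked is None: the add would overwrite r1 before the mov reads it
```

After the substitution the add no longer depends on the mov, so the scheduler is free to place it earlier.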
13
Unification

Simplest form: moving an instruction from below a hammock to above the hammock
Selective scheduling can do more sophisticated forms of unification

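[Figure: before/after code between blocks A and B. A hammock contains x = load() and an instruction defining z from x and 1; an instruction defining y from x and 1 sits below the hammock and is moved above it, and the figure also shows z y.]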
14
Selective Scheduling

All these techniques are well merged into a single, powerful global scheduling algorithm
Can extract more useful parallel instructions (even w/o profiling)
When combined with EPS, it can maximize the scheduling power of EPS
References
Parallelizing non-numerical code with selective scheduling and software pipelining, ACM TOPLAS, Nov. 1997
Unroll-based copy elimination for EPS, IEEE TC, Sep. 2002
Split-Path EPS, IEEE TPDS, May 2003