Adapting Convergent Scheduling Using Machine Learning - PowerPoint PPT Presentation

Diego Puppin*, Mark Stephenson, Una-May O'Reilly, Martin Martin, and Saman Amarasinghe

Learn more at: https://groups.csail.mit.edu

1
Adapting Convergent Scheduling Using Machine Learning

Diego Puppin*, Mark Stephenson, Una-May O'Reilly, Martin Martin, and Saman Amarasinghe

*Institute for Information Science and Technologies, Italy
Massachusetts Institute of Technology, USA
2
Outline

This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler
First, I'll introduce the scheduler that we are interested in improving
Then, I'll discuss genetic programming
Finally, I'll present experimental results

3
Clustered Architectures

Memory and registers are separated into clusters
Examples: RAW, clustered VLIWs
When scheduling, we try to co-locate data with computation

4
Convergent Scheduling

Convergent scheduling passes are symmetric
Each pass takes as input a preference map and
outputs a preference map
Passes are modular and can be applied in any
order
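A minimal sketch of this uniform interface, assuming a simplified preference map that stores one weight per cluster for each instruction (time slots omitted). The `PrefMap` type, the `make_bias_pass` helper, and renormalizing after every pass are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Dict, List

# Simplified preference map (assumption): instruction -> one weight per cluster.
PrefMap = Dict[str, List[float]]
Pass = Callable[[PrefMap], PrefMap]


def normalize(prefs: PrefMap) -> PrefMap:
    """Rescale each instruction's weights so they sum to 1."""
    out = {}
    for instr, ws in prefs.items():
        total = sum(ws)
        out[instr] = [w / total for w in ws]
    return out


def make_bias_pass(instr: str, cluster: int, factor: float) -> Pass:
    """Hypothetical pass: scale one instruction's weight for one cluster."""
    def bias(prefs: PrefMap) -> PrefMap:
        out = {i: list(ws) for i, ws in prefs.items()}  # never mutate the input map
        out[instr][cluster] *= factor
        return out
    return bias


def run_passes(prefs: PrefMap, passes: List[Pass]) -> PrefMap:
    """Every pass maps PrefMap -> PrefMap, so any list of passes is a valid pipeline."""
    for p in passes:
        prefs = normalize(p(prefs))
    return prefs


start = {"a": [0.25] * 4, "b": [0.25] * 4}
result = run_passes(start, [make_bias_pass("a", 0, 3.0), make_bias_pass("b", 2, 2.0)])
print(result["a"])  # cluster 0 now carries half of instruction a's weight
```

Because input and output share one type, the same passes could be listed in a different order (or repeated) without any glue code; whether a given order schedules well is exactly the open question the rest of the talk addresses.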

5
Convergent Scheduling: Preference Maps

Each entry is a weight
The weights correspond to the confidence of a
space-time assignment for a given instruction
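One way to picture a single instruction's preference map, assuming a dense grid of time slots by clusters whose weights are normalized to sum to 1, with the largest entry read as the most confident space-time assignment. The grid layout and all helper names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: grid[t][c] is one instruction's preference for
# time slot t on cluster c. A full map holds one grid per instruction.
Grid = List[List[float]]


def uniform_grid(num_slots: int, num_clusters: int) -> Grid:
    """No information yet: every space-time slot is equally preferred."""
    w = 1.0 / (num_slots * num_clusters)
    return [[w] * num_clusters for _ in range(num_slots)]  # distinct row lists


def normalize(grid: Grid) -> Grid:
    total = sum(sum(row) for row in grid)
    return [[w / total for w in row] for row in grid]


def best_assignment(grid: Grid) -> Tuple[int, int, float]:
    """Highest-weight (time, cluster) entry; its weight is the confidence."""
    cells = [(t, c) for t in range(len(grid)) for c in range(len(grid[t]))]
    t, c = max(cells, key=lambda tc: grid[tc[0]][tc[1]])
    return t, c, grid[t][c]


def cluster_preference(grid: Grid) -> List[float]:
    """Preference for each cluster, summed over all time slots."""
    return [sum(row[c] for row in grid) for c in range(len(grid[0]))]


g = uniform_grid(num_slots=2, num_clusters=4)
g[1][3] *= 5  # some pass became more confident about slot 1, cluster 3
g = normalize(g)
print(best_assignment(g)[:2])  # (1, 3)
```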

6
Example Dependence Graph

[Figure: example dependence graph scheduled onto four clusters, with a legend for high-confidence and low-confidence preferences]

7
Placement Propagation
8
Critical Path Strengthening
9
Path Propagation
10
Parallelism Distribution
11
Path Propagation
12
Communication Reduction
13
Path Propagation
14
Final Schedule
15
Convergent Scheduling

Classical scheduling passes make absolute decisions that can't be undone
Convergent scheduling passes make soft decisions in the form of preferences
Mistakes made early on can be undone
Passes don't impose order!

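The difference between hard and soft decisions can be shown with one instruction on four clusters. In this sketch (hypothetical weights and helper names), an early pass mildly favors cluster 0 and a later pass has stronger evidence for cluster 1; the soft pipeline ends on cluster 1, while committing after the first pass locks in cluster 0.

```python
from typing import List

Weights = List[float]  # one instruction's preference for each cluster


def normalize(ws: Weights) -> Weights:
    total = sum(ws)
    return [w / total for w in ws]


def argmax(ws: Weights) -> int:
    return max(range(len(ws)), key=lambda i: ws[i])


def soft(ws: Weights, factors: Weights) -> Weights:
    """Convergent-style pass: rescale preferences but keep every option open."""
    return normalize([w * f for w, f in zip(ws, factors)])


def commit(ws: Weights) -> Weights:
    """Classical-style decision: keep only the current favorite."""
    best = argmax(ws)
    return [1.0 if i == best else 0.0 for i in range(len(ws))]


start = [0.25] * 4
early = [2.0, 1.0, 1.0, 1.0]  # early pass mildly favors cluster 0
late = [1.0, 4.0, 1.0, 1.0]   # later pass has stronger evidence for cluster 1

soft_result = soft(soft(start, early), late)
hard_result = soft(commit(soft(start, early)), late)
print(argmax(soft_result), argmax(hard_result))  # 1 0
```

Once a weight has been driven to zero, no later multiplicative pass can revive it, which is why the committed pipeline cannot recover.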
16
Double-Edged Sword

The good news: convergent scheduling does not constrain phase order
Its simple, uniform interface makes writing and integrating passes easy
The bad news: convergent scheduling does not constrain phase order
There is a limitless number of phase orders to consider, some of which are much better than others

17
Our Proposal

Use genetic programming to automatically search for a phase ordering that's tailored to a given
Architecture
Compiler
Our inspiration comes from Cooper's work (Cooper et al., LCTES 1999)

18
Genetic Programming

A search algorithm analogous to Darwinian evolution
Maintain a population of expressions, for example:

(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
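To make such an expression concrete, here is a minimal interpreter that parses it and flattens it into a linear pass order. It assumes that the condition `imbalanced` can be looked up in a dictionary of facts; in a real scheduler it would presumably be computed from the current preference maps. The helper names are hypothetical.

```python
from typing import Dict, List, Union

Expr = Union[str, list]  # a pass or predicate name, or [operator, *children]


def tokenize(src: str) -> List[str]:
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Expr:
    """Consume tokens from the front and build a nested-list expression."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node


def flatten(expr: Expr, facts: Dict[str, bool]) -> List[str]:
    """Turn an expression into the linear list of passes it would run."""
    if isinstance(expr, str):
        return [expr]  # a leaf is a single pass
    op, *args = expr
    if op == "sequence":
        return [p for sub in args for p in flatten(sub, facts)]
    if op == "if":
        cond, then_branch, else_branch = args
        return flatten(then_branch if facts[cond] else else_branch, facts)
    raise ValueError(f"unknown operator: {op}")


expr = parse(tokenize("(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))"))
print(flatten(expr, {"imbalanced": True}))   # ['INITTIME', 'PLACE', 'LOAD']
print(flatten(expr, {"imbalanced": False}))  # ['INITTIME', 'PLACE', 'COMM']
```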
19
Genetic Programming

Selection: the fittest expressions in the population are more likely to reproduce
Reproduction: crossing over subexpressions of two expressions
Mutation
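Subtree crossover and mutation on expression trees can be sketched as follows, assuming for brevity that trees contain only `sequence` nodes with pass-name leaves. The pass list and helper names are hypothetical, and the talk does not say how subtrees are chosen; here they are picked uniformly at random.

```python
import copy
import random
from typing import Iterator, Tuple, Union

# Sketch assumption: trees use only "sequence" nodes; leaves are pass names.
Expr = Union[str, list]
PASSES = ["INITTIME", "PLACE", "LOAD", "COMM", "DEP", "FUNC"]  # hypothetical
Path = Tuple[int, ...]


def paths(expr: Expr, path: Path = ()) -> Iterator[Path]:
    """Yield the index path to every subtree (expr[0] is the operator)."""
    yield path
    if isinstance(expr, list):
        for i, child in enumerate(expr[1:], start=1):
            yield from paths(child, path + (i,))


def get(expr: Expr, path: Path) -> Expr:
    for i in path:
        expr = expr[i]
    return expr


def replace(expr: Expr, path: Path, new: Expr) -> Expr:
    """Return a copy of expr with the subtree at path replaced by new."""
    if not path:
        return new
    out = copy.deepcopy(expr)
    get(out, path[:-1])[path[-1]] = new
    return out


def random_expr(rng: random.Random, depth: int = 2) -> Expr:
    if depth == 0 or rng.random() < 0.5:
        return rng.choice(PASSES)
    return ["sequence", random_expr(rng, depth - 1), random_expr(rng, depth - 1)]


def crossover(a: Expr, b: Expr, rng: random.Random) -> Expr:
    """Replace a random subtree of a with a copy of a random subtree of b."""
    donor = copy.deepcopy(get(b, rng.choice(list(paths(b)))))
    return replace(a, rng.choice(list(paths(a))), donor)


def mutate(expr: Expr, rng: random.Random) -> Expr:
    """Replace a random subtree with a freshly generated one."""
    return replace(expr, rng.choice(list(paths(expr))), random_expr(rng))
```

Both operators return new trees and leave the parents untouched, so the parents can stay in the population alongside their offspring.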

20
General Flow

Randomly generated initial population

[Flowchart: create initial population (initial solutions) → evaluation → done? → selection → create variants → back to evaluation]
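The loop in this flow can be sketched as a small evolutionary search. To stay short, it evolves flat lists of pass names rather than expression trees, so it is a genetic algorithm rather than full genetic programming. The pass names, tournament selection, the 20% mutation rate, keeping the best individual (elitism), and a fixed generation budget standing in for "done?" are all assumptions; the toy fitness is only a placeholder for compiling and running benchmarks.

```python
import random
from typing import Callable, List

PASSES = ["INITTIME", "PLACE", "LOAD", "COMM", "DEP", "FUNC"]  # hypothetical names
Genome = List[str]  # simplification: a flat pass list instead of an expression tree


def random_genome(rng: random.Random, max_len: int = 6) -> Genome:
    return [rng.choice(PASSES) for _ in range(rng.randint(1, max_len))]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """One-point crossover: a prefix of one parent plus a suffix of the other."""
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child or [rng.choice(PASSES)]  # never return an empty ordering


def mutate(g: Genome, rng: random.Random) -> Genome:
    g = list(g)
    g[rng.randrange(len(g))] = rng.choice(PASSES)
    return g


def tournament(pop: List[Genome], fits: List[float], rng: random.Random, k: int = 3) -> Genome:
    """Fitter individuals are more likely (not certain) to be chosen."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]


def evolve(fitness: Callable[[Genome], float], pop_size: int = 20,
           generations: int = 10, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]        # initial population
    for _ in range(generations):                               # "done?": fixed budget
        fits = [fitness(g) for g in pop]                       # evaluation
        nxt = [pop[max(range(pop_size), key=lambda i: fits[i])]]  # keep the best
        while len(nxt) < pop_size:
            a = tournament(pop, fits, rng)                     # selection
            b = tournament(pop, fits, rng)
            child = crossover(a, b, rng)                       # create variants
            if rng.random() < 0.2:
                child = mutate(child, rng)
            nxt.append(child)
        pop = nxt
    fits = [fitness(g) for g in pop]
    return pop[max(range(pop_size), key=lambda i: fits[i])]


# Toy stand-in fitness; the real system compiles and runs benchmarks instead.
best = evolve(lambda g: g.count("COMM") - 0.1 * len(g))
print(best)
```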
21
General Flow

Compiler is modified to use the given expression
as the phase ordering
Each expression is evaluated by compiling and
running the benchmark(s)
Fitness is the relative speedup over our original
phase ordering on the benchmark(s)
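Written out, and assuming speedup is measured as a ratio of simulated cycle counts (the slide does not name the metric), the fitness of an expression $e$ on one benchmark is:

```latex
\text{fitness}(e) = \frac{T_{\text{orig}}}{T_{e}}
```

where $T_{\text{orig}}$ is the cycle count under the original phase ordering and $T_e$ the count under $e$; values above 1 mean $e$ is faster. With illustrative numbers, $T_{\text{orig}} = 1000$ and $T_e = 800$ give a fitness of $1000/800 = 1.25$.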

22
General Flow

Just as with Natural Selection, the fittest
individuals are more likely to survive

23
General Flow

Use crossover and mutation to generate new
expressions
And thus, generate new and hopefully improved
phase orderings

24
Experimental Setup

We use an in-house VLIW compiler (SUIF,
MachSUIF) and simulator
Compiler and simulator are parameterized so we
can easily change VLIW configurations
Experiments presented here are for clustered
architectures
Details of the architectures are in the paper

25
Convergent Scheduling Heuristics

Noise Introduction
Initial Time Assignment
Preplacement
Critical Path Strengthening
Communication Minimization
Parallelism Distribution
Load Balance
Dependence Enforcement
Assignment Strengthening
Functional Unit Distribution
Push to first cluster
Critical Path Distance
Cluster Creation
Register Pressure Reduction in Time
Register Pressure Reduction in Space

26
Hand-Tuned Results: 4-Cluster VLIW, Rich Interconnect
27
Results: 4-Cluster VLIW, Limited Interconnect
28
Training an Improved Sequence

Goal: find a sequence that works well for all the benchmarks in the last graph (vmul, rbsorf, yuv, etc.)
Train a sequence using these benchmarks: for each expression in the population, compile and run all the benchmarks, and take the average speedup as fitness
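The training fitness can be sketched as the arithmetic mean of per-benchmark speedups, assuming speedup is the ratio of cycle counts under the original and candidate orderings. The cycle counts below are made up for illustration and are not measurements from the talk.

```python
from statistics import mean
from typing import Dict


def relative_speedup(baseline_cycles: float, candidate_cycles: float) -> float:
    """Above 1.0 means the candidate ordering beats the original ordering."""
    return baseline_cycles / candidate_cycles


def suite_fitness(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Arithmetic mean of per-benchmark speedups over the training suite."""
    return mean(relative_speedup(baseline[b], candidate[b]) for b in baseline)


# Illustrative cycle counts only; not measurements from the talk.
baseline = {"vmul": 1000, "rbsorf": 2000, "yuv": 1500}
candidate = {"vmul": 800, "rbsorf": 2000, "yuv": 1200}
print(suite_fitness(baseline, candidate))
```

Here the per-benchmark speedups are 1.25, 1.0, and 1.25, so the averaged fitness is about 1.167: a candidate is rewarded for helping most benchmarks rather than one.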

29
The Schedule

The evolved sequence is much more conservative about communication
inittime → func → dep → func → load → func → dep → func → comm → dep → func → comm → place
func: reduces the weights of instructions on overloaded clusters
dep: increases the probability that dependent instructions are scheduled nearby
comm: tries to keep neighboring instructions in the same cluster
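A rough sketch of what a `func`-style pass might do, assuming per-cluster weights with time slots ignored and a hypothetical `capacity` giving how much expected work one cluster can absorb. The expected load on each cluster is taken as the sum of all instructions' weights on it; weights on overloaded clusters are scaled down and each instruction's row is renormalized. This is a guess at the described behavior, not the authors' code.

```python
from typing import Dict, List

# Assumption: instruction -> one weight per cluster (time slots ignored).
PrefMap = Dict[str, List[float]]


def func_pass(prefs: PrefMap, capacity: float) -> PrefMap:
    """Scale down weights on clusters whose expected load exceeds capacity."""
    num_clusters = len(next(iter(prefs.values())))
    # Expected load on a cluster = sum of every instruction's weight for it.
    load = [sum(ws[c] for ws in prefs.values()) for c in range(num_clusters)]
    scale = [min(1.0, capacity / ld) if ld > 0 else 1.0 for ld in load]
    out = {}
    for instr, ws in prefs.items():
        scaled = [w * s for w, s in zip(ws, scale)]
        total = sum(scaled)
        out[instr] = [w / total for w in scaled]  # renormalize each instruction
    return out


prefs = {f"i{k}": [0.9, 0.1] for k in range(4)}  # all four crowd onto cluster 0
print(func_pass(prefs, capacity=2.0)["i0"])  # cluster 0 weight drops below 0.9
```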

30
Results: 4-Cluster VLIW, Limited Interconnect
31
Results: Leave-One-Out Cross Validation
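Leave-one-out cross validation over the training suite can be sketched as below: for each benchmark, evolve a sequence on all the others and measure it on the held-out one. `train` and `evaluate` are hypothetical callbacks standing in for the GP run and the compile-and-simulate step, and benchmark names are assumed to be unique.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def loo_splits(benchmarks: Sequence[str]) -> List[Tuple[List[str], str]]:
    """One (training set, held-out benchmark) pair per benchmark."""
    return [([b for b in benchmarks if b != held], held) for held in benchmarks]


def cross_validate(benchmarks: Sequence[str],
                   train: Callable[[List[str]], object],
                   evaluate: Callable[[object, str], float]) -> Dict[str, float]:
    """Train on all but one benchmark, then score on the one left out."""
    return {held: evaluate(train(rest), held) for rest, held in loo_splits(benchmarks)}


for rest, held in loo_splits(["vmul", "rbsorf", "yuv"]):
    print(held, "<-", rest)
```

A sequence that still scores well on benchmarks it never saw during training is evidence that it generalizes rather than overfitting the training set.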
32
Summary of Results

When we changed the architecture, the hand-tuned
sequence failed
UAS and PCC outperform convergent scheduling
Our GP system found a sequence that usually
outperforms UAS and PCC
Cross validation suggests that it is possible to
find a general-purpose sequence

33
Running Time

Using about 20 machines in a small cluster of workstations, it takes about 2 days to evolve a sequence
This is a one-time process!
Performed by the compiler vendor

34
Disappointing Result

Unfortunately, sequences with conditionals are
weeded out of the GP selection process
Our system rewards parsimony
Convergent scheduling passes make soft decisions,
so running an extra pass may not be detrimental
We'd like to get to the bottom of this unexpected result
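The slide does not say how parsimony is rewarded. One common scheme, shown here purely as an assumption, subtracts a small penalty proportional to expression size, so a conditional expression that buys no extra speedup scores below an equally fast plain sequence. The penalty weight `alpha`, the n-ary `sequence` nodes, and the helper names are hypothetical.

```python
# Sketch of linear parsimony pressure (an assumption, not the authors' exact rule).
from typing import Union

Expr = Union[str, list]


def size(expr: Expr) -> int:
    """Count nodes: each operator node and each leaf counts as one."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(size(child) for child in expr[1:])


def adjusted_fitness(speedup: float, expr: Expr, alpha: float = 0.01) -> float:
    return speedup - alpha * size(expr)


plain = ["sequence", "INITTIME", "PLACE", "LOAD"]
cond = ["sequence", "INITTIME", "PLACE", ["if", "imbalanced", "LOAD", "COMM"]]
print(size(plain), size(cond))  # 4 7
print(adjusted_fitness(1.2, plain) > adjusted_fitness(1.2, cond))  # True
```

Under a rule like this, a conditional survives only if its speedup gain outweighs its size penalty, which is one possible explanation for the weeding-out described above.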

35
Conclusions

Using GP, we're able to find architecture-specific, application-independent sequences
We can quickly retune the compiler when
The architecture changes
The compiler itself changes

37
Implemented Tests
