1
Models to Predict Good Compiler Optimizations
  • John Cavazos
  • Dept of Computer Information Sciences
  • University of Delaware

2
Whole-Program Autotuning
STEPS
  • Characterize each function
  • Prediction model ranks optimization sequences
  • Apply top sequences

3
Train and Test Model
  • Train on kernels
  • Generate training data (inputs, outputs)
  • Automatically construct a model
  • Can be expensive, but can be done offline
  • Supervised learning problem
  • Test on whole programs
  • Extract features (code characteristics)
  • Model predicts optimizations to apply
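The train/test split above can be sketched as follows; the feature values, labels, and the 1-NN stand-in for the learned model are illustrative assumptions, not the talk's actual data or learner:

```python
# Hedged sketch: supervised learning setup for optimization prediction.
# Feature vectors and labels below are synthetic stand-ins for real kernel data.

# Training data: one row of code features per kernel, labeled with the
# index of the best optimization sequence found offline.
train_features = [
    [0.40, 12, 3],   # e.g. mem accesses/insn, loop count, max depth (hypothetical)
    [0.10,  2, 1],
    [0.55, 20, 4],
]
train_labels = [1, 0, 1]  # best sequence index per kernel

def nearest_neighbor_predict(features):
    """Predict a sequence for an unseen function via 1-NN over its features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_features)),
               key=lambda i: dist(train_features[i], features))
    return train_labels[best]

# Test on a whole program: extract features, predict which sequence to apply.
print(nearest_neighbor_predict([0.50, 18, 4]))  # -> 1
```

The expensive part (generating labeled training data by running many sequences) happens offline; at test time only feature extraction and one model query are needed.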

4
Constructing Models
  • Trained on data from Random Search
  • 200 evaluations for each benchmark
  • Leave-one-out cross validation
  • Regression and Support Vector Machines
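Leave-one-out cross validation over benchmarks can be sketched like this; the toy mean-speedup "model" is a hypothetical stand-in for the regression/SVM learners:

```python
# Hedged sketch: leave-one-out cross validation over benchmark kernels.
# Each held-out kernel is predicted by a model trained on all the others.

def leave_one_out(samples, labels, train_fn, predict_fn):
    """Return predictions where sample i is never in its own training set."""
    preds = []
    for i in range(len(samples)):
        tr_x = samples[:i] + samples[i + 1:]
        tr_y = labels[:i] + labels[i + 1:]
        model = train_fn(tr_x, tr_y)
        preds.append(predict_fn(model, samples[i]))
    return preds

# Toy "model": predict the mean speedup seen on the other kernels.
train_fn = lambda xs, ys: sum(ys) / len(ys)
predict_fn = lambda model, x: model

print(leave_one_out([[1], [2], [3]], [1.0, 2.0, 3.0], train_fn, predict_fn))
# -> [2.5, 2.0, 1.5]
```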

5
Are models predictive?
6
Solution Overview
Static Program Features: from source code or IR
Dynamic Program Features: from running time
7
Solution Overview
Static Program Features: from source code or IR
Dynamic Program Features: from runtime
  • Sequence Predictor
  • Speedup Predictor
  • Tournament Predictor
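The three predictor styles can be sketched as below; `score` stands in for a trained model (regression or SVM) and all names and numbers are illustrative assumptions:

```python
# Hedged sketch of the three predictor styles. `score` is a hypothetical
# learned scorer: the predicted speedup of sequence `seq` on `features`.

def score(features, seq):
    return sum(f * s for f, s in zip(features, seq))

def sequence_predictor(features, training):
    """Map features directly to the best-known sequence of the nearest kernel."""
    return min(training,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], features)))[1]

def speedup_predictor(features, candidates):
    """Predict a speedup for every candidate sequence; return the best."""
    return max(candidates, key=lambda s: score(features, s))

def tournament_predictor(features, candidates):
    """Pairwise knockout: the predicted winner of each match advances."""
    champ = candidates[0]
    for challenger in candidates[1:]:
        if score(features, challenger) > score(features, champ):
            champ = challenger
    return champ

candidates = [(1, 0), (0, 1), (1, 1)]
print(speedup_predictor((2, 3), candidates))     # -> (1, 1)
print(tournament_predictor((2, 3), candidates))  # -> (1, 1)
```

With a transitive scorer the tournament and speedup predictors agree; they differ when the model only learns pairwise comparisons rather than absolute speedups.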

8
Characterization of 181.mcf
9
Characterization of 181.mcf
Problem: more memory accesses per instruction than average
10
Sequence Predictor
11
Speedup Predictor
12
Tournament Predictor
13
Open64 Optimizations
  • Control 63 Open64 optimizations
  • Loop optimizations
  • Unrolling
  • Interchange
  • Fusion / Fission
  • Prefetching
  • Traditional optimizations
  • PRE, copy prop, strength reduction, CSE, etc.
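Drawing one random sequence over such controls can be sketched as below; the flag names and the on/off encoding are illustrative assumptions, not the actual set of 63 Open64 controls:

```python
import random

# Hedged sketch: one random optimization sequence over Open64-style flags.
# Flag names below are illustrative, not the real 63-flag search space.
FLAGS = [
    "-LNO:interchange", "-LNO:fusion", "-LNO:prefetch",
    "-WOPT:spre", "-WOPT:value_numbering", "-CG:gcm",
]

def random_sequence(rng):
    """Enable or disable each controllable optimization independently."""
    return [f"{flag}={'on' if rng.random() < 0.5 else 'off'}" for flag in FLAGS]

print(" ".join(random_sequence(random.Random(42))))
```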

14
Experimental Setup
  • HPCToolkit / PAPI 3.6
  • Intel Quad @ 2 GHz with 8 GB RAM
  • Open64 Compiler version 4.2
  • Baseline: -Ofast
  • 200 randomly generated sequences
  • Benchmarks
  • 25 hot functions in kernels from Linpack, NAS,
    and Polybench
  • 16 programs from MiBench

15
Experimental Setup
  • 3 Prediction Models
  • Sequence Predictor
  • Speedup Predictor
  • Tournament Predictor
  • Machine Learning Algorithms
  • Regression
  • Support Vector Machine (SVM)

16
Regression (10 evals)
AVG: Sequence 9, Tournament 21, Speedup 25
17
SVM (10 evals)
AVG: Sequence 9, Tournament 15, Speedup 23
18
Future work
  • Apply to whole applications
  • Applying to different compilers and architectures
  • Compare different characterization methods
  • Source code vs dynamic characterization

19
Across Different Machines
  • Machine 1
  • 4 × Intel Core2 Quad CPU Q9650 @ 3.00 GHz
  • RAM: 8 GB
  • Cache size: 6144 KB
  • Machine 2
  • 4 × Intel Core2 Quad CPU Q9300 @ 2.50 GHz
  • RAM: 4 GB
  • Cache size: 3072 KB
  • Machine 3
  • 4 × Intel Xeon CPU E5335 @ 2.00 GHz
  • RAM: 2 GB
  • Cache size: 4096 KB

20
MiBench Across Machines
21
Case Study PoCC
  • Experimental setup
  • Intel Xeon E5620 @ 2.4 GHz
  • 16 hardware threads
  • Baseline: ICC -fast
  • 768 randomly generated sequences
  • PoCC (Polyhedral Compiler Collection)
  • Unrolling, tiling, loop fusion, auto-parallelization
  • Polybench (28 kernels)
  • Speedup Predictor (Regression / SVM)
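Spending a small evaluation budget out of the 768 random sequences can be sketched as follows; `predict` is a hypothetical stand-in for the trained speedup predictor:

```python
# Hedged sketch: use a speedup predictor to pick which 10 of 768 random
# sequences are worth actually compiling and running.

def top_k_by_predicted_speedup(sequences, predict, k=10):
    """Rank every candidate by predicted speedup; return the k best to run."""
    return sorted(sequences, key=predict, reverse=True)[:k]

# Toy model standing in for the learned predictor (illustrative only).
sequences = [f"seq{i}" for i in range(768)]
predict = lambda s: int(s[3:]) % 100  # hypothetical predicted speedup
best10 = top_k_by_predicted_speedup(sequences, predict)
print(len(best10))  # -> 10
```

Only the shortlisted sequences are evaluated for real; the model query itself costs far less than a compile-and-run cycle.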

22
PoCC (10 evals)
AVERAGE: Random 3.1X, SVM 5.8X, LR 5.9X, Best 6.1X
23
Model Evaluation Summary
  • Comparison of 3 prediction models with 2 machine
    learning algorithms on kernels
  • Newly proposed and evaluated models (speedup
    predictor and tournament predictor) outperformed
    state-of-the-art predictor
  • Applying speedup predictor trained with kernels
    to MiBench
  • For seen sequences: 5.4 for regression, 4.6 for SVM
  • For unseen sequences: 5.1 for regression, 2.1 for SVM

             Sequence Predictor   Speedup Predictor   Tournament Predictor
Regression          8.7                 25.0                 20.7
SVM                 8.8                 22.5                 14.6
24
Regression/SVM - MiBench (Speedup Predictor / 10 evals)
  • Regression: seen sequences 5.4, unseen sequences 5.1
  • SVM: seen sequences 4.6, unseen sequences 2.2

25
Dynamic Program Features
26
Optimizations
Phase / List of Optimizations
OPT: align-padding, ptr-opt, swp, unroll-size
WOPT: aggcm, aggstr-reduction, const-pre, copy-propagate, bdce, dce-aggressive, dce-global, hoisting, iv-elimination, spre, value-numbering, dse-aggressive, unroll, canon-expr, aggcm-threshold, combine, intrinsic, mem-opnds, fold2const
LNO: optimize-cache, lego, prefetch-stores, prefetch-ahead, interchange, pure, fusion, hoistif, blocking-size, ecspct, fission, fission-inner-register-limit, full-unroll, full-unroll-size, fusion-peeling-limit, outer-unroll-max, sclrze, outer-unroll-further, max-depth, outer-unroll-prod-max, shackle, svr, cse, preferred-doacross-tile-size, prefetch-cache-factor, vintr, split-tile, olf-ub, unswitch, lego-local, call-info, prefetch, apply-illegal-xform-directives
CG: unroll-fully, gcm, loop-opt
IPA: aggr-cprop, cprop, dce, dve
27
PoCC (1 eval)
AVERAGE: SVM 188.9, LR 256.53
28
Whole-Program Autotuning
  • Current Solution
  • Outline hot functions
  • Tune hot function in isolation
  • Integrate tuned hot function back in application
  • Disadvantage
  • Does not account for code interactions

29
Proposed solution
  • Tune several hot functions at same time
  • Cannot afford random exploration
  • Performance model ranks variants
  • Apply only predicted best optimizations
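Joint tuning of several hot functions can be sketched as ranking whole combinations of variants; the function names, variants, and the length-based "model" are illustrative assumptions:

```python
import itertools

# Hedged sketch: tune several hot functions at once. Each function has a few
# variants; a (hypothetical) performance model scores whole combinations so
# that only the predicted-best combination is actually compiled and run.

variants = {
    "hot_fn_a": ["base", "unrolled", "tiled"],
    "hot_fn_b": ["base", "vectorized"],
}

def predicted_time(combo):
    # Stand-in model: would be learned; here shorter names count as "faster".
    return sum(len(v) for v in combo.values())

combos = [dict(zip(variants, vs)) for vs in itertools.product(*variants.values())]
best = min(combos, key=predicted_time)
print(best)  # -> {'hot_fn_a': 'base', 'hot_fn_b': 'base'}
```

Scoring combinations jointly is what lets the model account for interactions between the functions, which per-function tuning in isolation cannot.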