1
Models to Predict Good Compiler Optimizations
  • John Cavazos
  • Dept of Computer Information Sciences
  • University of Delaware

2
Whole-Program Autotuning
STEPS
  • Characterize each function
  • Prediction model ranks optimization sequences
  • Apply top sequences

3
Train and Test Model
  • Train on kernels
  • Generate training data (inputs, outputs)
  • Automatically construct a model
  • Can be expensive, but can be done offline
  • Supervised learning problem
  • Test on whole programs
  • Extract features (code characteristics)
  • Model predicts optimizations to apply
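The train/test split above can be sketched as follows; the feature values, labels, and the 1-NN stand-in for the learned model are illustrative assumptions, not the talk's actual data or learner:

```python
# Hedged sketch: supervised learning setup for optimization prediction.
# Feature vectors and labels below are synthetic stand-ins for real kernel data.

# Training data: one row of code features per kernel, labeled with the
# index of the best optimization sequence found offline.
train_features = [
    [0.40, 12, 3],   # e.g. mem accesses/insn, loop count, max depth (hypothetical)
    [0.10,  2, 1],
    [0.55, 20, 4],
]
train_labels = [1, 0, 1]  # best sequence index per kernel

def nearest_neighbor_predict(features):
    """Predict a sequence for an unseen function via 1-NN over its features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_features)),
               key=lambda i: dist(train_features[i], features))
    return train_labels[best]

# Test on a whole program: extract features, predict which sequence to apply.
print(nearest_neighbor_predict([0.50, 18, 4]))  # -> 1
```

The expensive part (generating labeled training data by running many sequences) happens offline; at test time only feature extraction and one model query are needed.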

4
Constructing Models
  • Trained on data from Random Search
  • 200 evaluations for each benchmark
  • Leave-one-out cross validation
  • Regression and Support Vector Machines
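Leave-one-out cross validation over benchmarks can be sketched like this; the toy mean-speedup "model" is a hypothetical stand-in for the regression/SVM learners:

```python
# Hedged sketch: leave-one-out cross validation over benchmark kernels.
# Each held-out kernel is predicted by a model trained on all the others.

def leave_one_out(samples, labels, train_fn, predict_fn):
    """Return predictions where sample i is never in its own training set."""
    preds = []
    for i in range(len(samples)):
        tr_x = samples[:i] + samples[i + 1:]
        tr_y = labels[:i] + labels[i + 1:]
        model = train_fn(tr_x, tr_y)
        preds.append(predict_fn(model, samples[i]))
    return preds

# Toy "model": predict the mean speedup seen on the other kernels.
train_fn = lambda xs, ys: sum(ys) / len(ys)
predict_fn = lambda model, x: model

print(leave_one_out([[1], [2], [3]], [1.0, 2.0, 3.0], train_fn, predict_fn))
# -> [2.5, 2.0, 1.5]
```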

5
Are models predictive?
6
Solution Overview
Static Program Features: from source code or IR
Dynamic Program Features: from running time
7
Solution Overview
Static Program Features: from source code or IR
Dynamic Program Features: from runtime
  • Sequence Predictor
  • Speedup Predictor
  • Tournament Predictor
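The three predictor styles can be sketched as below; `score` stands in for a trained model (regression or SVM) and all names and numbers are illustrative assumptions:

```python
# Hedged sketch of the three predictor styles. `score` is a hypothetical
# learned scorer: the predicted speedup of sequence `seq` on `features`.

def score(features, seq):
    return sum(f * s for f, s in zip(features, seq))

def sequence_predictor(features, training):
    """Map features directly to the best-known sequence of the nearest kernel."""
    return min(training,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], features)))[1]

def speedup_predictor(features, candidates):
    """Predict a speedup for every candidate sequence; return the best."""
    return max(candidates, key=lambda s: score(features, s))

def tournament_predictor(features, candidates):
    """Pairwise knockout: the predicted winner of each match advances."""
    champ = candidates[0]
    for challenger in candidates[1:]:
        if score(features, challenger) > score(features, champ):
            champ = challenger
    return champ

candidates = [(1, 0), (0, 1), (1, 1)]
print(speedup_predictor((2, 3), candidates))     # -> (1, 1)
print(tournament_predictor((2, 3), candidates))  # -> (1, 1)
```

With a transitive scorer the tournament and speedup predictors agree; they differ when the model only learns pairwise comparisons rather than absolute speedups.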

8
Characterization of 181.mcf
9
Characterization of 181.mcf
Problem: more memory accesses per instruction than average
10
Sequence Predictor
11
Speedup Predictor
12
Tournament Predictor
13
Open64 Optimizations
  • Control 63 Open64 optimizations
  • Loop optimizations
  • Unrolling
  • Interchange
  • Fusion / Fission
  • Prefetching
  • Traditional optimizations
  • PRE, copy prop, strength reduction, CSE, etc.
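Drawing one random sequence over such controls can be sketched as below; the flag names and the on/off encoding are illustrative assumptions, not the actual set of 63 Open64 controls:

```python
import random

# Hedged sketch: one random optimization sequence over Open64-style flags.
# Flag names below are illustrative, not the real 63-flag search space.
FLAGS = [
    "-LNO:interchange", "-LNO:fusion", "-LNO:prefetch",
    "-WOPT:spre", "-WOPT:value_numbering", "-CG:gcm",
]

def random_sequence(rng):
    """Enable or disable each controllable optimization independently."""
    return [f"{flag}={'on' if rng.random() < 0.5 else 'off'}" for flag in FLAGS]

print(" ".join(random_sequence(random.Random(42))))
```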

14
Experimental Setup
  • HPCToolkit / PAPI 3.6
  • Intel Quad @ 2 GHz with 8 GB RAM
  • Open64 Compiler version 4.2
  • Baseline: -Ofast
  • 200 randomly generated sequences
  • Benchmarks
  • 25 hot functions in kernels from Linpack, NAS,
    and Polybench
  • 16 programs from MiBench

15
Experimental Setup
  • 3 Prediction Models
  • Sequence Predictor
  • Speedup Predictor
  • Tournament Predictor
  • Machine Learning Algorithms
  • Regression
  • Support Vector Machine (SVM)

16
Regression (10 evals)
AVG: Sequence 9, Tournament 21, Speedup 25
17
SVM (10 evals)
AVG: Sequence 9, Tournament 15, Speedup 23
18
Future work
  • Apply to whole applications
  • Applying to different compilers and architectures
  • Compare different characterization methods
  • Source code vs dynamic characterization

19
Across Different Machines
  • Machine 1
  • 4 × Intel Core2 Quad CPU Q9650 @ 3.00 GHz
  • RAM: 8 GB
  • Cache size: 6144 KB
  • Machine 2
  • 4 × Intel Core2 Quad CPU Q9300 @ 2.50 GHz
  • RAM: 4 GB
  • Cache size: 3072 KB
  • Machine 3
  • 4 × Intel Xeon CPU E5335 @ 2.00 GHz
  • RAM: 2 GB
  • Cache size: 4096 KB

20
MiBench Across Machines
21
Case Study PoCC
  • Experimental setup
  • Intel Xeon E5620 @ 2.4 GHz
  • 16 hardware threads
  • Baseline: ICC -fast
  • 768 randomly generated sequences
  • PoCC (Polyhedral Compiler Collection)
  • Unrolling, tiling, loop fusion, auto-parallelization
  • Polybench (28 kernels)
  • Speedup Predictor (Regression / SVM)
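Spending a small evaluation budget out of the 768 random sequences can be sketched as follows; `predict` is a hypothetical stand-in for the trained speedup predictor:

```python
# Hedged sketch: use a speedup predictor to pick which 10 of 768 random
# sequences are worth actually compiling and running.

def top_k_by_predicted_speedup(sequences, predict, k=10):
    """Rank every candidate by predicted speedup; return the k best to run."""
    return sorted(sequences, key=predict, reverse=True)[:k]

# Toy model standing in for the learned predictor (illustrative only).
sequences = [f"seq{i}" for i in range(768)]
predict = lambda s: int(s[3:]) % 100  # hypothetical predicted speedup
best10 = top_k_by_predicted_speedup(sequences, predict)
print(len(best10))  # -> 10
```

Only the shortlisted sequences are evaluated for real; the model query itself costs far less than a compile-and-run cycle.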

22
PoCC (10 evals)
AVERAGE: Random 3.1X, SVM 5.8X, LR 5.9X, Best 6.1X
23
Model Evaluation Summary
  • Comparison of 3 prediction models with 2 machine
    learning algorithms on kernels
  • Newly proposed and evaluated models (speedup
    predictor and tournament predictor) outperformed
    state-of-the-art predictor
  • Applying speedup predictor trained with kernels
    to MiBench
  • For seen sequences: 5.4 for regression, 4.6 for SVM
  • For unseen sequences: 5.1 for regression, 2.1 for SVM

             Sequence Predictor   Speedup Predictor   Tournament Predictor
Regression          8.7                 25.0                 20.7
SVM                 8.8                 22.5                 14.6
24
Regression/SVM - MiBench (Speedup Predictor / 10 evals)
  • Regression: seen sequences 5.4, unseen sequences 5.1
  • SVM: seen sequences 4.6, unseen sequences 2.2

25
Dynamic Program Features
26
Optimizations
Phase / List of Optimizations
OPT: align-padding, ptr-opt, swp, unroll-size
WOPT: aggcm, aggstr-reduction, const-pre, copy-propagate, bdce, dce-aggressive, dce-global, hoisting, iv-elimination, spre, value-numbering, dse-aggressive, unroll, canon-expr, aggcm-threshold, combine, intrinsic, mem-opnds, fold2const
LNO: optimize-cache, lego, prefetch-stores, prefetch-ahead, interchange, pure, fusion, hoistif, blocking-size, ecspct, fission, fission-inner-register-limit, full-unroll, full-unroll-size, fusion-peeling-limit, outer-unroll-max, sclrze, outer-unroll-further, max-depth, outer-unroll-prod-max, shackle, svr, cse, preferred-doacross-tile-size, prefetch-cache-factor, vintr, split-tile, olf-ub, unswitch, lego-local, call-info, prefetch, apply-illegal-xform-directives
CG: unroll-fully, gcm, loop-opt
IPA: aggr-cprop, cprop, dce, dve
27
PoCC (1 eval)
AVERAGE: SVM 188.9, LR 256.53
28
Whole-Program Autotuning
  • Current Solution
  • Outline hot functions
  • Tune hot function in isolation
  • Integrate tuned hot function back in application
  • Disadvantage
  • Does not account for code interactions

29
Proposed solution
  • Tune several hot functions at same time
  • Cannot afford random exploration
  • Performance model ranks variants
  • Apply only predicted best optimizations
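Joint tuning of several hot functions can be sketched as ranking whole combinations of variants; the function names, variants, and the length-based "model" are illustrative assumptions:

```python
import itertools

# Hedged sketch: tune several hot functions at once. Each function has a few
# variants; a (hypothetical) performance model scores whole combinations so
# that only the predicted-best combination is actually compiled and run.

variants = {
    "hot_fn_a": ["base", "unrolled", "tiled"],
    "hot_fn_b": ["base", "vectorized"],
}

def predicted_time(combo):
    # Stand-in model: would be learned; here shorter names count as "faster".
    return sum(len(v) for v in combo.values())

combos = [dict(zip(variants, vs)) for vs in itertools.product(*variants.values())]
best = min(combos, key=predicted_time)
print(best)  # -> {'hot_fn_a': 'base', 'hot_fn_b': 'base'}
```

Scoring combinations jointly is what lets the model account for interactions between the functions, which per-function tuning in isolation cannot.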