1
Automated Parameter Setting Based on Runtime
Prediction
  • Towards an Instance-Aware Problem Solver

Frank Hutter, Univ. of British Columbia,
Vancouver, Canada Youssef Hamadi, Microsoft
Research, Cambridge, UK
2
Motivation (1): Why automated parameter setting?
  • We want to use the best available heuristic for a
    problem
  • Strong domain-specific heuristics in tree search
  • Domain knowledge helps to pick good heuristics
  • But maybe you don't know the domain ahead of time
    ...
  • Local search parameters must be tuned
  • Performance depends crucially on parameter
    setting
  • New application/algorithm:
  • Restart parameter tuning from scratch
  • Waste of time both for researchers and
    practitioners
  • Comparability:
  • Is algorithm A faster than algorithm B, or did
    its authors just spend more time tuning it?

3
Motivation (2): Operational scenario
  • CP solver has to solve instances from a variety
    of domains
  • Domains not known a priori
  • Solver should automatically use best strategy for
    each instance
  • Want to learn from instances we solve

4
Overview
  • Previous work on runtime prediction that we build
    on [Leyton-Brown, Nudelman et al. 02, 04]
  • Part I: Automated parameter setting based on
    runtime prediction
  • Part II: Incremental learning for runtime
    prediction in a priori unknown domains
  • Experiments
  • Conclusions

5
Previous work on runtime prediction for algorithm
selection
  • General approach
  • Portfolio of algorithms
  • For each instance, choose the algorithm that
    promises to be fastest
  • Examples
  • [Lobjois and Lemaître, AAAI 98]: CSP
  • Mostly propagations of different complexity
  • [Leyton-Brown et al., CP 02]: Combinatorial
    auctions
  • CPLEX + 2 other algorithms (which were thought
    uncompetitive)
  • [Nudelman et al., CP 04]: SAT
  • Many tree-search algorithms from the last SAT
    competition
  • On average considerably faster than each single
    algorithm

6
Runtime prediction: Basics (1 algorithm)
[Leyton-Brown, Nudelman et al. 02, 04]
  • Training: Given a set of t instances z1,...,zt
  • For each instance zi
  • Compute features xi = (xi1,...,xim)
  • Run the algorithm to get its runtime yi
  • Collect (xi, yi) pairs
  • Learn a function f: X → R (features → runtime),
    yi ≈ f(xi)
  • Test: Given a new instance zt+1
  • Compute features xt+1
  • Predict runtime yt+1 = f(xt+1)

(Training: expensive; test: cheap)
7
Runtime prediction: Linear regression
[Leyton-Brown, Nudelman et al. 02, 04]
  • The learned function f has to be linear in the
    features xi = (xi1,...,xim)
  • yi ≈ f(xi) = Σj=1..m xij · wj = xi · w
  • The learning problem thus reduces to fitting the
    weights w = (w1,...,wm)
  • To better capture the vast differences in runtime,
    estimate the logarithm of runtime, e.g. yi = 5 →
    runtime is 10^5 sec
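
As an illustration of this step, here is a minimal sketch (Python/NumPy, not the authors' code) of a least-squares fit of log-runtime as a linear function of instance features; the feature matrix and runtimes are placeholder data standing in for real instances.

```python
# Minimal sketch: least-squares fit of log10(runtime) as a linear
# function of instance features.  X and runtimes are placeholder data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 10))               # 200 training instances, 10 features each
runtimes = 10 ** rng.random(200)        # placeholder runtimes in seconds

y = np.log10(runtimes)                  # predict log10(runtime), as on the slide

# Fit weights w so that y ≈ X w
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the runtime of a new instance from its features
x_new = rng.random(10)
predicted_runtime = 10 ** (x_new @ w)   # back-transform to seconds
```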

8
Runtime prediction: Feature engineering
[Leyton-Brown, Nudelman et al. 02, 04]
  • Features can be computed quickly (in seconds)
  • Basic properties like #vars, #clauses, and their
    ratio
  • Estimates of search space size
  • Linear programming bounds
  • Local search probes
  • Linear functions are not very powerful
  • But you can use the same methodology to learn
    more complex functions
  • Let φ = (φ1,...,φq) be arbitrary combinations of
    the features x1,...,xm (so-called basis
    functions)
  • Learn a linear function of the basis functions:
    f(φ) = φ · w
  • Basis functions used in [Nudelman et al. 04]:
  • Original features xi
  • Pairwise products of features xi · xj
  • Only a subset of these (drop useless basis
    functions)
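
A small sketch of the quadratic basis-function expansion described above (original features plus pairwise products); the subset-selection step that drops useless basis functions is omitted here.

```python
# Minimal sketch of the quadratic basis-function expansion: original
# features plus all pairwise products x_i * x_j (i <= j).
import numpy as np

def quadratic_basis(x):
    """Map a feature vector x to [x, all pairwise products x_i * x_j]."""
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate([x, pairs])

phi = quadratic_basis([1.0, 2.0, 3.0])   # length 3 + 6 = 9
```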

9
Algorithm selection based on runtime prediction
[Leyton-Brown, Nudelman et al. 02, 04]
  • Given n different algorithms A1,...,An
  • Training:
  • Learn n separate functions fj: Φ → R, j=1...n
  • Test:
  • Predict runtime yj,t+1 = fj(φt+1) for each of the
    algorithms
  • Choose the algorithm Aj with minimal yj,t+1

(Training: really expensive; test: cheap)
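
A minimal sketch of the selection step: one learned weight vector per algorithm (assumed to come from fits like the one sketched earlier), and the algorithm with minimal predicted log-runtime is chosen.

```python
# Minimal sketch of per-instance algorithm selection.
import numpy as np

def select_algorithm(phi_new, weights_per_algorithm):
    """phi_new: basis functions of the new instance;
    weights_per_algorithm: list of weight vectors, one per algorithm."""
    predictions = [phi_new @ w for w in weights_per_algorithm]
    return int(np.argmin(predictions))    # index of the chosen algorithm
```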
10
Overview
  • Previous work on runtime prediction that we build
    on [Leyton-Brown, Nudelman et al. 02, 04]
  • Part I: Automated parameter setting based on
    runtime prediction
  • Part II: Incremental learning for runtime
    prediction in a priori unknown domains
  • Experiments
  • Conclusions

11
Parameter setting based on runtime prediction
  • Finding the best default parameter setting for a
    problem class: generate special-purpose code
    [Minton 93], minimize estimated error [Kohavi &
    John 95], racing algorithm [Birattari et al. 02],
    local search [Hutter 04], experimental design
    [Adenso-Díaz & Laguna 05], decision trees
    [Srivastava & Mediratta 05]
  • Runtime prediction for algorithm selection on a
    per-instance basis: predict runtime for each
    algorithm and pick the best [Leyton-Brown,
    Nudelman et al. 02, 04]
  • Runtime prediction for setting parameters on a
    per-instance basis (this work)
12
Naive application of runtime prediction for
parameter setting
  • Given one algorithm with n different parameter
    settings P1,...,Pn
  • Training:
  • Learn n separate functions fj: Φ → R, j=1...n
  • Test:
  • Predict runtime yj,t+1 = fj(φt+1) for each of the
    parameter settings
  • Run algorithm with the setting Pj with minimal
    yj,t+1

(Training: too expensive; test: fairly cheap)
  • If there are too many parameter configurations
  • Cannot run each parameter setting on each
    instance
  • Need to generalize (cf. human parameter tuning)
  • With separate functions there is no way to
    generalize

13
Generalization by parameter sharing
  • Naive approach: n separate functions.
  • Information on the runtime of setting i cannot
    inform predictions for setting j ≠ i
  • Our approach: 1 single function.
  • Information on the runtime of setting i can
    inform predictions for setting j ≠ i

14
Application of runtime prediction for parameter
setting
  • View the parameters as additional features, learn
    a single function
  • Training: Given a set of instances z1,...,zt
  • For each instance zi
  • Compute features xi
  • Pick some parameter settings p1,...,pn
  • Run algorithm with settings p1,...,pn to get
    runtimes yi,1,...,yi,n
  • Basis functions φi,1,...,φi,n include the
    parameter settings
  • Collect pairs (φi,j, yi,j) (n data points per
    instance)
  • Learn only a single function g: Φ → R
  • Test: Given a new instance zt+1
  • Compute features xt+1
  • Search over parameter settings pj. Evaluation:
    compute φt+1,j, check g(φt+1,j)
  • Run with the best predicted parameter setting p

(Training: moderately expensive; test: cheap)
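
A minimal sketch of this test-time search over parameter settings with a single shared model g; the basis expansion `basis` and the weight vector `w_g` are assumed to come from training, and the candidate grid below is purely illustrative.

```python
# Minimal sketch: a single model g over [instance features, parameters];
# enumerate candidate settings and pick the one with minimal prediction.
import itertools
import numpy as np

def choose_setting(x_new, candidate_settings, basis, w_g):
    """Return the candidate parameter setting with minimal predicted log-runtime."""
    best_setting, best_pred = None, np.inf
    for p in candidate_settings:
        phi = basis(np.concatenate([x_new, p]))   # features + parameters
        pred = phi @ w_g                          # predicted log-runtime g(phi)
        if pred < best_pred:
            best_setting, best_pred = p, pred
    return best_setting

# Illustrative grid over two SAPS-like parameters (alpha, rho)
grid = [np.array(s) for s in itertools.product([1.1, 1.2, 1.3],
                                                [0.5, 0.6, 0.7, 0.8])]
```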
15
Summary of automated parameter setting based on
runtime prediction
  • Learn a single function that maps features and
    parameter settings to runtime
  • Given a new instance
  • Compute the features (they are fixed)
  • Search for the parameter setting that minimizes
    predicted runtime for these features

16
Overview
  • Previous work on runtime prediction that we build
    on [Leyton-Brown, Nudelman et al. 02, 04]
  • Part I: Automated parameter setting based on
    runtime prediction
  • Part II: Incremental learning for runtime
    prediction in a priori unknown domains
  • Experiments
  • Conclusions

17
Problem setting: Incremental learning for
multiple domains
18
Solution: Sequential Bayesian Linear Regression
  • Update knowledge as new data arrives:
    probability distribution over the weights w
  • Incremental (one (xi, yi) pair at a time)
  • Seamlessly integrates new data
  • Optimal: yields the same result as a batch approach
  • Efficient
  • Computation: 1 matrix inversion per update
  • Memory: can drop data once it is integrated
  • Robust
  • Simple to implement (3 lines of Matlab)
  • Provides estimates of uncertainty in prediction

19
What are uncertainty estimates?
20
Sequential Bayesian linear regression: intuition
  • Instead of predicting a single runtime y, use a
    probability distribution P(Y)
  • The mean of P(Y) is exactly the prediction of the
    non-Bayesian approach, but we get uncertainty
    estimates

[Figure: distribution P(Y) over log runtime Y, showing the mean predicted runtime and the uncertainty of the prediction]
21
Sequential Bayesian linear regression: technical
  • Standard linear regression
  • Training: given training data φ1:n, y1:n, fit the
    weights w such that y1:n ≈ φ1:n w
  • Prediction: yn+1 = φn+1 · w
  • Bayesian linear regression
  • Training: given training data φ1:n, y1:n, infer a
    probability distribution P(w | φ1:n, y1:n) ∝
    P(w) Πi P(yi | φi, w)
  • Prediction: P(yn+1 | φn+1, φ1:n, y1:n) =
    ∫ P(yn+1 | w, φn+1) P(w | φ1:n, y1:n) dw
  • Knowledge about the weights: Gaussian N(μw, Σw)
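
For reference, the standard closed-form Gaussian posterior and predictive distribution for this model (assuming a prior w ~ N(μ0, Σ0) and a known noise variance σ², which the slide does not spell out) are:

```latex
% Posterior after n observations with basis matrix \Phi_{1:n} and targets y_{1:n}:
\Sigma_n = \bigl(\Sigma_0^{-1} + \sigma^{-2}\,\Phi_{1:n}^{\top}\Phi_{1:n}\bigr)^{-1},
\qquad
\mu_n = \Sigma_n \bigl(\Sigma_0^{-1}\mu_0 + \sigma^{-2}\,\Phi_{1:n}^{\top} y_{1:n}\bigr)

% Predictive distribution for a new basis vector \phi_{n+1}:
y_{n+1} \mid \phi_{n+1} \sim \mathcal{N}\!\bigl(\phi_{n+1}^{\top}\mu_n,\;
\sigma^{2} + \phi_{n+1}^{\top}\Sigma_n\,\phi_{n+1}\bigr)
```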

22
Sequential Bayesian linear regression: visualized
  • Start with a prior P(w) with very high
    uncertainty
  • First data point (φ1, y1):
  • P(w | φ1, y1) ∝ P(w) P(y1 | φ1, w)

[Figure: densities over a weight wi: the prior P(wi), the likelihood P(y1 | φ1, w), and the posterior P(wi | φ1, y1)]
23
Summary of incremental learning for runtime
prediction
  • Have a probability distribution over the weights
  • Start with a Gaussian prior, incrementally update
    it with more data
  • Given the Gaussian weight distribution, the
    predictions are also Gaussians
  • We know how uncertain our predictions are
  • For new domains, we will be very uncertain and
    only grow more confident after having seen a
    couple of data points

24
Overview
  • Previous work on runtime prediction that we build
    on [Leyton-Brown, Nudelman et al. 02, 04]
  • Part I: Automated parameter setting based on
    runtime prediction
  • Part II: Incremental learning for runtime
    prediction in a priori unknown domains
  • Experiments
  • Conclusions

25
Domain for our experiments
  • SAT
  • Best-studied NP-hard problem
  • Good features already exist [Nudelman et al. 04]
  • Lots of benchmarks
  • Stochastic Local Search (SLS)
  • Runtime prediction has never been done for SLS
    before
  • Parameter tuning is very important for SLS
  • Parameters are often continuous
  • SAPS algorithm [Hutter, Tompkins & Hoos 02]
  • Still among the state of the art
  • Default setting not always best
  • Well, I also know it well ;-)
  • But the approach is applicable to just about
    anything, whenever we can compute features!

26
Stochastic Local Search for SAT: Scaling and
Probabilistic Smoothing (SAPS)
[Hutter, Tompkins & Hoos 02]
  • Clause-weighting algorithm for SAT, was
    state-of-the-art in 2002
  • Start with all clause weights set to 1
  • Hill-climbing until you hit a local minimum
  • In local minima:
  • Scaling: scale the weights of unsatisfied clauses,
    wc ← α · wc
  • Probabilistic smoothing: with probability
    Psmooth, smooth all clause weights,
    wc ← ρ · wc + (1-ρ) · average(wc)
  • Default parameter setting (α, ρ, Psmooth) =
    (1.3, 0.8, 0.05)
  • Psmooth and ρ are very closely related
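
A rough Python sketch of the clause-weight updates described above (not the reference SAPS implementation); the caller is assumed to pass in the ids of clauses unsatisfied under the current assignment.

```python
# Rough sketch of the SAPS weight update in a local minimum:
# scaling of unsatisfied-clause weights, then (with probability
# p_smooth) smoothing of all clause weights toward their average.
import random

def saps_weight_update(weights, unsat_clause_ids,
                       alpha=1.3, rho=0.8, p_smooth=0.05):
    """weights: dict mapping clause id -> weight;
    unsat_clause_ids: clauses unsatisfied under the current assignment."""
    for c in unsat_clause_ids:
        weights[c] *= alpha                          # scaling step
    if random.random() < p_smooth:                   # probabilistic smoothing
        avg = sum(weights.values()) / len(weights)
        for c in weights:
            weights[c] = rho * weights[c] + (1 - rho) * avg
    return weights
```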

27
Benchmark instances
  • Only satisfiable instances!
  • SAT04rand: SAT 04 competition instances
  • mix: a mix of many different domains from
    SATLIB (random, graph colouring, blocksworld,
    inductive inference, logistics, ...)

28
Adaptive parameter setting vs. SAPS default on
SAT04rand
  • Trained on mix and used to choose parameters for
    SAT04rand
  • ρ ∈ {0.5, 0.6, 0.7, 0.8}
  • α ∈ {1.1, 1.2, 1.3}
  • For SAPS, steps ≈ time
  • Adaptive variant on average 2.5 times faster than
    the default
  • But the default is not strong here

29
Where uncertainty helps in practice: qualitative
differences between training and test set
  • Trained on mix, tested on SAT04rand

[Figure: predictions with estimates of uncertainty; the line marks optimal prediction]
30
Where uncertainty helps in practice (2): zoomed in
on predictions with low uncertainty

[Figure: predictions with low uncertainty; the line marks optimal prediction]
31
Overview
  • Previous work on runtime prediction that we build
    on [Leyton-Brown, Nudelman et al. 02, 04]
  • Part I: Automated parameter setting based on
    runtime prediction
  • Part II: Incremental learning for runtime
    prediction in a priori unknown domains
  • Experiments
  • Conclusions

32
Conclusions
  • Automated parameter tuning is needed and feasible
  • Algorithm experts currently waste their time on
    manual tuning
  • Solver can automatically choose appropriate
    heuristics based on instance characteristics
  • Such a solver could be used in practice
  • Learns incrementally from the instances it solves
  • Uncertainty estimates prevent catastrophic errors
    in estimates for new domains

33
Future work along these lines
  • Increase predictive performance
  • Better features
  • More powerful ML algorithms
  • Active learning
  • Run most informative probes for new domains (need
    the uncertainty estimates)
  • Use uncertainty
  • Pick algorithm with maximal probability of
    success (not the one with minimal expected
    runtime!)
  • More domains
  • Tree search algorithms
  • CP

34
Future work along related lines
  • If there are no features:
  • Local search in parameter space to find the best
    default parameter setting [Hutter 04]
  • If we can change strategies while running the
    algorithm:
  • Reinforcement learning for algorithm selection
    [Lagoudakis & Littman 00]
  • Low-knowledge algorithm control [Carchrae and
    Beck 05]

35
The End
  • Thanks to
  • Youssef Hamadi
  • Kevin Leyton-Brown
  • Eugene Nudelman
  • You, for your attention!

36
Related work (1): Finding the best default
parameters
  • Find a single parameter setting that minimizes
    expected runtime for a whole class of problems
  • Generate special-purpose code [Minton 93]
  • Minimize estimated error [Kohavi & John 95]
  • Racing algorithm [Birattari et al. 02]
  • Local search [Hutter 04]
  • Experimental design [Adenso-Díaz & Laguna 05]
  • Decision trees [Srivastava & Mediratta 05]

37
Related work (2): Algorithm selection on a
per-instance basis
  • Examine the instance, choose the algorithm that
    will work well for it
  • Estimate size of the DPLL search tree for each
    algorithm [Lobjois and Lemaître 98]
  • [Sillito 00]
  • Predict runtime for each algorithm [Leyton-Brown,
    Nudelman et al. 02, 04]

38
Predictive accuracy
  • Trained and validated on 100 uf100 instances,
    1000 runs each
  • Tested on 100 different uf100 instances, 1000
    runs each

39
My research so far
  • Stochastic Local Search
  • SAT (SAPS algorithm)
  • RNA Secondary Structure Design
  • Most Probable Explanation in Graphical Models
  • Particle Filtering
  • Model-based diagnosis for Mars Rovers
  • Automated Parameter Tuning
  • Already during MSc for tuning ILS algorithm
  • Employing Machine Learning

This talk