Title: Automated Parameter Setting Based on Runtime Prediction
1. Automated Parameter Setting Based on Runtime Prediction
- Towards an Instance-Aware Problem Solver
Frank Hutter, Univ. of British Columbia, Vancouver, Canada
Youssef Hamadi, Microsoft Research, Cambridge, UK
2. Motivation (1): Why automated parameter setting?
- We want to use the best available heuristic for a problem
  - Strong domain-specific heuristics in tree search
  - Domain knowledge helps to pick good heuristics
  - But maybe you don't know the domain ahead of time ...
  - Local search parameters must be tuned
  - Performance depends crucially on the parameter setting
- New application/algorithm
  - Restart parameter tuning from scratch
  - Waste of time both for researchers and practitioners
- Comparability
  - Is algorithm A faster than algorithm B, or was it just tuned more carefully?
3. Motivation (2): Operational scenario
- A CP solver has to solve instances from a variety of domains
  - Domains not known a priori
- Solver should automatically use the best strategy for each instance
- Want to learn from the instances we solve
4. Overview
- Previous work on runtime prediction we build on [Leyton-Brown, Nudelman et al. '02, '04]
- Part I: Automated parameter setting based on runtime prediction
- Part II: Incremental learning for runtime prediction in a priori unknown domains
- Experiments
- Conclusions
5. Previous work on runtime prediction for algorithm selection
- General approach
  - Portfolio of algorithms
  - For each instance, choose the algorithm that promises to be fastest
- Examples
  - [Lobjois and Lemaître, AAAI '98]: CSP
    - Mostly propagations of different complexity
  - [Leyton-Brown et al., CP '02]: Combinatorial auctions
    - CPLEX + 2 other algorithms (which were thought uncompetitive)
  - [Nudelman et al., CP '04]: SAT
    - Many tree-search algorithms from the last SAT competition
    - On average considerably faster than each single algorithm
6. Runtime prediction: Basics (1 algorithm) [Leyton-Brown, Nudelman et al. '02, '04]
- Training (expensive): given a set of t instances z_1, ..., z_t
  - For each instance z_i:
    - Compute features x_i = (x_i1, ..., x_im)
    - Run the algorithm to get its runtime y_i
  - Collect (x_i, y_i) pairs
  - Learn a function f: X → ℝ (features → runtime) with y_i ≈ f(x_i)
- Test (cheap): given a new instance z_{t+1}
  - Compute features x_{t+1}
  - Predict runtime y_{t+1} = f(x_{t+1})
7. Runtime prediction: Linear regression [Leyton-Brown, Nudelman et al. '02, '04]
- The learned function f has to be linear in the features x_i = (x_i1, ..., x_im)
  - y_i ≈ f(x_i) = Σ_{j=1..m} x_ij · w_j = x_i · w
- The learning problem thus reduces to fitting the weights w = (w_1, ..., w_m) (see the sketch after this slide)
- To better capture the vast differences in runtime, estimate the logarithm of the runtime
  - e.g. y_i = 5 ↔ runtime is 10^5 sec
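For concreteness, here is a minimal sketch (not from the slides) of fitting the weights by least squares on log runtimes; the feature matrix and runtimes are made-up placeholder values.

```python
import numpy as np

# Hypothetical training data: 3 instances with 3 features each,
# plus their measured runtimes in seconds (placeholder values).
X = np.array([[1.0, 0.2, 4.3],      # features x_1
              [1.0, 0.8, 3.1],      # features x_2
              [1.0, 0.5, 5.0]])     # features x_3
runtimes = np.array([12.0, 950.0, 3.5])

# Work with log10 runtime, so that y = 5 corresponds to 10^5 sec.
y = np.log10(runtimes)

# Fit weights w with linear least squares: y_i ~ x_i . w
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the (log) runtime of a new instance from its features.
x_new = np.array([1.0, 0.4, 4.0])
log_runtime_pred = x_new @ w
print(f"predicted runtime: {10 ** log_runtime_pred:.1f} sec")
```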
8. Runtime prediction: Feature engineering [Leyton-Brown, Nudelman et al. '02, '04]
- Features can be computed quickly (in seconds)
  - Basic properties like #vars, #clauses, their ratio
  - Estimates of search space size
  - Linear programming bounds
  - Local search probes
- Linear functions are not very powerful
  - But you can use the same methodology to learn more complex functions
  - Let φ = (φ_1, ..., φ_q) be arbitrary combinations of the features x_1, ..., x_m (so-called basis functions)
  - Learn a linear function of the basis functions: f(φ) = φ · w
- Basis functions used in [Nudelman et al. '04] (expansion sketch after this slide)
  - Original features x_i
  - Pairwise products of features x_i · x_j
  - Only a subset of these (drop useless basis functions)
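A minimal sketch of the quadratic basis-function expansion described above (original features plus pairwise products); the feature-selection step that drops useless basis functions is omitted here.

```python
import numpy as np

def quadratic_basis(x):
    """Map raw features x to basis functions: the original features
    plus all pairwise products x_i * x_j (with i <= j)."""
    x = np.asarray(x, dtype=float)
    products = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate([x, products])

# Example: 3 raw features -> 3 + 6 = 9 basis functions.
phi = quadratic_basis([2.0, 0.5, 4.3])
print(phi.shape)  # (9,)
```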
9. Algorithm selection based on runtime prediction [Leyton-Brown, Nudelman et al. '02, '04]
- Given n different algorithms A_1, ..., A_n
- Training (really expensive)
  - Learn n separate functions f_j: Φ → ℝ, j = 1...n
- Test (cheap)
  - Predict runtime y^j_{t+1} = f_j(φ_{t+1}) for each of the algorithms
  - Choose the algorithm A_j with minimal y^j_{t+1} (see the sketch after this slide)
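The selection step then reduces to evaluating each learned model and taking the argmin. This hedged sketch assumes per-algorithm weight vectors have already been fit (they are hypothetical here) and reuses the quadratic_basis helper from the previous sketch.

```python
import numpy as np

def select_algorithm(x_new, weight_vectors):
    """Predict the log runtime of each algorithm on the new instance
    and return the index of the algorithm with the smallest prediction."""
    phi = quadratic_basis(x_new)                      # basis functions of the new instance
    predictions = [phi @ w_j for w_j in weight_vectors]
    return int(np.argmin(predictions))

# Hypothetical usage with 3 already-learned per-algorithm weight vectors:
# best = select_algorithm(x_new, [w_A1, w_A2, w_A3])
```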
10. Overview
- Previous work on runtime prediction we build on [Leyton-Brown, Nudelman et al. '02, '04]
- Part I: Automated parameter setting based on runtime prediction
- Part II: Incremental learning for runtime prediction in a priori unknown domains
- Experiments
- Conclusions
11. Parameter setting based on runtime prediction
- Finding the best default parameter setting for a problem class
  - Generate special-purpose code [Minton '93]
  - Minimize estimated error [Kohavi & John '95]
  - Racing algorithm [Birattari et al. '02]
  - Local search [Hutter '04]
  - Experimental design [Adenso-Díaz & Laguna '05]
  - Decision trees [Srivastava & Mediratta '05]
- Runtime prediction for algorithm selection on a per-instance basis
  - Predict runtime for each algorithm and pick the best [Leyton-Brown, Nudelman et al. '02, '04]
- Runtime prediction for setting parameters on a per-instance basis (this work)
12. Naive application of runtime prediction for parameter setting
- Given one algorithm with n different parameter settings P_1, ..., P_n
- Training (too expensive)
  - Learn n separate functions f_j: Φ → ℝ, j = 1...n
- Test (fairly cheap)
  - Predict runtime y^j_{t+1} = f_j(φ_{t+1}) for each of the parameter settings
  - Run the algorithm with the setting P_j with minimal y^j_{t+1}
- If there are too many parameter configurations
  - Cannot run each parameter setting on each instance
  - Need to generalize (cf. human parameter tuning)
  - With separate functions there is no way to generalize
13. Generalization by parameter sharing
- Naive approach: n separate functions
  - Information on the runtime of setting i cannot inform predictions for setting j ≠ i
- Our approach: 1 single function
  - Information on the runtime of setting i can inform predictions for setting j ≠ i
14. Application of runtime prediction for parameter setting
- View the parameters as additional features and learn a single function
- Training (moderately expensive): given a set of instances z_1, ..., z_t
  - For each instance z_i:
    - Compute features x_i
    - Pick some parameter settings p_1, ..., p_n
    - Run the algorithm with settings p_1, ..., p_n to get runtimes y^1_i, ..., y^n_i
    - Basis functions φ^1_i, ..., φ^n_i include the parameter settings
    - Collect pairs (φ^j_i, y^j_i) (n data points per instance)
  - Learn only a single function g: Φ → ℝ
- Test (cheap): given a new instance z_{t+1}
  - Compute features x_{t+1}
  - Search over parameter settings p_j; to evaluate p_j, compute φ^j_{t+1} and check g(φ^j_{t+1})
  - Run with the best predicted parameter setting p* (see the sketch after this slide)
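A minimal sketch of this test-time search, assuming a single weight vector w for g has already been learned on joint (features, parameters) basis functions; the candidate grid and the joint basis-function mapping are illustrative assumptions, not the exact choices from the paper.

```python
import numpy as np
from itertools import product

def joint_basis(x, p):
    """Illustrative basis functions over instance features x and a
    parameter setting p: both raw vectors plus all cross products."""
    x, p = np.asarray(x, float), np.asarray(p, float)
    cross = np.outer(x, p).ravel()
    return np.concatenate([x, p, cross])

def best_setting(x_new, w, candidate_values):
    """Exhaustively search a grid of parameter settings and return the
    one with the smallest predicted (log) runtime under g(phi) = phi . w."""
    best_p, best_pred = None, np.inf
    for p in product(*candidate_values):
        pred = joint_basis(x_new, p) @ w
        if pred < best_pred:
            best_p, best_pred = p, pred
    return best_p

# Hypothetical usage, e.g. for two SAPS-like parameters (alpha, rho):
# p_star = best_setting(x_new, w, [[1.1, 1.2, 1.3], [0.5, 0.6, 0.7, 0.8]])
```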
15. Summary of automated parameter setting based on runtime prediction
- Learn a single function that maps features and parameter settings to runtime
- Given a new instance
  - Compute the features (they are fixed)
  - Search for the parameter setting that minimizes the predicted runtime for these features
16. Overview
- Previous work on runtime prediction we build on [Leyton-Brown, Nudelman et al. '02, '04]
- Part I: Automated parameter setting based on runtime prediction
- Part II: Incremental learning for runtime prediction in a priori unknown domains
- Experiments
- Conclusions
17. Problem setting: Incremental learning for multiple domains
18. Solution: Sequential Bayesian linear regression
- Update knowledge as new data arrives: maintain a probability distribution over the weights w
- Incremental (one (x_i, y_i) pair at a time)
  - Seamlessly integrates new data
- Optimal: yields the same result as a batch approach
- Efficient
  - Computation: 1 matrix inversion per update
  - Memory: can drop data once it has been integrated
- Robust
- Simple to implement (3 lines of Matlab)
- Provides estimates of the uncertainty in each prediction
19. What are uncertainty estimates?
20. Sequential Bayesian linear regression: Intuition
- Instead of predicting a single runtime y, use a probability distribution P(Y)
- The mean of P(Y) is exactly the prediction of the non-Bayesian approach, but we also get uncertainty estimates
[Figure: probability distribution P(Y) over log runtime Y, with the mean predicted runtime and the uncertainty of the prediction marked]
21. Sequential Bayesian linear regression: Technical
- Standard linear regression
  - Training: given training data Φ_{1:n}, y_{1:n}, fit the weights w such that y_{1:n} ≈ Φ_{1:n} w
  - Prediction: y_{n+1} = φ_{n+1} · w
- Bayesian linear regression
  - Training: given training data Φ_{1:n}, y_{1:n}, infer the probability distribution P(w | Φ_{1:n}, y_{1:n}) ∝ P(w) · Π_i P(y_i | φ_i, w)
  - Prediction: P(y_{n+1} | φ_{n+1}, Φ_{1:n}, y_{1:n}) = ∫ P(y_{n+1} | w, φ_{n+1}) P(w | Φ_{1:n}, y_{1:n}) dw
  - Knowledge about the weights: Gaussian N(μ_w, Σ_w) (closed-form formulas below)
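For completeness, with a Gaussian prior N(μ_0, Σ_0) on w and Gaussian observation noise of variance σ² (the noise model is not spelled out on the slide, so σ² is an assumption here), the standard posterior and predictive distributions are:

```latex
% Posterior over the weights after observing \Phi_{1:n}, y_{1:n}
\Sigma_n^{-1} = \Sigma_0^{-1} + \tfrac{1}{\sigma^2}\,\Phi_{1:n}^\top \Phi_{1:n},
\qquad
\mu_n = \Sigma_n \left( \Sigma_0^{-1}\mu_0 + \tfrac{1}{\sigma^2}\,\Phi_{1:n}^\top y_{1:n} \right)

% Predictive distribution for a new basis-function vector \phi_{n+1}
P(y_{n+1} \mid \phi_{n+1}, \Phi_{1:n}, y_{1:n})
= \mathcal{N}\!\left( \phi_{n+1}^\top \mu_n,\; \sigma^2 + \phi_{n+1}^\top \Sigma_n \,\phi_{n+1} \right)
```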
22. Sequential Bayesian linear regression: Visualized
- Start with a prior P(w) with very high uncertainty
- First data point (φ_1, y_1)
  - P(w | φ_1, y_1) ∝ P(w) · P(y_1 | φ_1, w) (update sketch after this slide)
[Figure: three curves over a weight w_i: the prior P(w_i), the likelihood P(y_1 | φ_1, w), and the posterior P(w_i | φ_1, y_1)]
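A small sketch of such a sequential update, under the same assumed Gaussian prior/noise model as above (mu, Sigma hold the current belief about w; sigma2 is the assumed noise variance). Each new (phi, y) pair is folded in with one rank-1 update and can then be discarded.

```python
import numpy as np

def bayes_update(mu, Sigma, phi, y, sigma2=1.0):
    """Sequential Bayesian linear regression update for one data point.
    Returns the new posterior mean and covariance of the weights."""
    phi = np.asarray(phi, float).reshape(-1, 1)
    Sigma_inv_new = np.linalg.inv(Sigma) + (phi @ phi.T) / sigma2
    Sigma_new = np.linalg.inv(Sigma_inv_new)
    mu_new = Sigma_new @ (np.linalg.inv(Sigma) @ mu + phi.flatten() * y / sigma2)
    return mu_new, Sigma_new

# Start from a vague prior over q weights and integrate points one by one.
q = 3
mu, Sigma = np.zeros(q), 100.0 * np.eye(q)
for phi_i, y_i in [(np.array([1.0, 0.2, 4.3]), 1.1),
                   (np.array([1.0, 0.8, 3.1]), 3.0)]:
    mu, Sigma = bayes_update(mu, Sigma, phi_i, y_i)

# Predictive mean and variance for a new basis-function vector.
phi_new = np.array([1.0, 0.5, 4.0])
pred_mean = phi_new @ mu
pred_var = 1.0 + phi_new @ Sigma @ phi_new   # sigma2 + phi^T Sigma phi
```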
23. Summary of incremental learning for runtime prediction
- Maintain a probability distribution over the weights
  - Start with a Gaussian prior, incrementally update it with more data
- Given the Gaussian weight distribution, the predictions are also Gaussian
  - We know how uncertain our predictions are
- For new domains, we will be very uncertain and only grow more confident after having seen a couple of data points
24. Overview
- Previous work on runtime prediction we build on [Leyton-Brown, Nudelman et al. '02, '04]
- Part I: Automated parameter setting based on runtime prediction
- Part II: Incremental learning for runtime prediction in a priori unknown domains
- Experiments
- Conclusions
25. Domain for our experiments
- SAT
  - Best-studied NP-hard problem
  - Good features already exist [Nudelman et al. '04]
  - Lots of benchmarks
- Stochastic Local Search (SLS)
  - Runtime prediction has never been done for SLS before
  - Parameter tuning is very important for SLS
  - Parameters are often continuous
- SAPS algorithm [Hutter, Tompkins, Hoos '02]
  - Still amongst the state of the art
  - Default setting not always best
  - Well, I also know it well ;-)
- But the approach is applicable to just about anything, whenever we can compute features!
26. Stochastic Local Search for SAT: Scaling and Probabilistic Smoothing (SAPS) [Hutter, Tompkins, Hoos '02]
- Clause-weighting algorithm for SAT, was state of the art in 2002
- Start with all clause weights set to 1
- Hill-climbing until you hit a local minimum
- In local minima (weight-update sketch after this slide)
  - Scaling: scale the weights of unsatisfied clauses, w_c ← α · w_c
  - Probabilistic smoothing: with probability P_smooth, smooth all clause weights, w_c ← ρ · w_c + (1-ρ) · average(w_c)
- Default parameter setting (α, ρ, P_smooth) = (1.3, 0.8, 0.05)
- P_smooth and ρ are very closely related
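A hedged sketch of just the local-minimum weight update described on this slide (scaling, then smoothing with probability P_smooth); the surrounding hill-climbing search and the clause representation are omitted, and unsat_clauses is a placeholder.

```python
import random

def saps_local_minimum_update(weights, unsat_clauses,
                              alpha=1.3, rho=0.8, p_smooth=0.05):
    """Update clause weights in a local minimum, as on this slide:
    scale the weights of unsatisfied clauses, then with probability
    P_smooth smooth all weights towards their average."""
    # Scaling: w_c <- alpha * w_c for every currently unsatisfied clause c
    for c in unsat_clauses:
        weights[c] *= alpha
    # Probabilistic smoothing: w_c <- rho * w_c + (1 - rho) * average weight
    if random.random() < p_smooth:
        avg = sum(weights) / len(weights)
        for c in range(len(weights)):
            weights[c] = rho * weights[c] + (1.0 - rho) * avg
    return weights

# Hypothetical usage: 5 clauses, clauses 1 and 3 currently unsatisfied.
w = saps_local_minimum_update([1.0] * 5, unsat_clauses=[1, 3])
```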
27. Benchmark instances
- Only satisfiable instances!
- SAT04rand: SAT '04 competition instances
- mix: a mix of many different domains from SATLIB (random, graph colouring, blocksworld, inductive inference, logistics, ...)
28. Adaptive parameter setting vs. SAPS default on SAT04rand
- Trained on mix and used to choose parameters for SAT04rand
  - ρ ∈ {0.5, 0.6, 0.7, 0.8}
  - α ∈ {1.1, 1.2, 1.3}
- For SAPS, steps ≈ time
- Adaptive variant on average 2.5 times faster than the default
- But the default is not strong here
29. Where uncertainty helps in practice: Qualitative differences in training & test set
- Trained on mix, tested on SAT04rand
[Figure: predicted vs. actual runtime with uncertainty estimates of the predictions; the diagonal marks optimal prediction]
30. Where uncertainty helps in practice (2): Zoomed in to predictions with low uncertainty
[Figure: same plot, restricted to predictions with low uncertainty; the diagonal marks optimal prediction]
31. Overview
- Previous work on runtime prediction we build on [Leyton-Brown, Nudelman et al. '02, '04]
- Part I: Automated parameter setting based on runtime prediction
- Part II: Incremental learning for runtime prediction in a priori unknown domains
- Experiments
- Conclusions
32. Conclusions
- Automated parameter tuning is needed and feasible
  - Algorithm experts waste their time on it
- A solver can automatically choose appropriate heuristics based on instance characteristics
- Such a solver could be used in practice
  - Learns incrementally from the instances it solves
  - Uncertainty estimates prevent catastrophic errors in estimates for new domains
33. Future work along these lines
- Increase predictive performance
  - Better features
  - More powerful ML algorithms
- Active learning
  - Run the most informative probes for new domains (needs the uncertainty estimates)
- Use uncertainty
  - Pick the algorithm with maximal probability of success (not the one with minimal expected runtime!)
- More domains
  - Tree-search algorithms
  - CP
34. Future work along related lines
- If there are no features
  - Local search in parameter space to find the best default parameter setting [Hutter '04]
- If we can change strategies while running the algorithm
  - Reinforcement learning for algorithm selection [Lagoudakis & Littman '00]
  - Low-knowledge algorithm control [Carchrae and Beck '05]
35. The End
- Thanks to
  - Youssef Hamadi
  - Kevin Leyton-Brown
  - Eugene Nudelman
  - You, for your attention :-)
36. Related work (1): Finding the best default parameters
- Find a single parameter setting that minimizes expected runtime for a whole class of problems
  - Generate special-purpose code [Minton '93]
  - Minimize estimated error [Kohavi & John '95]
  - Racing algorithm [Birattari et al. '02]
  - Local search [Hutter '04]
  - Experimental design [Adenso-Díaz & Laguna '05]
  - Decision trees [Srivastava & Mediratta '05]
37. Related work (2): Algorithm selection on a per-instance basis
- Examine an instance, choose the algorithm that will work well for it
  - Estimate the size of the DPLL search tree for each algorithm [Lobjois and Lemaître '98], [Sillito '00]
  - Predict runtime for each algorithm [Leyton-Brown, Nudelman et al. '02, '04]
38. Predictive accuracy
- Trained and validated on 100 uf100 instances, 1000 runs each
- Tested on 100 different uf100 instances, 1000 runs each
39. My research so far
- Stochastic Local Search
  - SAT (SAPS algorithm)
  - RNA Secondary Structure Design
  - Most Probable Explanation in Graphical Models
- Particle Filtering
  - Model-based diagnosis for Mars Rovers
- Automated Parameter Tuning
  - Already during my MSc, for tuning an ILS algorithm
  - Employing Machine Learning (this talk)