Title: The Bayesian Optimization Algorithm with Substructural Local Search
1. The Bayesian Optimization Algorithm with Substructural Local Search
- Claudio Lima, Martin Pelikan, Kumara Sastry,
Martin Butz, David Goldberg, and Fernando Lobo
2. Overview
- Motivation
- Bayesian Optimization Algorithm (BOA)
- Modeling fitness in BOA
- Substructural Neighborhoods
- BOA with Substructural Hillclimbing
- Results
- Conclusions
- Future Work
3. Motivation
- Probabilistic models of EDAs allow better recombination of subsolutions
- Can we get more from these models? Yes!
- Efficiency enhancement on EDAs
- Evaluation relaxation
- Local search in substructural neighborhoods
4. Bayesian Optimization Algorithm
- Pelikan, Goldberg, and Cantú-Paz (1999)
- Use Bayesian networks to model good solutions
- Model structure => acyclic directed graph
- Nodes represent variables
- Edges represent conditional dependencies
- Model parameters => conditional probabilities
- Conditional probability tables based on the observed frequencies
- Local structures: decision trees or graphs
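As a rough illustration of the parameter side of the model, the sketch below estimates a conditional probability table from observed frequencies in a set of selected solutions. The function name `estimate_cpt`, the choice of parents, and the toy population are assumptions made for this example; they are not taken from the slides.

```python
from collections import Counter

def estimate_cpt(population, child, parents):
    """Return {parent_values: P(child = 1 | parent_values)} from observed frequencies."""
    totals = Counter()   # how often each parent configuration occurs
    ones = Counter()     # how often the child is 1 under that configuration
    for solution in population:
        key = tuple(solution[p] for p in parents)
        totals[key] += 1
        ones[key] += solution[child]
    return {key: ones[key] / totals[key] for key in totals}

# Hypothetical selected solutions over three binary variables (X1, X2, X3).
population = [
    (1, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (0, 1, 1), (0, 1, 0),
]

# Table for X1 (index 0) conditioned on X2 and X3 (indices 1 and 2).
cpt_x1 = estimate_cpt(population, child=0, parents=(1, 2))
```

With an empty parent tuple the same function returns the marginal frequency of the variable, which corresponds to the independence assumption of an empty network.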
5. Learning a Bayesian Network
- Start with an empty network (independence assumption)
- Perform the operation that improves the metric the most
- Edge addition, edge removal, edge reversal
- Metric quantifies the likelihood of the model with respect to the data (good solutions)
- Stop when no more improvement is possible
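The greedy procedure above can be sketched as follows. The slides do not name the scoring metric, so a BIC-style score (log-likelihood minus a complexity penalty) is assumed here purely for illustration; all function names and the toy data are hypothetical, and the exhaustive rescoring is far less efficient than a real implementation would be.

```python
import itertools
import math
from collections import Counter

def family_score(data, child, parents):
    """Assumed BIC-style score of one binary variable given its parents."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marginal = Counter(tuple(r[p] for p in parents) for r in data)
    log_lik = sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    return log_lik - 0.5 * math.log(len(data)) * 2 ** len(parents)

def score(data, parents):
    return sum(family_score(data, v, sorted(ps)) for v, ps in parents.items())

def is_acyclic(parents):
    state = {}  # 1 = on the current DFS path, 2 = finished
    def visit(v):
        if state.get(v) == 1:
            return False
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in parents[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in parents)

def neighbours(parents):
    """Yield every network reachable by one edge addition, removal, or reversal."""
    for a, b in itertools.permutations(parents, 2):
        g = {v: set(ps) for v, ps in parents.items()}
        if a in parents[b]:                 # edge a -> b exists
            g[b].discard(a)
            yield g                         # removal
            r = {v: set(ps) for v, ps in g.items()}
            r[a].add(b)
            if is_acyclic(r):
                yield r                     # reversal
        elif b not in parents[a]:           # no edge between a and b yet
            g[b].add(a)
            if is_acyclic(g):
                yield g                     # addition

def learn_network(data, n_vars):
    parents = {v: set() for v in range(n_vars)}   # start from the empty network
    best = score(data, parents)
    while True:
        top_score, top = best, None
        for g in neighbours(parents):
            s = score(data, g)
            if s > top_score + 1e-9:        # keep the single best improving operation
                top_score, top = s, g
        if top is None:                     # no operation improves the metric: stop
            return parents
        parents, best = top, top_score

# Toy data: the second variable copies the first, the third is independent.
data = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
learned = learn_network(data, 3)
```

On this toy data the first two variables should end up connected by a single edge while the third stays isolated, since linking it to either of the others only adds penalty.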
6. A 3-bit Example
[Figure: model structure (directed acyclic graph over X1, X2, X3) and model parameters (conditional probability tables, decision trees). The decision tree for X1 splits on X2 and X3, with leaves P(x1 = 1) = 0.20, P(x1 = 1) = 0.15, and P(x1 = 1) = 0.45.]
7. Modeling Fitness in BOA
- Bayesian networks extended to store a surrogate fitness model (Pelikan & Sastry, 2004)
- The surrogate fitness is learned from a proportion of the population...
- ...and is used to estimate the fitness of the remaining individuals (therefore reducing evals)
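A minimal sketch of this idea follows. It assumes one common formulation of the estimate: start from the average fitness of the evaluated solutions and add, for each variable, the difference between the average fitness of solutions sharing its value and parent values and the average fitness of solutions sharing only the parent values. That formulation, the function names, and the toy sample are assumptions of this example rather than details stated on the slides.

```python
from collections import defaultdict

def mean_fitness_table(evaluated, variables):
    """Average fitness of evaluated solutions, grouped by their values on `variables`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for solution, fitness in evaluated:
        key = tuple(solution[v] for v in variables)
        sums[key] += fitness
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def build_surrogate(evaluated, parents):
    """Store the overall mean plus, per variable, mean fitness with and without its value."""
    overall = sum(f for _, f in evaluated) / len(evaluated)
    tables = {}
    for v, ps in parents.items():
        ps = tuple(ps)
        tables[v] = (ps,
                     mean_fitness_table(evaluated, (v,) + ps),
                     mean_fitness_table(evaluated, ps))
    return overall, tables

def estimate_fitness(solution, surrogate):
    """Estimate fitness without calling the real fitness function."""
    overall, tables = surrogate
    estimate = overall
    for v, (ps, with_child, parents_only) in tables.items():
        pa = tuple(solution[p] for p in ps)
        # Raises KeyError if this configuration never appeared in the evaluated sample.
        estimate += with_child[(solution[v],) + pa] - parents_only[pa]
    return estimate

# Toy case: onemax on 3 bits, with only half of the search space evaluated.
evaluated = [(s, sum(s)) for s in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]]
surrogate = build_surrogate(evaluated, parents={0: (), 1: (), 2: ()})
estimate_111 = estimate_fitness((1, 1, 1), surrogate)
estimate_001 = estimate_fitness((0, 0, 1), surrogate)
```

The two estimated solutions were never evaluated; for this separable toy function the per-variable contributions are enough to recover their fitness, which is what makes skipping the evaluation attractive.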
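8. The Same 3-bit Example
[Figure: the decision tree for X1 (splitting on X2 and X3), where each leaf now also stores the estimated fitness.]
- Leaf 1: P(X1 = 1) = 0.20, f(X1 = 0) = -0.48, f(X1 = 1) = 0.54
- Leaf 2: P(X1 = 1) = 0.15, f(X1 = 0) = -0.55, f(X1 = 1) = 0.47
- Leaf 3: P(X1 = 1) = 0.45, f(X1 = 0) = -0.52, f(X1 = 1) = 0.62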
9. Why Substructural Neighborhoods?
- An efficient mutation operator should search in the correct neighborhood
- Oftentimes this is done by incorporating domain- or problem-specific knowledge
- However, efficiency typically does not generalize beyond a small number of applications
- Bitwise local search has more general applicability but gives inferior results
10. Substructural Neighborhoods
- Neighborhoods defined by the probabilistic model of EDAs
- Exploits the underlying problem structure while not losing generality of application
- Exploration of neighborhoods respects dependencies between variables
- If X1X2X3 form a linkage group, the neighborhood considered will be 000, 001, 010, ..., 111
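The linkage-group example can be sketched in a few lines. The function name and the sample solution are hypothetical; the point is only that every assignment to the grouped variables is tried while the rest of the solution stays fixed.

```python
from itertools import product

def substructural_neighborhood(solution, group):
    """Yield copies of `solution` with every 0/1 assignment to the variables in `group`."""
    for values in product((0, 1), repeat=len(group)):
        neighbor = list(solution)
        for index, value in zip(group, values):
            neighbor[index] = value
        yield neighbor

solution = [1, 0, 1, 1, 0]
# X1, X2, X3 (indices 0, 1, 2) form the linkage group: 000, 001, 010, ..., 111.
neighbors = list(substructural_neighborhood(solution, group=(0, 1, 2)))
```

A group of k binary variables yields 2^k neighbors, so the neighborhood stays small as long as the linkage groups do.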
11. Substructural Local Search
- For uniformly-scaled decomposable problems, substructural local search scales as O(2^k m^{1.5}) (Sastry & Goldberg, 2004)
- Bitwise hillclimber: O(m^k log m)
- Extended Compact GA with substructural local search is more robust than either single-operator-based approach (Lima et al., 2005)
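To get a feel for the gap between the two growth rates, the short sketch below evaluates only the leading terms, reading k as the substructure size and m as the number of substructures. Constants hidden by the O-notation are ignored, so the ratio indicates the trend and not actual evaluation counts; the function names are made up for this example.

```python
import math

def substructural_growth(k, m):
    """Leading term 2^k * m^1.5 for substructural local search."""
    return 2 ** k * m ** 1.5

def bitwise_growth(k, m):
    """Leading term m^k * log(m) for the bitwise hillclimber."""
    return m ** k * math.log(m)

# Ratio of the two leading terms for 5-bit substructures and a growing number of them.
for m in (5, 10, 20, 40):
    ratio = bitwise_growth(5, m) / substructural_growth(5, m)
    print(f"m = {m:2d}: bitwise / substructural = {ratio:,.0f}")
```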
12. Substructural Neighborhoods in BOA
- Model is more complex than in eCGA
- What is a linkage group? Which dependencies to consider? Is order relevant?
- Example topology of 3 different substructural neighborhoods for variable X2
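The figure for this slide is not available, so the sketch below simply shows three plausible ways of deriving a neighborhood for a variable from a Bayesian network: the variable with its parents, with its children, or with both. These three definitions, the function names, and the example network are assumptions for illustration and may not match the topologies shown on the slide.

```python
def children_of(variable, parents):
    """Variables that list `variable` among their parents."""
    return {v for v, ps in parents.items() if variable in ps}

def candidate_neighborhoods(variable, parents):
    """Three hypothetical neighborhood definitions for one variable."""
    pa = set(parents[variable])
    ch = children_of(variable, parents)
    return {
        "parental": {variable} | pa,
        "children": {variable} | ch,
        "parental_and_children": {variable} | pa | ch,
    }

# Hypothetical network: X1 -> X2, X2 -> X3, X2 -> X4.
parents = {1: set(), 2: {1}, 3: {2}, 4: {2}}
options = candidate_neighborhoods(2, parents)
```

Larger neighborhoods capture more dependencies but cost 2^size subsolutions to explore, which is one reason the choice of topology matters.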
13. BOA Substructural Hillclimbing
- After model sampling, each offspring undergoes local search with a certain probability p_ls
- Current model is used to define the neighborhoods
- Choice of best subsolutions => surrogate fitness model
- Cost of performing local search is then minimal
14. Substructural Hillclimbing in BOA
15. Substructural Hillclimbing in BOA
- Use reverse ancestral ordering of variables
- 2 different versions of the substructural hillclimber (step 3)
- Evaluated fitness
- Estimated fitness
- Result of local search is evaluated
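The step-by-step procedure from the preceding slide is not reproduced in this transcript, so the following is only a sketch of how such a hillclimber might look. It assumes that the neighborhood of a variable is the variable plus its parents, that the best subsolution is the one with the highest mean fitness among evaluated solutions (a stand-in for the surrogate), and that a change is accepted only if it improves the fitness passed in as `fitness`. All names, the 3-bit trap, and the network are hypothetical.

```python
from itertools import product

def ancestral_order(parents):
    """Order variables so that each appears after all of its parents (assumes a DAG)."""
    order = []
    while len(order) < len(parents):
        for v in sorted(parents):
            if v not in order and all(p in order for p in parents[v]):
                order.append(v)
    return order

def make_subsolution_score(evaluated):
    """Stand-in surrogate: mean fitness of evaluated solutions matching a subsolution."""
    def score(group, values):
        matches = [f for s, f in evaluated
                   if all(s[v] == val for v, val in zip(group, values))]
        return sum(matches) / len(matches) if matches else float("-inf")
    return score

def substructural_hillclimb(solution, parents, subsolution_score, fitness):
    current, current_fit = list(solution), fitness(solution)
    for v in reversed(ancestral_order(parents)):       # reverse ancestral ordering
        group = [v] + sorted(parents[v])                # assumed neighborhood
        best = max(product((0, 1), repeat=len(group)),
                   key=lambda values: subsolution_score(group, values))
        candidate = list(current)
        for var, val in zip(group, best):
            candidate[var] = val
        candidate_fit = fitness(candidate)              # acceptance test
        if candidate_fit > current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit

def trap3(block):
    u = sum(block)
    return 3 if u == 3 else 2 - u

def two_traps(x):
    return trap3(x[0:3]) + trap3(x[3:6])

# Hypothetical model that links the variables of each 3-bit block.
parents = {0: [], 1: [0], 2: [0, 1], 3: [], 4: [3], 5: [3, 4]}
evaluated = [(s, two_traps(s)) for s in product((0, 1), repeat=6)]
result, result_fit = substructural_hillclimb(
    [0] * 6, parents, make_subsolution_score(evaluated), two_traps)
```

Passing the real fitness function as `fitness` corresponds to the evaluated-fitness version; passing a surrogate estimate instead would correspond to the estimated-fitness version, with a single real evaluation of the final result.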
16. Experiments
- Additively decomposable problems
- Two important bounds: onemax and concatenated k-bit traps
- Many things in between
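For reference, the two test functions can be written as below. The trap definition used here is the commonly used fully deceptive form (value k when all k bits are 1, otherwise k - 1 minus the number of ones); the slides do not spell it out, so treat it as an assumption.

```python
def onemax(x):
    """Number of ones; every bit is an independent substructure."""
    return sum(x)

def trap(block):
    """Assumed k-bit trap: optimum at all ones, deceptive slope towards all zeros."""
    k, u = len(block), sum(block)
    return k if u == k else k - 1 - u

def concatenated_traps(x, k=5):
    """Sum of traps over consecutive, non-overlapping k-bit blocks (len(x) divisible by k)."""
    return sum(trap(x[i:i + k]) for i in range(0, len(x), k))

all_ones = [1] * 50
all_zeros = [0] * 50
```

Onemax needs no linkage information at all, while the traps can only be solved reliably when each k-bit block is treated as a unit, which is why the two serve as bounds.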
17. Onemax Results (l = 50)
18. Onemax Results (l = 50)
- Correctness of substructural neighborhoods is not relevant...
- ...but the choice of subsolutions relies on the accuracy of the surrogate fitness model
- More importantly, the acceptance of the best subsolutions also depends on the surrogate when using estimated fitness
19. 10x5-bit Trap Results (l = 50)
20. 10x5-bit Trap Results (l = 50)
- Correct identification of problem substructure is crucial
- Different versions of the hillclimber perform similarly (for small p_ls)
- Cost of using evaluated fitness increases significantly with p_ls (and with problem size)
- Phase transition in the population size required
21. Scalability Results (5-bit traps)
22. Scalability Results (5-bit traps)
- Substantial speedups are obtained (≈6 for l = 140)
- Speedup scales as O(l^{0.45}) for l < 80
- For bigger problem sizes the speedup is more moderate
- p_ls = 5x10^-4 is adequate for the range of problems tested, but the optimal proportion should decrease for larger problem sizes
23. More on Scalability...
24. Scalability Issues
- Optimal proportion of local search slowly decreases with problem size
- Exploration of substructural neighborhoods is sensitive to the accuracy of model structure
- Spurious linkage size grows with problem size
- BOA's sampling ability is not affected because conditional probabilities nearly express independence between spurious and linked variables
25. Future Work
- Model optimal proportion of local search p_ls
- Get more accurate model structures
- Only accept pairwise dependencies that improve the metric beyond some threshold (significance test)
- Study the improvement function of the metric
- Consider other neighborhood topologies
- Consider overlapping substructures
26. Conclusions
- Incorporation of substructural local search in BOA leads to significant speedups
- Use of surrogate fitness in local search provides effective learning of substructures with minimal cost in evaluations
- The importance of designing and hybridizing competent operators has been empirically demonstrated