Transcript and Presenter's Notes

Title: Explicit Modelling in Metaheuristic Optimization


1
Explicit Modelling in Metaheuristic Optimization
  • Dr Marcus Gallagher
  • School of Information Technology and Electrical
    Engineering
  • University of Queensland, Qld 4072
  • marcusg@itee.uq.edu.au

2
  • Talk outline:
  • Optimization, heuristics and metaheuristics.
  • Estimation of Distribution (optimization)
    algorithms (EDAs): a brief overview.
  • A framework for describing EDAs.
  • Other modelling approaches in metaheuristics.
  • Summary

3
Hard Optimization Problems
  • Goal: find x* = argmin f(x) over x ∈ S
  • where the search space S is often
    multi-dimensional, real-valued or binary
  • Many classes of optimization problems (and
    algorithms) exist.
  • When might it be worthwhile to consider
    metaheuristic or machine learning approaches?

4
  • Finding an exact solution is intractable.
  • Limited knowledge of f():
  • No derivative information.
  • May be discontinuous, noisy, etc.
  • Evaluating f() is expensive in terms of time or
    cost.
  • f() is known or suspected to contain nasty
    features:
  • Many local minima, plateaus, ravines.
  • The search space is high-dimensional.

5
  • What is the practical goal of (global)
    optimization?
  • There exists a goal (e.g. to find as small a
    value of f() as possible), there exist resources
    (e.g. some number of trials), and the problem is
    how to use these resources in an optimal way.
  • A. Törn and A. Žilinskas, Global Optimization.
    Springer-Verlag, 1989. Lecture Notes in Computer
    Science, Vol. 350.

6
Heuristics
  • Heuristic (or approximate) algorithms aim to find
    a good solution to a problem in a reasonable
    amount of computation time but with no
    guarantee of goodness or efficiency (cf.
    exact or complete algorithms).
  • Broad classes of heuristics
  • Constructive methods
  • Local search methods

7
Metaheuristics
  • Metaheuristics are (roughly) high-level
    strategies that combine lower-level techniques
    for exploration and exploitation of the search
    space.
  • An overarching term for algorithms including
    Evolutionary Algorithms, Simulated Annealing,
    Tabu Search, Ant Colony, Particle Swarm,
    Cross-Entropy, and others.
  • C. Blum and A. Roli. Metaheuristics in
    Combinatorial Optimization: Overview and
    Conceptual Comparison. ACM Computing Surveys,
    35(3), 2003, pp. 268-308.

8
Learning/Modelling for Optimization
  • Most optimization algorithms make some (explicit
    or implicit) assumptions about the nature of f().
  • Many algorithms vary their behaviour during
    execution (e.g. simulated annealing).
  • In some optimization algorithms the search is
    adaptive:
  • Future search points depend on previously
    searched points (and/or their f() values,
    derivatives of f(), etc.).
  • Learning/modelling can be implicit (e.g., adapting
    the step-size in gradient descent, the population
    in an EA)
  • or explicit; examples from the optimization
    literature:
  • Nelder-Mead simplex algorithm.
  • Response surfaces (metamodelling, surrogate
    functions).

9
EDAs: Probabilistic Modelling for Optimization
  • Based on the use of (unsupervised) density
    estimators/generative statistical models.
  • The idea is to convert the optimization problem
    into a search over probability distributions.
  • P. Larrañaga and J. A. Lozano (eds.). Estimation
    of Distribution Algorithms: A New Tool for
    Evolutionary Computation. Kluwer Academic
    Publishers, 2002.
  • The probabilistic model is in some sense an
    explicit model of (currently) promising regions
    of the search space.

10
EDAs: toy example
11
EDAs: toy example
12
GAs and EDAs compared
  • GA pseudocode
  1. Initialize the population, X(t)
  2. Evaluate the objective function for each point
  3. Selection()
  4. Crossover()
  5. Mutation() → form new population X(t+1)
  6. While !(terminate()) goto 2
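To make the loop concrete, here is a minimal runnable
sketch in Python. The binary encoding, tournament
selection, one-point crossover, bit-flip mutation and
the one-max objective are all illustrative choices,
not specified on the slide:

```python
import random

def one_max(x):                      # illustrative objective: count of 1-bits
    return sum(x)

def ga(n_bits=20, pop_size=30, generations=50, p_mut=0.05):
    # 1. initialize the population X(t)
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [one_max(x) for x in pop]          # 2. evaluate each point
        def select():                                # 3. binary tournament
            a, b = random.randrange(pop_size), random.randrange(pop_size)
            return pop[a] if fitness[a] >= fitness[b] else pop[b]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, n_bits)        # 4. one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if random.random() < p_mut else bit
                     for bit in child]               # 5. bit-flip mutation
            new_pop.append(child)
        pop = new_pop                                # form X(t+1)
    return max(pop, key=one_max)

print(ga())  # should approach the all-ones string
```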

13
GAs and EDAs compared
  • EDA pseudocode
  1. Initialize a probability model, Q(x)
  2. Create a population of points by sampling from
     Q(x)
  3. Evaluate the objective function for each point
  4. Update Q(x) using the selected population and
     f() values
  5. While !(terminate()) goto 2
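For comparison with the GA loop above, a minimal
sketch of this scheme for a continuous problem. The
choice of Q(x) as an axis-aligned Gaussian refitted to
the best half of each population, and the sphere
objective, are illustrative assumptions:

```python
import random, statistics

def sphere(x):                        # illustrative objective: minimize sum of squares
    return sum(v * v for v in x)

def gaussian_eda(dim=5, pop_size=50, n_select=25, generations=100):
    mu = [0.0] * dim                  # 1. initialize Q(x): independent Gaussians
    sigma = [5.0] * dim
    for _ in range(generations):
        pop = [[random.gauss(mu[d], sigma[d]) for d in range(dim)]
               for _ in range(pop_size)]             # 2. sample from Q(x)
        pop.sort(key=sphere)                         # 3. evaluate and rank
        elite = pop[:n_select]                       # truncation selection
        for d in range(dim):                         # 4. refit Q(x) to selected points
            col = [x[d] for x in elite]
            mu[d] = statistics.mean(col)
            sigma[d] = max(statistics.stdev(col), 1e-3)  # floor avoids collapse
    return mu

print(sphere(gaussian_eda()))         # should be near 0
```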

14
EDA Example 1
  • Population-based Incremental Learning (PBIL)
  • S. Baluja and R. Caruana. Removing the Genetics
    from the Standard Genetic Algorithm. Proc.
    ICML'95.

  • Model: a probability vector p = (p1, ..., pn),
    where pi = Pr(xi = 1).
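A minimal sketch of PBIL, assuming the standard update
that shifts the probability vector towards the best
sampled point with learning rate lr; the one-max
objective is illustrative:

```python
import random

def one_max(x):
    return sum(x)

def pbil(n_bits=20, pop_size=50, generations=100, lr=0.1):
    p = [0.5] * n_bits                       # pi = Pr(xi = 1), initially uniform
    for _ in range(generations):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]     # sample population from the model
        best = max(pop, key=one_max)         # select the best point
        p = [(1 - lr) * pi + lr * bi         # shift model toward the best point
             for pi, bi in zip(p, best)]
    return p

print([round(pi, 2) for pi in pbil()])       # entries should drift toward 1.0
```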
15
EDA Example 2
  • Mutual Information Maximization for Input
    Clustering (MIMIC)
  • Models the selected population with a chain of
    pairwise conditional distributions over a greedily
    chosen variable ordering.
  • J. De Bonet, C. Isbell and P. Viola. MIMIC:
    Finding Optima by Estimating Probability
    Densities. Advances in Neural Information
    Processing Systems, vol. 9, 1997.

16
EDA Example 3
  • Combining Optimizers with Mutual Information
    Trees (COMIT)
  • S. Baluja and S. Davies. Using Optimal
    Dependency-Trees for Combinatorial Optimization:
    Learning the Structure of the Search Space.
    Proc. ICML'97.
  • Uses a tree-structured graphical model.
  • The model can be constructed in O(n²) time using a
    variant of the minimum spanning tree algorithm.
  • The model is optimal, given the restrictions, in
    the sense that the Kullback-Leibler divergence
    between the model and the full joint distribution
    is minimized.
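A sketch of the underlying Chow-Liu construction:
estimate pairwise mutual information from a sample and
grow a maximum-weight spanning tree with Prim's
algorithm. Function names and the toy data are
illustrative; precomputing the MI matrix would give
the O(n²) spanning-tree step quoted above:

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Empirical MI between two binary columns of the sample."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        # p(a,b) log( p(a,b) / (p(a) p(b)) ), with counts c, pa[a], pb[b]
        mi += (c / n) * math.log(c * n / (pa[a] * pb[b]))
    return mi

def dependency_tree(samples):
    """Maximum-weight spanning tree over pairwise MI (Prim's algorithm).
    Returns a list of (parent, child) edges rooted at variable 0."""
    n_vars = len(samples[0])
    cols = list(zip(*samples))               # one column per variable
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        u, v = max(((i, j) for i in in_tree for j in range(n_vars)
                    if j not in in_tree),
                   key=lambda e: mutual_information(cols[e[0]], cols[e[1]]))
        edges.append((u, v))
        in_tree.add(v)
    return edges

samples = [[0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]]  # toy binary data
print(dependency_tree(samples))  # pairs the strongly correlated bits first
```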

17
EDA Example 4
  • Bayesian Optimization Algorithm (BOA)
  • M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA:
    The Bayesian Optimization Algorithm. In Proc.
    GECCO'99.
  • Bayesian network model where nodes can have at
    most k parents.
  • Greedy search over network structures, scored with
    the Bayesian Dirichlet equivalence (BDe) metric.
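A sketch of the greedy structure-search idea,
substituting a simple BIC score for the Bayesian
Dirichlet equivalence metric that BOA actually uses;
the fixed variable ordering and all names are
illustrative simplifications:

```python
import math
from collections import Counter

def bic_node_score(samples, child, parents):
    """BIC score of one binary variable given a candidate parent set."""
    n = len(samples)
    counts = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    totals = Counter()                       # samples per parent configuration
    for (cfg, _), c in counts.items():
        totals[cfg] += c
    loglik = sum(c * math.log(c / totals[cfg])
                 for (cfg, _), c in counts.items())
    return loglik - 0.5 * (2 ** len(parents)) * math.log(n)  # BIC penalty

def greedy_structure(samples, max_parents=2):
    """Greedily add the arc with the best score gain, up to k parents per node.
    Arcs only go from lower to higher variable index, which keeps the graph
    acyclic (a simplification; BOA searches over all arc insertions)."""
    n_vars = len(samples[0])
    parents = {v: [] for v in range(n_vars)}
    while True:
        best_gain, best_arc = 1e-9, None
        for v in range(n_vars):
            if len(parents[v]) >= max_parents:
                continue
            for u in range(v):
                if u in parents[v]:
                    continue
                gain = (bic_node_score(samples, v, parents[v] + [u])
                        - bic_node_score(samples, v, parents[v]))
                if gain > best_gain:
                    best_gain, best_arc = gain, (u, v)
        if best_arc is None:
            return parents
        parents[best_arc[1]].append(best_arc[0])

samples = [[0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]] * 5  # toy binary data
print(greedy_structure(samples))   # finds arc 0 -> 1: {0: [], 1: [0], 2: []}
```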

18
Further work on EDAs
  • EDAs have also been developed:
  • For problems with continuous and mixed variables.
  • That use mixture models and kernel estimators -
    allowing for the modelling of multi-modal
    distributions.
  • and more!

19
A framework to describe building and adapting a
probabilistic model for optimization
  • See
  • M. Gallagher and M. Frean. Population-Based
    Continuous Optimization, Probabilistic Modelling
    and Mean Shift. To appear, Evolutionary
    Computation, 2005.
  • Consider a continuous EDA with model Q(x), a
    Gaussian with mean μ and variance σ².
  • Consider a Boltzmann distribution over f(x):
    P(x) = (1/Z_T) exp(−f(x)/T)
    where Z_T is a normalizing constant and T is a
    temperature parameter.

20
  • As T → 0, P(x) tends towards a set of impulse
    spikes over the global optima.
  • Now we have a distribution of known form, Q(x),
    and we would like to modify it to be close to
    P(x). Measure closeness by the KL divergence:
    K = ∫ Q(x) ln(Q(x)/P(x)) dx
  • Let Q(x) be a Gaussian and try to minimize K via
    gradient descent with respect to the mean
    parameter μ of Q(x).

21
  • The gradient becomes
    ∂K/∂μ = (1/(Tσ²)) ∫ Q(x)(x − μ) f(x) dx
    since the entropy of Q and ln Z_T do not depend
    on μ.
  • An approximation to the integral is to use a
    sample x_1, ..., x_S drawn from Q(x):
    ∂K/∂μ ≈ (1/(Tσ²)) · (1/S) Σ_i (x_i − μ) f(x_i)

22
  • The algorithm update rule is then
    μ ← μ − η Σ_i (x_i − μ) f(x_i)
    with the constants absorbed into the learning
    rate η (a code sketch follows the references
    below).
  • Similar ideas can be found in
  • A. Berny. Statistical Machine Learning and
    Combinatorial Optimization. In L. Kallel et al.
    eds, Theoretical Aspects of Evolutionary
    Computation, pp. 287-306. Springer. 2001.
  • M. Toussaint. On the evolution of phenotypic
    exploration distributions. In C. Cotta et al.
    eds, Foundations of Genetic Algorithms (FOGA
    VII), pp. 169-182. Morgan Kaufmann. 2003.
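A minimal sketch of this update on a one-dimensional
problem. The fixed variance, learning rate, sample
size and the quadratic objective are illustrative
assumptions; only the mean update comes from the
derivation above:

```python
import random

def f(x):                                  # illustrative objective: minimum at x = 3
    return (x - 3.0) ** 2

def kl_mean_update(mu=0.0, sigma=1.0, eta=0.01, sample_size=50, steps=500):
    for _ in range(steps):
        xs = [random.gauss(mu, sigma) for _ in range(sample_size)]  # sample Q(x)
        grad = sum((x - mu) * f(x) for x in xs) / sample_size       # sampled dK/dmu
        mu -= eta * grad                   # gradient descent on the KL divergence
    return mu

print(kl_mean_update())                    # should approach 3.0
```

Note that every sampled point contributes in
proportion to its f() value; there is no selection
step, matching the insight on the next slide.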

23
Some insights
  • The derived update rule is closely related to
    those found in Evolution Strategies and a version
    of PBIL for continuous spaces.
  • It is possible to view these existing algorithms
    as approximately doing KL minimization.
  • The objective function appears explicitly in this
    update rule (no selection).

24
Other Research in Learning/Modelling for
Optimization
  • J. A. Boyan and A. W. Moore. Learning Evaluation
    Functions to Improve Optimization by Local
    Search. Journal of Machine Learning Research,
    1:77-112, 2000.
  • B. Anderson, A. Moore and D. Cohn. A
    Nonparametric Approach to Noisy and Costly
    Optimization. International Conference on
    Machine Learning, 2000.
  • D. R. Jones. A Taxonomy of Global Optimization
    Methods Based on Response Surfaces. Journal of
    Global Optimization, 21(4):345-383, 2001.
  • Reinforcement learning:
  • R. J. Williams (1992). Simple statistical
    gradient-following algorithms for connectionist
    reinforcement learning. Machine Learning,
    8:229-256.
  • V. V. Miagkikh and W. F. Punch III. An Approach
    to Solving Combinatorial Optimization Problems
    Using a Population of Reinforcement Learning
    Agents. Genetic and Evolutionary Computation
    Conf. (GECCO-99), pp. 1358-1365, 1999.

25
Summary
  • The field of metaheuristics (including
    Evolutionary Computation) has produced:
  • A large variety of optimization algorithms.
  • Demonstrated good performance on a range of
    real-world problems.
  • Metaheuristics are considerably more general:
  • they can even be applied when there isn't a true
    objective function (coevolution).
  • Can evolve non-numerical objects.

26
Summary
  • EDAs take an explicit modelling approach to
    optimization.
  • Existing statistical models and model-fitting
    algorithms can be employed.
  • Potential for solving challenging problems.
  • Model can be more easily visualized/interpreted
    than a dynamic population in a conventional EA.
  • Although the field is highly active, it is still
    relatively immature:
  • Improve quality of experimental results.
  • Make sure research goals are well-defined.
  • Lots of preliminary ideas, but lack of
    comparative/followup research.
  • Difficult to keep up with the literature and see
    connections with other fields.

27
The End!
  • Questions?