
Title: Explicit Modelling in Metaheuristic Optimization


1
Explicit Modelling in Metaheuristic Optimization
  • Dr Marcus Gallagher
  • School of Information Technology and Electrical
    Engineering
  • University of Queensland Q. 4072
  • marcusg_at_itee.uq.edu.au

2
  • Talk outline
  • Optimization, heuristics and metaheuristics.
  • Estimation of Distribution (optimization)
    algorithms (EDAs): a brief overview.
  • A framework for describing EDAs.
  • Other modelling approaches in metaheuristics.
  • Summary

3
Hard Optimization Problems
  • Goal: Find $x^* = \arg\min_{x \in S} f(x)$
  • where S is often multi-dimensional, real-valued
    or binary.
  • Many classes of optimization problems (and
    algorithms) exist.
  • When might it be worthwhile to consider
    metaheuristic or machine learning approaches?

4
  • Finding an exact solution is intractable.
  • Limited knowledge of f():
  • No derivative information.
  • May be discontinuous, noisy, etc.
  • Evaluating f() is expensive in terms of time or
    cost.
  • f() is known or suspected to contain nasty
    features:
  • Many local minima, plateaus, ravines.
  • The search space is high-dimensional.

5
  • What is the practical goal of (global)
    optimization?
  • There exists a goal (e.g. to find as small a
    value of f() as possible), there exist resources
    (e.g. some number of trials), and the problem is
    how to use these resources in an optimal way.
  • A. Torn and A. Zilinskas, Global Optimisation.
    Springer-Verlag, 1989. Lecture Notes in Computer
    Science, Vol. 350.

6
Heuristics
  • Heuristic (or approximate) algorithms aim to find
    a good solution to a problem in a reasonable
    amount of computation time but with no
    guarantee of goodness or efficiency (cf.
    exact or complete algorithms).
  • Broad classes of heuristics:
  • Constructive methods
  • Local search methods

7
Metaheuristics
  • Metaheuristics are (roughly) high-level
    strategies that combine lower-level techniques
    for exploration and exploitation of the search
    space.
  • An overarching term to refer to algorithms
    including Evolutionary Algorithms, Simulated
    Annealing, Tabu Search, Ant Colony, Particle
    Swarm, Cross-Entropy, etc.
  • C. Blum and A. Roli. Metaheuristics in
    Combinatorial Optimization: Overview and
    Conceptual Comparison. ACM Computing Surveys,
    35(3), 2003, pp. 268-308.

8
Learning/Modelling for Optimization
  • Most optimization algorithms make some (explicit
    or implicit) assumptions about the nature of f().
  • Many algorithms vary their behaviour during
    execution (e.g. simulated annealing).
  • In some optimization algorithms the search is
    adaptive:
  • Future search points evaluated depend on previous
    points searched (and/or their f() values,
    derivatives of f(), etc.).
  • Learning/modelling can be implicit (e.g. adapting
    the step-size in gradient descent, population in
    an EA),
  • or explicit. Examples from the optimization
    literature:
  • Nelder-Mead simplex algorithm.
  • Response surfaces (metamodelling, surrogate
    function).

9
EDAs: Probabilistic Modelling for Optimization
  • Based on the use of (unsupervised) density
    estimators/generative statistical models.
  • Idea is to convert the optimization problem into
    a search over probability distributions.
  • P. Larrañaga and J. A. Lozano (eds.). Estimation
    of Distribution Algorithms: A New Tool for
    Evolutionary Computation. Kluwer Academic
    Publishers, 2002.
  • The probabilistic model is in some sense an
    explicit model of (currently) promising regions
    of the search space.

10
EDAs: toy example

11
EDAs: toy example

12
GAs and EDAs compared
  • GA pseudocode:
    1. Initialize the population, X(t)
    2. Evaluate the objective function for each point
    3. Selection()
    4. Crossover()
    5. Mutation()
       → Form new population X(t+1)
    6. While !(terminate()) Goto 2

13
GAs and EDAs compared
  • EDA pseudocode:
    1. Initialize a probability model, Q(x)
    2. Create a population of points by sampling from
       Q(x)
    3. Evaluate the objective function for each point
    4. Update Q(x) using selected population and f()
       values
    5. While !(terminate()) Goto 2
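
The EDA loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an algorithm taken from the talk: the model Q(x) is assumed to be an axis-aligned Gaussian, the update is assumed to be a refit to the lowest-f fraction of the population (truncation selection), and terminate() is assumed to be a fixed iteration budget. The names `gaussian_eda`, `sphere`, and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_eda(f, dim, pop_size=100, n_select=30, n_iters=50, seed=0):
    """Minimise f with a toy EDA whose model Q(x) is an axis-aligned Gaussian."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise the probability model Q(x) (assumed starting values).
    mean = np.zeros(dim)
    std = np.full(dim, 5.0)
    best_x, best_f = None, np.inf
    for _ in range(n_iters):  # Step 5: a fixed budget stands in for terminate()
        # Step 2: create a population of points by sampling from Q(x).
        pop = rng.normal(mean, std, size=(pop_size, dim))
        # Step 3: evaluate the objective function for each point.
        values = np.array([f(x) for x in pop])
        # Remember the best point seen so far.
        i = int(np.argmin(values))
        if values[i] < best_f:
            best_x, best_f = pop[i].copy(), float(values[i])
        # Step 4: update Q(x) by refitting it to the selected (lowest-f) points.
        selected = pop[np.argsort(values)[:n_select]]
        mean = selected.mean(axis=0)
        std = selected.std(axis=0) + 1e-12  # small floor keeps the scale positive
    return best_x, best_f

def sphere(x):
    """Hypothetical test objective with its minimum at x = (1, ..., 1)."""
    return float(np.sum((x - 1.0) ** 2))

best_x, best_f = gaussian_eda(sphere, dim=3)
print(best_x, best_f)
```

Unlike the GA loop, no crossover or mutation operators appear: all variation comes from sampling the model, and all learning comes from refitting it.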

14
EDA Example 1
  • Population-based Incremental Learning (PBIL)
  • S. Baluja and R. Caruana. Removing the Genetics
    from the Standard Genetic Algorithm. ICML'95.

$p_1 = \Pr(x_1 = 1)$, $p_2 = \Pr(x_2 = 1)$, ..., $p_n = \Pr(x_n = 1)$
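
A minimal PBIL-style sketch, assuming a binary search space and a model made of independent bit probabilities p_i = Pr(x_i = 1). The learning rate `lr`, the population size, and the OneMax objective (maximised here for convenience) are illustrative assumptions; the published algorithm has further details, such as mutation of the probability vector, that are left out.

```python
import numpy as np

def pbil(f, n_bits, pop_size=50, lr=0.1, n_iters=200, seed=0):
    """Maximise f over binary strings with a basic PBIL-style update."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)  # p[i] = Pr(x_i = 1), initially uninformative
    best_x, best_f = None, -np.inf
    for _ in range(n_iters):
        # Sample a population: bit i is 1 with probability p[i].
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)
        values = np.array([f(x) for x in pop])
        winner = pop[int(np.argmax(values))]
        if values.max() > best_f:
            best_x, best_f = winner.copy(), float(values.max())
        # Move the probability vector a small step towards the best sample.
        p = (1.0 - lr) * p + lr * winner
    return p, best_x, best_f

def onemax(x):
    """Hypothetical test objective: the number of ones in the string."""
    return int(np.sum(x))

p, best_x, best_f = pbil(onemax, n_bits=20)
print(np.round(p, 2), best_f)
```

The whole population is summarised by the vector p, which is the "removing the genetics" idea: there is no crossover, only sampling from p and nudging p towards good samples.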
15
EDA Example 2
  • Mutual Information Maximization for Input
    Clustering (MIMIC)
  • J. De Bonet, C. Isbell and P. Viola. MIMIC:
    Finding optima by estimating probability
    densities. Advances in Neural Information
    Processing Systems, vol. 9, 1997.

16
EDA Example 3
  • Combining Optimizers with Mutual Information
    Trees (COMIT)
  • S. Baluja and S. Davies. Using optimal
    dependency-trees for combinatorial optimization:
    learning the structure of the search space.
    Proc. ICML'97.
  • Uses a tree-structured graphical model.
  • Model can be constructed in O(n^2) time using a
    variant of the minimum spanning tree algorithm.
  • Model is optimal, given the restrictions, in the
    sense that the Kullback-Leibler divergence
    between the model and a full joint distribution
    is minimized.
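
One way to build such a dependency tree is sketched below, assuming binary variables, empirical pairwise mutual information as edge weights, and Prim's algorithm run to find a maximum-weight spanning tree. This is an illustration of the general idea rather than the COMIT implementation; the function names and the synthetic data are hypothetical. Given the mutual information matrix, the tree-growing loop costs O(n^2).

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two binary columns."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def dependency_tree(samples):
    """Return parent[i] for each variable (-1 for the root), using Prim's
    algorithm on pairwise mutual information (maximum-weight spanning tree)."""
    n = samples.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(samples[:, i], samples[:, j])
    parent = np.full(n, -1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                  # variable 0 is an arbitrary root
    best_gain = mi[0].copy()           # best link from each node into the tree
    best_link = np.zeros(n, dtype=int)
    for _ in range(n - 1):
        gains = np.where(in_tree, -np.inf, best_gain)
        k = int(np.argmax(gains))      # most informative node not yet attached
        parent[k] = best_link[k]
        in_tree[k] = True
        improved = (~in_tree) & (mi[k] > best_gain)
        best_gain[improved] = mi[k][improved]
        best_link[improved] = k
    return parent

# Synthetic data: x1 is a noisy copy of x0, x2 a noisy copy of x1, x3 independent.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 500)
x1 = np.where(rng.random(500) < 0.05, 1 - x0, x0)
x2 = np.where(rng.random(500) < 0.05, 1 - x1, x1)
x3 = rng.integers(0, 2, 500)
parent = dependency_tree(np.column_stack([x0, x1, x2, x3]))
print(parent)
```

In an EDA, new points would then be sampled root-first, each variable conditioned on the value of its parent.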

17
EDA Example 4
  • Bayesian Optimization Algorithm (BOA)
  • M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA:
    The Bayesian optimization algorithm. In Proc.
    GECCO'99.
  • Bayesian network model where nodes can have at
    most k parents.
  • Greedy search, scored by the Bayesian Dirichlet
    equivalence metric, to find the network structure.

18
Further work on EDAs
  • EDAs have also been developed:
  • For problems with continuous and mixed variables.
  • That use mixture models and kernel estimators -
    allowing for the modelling of multi-modal
    distributions.
  • and more!

19
A framework to describe building and adapting a
probabilistic model for optimization
  • See:
  • M. Gallagher and M. Frean. Population-Based
    Continuous Optimization, Probabilistic Modelling
    and Mean Shift. To appear, Evolutionary
    Computation, 2005.
  • Consider a continuous EDA with model Q(x).
  • Consider a Boltzmann distribution over f(x),
    P(x), with temperature T.

20
  • As T → 0, P(x) tends towards a set of impulse
    spikes over the global optima.
  • Now, we have a probability distribution that we
    know the form of, Q(x), and we would like to
    modify it to be close to P(x), as measured by
    the KL divergence, K.
  • Let Q(x) be a Gaussian; try to minimize K via
    gradient descent with respect to the mean
    parameter of Q(x).
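
The limiting behaviour as T → 0 can be checked numerically on a small finite search space. The sketch below assumes the usual Boltzmann form, with probabilities proportional to exp(-f(x)/T); the function name `boltzmann` and the example f values are hypothetical.

```python
import numpy as np

def boltzmann(f_values, T):
    """Boltzmann probabilities proportional to exp(-f/T) over a finite set of points."""
    # Shifting by the minimum leaves the probabilities unchanged but avoids overflow.
    w = np.exp(-(f_values - f_values.min()) / T)
    return w / w.sum()

f_values = np.array([3.0, 1.0, 0.0, 2.0])  # the global minimum is at index 2
for T in (10.0, 1.0, 0.01):
    print(T, np.round(boltzmann(f_values, T), 3))
```

At high T the distribution is close to uniform; as T shrinks, nearly all probability mass moves onto the point with the smallest f value.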

21
  • The gradient becomes
  • An approximation to the integral is to use a
    sample of x from Q(x)
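
A sketch of how these quantities could be written out, under assumptions that are not confirmed by the transcript: K is taken to be the KL divergence from Q to P, P(x) is the Boltzmann distribution with temperature T and normalising constant Z, and Q(x) is an isotropic Gaussian with mean μ and fixed variance σ². The symbols Z, μ, σ and N are introduced here for illustration.

```latex
\begin{align}
P(x) &= \frac{1}{Z}\exp\!\left(-\frac{f(x)}{T}\right) \\
K &= \int Q(x)\,\log\frac{Q(x)}{P(x)}\,dx
   = \int Q(x)\log Q(x)\,dx + \frac{1}{T}\int Q(x)\,f(x)\,dx + \log Z \\
\nabla_{\mu} K &= \frac{1}{T}\int f(x)\,\nabla_{\mu} Q(x)\,dx
   = \frac{1}{T\sigma^{2}}\int Q(x)\,(x-\mu)\,f(x)\,dx \\
&\approx \frac{1}{T\sigma^{2}N}\sum_{i=1}^{N}(x_i-\mu)\,f(x_i),
   \qquad x_i \sim Q(x)
\end{align}
```

The first term of K is the negative entropy of Q, which does not depend on μ when σ is fixed, and log Z is a constant, so only the expected value of f under Q contributes to the gradient with respect to the mean.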

22
  • The algorithm update rule is then
  • Similar ideas can be found in
  • A. Berny. Statistical Machine Learning and
    Combinatorial Optimization. In L. Kallel et al.
    (eds.), Theoretical Aspects of Evolutionary
    Computation, pp. 287-306. Springer. 2001.
  • M. Toussaint. On the evolution of phenotypic
    exploration distributions. In C. Cotta et al.
    (eds.), Foundations of Genetic Algorithms (FOGA
    VII), pp. 169-182. Morgan Kaufmann. 2003.
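
A rough sketch of a mean update of this kind, assuming Q(x) is an isotropic Gaussian with fixed standard deviation and that the gradient of K with respect to the mean is proportional to the expectation under Q of (x - μ) f(x), estimated from a sample. The constant factors (temperature and variance) are absorbed into `step`. This is not claimed to be the exact rule from the talk; `kl_mean_descent`, `shifted_sphere` and all parameter values are hypothetical.

```python
import numpy as np

def kl_mean_descent(f, mu0, sigma=1.0, n_samples=50, step=0.05, n_iters=200, seed=0):
    """Move the mean of a fixed-width Gaussian Q(x) downhill on a sampled
    estimate of the gradient of KL(Q || Boltzmann(f))."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(n_iters):
        # Draw a sample of points from Q(x).
        x = mu + sigma * rng.standard_normal((n_samples, mu.size))
        fx = np.array([f(xi) for xi in x])
        # Monte Carlo estimate of E_Q[(x - mu) f(x)]; every sample contributes,
        # weighted by its own f value, so no selection step is needed.
        grad = ((x - mu) * fx[:, None]).mean(axis=0)
        mu -= step * grad
    return mu

def shifted_sphere(x):
    """Hypothetical test objective with its minimum at (1, 1)."""
    return float(np.sum((x - 1.0) ** 2))

mu = kl_mean_descent(shifted_sphere, mu0=[3.0, 3.0])
print(mu)
```

The estimate is noisy because raw f values are used as weights, so the mean ends up hovering near the minimum rather than converging exactly.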

23
Some insights
  • The derived update rule is closely related to
    those found in Evolution Strategies and a version
    of PBIL for continuous spaces.
  • It is possible to view these existing algorithms
    as approximately doing KL minimization.
  • The objective function appears explicitly in this
    update rule (no selection).

24
Other Research in Learning/Modelling for
Optimization
  • J. A. Boyan and A. W. Moore. Learning Evaluation
    Functions to Improve Optimization by Local
    Search. Journal of Machine Learning Research 12,
    2000.
  • B. Anderson, A. Moore and D. Cohn. A
    Nonparametric Approach to Noisy and Costly
    Optimization. International Conference on
    Machine Learning, 2000.
  • D. R. Jones. A Taxonomy of Global Optimization
    Methods Based on Response Surfaces. Journal of
    Global Optimization 21(4):345-383, 2001.
  • Reinforcement learning
  • R. J. Williams (1992). Simple statistical
    gradient-following algorithms for connectionist
    reinforcement learning. Machine Learning,
    8:229-256.
  • V. V. Miagkikh and W. F. Punch III, An Approach
    to Solving Combinatorial Optimization Problems
    Using a Population of Reinforcement Learning
    Agents, Genetic and Evolutionary Computation
    Conf.(GECCO-99), p.1358-1365, 1999.

25
Summary
  • The field of metaheuristics (including
    Evolutionary Computation) has produced:
  • A large variety of optimization algorithms.
  • Demonstrated good performance on a range of
    real-world problems.
  • Metaheuristics are considerably more general:
  • Can even be applied when there isn't a true
    objective function (coevolution).
  • Can evolve non-numerical objects.

26
Summary
  • EDAs take an explicit modelling approach to
    optimization.
  • Existing statistical models and model-fitting
    algorithms can be employed.
  • Potential for solving challenging problems.
  • Model can be more easily visualized/interpreted
    than a dynamic population in a conventional EA.
  • Although the field is highly active, it is still
    relatively immature:
  • Improve quality of experimental results.
  • Make sure research goals are well-defined.
  • Lots of preliminary ideas, but lack of
    comparative/followup research.
  • Difficult to keep up with the literature and see
    connections with other fields.

27
The End!
  • Questions?