Title: Search and Optimization Methods
1. Search and Optimization Methods
Based in part on Chapter 8 of Hand, Mannila, and Smyth
David Madigan
2. Introduction
- This chapter is about finding the models and parameters that minimize a general score function S.
- We often have to conduct a parameter search for each visited model.
- The number of possible model structures can be immense. For example, there are 3.6 × 10^13 undirected graphical models with 10 vertices.
3. Greedy Search
1. Initialize. Choose an initial state M_k.
2. Iterate. Evaluate the score function at all adjacent states and move to the best one.
3. Stopping Criterion. Repeat step 2 until no further improvement can be made.
4. Multiple Restarts. Repeat steps 1-3 from different starting points and choose the best solution found (a sketch of this procedure follows the list).
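A minimal sketch of this procedure in Python; `initial_states`, `neighbors`, and `score` are hypothetical placeholders standing in for a concrete model space and score function S.

def greedy_search(initial_states, neighbors, score):
    # Greedy local search with multiple restarts.
    # `neighbors(state)` returns the states adjacent to `state`;
    # `score(state)` is the score function S to be minimized.
    best_state, best_score = None, float("inf")
    for state in initial_states:                          # 4. multiple restarts
        current, current_score = state, score(state)      # 1. initialize
        while True:                                       # 2. iterate
            candidates = [(score(s), s) for s in neighbors(current)]
            if not candidates:
                break
            cand_score, cand = min(candidates, key=lambda c: c[0])
            if cand_score >= current_score:               # 3. no further improvement
                break
            current, current_score = cand, cand_score
        if current_score < best_score:
            best_state, best_score = current, current_score
    return best_state, best_score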
4. Systematic Search Heuristics
Breadth-first search, depth-first search, beam search, etc.
5. Parameter Optimization
Finding the parameters θ that minimize a score function S(θ) is usually equivalent to minimizing a complicated function in a high-dimensional space.
Define the gradient function of S as
g(θ) = ∇S(θ) = (∂S/∂θ_1, ..., ∂S/∂θ_d).
When closed-form solutions to ∇S(θ) = 0 exist, there is no need for numerical methods.
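As a standard illustration (not from the slides), the squared-error score of a linear model has such a closed form; here X denotes a design matrix and y a response vector:

S(\theta) = \lVert y - X\theta \rVert^{2}, \qquad
\nabla S(\theta) = -2\,X^{\top}(y - X\theta) = 0
\;\Longrightarrow\; \hat{\theta} = (X^{\top}X)^{-1}X^{\top}y .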
6. Gradient-Based Methods
1. Initialize. Choose an initial value θ = θ^0.
2. Iterate. Starting with i = 0, let θ^(i+1) = θ^i + λ^i v^i, where v^i is the direction of the next step and λ^i is the distance (step length). Generally choose v^i to be a direction that improves the score.
3. Convergence. Repeat step 2 until S appears to have reached a local minimum.
4. Multiple Restarts. Repeat steps 1-3 from different initial starting points and choose the best minimum found (a Python sketch follows this list).
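A minimal sketch of steps 1-3 in Python, using steepest descent (v^i = -∇S) with a numerically estimated gradient; the quadratic score at the end is a hypothetical example, not from the slides.

import numpy as np

def numeric_gradient(S, theta, h=1e-6):
    # Central-difference estimate of the gradient of S at theta.
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        grad[j] = (S(theta + e) - S(theta - e)) / (2 * h)
    return grad

def gradient_method(S, theta0, lam=0.1, max_iter=1000, tol=1e-8):
    # Iterate theta^(i+1) = theta^i + lam * v^i with v^i = -grad S (steepest descent).
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        v = -numeric_gradient(S, theta)              # a direction that improves (decreases) S
        theta_new = theta + lam * v
        if abs(S(theta_new) - S(theta)) < tol:       # S appears to have reached a local minimum
            return theta_new
        theta = theta_new
    return theta

# Hypothetical score function: a quadratic bowl with minimum at (1, -2)
S = lambda t: (t[0] - 1.0) ** 2 + (t[1] + 2.0) ** 2
print(gradient_method(S, [0.0, 0.0]))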
8. Univariate Optimization
Let g(θ) = S'(θ). Newton-Raphson proceeds as follows. Suppose g(θ^s) = 0. Then, expanding g about the current point θ,
0 = g(θ^s) ≈ g(θ) + (θ^s - θ) g'(θ), so θ^s ≈ θ - g(θ)/g'(θ),
giving the iteration θ^(i+1) = θ^i - g(θ^i)/g'(θ^i).
9. 1-D Gradient Descent
θ^(i+1) = θ^i - λ g(θ^i)
- λ is usually chosen to be quite small
- This is the special case of Newton-Raphson in which 1/g'(θ^i) is replaced by a constant λ (a sketch of both updates follows)
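A minimal sketch of the two univariate updates, assuming the caller supplies g and g'; the example function at the end is hypothetical.

def newton_raphson(g, g_prime, theta, max_iter=100, tol=1e-10):
    # Solve g(theta) = 0 via theta^(i+1) = theta^i - g(theta^i) / g'(theta^i).
    for _ in range(max_iter):
        step = g(theta) / g_prime(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def gradient_descent_1d(g, theta, lam=0.1, max_iter=10000, tol=1e-10):
    # The same iteration with 1/g'(theta^i) replaced by the constant lam.
    for _ in range(max_iter):
        step = lam * g(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Hypothetical example: S(theta) = (theta - 3)^2, so g(theta) = S'(theta) = 2*(theta - 3)
g = lambda t: 2.0 * (t - 3.0)
g_prime = lambda t: 2.0
print(newton_raphson(g, g_prime, 0.0))   # exact in one step for a quadratic S
print(gradient_descent_1d(g, 0.0))       # approaches 3 with a small constant step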
10. Multivariate Case
Curse of dimensionality again. For example, suppose S is defined on a d-dimensional unit hypercube, and suppose we know that all components of θ are less than 1/2 at the optimum. The remaining search region has volume (1/2)^d:
- if d = 1, we have eliminated half the parameter space
- if d = 2, we have eliminated all but 1/4 of the parameter space
- if d = 20, we have eliminated all but about 1/1,000,000 of the parameter space! (a quick check of these fractions follows)
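A quick numerical check of these fractions (my own arithmetic, not from the slides):

# Fraction of the unit hypercube left when every component of theta is known
# to lie below 1/2: (1/2)^d of the original volume.
for d in (1, 2, 20):
    remaining = 0.5 ** d
    print(f"d = {d:2d}: remaining fraction = {remaining:.2e} (about 1/{round(1 / remaining):,})")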
11. Multivariate Gradient Descent
θ^(i+1) = θ^i - λ g(θ^i), where g(θ) = ∇S(θ)
- -g(θ^i) points in the direction of steepest descent
- Guaranteed to converge if λ is small enough
- Essentially the same as the back-propagation method used in neural networks
- Can replace λ with second-derivative information (quasi-Newton methods use an approximation to it); a sketch follows
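A minimal sketch of replacing the constant λ by second-derivative information, i.e. a full Newton step using the Hessian; the quadratic score and its derivatives below are hypothetical examples (a quasi-Newton method would instead build up an approximation to the Hessian).

import numpy as np

def newton_step_method(grad, hess, theta0, max_iter=100, tol=1e-10):
    # theta^(i+1) = theta^i - H(theta^i)^{-1} g(theta^i): the constant step lambda
    # is replaced by (inverse) second-derivative information.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Hypothetical quadratic score S(theta) = 0.5 * theta' A theta - b' theta
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda t: A @ t - b     # g(theta) = gradient of S
hess = lambda t: A             # Hessian of S (constant for a quadratic)
print(newton_step_method(grad, hess, [0.0, 0.0]))   # one Newton step solves A theta = b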
12. Simplex Search Method
- Evaluates d+1 points arranged in a hyper-tetrahedron (a simplex). For example, with d = 2, evaluates S at the vertices of an equilateral triangle.
- Reflect the triangle in the side opposite the vertex with the highest value.
- Repeat until oscillation occurs, then halve the sides of the triangle.
- No calculation of derivatives is required (a sketch follows).
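A minimal sketch of this reflect-or-shrink scheme in Python; the two-dimensional test function at the end is a hypothetical example.

import numpy as np

def simplex_search(S, vertices, max_iter=500, tol=1e-8):
    # Basic reflect-or-shrink simplex search; no derivatives are computed.
    # `vertices` holds d+1 starting points in d dimensions.
    v = np.asarray(vertices, dtype=float)
    for _ in range(max_iter):
        scores = np.array([S(x) for x in v])
        order = np.argsort(scores)
        v, scores = v[order], scores[order]      # best vertex first, worst last
        if np.max(np.abs(v[1:] - v[0])) < tol:   # simplex has shrunk to a point
            break
        centroid = v[:-1].mean(axis=0)           # centre of the face opposite the worst vertex
        reflected = 2.0 * centroid - v[-1]       # reflect the worst vertex through that face
        if S(reflected) < scores[-1]:
            v[-1] = reflected                    # the reflection improves on the worst vertex
        else:
            v = v[0] + 0.5 * (v - v[0])          # oscillation: halve the sides toward the best vertex
    return v[0]

# Hypothetical test function with d = 2 and minimum at (1, 2)
S = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(simplex_search(S, triangle))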
15. EM for two-component Gaussian mixture
The observed-data log-likelihood, l(θ) = Σ_i log[ (1 - π) φ(x_i; μ_1, σ_1²) + π φ(x_i; μ_2, σ_2²) ], is tricky to maximize directly because of the sum inside the logarithm.
16. EM for two-component Gaussian mixture, cont.
Computing the responsibilities γ_i = P(observation i belongs to component 2 | x_i, current θ) is the E-step. It does a soft assignment of observations to mixture components.
17. EM for two-component Gaussian mixture: Algorithm
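The algorithm on this slide did not survive the transcript; below is a minimal Python sketch of the standard EM iteration for a two-component univariate Gaussian mixture (variable names and the initialization are my own), alternating the soft E-step assignment above with weighted M-step updates.

import numpy as np

def normal_pdf(x, mu, var):
    # Univariate normal density phi(x; mu, var).
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_two_gaussians(x, n_iter=100):
    x = np.asarray(x, dtype=float)
    mu1, mu2 = x.min(), x.max()       # crude initialization
    var1 = var2 = x.var()
    pi = 0.5                          # mixing proportion of component 2
    for _ in range(n_iter):
        # E-step: responsibilities (soft assignment to component 2)
        p1 = (1 - pi) * normal_pdf(x, mu1, var1)
        p2 = pi * normal_pdf(x, mu2, var2)
        gamma = p2 / (p1 + p2)
        # M-step: weighted maximum-likelihood updates
        mu1 = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * x) / np.sum(gamma)
        var1 = np.sum((1 - gamma) * (x - mu1) ** 2) / np.sum(1 - gamma)
        var2 = np.sum(gamma * (x - mu2) ** 2) / np.sum(gamma)
        pi = gamma.mean()
    return mu1, var1, mu2, var2, pi

# Hypothetical data drawn from two Gaussians
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 100)])
print(em_two_gaussians(data))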
19. EM with Missing Data
Let Q(H) denote a probability distribution for the missing data H. By Jensen's inequality,
F(Q, θ) = Σ_H Q(H) log[ p(D, H | θ) / Q(H) ] ≤ log Σ_H p(D, H | θ) = l(θ).
This is a lower bound on l(θ).
20. EM (continued)
In the E-step, the maximum of F(Q, θ) over Q is achieved when Q(H) = p(H | D, θ).
In the M-step, we need to maximize Σ_H Q(H) log p(D, H | θ) with respect to θ (the remaining term, the entropy of Q, does not depend on θ).
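A one-line check, in the same notation, that the E-step choice makes the lower bound tight; this step is implicit in the slides and is what guarantees that alternating the two steps never decreases l(θ):

F(Q,\theta)\big|_{Q(H)=p(H\mid D,\theta)}
  = \sum_H p(H\mid D,\theta)\,\log\frac{p(D,H\mid\theta)}{p(H\mid D,\theta)}
  = \sum_H p(H\mid D,\theta)\,\log p(D\mid\theta)
  = l(\theta).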
21. EM Normal Mixture Example
22. EM Normal Mixture Example (cont.)