Transcript and Presenter's Notes

Title: Unconstrained Optimization


1
Unconstrained Optimization
  • Rong Jin

2
Recap
  • Gradient ascent/descent
  • Simple algorithm that only requires the first-order derivative
  • Problem: difficulty in determining the step size
  • Small step size → slow convergence
  • Large step size → oscillation or "bubbling" (see the sketch below)
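
A minimal sketch of this tradeoff on f(x) = x², with illustrative step sizes (not from the original slides):

```python
def gradient_descent(grad, x0, step_size, n_iters=100):
    """Plain gradient descent: needs only the first-order derivative."""
    x = float(x0)
    for _ in range(n_iters):
        x -= step_size * grad(x)
    return x

# f(x) = x^2 has gradient 2x and minimizer x = 0.
grad = lambda x: 2.0 * x
print(gradient_descent(grad, 1.0, 0.1))   # small step: converges to ~0
print(gradient_descent(grad, 1.0, 1.1))   # large step: oscillates and diverges
```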

3
Recap Newton Method
  • Univariate Newton method: x_{t+1} = x_t − f'(x_t) / f''(x_t)
  • Multivariate Newton method: x_{t+1} = x_t − H⁻¹ ∇f(x_t), where H is the
    Hessian matrix (see the sketch below)
  • Guaranteed to converge when the objective function is convex/concave
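
A minimal sketch of the multivariate update, assuming the caller supplies grad and hess callables (illustrative names, not from the deck):

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton update: x <- x - H^{-1} grad f(x).
    Solving H d = grad f(x) avoids forming the inverse explicitly."""
    return x - np.linalg.solve(hess(x), grad(x))

# For the quadratic f(x) = 0.5 x^T A x - b^T x, a single Newton step
# from any starting point lands exactly on the optimum A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2)))
```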
4
Recap
  • Problems with the standard Newton method
  • Computing the inverse of the Hessian matrix H is expensive (O(n³))
  • The Hessian matrix H itself can be very large (O(n²) entries)
  • Quasi-Newton method (BFGS)
  • Approximates the inverse of the Hessian matrix H with another matrix B
  • Avoids the difficulty of computing the inverse of H
  • However, still problematic when B is large
  • Limited-memory Quasi-Newton method (L-BFGS)
  • Stores a small set of vectors instead of the matrix B (see the sketch
    below)
  • Avoids the difficulty of computing the inverse of H
  • Avoids the difficulty of storing the large matrix B
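
A sketch of the vector-storage idea: the standard L-BFGS two-loop recursion, written from the textbook description (illustrative, not the deck's own code):

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Computes B @ grad, where B approximates the inverse Hessian using
    only the stored pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k,
    so no n-by-n matrix is ever formed or inverted."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(s_list[::-1], y_list[::-1], rhos[::-1]):
        a = rho * (s @ q)       # first loop: newest pair to oldest
        alphas.append(a)
        q -= a * y
    if s_list:                  # standard initial Hessian scaling
        q *= (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), alphas[::-1]):
        beta = rho * (y @ q)    # second loop: oldest pair to newest
        q += (a - beta) * s
    return q                    # search direction: x_new = x - step * q

# Smoke test: with pairs from f(x) = 0.5 x^T x (Hessian = I), the
# recursion should return the gradient itself.
s = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
y = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # y_k = H s_k = s_k
print(lbfgs_direction(np.array([0.3, -0.7]), s, y))  # ~ [0.3, -0.7]
```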

5
Recap
  Method                                       Cost/iteration  Convergence  Problem size
  Standard Newton method                       O(n³)           V-Fast       Small
  Quasi-Newton method (BFGS)                   O(n²)           Fast         Medium
  Limited-memory Quasi-Newton method (L-BFGS)  O(n)            R-Fast       Large
6
Empirical Study: Learning Conditional Exponential Models

  Dataset  Instances  Features
  Rule     29,602     246
  Lex      42,509     135,182
  Summary  24,044     198,467
  Shallow  8,625,782  264,142

  Dataset  Gradient ascent           L-BFGS
           Iterations  Time (s)      Iterations  Time (s)
  Rule     350         4.8           81          1.13
  Lex      1545        114.21        176         20.02
  Summary  3321        190.22        69          8.52
  Shallow  14527       85962.53      421         2420.30
7
Free Software
  • http://www.ece.northwestern.edu/~nocedal/software.html
  • L-BFGS
  • L-BFGS-B

8
Conjugate Gradient
  • Another Great Numerical Optimization Method!

9
Linear Conjugate Gradient Method
  • Consider optimizing the quadratic function f(x) = ½ xᵀA x − bᵀx
  • Conjugate vectors
  • The set of vectors p1, p2, …, pl is said to be conjugate with respect
    to a matrix A if pᵢᵀA pⱼ = 0 for all i ≠ j
  • Important property
  • The quadratic function can be optimized by simply optimizing it along
    the individual directions in the conjugate set
  • Optimal solution: x* = α1 p1 + … + αl pl, where αk is the minimizer
    along the kth conjugate direction (see the sketch below)
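
A compact sketch of the resulting linear conjugate gradient iteration (the standard algorithm, shown here for illustration):

```python
import numpy as np

def linear_cg(A, b, tol=1e-10):
    """Minimizes f(x) = 0.5 x^T A x - b^T x for symmetric positive
    definite A, i.e. solves A x = b, along A-conjugate directions."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual = -gradient of f at x
    p = r.copy()               # first search direction
    for _ in range(len(b)):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)        # exact minimizer along p
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p  # stays A-conjugate
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(linear_cg(A, b))         # matches np.linalg.solve(A, b)
```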

10
Example
  • Minimize the following function
  • Matrix A
  • Conjugate directions
  • Optimization
  • First direction: x1 = x2 = x
  • Second direction: x1 = −x2 = x
  • Solution: x1 = x2 = 1

11
How to Efficiently Find a Set of Conjugate
Directions
  • Iterative procedure
  • Given conjugate directions p1, p2, …, pk−1
  • Set pk as shown below
  • Theorem: the direction generated in the above step is conjugate to all
    previous directions p1, p2, …, pk−1, i.e., pkᵀA pⱼ = 0 for all j < k
  • Note: computing the kth direction pk only requires the previous
    direction pk−1
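
The construction on this slide is an image; the standard formula consistent with the theorem (for the quadratic with matrix A) is presumably

\[
p_k = -\nabla f(x_k) + \beta_k\, p_{k-1},
\qquad
\beta_k = \frac{\nabla f(x_k)^{\mathsf T} A\, p_{k-1}}{p_{k-1}^{\mathsf T} A\, p_{k-1}},
\]

which gives \( p_k^{\mathsf T} A\, p_{k-1} = 0 \) by construction.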

12
Nonlinear Conjugate Gradient
  • Even though conjugate gradient is derived for a quadratic objective
    function, it can be applied directly to other nonlinear functions
  • Convergence is guaranteed if the objective is convex/concave
  • Variants (update formulas below)
  • Fletcher-Reeves conjugate gradient (FR-CG)
  • Polak-Ribière conjugate gradient (PR-CG)
  • More robust than FR-CG
  • Compared to the Newton method
  • A first-order method
  • Usually less efficient than the Newton method
  • However, it is simple to implement
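
The two variants differ only in how they compute \(\beta_k\) for the update \(p_k = -g_k + \beta_k p_{k-1}\), where \(g_k = \nabla f(x_k)\) (standard formulas):

\[
\beta_k^{\mathrm{FR}} = \frac{g_k^{\mathsf T} g_k}{g_{k-1}^{\mathsf T} g_{k-1}},
\qquad
\beta_k^{\mathrm{PR}} = \frac{g_k^{\mathsf T} (g_k - g_{k-1})}{g_{k-1}^{\mathsf T} g_{k-1}}.
\]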

13
Empirical Study: Learning Conditional Exponential Models

  Dataset  Instances  Features
  Rule     29,602     246
  Lex      42,509     135,182
  Summary  24,044     198,467
  Shallow  8,625,782  264,142

  Dataset  Conjugate Gradient (PR)   L-BFGS
           Iterations  Time (s)      Iterations  Time (s)
  Rule     142         1.93          81          1.13
  Lex      281         21.72         176         20.02
  Summary  537         31.66         69          8.52
  Shallow  2813        16251.12      421         2420.30
14
Free Software
  • http://www.ece.northwestern.edu/~nocedal/software.html
  • CG

15
When Should We Use Which Optimization Technique?
  • Use the Newton method if you can find a package
  • Use conjugate gradient if you have to implement it yourself
  • Use gradient ascent/descent if you are lazy

16
Logarithm Bound Algorithms
  • To maximize f(x)
  • Start with a guess x0
  • For t = 1, 2, …, T:
  • Compute f(x_t)
  • Find a decoupling (lower-bound) function
  • Find its optimal solution as the next guess (a generic sketch of the
    step follows)
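
The slide's equations are images; a generic bound-maximization step consistent with the bullets would be

\[
\text{find } \varphi_t \text{ such that } \varphi_t(x) \le f(x) - f(x_t)
\text{ with } \varphi_t(x_t) = 0,
\qquad
x_{t+1} = \arg\max_x \varphi_t(x).
\]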

17
Logarithm Bound Algorithm
18
Logarithm Bound Algorithm
  • Start with initial guess x0
  • Come up with a lower-bound function φ(x) ≤ f(x) − f(x0)
  • Touch point: φ(x0) = 0
  • Take the optimal solution x1 of φ(x) as the next guess
  • Repeat the above procedure

19
Logarithm Bound Algorithm
  • Start with initial guess x0
  • Come up with a lower-bound function φ(x) ≤ f(x) − f(x0)
  • Touch point: φ(x0) = 0
  • Take the optimal solution x1 of φ(x) as the next guess
  • Repeat the above procedure
  • Converges to the optimal point (see the inequality chain below)
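
Why each step improves (a reasoning step the slides leave implicit): since \(\varphi(x) \le f(x) - f(x_t)\) and \(\varphi(x_t) = 0\),

\[
f(x_{t+1}) \ge f(x_t) + \varphi(x_{t+1}) \ge f(x_t) + \varphi(x_t) = f(x_t),
\]

so the objective never decreases along the iterates.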

20
Property of Concave Functions
  • For any concave function f:
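
The inequality on this slide is an image; the standard statement, which the next slide instantiates, is presumably Jensen's inequality for concave f:

\[
f\Big(\sum_i \lambda_i x_i\Big) \;\ge\; \sum_i \lambda_i\, f(x_i),
\qquad \lambda_i \ge 0,\ \sum_i \lambda_i = 1.
\]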

21
Important Inequality
  • log(x) and −exp(x) are concave functions
  • Therefore (applying the previous slide's inequality):
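
With weights \(\lambda_i \ge 0\), \(\sum_i \lambda_i = 1\):

\[
\log\Big(\sum_i \lambda_i x_i\Big) \;\ge\; \sum_i \lambda_i \log x_i,
\qquad
-\exp\Big(\sum_i \lambda_i x_i\Big) \;\ge\; -\sum_i \lambda_i \exp(x_i).
\]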

22
Expectation-Maximization Algorithm
  • Derive the EM algorithm for a Hierarchical Mixture Model
  • Log-likelihood of the training data (a generic mixture form is sketched
    below)
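
The deck's formula is an image; for a generic mixture model (an assumption standing in for the hierarchical case), the log-likelihood and the Jensen lower bound that EM maximizes are

\[
\ell(\theta) = \sum_{i=1}^{N} \log \sum_k \pi_k\, p(x_i \mid \theta_k)
\;\ge\; \sum_{i=1}^{N} \sum_k q_{ik} \log \frac{\pi_k\, p(x_i \mid \theta_k)}{q_{ik}},
\]

for any responsibilities \(q_{ik} \ge 0\), \(\sum_k q_{ik} = 1\).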