Title: Unconstrained Optimization
1. Unconstrained Optimization
2. Recap
- Gradient ascent/descent
  - Simple algorithm; only requires the first-order derivative
  - Problem: difficulty in determining the step size
    - Small step size → slow convergence
    - Large step size → oscillation or divergence (see the sketch below)
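As a toy illustration of the step-size issue (not from the slides), a minimal gradient-descent sketch; the quadratic f(x) = x^2 and the two step sizes are assumptions chosen to show convergence versus divergence:

import numpy as np

def gradient_descent(grad_f, x0, step_size, num_iters=100):
    # Plain gradient descent: x <- x - eta * grad f(x)
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - step_size * grad_f(x)
    return x

# Example: minimize f(x) = x^2, whose gradient is 2x; the minimizer is x = 0.
grad = lambda x: 2 * x
print(gradient_descent(grad, 5.0, step_size=0.1))   # small step: converges toward 0
print(gradient_descent(grad, 5.0, step_size=1.1))   # step too large: iterates blow up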
3. Recap: Newton Method
- Univariate Newton method
- Multivariate Newton method
  - Uses the Hessian matrix
- Guaranteed to converge when the objective function is convex/concave
  (the standard update rules are reconstructed below)
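The slide's equations did not survive extraction; the standard Newton updates it refers to are:

% Univariate Newton update
x_{t+1} = x_t - \frac{f'(x_t)}{f''(x_t)}

% Multivariate Newton update, with Hessian H(x_t) = \nabla^2 f(x_t)
\mathbf{x}_{t+1} = \mathbf{x}_t - H(\mathbf{x}_t)^{-1}\,\nabla f(\mathbf{x}_t)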
4. Recap
- Problems with the standard Newton method
  - Computing the inverse of the Hessian matrix H is expensive (O(n^3))
  - The Hessian matrix H itself can be very large (O(n^2) entries)
- Quasi-Newton method (BFGS)
  - Approximates the inverse of the Hessian matrix H with another matrix B
  - Avoids the difficulty of computing the inverse of H
  - However, still problematic when the size of B is large
- Limited-memory Quasi-Newton method (L-BFGS)
  - Stores a small set of vectors instead of the matrix B
  - Avoids the difficulty of computing the inverse of H
  - Avoids the difficulty of storing the large matrix B (see the two-loop sketch below)
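A minimal sketch of the two-loop recursion that lets L-BFGS apply an approximate inverse Hessian while storing only the m most recent vector pairs (standard textbook form with s_i = x_{i+1} - x_i and y_i = grad f(x_{i+1}) - grad f(x_i); this is not code from the slides):

import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    # Two-loop recursion: approximate H^{-1} @ grad using the stored
    # (s, y) pairs (oldest first), without ever forming a matrix.
    q = grad.copy()
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * s.dot(q)
        q = q - alpha * y
        alphas.append(alpha)
    # Scale by an initial diagonal Hessian approximation gamma * I
    if s_list:
        gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(r)
        r = r + (alpha - beta) * s
    return r  # approximates H^{-1} @ grad; memory cost is O(m * n)

Storing only the vector pairs is what reduces the memory footprint from the O(n^2) of BFGS's matrix B to O(n) per stored pair.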
5. Recap

  Method                                        Cost per iteration   Problem size   Convergence
  Standard Newton method                        O(n^3)               Small          Very fast
  Quasi-Newton method (BFGS)                    O(n^2)               Medium         Fast
  Limited-memory Quasi-Newton method (L-BFGS)   O(n)                 Large          Reasonably fast
6. Empirical Study: Learning Conditional Exponential Model

  Iterations and training time, gradient ascent vs. the limited-memory Quasi-Newton method (L-BFGS):

  Dataset   Method            Iterations   Time (s)
  Rule      Gradient ascent   350          4.8
  Rule      L-BFGS            81           1.13
  Lex       Gradient ascent   1545         114.21
  Lex       L-BFGS            176          20.02
  Summary   Gradient ascent   3321         190.22
  Summary   L-BFGS            69           8.52
  Shallow   Gradient ascent   14527        85962.53
  Shallow   L-BFGS            421          2420.30

  Dataset   Instances   Features
  Rule      29,602      246
  Lex       42,509      135,182
  Summary   24,044      198,467
  Shallow   8,625,782   264,142
7. Free Software
- http://www.ece.northwestern.edu/nocedal/software.html
  - L-BFGS
  - L-BFGS-B
8. Conjugate Gradient
- Another great numerical optimization method!
9. Linear Conjugate Gradient Method
- Consider minimizing a quadratic function
- Conjugate vectors
  - The set of vectors p1, p2, ..., pl is said to be conjugate with respect to a matrix A if pi^T A pj = 0 for all i ≠ j
- Important property
  - The quadratic function can be optimized by simply optimizing it along the individual directions in the conjugate set
- Optimal solution
  - αk is the minimizer along the kth conjugate direction
  (the standard formulas are reconstructed below)
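The slide's equations did not survive extraction; the standard linear-CG formulation they refer to, in the usual notation, is:

% Quadratic objective, with A symmetric positive definite
f(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^\top A\,\mathbf{x} - \mathbf{b}^\top\mathbf{x}

% Conjugacy of the directions p_1, ..., p_l with respect to A
\mathbf{p}_i^\top A\,\mathbf{p}_j = 0 \quad \text{for all } i \neq j

% Optimal solution as a sum of independent one-dimensional minimizations
\mathbf{x}^\ast = \sum_{k} \alpha_k\,\mathbf{p}_k, \qquad
\alpha_k = \frac{\mathbf{p}_k^\top \mathbf{b}}{\mathbf{p}_k^\top A\,\mathbf{p}_k}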
10. Example
- Minimize the following quadratic function
- Matrix A
- Conjugate directions
- Optimization
  - First direction: x1 = x2
  - Second direction: x1 = -x2
- Solution: x1 = x2 = 1
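The slide's matrix A is not recoverable from the extracted text. As an illustration, assuming A = [[2, 1], [1, 2]], the directions along x1 = x2 and x1 = -x2 are indeed A-conjugate, which a quick numerical check confirms:

import numpy as np

# Assumed example matrix (the slide's actual A did not survive extraction).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

p1 = np.array([1.0, 1.0])    # direction along x1 = x2
p2 = np.array([1.0, -1.0])   # direction along x1 = -x2

# Conjugacy with respect to A: p1^T A p2 should equal 0.
print(p1 @ A @ p2)   # 0.0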
11. How to Efficiently Find a Set of Conjugate Directions
- Iterative procedure
  - Given conjugate directions p1, p2, ..., pk-1
  - Set pk as follows (standard construction reconstructed below)
- Theorem: the direction generated in the above step is conjugate to all previous directions p1, p2, ..., pk-1
- Note: computing the kth direction pk only requires the previous direction pk-1
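The update rule itself did not survive extraction; the standard linear-CG construction it refers to is (reconstructed):

% Residual (gradient of the quadratic) at the current iterate
\mathbf{r}_k = A\,\mathbf{x}_k - \mathbf{b}

% New direction: negative residual plus a multiple of the previous direction
\mathbf{p}_k = -\mathbf{r}_k + \beta_k\,\mathbf{p}_{k-1}, \qquad
\beta_k = \frac{\mathbf{r}_k^\top \mathbf{r}_k}{\mathbf{r}_{k-1}^\top \mathbf{r}_{k-1}}

% One can show p_k^T A p_j = 0 for all j < k, so only p_{k-1} needs to be kept.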
12. Nonlinear Conjugate Gradient
- Even though conjugate gradient is derived for a quadratic objective function, it can be applied directly to other nonlinear functions
  - Guaranteed to converge if the objective is convex/concave
- Variants
  - Fletcher-Reeves conjugate gradient (FR-CG)
  - Polak-Ribiere conjugate gradient (PR-CG)
    - More robust than FR-CG
- Compared to the Newton method
  - A first-order method
  - Usually less efficient than the Newton method
  - However, it is simple to implement (see the sketch below)
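Since the slide stresses that conjugate gradient is simple to implement, here is a minimal Polak-Ribiere sketch in NumPy; the Armijo backtracking line search and the PR+ restart rule are my assumptions, not details from the slides:

import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha=1.0, rho=0.5, c=1e-4):
    # Simple Armijo backtracking line search along direction d.
    fx, slope = f(x), grad_f(x).dot(d)
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

def polak_ribiere_cg(f, grad_f, x0, max_iter=200, tol=1e-6):
    # Nonlinear conjugate gradient with the Polak-Ribiere (PR+) beta.
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g.dot(d) >= 0:          # not a descent direction: restart
            d = -g
        alpha = backtracking_line_search(f, grad_f, x, d)
        x_new = x + alpha * d
        g_new = grad_f(x_new)
        beta = max(0.0, g_new.dot(g_new - g) / g.dot(g))  # PR+ formula
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Usage example on a simple quadratic bowl (minimum at the origin):
f = lambda x: 0.5 * x.dot(x)
grad_f = lambda x: x
print(polak_ribiere_cg(f, grad_f, np.array([3.0, -2.0])))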
13. Empirical Study: Learning Conditional Exponential Model

  Iterations and training time, conjugate gradient (Polak-Ribiere) vs. the limited-memory Quasi-Newton method (L-BFGS):

  Dataset   Method    Iterations   Time (s)
  Rule      CG (PR)   142          1.93
  Rule      L-BFGS    81           1.13
  Lex       CG (PR)   281          21.72
  Lex       L-BFGS    176          20.02
  Summary   CG (PR)   537          31.66
  Summary   L-BFGS    69           8.52
  Shallow   CG (PR)   2813         16251.12
  Shallow   L-BFGS    421          2420.30

  Dataset   Instances   Features
  Rule      29,602      246
  Lex       42,509      135,182
  Summary   24,044      198,467
  Shallow   8,625,782   264,142
14. Free Software
- http://www.ece.northwestern.edu/nocedal/software.html
  - CG
15. When Should We Use Which Optimization Technique?
- Use the Newton method if you can find a package
- Use conjugate gradient if you have to implement it yourself
- Use gradient ascent/descent if you are lazy
16. Logarithm Bound Algorithms
- To maximize a function f(x)
- Start with a guess x0
- For t = 1, 2, ..., T
  - Compute
  - Find a decoupling function
  - Find the optimal solution
17. Logarithm Bound Algorithm
18. Logarithm Bound Algorithm
- Start with an initial guess x0
- Come up with a lower-bound function Φ(x) ≤ f(x) with Φ(x0) = f(x0)
- Find the optimal solution x1 for Φ(x)
- Repeat the above procedure
19. Logarithm Bound Algorithm
- Start with an initial guess x0
- Come up with a lower-bound function Φ(x) ≤ f(x) with Φ(x0) = f(x0)
- Find the optimal solution x1 for Φ(x)
- Repeat the above procedure
- Converges to the optimal point (see the argument below)
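A one-line reason why each bounding step makes progress (the standard argument, not spelled out on the extracted slide): since Φ lies below f everywhere, x_{t+1} maximizes Φ, and Φ touches f at x_t,

f(x_{t+1}) \;\ge\; \Phi(x_{t+1}) \;\ge\; \Phi(x_t) \;=\; f(x_t)

so the objective value never decreases from one iteration to the next.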
20. Property of Concave Functions
21. Important Inequality
- log(x) and -exp(x) are concave functions
- Therefore the following inequality holds (reconstructed below)
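The inequality the slide most likely states is Jensen's inequality for a concave function, reconstructed here in its usual form:

% For a concave function f and weights \lambda_i \ge 0 with \sum_i \lambda_i = 1:
f\!\left(\sum_i \lambda_i x_i\right) \;\ge\; \sum_i \lambda_i\, f(x_i)

% In particular, for f = \log:
\log\!\left(\sum_i \lambda_i x_i\right) \;\ge\; \sum_i \lambda_i \log x_i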
22. Expectation-Maximization Algorithm
- Derive the EM algorithm for the Hierarchical Mixture Model
- Log-likelihood of the training data (a generic mixture form is sketched below)
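The slide's log-likelihood expression did not survive extraction; for a generic K-component mixture model (an assumed stand-in for the hierarchical mixture on the slide), the quantity and the Jensen lower bound that EM optimizes look like:

% Log-likelihood of training data x_1, ..., x_N
\ell(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k)

% Lower bound via Jensen's inequality, for any distributions q_i(k) over components:
\ell(\theta) \;\ge\; \sum_{i=1}^{N} \sum_{k=1}^{K} q_i(k)\,
    \log \frac{\pi_k\, p(x_i \mid \theta_k)}{q_i(k)}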