Title: Model Characteristics And Practical Solvers In Nonlinear Programming
1. Model Characteristics and Practical Solvers in Nonlinear Programming
2. General NLP Model
- Minimize f(x)
- Subject to
- g(x) = b
- lb ≤ x ≤ ub
- Other forms can be useful for special cases, e.g. inequality formulations for convex models.
- We assume that all nonlinear functions are sufficiently smooth and differentiable.
- Notation: n = dim(x) and m = dim(g). (A small sketch of this model form follows below.)
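Below is a minimal, illustrative sketch of this model form in Python using scipy.optimize rather than one of the modeling systems mentioned on the next slide; the functions f and g, the level b, and the bounds are made-up placeholders, not part of the original slides.

```python
# Minimal sketch of the general NLP form: min f(x) s.t. g(x) = b, lb <= x <= ub.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def f(x):                                   # objective f(x)
    return (x[0] - 1.0)**2 + (x[1] - 2.0)**2

def g(x):                                   # constraint body g(x)
    return np.array([x[0]**2 + x[1]**2])

b = np.array([1.0])                         # right-hand side of g(x) = b
lb, ub = np.zeros(2), np.full(2, 5.0)       # lb <= x <= ub

res = minimize(f, x0=np.array([0.5, 0.5]),
               method="trust-constr",
               constraints=NonlinearConstraint(g, b, b),   # equality: lower = upper = b
               bounds=list(zip(lb, ub)))
print(res.x, res.fun)
```

The trust-constr method accepts nonlinear equality constraints and variable bounds directly, so the code mirrors the model form above; n = 2 and m = 1 in this toy instance.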
3. Modeling Environment
- Models are formulated in a modeling system such as AMPL, AIMMS, GAMS, ...
- Function values, first and second derivatives and their sparsity patterns are easily available (without user intervention).
- Functions and derivatives are fairly cheap to evaluate.
4. Optimality Conditions
- Lagrange Function
- L(x,y) = f(x) + y^T (g(x) - b)
- Optimality Conditions
- Primal: g(x) = b, lb ≤ x ≤ ub
- Dual: df/dx + y^T dg/dx ⊥ 0, -∞ ≤ y ≤ ∞
- The orthogonality operator ⊥ is defined componentwise: ≥ if x_i = lb_i, ≤ if x_i = ub_i, and = otherwise.
5. Perturbed Optimality Conditions
- The Optimality Conditions are usually solved using some perturbation method related to Newton's Method.
- Evaluate all nonlinear terms and derivatives at a base point (x0, y0) and solve for Δx, Δy:
- Primal: g + (dg/dx) Δx = b
- Dual: df/dx + (d2f/dx2) Δx + y0^T (d2g/dx2) Δx + y0^T dg/dx + Δy^T dg/dx ⊥ 0
6. The KKT Matrix
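The matrix itself was shown as a figure that is not reproduced here. A standard layout consistent with the perturbed conditions on the previous slide (an assumed reconstruction that ignores the componentwise ⊥ operator on the bounds) is:

```latex
\begin{bmatrix}
  \nabla^2 f(x_0) + \sum_i y_{0,i}\,\nabla^2 g_i(x_0) & \nabla g(x_0)^T \\
  \nabla g(x_0) & 0
\end{bmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
=
-\begin{pmatrix}
  \nabla f(x_0) + \nabla g(x_0)^T\, y_0 \\
  g(x_0) - b
\end{pmatrix}
```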
7. Sources of Difficulty
- Combinatorics
- The combinatorial nature of the orthogonality operator: which bounds are active and which are not, and therefore which dual constraints are active and which are not.
- Nonlinearity
- The nonlinear nature of the objective and constraints: local properties will in general not hold globally.
8. Nonlinearity
9. Handling Combinatorics
- Interior Point Methods
- Bounds are penalized using a log-barrier function.
- A large equality constrained problem is solved for a series of penalty values.
- Active Set Methods
- The set of active inequalities/bounds is guessed.
- A small equality constrained sub-problem is solved to find an improving point.
- The guessed set of active inequalities is updated iteratively.
10. Handling Nonlinearity
- General Methods: Iterate using Linear or Quadratic Approximations.
- Special Methods: Numerous methods tailored to take advantage of available information, special forms, etc., e.g.:
- Sum-of-squares models can use special approximations (Gauss-Newton).
- Models without bounds can avoid combinatorics.
- Models without equality constraints can avoid projections.
- QPs are different from general NLPs.
11. Interior Point Methods
- Lagrange Function
- L(x,y) = f(x) + y^T (g(x) - b) - µ Σ_i (log(x_i - lb_i) + log(ub_i - x_i))
- Optimality Conditions
- Primal: g(x) = b, lb < x < ub
- Dual: df/dx_i + y^T dg/dx_i - µ((x_i - lb_i)^-1 - (ub_i - x_i)^-1) = 0, -∞ ≤ y ≤ ∞ (see the sketch below)
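A minimal Python sketch of this barrier Lagrangian and its x-gradient; the callbacks f, g, grad_f and jac_g and the dense NumPy data are illustrative assumptions:

```python
import numpy as np

def barrier_lagrangian(x, y, mu, f, g, b, lb, ub):
    # L(x,y) = f(x) + y^T (g(x) - b) - mu * sum_i [ log(x_i - lb_i) + log(ub_i - x_i) ]
    return f(x) + y @ (g(x) - b) - mu * np.sum(np.log(x - lb) + np.log(ub - x))

def barrier_gradient_x(x, y, mu, grad_f, jac_g, lb, ub):
    # dL/dx_i = df/dx_i + (J^T y)_i - mu * ((x_i - lb_i)^-1 - (ub_i - x_i)^-1) = 0 at a solution
    return grad_f(x) + jac_g(x).T @ y - mu * (1.0 / (x - lb) - 1.0 / (ub - x))
```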
12. Interior Point KKT Matrix
- where D is a diagonal matrix with non-negative entries D_i = µ((x_i - lb_i)^-2 + (ub_i - x_i)^-2).
13. Degree of Nonlinearity
- Nonconvex NLP
- Convex NLP
- Convex QP
- LP
14. Interior Point LP
- Use the diagonal matrix D as block-pivot and we get the usual interior point matrix A D^-1 A^T in y-space (see the sketch below).
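A minimal NumPy sketch of that block pivot, with an illustrative dense A and positive diagonal d; a real LP code would keep everything sparse:

```python
import numpy as np

def ip_normal_matrix(A, d):
    # A: m-by-n constraint matrix, d: positive diagonal of D.
    # (A / d) scales column j of A by 1/d_j, so the product is A D^-1 A^T.
    return (A / d) @ A.T

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
d = np.array([0.5, 1.0, 2.0])
M = ip_normal_matrix(A, d)        # symmetric positive definite for d > 0 and full-rank A
```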
15. Interior Point Convex QP
16. Interior Point Convex QP
- A block-pivot on Q+D gives the reduced matrix M = A (Q+D)^-1 A^T in y-space.
- In practice: compute a Cholesky factorization Q+D = L L^T (Q+D is positive definite) and compute M = (A L^-T)(A L^-T)^T (see the sketch below).
- The ordering for sparse Cholesky factorization of both Q+D and M can be done once.
- The density of M will depend on the off-diagonal elements in Q.
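A minimal dense sketch of this recipe with small random data (a real solver would use sparse Cholesky with the ordering computed once, as the slide says):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def qp_reduced_matrix(A, Q, D):
    L = cholesky(Q + D, lower=True)                  # Q + D = L L^T, positive definite
    ALinvT = solve_triangular(L, A.T, lower=True).T  # A L^-T
    return ALinvT @ ALinvT.T                         # M = (A L^-T)(A L^-T)^T = A (Q+D)^-1 A^T

rng = np.random.default_rng(0)
n, m = 5, 2
B = rng.standard_normal((n, n)); Q = B @ B.T         # convex: Q positive semidefinite
D = np.diag(rng.uniform(0.1, 1.0, n))                # barrier diagonal, positive
A = rng.standard_normal((m, n))
M = qp_reduced_matrix(A, Q, D)
assert np.allclose(M, A @ np.linalg.solve(Q + D, A.T))
```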
17. IP Separable Convex QP
- If the QP is convex and separable, e.g. sum-of-squares models, then Q and Q+D are diagonal.
- All numerical work is equivalent to the numerical work in LP.
- Convergence is as fast as for LP.
18. Interior Point Convex NLP
19. Interior Point Convex NLP
- The usual block-pivot matrix H+D is positive definite.
- A block-pivot on H+D gives the reduced matrix M = J (H+D)^-1 J^T in y-space.
- All work is exactly as in the convex QP case (except for repeated evaluation of J and H).
- All sparse ordering can be done once.
- Same convergence properties as for LP.
20. Interior Point Nonconvex NLP
21. Nonconvex NLP Technical Difficulties
- The block-pivot matrix H+D is NOT necessarily positive definite, and a Cholesky factorization may not exist.
- The factorization of the indefinite KKT system must be done as one system including both primal and dual parts.
- The factorization results in an L D1 L^T factorization, where D1 is block-diagonal with 1 by 1 and 2 by 2 blocks (see the sketch below).
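A minimal sketch of such a symmetric indefinite factorization on a tiny, made-up KKT-like matrix, using scipy.linalg.ldl; the solvers discussed below use sparse codes (e.g. from the Harwell Subroutine Library) instead:

```python
import numpy as np
from scipy.linalg import ldl

HplusD = np.array([[2.0,  0.0],
                   [0.0, -1.0]])            # indefinite, so Cholesky would fail
J = np.array([[1.0, 1.0]])
K = np.block([[HplusD, J.T],
              [J, np.zeros((1, 1))]])       # one system with primal and dual parts

L, D1, perm = ldl(K)                        # K = L D1 L^T; L[perm] is lower triangular
assert np.allclose(L @ D1 @ L.T, K)
# D1 is block-diagonal; 2-by-2 blocks appear where stable 1-by-1 pivots do not exist.
```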
22. Nonconvex NLP Mathematical Problems
- The model may have several locally optimal solutions.
- The constraints may have infeasible points with local minima for various measures of infeasibility, even if the model is feasible.
- The direction derived from the KKT system may not be a descent direction.
- There is no reason to expect fast convergence for interior point methods.
23. More Technical Difficulties
- Extra proximity terms can be added to D to make the approximating model more convex and force descent directions (see the sketch below).
- Inertia methods can be used to test if H+D projected on the constraints is positive definite.
- Changes in D require re-factorization.
- Sparse ordering cannot be done a priori. Re-factorization is expensive.
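One common realization of the proximity-term idea (a sketch only, not necessarily what any particular solver does) is to add a multiple delta of the identity to H + D and increase delta until a Cholesky factorization succeeds:

```python
import numpy as np

def convexifying_shift(HplusD, delta0=1e-4, factor=10.0, max_tries=30):
    # Find delta >= 0 such that HplusD + delta*I is positive definite.
    delta = 0.0
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(HplusD + delta * np.eye(HplusD.shape[0]))
            return delta                    # factorization succeeded
        except np.linalg.LinAlgError:
            delta = delta0 if delta == 0.0 else delta * factor
    raise RuntimeError("could not convexify H + D")
```

As the slide notes, every change of delta (or of D) forces a re-factorization, which is why this loop is expensive in practice.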
24. Software based on IP Methods
- For General NLP
- LOQO: Shanno, Vanderbei, Benson, ...
- IPOPT: Waechter, Biegler.
- KNITRO: Byrd, Nocedal, Waltz, ...
- For Convex NLP
- Mosek
- For Convex QP and Convex Quadratic Constraints
- Cplex with barrier and QP option
25. IP Software: LOQO
- Adds a proximity term to the constraints, giving the KKT matrix an extra nonzero block in the lower right-hand corner. Should improve stability.
- Factorization uses only 1 by 1 pivots (positive and negative).
- The user can select the ordering as primal-first or dual-first. Standard ordering methods are used within each group.
26. IP Software: IPOPT
- Freely available via COIN-OR.
- Based on 3rd party factorization software (Harwell Subroutine Library).
27. IP Software: KNITRO
- Based on 3rd party factorization software (Harwell Subroutine Library).
- Commercially available from Ziena Optimization.
- Available with AMPL, GAMS, and stand-alone.
- Has alternative active set methods based on LP methods for guessing active constraints and equality constrained QPs for search directions.
- Can use Conjugate Gradients instead of factorization for solving the KKT system.
28. Comparison of IP Software
- LOQO, IPOPT, and KNITRO are mostly used with AMPL.
- Published comparisons based on the CUTEr set of models show the solvers to be similar; the ratings change frequently.
- Difficulties are related to:
- Slow Analyze/Factorize for the KKT matrix.
- Newton steps are not good => very many iterations with short steps on some non-convex models.
- Several CUTEr models seem to be constructed to hurt the competition.
29. Active Set Methods
- The variables are partitioned into 3 sets: At Lower Bound (LB), At Upper Bound (UB), and Between Bounds (BB) (see the sketch below).
- The dual constraints are as a consequence partitioned into: binding as ≥, binding as ≤, and binding as =.
- Fixed variables and non-binding constraints are temporarily ignored, a smaller model is used to determine a search direction, and the sets are updated.
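A minimal sketch of this partition for a given point x, assuming dense NumPy vectors and an illustrative tolerance:

```python
import numpy as np

def partition_variables(x, lb, ub, tol=1e-8):
    # Split indices into At Lower Bound (LB), At Upper Bound (UB), Between Bounds (BB).
    at_lb = np.flatnonzero(x <= lb + tol)
    at_ub = np.flatnonzero(x >= ub - tol)
    between = np.setdiff1d(np.arange(x.size), np.union1d(at_lb, at_ub))
    return at_lb, at_ub, between
```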
30. Active Set Methods
31. Active Set Methods
32. Active Set Methods
33. Active Set Methods
34. Active Set Methods
- The Active Set Optimality Conditions:
- Have Fewer Variables
- Have Fewer Dual Constraints and only Equalities
- Are Symmetric
- A direction Δx, Δy is computed based on the smaller system and a step is made.
- The sets BB, LB, and UB are updated if necessary.
35. Active Set Strategies
- Dual Equalities with Accurate or Approximate H?
- Many active set methods date to a period when 2nd derivatives were not easily available.
- Reduced Hessians (see later) may be cheaper to work with than complete Hessians.
- How are the Active Set Optimality Conditions solved?
- Direct factorization or partitioning methods.
- Updating or refactorization.
36. Active Set Strategies
- How is the step Δx, Δy used?
- Minor iterations using the same linearized KKT system.
- Major iterations using accumulated Δx, Δy from minor iterations.
- The step at Major iterations uses some Merit Function (usually based on primal information only).
- When are the sets BB, LB, and UB updated?
- Variables must be moved from BB to LB or UB when they reach a bound.
- Variables can be moved from LB or UB to BB when dual inequalities are violated. The number of variables to move and the test frequency vary.
37. Active Set Optimality Conditions
- Direct Factorization
- The symmetric indefinite system is factorized as L D L^T, where D is block-diagonal with 1 by 1 and 2 by 2 blocks and L is lower triangular.
- Updates when LB, BB, and UB change.
38. Active Set Optimality Conditions
- Partitioning
- J is partitioned into B, a non-singular Basis, and S, the Superbasis.
- B and B^T are used as Block Pivots.
- The B-parts of H are eliminated.
39. Active Set Optimality Conditions
- B^-1 S is not formed explicitly.
- The Reduced Hessian, RH_SS, is complicated, symmetric, and dense (see the sketch below).
- RH_SS can be approximated using Quasi-Newton methods.
- Simplex-like updates are used for B^-1; projection updates are used for RH_SS.
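A small dense sketch of the partitioning idea, assuming H is ordered with the basic variables first: with J = [B S], the columns of Z = [-B^-1 S; I] span the null space of J, and the Reduced Hessian is RH_SS = Z^T H Z. This forms B^-1 S explicitly, which real solvers avoid (they approximate RH_SS by Quasi-Newton updates); it is illustration only.

```python
import numpy as np

def reduced_hessian(H, B, S):
    n_s = S.shape[1]
    BinvS = np.linalg.solve(B, S)             # B^-1 S (dense here, avoided in practice)
    Z = np.vstack([-BinvS, np.eye(n_s)])      # J Z = [B S] Z = 0
    return Z.T @ H @ Z                        # RH_SS: symmetric and generally dense
```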
40. Software based on Active Sets
- MINOS and SNOPT: Saunders, Gill, ...
- CONOPT: Drud
- Many others, especially based on dense linear algebra for small-scale models.
41. SNOPT
- How are the Active Set Optimality Conditions solved?
- Partitioning: Primal constraints are satisfied accurately using Basis and Simplex technology. Phase-1 procedure.
- The overall Hessian is not used -- the Reduced Hessian is approximated using Quasi-Newton methods with updates between Major iterations.
- The Reduced Hessian is stored as a dense lower triangular matrix => space can be a problem.
- Minor/Major iterations.
42. SNOPT
- How is the step Δx, Δy used?
- Standard minor/major iteration approach.
- The Merit function needs an accurate solution to the primal constraints to guarantee descent.
- How are the sets BB, LB, and UB updated?
- Standard use of Basis / Superbasis.
- B is updated as in Simplex when the basis changes.
- RH is projected when the basis changes.
- A diagonal is added when RH is increased.
43. CONOPT
- How are the Reduced Optimality Conditions solved?
- Partitioning: Primal constraints are satisfied accurately using Basis and Simplex technology (as in SNOPT).
- A special fast SLP phase with H = 0 is used when it is considered acceptable, i.e. when bounds are more important than curvature.
- Uses either one Hessian evaluation per major iteration or one Hessian-vector evaluation per minor iteration. The selection is based on availability and expected work.
44. CONOPT
- How are the Reduced Optimality Conditions solved (continued)?
- The Reduced Hessian is approximated using Quasi-Newton methods (as in SNOPT), but updated at appropriate minor iterations.
- Conjugate Gradients are used in the x_S space when the Reduced Hessian becomes too large. Diagonal preconditioning is used if available.
- Dual equalities are solved using minor iterations; stop when the expected change in objective is good.
45. CONOPT
- How is Δx, Δy used?
- Δx and Δy are accumulated over minor iterations (as in SNOPT).
- Δx_S is used directly. Δx_B is used as the initial point in a Newton process with B^-1 to ensure feasibility of the actual Primal constraints (as opposed to the linearized Primal constraints) (see the sketch below).
- The Newton process requires many primal function evaluations.
- The step is determined using the actual objective function (Primal feasibility means that a Merit function is not needed).
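A minimal sketch of such a feasibility-restoring Newton process; the callbacks g(xB, xS) and jac_B (the Jacobian of g with respect to the basic variables) and the dense linear algebra are illustrative assumptions:

```python
import numpy as np

def restore_feasibility(xB, xS, g, jac_B, b, tol=1e-8, max_iter=20):
    # Adjust the basic variables xB so that the actual constraints g(xB, xS) = b hold
    # again after the superbasic variables xS have been stepped.
    for _ in range(max_iter):
        r = g(xB, xS) - b
        if np.linalg.norm(r, np.inf) <= tol:
            return xB                                   # feasible again
        xB = xB - np.linalg.solve(jac_B(xB, xS), r)     # Newton step on xB only
    raise RuntimeError("Newton restoration did not converge")
```

Each pass through the loop needs a fresh evaluation of g, which matches the slide's remark that the Newton process requires many primal function evaluations.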
46. SNOPT
- (Figure: a step from the initial point along the tangent plane of the constraint to the final point, evaluated with the merit function.)
47. CONOPT
- (Figure: a step from the initial point along the tangent plane of the constraint; a Newton process restores feasibility, and the true objective is tested in the linesearch.)
48. CONOPT
- How are the sets BB, LB, and UB updated?
- Similar to SNOPT, but the details are different.
- More aggressive movement of variables from LB/UB to BB, motivated by cheaper and more frequent updates of the Reduced Hessian (every minor iteration).
49. Comparison SNOPT/CONOPT
- Models with fairly linear constraints are good for SNOPT.
- CONOPT's SLP procedure is good for easy models.
- The Newton process in CONOPT is costly but useful for very nonlinear models => CONOPT is more reliable.
- Many superbasics cannot be handled by SNOPT (space).
- Large, unpredictable variation in relative solution time.
50. Comparison IP/Active Set
- Active Set Methods are usually good for models with few degrees of freedom, Interior Point methods for models with many degrees of freedom.
- Relative solution times can vary by more than a factor of 100, both ways.
- Our ability to predict which solver will be best is fairly limited.
51. GAMS/Globallib Performance profile, ignoring objective value
52. GAMS/Globallib Performance profile, best objective only
53. Limitations for NLP models
- Models with over 100,000 variables/equations have been solved, but models with 10,000 can be unsolvable.
- Most models below 1,000 variables/equations should be solvable unless poorly formulated.
- We can usually iterate or move on very large models (500,000 variables/equations), at least for Active Set methods. Convergence is the problem.
- We have no way to predict if a model can be solved or not.
54. Personal Favorites
- Do easy things with cheap algorithms.
- Recognize when in trouble and switch algorithm.
- Change from Active Set to Interior Point.
- Easy use of Multi-Processor: run two different solvers in parallel.
55. Conclusions
- We can solve many fairly large NLP models.
- Solution times are often over 10 times larger than for comparable LPs.
- Reliability is considerably lower than for LP.
- Good model formulation is very important for reliability and speed.
- Large variation between solvers. No good automatic solver selection.
- Future: Linear Algebra, Numerical Analysis, Pre-processing, and better model formulation.