Title: Model Characteristics And Practical Solvers In Nonlinear Programming
1. Model Characteristics and Practical Solvers in Nonlinear Programming
2. General NLP Model
- Minimize f(x)
- Subject to
- g(x) = b
- lb ≤ x ≤ ub
- Other forms can be useful for special cases, e.g. inequality formulations for convex models.
- We assume that all nonlinear functions are sufficiently smooth and differentiable.
- Notation: n = dim(x) and m = dim(g). (A small sketch of this model form follows below.)
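Below is a minimal, illustrative sketch of this model form in Python using scipy.optimize rather than one of the modeling systems mentioned on the next slide; the functions f and g, the level b, and the bounds are made-up placeholders, not part of the original slides.

```python
# Minimal sketch of the general NLP form: min f(x) s.t. g(x) = b, lb <= x <= ub.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def f(x):                                   # objective f(x)
    return (x[0] - 1.0)**2 + (x[1] - 2.0)**2

def g(x):                                   # constraint body g(x)
    return np.array([x[0]**2 + x[1]**2])

b = np.array([1.0])                         # right-hand side of g(x) = b
lb, ub = np.zeros(2), np.full(2, 5.0)       # lb <= x <= ub

res = minimize(f, x0=np.array([0.5, 0.5]),
               method="trust-constr",
               constraints=NonlinearConstraint(g, b, b),   # equality: lower = upper = b
               bounds=list(zip(lb, ub)))
print(res.x, res.fun)
```

The trust-constr method accepts nonlinear equality constraints and variable bounds directly, so the code mirrors the model form above; n = 2 and m = 1 in this toy instance.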
3. Modeling Environment
- Models are formulated in a modeling system such as AMPL, AIMMS, GAMS, ...
- Function values, first and second derivatives and their sparsity patterns are easily available (without user intervention).
- Functions and derivatives are fairly cheap to evaluate.
4. Optimality Conditions
- Lagrange Function
- L(x,y) = f(x) + y^T (g(x) - b)
- Optimality Conditions
- Primal: g(x) = b, lb ≤ x ≤ ub
- Dual: df/dx + y^T dg/dx ⊥ 0, -∞ ≤ y ≤ ∞
- The orthogonality operator ⊥ is defined componentwise: ≥ if x_i = lb_i, ≤ if x_i = ub_i, and = otherwise.
5. Perturbed Optimality Conditions
- The Optimality Conditions are usually solved using some perturbation method related to Newton's Method.
- Evaluate all nonlinear terms and derivatives at a base point (x0, y0) and solve for Δx, Δy:
- Primal: g + (dg/dx) Δx = b
- Dual: df/dx + (d2f/dx2) Δx + y0^T (d2g/dx2) Δx + y0^T dg/dx + Δy^T dg/dx ⊥ 0
6. The KKT Matrix
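The matrix itself was shown as a figure that is not reproduced here. A standard layout consistent with the perturbed conditions on the previous slide (an assumed reconstruction that ignores the componentwise ⊥ operator on the bounds) is:

```latex
\begin{bmatrix}
  \nabla^2 f(x_0) + \sum_i y_{0,i}\,\nabla^2 g_i(x_0) & \nabla g(x_0)^T \\
  \nabla g(x_0) & 0
\end{bmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
=
-\begin{pmatrix}
  \nabla f(x_0) + \nabla g(x_0)^T\, y_0 \\
  g(x_0) - b
\end{pmatrix}
```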
7. Sources of Difficulty
- Combinatorics
- The combinatorial nature of the orthogonality operator: which bounds are active and which are not, and therefore which dual constraints are active and which are not.
- Nonlinearity
- The nonlinear nature of the objective and constraints: local properties will in general not hold globally.
8. Nonlinearity
9. Handling Combinatorics
- Interior Point Methods
- Bounds are penalized using a log-barrier function.
- A large equality constrained problem is solved for a series of penalty values.
- Active Set Methods
- The set of active inequalities/bounds is guessed.
- A small equality constrained sub-problem is solved to find an improving point.
- The guessed set of active inequalities is updated iteratively.
10. Handling Nonlinearity
- General Methods: Iterate using Linear or Quadratic Approximations.
- Special Methods: Numerous methods tailored to take advantage of available information, special forms, etc., e.g.:
- Sum-of-squares models can use special approximations (Gauss-Newton).
- Models without bounds can avoid combinatorics.
- Models without equality constraints can avoid projections.
- QPs are different from general NLPs.
11. Interior Point Methods
- Lagrange Function
- L(x,y) = f(x) + y^T (g(x) - b) - µ Σ_i (log(x_i - lb_i) + log(ub_i - x_i))
- Optimality Conditions
- Primal: g(x) = b, lb < x < ub
- Dual: df/dx_i + y^T dg/dx_i - µ((x_i - lb_i)^-1 - (ub_i - x_i)^-1) = 0, -∞ ≤ y ≤ ∞ (see the sketch below)
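A minimal Python sketch of this barrier Lagrangian and its x-gradient; the callbacks f, g, grad_f and jac_g and the dense NumPy data are illustrative assumptions:

```python
import numpy as np

def barrier_lagrangian(x, y, mu, f, g, b, lb, ub):
    # L(x,y) = f(x) + y^T (g(x) - b) - mu * sum_i [ log(x_i - lb_i) + log(ub_i - x_i) ]
    return f(x) + y @ (g(x) - b) - mu * np.sum(np.log(x - lb) + np.log(ub - x))

def barrier_gradient_x(x, y, mu, grad_f, jac_g, lb, ub):
    # dL/dx_i = df/dx_i + (J^T y)_i - mu * ((x_i - lb_i)^-1 - (ub_i - x_i)^-1) = 0 at a solution
    return grad_f(x) + jac_g(x).T @ y - mu * (1.0 / (x - lb) - 1.0 / (ub - x))
```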
12. Interior Point KKT Matrix
- where D is a diagonal matrix with non-negative entries D_i = µ((x_i - lb_i)^-2 + (ub_i - x_i)^-2).
13. Degree of Nonlinearity
- Nonconvex NLP
- Convex NLP
- Convex QP
- LP
14. Interior Point LP
- Use the diagonal matrix D as block-pivot and we get the usual interior point matrix A D^-1 A^T in y-space (see the sketch below).
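A minimal NumPy sketch of that block pivot, with an illustrative dense A and positive diagonal d; a real LP code would keep everything sparse:

```python
import numpy as np

def ip_normal_matrix(A, d):
    # A: m-by-n constraint matrix, d: positive diagonal of D.
    # (A / d) scales column j of A by 1/d_j, so the product is A D^-1 A^T.
    return (A / d) @ A.T

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
d = np.array([0.5, 1.0, 2.0])
M = ip_normal_matrix(A, d)        # symmetric positive definite for d > 0 and full-rank A
```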
15. Interior Point Convex QP
16. Interior Point Convex QP
- A block-pivot on Q+D gives the reduced matrix M = A (Q+D)^-1 A^T in y-space.
- In practice: compute a Cholesky factorization Q+D = L L^T (Q+D is positive definite) and compute M = (A L^-T)(A L^-T)^T (see the sketch below).
- The ordering for sparse Cholesky factorization of both Q+D and M can be done once.
- The density of M will depend on the off-diagonal elements in Q.
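A minimal dense sketch of this recipe with small random data (a real solver would use sparse Cholesky with the ordering computed once, as the slide says):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def qp_reduced_matrix(A, Q, D):
    L = cholesky(Q + D, lower=True)                  # Q + D = L L^T, positive definite
    ALinvT = solve_triangular(L, A.T, lower=True).T  # A L^-T
    return ALinvT @ ALinvT.T                         # M = (A L^-T)(A L^-T)^T = A (Q+D)^-1 A^T

rng = np.random.default_rng(0)
n, m = 5, 2
B = rng.standard_normal((n, n)); Q = B @ B.T         # convex: Q positive semidefinite
D = np.diag(rng.uniform(0.1, 1.0, n))                # barrier diagonal, positive
A = rng.standard_normal((m, n))
M = qp_reduced_matrix(A, Q, D)
assert np.allclose(M, A @ np.linalg.solve(Q + D, A.T))
```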
17. IP Separable Convex QP
- If the QP is convex and separable, e.g. sum-of-squares models, then Q and Q+D are diagonal.
- All numerical work is equivalent to the numerical work in LP.
- Convergence is as fast as for LP.
18. Interior Point Convex NLP
19. Interior Point Convex NLP
- The usual block-pivot matrix H+D is positive definite.
- A block-pivot on H+D gives the reduced matrix M = J (H+D)^-1 J^T in y-space.
- All work is exactly as in the convex QP case (except for repeated evaluation of J and H).
- All sparse ordering can be done once.
- Same convergence properties as for LP.
20. Interior Point Nonconvex NLP
21. Nonconvex NLP Technical Difficulties
- The block-pivot matrix H+D is NOT necessarily positive definite, and a Cholesky factorization may not exist.
- The factorization of the indefinite KKT system must be done as one system including both primal and dual parts.
- The factorization results in an L D1 L^T factorization, where D1 is block-diagonal with 1 by 1 and 2 by 2 blocks (see the sketch below).
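A minimal sketch of such a symmetric indefinite factorization on a tiny, made-up KKT-like matrix, using scipy.linalg.ldl; the solvers discussed below use sparse codes (e.g. from the Harwell Subroutine Library) instead:

```python
import numpy as np
from scipy.linalg import ldl

HplusD = np.array([[2.0,  0.0],
                   [0.0, -1.0]])            # indefinite, so Cholesky would fail
J = np.array([[1.0, 1.0]])
K = np.block([[HplusD, J.T],
              [J, np.zeros((1, 1))]])       # one system with primal and dual parts

L, D1, perm = ldl(K)                        # K = L D1 L^T; L[perm] is lower triangular
assert np.allclose(L @ D1 @ L.T, K)
# D1 is block-diagonal; 2-by-2 blocks appear where stable 1-by-1 pivots do not exist.
```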
22. Nonconvex NLP Mathematical Problems
- The model may have several locally optimal solutions.
- The constraints may have infeasible points with local minima for various measures of infeasibility, even if the model is feasible.
- The direction derived from the KKT system may not be a descent direction.
- There is no reason to expect fast convergence for interior point methods.
23. More Technical Difficulties
- Extra proximity terms can be added to D to make the approximating model more convex and force descent directions (see the sketch below).
- Inertia methods can be used to test if H+D projected on the constraints is positive definite.
- Changes in D require re-factorization.
- Sparse ordering cannot be done a priori. Re-factorization is expensive.
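One common realization of the proximity-term idea (a sketch only, not necessarily what any particular solver does) is to add a multiple delta of the identity to H + D and increase delta until a Cholesky factorization succeeds:

```python
import numpy as np

def convexifying_shift(HplusD, delta0=1e-4, factor=10.0, max_tries=30):
    # Find delta >= 0 such that HplusD + delta*I is positive definite.
    delta = 0.0
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(HplusD + delta * np.eye(HplusD.shape[0]))
            return delta                    # factorization succeeded
        except np.linalg.LinAlgError:
            delta = delta0 if delta == 0.0 else delta * factor
    raise RuntimeError("could not convexify H + D")
```

As the slide notes, every change of delta (or of D) forces a re-factorization, which is why this loop is expensive in practice.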
24. Software based on IP Methods
- For General NLP
- LOQO: Shanno, Vanderbei, Benson, ...
- IPOPT: Waechter, Biegler.
- KNITRO: Byrd, Nocedal, Waltz, ...
- For Convex NLP
- Mosek
- For Convex QP and Convex Quadratic Constraints
- Cplex with barrier and QP option
25. IP Software: LOQO
- Adds a proximity term to the constraints, giving the KKT matrix an extra nonzero block in the lower right-hand corner. Should improve stability.
- Factorization uses only 1 by 1 pivots (positive and negative).
- The user can select the ordering as primal-first or dual-first. Standard ordering methods are used within each group.
26. IP Software: IPOPT
- Freely available via COIN-OR.
- Based on 3rd party factorization software (Harwell Subroutine Library).
27. IP Software: KNITRO
- Based on 3rd party factorization software (Harwell Subroutine Library).
- Commercially available from Ziena Optimization.
- Available with AMPL, GAMS, and stand-alone.
- Has alternative active set methods based on LP methods for guessing active constraints and equality constrained QPs for search directions.
- Can use Conjugate Gradients instead of factorization for solving the KKT system.
28. Comparison of IP Software
- LOQO, IPOPT, and KNITRO are mostly used with AMPL.
- Published comparisons based on the CUTEr set of models show the solvers to be similar; the ratings change frequently.
- Difficulties are related to:
- Slow Analyze/Factorize for the KKT matrix.
- Newton steps are not good => very many iterations with short steps on some non-convex models.
- Several CUTEr models seem to be constructed to hurt the competition.
29. Active Set Methods
- The variables are partitioned into 3 sets: At Lower Bound (LB), At Upper Bound (UB), and Between Bounds (BB) (see the sketch below).
- The dual constraints are as a consequence partitioned into: binding as ≥, binding as ≤, and binding as =.
- Fixed variables and non-binding constraints are temporarily ignored, a smaller model is used to determine a search direction, and the sets are updated.
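A minimal sketch of this partition for a given point x, assuming dense NumPy vectors and an illustrative tolerance:

```python
import numpy as np

def partition_variables(x, lb, ub, tol=1e-8):
    # Split indices into At Lower Bound (LB), At Upper Bound (UB), Between Bounds (BB).
    at_lb = np.flatnonzero(x <= lb + tol)
    at_ub = np.flatnonzero(x >= ub - tol)
    between = np.setdiff1d(np.arange(x.size), np.union1d(at_lb, at_ub))
    return at_lb, at_ub, between
```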
30. Active Set Methods
31. Active Set Methods
32. Active Set Methods
33. Active Set Methods
34. Active Set Methods
- The Active Set Optimality Conditions:
- Have Fewer Variables
- Have Fewer Dual Constraints and only Equalities
- Are Symmetric
- A direction Δx, Δy is computed based on the smaller system and a step is made.
- The sets BB, LB, and UB are updated if necessary.
35. Active Set Strategies
- Dual Equalities with Accurate or Approximate H?
- Many active set methods date to a period when 2nd derivatives were not easily available.
- Reduced Hessians (see later) may be cheaper to work with than complete Hessians.
- How are the Active Set Optimality Conditions solved?
- Direct factorization or partitioning methods.
- Updating or refactorization.
36. Active Set Strategies
- How is the step Δx, Δy used?
- Minor iterations using the same linearized KKT system.
- Major iterations using accumulated Δx, Δy from minor iterations.
- The step at Major iterations uses some Merit Function (usually based on primal information only).
- When are the sets BB, LB, and UB updated?
- Variables must be moved from BB to LB or UB when they reach a bound.
- Variables can be moved from LB or UB to BB when dual inequalities are violated. The number of variables to move and the test frequency vary.
37. Active Set Optimality Conditions
- Direct Factorization
- The symmetric indefinite system is factorized as L D L^T, where D is block-diagonal with 1 by 1 and 2 by 2 blocks and L is lower triangular.
- Updates when LB, BB, and UB change.
38. Active Set Optimality Conditions
- Partitioning
- J is partitioned into B, a non-singular Basis, and S, the Superbasis.
- B and B^T are used as Block Pivots.
- The B-parts of H are eliminated.
39. Active Set Optimality Conditions
- B^-1 S is not formed explicitly.
- The Reduced Hessian, RH_SS, is complicated, symmetric, and dense (see the sketch below).
- RH_SS can be approximated using Quasi-Newton methods.
- Simplex-like updates are used for B^-1; projection updates are used for RH_SS.
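A small dense sketch of the partitioning idea, assuming H is ordered with the basic variables first: with J = [B S], the columns of Z = [-B^-1 S; I] span the null space of J, and the Reduced Hessian is RH_SS = Z^T H Z. This forms B^-1 S explicitly, which real solvers avoid (they approximate RH_SS by Quasi-Newton updates); it is illustration only.

```python
import numpy as np

def reduced_hessian(H, B, S):
    n_s = S.shape[1]
    BinvS = np.linalg.solve(B, S)             # B^-1 S (dense here, avoided in practice)
    Z = np.vstack([-BinvS, np.eye(n_s)])      # J Z = [B S] Z = 0
    return Z.T @ H @ Z                        # RH_SS: symmetric and generally dense
```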
40. Software based on Active Sets
- MINOS and SNOPT: Saunders, Gill, ...
- CONOPT: Drud
- Many others, especially based on dense linear algebra for small-scale models.
41. SNOPT
- How are the Active Set Optimality Conditions solved?
- Partitioning: Primal constraints are satisfied accurately using Basis and Simplex technology. Phase-1 procedure.
- The overall Hessian is not used -- the Reduced Hessian is approximated using Quasi-Newton methods with updates between Major iterations.
- The Reduced Hessian is stored as a dense lower triangular matrix => space can be a problem.
- Minor/Major iterations.
42. SNOPT
- How is the step Δx, Δy used?
- Standard minor/major iteration approach.
- The Merit function needs an accurate solution to the primal constraints to guarantee descent.
- How are the sets BB, LB, and UB updated?
- Standard use of Basis / Superbasis.
- B is updated as in Simplex when the basis changes.
- RH is projected when the basis changes.
- A diagonal is added when RH is increased.
43. CONOPT
- How are the Reduced Optimality Conditions solved?
- Partitioning: Primal constraints are satisfied accurately using Basis and Simplex technology (as in SNOPT).
- A special fast SLP phase with H = 0 is used when it is considered acceptable, i.e. when bounds are more important than curvature.
- Uses either one Hessian evaluation per major iteration or one Hessian-vector evaluation per minor iteration. The selection is based on availability and expected work.
44. CONOPT
- How are the Reduced Optimality Conditions solved (continued)?
- The Reduced Hessian is approximated using Quasi-Newton methods (as in SNOPT), but updated at appropriate minor iterations.
- Conjugate Gradients are used in the x_S space when the Reduced Hessian becomes too large. Diagonal preconditioning is used if available.
- Dual equalities are solved using minor iterations; stop when the expected change in objective is good.
45. CONOPT
- How is Δx, Δy used?
- Δx and Δy are accumulated over minor iterations (as in SNOPT).
- Δx_S is used directly. Δx_B is used as the initial point in a Newton process with B^-1 to ensure feasibility of the actual Primal constraints (as opposed to the linearized Primal constraints) (see the sketch below).
- The Newton process requires many primal function evaluations.
- The step is determined using the actual objective function (Primal feasibility means that a Merit function is not needed).
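A minimal sketch of such a feasibility-restoring Newton process; the callbacks g(xB, xS) and jac_B (the Jacobian of g with respect to the basic variables) and the dense linear algebra are illustrative assumptions:

```python
import numpy as np

def restore_feasibility(xB, xS, g, jac_B, b, tol=1e-8, max_iter=20):
    # Adjust the basic variables xB so that the actual constraints g(xB, xS) = b hold
    # again after the superbasic variables xS have been stepped.
    for _ in range(max_iter):
        r = g(xB, xS) - b
        if np.linalg.norm(r, np.inf) <= tol:
            return xB                                   # feasible again
        xB = xB - np.linalg.solve(jac_B(xB, xS), r)     # Newton step on xB only
    raise RuntimeError("Newton restoration did not converge")
```

Each pass through the loop needs a fresh evaluation of g, which matches the slide's remark that the Newton process requires many primal function evaluations.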
46. SNOPT
- (Figure: a step from the initial point along the tangent plane of the constraint to the final point, evaluated with the merit function.)
47. CONOPT
- (Figure: a step from the initial point along the tangent plane of the constraint; a Newton process restores feasibility, and the true objective is tested in the linesearch.)
48. CONOPT
- How are the sets BB, LB, and UB updated?
- Similar to SNOPT, but the details are different.
- More aggressive movement of variables from LB/UB to BB, motivated by cheaper and more frequent updates of the Reduced Hessian (every minor iteration).
49. Comparison SNOPT/CONOPT
- Models with fairly linear constraints are good for SNOPT.
- CONOPT's SLP procedure is good for easy models.
- The Newton process in CONOPT is costly but useful for very nonlinear models => CONOPT is more reliable.
- Many superbasics cannot be handled by SNOPT (space).
- Large, unpredictable variation in relative solution time.
50. Comparison IP/Active Set
- Active Set Methods are usually good for models with few degrees of freedom, Interior Point methods for models with many degrees of freedom.
- Relative solution times can vary by more than a factor of 100, both ways.
- Our ability to predict which solver will be best is fairly limited.
51. GAMS/Globallib Performance profile, ignoring objective value
52. GAMS/Globallib Performance profile, best objective only
53. Limitations for NLP models
- Models with over 100,000 variables/equations have been solved, but models with 10,000 can be unsolvable.
- Most models below 1,000 variables/equations should be solvable unless poorly formulated.
- We can usually iterate or move on very large models (500,000 variables/equations), at least for Active Set methods. Convergence is the problem.
- We have no way to predict if a model can be solved or not.
54. Personal Favorites
- Do easy things with cheap algorithms.
- Recognize when in trouble and switch algorithm.
- Change from Active Set to Interior Point.
- Easy use of Multi-Processor: run two different solvers in parallel.
55. Conclusions
- We can solve many fairly large NLP models.
- Solution times are often over 10 times larger than for comparable LPs.
- Reliability is considerably lower than for LP.
- Good model formulation is very important for reliability and speed.
- Large variation between solvers. No good automatic solver selection.
- Future: Linear Algebra, Numerical Analysis, Pre-processing, and better model formulation.