Nonlinear Optimization Methods

Options

Table 1 summarizes the options available in the NLO system.

Table 1: NLO Options

Option Description
Optimization Specifications
TECHNIQUE= Minimization technique
UPDATE= Update technique
LINESEARCH= Line-search method
LSPRECISION= Line-search precision
HESCAL= Type of Hessian scaling
INHESSIAN= Start for approximated Hessian
RESTART= Iteration number for update restart
Termination Criteria Specifications
MAXFUNC= Maximum number of function calls
MAXITER= Maximum number of iterations
MINITER= Minimum number of iterations
MAXTIME= Upper limit in seconds of CPU time
ABSCONV= Absolute function convergence criterion
ABSFCONV= Absolute function difference convergence criterion
ABSGCONV= Absolute gradient convergence criterion
ABSXCONV= Absolute parameter convergence criterion
FCONV= Relative function convergence criterion
FCONV2= Relative function convergence criterion
GCONV= Relative gradient convergence criterion
XCONV= Relative parameter convergence criterion
FSIZE= Used in FCONV, GCONV criteria
XSIZE= Used in XCONV criterion
Step Length Options
DAMPSTEP= Damped steps in line search
MAXSTEP= Maximum trust region radius
INSTEP= Initial trust region radius
Printed Output Options
PALL Display (almost) all printed optimization-related output
PHISTORY Display optimization history
PHISTPARMS Display parameter estimates in each iteration
PSHORT Reduce some default optimization-related output
PSUMMARY Reduce most default optimization-related output
NOPRINT Suppress all printed optimization-related output


These options are described in alphabetical order.

ABSCONV=r
ABSTOL=r

specifies an absolute function convergence criterion. For minimization, termination requires $f(\theta^{(k)}) \le r$. The default value of r is the negative square root of the largest double-precision value, which serves only as a protection against overflows.

ABSFCONV=r[n]
ABSFTOL=r[n]

specifies an absolute function convergence criterion. For all techniques except NMSIMP, termination requires a small change of the function value in successive iterations:

$$ |f(\theta^{(k-1)}) - f(\theta^{(k)})| \le r $$

The same formula is used for the NMSIMP technique, but $\theta^{(k)}$ is defined as the vertex with the lowest function value, and $\theta^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value is r=0. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
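As a rough illustration of how the optional n interacts with this criterion, the following Python sketch checks whether the absolute function change has stayed within r for n successive iterations. The function name `absfconv_satisfied`, the list-based history, and the fallback n=1 are assumptions made for this example; they are not part of the NLO system.

```python
def absfconv_satisfied(f_values, r=0.0, n=1):
    """Return True if |f(k-1) - f(k)| <= r held for the last n iterations.

    f_values holds the objective values f(theta^(0)), f(theta^(1)), ...
    recorded so far (hypothetical bookkeeping, not an NLO data structure).
    """
    if len(f_values) < n + 1:
        return False  # not enough iterations yet to have n differences
    recent = f_values[-(n + 1):]
    return all(abs(prev - cur) <= r for prev, cur in zip(recent, recent[1:]))


history = [10.0, 4.0, 3.5, 3.49, 3.489]
print(absfconv_satisfied(history, r=0.02, n=2))
print(absfconv_satisfied(history, r=0.02, n=3))
```

With this history the last two changes are below 0.02, but the third-to-last change (0.5) is not, so requiring n=3 delays termination.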

ABSGCONV=r[n]
ABSGTOL=r[n]

specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small:

$$ \max_j |g_j(\theta^{(k)})| \le r $$

This criterion is not used by the NMSIMP technique. The default value is r = 1E-5. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated.

ABSXCONV=r[n]
ABSXTOL=r[n]

specifies an absolute parameter convergence criterion. For all techniques except NMSIMP, termination requires a small Euclidean distance between successive parameter vectors,

$$ \|\theta^{(k)} - \theta^{(k-1)}\|_2 \le r $$

For the NMSIMP technique, termination requires either a small length $\alpha^{(k)}$ of the vertices of a restart simplex,

$$ \alpha^{(k)} \le r $$

or a small simplex size,

$$ \delta^{(k)} \le r $$

where the simplex size $\delta^{(k)}$ is defined as the L1 distance from the simplex vertex $\xi^{(k)}$ with the smallest function value to the other n simplex points $\theta_l^{(k)} \ne \xi^{(k)}$:

$$ \delta^{(k)} = \sum_{\theta_l^{(k)} \ne \xi^{(k)}} \|\theta_l^{(k)} - \xi^{(k)}\|_1 $$

The default is r = 1E-8 for the NMSIMP technique and r=0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
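The simplex size used for NMSIMP can be sketched in a few lines of Python. This is only an illustration of the definition above: the function name `simplex_size` and the representation of vertices as tuples are assumptions, and ties for the smallest function value are broken by taking the first such vertex.

```python
def simplex_size(vertices, f_values):
    """Sum of L1 distances from the best vertex to every other vertex.

    vertices: list of n+1 parameter tuples; f_values: their objective values.
    Both containers are hypothetical inputs chosen for this sketch.
    """
    best = min(range(len(vertices)), key=lambda l: f_values[l])
    xi = vertices[best]  # vertex with the smallest function value
    return sum(
        sum(abs(a - b) for a, b in zip(vertex, xi))
        for l, vertex in enumerate(vertices)
        if l != best
    )


vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
f_values = [0.5, 1.0, 3.0]
print(simplex_size(vertices, f_values))
```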

DAMPSTEP[=r]

specifies that the initial step length value $\alpha^{(0)}$ for each line search (used by the QUANEW, HYQUAN, CONGRA, or NEWRAP technique) cannot be larger than r times the step length value used in the former iteration. If the DAMPSTEP option is specified but r is not specified, the default is r=2. The DAMPSTEP=r option can prevent the line-search algorithm from repeatedly stepping into regions where some objective functions are difficult to compute or where they could lead to floating point overflows during the computation of objective functions and their derivatives. The DAMPSTEP=r option can save time-costly function calls during the line searches of objective functions that result in very small steps.
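The damping rule amounts to capping the first trial step of a line search. A minimal Python sketch, assuming a hypothetical helper `damped_initial_step` that receives the step the line search would otherwise start with and the step length accepted in the previous iteration:

```python
def damped_initial_step(proposed_alpha0, previous_step, r=2.0):
    """Cap the initial step length at r times the previous step length.

    r=2.0 mirrors the documented default when DAMPSTEP is given without r;
    the function itself is illustrative only.
    """
    return min(proposed_alpha0, r * previous_step)


print(damped_initial_step(1.0, 0.25))   # capped by 2 * 0.25
print(damped_initial_step(0.05, 0.25))  # already small enough, unchanged
```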

FCONV=r[n]
FTOL=r[n]

specifies a relative function convergence criterion. For all techniques except NMSIMP, termination requires a small relative change of the function value in successive iterations,

$$ \frac{|f(\theta^{(k)}) - f(\theta^{(k-1)})|}{\max(|f(\theta^{(k-1)})|, \mathrm{FSIZE})} \le r $$

where FSIZE is defined by the FSIZE= option. The same formula is used for the NMSIMP technique, but $\theta^{(k)}$ is defined as the vertex with the lowest function value, and $\theta^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value may depend on the procedure. In most cases, you can use the PALL option to find it.
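The role of FSIZE is easiest to see numerically: it puts a floor under the denominator so that the ratio stays meaningful when the function value is near zero. The following Python sketch is an illustration only; the name `fconv_value` and the handling of a zero denominator are assumptions, since the text does not say what happens when both the previous function value and FSIZE are 0.

```python
def fconv_value(f_prev, f_cur, fsize=0.0):
    """Relative change |f(k) - f(k-1)| / max(|f(k-1)|, FSIZE)."""
    denom = max(abs(f_prev), fsize)
    if denom == 0.0:
        # Assumption for this sketch: report 0 for no change, else infinity.
        return 0.0 if f_cur == f_prev else float("inf")
    return abs(f_cur - f_prev) / denom


print(fconv_value(100.0, 99.0))            # relative to |f(k-1)| = 100
print(fconv_value(0.5, 0.25, fsize=1.0))   # FSIZE replaces the small |f(k-1)|
```

Comparing the returned value with r then gives the termination decision.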

FCONV2=r[n]
FTOL2=r[n]

specifies another function convergence criterion.

For all techniques except NMSIMP, termination requires a small predicted reduction

$$ df^{(k)} \approx f(\theta^{(k)}) - f(\theta^{(k)} + s^{(k)}) $$

of the objective function. The predicted reduction

$$ \begin{aligned} df^{(k)} &= -g^{(k)T} s^{(k)} - \tfrac{1}{2} s^{(k)T} H^{(k)} s^{(k)} \\ &= -\tfrac{1}{2} s^{(k)T} g^{(k)} \\ &\le r \end{aligned} $$

is computed by approximating the objective function f by the first two terms of the Taylor series and substituting the Newton step

$$ s^{(k)} = -[H^{(k)}]^{-1} g^{(k)} $$
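The step from the quadratic model to the shorter expression for the predicted reduction can be spelled out as follows. This is a short derivation using only the symbols already defined for this option, not additional material from the NLO system:

```latex
\begin{aligned}
H^{(k)} s^{(k)} &= -g^{(k)}
  && \text{(definition of the Newton step)} \\
s^{(k)T} H^{(k)} s^{(k)} &= -s^{(k)T} g^{(k)}
  && \text{(multiply from the left by } s^{(k)T}\text{)} \\
df^{(k)} &= -g^{(k)T} s^{(k)} + \tfrac{1}{2}\, s^{(k)T} g^{(k)}
  = -\tfrac{1}{2}\, s^{(k)T} g^{(k)}
  && \text{(since } g^{(k)T} s^{(k)} = s^{(k)T} g^{(k)}\text{)}
\end{aligned}
```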

For the NMSIMP technique, termination requires a small standard deviation of the function values of the $n + 1$ simplex vertices $\theta_l^{(k)}$, $l = 0, \ldots, n$,

$$ \sqrt{\frac{1}{n+1} \sum_l \left[ f(\theta_l^{(k)}) - \bar{f}(\theta^{(k)}) \right]^2} \le r $$

where $\bar{f}(\theta^{(k)}) = \frac{1}{n+1} \sum_l f(\theta_l^{(k)})$. If there are $n_{act}$ boundary constraints active at $\theta^{(k)}$, the mean and standard deviation are computed only for the $n + 1 - n_{act}$ unconstrained vertices.

The default value is r = 1E-6 for the NMSIMP technique and r=0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
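For the NMSIMP case, the quantity compared with r is a population standard deviation of the vertex function values. A minimal Python sketch, assuming a hypothetical function `simplex_f_std` that is handed only the function values of the vertices that should be counted (the caller is assumed to have dropped vertices excluded because of active boundary constraints):

```python
import math


def simplex_f_std(f_values):
    """Standard deviation of vertex function values with divisor len(f_values)."""
    m = len(f_values)
    mean = sum(f_values) / m
    return math.sqrt(sum((f - mean) ** 2 for f in f_values) / m)


print(simplex_f_std([1.0, 3.0]))
print(simplex_f_std([2.0, 2.0, 2.0]))
```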

FSIZE=r

specifies the FSIZE parameter of the relative function and relative gradient termination criteria. The default value is r=0. For more information, see the FCONV= and GCONV= options.

GCONV=r[n]
GTOL=r[n]

specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that the normalized predicted function reduction be small,

$$ \frac{g(\theta^{(k)})^T [H^{(k)}]^{-1} g(\theta^{(k)})}{\max(|f(\theta^{(k)})|, \mathrm{FSIZE})} \le r $$

where FSIZE is defined by the FSIZE= option. For the CONGRA technique (where a reliable Hessian estimate H is not available), the following criterion is used:

$$ \frac{\|g(\theta^{(k)})\|_2^2 \, \|s(\theta^{(k)})\|}{\|g(\theta^{(k)}) - g(\theta^{(k-1)})\|_2 \, \max(|f(\theta^{(k)})|, \mathrm{FSIZE})} \le r $$

This criterion is not used by the NMSIMP technique. The default value is r = 1E-8. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
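Both forms of the relative gradient criterion can be evaluated from quantities an optimizer already has at hand. The Python sketch below is illustrative only. It assumes the Newton step $-[H^{(k)}]^{-1} g$ has been computed elsewhere, so that $g^T [H^{(k)}]^{-1} g$ equals minus the inner product of g with that step. It also assumes the unsubscripted norm of s in the CONGRA formula is the Euclidean norm. Function names are hypothetical, and neither function guards against a zero denominator.

```python
import math


def gconv_value(g, newton_step, f_cur, fsize=0.0):
    """g^T H^{-1} g / max(|f|, FSIZE), given newton_step = -H^{-1} g."""
    quad = -sum(gi * si for gi, si in zip(g, newton_step))
    return quad / max(abs(f_cur), fsize)


def gconv_value_congra(g, g_prev, s, f_cur, fsize=0.0):
    """CONGRA variant: ||g||^2 ||s|| / (||g - g_prev|| max(|f|, FSIZE))."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    diff = [a - b for a, b in zip(g, g_prev)]
    return norm(g) ** 2 * norm(s) / (norm(diff) * max(abs(f_cur), fsize))


# Example with H = diag(2, 4) and g = (2, 4), so the Newton step is (-1, -1).
print(gconv_value([2.0, 4.0], [-1.0, -1.0], f_cur=3.0))
print(gconv_value_congra([3.0, 4.0], [0.0, 0.0], [1.0, 0.0], f_cur=5.0))
```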

HESCAL=0 | 1 | 2 | 3
HS=0 | 1 | 2 | 3

specifies the scaling version of the Hessian matrix used in NRRIDG, TRUREG, NEWRAP, or DBLDOG optimization.

If HS is not equal to 0, the first iteration and each restart iteration sets the diagonal scaling matrix $D^{(0)} = \mathrm{diag}(d_i^{(0)})$,

$$ d_i^{(0)} = \sqrt{\max(|H_{i,i}^{(0)}|, \epsilon)} $$

where $H_{i,i}^{(0)}$ are the diagonal elements of the Hessian. In every other iteration, the diagonal scaling matrix $D^{(0)} = \mathrm{diag}(d_i^{(0)})$ is updated depending on the HS option:

HS=0

specifies that no scaling is done.

HS=1

specifies the Moré (1978) scaling update:

$$ d_i^{(k+1)} = \max\left[ d_i^{(k)}, \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)} \right] $$

HS=2

specifies the Dennis, Gay, and Welsch (1981) scaling update:

$$ d_i^{(k+1)} = \max\left[ 0.6 \cdot d_i^{(k)}, \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)} \right] $$

HS=3

specifies that $d_i$ is reset in each iteration:

$$ d_i^{(k+1)} = \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)} $$

In each scaling update, $\epsilon$ is the relative machine precision. The default value is HS=0. Scaling of the Hessian can be time-consuming in the case where general linear constraints are active.
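The three update rules differ only in how much of the previous scale they retain. The following Python sketch puts them side by side. It is an illustration under stated assumptions: the function names are hypothetical, the Hessian diagonal is passed in as a plain list, and `sys.float_info.epsilon` is used as a stand-in for the relative machine precision.

```python
import math
import sys

EPS = sys.float_info.epsilon  # stand-in for the relative machine precision


def initial_scale(h_diag):
    """d_i = sqrt(max(|H_ii|, eps)), used in the first and restart iterations."""
    return [math.sqrt(max(abs(h), EPS)) for h in h_diag]


def update_scale(d, h_diag, hs):
    """Apply the HS=1, 2, or 3 update to the current scale vector d."""
    target = initial_scale(h_diag)
    if hs == 1:  # More (1978): the scale never decreases
        return [max(di, ti) for di, ti in zip(d, target)]
    if hs == 2:  # Dennis, Gay, and Welsch (1981): may shrink to 0.6 * d_i
        return [max(0.6 * di, ti) for di, ti in zip(d, target)]
    if hs == 3:  # reset in each iteration
        return target
    raise ValueError("HS=0 means no scaling; only 1, 2, 3 update the scale")


d = [2.0, 1.0]
h_diag = [1.0, 9.0]
for hs in (1, 2, 3):
    print(hs, update_scale(d, h_diag, hs))
```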

INHESSIAN[=r]
INHESS[=r]

specifies how the initial estimate of the approximate Hessian is defined for the quasi-Newton techniques QUANEW and DBLDOG. There are two alternatives:

  • If you do not use the r specification, the initial estimate of the approximate Hessian is set to the Hessian at $\theta^{(0)}$.

  • If you do use the r specification, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix $rI$.

By default, if you do not specify the option INHESSIAN=r, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix $rI$, where the scalar r is computed from the magnitude of the initial gradient.

INSTEP=r

reduces the length of the first trial step during the line search of the first iterations. For highly nonlinear objective functions, such as the EXP function, the default initial radius of the trust-region algorithm TRUREG or DBLDOG or the default step length of the line-search algorithms can result in arithmetic overflows. If this occurs, you should specify decreasing values of $0 < r < 1$ such as INSTEP=1E-1, INSTEP=1E-2, INSTEP=1E-4, and so on, until the iteration starts successfully.

  • For trust-region algorithms (TRUREG, DBLDOG), the INSTEP= option specifies a factor $r > 0$ for the initial radius $\Delta^{(0)}$ of the trust region. The default initial trust-region radius is the length of the scaled gradient. This step corresponds to the default radius factor of $r = 1$.

  • For line-search algorithms (NEWRAP, CONGRA, QUANEW), the INSTEP= option specifies an upper bound for the initial step length for the line search during the first five iterations. The default initial step length is $r = 1$.

  • For the Nelder-Mead simplex algorithm, using TECH=NMSIMP, the INSTEP=r option defines the size of the start simplex.

LINESEARCH=i
LIS=i

specifies the line-search method for the CONGRA, QUANEW, and NEWRAP optimization techniques. For an introduction to line-search techniques, see Fletcher (1987). The value of i can be $1, \ldots, 8$. For CONGRA, QUANEW, and NEWRAP, the default value is $i = 2$.

LIS=1

specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is similar to one used by the Harwell subroutine library.

LIS=2

specifies a line-search method that needs more function than gradient calls for quadratic and cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option.

LIS=3

specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option.

LIS=4

specifies a line-search method that needs the same number of function and gradient calls for stepwise extrapolation and cubic interpolation.

LIS=5

specifies a line-search method that is a modified version of LIS=4.

LIS=6

specifies golden section line search (Polak 1971), which uses only function values for linear approximation.

LIS=7

specifies bisection line search (Polak 1971), which uses only function values for linear approximation.

LIS=8

specifies the Armijo line-search technique (Polak 1971), which uses only function values for linear approximation.

LSPRECISION=r
LSP=r

specifies the degree of accuracy that should be obtained by the line-search algorithms LIS=2 and LIS=3. Usually an imprecise line search is inexpensive and successful. For more difficult optimization problems, a more precise and expensive line search may be necessary (Fletcher 1987). The second line-search method (which is the default for the NEWRAP, QUANEW, and CONGRA techniques) and the third line-search method approach exact line search for small LSPRECISION= values. If you have numerical problems, you should try to decrease the LSPRECISION= value to obtain a more precise line search. The default values are shown in Table 2.

Table 2: Line Search Precision Defaults

TECH=     UPDATE=        LSP Default
QUANEW    DBFGS, BFGS    r = 0.4
QUANEW    DDFP, DFP      r = 0.06
CONGRA    All            r = 0.1
NEWRAP    No update      r = 0.9


For more information, see Fletcher (1987).

MAXFUNC=i
MAXFU=i

specifies the maximum number i of function calls in the optimization process. The default values are

  • TRUREG, NRRIDG, NEWRAP: 125

  • QUANEW, DBLDOG: 500

  • CONGRA: 1000

  • NMSIMP: 3000

Note that the optimization can terminate only after completing a full iteration. Therefore, the number of function calls that is actually performed can exceed the number that is specified by the MAXFUNC= option.

MAXITER=i
MAXIT=i

specifies the maximum number i of iterations in the optimization process. The default values are

  • TRUREG, NRRIDG, NEWRAP: 50

  • QUANEW, DBLDOG: 200

  • CONGRA: 400

  • NMSIMP: 1000

These default values are also valid when i is specified as a missing value.
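The technique-dependent defaults for MAXFUNC= and MAXITER= listed above can be collected into a small lookup. The Python sketch below is illustrative only: the names `DEFAULT_LIMITS` and `resolve_limits` are hypothetical, `None` stands in for a missing value, and treating a missing MAXFUNC= value like a missing MAXITER= value is an assumption, since the text states that rule only for MAXITER=.

```python
# (MAXFUNC default, MAXITER default) per optimization technique
DEFAULT_LIMITS = {
    "TRUREG": (125, 50),
    "NRRIDG": (125, 50),
    "NEWRAP": (125, 50),
    "QUANEW": (500, 200),
    "DBLDOG": (500, 200),
    "CONGRA": (1000, 400),
    "NMSIMP": (3000, 1000),
}


def resolve_limits(technique, maxfunc=None, maxiter=None):
    """Return (maxfunc, maxiter), falling back to the technique defaults."""
    default_func, default_iter = DEFAULT_LIMITS[technique.upper()]
    return (
        default_func if maxfunc is None else maxfunc,
        default_iter if maxiter is None else maxiter,
    )


print(resolve_limits("congra"))
print(resolve_limits("NMSIMP", maxiter=50))
```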

MAXSTEP=r[n]

specifies an upper bound for the step length of the line-search algorithms during the first n iterations. By default, r is the largest double-precision value and n is the largest integer available. Setting this option can improve the speed of convergence for the CONGRA, QUANEW, and NEWRAP techniques.

MAXTIME=r

specifies an upper limit of r seconds of CPU time for the optimization process. The default value is the largest floating-point double representation of your computer. Note that the time specified by the MAXTIME= option is checked only once at the end of each iteration. Therefore, the actual running time can be much longer than that specified by the MAXTIME= option. The actual running time includes the rest of the time needed to finish the iteration and the time needed to generate the output of the results.

MINITER=i
MINIT=i

specifies the minimum number of iterations. The default value is 0. If you request more iterations than are actually needed for convergence to a stationary point, the optimization algorithms can behave strangely. For example, the effect of rounding errors can prevent the algorithm from continuing for the required number of iterations.

NOPRINT

suppresses the output. (See procedure documentation for availability of this option.)

PALL

displays all optional output for optimization. (See procedure documentation for availability of this option.)

PHISTORY

displays the optimization history. (See procedure documentation for availability of this option.)

PHISTPARMS

displays parameter estimates in each iteration. (See procedure documentation for availability of this option.)

PINIT

displays the initial values and derivatives (if available). (See procedure documentation for availability of this option.)

PSHORT

restricts the amount of default output. (See procedure documentation for availability of this option.)

PSUMMARY

restricts the amount of default displayed output to a short form of iteration history and notes, warnings, and errors. (See procedure documentation for availability of this option.)

RESTART=i > 0
REST=i > 0

specifies that the QUANEW or CONGRA algorithm is restarted with a steepest descent/ascent search direction after, at most, i iterations. Default values are as follows:

  • CONGRA UPDATE=PB: restart is performed automatically, i is not used.

  • CONGRA UPDATE≠PB: $i = \min(10n, 80)$, where n is the number of parameters.

  • QUANEW: i is the largest integer available.
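The default restart rules above can be summarized in a short Python sketch. The function name `default_restart` is hypothetical, `None` is used to signal that i is not used (automatic restart), and `sys.maxsize` is only a stand-in for "the largest integer available".

```python
import sys


def default_restart(technique, update=None, n_params=0):
    """Default restart interval i for QUANEW and CONGRA (illustrative)."""
    technique = technique.upper()
    if technique == "QUANEW":
        return sys.maxsize  # stand-in for the largest available integer
    if technique == "CONGRA":
        # UPDATE=PB is the CONGRA default and restarts automatically.
        if update is None or update.upper() == "PB":
            return None
        return min(10 * n_params, 80)
    raise ValueError("RESTART= applies only to QUANEW and CONGRA")


print(default_restart("CONGRA", "FR", n_params=5))
print(default_restart("CONGRA", "PR", n_params=12))
print(default_restart("CONGRA", "PB", n_params=5))
```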

TECHNIQUE=value
TECH=value

specifies the optimization technique. Valid values are as follows:

CONGRA

performs a conjugate-gradient optimization, which can be more precisely specified with the UPDATE= option and modified with the LINESEARCH= option. When you specify this option, UPDATE=PB by default.

DBLDOG

performs a version of double-dogleg optimization, which can be more precisely specified with the UPDATE= option. When you specify this option, UPDATE=DBFGS by default.

NMSIMP

performs a Nelder-Mead simplex optimization.

NONE

does not perform any optimization. This option can be used as follows:

  • to perform a grid search without optimization

  • to compute estimates and predictions that cannot be obtained efficiently with any of the optimization techniques

NEWRAP

performs a Newton-Raphson optimization that combines a line-search algorithm with ridging. The line-search algorithm LIS=2 is the default method.

NRRIDG

performs a Newton-Raphson optimization with ridging.

QUANEW

performs a quasi-Newton optimization, which can be defined more precisely with the UPDATE= option and modified with the LINESEARCH= option. This is the default estimation method.

TRUREG

performs a trust region optimization.

UPDATE=method
UPD=method

specifies the update method for the QUANEW, DBLDOG, or CONGRA optimization technique. Not every update method can be used with each optimizer.

Valid methods are as follows:

BFGS

performs the original Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update of the inverse Hessian matrix.

DBFGS

performs the dual BFGS update of the Cholesky factor of the Hessian matrix. This is the default update method.

DDFP

performs the dual Davidon, Fletcher, and Powell (DFP) update of the Cholesky factor of the Hessian matrix.

DFP

performs the original DFP update of the inverse Hessian matrix.

PB

performs the automatic restart update method of Powell (1977) and Beale (1972).

FR

performs the Fletcher-Reeves update (Fletcher 1987).

PR

performs the Polak-Ribiere update (Fletcher 1987).

CD

performs a conjugate-descent update of Fletcher (1987).

XCONV=r[n]
XTOL=r[n]

specifies the relative parameter convergence criterion. For all techniques except NMSIMP, termination requires a small relative parameter change in subsequent iterations:

$$ \frac{\max_j |\theta_j^{(k)} - \theta_j^{(k-1)}|}{\max(|\theta_j^{(k)}|, |\theta_j^{(k-1)}|, \mathrm{XSIZE})} \le r $$

For the NMSIMP technique, the same formula is used, but $\theta_j^{(k)}$ is defined as the vertex with the lowest function value and $\theta_j^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value is r = 1E-8 for the NMSIMP technique and r = 0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
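A small Python sketch can make the effect of XSIZE concrete. The formula leaves open whether the maximum over j applies to the numerator alone or to the whole ratio. This example assumes the latter, so each component's change is scaled by its own magnitude and the largest scaled change is reported. The function name `xconv_value` is hypothetical, and components whose scale is 0 are skipped.

```python
def xconv_value(theta_cur, theta_prev, xsize=0.0):
    """Largest relative change over the parameter components (assumed reading)."""
    worst = 0.0
    for cur, prev in zip(theta_cur, theta_prev):
        scale = max(abs(cur), abs(prev), xsize)
        if scale > 0.0:  # a zero scale means the component did not move
            worst = max(worst, abs(cur - prev) / scale)
    return worst


print(xconv_value([1.0, 10.0], [2.0, 10.5]))
print(xconv_value([0.001], [0.002], xsize=1.0))
```

In the second call, XSIZE=1 keeps a tiny parameter from producing a large relative change.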

XSIZE=r > 0

specifies the XSIZE parameter of the relative parameter termination criterion. The default value is r = 0. For more information, see the XCONV= option.

Last updated: June 19, 2025