Shared Concepts

Optimization Parameters

This section applies to actions in the following action sets: gam, mixed, nonlinear, phreg, and regression.

This section describes parameters that are typically available for the actions in this book that perform optimizations.

The following notation is used to describe the subparameters. $\boldsymbol{\beta}$ denotes the vector of parameters for the optimization, and $\beta_i$ is its ith element. The objective function being minimized, its gradient vector, and its Hessian matrix are denoted as $f(\boldsymbol{\beta})$, $\mathbf{g}(\boldsymbol{\beta})$, and $\mathbf{H}(\boldsymbol{\beta})$, respectively. The gradient with respect to the ith parameter is denoted as $g_i(\boldsymbol{\beta})$. Superscripts in parentheses denote the iteration count; for example, $f(\boldsymbol{\beta}^{(k)})$ is the value of the objective function at iteration k.

absConv=r absTol=r

specifies an absolute function convergence criterion. For minimization, termination requires $f(\boldsymbol{\beta}^{(k)}) \le r$, where $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $f(\cdot)$ is the objective function. The default value of r is the negative square root of the largest double-precision value, which serves only as a protection against overflows.

absFconv=r absFtol=r

specifies an absolute function difference convergence criterion. For all techniques except NMSIMP, termination requires a small change of the function value in successive iterations:

$$ \left| f(\boldsymbol{\beta}^{(k-1)}) - f(\boldsymbol{\beta}^{(k)}) \right| \le r $$

Here, $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\boldsymbol{\beta}^{(k)}$ is defined as the vertex that has the lowest function value and $\boldsymbol{\beta}^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex.
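
The absolute function difference test can be sketched in Python as follows. This is an illustration only, not the code that the actions run; the function names are hypothetical, and the NMSIMP variant assumes that the function values of all simplex vertices are available as a list.

```python
def abs_fconv_met(f_prev, f_curr, r):
    """Return True when |f(beta^(k-1)) - f(beta^(k))| <= r."""
    return abs(f_prev - f_curr) <= r


def abs_fconv_met_simplex(vertex_values, r):
    """NMSIMP variant: compare the highest and lowest vertex function values."""
    return abs(max(vertex_values) - min(vertex_values)) <= r
```

For example, abs_fconv_met(10.0, 9.5, 1.0) is satisfied, whereas the same pair of function values fails the test for r = 0.1.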

absFconvN=n absFtolN=n

specifies the number of successive iterations for which the absFconv subparameter criterion must be satisfied before the process can be terminated. By default, absFconvN=0. The only SAS Viya action that supports this method is the nlmod action.
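
The idea of requiring a criterion to hold for several successive iterations can be sketched as a streak counter. The class below is hypothetical and is not taken from the nlmod action. In particular, it assumes that a value of 0 behaves like 1 (terminate the first time the criterion holds), which this section does not state.

```python
class SuccessiveCounter:
    """Track how many iterations in a row a convergence criterion has held."""

    def __init__(self, required):
        # Assumption: 0 (the documented default) is treated like 1.
        self.required = max(required, 1)
        self.streak = 0

    def update(self, criterion_met):
        """Record one iteration; return True when termination is allowed."""
        self.streak = self.streak + 1 if criterion_met else 0
        return self.streak >= self.required
```

A single iteration that fails the criterion resets the streak, so the count starts again from zero.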

absGconv=r absGtol=r

specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small:

$$ \max_j \left| g_j(\boldsymbol{\beta}^{(k)}) \right| \le r $$

Here, $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $g_j(\cdot)$ is the gradient of the objective function with respect to the jth parameter. This criterion is not used by the NMSIMP technique. By default, absGconv=1E–5.
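
As a minimal Python sketch of this test (the function name is hypothetical, and the gradient is assumed to be a non-empty sequence of numbers):

```python
def abs_gconv_met(gradient, r=1e-5):
    """Return True when the largest absolute gradient element is <= r."""
    return max(abs(g) for g in gradient) <= r
```

The default argument mirrors the documented default of 1E-5.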

absGconvN=n absGtolN=n

specifies the number of successive iterations for which the absGconv criterion must be satisfied before the process can be terminated. By default, absGconvN=0. The only SAS Viya action that supports this method is the nlmod action.

absXconv=r absXtol=r

specifies an absolute parameter convergence criterion. For all techniques except NMSIMP, termination requires a small Euclidean distance between successive parameter vectors,

$$ \left\| \boldsymbol{\beta}^{(k)} - \boldsymbol{\beta}^{(k-1)} \right\| \le r $$

For the NMSIMP technique, termination requires either a small length of the vertices of a restart simplex,

$$ \alpha^{(k)} \le r $$

or a small simplex size,

$$ \delta^{(k)} \le r $$

where the simplex size $\delta^{(k)}$ is defined as the L1 distance from the simplex vertex $\boldsymbol{\xi}^{(k)}$ that has the smallest function value to the other p simplex points $\boldsymbol{\beta}_l^{(k)} \ne \boldsymbol{\xi}^{(k)}$:

$$ \delta^{(k)} = \sum_{\boldsymbol{\beta}_l \ne \boldsymbol{\xi}} \left\| \boldsymbol{\beta}_l^{(k)} - \boldsymbol{\xi}^{(k)} \right\|_1 $$

The default is r = 1E–8 for the NMSIMP technique and r = 0 otherwise.
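
The two distance measures in this criterion can be sketched in Python as follows. The function names are hypothetical, and the simplex is assumed to be given as a list of vertices together with a parallel list of their function values.

```python
import math


def abs_xconv_met(beta_curr, beta_prev, r):
    """Euclidean distance between successive parameter vectors is <= r."""
    return math.dist(beta_curr, beta_prev) <= r


def simplex_size(vertices, values):
    """Sum of L1 distances from the best vertex to every other vertex."""
    best = min(range(len(vertices)), key=lambda l: values[l])
    xi = vertices[best]
    return sum(
        sum(abs(a - b) for a, b in zip(v, xi))
        for l, v in enumerate(vertices)
        if l != best
    )
```

With r = 0, the Euclidean test is satisfied only when two successive parameter vectors are identical.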

corrections=m

specifies the number of stored quasi-Newton update histories, referred to as the number of corrections, for the LBFGS technique. Values of m as small as 3 are commonly used. In general, larger values improve convergence speed and solution quality for the LBFGS technique. However, for many problems the improvement can stall beyond a certain threshold, while memory usage and the solver’s per-iteration computation cost can increase. In practice, you can set high values for small or medium problems and small values for large problems. By default, corrections=20.
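
To show why m affects both memory and per-iteration cost, the following Python sketch implements the textbook L-BFGS two-loop recursion over a bounded history. It is a generic illustration, not the LBFGS implementation that the actions use; the helper names are hypothetical. Each stored correction is a pair (s, y) of a parameter step and the corresponding gradient change, and each pair is assumed to satisfy y's > 0.

```python
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_history(m):
    """Bounded store: appending an (m+1)th pair drops the oldest one."""
    return deque(maxlen=m)


def lbfgs_direction(grad, history):
    """Two-loop recursion: approximate -H^{-1} grad from stored (s, y) pairs."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):  # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((alpha, rho))
    if history:  # initial scaling gamma = s'y / y'y from the newest pair
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(history, reversed(alphas)):  # oldest first
        b = rho * dot(y, q)
        q = [qi + (alpha - b) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]
```

Both loops visit every stored pair, so the work per iteration and the storage grow linearly with m. With an empty history, the function returns the steepest-descent direction.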

fConv=r fTol=r

specifies a relative function difference convergence criterion. For all techniques except NMSIMP, termination requires a small relative change of the function value in successive iterations,

$$ \frac{\left| f(\boldsymbol{\beta}^{(k)}) - f(\boldsymbol{\beta}^{(k-1)}) \right|}{\left| f(\boldsymbol{\beta}^{(k-1)}) \right|} \le r $$

Here, $\boldsymbol{\beta}$ denotes the vector of parameters that participate in the optimization, and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\boldsymbol{\beta}^{(k)}$ is defined as the vertex that has the lowest function value and $\boldsymbol{\beta}^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex.

The default value of r is defined in terms of $\epsilon$, the machine precision, which is the smallest double-precision floating-point number such that $1 + \epsilon > 1$.
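
A minimal Python sketch of the relative function difference test follows. The function name is hypothetical. The handling of a zero denominator is an assumption made so that the sketch does not divide by zero; this section does not say how that case is treated.

```python
def fconv_met(f_curr, f_prev, r):
    """Return True when |f(k) - f(k-1)| / |f(k-1)| <= r."""
    denom = abs(f_prev)
    if denom == 0.0:
        # Assumption: with a zero previous value, require no change at all.
        return f_curr == f_prev
    return abs(f_curr - f_prev) / denom <= r
```

Because the change is scaled by the previous function value, the same r applies to objective functions of very different magnitudes.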

fConvN=n fTolN=n

specifies the number of successive iterations for which the fConv subparameter criterion must be satisfied before the process can terminate. By default, fConvN=0. The only SAS Viya action that supports this method is the nlmod action.

fConv2=r fTol2=r

specifies a second function convergence criterion. For all techniques except NMSIMP, termination requires a small predicted reduction of the objective function:

$$ df^{(k)} \approx f(\boldsymbol{\beta}^{(k)}) - f(\boldsymbol{\beta}^{(k)} + \mathbf{s}^{(k)}) $$

The predicted reduction $df^{(k)}$, which must satisfy $df^{(k)} \le r$ for termination, is computed by approximating the objective function f by the first two terms of the Taylor series and substituting the Newton step,

$$ \mathbf{s}^{(k)} = -\left[ \mathbf{H}^{(k)} \right]^{-1} \mathbf{g}^{(k)} $$
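
As a worked check, the substitution can be carried out explicitly. The derivation below is a sketch under two assumptions: that the "first two terms" are the gradient and Hessian terms of the expansion around the current iterate, and that the Hessian is symmetric and nonsingular.

```latex
\begin{aligned}
f(\boldsymbol{\beta}^{(k)} + \mathbf{s}^{(k)})
  &\approx f(\boldsymbol{\beta}^{(k)})
     + \mathbf{g}^{(k)\prime} \mathbf{s}^{(k)}
     + \tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{H}^{(k)} \mathbf{s}^{(k)} \\
df^{(k)}
  &= -\mathbf{g}^{(k)\prime} \mathbf{s}^{(k)}
     - \tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{H}^{(k)} \mathbf{s}^{(k)} \\
  &= \mathbf{g}^{(k)\prime} \left[ \mathbf{H}^{(k)} \right]^{-1} \mathbf{g}^{(k)}
     - \tfrac{1}{2}\, \mathbf{g}^{(k)\prime} \left[ \mathbf{H}^{(k)} \right]^{-1} \mathbf{g}^{(k)} \\
  &= \tfrac{1}{2}\, \mathbf{g}^{(k)\prime} \left[ \mathbf{H}^{(k)} \right]^{-1} \mathbf{g}^{(k)}
   = -\tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}
\end{aligned}
```

Under these assumptions, the predicted reduction is half of the quadratic form in the gradient and the inverse Hessian.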

For the NMSIMP technique, termination requires a small standard deviation of the function values of the $n + 1$ simplex vertices $\boldsymbol{\beta}_l^{(k)}$, $l = 0, \ldots, n$,

$$ \sqrt{\frac{1}{n+1} \sum_l \left[ f(\boldsymbol{\beta}_l^{(k)}) - \bar{f}(\boldsymbol{\beta}^{(k)}) \right]^2} \le r $$

where $\bar{f}(\boldsymbol{\beta}^{(k)}) = \frac{1}{n+1} \sum_l f(\boldsymbol{\beta}_l^{(k)})$. If there are boundary constraints active at $\boldsymbol{\beta}^{(k)}$, the mean and standard deviation are computed only for the unconstrained vertices.

The default value is r = 1E–6 for the NMSIMP technique and r = 0 otherwise.
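
The NMSIMP form of this criterion can be sketched in Python as follows. The function name is hypothetical, and the sketch ignores boundary constraints: it assumes that all n + 1 vertices are unconstrained and that their function values are supplied as a list.

```python
import math


def simplex_fvalue_std(values):
    """Standard deviation (divisor n + 1) of the simplex function values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
```

Termination under this criterion would then require simplex_fvalue_std(values) <= r.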

gConv=r gTol=r

specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that the normalized predicted function reduction be small:

$$ \frac{\mathbf{g}(\boldsymbol{\beta}^{(k)})' \left[ \mathbf{H}^{(k)} \right]^{-1} \mathbf{g}(\boldsymbol{\beta}^{(k)})}{\left| f(\boldsymbol{\beta}^{(k)}) \right|} \le r $$

Here, $\boldsymbol{\beta}$ denotes the vector of parameters that participate in the optimization, $f(\cdot)$ is the objective function, and $\mathbf{g}(\cdot)$ is the gradient. For the CONGRA technique (where a reliable Hessian estimate is not available), the following criterion is used:

$$ \frac{\left\| \mathbf{g}(\boldsymbol{\beta}^{(k)}) \right\|_2^2 \, \left\| \mathbf{s}(\boldsymbol{\beta}^{(k)}) \right\|}{\left\| \mathbf{g}(\boldsymbol{\beta}^{(k)}) - \mathbf{g}(\boldsymbol{\beta}^{(k-1)}) \right\|_2 \, \left| f(\boldsymbol{\beta}^{(k)}) \right|} \le r $$

This criterion is not used by the NMSIMP technique. By default, gConv=1E–8.
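
The Hessian-based form of this criterion can be sketched in pure Python as follows. The function names are hypothetical. The sketch assumes a nonsingular Hessian that is supplied as a list of row lists and a nonzero function value, and it uses plain Gaussian elimination with partial pivoting rather than whatever factorization the actions use internally.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gconv_value(g, hessian, f_value):
    """Return g' H^{-1} g / |f|, the quantity compared with r."""
    h_inv_g = solve(hessian, g)
    return sum(gi * xi for gi, xi in zip(g, h_inv_g)) / abs(f_value)
```

Solving the linear system avoids forming the inverse Hessian explicitly.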

gConvN=n gTolN=n

specifies the number of successive iterations for which the gConv subparameter criterion must be satisfied before the process can terminate. The only SAS Viya action that supports this method is the nlmod action. By default, gConvN=0.

gConv2=r gTol2=r

specifies another relative gradient convergence criterion. For the TRUREG, LEVMAR, NRRIDG, and NEWRAP techniques, the following criterion of Browne (1982) is used:

$$ \max_j \frac{\left| g_j(\boldsymbol{\beta}^{(k)}) \right|}{\sqrt{f(\boldsymbol{\beta}^{(k)}) \, \mathbf{H}_{j,j}^{(k)}}} \le r $$

This criterion is not used by the other techniques.

By default, gConv2=0.
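
Because this criterion uses only the diagonal of the Hessian, it is inexpensive to evaluate. The following Python sketch uses a hypothetical function name and assumes that the product of the function value and each diagonal element is positive, so that the square root is defined.

```python
import math


def gconv2_value(g, hessian_diag, f_value):
    """Return max_j |g_j| / sqrt(f * H_jj), the quantity compared with r."""
    return max(
        abs(gj) / math.sqrt(f_value * hjj)
        for gj, hjj in zip(g, hessian_diag)
    )
```

Termination under this criterion would then require gconv2_value(g, hessian_diag, f_value) <= r.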

maxFunc=n

specifies the maximum number n of function calls in the optimization process. The default values are as follows, depending on the optimization technique:

TRUREG, NRRIDG, and NEWRAP: 125
QUANEW and DBLDOG: 500
CONGRA: 1,000
LBFGS: 2,000
NMSIMP: 3,000

The optimization can terminate only after completing a full iteration. Therefore, the number of function calls that are actually performed can exceed the number that is specified by this option. You can specify the optimization technique in the technique subparameter.
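
The following Python sketch shows how the limit can be exceeded. It is a simplified model, not the behavior of any specific technique: the function name is hypothetical, and every iteration is assumed to make the same fixed number of function calls.

```python
def run_with_max_func(calls_per_iteration, max_func):
    """Count iterations when the call limit is checked only between iterations."""
    calls = 0
    iterations = 0
    while calls < max_func:
        calls += calls_per_iteration  # a started iteration always completes
        iterations += 1
    return iterations, calls
```

With 7 calls per iteration and a limit of 125, the model stops after 18 iterations and 126 function calls, one more call than the limit.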

maxIter=n

specifies the maximum number n of iterations in the optimization process. The default values are as follows, depending on the optimization technique:

TRUREG, NRRIDG, and NEWRAP: 50
QUANEW and DBLDOG: 200
CONGRA: 400
LBFGS and NMSIMP: 1,000

These default values also apply when n is specified as a missing value. You can specify the optimization technique in the technique subparameter.
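
The default lookup can be sketched in Python as follows. The dictionary copies the values listed above; the function name is hypothetical, and representing a missing value as None or NaN is an assumption of this sketch.

```python
import math

MAX_ITER_DEFAULTS = {
    "TRUREG": 50, "NRRIDG": 50, "NEWRAP": 50,
    "QUANEW": 200, "DBLDOG": 200,
    "CONGRA": 400,
    "LBFGS": 1000, "NMSIMP": 1000,
}


def resolve_max_iter(technique, n=None):
    """Return n, or the technique's default when n is missing."""
    missing = n is None or (isinstance(n, float) and math.isnan(n))
    return MAX_ITER_DEFAULTS[technique.upper()] if missing else n
```

A technique that is not in the table raises a KeyError in this sketch, because this section does not list a default for it.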

maxTime=r

specifies an upper limit of r seconds of CPU time for the optimization process. The elapsed time is compared with r only once, at the end of each iteration, so the actual running time can be longer than r. The default value is the largest double-precision floating-point value that your computer can represent.
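
The end-of-iteration check can be sketched in Python as follows. The function and its parameters are hypothetical: step stands for one full iteration, max_iter bounds the loop, and clock defaults to time.process_time, which reports CPU time for the current process.

```python
import time


def run_with_max_time(step, max_time, max_iter, clock=time.process_time):
    """Run iterations, checking the time limit only after each one finishes."""
    start = clock()
    iteration = 0
    elapsed = 0.0
    for iteration in range(1, max_iter + 1):
        step()  # an iteration is never interrupted part-way
        elapsed = clock() - start
        if elapsed >= max_time:
            break
    return iteration, elapsed
```

Because the check happens only after step() returns, the reported elapsed time can exceed max_time by up to the duration of one iteration.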

minIter=n

specifies the minimum number of iterations. If you request more iterations than are needed for convergence to a stationary point, the optimization algorithms can behave unpredictably. For example, rounding errors can prevent the algorithm from continuing for the required number of iterations. By default, minIter=0.

technique='technique'

specifies the optimization technique for obtaining maximum likelihood estimates. You can specify one of the following techniques:

CONGRA: performs a conjugate-gradient optimization.
DBLDOG: performs a version of double-dogleg optimization.
DUQUANEW: performs a dual quasi-Newton optimization.
LBFGS: performs a limited-memory BFGS optimization.
LEVMAR: performs a Levenberg-Marquardt nonlinear least-squares minimization. This technique is available only with the nlmod action.
NEWRAP: performs a Newton-Raphson optimization with line search.
NMSIMP: performs a Nelder-Mead simplex optimization.
NONE: performs no optimization.
NRRIDG: performs a Newton-Raphson optimization with ridging.
QUANEW: performs a dual quasi-Newton optimization.
TRUREG: performs a trust-region optimization.

By default, technique='NRRIDG'.

For more information, see the section Choosing an Optimization Algorithm.

xConv=r xTol=r

specifies the relative parameter convergence criterion. Convergence requires a small relative parameter change in successive iterations,

$$ \max_j \left| \delta_j^{(i)} \right| < r $$

where

$$ \delta_j^{(i)} = \begin{cases} \beta_j^{(i)} - \beta_j^{(i-1)} & \left| \beta_j^{(i-1)} \right| < 0.01 \\ \dfrac{\beta_j^{(i)} - \beta_j^{(i-1)}}{\beta_j^{(i-1)}} & \text{otherwise} \end{cases} $$

and $\beta_j^{(i)}$ is the estimate of the jth parameter at iteration i. For the NMSIMP technique, the same formula is used, but $\beta_j^{(i)}$ is defined as the vertex that has the lowest function value and $\beta_j^{(i-1)}$ is defined as the vertex that has the highest function value in the simplex. The default value is r = 1E–8 for the NMSIMP technique and r = 0 otherwise.
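
The piecewise definition can be sketched in Python as follows. The function name is hypothetical, and the sketch covers only the form used by techniques other than NMSIMP.

```python
def xconv_met(beta_curr, beta_prev, r):
    """Return True when max_j |delta_j| < r for the piecewise delta_j."""

    def delta(curr, prev):
        diff = curr - prev
        # Absolute change for parameters near zero, relative change otherwise.
        return diff if abs(prev) < 0.01 else diff / prev

    return max(abs(delta(c, p)) for c, p in zip(beta_curr, beta_prev)) < r
```

Because the inequality is strict, r = 0 can never be satisfied in this sketch, so a value of 0 in effect turns the criterion off.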

Last updated: March 05, 2026