Shared Concepts

Optimization Parameters

This section applies to actions in the following action sets: gam, mixed, nonlinear, phreg, and regression.

This section describes parameters that are typically available for the actions in this book that perform optimizations.

The following notation is used to describe the subparameters. $\boldsymbol{\beta}$ denotes the $p \times 1$ vector of parameters for the optimization, and $\beta_i$ is its $i$th element. The objective function being minimized, its $p \times 1$ gradient vector, and its $p \times p$ Hessian matrix are denoted as $f(\boldsymbol{\beta})$, $\mathbf{g}(\boldsymbol{\beta})$, and $\mathbf{H}(\boldsymbol{\beta})$, respectively. The gradient with respect to the $i$th parameter is denoted as $g_i(\boldsymbol{\beta})$. Superscripts in parentheses denote the iteration count; for example, $f(\boldsymbol{\beta})^{(k)}$ is the value of the objective function at iteration $k$.

absConv=r
absTol=r

specifies an absolute function convergence criterion. For minimization, termination requires $f(\boldsymbol{\beta}^{(k)}) \le r$, where $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $f(\cdot)$ is the objective function. The default value of $r$ is the negative square root of the largest double-precision value, which serves only as a protection against overflows.
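As a rough illustration of the default described above, the following Python sketch computes the negative square root of the largest double-precision value and applies the test $f(\boldsymbol{\beta}^{(k)}) \le r$. The helper name `abs_conv_met` is hypothetical and is not part of any SAS Viya action.

```python
import math
import sys

# Default threshold as described in the text: the negative square root
# of the largest double-precision value on this machine.
DEFAULT_ABSCONV = -math.sqrt(sys.float_info.max)


def abs_conv_met(f_value, r=DEFAULT_ABSCONV):
    """Hypothetical helper: True when f(beta^(k)) <= r (minimization)."""
    return f_value <= r
```

With the default threshold, ordinary objective values never satisfy the test, which matches the statement that the default only protects against overflows.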

absFconv=r
absFtol=r

specifies an absolute function difference convergence criterion. For all techniques except NMSIMP, termination requires a small change of the function value in successive iterations:

$$ |f(\boldsymbol{\beta}^{(k-1)}) - f(\boldsymbol{\beta}^{(k)})| \le r $$

Here, $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\boldsymbol{\beta}^{(k)}$ is defined as the vertex that has the lowest function value and $\boldsymbol{\beta}^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex.
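The absolute function difference test can be sketched in a few lines of Python. Both function names are hypothetical; the simplex variant assumes that the function values at all simplex vertices are available as a list.

```python
def abs_fconv_met(f_prev, f_curr, r):
    """True when |f(beta^(k-1)) - f(beta^(k))| <= r."""
    return abs(f_prev - f_curr) <= r


def abs_fconv_met_simplex(vertex_f_values, r):
    """NMSIMP variant: compare the highest and lowest vertex values."""
    return abs(max(vertex_f_values) - min(vertex_f_values)) <= r
```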

absFconvN=n
absFtolN=n

specifies the number of successive iterations for which the absFconv subparameter criterion must be satisfied before the process can be terminated. By default, absFconvN=0. The only SAS Viya action that supports this method is the nlmod action.

absGconv=r
absGtol=r

specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small:

$$ \max_j |g_j(\boldsymbol{\beta}^{(k)})| \le r $$

Here, $\boldsymbol{\beta}$ is the vector of parameters in the optimization and $g_j(\cdot)$ is the gradient of the objective function with respect to the $j$th parameter. This criterion is not used by the NMSIMP technique. By default, absGconv=1E-5.
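A minimal Python sketch of this test, assuming the gradient is available as a plain sequence of floats (the function name is hypothetical):

```python
def abs_gconv_met(gradient, r=1e-5):
    """True when the largest absolute gradient element is at most r."""
    return max(abs(g_j) for g_j in gradient) <= r
```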

absGconvN=n
absGtolN=n

specifies the number of successive iterations for which the absGconv criterion must be satisfied before the process can be terminated. By default, absGconvN=0. The only SAS Viya action that supports this method is the nlmod action.

absXconv=r
absXtol=r

specifies an absolute parameter convergence criterion. For all techniques except NMSIMP, termination requires a small Euclidean distance between successive parameter vectors,

$$ \| \boldsymbol{\beta}^{(k)} - \boldsymbol{\beta}^{(k-1)} \|_2 \le r $$

For the NMSIMP technique, termination requires either a small length $\alpha^{(k)}$ of the vertices of a restart simplex,

$$ \alpha^{(k)} \le r $$

or a small simplex size,

$$ \delta^{(k)} \le r $$

where the simplex size $\delta^{(k)}$ is defined as the L1 distance from the simplex vertex $\boldsymbol{\xi}^{(k)}$ that has the smallest function value to the other $p$ simplex points $\boldsymbol{\beta}_l^{(k)} \ne \boldsymbol{\xi}^{(k)}$:

$$ \delta^{(k)} = \sum_{\boldsymbol{\beta}_l^{(k)} \ne \boldsymbol{\xi}^{(k)}} \| \boldsymbol{\beta}_l^{(k)} - \boldsymbol{\xi}^{(k)} \|_1 $$

The default is $r$ = 1E-8 for the NMSIMP technique and $r$ = 0 otherwise.
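The two distance measures used by this criterion can be sketched as follows. The function names are hypothetical, and the simplex is assumed to be given as a list of vertices together with their function values.

```python
import math


def abs_xconv_met(beta_curr, beta_prev, r):
    """True when the Euclidean distance between successive iterates is <= r."""
    dist = math.sqrt(sum((c - p) ** 2 for c, p in zip(beta_curr, beta_prev)))
    return dist <= r


def simplex_size(vertices, f_values):
    """Sum of L1 distances from the best vertex to every other vertex."""
    best = min(range(len(vertices)), key=f_values.__getitem__)
    xi = vertices[best]
    return sum(
        sum(abs(b - x) for b, x in zip(vertex, xi))
        for idx, vertex in enumerate(vertices)
        if idx != best
    )
```

For example, for the vertices (0, 0), (1, 0), and (0, 2) with the lowest function value at (0, 0), the simplex size is 1 + 2 = 3.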

corrections=m

specifies the number m of stored quasi-Newton update histories, referred to as the number of corrections, for the LBFGS technique. Values of m as small as 3 are common. In general, larger values improve convergence speed and solution quality for the LBFGS technique. However, for many problems the improvement stalls beyond a certain threshold, while memory usage and the solver's per-iteration computation cost keep growing with m. In practice, you can set high values for small or medium problems and small values for large problems. By default, corrections=20.
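To make the role of the stored corrections concrete, here is a sketch of the textbook L-BFGS two-loop recursion in Python. It is not the solver's actual implementation; the names `lbfgs_direction` and `history` and the small diagonal test problem are assumptions made for illustration. A bounded `deque` keeps only the m most recent update pairs, so both memory and the work per direction grow with m.

```python
from collections import deque


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(grad, history):
    """Two-loop recursion: approximate -H^{-1} g from stored (s, y) pairs."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):  # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)  # common initial scaling choice
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    for (s, y), alpha in zip(history, reversed(alphas)):  # oldest pair first
        rho = 1.0 / dot(y, s)
        b = rho * dot(y, r)
        r = [ri + si * (alpha - b) for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Keep at most m = 3 corrections; older pairs are discarded automatically.
m = 3
history = deque(maxlen=m)
hessian_diag = [2.0, 10.0]  # hypothetical quadratic with a diagonal Hessian
for s in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]):
    y = [h * si for h, si in zip(hessian_diag, s)]  # y = H s for a quadratic
    history.append((s, y))
```

Each call touches every stored pair twice, which is the per-iteration cost that increases with the number of corrections.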

fConv=r
fTol=r

specifies a relative function difference convergence criterion. For all techniques except NMSIMP, termination requires a small relative change of the function value in successive iterations,

$$ \frac{|f(\boldsymbol{\beta}^{(k)}) - f(\boldsymbol{\beta}^{(k-1)})|}{|f(\boldsymbol{\beta}^{(k-1)})|} \le r $$

Here, $\boldsymbol{\beta}$ denotes the vector of parameters that participate in the optimization, and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\boldsymbol{\beta}^{(k)}$ is defined as the vertex that has the lowest function value and $\boldsymbol{\beta}^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex.

The default value is $r = 2\epsilon$, where $\epsilon$ is the machine precision, which is the smallest double-precision floating-point number such that $1 + \epsilon > 1$.
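The relative test and its default can be sketched in Python as follows. Two assumptions are made here: `sys.float_info.epsilon` (the gap between 1.0 and the next representable double) is used as a stand-in for the machine precision $\epsilon$, and a zero denominator falls back to the absolute difference. Neither detail is specified by the text above, and the function name is hypothetical.

```python
import sys

# Stand-in for the machine precision described in the text (assumption).
EPS = sys.float_info.epsilon
DEFAULT_FCONV = 2 * EPS


def fconv_met(f_curr, f_prev, r=DEFAULT_FCONV):
    """True when the relative change in the objective is at most r."""
    denom = abs(f_prev)
    if denom == 0.0:
        # Assumption: avoid dividing by zero by testing the absolute change.
        return abs(f_curr - f_prev) <= r
    return abs(f_curr - f_prev) / denom <= r
```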

fConvN=n
fTolN=n

specifies the number of successive iterations for which the fConv subparameter criterion must be satisfied before the process can terminate. By default, fConvN=0. The only SAS Viya action that supports this method is the nlmod action.

fConv2=r
fTol2=r

specifies a second function convergence criterion. For all techniques except NMSIMP, termination requires a small predicted reduction of the objective function:

$$ df^{(k)} \approx f(\boldsymbol{\beta}^{(k)}) - f(\boldsymbol{\beta}^{(k)} + \mathbf{s}^{(k)}) $$

The predicted reduction

$$ \begin{aligned} df^{(k)} &= -\mathbf{g}^{(k)\prime} \mathbf{s}^{(k)} - \tfrac{1}{2} \mathbf{s}^{(k)\prime} \mathbf{H}^{(k)} \mathbf{s}^{(k)} \\ &= -\tfrac{1}{2} \mathbf{s}^{(k)\prime} \mathbf{g}^{(k)} \le r \end{aligned} $$

is computed by approximating the objective function $f$ by the first two terms of the Taylor series and substituting the Newton step,

$$ \mathbf{s}^{(k)} = -[\mathbf{H}^{(k)}]^{-1} \mathbf{g}^{(k)} $$
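The second equality in the predicted reduction follows directly from the Newton step. A short derivation, assuming only that $\mathbf{H}^{(k)}$ is nonsingular (an assumption that the text does not state explicitly):

```latex
\begin{align*}
\mathbf{H}^{(k)} \mathbf{s}^{(k)} &= -\mathbf{g}^{(k)}
  && \text{(Newton step, multiplied by } \mathbf{H}^{(k)}\text{)} \\
\mathbf{s}^{(k)\prime} \mathbf{H}^{(k)} \mathbf{s}^{(k)} &= -\mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}
  && \text{(premultiply by } \mathbf{s}^{(k)\prime}\text{)} \\
df^{(k)} &= -\mathbf{g}^{(k)\prime} \mathbf{s}^{(k)}
            - \tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{H}^{(k)} \mathbf{s}^{(k)} \\
         &= -\mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}
            + \tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}
          = -\tfrac{1}{2}\, \mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}
\end{align*}
```

The step from the third to the fourth line uses the fact that $\mathbf{g}^{(k)\prime} \mathbf{s}^{(k)}$ is a scalar and therefore equals $\mathbf{s}^{(k)\prime} \mathbf{g}^{(k)}$.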

For the NMSIMP technique, termination requires a small standard deviation of the function values of the $p + 1$ simplex vertices $\boldsymbol{\beta}_l^{(k)}$, $l = 0, \ldots, p$,

$$ \sqrt{\frac{1}{p+1} \sum_l \left[ f(\boldsymbol{\beta}_l^{(k)}) - \bar{f}(\boldsymbol{\beta}^{(k)}) \right]^2} \le r $$

where $\bar{f}(\boldsymbol{\beta}^{(k)}) = \frac{1}{p+1} \sum_l f(\boldsymbol{\beta}_l^{(k)})$. If there are $p_{act}$ boundary constraints active at $\boldsymbol{\beta}^{(k)}$, the mean and standard deviation are computed only for the $p + 1 - p_{act}$ unconstrained vertices.

The default value is $r$ = 1E-6 for the NMSIMP technique and $r$ = 0 otherwise.
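For the NMSIMP case, the test is a population standard deviation over the vertex function values. The sketch below assumes that no boundary constraints are active, so all $p + 1$ vertices enter the mean and the divisor; the function names are hypothetical.

```python
import math


def simplex_f_std(f_values):
    """Standard deviation of vertex function values, divisor = vertex count."""
    count = len(f_values)
    mean = sum(f_values) / count
    return math.sqrt(sum((f - mean) ** 2 for f in f_values) / count)


def fconv2_simplex_met(f_values, r=1e-6):
    """True when the spread of the vertex function values is at most r."""
    return simplex_f_std(f_values) <= r
```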

gConv=r
gTol=r

specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that the normalized predicted function reduction be small:

$$ \frac{\mathbf{g}(\boldsymbol{\beta}^{(k)})^{\prime} [\mathbf{H}^{(k)}]^{-1} \mathbf{g}(\boldsymbol{\beta}^{(k)})}{|f(\boldsymbol{\beta}^{(k)})|} \le r $$

Here, $\boldsymbol{\beta}$ denotes the vector of parameters that participate in the optimization, $f(\cdot)$ is the objective function, and $\mathbf{g}(\cdot)$ is the gradient. For the CONGRA technique (where a reliable Hessian estimate $\mathbf{H}$ is not available), the following criterion is used:

$$ \frac{\| \mathbf{g}(\boldsymbol{\beta}^{(k)}) \|_2^2 \; \| \mathbf{s}(\boldsymbol{\beta}^{(k)}) \|}{\| \mathbf{g}(\boldsymbol{\beta}^{(k)}) - \mathbf{g}(\boldsymbol{\beta}^{(k-1)}) \|_2 \; |f(\boldsymbol{\beta}^{(k)})|} \le r $$

This criterion is not used by the NMSIMP technique. By default, gConv=1E-8.
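The Hessian-based form of this criterion can be evaluated without explicitly inverting $\mathbf{H}$ by solving a linear system. The pure-Python sketch below uses a small Gaussian elimination routine for that purpose; all names are hypothetical, and it assumes a nonsingular Hessian and a nonzero objective value.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gconv_value(g, hessian, f_value):
    """Normalized predicted reduction g' H^{-1} g / |f|."""
    h_inv_g = solve(hessian, g)
    return sum(gi * xi for gi, xi in zip(g, h_inv_g)) / abs(f_value)


def gconv_met(g, hessian, f_value, r=1e-8):
    return gconv_value(g, hessian, f_value) <= r
```

For example, with $\mathbf{H} = \mathrm{diag}(2, 4)$, $\mathbf{g} = (2, 4)'$, and $f = -3$, the solve gives $\mathbf{H}^{-1}\mathbf{g} = (1, 1)'$, so the ratio is $6 / 3 = 2$.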

gConvN=n
gTolN=n

specifies the number of successive iterations for which the gConv subparameter criterion must be satisfied before the process can terminate. The only SAS Viya action that supports this method is the nlmod action. By default, gConvN=0.

gConv2=r
gTol2=r

specifies another relative gradient convergence criterion. For the TRUREG, LEVMAR, NRRIDG, and NEWRAP techniques, the following criterion of Browne (1982) is used:

$$ \max_j \frac{|g_j(\boldsymbol{\beta}^{(k)})|}{\sqrt{f(\boldsymbol{\beta}^{(k)}) \, H_{j,j}^{(k)}}} \le r $$

This criterion is not used by the other techniques.

By default, gConv2=0.
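A minimal Python sketch of the gConv2 ratio, assuming the gradient and the Hessian diagonal are given as sequences and that the product $f \cdot H_{j,j}$ is positive for every $j$ so that the square root is defined (the function name is hypothetical):

```python
import math


def gconv2_value(g, hessian_diag, f_value):
    """Largest gradient element scaled by sqrt(f * H_jj)."""
    return max(
        abs(g_j) / math.sqrt(f_value * h_jj)
        for g_j, h_jj in zip(g, hessian_diag)
    )
```

For example, with $\mathbf{g} = (0.5, -2)'$, Hessian diagonal $(1, 4)$, and $f = 4$, the scaled elements are 0.25 and 0.5, so the value compared with $r$ is 0.5.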

maxFunc=n

specifies the maximum number n of function calls in the optimization process. The default values are as follows, depending on the optimization technique:

The optimization can terminate only after completing a full iteration. Therefore, the number of function calls that are actually performed can exceed the number that is specified by this option. You can specify the optimization technique in the technique subparameter.
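The overshoot described above can be illustrated with a small simulation. The sketch assumes that the limit is compared against the running total at the end of each iteration (with `>=`, an assumption); the function name and the per-iteration call counts are hypothetical.

```python
def run_with_max_func(calls_per_iteration, max_func):
    """Return (iterations completed, function calls actually performed)."""
    total_calls = 0
    iterations = 0
    for calls in calls_per_iteration:
        total_calls += calls  # a full iteration always runs to completion
        iterations += 1
        if total_calls >= max_func:  # limit is tested only between iterations
            break
    return iterations, total_calls
```

With four function calls per iteration and a limit of 10, the loop stops after the third iteration, having performed 12 calls.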

maxIter=n

specifies the maximum number n of iterations in the optimization process. The default values are as follows, depending on the optimization technique:

These default values also apply when n is specified as a missing value. You can specify the optimization technique in the technique subparameter.

maxTime=r

specifies an upper limit of r seconds of CPU time for the optimization process. The time specified by r is checked only once at the end of each iteration. Therefore, the actual running time can be longer than r. The default value is the largest floating-point double representation of your computer.

minIter=n

specifies the minimum number of iterations. If you request more iterations than are actually needed for convergence to a stationary point, the optimization algorithms can behave unpredictably. For example, the effect of rounding errors can prevent the algorithm from continuing for the required number of iterations. By default, minIter=0.

technique='technique'

specifies the optimization technique for obtaining maximum likelihood estimates. You can specify one of the following techniques:

CONGRA

performs a conjugate-gradient optimization.

DBLDOG

performs a version of double-dogleg optimization.

DUQUANEW

performs a dual quasi-Newton optimization.

LBFGS

performs a limited-memory BFGS optimization.

LEVMAR

performs a Levenberg-Marquardt nonlinear least-squares minimization. This technique is available only with the nlmod action.

NEWRAP

performs a Newton-Raphson optimization with line search.

NMSIMP

performs a Nelder-Mead simplex optimization.

NONE

performs no optimization.

NRRIDG

performs a Newton-Raphson optimization with ridging.

QUANEW

performs a dual quasi-Newton optimization.

TRUREG

performs a trust-region optimization.

By default, technique='NRRIDG'.

For more information, see the section Choosing an Optimization Algorithm.

xConv=r
xTol=r

specifies the relative parameter convergence criterion. Convergence requires a small relative parameter change in subsequent iterations,

$$ \max_j |\delta_j^{(i)}| < r $$

where

$$ \delta_j^{(i)} = \begin{cases} \beta_j^{(i)} - \beta_j^{(i-1)} & |\beta_j^{(i-1)}| < 0.01 \\ \dfrac{\beta_j^{(i)} - \beta_j^{(i-1)}}{\beta_j^{(i-1)}} & \text{otherwise} \end{cases} $$

and $\beta_j^{(i)}$ is the estimate of the $j$th parameter at iteration $i$. For the NMSIMP technique, the same formula is used, but $\beta_j^{(i)}$ is taken from the vertex that has the lowest function value and $\beta_j^{(i-1)}$ is taken from the vertex that has the highest function value in the simplex. The default value is $r$ = 1E-8 for the NMSIMP technique and $r$ = 0 otherwise.
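The piecewise definition of the parameter change translates directly into code. The following Python sketch uses hypothetical function names and assumes that the current and previous estimates are given as sequences of equal length.

```python
def relative_change(curr, prev):
    """Absolute change for small |prev|, relative change otherwise."""
    if abs(prev) < 0.01:
        return curr - prev
    return (curr - prev) / prev


def xconv_met(beta_curr, beta_prev, r):
    """True when the largest (relative) parameter change is below r."""
    largest = max(
        abs(relative_change(c, p)) for c, p in zip(beta_curr, beta_prev)
    )
    return largest < r
```

Switching to the absolute change when the previous estimate is close to zero avoids dividing by a tiny number, which would otherwise inflate the measured change.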

Last updated: March 05, 2026