Shared Concepts

selection Parameter

This section applies to actions in the following action sets: phreg, quantreg, and regression.

Actions in this book that support model selection use the selection parameter to control details about the model selection process.

You can specify the following subparameters in the selection parameter:

method='method'

specifies the method to be used to select the model.

The following methods are available and are explained in detail in the section Model Selection Methods. By default, the stepwise method is used.

backward

specifies backward elimination. This method starts with all effects in the model and deletes effects.

bestsubset

specifies best-subset selection. This method uses the branch-and-bound technique (Furnival and Wilson 1974) to find the best subsets of model effects without examining all possible subsets. The only SAS Viya action that supports this method is the glm action.

forward

specifies forward selection. This method starts with no effects in the model and adds effects.

elasticnet

specifies the elastic net method. This is an extension of LASSO that estimates parameters by using constrained optimization, in which both the sum of the absolute regression coefficients and the sum of the squared regression coefficients are constrained. The SAS Viya actions that support this method are the glm action and the logistic action.

forwardswap

specifies forward-swap selection, which is an extension of the forward selection method. Before any addition step, the action makes all pairwise swaps of one effect in the model and one effect out of the current model that improve the selection criterion.

lar

specifies least angle regression. Like forward selection, this method starts by adding effects to an empty model. The parameter estimates at any step are "shrunk" when they are compared to the corresponding least squares estimates. If the model contains classification variables, then these classification variables are split. For more information, see the split subparameter in the class parameter. The only SAS Viya action that supports this method is the glm action.

lasso

specifies the LASSO method, which adds and deletes parameters by using a version of ordinary least squares in which the sum of the absolute regression coefficients is constrained. If the model contains classification variables, then these classification variables are split. For more information, see the split subparameter in the class parameter.

mcp

specifies the MCP method, which minimizes ordinary least squares plus the minimax concave penalty (MCP) function. The only SAS Viya action that supports this method is the glm action.

none

specifies no model selection.

scad

specifies the SCAD method, which minimizes ordinary least squares plus the smoothly clipped absolute deviation (SCAD) function. The only SAS Viya action that supports this method is the glm action.

stepwise

specifies stepwise regression. This method is similar to the forward selection method except that effects already in the model do not necessarily stay there.
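As an illustration, a selection specification such as the ones above is typically assembled as a parameter structure before it is passed to an action. The following sketch uses Python dictionary syntax; the key names mirror the subparameters documented in this section, but the surrounding action call (for example, through the SWAT client) is omitted because it requires a running CAS session, and the specific values are assumptions for illustration only.

```python
# Sketch: two hypothetical selection parameter structures.
# Key names come from this section; values are illustrative assumptions.
selection = {
    "method": "stepwise",   # selection method
    "select": "sl",         # order effects by significance level
    "slEntry": 0.05,        # significance level for entry
    "slStay": 0.05,         # significance level for staying
    "details": "all",       # show candidate tables at each step
}

# A backward-elimination variant that keeps at least two effects:
backward_selection = {
    "method": "backward",
    "minEffects": 2,
    "fast": True,           # Lawless-Singhal approximation
}
```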

Table 8 lists the applicable subparameters for each of these methods.

Table 8: Applicable Subparameters by Method

Subparameter BACKWARD BESTSUBSET ELASTICNET FORWARD LAR LASSO MCP/SCAD STEPWISE
adaptive x x
bestSubsetOptions x
candidates x x x x x x
choose x x x x x x x
competitive x
details x x x x x x x
elasticNetOptions x
enScale x
enSteps x
fast x
fcpSelectionOptions x
gamma x x
hierarchy x x x x x x
lsCoeffs x x
L2 x
L2High x
L2Low x
maxEffects x x x x x x
maxSteps x x x x x x
minEffects x x
orderSelect x x x x x x
plots x x x x x x
select x x x x
slEntry x x x x x
slStay x x x x
stop x x x x x x
stopHorizon x x x


The subparameters that you can specify in the selection parameter are listed and described in the Syntax section of the specific action chapters. More details about many of these subparameters follow; as described in Table 8, not all selection subparameters are applicable to every method or to every action.

adaptive=TRUE | FALSE

when set to True, applies adaptive weights to each of the coefficients when the LASSO or elastic net method is performed. Ordinary least squares estimates of the model parameters are used to form the adaptive weights. You can specify the gamma subparameter to specify the power transformation to use in forming the adaptive weights.
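The standard adaptive-LASSO formulation (Zou 2006) forms the weight for each coefficient as the reciprocal of the OLS estimate's magnitude raised to the power gamma; the following sketch illustrates that computation. The formula is an assumption based on the cited literature, not a statement of the action's internal implementation.

```python
def adaptive_weights(ols_estimates, gamma=1.0):
    """Form adaptive weights w_j = 1 / |beta_j|**gamma from ordinary
    least squares estimates (standard adaptive-LASSO form; assumed,
    not taken from the action's internals)."""
    return [1.0 / (abs(b) ** gamma) for b in ols_estimates]

# Effects with larger OLS estimates receive smaller penalties:
weights = adaptive_weights([2.0, 0.5, 4.0], gamma=1.0)
```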

bestSubsetOptions={best-parameter}

specifies subparameters that control how best-subset selection works and displays its results. This subparameter is ignored unless the method subparameter value is bestsubset.

You can specify the following best-parameters:

best=n

specifies the maximum number of subset models to display if the model selection criterion is CP or ADJRSQ, or specifies the maximum number of subset models for each size if the model selection criterion is RSQUARE, where n is a positive integer.

computeBeta=TRUE | FALSE

when set to True, displays estimated regression coefficients for each model in the SubsetSelectionSummary table.

displayAIC=TRUE | FALSE

when set to True, displays the AIC statistic for each model in the SubsetSelectionSummary table.

displayBIC=TRUE | FALSE

when set to True, displays the BIC statistic for each model in the SubsetSelectionSummary table.

displayGMSEP=TRUE | FALSE

when set to True, displays the GMSEP statistic (estimated mean square error of prediction, assuming that both independent and dependent variables are multivariate normal (Stein 1960; Darlington 1968)) for each model in the SubsetSelectionSummary table.

displayJP=TRUE | FALSE

when set to True, displays the J_p statistic (estimated mean square error of prediction, assuming that the values of the regressors are fixed and that the model is correct) for each model in the SubsetSelectionSummary table.

displayMSE=TRUE | FALSE

when set to True, displays the mean square error for each model in the SubsetSelectionSummary table.

displayPC=TRUE | FALSE

when set to True, displays the PC statistic (Amemiya’s prediction criterion (Amemiya 1976; Judge et al. 1980)) for each model in the SubsetSelectionSummary table.

displayRMSE=TRUE | FALSE

when set to True, displays the root mean square error for each model in the SubsetSelectionSummary table.

displaySBC=TRUE | FALSE

when set to True, displays the SBC statistic for each model in the SubsetSelectionSummary table.

displaySP=TRUE | FALSE

when set to True, displays the SP statistic for each model in the SubsetSelectionSummary table.

displaySSE=TRUE | FALSE

when set to True, displays the error sum of squares for each model in the SubsetSelectionSummary table.

sigma=value

specifies the true standard deviation of the error term to use in computing the CP and BIC statistics, where value is a positive number. If you omit this best-parameter, an estimate from the full model is used.

candidates='all' | number

specifies the maximum number of candidates to display at each step of the selection process when the details subparameter value is 'all'.

choose='criterion'

chooses from the list of models (at each step of the selection process) the model that yields the best value of the specified criterion. If the optimal value of the specified criterion occurs for models at more than one step, then the model that has the smallest number of parameters is chosen. If you do not specify the choose subparameter, then the selected model is the model at the final step in the selection process. The criteria that are supported depend on the type of model that is being fit. For the supported values of criterion, see the chapters for the relevant actions.
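The selection rule that the choose subparameter describes, including its tie-breaking behavior, can be sketched as follows. This is a minimal illustration of the documented rule, not the action's implementation; the model representation is an assumption.

```python
def choose_model(models):
    """Pick the model with the best (smallest) criterion value; when the
    optimal value occurs at more than one step, pick the model with the
    fewest parameters. Each model is a (step, n_params, criterion) tuple.
    Assumes a smaller-is-better criterion such as AIC."""
    return min(models, key=lambda m: (m[2], m[1]))

models = [(1, 2, 10.0), (2, 3, 8.0), (3, 4, 8.0)]
best = choose_model(models)  # criterion ties at 8.0; fewer parameters wins
```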

competitive=TRUE | FALSE

when set to True, applies only when the selection method is stepwise and the select subparameter value is not sl. The criterion that is specified in the select subparameter is evaluated for all models in which an effect currently in the model is dropped or an effect not yet in the model is added. The effect whose removal from or addition to the model yields the maximum improvement to the criterion specified in the select subparameter is dropped or added.

elasticNetOptions={en-parameter}

specifies subparameters that control optimization solvers for elastic net selection. This subparameter is ignored unless the method subparameter value is ELASTICNET.

You can specify the following en-parameters:

absFConv=value

specifies an absolute function difference convergence criterion. This subparameter is available only when the solver is the alternating direction method of multipliers (ADMM) for elastic net selection.

fConv=value

specifies a relative function difference convergence criterion. This subparameter is available only when the solver is either the alternating direction method of multipliers (ADMM) or the limited-memory BFGS method of elastic net selection.

gConv=value

specifies a relative gradient convergence criterion. This subparameter is available only when the solver is the limited-memory BFGS method of elastic net selection.

lambda=value(s)

specifies the regularization parameter λ to use for elastic net selection. If you specify a single value, elastic net selection fits a single model. If you supply a list of values, elastic net selection fits multiple models and selects the best model among the candidates. If you specify neither this subparameter nor the numLambda subparameter, elastic net selection fits a model by using a regularization parameter that is a fraction of λ_max.

mixing=value

specifies the mixing parameter that controls the balance between the L1 penalty and the L2 penalty in elastic net selection. The specified value must be between 0 and 1, inclusive. If you specify 0, elastic net selection fits ridge regression models. If you specify 1, elastic net selection fits LASSO regression models. The default value is 0.5.

numLambda=n

specifies the number of regularization parameters for fitting candidate models. The default value is 1. The subparameter is ignored if you supply a list of regularization parameters by using the lambda subparameter.

rho=value

specifies the scaling factor ρ to use when the numLambda subparameter value n_λ is greater than 1. If you omit the lambda sequence values, elastic net selection fits candidate models by using the following sequence of regularization parameters: λ_max, λ_max·ρ, …, λ_max·ρ^(n_λ−1). If you omit this en-parameter, elastic net selection uses ρ = 10^(−4/(n_λ−1)) when the number of observations is greater than the number of parameters, or ρ = 10^(−2/(n_λ−1)) when the number of observations is less than the number of parameters.

solver='ADMM' | 'BFGS'  | 'LBFGS' | 'NLP'

specifies the optimization solver. You can specify the following values:

ADMM

specifies the alternating direction method of multipliers (ADMM).

BFGS

specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.

LBFGS

specifies the limited-memory BFGS method.

NLP

specifies the nonlinear programming (NLP) method.
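The default regularization sequence that the rho subparameter describes can be sketched as follows. The value of λ_max is data-dependent and is supplied here as an assumed input; the formulas for ρ and the sequence are taken directly from the rho description above.

```python
def lambda_sequence(lambda_max, n_lambda, n_obs, n_params, rho=None):
    """Generate the sequence lambda_max * rho**k for k = 0..n_lambda-1,
    using the documented default rho when none is given:
    rho = 10**(-4/(n_lambda-1)) when n_obs > n_params,
    rho = 10**(-2/(n_lambda-1)) otherwise."""
    if rho is None:
        exponent = -4.0 if n_obs > n_params else -2.0
        rho = 10.0 ** (exponent / (n_lambda - 1))
    return [lambda_max * rho ** k for k in range(n_lambda)]

# Starts at lambda_max and decreases geometrically:
seq = lambda_sequence(lambda_max=1.0, n_lambda=5, n_obs=100, n_params=10)
```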

enScale=TRUE | FALSE

when set to True, scales the results of the elastic net method to offset bias from the double shrinkage inherent in this method (Zou and Hastie 2005). This option applies only when you specify 'ELASTICNET' for the method subparameter. The default is not to rescale the solution; this is the so-called naive elastic net.
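In Zou and Hastie's (2005) formulation, the corrected elastic net estimate is obtained by multiplying the naive solution by (1 + λ2), which offsets the double shrinkage. The following sketch illustrates that rescaling; the formula comes from the cited paper and is an assumption about what enScale=TRUE computes.

```python
def rescale_naive_elastic_net(beta_naive, l2):
    """Scale naive elastic net coefficients by (1 + lambda2) to offset
    double shrinkage, per Zou and Hastie (2005). Assumed to mirror the
    effect of enScale=TRUE; not taken from the action's internals."""
    return [(1.0 + l2) * b for b in beta_naive]

beta = rescale_naive_elastic_net([0.5, -0.2], l2=1.0)
```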

enSteps=n

specifies the number of steps in the search for the suitable value of the ridge regression parameter L2 when you specify 'ELASTICNET' for the method subparameter. If you specify a value for the L2 subparameter, then the enSteps subparameter is ignored. By default, enSteps=50.

fast=TRUE | FALSE

when set to True, implements the computational algorithm of Lawless and Singhal (1978) to compute a first-order approximation to the remaining slope estimates for each subsequent elimination of a variable from the model. In backward selection, this essentially approximates the selection process by that of a linear regression model whose crossproducts matrix equals the Hessian matrix of the full model under consideration. This option is available only when the method subparameter value is BACKWARD. It is computationally efficient because the model is not refit after the removal of each effect.

fcpSelectionOptions={fcp-parameter}

specifies subparameters that control how MCP and SCAD selections work and display the results. This subparameter is ignored unless the method subparameter value is MCP or SCAD.

You can specify the following fcp-parameters:

alpha=value

specifies the regularization parameter alpha. If you specify a value for this alpha subparameter, then the maxAlpha, minAlpha, and maxIterAlpha subparameters are ignored. If you omit the alpha subparameter, then the glm action searches for a suitable value of the alpha parameter according to the values of the maxAlpha, minAlpha, and maxIterAlpha subparameters.

bigM=value

specifies the constant M (the big-M bound) to use when you specify 'MILP' for the solver subparameter.

coefTol=value

specifies the tolerance for the truncation of the estimated coefficients of parameters. In other words, if |β_j| is less than the value of the coefTol subparameter for some j, then β_j is set to 0; that is, the jth effect is not selected in the final model. By default, coefTol=1E–7.

intTol=value

specifies the amount by which an integer variable value can differ from an integer and still be considered integer-feasible, when you specify 'MILP' for the solver subparameter. The value of the intTol subparameter can be any number between 1E–9 and 0.5, inclusive. By default, intTol=1E–7.

lambda=value

specifies the regularization parameter λ. If you specify a value for this lambda subparameter, then the maxLambda, minLambda, maxIterLambda, and lambdaGrid subparameters are ignored. If you omit the lambda subparameter, then the glm action searches for a suitable value of the λ parameter according to the values of the maxLambda, minLambda, maxIterLambda, and lambdaGrid subparameters.

lambdaGrid='LINSPACE' | 'LOGSPACE'

specifies the grid pattern for searching for a suitable value of the regularization parameter λ. You can specify the following values:

LINSPACE

requests an evenly spaced grid search for a suitable value of λ in the range [λ_min, λ_max], where λ_min and λ_max are the values that you specify for the minLambda and maxLambda subparameters, respectively.

LOGSPACE

requests a log-scale grid search for a suitable value of λ in the range [λ_min, λ_max], where λ_min and λ_max are the values that you specify for the minLambda and maxLambda subparameters, respectively.

By default, lambdaGrid='LOGSPACE'.

maxAlpha=value

specifies the highest value to use in the search for a suitable value of the regularization parameter alpha. If you specify a value for the alpha subparameter, then the maxAlpha subparameter is ignored. By default, maxAlpha=5.7 when you specify 'SCAD' for the method subparameter and maxAlpha=4.7 when you specify 'MCP' for the method subparameter.

maxIterAlpha=value

specifies the number of steps in the search for a suitable value of the regularization parameter alpha. The grid values of parameter alpha are evenly spaced between minAlpha and maxAlpha subparameter values. If you specify a value for the alpha subparameter, then the maxIterAlpha subparameter is ignored. By default, maxIterAlpha=4.

maxIterLambda=value

specifies the number of steps in the search for a suitable value of the regularization parameter λ. If you specify a value for the lambda subparameter, then the maxIterLambda subparameter is ignored. By default, maxIterLambda=10.

maxLambda=value

specifies the highest value to use in the search for a suitable value of the regularization parameter λ. If you specify a value for the lambda subparameter, then the maxLambda subparameter is ignored.

maxTime=value

specifies the upper limit of time (in seconds) for performing the optimization process at each alpha and λ step. By default, maxTime=600.

minAlpha=value

specifies the lowest value to use in the search for a suitable value of the regularization parameter alpha. If you specify a value for the alpha subparameter, then the minAlpha subparameter is ignored. By default, minAlpha=2.7 when you specify 'SCAD' for the method subparameter and minAlpha=1.7 when you specify 'MCP' for the method subparameter.

minLambda=value

specifies the lowest value to use in the search for a suitable value of the regularization parameter λ. If you specify a value for the lambda subparameter, then the minLambda subparameter is ignored.

scale=TRUE | FALSE

specifies whether the design matrix is scaled (TRUE) or not scaled (FALSE) during model selection. By default, scale=TRUE; this is the same as in other penalized methods, such as LASSO and elastic net.

solver='MILP' | 'NLP'

specifies the optimization solver. You can specify the following values:

MILP

specifies the mixed integer linear programming (MILP) solver.

NLP

specifies the nonlinear programming (NLP) solver.

By default, solver='MILP'.
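The two lambdaGrid patterns can be sketched as follows: an evenly spaced grid adds a constant increment, whereas a log-scale grid multiplies by a constant ratio. This is an illustration of the documented grid shapes; the action's exact grid construction is not specified here and is an assumption.

```python
def lambda_grid(lo, hi, n, pattern="LOGSPACE"):
    """Generate an n-point search grid for lambda over [lo, hi]:
    LINSPACE uses additive (evenly spaced) steps, LOGSPACE uses
    multiplicative (log-scale) steps. Grid details are assumed."""
    if pattern == "LINSPACE":
        step = (hi - lo) / (n - 1)
        return [lo + k * step for k in range(n)]
    ratio = (hi / lo) ** (1.0 / (n - 1))  # constant multiplicative spacing
    return [lo * ratio ** k for k in range(n)]

lin = lambda_grid(0.1, 1.0, 4, "LINSPACE")  # approximately 0.1, 0.4, 0.7, 1.0
log = lambda_grid(0.1, 1.0, 4, "LOGSPACE")  # equal ratios between points
```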

gamma=nonnegative_number

specifies the power transformation that is applied to the parameters in forming the adaptive weights when the adaptive subparameter is specified. By default, gamma=1.

lsCoeffs=TRUE | FALSE

when set to True, requests a hybrid version of the LAR and LASSO methods, in which the sequence of models is determined by the LAR or LASSO algorithm but the coefficients of the parameters for the model at any step are determined by using ordinary least squares.

L2=value

specifies the ridge regularization parameter to use when the method subparameter value is ELASTICNET. If you specify a value for the L2 subparameter, then the value that you specify is used to define the elastic net method, and the L2High, L2Low, and enSteps subparameters are ignored. If you do not specify the L2 subparameter together with the elastic net method, then the glm action searches for a suitable value of the L2 subparameter according to the values of the L2High, L2Low, and enSteps subparameters.

L2High=value

specifies the highest value to use in the search for a suitable value of the ridge regression subparameter L2 when you specify the elastic net method. If you specify a value for the L2 subparameter, then the L2High subparameter is ignored. By default, L2High=1.

L2Low=value

specifies the lowest value to use in the search for a suitable value of the ridge regression subparameter L2 when you specify the elastic net method. If you specify a value for the L2 subparameter, then the L2Low subparameter is ignored. By default, L2Low=0.

maxEffects=n

specifies the maximum number of effects in any model that is considered during the selection process. This option is ignored when backward selection is specified. If, at some step of the selection process, the model contains the specified maximum number of effects, then no candidates for addition are considered.

maxSteps=n

specifies the maximum number of selection steps that are performed. The default value of n is the number of effects in the model parameter when the method subparameter value is FORWARD, BACKWARD, or LAR. The default is three times the number of effects when the method subparameter value is STEPWISE, LASSO, or ELASTICNET.

minEffects=n

specifies the minimum number of effects in any model that is considered during backward selection. This option is ignored unless the method subparameter value is BACKWARD. The selection process terminates if, at some step of the selection process, the model contains the specified minimum number of effects.

select='sl' | 'criterion'

specifies the criterion that the action uses to determine the order in which effects enter or leave at each step of the selection method. For each step, the effect whose addition to or removal from the current model yields the maximum improvement in the specified criterion is selected. You can use the traditional significance-level approach by specifying the SL criterion; for other supported criteria, see the chapter for the relevant action. This option is not valid when the method subparameter value is LAR, LASSO, or ELASTICNET.

slEntry=value

specifies the significance level for entry when the stop subparameter value is SL or the select subparameter value is SL. By default, slEntry=0.05.

slStay=value

specifies the significance level for staying in the model when the stop subparameter value is SL or the select subparameter value is SL. By default, slStay=0.05.

stop='sl' | 'none' | 'criterion'

specifies a criterion that is used to stop the selection process. The criteria that are supported depend on the type of model that is being fit. For information about the supported criteria, see the chapter for the relevant action.

If you do not specify the stop subparameter but do specify the select subparameter, then the criterion specified in the select subparameter is also used for stopping.

You can specify the following criteria:

none

stops the selection process if no suitable add or drop candidates can be found or if a size-based limit is reached. For example, if you specify 'NONE' for the stop subparameter and 5 for the maxEffects subparameter, then the selection process stops at the first step that produces a model that has five effects.

sl

stops the selection process on the basis of significance levels. For addition steps (when the method subparameter value is forward or stepwise), the process stops at the step where the significance level of the candidate for entry is greater than the value of the slEntry subparameter. For removal steps (when the method subparameter value is backward or stepwise), the process stops at the step where the significance level of the candidate for removal is greater than the value of the slStay subparameter.

criterion

stops the selection process if the selection process produces a local extremum of this criterion or if a size-based limit is reached. For example, if you specify 'AIC' for the stop subparameter and 5 for the maxSteps subparameter, then the selection process stops before step 5 if the sequence of models has a local minimum of the AIC criterion before step 5. The determination of whether a local minimum is reached is made on the basis of a stop horizon. The default stop horizon is 3, but you can change it by using the stopHorizon subparameter. If the stop horizon is n and the criterion at any step is better than the criterion at the next n steps, then the selection process terminates.

In addition, you can also specify the following subparameters:

details='none' | 'summary' | 'steps' | 'all'

specifies the level of detail to be produced about the selection process. By default, details='summary'.

When the details subparameter value is all or steps, the following output is produced:

  • tables that provide information about the model that is selected at each step of the selection process.

  • entry and removal statistics for inclusion or exclusion candidates at each step. By default, only the top 10 candidates at each step are shown. If you specify steps and set the candidates subparameter to n, then the best n candidates are shown. If you specify steps and set the candidates subparameter to all, then all candidates are shown.

  • a selection summary table that shows by step the effect that is added to or removed from the model in addition to the values of the criteria specified in the select, stop, and choose subparameters for the resulting model.

  • a stop reason table that describes why the selection process stopped.

  • a selection reason table that describes why the selected model was chosen.

  • a selected effects table that lists the effects that are in the selected model.

The summary level produces only the selection summary, stop reason, selection reason, and selected effects tables. In addition, if you specify neither an L2 subparameter nor a solver subparameter for the elastic net method, the summary level also displays an elastic net summary table, which shows the ridge regularization parameter L2 (in the ratio scaling) and choose subparameter values in each L2 search step.

hierarchy='none' | 'single' | 'singleclass'

specifies whether and how the model hierarchy requirement is applied. You can specify that only classification effects, or both classification and continuous effects, be subject to the hierarchy requirement. This subparameter is ignored unless the method subparameter value is forward, backward, or stepwise.

Model hierarchy refers to the requirement that, for any term to be in the model, all model effects that are contained in the term must be present in the model. For example, in order for the interaction A*B to enter the model, the main effects A and B must be in the model. Likewise, neither effect A nor effect B can leave the model while the interaction A*B is in the model.

You can specify the following values:

none or default

specifies that model hierarchy not be maintained. Any single effect can enter or leave the model at any step of the selection process.

single

specifies that only one effect enter or leave the model at one time, subject to the model hierarchy requirement. For example, suppose that the model contains the main effects A and B and the interaction A*B. In the first step of the selection process, either A or B can enter the model. In the second step, the other main effect can enter the model. The interaction effect can enter the model only when both main effects have already entered. Also, before A or B can be removed from the model, the A*B interaction must first be removed. All effects (classification and interval) are subject to the hierarchy requirement.

singleclass

is the same as single except that only classification effects are subject to the hierarchy requirement.

By default, hierarchy='none'.
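The hierarchy requirement described above can be sketched as a pair of checks: an effect can enter only after every effect it contains is in the model, and an effect can leave only if no remaining effect contains it. This is a minimal illustration of the documented rule; the string-based effect representation is an assumption.

```python
def contained_effects(effect):
    """Main effects contained in an interaction, e.g. 'A*B' -> {'A', 'B'};
    a main effect contains nothing."""
    return set(effect.split("*")) if "*" in effect else set()

def can_enter(effect, model):
    """An effect can enter only if every effect it contains is already
    in the model (the 'single' hierarchy requirement)."""
    return contained_effects(effect) <= set(model)

def can_leave(effect, model):
    """An effect can leave only if no other effect in the model contains it."""
    return all(effect not in contained_effects(e) for e in model if e != effect)

model = ["A", "B", "A*B"]  # A*B entered only after both A and B
```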

orderSelect=TRUE | FALSE

when set to True, displays effects in the selected model in the order in which they first entered the model. If you omit this subparameter, then effects in the selected model are displayed in the order in which they appear in the model parameter.

plots=TRUE | FALSE

when set to True, produces the coefficientProgression and selectionSummaryForPlots tables that you can use to create selection diagnostic plots. For examples of such plots, see the section Model Selection Plots (SAS Visual Statistics: Procedures).

stopHorizon=n

specifies the number of consecutive steps at which the criterion specified in the stop subparameter must worsen in order for a local extremum to be detected. For example, suppose that the stopping criterion is AIC and the sequence of AIC values at steps 1 to 6 of a selection is 10, 7, 4, 6, 5, 2. If stopHorizon=2, then the AIC criterion is deemed to have a local minimum at step 3 because the AIC values at the next two steps are greater than the value 4 that occurs at step 3. However, if stopHorizon=3, then the value at step 3 is not deemed to be a local minimum because the AIC value at step 6 is lower than the AIC value at step 3. If the stop subparameter value is NONE, then the stop horizon value is ignored. If the stop subparameter value is SL, then n is ignored and stopHorizon=1 is used. By default, stopHorizon=3 unless otherwise stated in individual action chapters.
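The local-minimum detection that the stop horizon governs can be sketched as follows, using the AIC sequence from the example above. The exact internal algorithm is assumed; this sketch only mirrors the documented rule for a smaller-is-better criterion.

```python
def local_min_step(criteria, horizon):
    """Return the 1-based step at which the stopping criterion has a
    local minimum under the given stop horizon: the value at that step
    must be smaller than at each of the next `horizon` steps. Returns
    None if no local minimum is detected."""
    for i in range(len(criteria) - horizon):
        window = criteria[i + 1 : i + 1 + horizon]
        if all(criteria[i] < c for c in window):
            return i + 1
    return None

aic = [10, 7, 4, 6, 5, 2]
local_min_step(aic, horizon=2)  # step 3: value 4 beats the next two steps
local_min_step(aic, horizon=3)  # None: step 6's value 2 is lower than 4
```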

Last updated: March 05, 2026