Shared Concepts

Forward Selection

This section applies to actions in the following action sets: phreg, quantreg, and regression.

When the method subparameter value is FORWARD, the forward selection technique begins with just the intercept and then sequentially adds the effect that most improves the fit. The process terminates when no significant improvement can be obtained by adding any effect.
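The loop described above can be sketched in a few lines of Python. This is an illustration only, not the implementation that the actions use. The function name forward_select, the score callable (lower is better), and the toy GAINS table are all hypothetical, and the stopping rule here is simply "no candidate lowers the score" rather than a significance test.

```python
from typing import Callable, FrozenSet, Iterable, List

def forward_select(candidates: Iterable[str],
                   score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedy forward selection (hypothetical sketch); lower score is better."""
    model: FrozenSet[str] = frozenset()   # intercept-only model: no effects yet
    current = score(model)
    remaining = set(candidates)
    added: List[str] = []                 # effects in the order they entered
    while remaining:
        # Consider only models that differ from the current one by one added
        # effect. sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining), key=lambda e: score(model | {e}))
        best_score = score(model | {best})
        if best_score >= current:         # no candidate improves the fit: stop
            break
        model, current = model | {best}, best_score
        remaining.remove(best)
        added.append(best)
    return added

# Toy criterion (assumed for illustration): each effect reduces the lack of fit
# by a fixed amount, and every effect in the model costs a penalty of 1.
GAINS = {"x1": 5.0, "x2": 3.0, "x3": 0.5}

def toy_score(model: FrozenSet[str]) -> float:
    return 10.0 - sum(GAINS[e] for e in model) + 1.0 * len(model)

selected = forward_select(GAINS, toy_score)
print(selected)  # ['x1', 'x2']
```

In this toy run, x3 is never added because its gain of 0.5 does not offset the penalty of 1, so the score would rise from 4.0 to 4.5.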

In the traditional implementation of forward selection, the statistic that is used to determine whether to add an effect is the significance level of a hypothesis test that reflects an effect’s contribution to the model if it is included. At each step, the effect that is most significant is added. The process stops when the significance level for adding any effect is greater than some specified entry significance level.

An alternative approach to address the critical problem of when to stop the selection process is to assess the quality of the models that are produced by the forward selection method and choose the model from this sequence that "best" balances goodness of fit against model complexity. You can use several criteria for this purpose.

Keep in mind that forward selection decides which effect to add at each step by considering only models that differ from the current model by one effect. This search paradigm cannot guarantee reaching a "best" subset model. Furthermore, the add decision is greedy in the sense that the effect that is deemed most significant is the effect that is added. However, if your goal is to find a model that is best in terms of some selection criterion other than the significance level of the entering effect, then even this one-step choice might not be optimal. For example, the effect that you would add to get a model that has the smallest value of the Mallows’ C(p) statistic at the next step is not necessarily the same effect that is most significant based on a hypothesis test. You can specify the criterion to optimize at each step by using the select subparameter. For example, the following CASL language statement requests that at each step the effect that is added be the one that produces a model that has the smallest value of the Mallows’ C(p) statistic:

selection={method='forward',select='CP'}

When all effects are variables (that is, effects have one degree of freedom and no hierarchy), using ADJRSQ, AIC, AICC, BIC, CP, RSQUARE, or SBC as the selection criterion for forward selection produces the same sequence of additions. However, if the degrees of freedom contributed by different effects are not constant or if an out-of-sample prediction-based criterion is used, then different sequences of additions might be obtained.
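The reason the sequences coincide is that, when every candidate model at a step has the same number of parameters, each of these criteria is a monotone function of the error sum of squares, so they all rank the candidates identically. The following Python sketch illustrates this. The formulas are common textbook forms that are assumed here for illustration; the definitions that the actions use might differ (for example, by additive constants). The names criteria and best_effect and all the numbers are hypothetical.

```python
import math

def criteria(sse: float, p: int, n: int, sst: float) -> dict:
    """Textbook forms (assumed); p counts parameters including the intercept."""
    return {
        "RSQUARE": 1.0 - sse / sst,                          # larger is better
        "ADJRSQ": 1.0 - (sse / (n - p)) / (sst / (n - 1)),   # larger is better
        "AIC": n * math.log(sse / n) + 2 * p,                # smaller is better
        "SBC": n * math.log(sse / n) + p * math.log(n),      # smaller is better
    }

LARGER_IS_BETTER = {"RSQUARE", "ADJRSQ"}

def best_effect(candidates: dict, n: int, sst: float) -> dict:
    """candidates maps effect name -> (SSE after adding it, resulting p).
    Returns, for each criterion, the effect that the criterion would add."""
    scores = {name: criteria(sse, p, n, sst)
              for name, (sse, p) in candidates.items()}
    winners = {}
    for crit in ("RSQUARE", "ADJRSQ", "AIC", "SBC"):
        pick = max if crit in LARGER_IS_BETTER else min
        winners[crit] = pick(scores, key=lambda name: scores[name][crit])
    return winners

n, sst = 30, 100.0
# Every candidate adds one degree of freedom: all criteria agree.
same_df = best_effect({"x1": (40.0, 2), "x2": (35.0, 2), "x3": (50.0, 2)}, n, sst)
# A hypothetical 4-df classification effect "c" fits slightly better than x2.
mixed_df = best_effect({"x2": (35.0, 2), "c": (33.0, 5)}, n, sst)
print(same_df)
print(mixed_df)
```

In the first call every criterion picks x2. In the second call RSQUARE picks c, because it ignores model size, whereas ADJRSQ, AIC, and SBC still pick x2, because the extra degrees of freedom outweigh the small gain in fit.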

If you specify only the select subparameter, then this criterion is also used to decide when to stop the selection process. In the previous example, not only do effects enter based on the Mallows’ C(p) statistic, but the selection terminates when the C(p) statistic has a local minimum.

You use the choose subparameter to specify the criterion for selecting one model from the sequence of models produced. If you do not specify a choose subparameter, then the model at the final step is the selected model.

For example, if you specify the following CASL language statement, then forward selection terminates at the step where no effect can be added at the 0.2 significance level:

selection={method='forward',select='SL',choose='AIC',SLE=0.2}

However, the selected model is the first one that has the minimum value of Akaike’s information criterion. In some cases, this minimum value might occur at a step much earlier than the final step. In other cases, the AIC might start increasing only if more steps are performed—that is, a larger value is used for the significance level for entry. If you want to minimize AIC, then too many steps are performed in the former case and too few in the latter case. To address this issue, you can use the stop subparameter to specify a stopping criterion. When you specify a stopping criterion, forward selection continues until a local extremum of the stopping criterion in the sequence of models generated is reached. To be deemed a local extremum, a criterion value at a particular step must be better than its value at the next n steps, where n is known as the "stop horizon." By default, the stop horizon is three steps, but you can change this by specifying the stopHorizon subparameter.
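The stop-horizon rule can be illustrated on a sequence of criterion values, one value per step. The following Python sketch is a hypothetical helper, not the actions' implementation. It assumes that smaller values are better and that, near the end of the sequence, a step is compared only against the later steps that exist.

```python
from typing import Sequence

def stop_step(values: Sequence[float], horizon: int = 3) -> int:
    """Index of the first step whose value is smaller than the values at the
    next `horizon` steps (smaller is better). Near the end of the sequence,
    only the steps that exist are compared (an assumption of this sketch)."""
    if not values:
        raise ValueError("need at least one step")
    for i, v in enumerate(values):
        lookahead = values[i + 1 : i + 1 + horizon]
        if all(v < later for later in lookahead):
            return i
    raise AssertionError("unreachable: the last step always qualifies")

# Hypothetical AIC values for steps 0, 1, 2, ...
aic = [10.0, 8.0, 7.0, 7.5, 7.2, 6.0, 6.5]

print(stop_step(aic, horizon=1))  # 2
print(stop_step(aic, horizon=3))  # 5
```

With a horizon of one step, the search stops at step 2 because the value rises at step 3. With the default horizon of three steps, the search looks past that small bump and reaches the lower value at step 5. For a criterion where larger is better, such as the adjusted R-square statistic, negate the values before calling the helper.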

For example, if you specify the following CASL language statement, then forward selection terminates at the step where the effect to be added at the next step would produce a model that has an AIC statistic larger than the AIC statistic of the current model:

selection={method='forward',select='SBC',stop='AIC',stopHorizon=1}

In most cases, provided that the entry significance level is large enough that the local extremum of the named criterion occurs before the final step, specifying either of the following CASL language statements selects the same model, but more steps are done in the first case:

selection={method='forward',select='SL',choose='CRITERION'}
selection={method='forward',select='SL',stop='CRITERION'}

In some cases, there might be a better local extremum that cannot be reached if you specify the stop subparameter but can be found if you use the choose subparameter. Also, you can use the choose subparameter in preference to the stop subparameter if you want to examine how the named criterion behaves as you move beyond the step where the first local minimum of this criterion occurs.

You can specify both the choose and stop subparameters. You can also use these criteria together with subparameters that specify size-based limits on the selected model. You might want to consider models that are generated by forward selection and have at most some fixed number of effects, but select from within this set based on a criterion that you specify. For example, specifying the following CASL language statement requests that forward selection continue until there are 20 effects in the final model and chooses among the sequence of models the one that has the largest value of the adjusted R-square statistic:

selection={method='forward',stop='NONE',maxeffects=20,choose='ADJRSQ'}

You can also combine these options to select a model where one of two conditions is met. For example, the following CASL language statement selects whichever occurs first: a local minimum of the sum of squares on validation data or a local minimum of the corrected Akaike’s information criterion (AICC):

selection={method='forward',stop='AICC',choose='VALIDATE'}
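One way to picture how a stop criterion and a choose criterion interact is the following Python sketch. It is an interpretation under stated assumptions, not the actions' implementation: the function select_model is hypothetical, smaller values are taken to be better for both criteria, and the sequence of candidate models is assumed to end at the stop step (look-ahead steps are not candidates).

```python
from typing import List, Tuple

def select_model(stop_values: List[float],
                 choose_values: List[float],
                 horizon: int = 3) -> Tuple[int, int]:
    """Return (last_step, chosen_step) for one selection path."""
    # The path ends at the first local minimum of the stop criterion,
    # judged against the next `horizon` steps that exist.
    last = next(i for i, v in enumerate(stop_values)
                if all(v < w for w in stop_values[i + 1 : i + 1 + horizon]))
    # Among steps 0..last, choose the first step that minimizes the
    # choose criterion.
    window = choose_values[: last + 1]
    chosen = window.index(min(window))
    return last, chosen

# Hypothetical per-step values.
aicc = [50.0, 44.0, 41.0, 40.0, 40.5, 41.0, 42.0]
validate_a = [9.0, 7.0, 6.2, 6.4, 6.6, 6.1, 6.3]   # validation minimum comes first
validate_b = [9.0, 7.0, 6.2, 6.0, 5.9, 5.8, 5.7]   # still improving at the stop step

print(select_model(aicc, validate_a))  # (3, 2)
print(select_model(aicc, validate_b))  # (3, 3)
```

In the first case the validation error reaches a minimum at step 2, before the AICC minimum at step 3, so step 2 is chosen. In the second case the validation error is still decreasing when the AICC criterion stops the search, so the model at the stop step is chosen.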

You can find discussion and references to studies about criteria for variable selection in Burnham and Anderson (2002), along with some cautions and recommendations.

Examples of Forward Selection Specifications

The following CASL language statement does not specify the select subparameter, so the default criterion is used: it adds effects that at each step produce the lowest value of the SBC statistic and stops at the step where adding any effect would increase the SBC statistic:

selection={method='forward',stopHorizon=1}

The following CASL language statement adds effects based on significance level and stops when all candidate effects for entry at a step have a significance level greater than the default entry significance level of 0.05:

selection={method='forward',select='SL'}

The following CASL language statement adds effects based on significance level and stops at a step where adding any effect increases the error sum of squares computed on the validation data:

selection={method='forward',select='SL',stop='VALIDATE',stopHorizon=1}

The following CASL language statement adds effects that at each step produce the lowest value of the AIC statistic and stops at the first step whose AIC value is smaller than the AIC value at the next three steps:

selection={method='forward',select='AIC'}

The following CASL language statement adds effects that at each step produce the largest value of the adjusted R-square statistic and stops at the step where the significance level that corresponds to the addition of this effect is greater than 0.2:

selection={method='forward',select='ADJRSQ',stop='SL',sle=0.2}
Last updated: March 05, 2026