Shared Concepts

Group LASSO Selection

This section applies to actions in the following action sets: phreg and regression.

The group LASSO method, proposed by Yuan and Lin (2006), is a variant of LASSO that is specifically designed for models defined in terms of effects that have multiple degrees of freedom, such as the main effects of classification variables and interactions between classification variables. If all effects in the model are continuous, then the group LASSO method is the same as the LASSO method.

Recall that LASSO selection depends on solving a constrained optimization problem of the form

$$\min \{-L(\boldsymbol{\mu}; \mathbf{y})\} \quad \text{subject to} \quad \sum_{j=1}^{m} |\beta_j| \le t$$

where $L$ is the log-likelihood function. In this formulation, individual parameters can be included in or excluded from the model independently, subject only to the overall constraint. In contrast, the group LASSO method uses a constraint that forces all parameters that correspond to the same effect to be included or excluded simultaneously. For a model that has $k$ effects, let $\beta_{G_j}$ be the group of linear coefficients that correspond to effect $j$ in the model. Then group LASSO depends on solving a constrained optimization problem of the form

$$\min \{-L(\boldsymbol{\mu}; \mathbf{y})\} \quad \text{subject to} \quad \sum_{j=1}^{k} \sqrt{|G_j|}\, \|\beta_{G_j}\| \le t$$

where $|G_j|$ is the number of parameters that correspond to effect $j$, and $\|\beta_{G_j}\|$ denotes the Euclidean norm of the parameters $\beta_{G_j}$,

$$\|\beta_{G_j}\| = \sqrt{\sum_{i=1}^{|G_j|} \beta_i^2}$$

That is, instead of constraining the sum of the absolute values of the individual parameters, group LASSO constrains the Euclidean norms of groups of parameters, where the groups are defined by effects.

You can write the group LASSO method in the equivalent Lagrangian form, which is an example of a penalized log-likelihood function:

$$\min \left\{ -L(\boldsymbol{\mu}; \mathbf{y}) + \lambda \sum_{j=1}^{k} \sqrt{|G_j|}\, \|\beta_{G_j}\| \right\}$$

The weight $\sqrt{|G_j|}$ was suggested by Yuan and Lin (2006) in order to take the size of each group into consideration in group LASSO.
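As a small numerical illustration of the penalty term, the following Python sketch computes the LASSO quantity (the sum of absolute values) and the group LASSO quantity (the sum of size-weighted Euclidean norms) for the same coefficient vector. The coefficient values, the group layout, and the function names are hypothetical and chosen only for illustration; they are not part of any action.

```python
import math

def lasso_penalty(beta):
    """Sum of the absolute values of the individual parameters."""
    return sum(abs(b) for b in beta)

def group_norm(beta, group):
    """Euclidean norm of the parameters whose indices are in `group`."""
    return math.sqrt(sum(beta[i] ** 2 for i in group))

def group_lasso_penalty(beta, groups):
    """Sum over effects of sqrt(|G_j|) * ||beta_{G_j}||."""
    return sum(math.sqrt(len(g)) * group_norm(beta, g) for g in groups)

# Hypothetical model: one continuous effect (index 0) and one
# classification effect that has two parameters (indices 1 and 2).
beta = [2.0, 3.0, 4.0]
groups = [[0], [1, 2]]

print(lasso_penalty(beta))                # 2 + 3 + 4
print(group_lasso_penalty(beta, groups))  # 1*2 + sqrt(2)*5

# When every effect has a single parameter, the two penalties coincide.
singletons = [[0], [1], [2]]
print(group_lasso_penalty(beta, singletons) == lasso_penalty(beta))
```

The last comparison reflects the statement that group LASSO is the same as LASSO when all effects in the model are continuous: each group then has size 1, so its weight is 1 and its Euclidean norm is the absolute value of its single parameter.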

Unlike LASSO for linear models, group LASSO does not have a piecewise linear solution path of the kind that a LAR algorithm generates. Instead, the method proposed by Nesterov (2013) is adopted to solve the Lagrangian form of the group LASSO problem that corresponds to a prespecified regularization parameter $\lambda$. Nesterov's method is known to have an optimal convergence rate for first-order black-box optimization. Because the optimal $\lambda$ is usually unknown, a series of regularization parameters $\rho, \rho^2, \rho^3, \ldots$ is used, where $\rho$ is a positive value less than 1. You can specify $\rho$ by using the lassoRho subparameter in the action; otherwise, the default value is used. In the $i$th step of group LASSO selection, the value that is used for $\lambda$ is $\rho^i$.
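The following Python sketch illustrates two ingredients of such a scheme. It assumes that the regularization parameter at step i is the ith power of a base value between 0 and 1, and it uses group soft-thresholding, the standard proximal operator of the size-weighted group penalty, as an example of the building block that first-order proximal methods apply at each iteration. This is not the action's internal implementation; the function names, the base value 0.5, and the coefficient values are hypothetical.

```python
import math

def lambda_sequence(rho, n_steps):
    """Regularization parameters rho, rho^2, ..., rho^n_steps (assumed schedule)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be a positive value less than 1")
    return [rho ** i for i in range(1, n_steps + 1)]

def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_j sqrt(|G_j|) * ||beta_{G_j}||.

    Each group is shrunk toward zero as a unit. A group whose norm does not
    exceed threshold * sqrt(|G_j|) is set to zero in its entirety.
    """
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in g))
        cut = threshold * math.sqrt(len(g))
        scale = 0.0 if norm <= cut else 1.0 - cut / norm
        for i in g:
            out[i] = scale * beta[i]
    return out

# Illustrative base value only; it is not claimed to be the action's default.
lambdas = lambda_sequence(0.5, 3)
print(lambdas)

# Hypothetical coefficients: effect 0 has one parameter, effect 1 has two.
beta = [2.0, 3.0, 4.0]
groups = [[0], [1, 2]]
for threshold in (4.0, 2.5, 1.0):
    print(threshold, group_soft_threshold(beta, groups, threshold))
```

Because the threshold acts on the norm of a whole group, all parameters of an effect become zero or nonzero together, which is the behavior that the group constraint is designed to produce.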

A unique feature of the group LASSO method is that it does not necessarily add or remove precisely one effect at each step of the process. This is different from the forward, stepwise, and backward selection methods.

As with the other selection methods, you can specify a criterion to choose among the models at each step of the group LASSO algorithm by using the choose subparameter. You can also specify a stopping criterion by using the stop subparameter. If you do not specify either the choose or stop subparameter, the model at the last LASSO step is chosen as the selected model and parameter estimates are reported for this model. These parameter estimates are used to compute predicted values for the output data tables.

For more information, see the discussion in the section selection Parameter.

The model degrees of freedom at any step of the LASSO are simply the number of nonzero regression coefficients in the model at that step. Efron et al. (2004) cite empirical evidence for doing this but do not give any mathematical justification for this choice.
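As a minimal sketch of this counting rule, the following Python function returns the number of nonzero coefficients in a coefficient vector. The function name, the optional tolerance argument, and the example values are hypothetical; the tolerance is an added assumption for coefficients that are zero only up to numerical error.

```python
def model_degrees_of_freedom(beta, tol=0.0):
    """Count the coefficients whose absolute value exceeds `tol`."""
    return sum(1 for b in beta if abs(b) > tol)

# Hypothetical coefficients at one step of the selection process.
beta_at_step = [0.0, 1.2, -0.4, 0.0]
print(model_degrees_of_freedom(beta_at_step))
```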

Some distributions involve a dispersion parameter (the parameter $\phi$ in the expressions for the log likelihood). These parameters are not estimated by the LASSO optimization algorithm and are set to either the default value or a value that you specify. You can use the phi subparameter in the action to set the dispersion to a fixed value.

Last updated: March 05, 2026