This section applies to actions in the following action sets: phreg and regression.
The group LASSO method, proposed by Yuan and Lin (2006), is a variant of LASSO that is specifically designed for models defined in terms of effects that have multiple degrees of freedom, such as the main effects of classification variables and interactions between classification variables. If all effects in the model are continuous, then the group LASSO method is the same as the LASSO method.
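Recall that LASSO selection depends on solving a constrained optimization problem of the form
\[ \min_{\boldsymbol{\beta}} \; -L(\boldsymbol{\beta}) \quad \text{subject to} \quad \sum_{j=1}^{m} |\beta_j| \le t \]
where $L$ is the log-likelihood function. In this formulation, individual parameters can be included or excluded from the model independently, subject only to the overall constraint. In contrast, the group LASSO method uses a constraint that forces all parameters that correspond to the same effect to be included or excluded simultaneously. For a model that has $k$ effects, let $\boldsymbol{\beta}_{G_j}$ be the group of linear coefficients that correspond to effect $j$ in the model. Then group LASSO depends on solving a constrained optimization problem of the form
\[ \min_{\boldsymbol{\beta}} \; -L(\boldsymbol{\beta}) \quad \text{subject to} \quad \sum_{j=1}^{k} \sqrt{|G_j|} \, \|\boldsymbol{\beta}_{G_j}\| \le t \]
where $|G_j|$ is the number of parameters that correspond to effect $j$, and $\|\boldsymbol{\beta}_{G_j}\|$ denotes the Euclidean norm of the parameters $\boldsymbol{\beta}_{G_j}$,
\[ \|\boldsymbol{\beta}_{G_j}\| = \sqrt{\sum_{i=1}^{|G_j|} \beta_{G_j,i}^{2}} \]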
That is, instead of constraining the sum of the absolute values of individual parameters, group LASSO constrains a weighted sum of the Euclidean norms of groups of parameters, where groups are defined by effects.
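The difference between the two constraints can be illustrated with a small numerical sketch. The function names, the `groups` layout (one list of parameter positions per effect), and the coefficient values below are hypothetical and chosen only for illustration; the weight on each group is assumed to be the square root of the number of parameters in that group.

```python
import math

def lasso_penalty(beta):
    """Sum of the absolute values of the individual parameters."""
    return sum(abs(b) for b in beta)

def group_lasso_penalty(beta, groups):
    """Sum over effects of sqrt(group size) times the Euclidean norm of the group.

    `groups` is a hypothetical layout: one list of parameter positions per effect.
    """
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        total += math.sqrt(len(idx)) * norm
    return total

# Hypothetical model: one continuous effect and one classification effect
# that contributes three parameters.
beta = [2.0, 3.0, 0.0, 4.0]
groups = [[0], [1, 2, 3]]

l1 = lasso_penalty(beta)                # 2 + 3 + 0 + 4
gl = group_lasso_penalty(beta, groups)  # 1*2 + sqrt(3)*5

# When every effect has a single parameter, the two penalties coincide.
singletons = [[i] for i in range(len(beta))]
same = group_lasso_penalty(beta, singletons)
```

In this sketch the single-parameter grouping reproduces the LASSO penalty, which matches the statement that group LASSO reduces to LASSO when all effects are continuous.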
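You can write the group LASSO method in the equivalent Lagrangian form, which is an example of a penalized log-likelihood function:
\[ \min_{\boldsymbol{\beta}} \; -L(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{k} \sqrt{|G_j|} \, \|\boldsymbol{\beta}_{G_j}\| \]
The weight $\sqrt{|G_j|}$, the square root of the number of parameters in effect $j$, was suggested by Yuan and Lin (2006) in order to take the size of the group into consideration in group LASSO.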
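One way to see why this penalty includes or excludes all parameters of an effect together is the block soft-thresholding operator, which is the proximal operator of the weighted group-norm penalty. The following is a minimal sketch and not a description of the action's internal solver; the function name, the `groups` layout, and the numbers are assumptions made for illustration.

```python
import math

def group_soft_threshold(beta, groups, lam):
    """Shrink each group toward zero; a group whose Euclidean norm does not
    exceed lam * sqrt(group size) is set to zero as a whole."""
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        thresh = lam * math.sqrt(len(idx))
        scale = 0.0 if norm <= thresh else 1.0 - thresh / norm
        for i in idx:
            out[i] = scale * beta[i]
    return out

# Hypothetical coefficients: a strong single-parameter effect and a weak
# three-parameter effect.
beta = [2.0, 0.3, -0.4, 0.0]
groups = [[0], [1, 2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=0.5)
# The first effect is shrunk but kept; the second effect is removed entirely.
```

Every parameter in the weak group becomes zero at once, whereas an elementwise LASSO threshold could zero some of them and keep others.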
Unlike LASSO for linear models, group LASSO does not allow a piecewise linear solution path such as the one generated by a LAR algorithm. Instead, the method proposed by Nesterov (2013) is adopted to solve the Lagrangian form of the group LASSO problem that corresponds to a prespecified regularization parameter $\lambda$. Nesterov’s method is known to have an optimal convergence rate for first-order black-box optimization. Because the optimal $\lambda$ is usually unknown, a series of regularization parameters $\rho, \rho^2, \rho^3, \ldots$ is used, where $\rho$ is a positive value less than 1. You can specify $\rho$ by using the lassoRho subparameter in the action; if you omit it, the action's default value is used. In the $i$th step of group LASSO selection, the value that is used for $\lambda$ is $\rho^i$.
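The sequence of regularization parameters can be sketched as follows, assuming it is geometric so that the value at step i is the base raised to the power i. The function name is hypothetical, and the base 0.5 is chosen only to keep the arithmetic exact; it is not claimed to be the action's default.

```python
def lambda_sequence(rho, n_steps):
    """Regularization values for steps 1..n_steps, assuming lambda_i = rho**i."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be a positive value less than 1")
    return [rho ** i for i in range(1, n_steps + 1)]

# Illustrative base only; not the action's default.
lams = lambda_sequence(0.5, 4)
# lams == [0.5, 0.25, 0.125, 0.0625]
```

Because the base is less than 1, the penalty weakens at every step, so later steps generally admit more effects into the model.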
A unique feature of the group LASSO method is that it does not necessarily add or remove precisely one effect at each step of the process. This is different from the forward, stepwise, and backward selection methods.
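This behavior can be illustrated with a simplified special case. The sketch below assumes a least squares objective with an orthonormal design, for which the group LASSO solution has a closed form: an effect is in the model exactly when the Euclidean norm of its unpenalized estimates exceeds the regularization value times the square root of the group size. The estimates `z`, the grouping, and the base 0.5 are hypothetical, and the action itself does not rely on this shortcut.

```python
import math

def active_effects(z, groups, lam):
    """Indices of effects that remain in the model at regularization lam,
    under the orthonormal-design assumption stated above."""
    active = []
    for j, idx in enumerate(groups):
        norm = math.sqrt(sum(z[i] ** 2 for i in idx))
        if norm > lam * math.sqrt(len(idx)):
            active.append(j)
    return active

# Hypothetical unpenalized estimates for three effects.
z = [0.4, 0.4, 0.4, 0.1, 0.1]
groups = [[0], [1, 2], [3, 4]]
rho = 0.5  # illustrative base for the regularization sequence

path = [active_effects(z, groups, rho ** i) for i in range(1, 5)]
# path == [[], [0, 1], [0, 1], [0, 1, 2]]
```

In this example two effects enter at step 2, none enters at step 3, and one enters at step 4, so the number of effects does not change by exactly one per step.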
As with the other selection methods, you can specify a criterion to choose among the models at each step of the group LASSO algorithm by using the choose subparameter. You can also specify a stopping criterion by using the stop subparameter. If you do not specify either the choose or stop subparameter, the model at the last LASSO step is chosen as the selected model and parameter estimates are reported for this model. These parameter estimates are used to compute predicted values for the output data tables.
For more information, see the discussion in the section selection Parameter.
The model degrees of freedom at any step of the LASSO are simply the number of nonzero regression coefficients in the model at that step. Efron et al. (2004) cite empirical evidence for doing this but do not give any mathematical justification for this choice.
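As a minimal sketch of this counting rule, the following hypothetical helper counts the nonzero coefficients in a coefficient vector. The `tol` argument is an assumption added to guard against values that are zero only up to rounding; the source does not mention a tolerance.

```python
def model_degrees_of_freedom(beta, tol=0.0):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for b in beta if abs(b) > tol)

# Hypothetical coefficients at one step of the selection.
beta = [1.5, 0.0, 0.0, -0.7, 0.2]
df = model_degrees_of_freedom(beta)
# df == 3
```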
Some distributions involve a dispersion parameter (the parameter $\phi$ in the expressions for the log likelihood). These parameters are not estimated by the LASSO optimization algorithm and are set to either the default value or a value that you specify. You can use the phi subparameter in the action to set the dispersion to a fixed value.