This section applies to actions in the regression action set.
When the method parameter value is ELASTICNET, the elastic net method proposed by Zou and Hastie (2005) is performed. The elastic net method, which bridges the LASSO method and ridge regression, strikes a balance between having a parsimonious model and borrowing strength from correlated regressors, by solving the least squares regression problem with constraints on both the sum of the absolute coefficients and the sum of the squared coefficients.
More specifically, the elastic net coefficients are the solution to the constrained optimization problem

\[
\min_{\beta}\; \lVert y - X\beta \rVert^{2}
\quad \text{subject to} \quad
\sum_{j=1}^{m} \lvert \beta_j \rvert \le t_1, \qquad
\sum_{j=1}^{m} \beta_j^{2} \le t_2
\]

This can be written in the equivalent Lagrangian form

\[
\min_{\beta}\; \lVert y - X\beta \rVert^{2}
+ \lambda_1 \sum_{j=1}^{m} \lvert \beta_j \rvert
+ \lambda_2 \sum_{j=1}^{m} \beta_j^{2}
\]
The elastic net can be treated as a convex combination of the LASSO and ridge penalties; pure LASSO and pure ridge regression are the two limiting cases. If $t_1$ is set to a very large value or, equivalently, if $\lambda_1$ is set to 0, then the elastic net method reduces to ridge regression. If $t_2$ is set to a very large value or, equivalently, if $\lambda_2$ is set to 0, then the elastic net method reduces to the LASSO method. If $t_1$ and $t_2$ are both large or, equivalently, if $\lambda_1$ and $\lambda_2$ are both set to 0, then the elastic net method reduces to ordinary least squares regression.
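As an illustration only (not part of the action), the following NumPy sketch evaluates the Lagrangian elastic net objective; setting $\lambda_1 = 0$ or $\lambda_2 = 0$ recovers the ridge and LASSO objectives, and setting both to 0 recovers ordinary least squares.

```python
import numpy as np

def elastic_net_objective(beta, X, y, lambda1, lambda2):
    """Lagrangian elastic net objective:
    ||y - X beta||^2 + lambda1 * sum|beta_j| + lambda2 * sum beta_j^2."""
    residual = y - X @ beta
    return (residual @ residual
            + lambda1 * np.sum(np.abs(beta))
            + lambda2 * np.sum(beta ** 2))

# Limiting cases:
#   lambda1 = 0               -> ridge regression objective
#   lambda2 = 0               -> LASSO objective
#   lambda1 = lambda2 = 0     -> ordinary least squares objective
```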
The elastic net method can overcome the limitations of LASSO in the following three scenarios:
If you have more parameters than observations ($m > n$), the LASSO method selects at most $n$ variables before it saturates, because of the nature of the convex optimization problem. This can be a defect for a variable selection method. By contrast, the elastic net method can select more than $n$ variables in this case because of the ridge regression regularization (see the sketch after this list).
If there is a group of variables that have high pairwise correlations, then whereas LASSO tends to select only one variable from that group, the elastic net method can select more than one variable.
If you have more observations than parameters ($n > m$), and there are high correlations between predictors, then it has been empirically observed that the prediction performance of LASSO is dominated by ridge regression. In this case, the elastic net method can achieve better prediction performance by using ridge regression regularization.
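The following small experiment uses scikit-learn rather than the glm action to illustrate the first two scenarios; the simulated data, the penalty value, and the mixing value are arbitrary choices made for the illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n, m = 30, 100                       # more parameters (m) than observations (n)
z = rng.normal(size=(n, 1))
# Columns 0-4 form a highly correlated group; the rest are independent noise.
X = np.hstack([z + 0.01 * rng.normal(size=(n, 5)),
               rng.normal(size=(n, m - 5))])
y = z[:, 0] + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# LASSO can keep at most n predictors and tends to pick a single member of the
# correlated group; the elastic net tends to spread weight over several members.
print("nonzero coefficients (LASSO):      ", np.count_nonzero(lasso.coef_))
print("nonzero coefficients (elastic net):", np.count_nonzero(enet.coef_))
print("nonzero in correlated group (LASSO):      ", np.count_nonzero(lasso.coef_[:5]))
print("nonzero in correlated group (elastic net):", np.count_nonzero(enet.coef_[:5]))
```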
The Lagrangian form of the elastic net optimization problem can be reformulated as

\[
\min_{\beta^{*}}\; \lVert y^{*} - X^{*}\beta^{*} \rVert^{2}
+ \frac{\lambda_1}{\sqrt{1+\lambda_2}} \sum_{j=1}^{m} \lvert \beta_j^{*} \rvert
\]

where the augmented design matrix $X^{*}$ and response $y^{*}$ are defined by

\[
X^{*} = \frac{1}{\sqrt{1+\lambda_2}}
\begin{pmatrix} X \\ \sqrt{\lambda_2}\, I_m \end{pmatrix},
\qquad
y^{*} = \begin{pmatrix} y \\ 0 \end{pmatrix}
\]

and $\beta^{*} = \sqrt{1+\lambda_2}\,\beta$.
This implies that for a given $\lambda_2$, the coefficients of the elastic net fit follow the same piecewise linear solution path as the LASSO and can be computed by using the least angle regression (LARS) algorithm. Moreover, Zou and Hastie (2005) suggest rescaling the coefficients by a factor of $1+\lambda_2$ to deal with the double amount of shrinkage in the elastic net fit; this rescaling is applied when you specify the enScale subparameter.
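To illustrate the reformulation, the following sketch (which is outside the action; the coefficient scaling follows Lemma 1 of Zou and Hastie 2005, and the mapping to scikit-learn's Lasso objective is spelled out in the comments as an assumption) builds the augmented data for a given $(\lambda_1, \lambda_2)$, solves the resulting LASSO problem, and applies the $1+\lambda_2$ rescaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

def elastic_net_via_augmented_lasso(X, y, lambda1, lambda2):
    """Solve the elastic net by LASSO on the augmented data (X*, y*)
    of Zou and Hastie (2005), then rescale to undo the double shrinkage."""
    n, m = X.shape
    scale = 1.0 / np.sqrt(1.0 + lambda2)
    X_aug = scale * np.vstack([X, np.sqrt(lambda2) * np.eye(m)])   # (n+m) x m
    y_aug = np.concatenate([y, np.zeros(m)])

    # Augmented problem:  min ||y* - X* b||^2 + (lambda1 / sqrt(1+lambda2)) ||b||_1.
    # scikit-learn's Lasso minimizes (1/(2*N)) ||y - X b||^2 + alpha ||b||_1,
    # so alpha = lambda1 / (2 * (n + m) * sqrt(1 + lambda2)) matches that objective.
    alpha = lambda1 / (2.0 * (n + m) * np.sqrt(1.0 + lambda2))
    b_star = Lasso(alpha=alpha, fit_intercept=False).fit(X_aug, y_aug).coef_

    # b_star is expressed in the rescaled coordinates beta* = sqrt(1+lambda2)*beta,
    # so the naive elastic net fit is b_star / sqrt(1+lambda2), and the rescaled
    # (enScale-style) fit is (1+lambda2) times that, i.e. sqrt(1+lambda2) * b_star.
    return np.sqrt(1.0 + lambda2) * b_star
```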
If you have a good estimate of $\lambda_2$, you can specify its value in the L2 subparameter. If you do not specify a value for $\lambda_2$, then by default the glm action searches for a value between 0 and 1 that is optimal according to the current value of the choose criterion (by default, the choose subparameter value is ‘SBC’).
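A rough sketch of this kind of search follows; the grid of candidate L2 values, the fitting routine (passed in as fit_fn), and the SBC formula $n \log(\mathrm{SSE}/n) + k \log n$ are illustrative assumptions rather than the action's exact procedure.

```python
import numpy as np

def sbc(y, y_hat, num_params):
    """Schwarz Bayesian criterion: n * log(SSE / n) + k * log(n)."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    return n * np.log(sse / n) + num_params * np.log(n)

def choose_l2_by_sbc(candidate_l2, fit_fn, X, y):
    """Return the candidate L2 value whose fit minimizes SBC.
    fit_fn(X, y, l2) is a placeholder that returns a coefficient vector,
    for example an elastic net fit with the given ridge parameter."""
    best_l2, best_value = None, np.inf
    for l2 in candidate_l2:
        beta = fit_fn(X, y, l2)
        value = sbc(y, X @ beta, num_params=np.count_nonzero(beta))
        if value < best_value:
            best_l2, best_value = l2, value
    return best_l2
```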
Computing the entire solution path can be prohibitive when the number of parameters $m$ is large. Instead of using the LARS algorithm, you can use other optimization techniques to solve the LASSO and elastic net problems for a reduced set of regularization parameters. You can reformulate the general elastic net objective into the following optimization problem:

\[
\min_{\beta}\; \frac{1}{2}\lVert y - X\beta \rVert^{2}
+ \lambda \left( \rho \sum_{j=1}^{m} \lvert \beta_j \rvert
+ \frac{1-\rho}{2} \sum_{j=1}^{m} \beta_j^{2} \right)
\]

where $\lambda \ge 0$ is the regularization parameter and $\rho \in [0,1]$ is the mixing parameter that controls the balance between the LASSO penalty and the ridge penalty. If $\rho = 1$, the problem reduces to LASSO regression; if $\rho = 0$, the problem reduces to ridge regression. Because the LASSO penalty is nonsmooth, the optimization problem is not readily solved by traditional techniques that use gradient information. To solve the problem, the action takes the following three general approaches:
Uses the orthant-wise limited-memory quasi-Newton (OWL-QN) method (Andrew and Gao 2007), which is based on L-BFGS and can efficiently optimize $\ell_1$-regularized objective functions. This solver efficiently handles wide data, where $m \gg n$. You can use it by specifying the solver subparameter value ‘LBFGS’.
Uses the alternating direction method of multipliers (ADMM), which decomposes the objective into smooth and nonsmooth parts and solves them efficiently by using augmented Lagrangians (Boyd et al. 2011). This solver efficiently handles tall data, where $n \gg m$. You can use it by specifying the solver subparameter value ‘ADMM’. This method is available only for the glm action.
Turns the nonsmooth LASSO penalty into a smooth penalty by using the reformulation $\beta_j = \beta_j^{+} - \beta_j^{-}$, and thus $\lvert \beta_j \rvert = \beta_j^{+} + \beta_j^{-}$, where $\beta_j^{+} \ge 0$ and $\beta_j^{-} \ge 0$. This reformulation converts the optimization to a smooth but constrained nonlinear problem. You can use two solvers for this formulation by specifying the solver subparameter values ‘BFGS’ and ‘NLP’. For low-dimensional problems, both solvers can provide accurate solutions. A sketch of this splitting appears after this list.
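The following sketch illustrates the splitting approach by solving the resulting bound-constrained smooth problem with SciPy's L-BFGS-B solver; the solver choice and the objective's normalization (the form written above) are assumptions made for the illustration, not the action's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def elastic_net_split(X, y, lam, rho):
    """Solve min 0.5*||y - X b||^2 + lam*(rho*||b||_1 + 0.5*(1-rho)*||b||^2)
    by writing b = b_plus - b_minus with b_plus, b_minus >= 0, which makes the
    objective smooth and turns the problem into a bound-constrained one."""
    n, m = X.shape

    def objective(z):
        b_plus, b_minus = z[:m], z[m:]
        b = b_plus - b_minus
        resid = y - X @ b
        l1 = np.sum(b_plus + b_minus)          # equals sum |b_j| at any optimum
        return (0.5 * resid @ resid
                + lam * (rho * l1 + 0.5 * (1.0 - rho) * np.sum(b ** 2)))

    def gradient(z):
        b_plus, b_minus = z[:m], z[m:]
        b = b_plus - b_minus
        g_smooth = -X.T @ (y - X @ b) + lam * (1.0 - rho) * b
        return np.concatenate([g_smooth + lam * rho, -g_smooth + lam * rho])

    z0 = np.zeros(2 * m)
    bounds = [(0.0, None)] * (2 * m)
    result = minimize(objective, z0, jac=gradient, bounds=bounds, method="L-BFGS-B")
    b_plus, b_minus = result.x[:m], result.x[m:]
    return b_plus - b_minus
```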
If you use one of these solvers to optimize the elastic net objective without supplying a list of regularization parameters or specifying the number of regularization parameters, the glm action uses a single heuristic value of $\lambda$ for regularization. This heuristic value is derived from $\lambda_{\max}$, the lower bound of the $\lambda$ values for which the regularization is strong enough that $\hat{\beta} = 0$. Note that for ridge regression, when $\rho = 0$, $\lambda_{\max}$ does not exist because the ridge penalty is not sparsity-inducing. In this case, elastic net selection computes $\lambda_{\max}$ by assuming a small positive value of $\rho$ if you do not supply a list of regularization parameters.
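As a point of reference, for the objective written above, $\beta = 0$ is optimal exactly when $\max_j \lvert x_j^{\mathsf{T}} y \rvert \le \lambda \rho$, so $\lambda_{\max} = \max_j \lvert x_j^{\mathsf{T}} y \rvert / \rho$. The sketch below computes this quantity; the action's internal normalization is not shown in this excerpt, so the formula is an assumption tied to the form written above.

```python
import numpy as np

def lambda_max(X, y, rho):
    """Smallest lambda for which beta = 0 solves
    min 0.5*||y - X b||^2 + lam*(rho*||b||_1 + 0.5*(1-rho)*||b||^2).
    At b = 0 the ridge term contributes no gradient, so b = 0 is optimal
    exactly when max_j |x_j' y| <= lam * rho."""
    if rho <= 0.0:
        raise ValueError("lambda_max is undefined for a pure ridge penalty (rho = 0)")
    return np.max(np.abs(X.T @ y)) / rho
```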
If you supply a list of regularization parameters by using the lambda subparameter, the action sorts the values in descending order and then selects candidate models along that sequence. If you supply only the number of regularization parameters by using the nLambda subparameter, the action constructs a decreasing series of $\lambda$ values between $\lambda_{\max}$ and a small fraction of $\lambda_{\max}$.
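One common construction, shown here purely as an assumption about the general approach (the exact spacing and lower endpoint that the action uses are not given in this excerpt), spaces the values evenly on the log scale between $\lambda_{\max}$ and a small fraction of it:

```python
import numpy as np

def lambda_sequence(lambda_max_value, n_lambda, min_ratio=1e-3):
    """Decreasing sequence of n_lambda values, evenly spaced on the log scale,
    from lambda_max_value down to min_ratio * lambda_max_value."""
    return np.geomspace(lambda_max_value, min_ratio * lambda_max_value, num=n_lambda)
```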
You can specify $\rho$ by using the rho subparameter. Otherwise, the action uses a default value for the mixing parameter.
The idea of penalization by using both the LASSO and ridge penalties also extends to generalized linear models, including logistic regression models. For those models, the general elastic net objective function is formulated as

\[
\min_{\beta}\; -L(\beta)
+ \lambda \left( \rho \sum_{j=1}^{m} \lvert \beta_j \rvert
+ \frac{1-\rho}{2} \sum_{j=1}^{m} \beta_j^{2} \right)
\]

where $L(\beta)$ is the log-likelihood function. If you do not supply a list of regularization parameters or do not specify the number of regularization parameters, the logistic action uses the default nLambda parameter value of 20 in order to choose a good candidate in a reasonable amount of time.
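As an illustration of the same idea for logistic regression, the following sketch evaluates the penalized objective; the binary log-likelihood and the penalty normalization below are assumptions consistent with the linear-regression form above, not the action's exact internal formulation.

```python
import numpy as np

def logistic_elastic_net_objective(beta, X, y, lam, rho):
    """Elastic net objective for binary logistic regression:
    -L(beta) + lam*(rho*||beta||_1 + 0.5*(1-rho)*||beta||^2),
    where L is the Bernoulli log-likelihood with a logit link."""
    eta = X @ beta                                   # linear predictor
    # log-likelihood: sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    penalty = lam * (rho * np.sum(np.abs(beta))
                     + 0.5 * (1.0 - rho) * np.sum(beta ** 2))
    return -log_lik + penalty
```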