Solve an Elastic-Net problem | R Documentation
Computes the entire Elastic-Net solution path for the regression coefficients, over all values of the penalization parameter, via the Coordinate Descent (CD) algorithm (Friedman et al., 2007). It takes as inputs a variance matrix among predictors and a covariance vector between the response and predictors.
solveEN(Sigma, Gamma, alpha = 1, lambda = NULL,
nlambda = 100, lambda.min = .Machine$double.eps^0.5,
lambda.max = NULL, common.lambda = TRUE, beta0 = NULL,
nsup.max = NULL, scale = TRUE, sdx = NULL, tol = 1E-5,
maxiter = 1000, mc.cores = 1L, save.at = NULL,
precision.format = c("double","single"),
fileID = NULL, verbose = FALSE)
Sigma
(numeric matrix) Variance-covariance matrix of predictors.
Gamma
(numeric matrix) Covariance between the response variable and predictors. If it contains more than one column, the algorithm is applied to each column separately, as different response variables.
alpha
(numeric) Value between 0 and 1 giving the weights of the L1- and L2-penalties: alpha = 1 yields the LASSO penalty and alpha = 0 the Ridge penalty. Default is 1.
lambda
(numeric vector) Penalization parameter sequence. Default is NULL, in which case a decreasing sequence is generated internally, from a maximum of max(abs(Gamma)/alpha) down to a minimum near zero.
nlambda
(integer) Number of lambdas generated when lambda = NULL. Default is 100.
lambda.min, lambda.max
(numeric) Minimum and maximum values of lambda that are generated when lambda = NULL. Defaults are .Machine$double.eps^0.5 and NULL (computed internally), respectively.
common.lambda
(logical) Whether the same lambda sequence is used for all response variables (columns of Gamma). Default is TRUE.
beta0
(numeric vector) Initial values of the regression coefficients that the CD algorithm will update. Default is NULL.
nsup.max
(integer) Maximum number of non-zero coefficients allowed in the last solution. Default is NULL (no limit imposed).
scale
(logical) Whether to scale Sigma to a correlation matrix (so predictors have unit variance), scaling Gamma accordingly, before running the algorithm. Default is TRUE.
sdx
(numeric vector) Scaling factors (predictor standard deviations) used to scale the regression coefficients back to the original scale. When scale = TRUE, they are obtained from the diagonal of Sigma. Default is NULL.
tol
(numeric) Maximum error between two consecutive solutions of the CD algorithm to declare convergence. Default is 1E-5.
maxiter
(integer) Maximum number of iterations to run the CD algorithm at each lambda step before convergence is declared. Default is 1000.
mc.cores
(integer) Number of cores used. When mc.cores > 1, the analyses of multiple response variables (columns of Gamma) are run in parallel. Default is 1.
save.at
(character) Path where regression coefficients are to be saved (this may include a prefix to be added to the file names). Default is NULL, in which case coefficients are not saved to files.
precision.format
(character) Either 'single' or 'double' for the numeric precision, and hence memory occupancy (4 or 8 bytes per coefficient, respectively), of the saved regression coefficients. This is only used when save.at is not NULL. Default is 'double'.
fileID
(character) Suffix added to the file names where regression coefficients are saved. Default is NULL.
verbose
(logical) Whether to print progress to the console. Default is FALSE.
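As a point of reference, the default lambda sequence described above can be sketched as follows. This is an illustration, not the package's internal code; the function name lambda_grid and the log-scale spacing are assumptions:

```r
# Sketch of a decreasing lambda grid: nlambda values from lambda.max
# (default max(abs(Gamma))/alpha) down to lambda.min, equally spaced
# on the log scale. Illustrative only; solveEN's internals may differ.
lambda_grid <- function(Gamma, alpha = 1, nlambda = 100,
                        lambda.min = .Machine$double.eps^0.5,
                        lambda.max = NULL) {
  if (is.null(lambda.max)) {
    # Guard against division by zero when alpha = 0
    lambda.max <- max(abs(Gamma)) / max(alpha, 1E-8)
  }
  exp(seq(log(lambda.max), log(lambda.min), length.out = nlambda))
}

g <- lambda_grid(c(0.5, -1.2, 0.3), alpha = 0.5, nlambda = 5)
```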
Finds solutions for the regression coefficients in a linear model

y_i = x'_i β + e_i

where y_i is the response for the i-th observation, x_i = (x_i1, ..., x_ip)' is a vector of p predictors assumed to have unit variance, β = (β_1, ..., β_p)' is a vector of regression coefficients, and e_i is a residual.
The regression coefficients β are estimated as a function of the variance matrix among predictors (Σ) and the covariance vector between response and predictors (Γ) by minimizing the penalized mean squared error function

-Γ'β + (1/2) β'Σβ + λ J(β)

where λ is the penalization parameter and J(β) is a penalty function given by

(1/2)(1 - α) ||β||₂² + α ||β||₁

where 0 ≤ α ≤ 1, and ||β||₁ = ∑_j |β_j| and ||β||₂² = ∑_j β_j² are the L1- and (squared) L2-norms, respectively.
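The objective above can be evaluated directly in R, which is useful for checking a candidate solution. This is a sketch; the function name en_objective is illustrative and not part of the package:

```r
# Penalized mean squared error criterion as given above:
#   f(beta) = -Gamma'beta + (1/2) beta' Sigma beta + lambda * J(beta)
# with J(beta) = (1/2)(1 - alpha) * ||beta||_2^2 + alpha * ||beta||_1
en_objective <- function(beta, Sigma, Gamma, lambda, alpha = 1) {
  J <- 0.5 * (1 - alpha) * sum(beta^2) + alpha * sum(abs(beta))
  -sum(Gamma * beta) + 0.5 * drop(crossprod(beta, Sigma %*% beta)) + lambda * J
}
```

For example, with Sigma = diag(2), Gamma = c(1, 0.5), and β = (1, 0)', the unpenalized (λ = 0) criterion equals -1 + 1/2 = -0.5.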
The "partial residual" excluding the contribution of the predictor x_ij is

e_i^(j) = y_i - x'_i β + x_ij β_j

and the ordinary least-squares (OLS) coefficient of x_ij on this residual is (up to a constant)

β_j^(ols) = Γ_j - Σ'_j β + β_j

where Γ_j is the j-th element of Γ and Σ_j is the j-th column of the matrix Σ.
Given α and λ, coefficients are updated for each j = 1, ..., p from their current value β_j to a new value β_j(α, λ) by "soft-thresholding" their OLS estimate, iterating until convergence, as fully described in Friedman et al. (2007).
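The update rule can be sketched in plain R as one full pass of coordinate descent. This is an illustration of the technique under the unit-variance assumption above (Σ_jj = 1), not the package's compiled implementation; the function names soft and cd_pass are assumptions:

```r
# Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0)
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# One full pass of coordinate descent over j = 1, ..., p, assuming
# predictors with unit variance (diag(Sigma) = 1). Illustrative sketch.
cd_pass <- function(beta, Sigma, Gamma, lambda, alpha) {
  for (j in seq_along(beta)) {
    # OLS coefficient on the partial residual: Gamma_j - Sigma_j' beta + beta_j
    bj_ols <- Gamma[j] - sum(Sigma[, j] * beta) + beta[j]
    # Elastic-net update: soft-threshold by the L1 part of the penalty,
    # then shrink by the L2 part
    beta[j] <- soft(bj_ols, lambda * alpha) / (1 + lambda * (1 - alpha))
  }
  beta
}
```

With Sigma = diag(2), Gamma = c(1, 0.2), λ = 0.5, and α = 1, a single pass from β = (0, 0)' yields β = (0.5, 0)': the first coefficient is soft-thresholded toward zero and the second is set exactly to zero.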
Returns a list object containing the elements:
lambda: (vector) the sequence of values of the penalty.
beta: (matrix) regression coefficients for each predictor (in rows) associated with each value of the penalization parameter lambda (in columns).
nsup: (vector) number of non-zero predictors associated with each value of lambda.
The returned object is of the class 'LASSO', for which methods coef and fitted exist. Function 'path.plot' can also be used.
Friedman J, Hastie T, Höfling H, Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332.
Hoerl AE, Kennard RW (1970). Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
Tibshirani R (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58(1), 267–288.
Zou H, Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301–320.
require(SFSI)
data(wheatHTP)
y = as.vector(Y[,"E1"]) # Response variable
X = scale(X_E1) # Predictors
# Training and testing sets
tst = which(Y$trial %in% 1:10)
trn = seq_along(y)[-tst]
# Calculate covariances in training set
XtX = var(X[trn,])
Xty = cov(X[trn,],y[trn])
# Run the penalized regression
fm = solveEN(XtX,Xty,alpha=0.5,nlambda=100)
# Predicted values
yHat1 = fitted(fm, X=X[trn,]) # training data
yHat2 = fitted(fm, X=X[tst,]) # testing data
# Penalization vs correlation
plot(-log(fm$lambda[-1]),cor(y[trn],yHat1[,-1]), main="training", type="l")
plot(-log(fm$lambda[-1]),cor(y[tst],yHat2[,-1]), main="testing", type="l")