solveEN: Solve an Elastic-Net problem
Computes the entire Elastic-Net solution path for the regression coefficients, for all values of the penalization parameter, via the Coordinate Descent (CD) algorithm (Friedman et al., 2007). It takes as inputs a variance matrix among predictors and a covariance vector between the response and predictors.
solveEN(Sigma, Gamma, alpha = 1, lambda = NULL, nlambda = 100,
lambda.min = .Machine$double.eps^0.5, lambda.max = NULL,
common.lambda = TRUE, beta0 = NULL, nsup.max = NULL,
scale = TRUE, sdx = NULL, tol = 1E-5, maxiter = 1000,
mc.cores = 1L, save.at = NULL, fileID = NULL,
precision.format = c("double","single"), sparse = FALSE,
eps = .Machine$double.eps*100, verbose = FALSE)
Sigma |
(numeric matrix) Variance-covariance matrix of predictors |
Gamma |
(numeric matrix) Covariance between the response variable and predictors. If it contains more than one column, the algorithm is applied to each column separately, as if they were different response variables |
lambda |
(numeric vector) Penalization parameter sequence. Default is a decreasing sequence from max(abs(Gamma)/alpha) to a minimum equal to zero. If NULL (the default), a sequence of nlambda values is generated internally |
nlambda |
(integer) Number of lambdas generated when lambda = NULL |
lambda.min , lambda.max |
(numeric) Minimum and maximum value of lambda that are generated when lambda = NULL |
common.lambda |
(logical) Whether the same sequence of lambda values should be used for all columns of Gamma. Default is TRUE |
beta0 |
(numeric vector) Initial value for the regression coefficients that will be updated. If NULL (the default), all coefficients are initialized at zero |
alpha |
(numeric) Value between 0 and 1 giving the weights of the L1- and L2-penalties: alpha = 1 yields the Lasso penalty and alpha = 0 the Ridge penalty |
scale |
(logical) Whether to scale Sigma to a correlation matrix (scaling Gamma accordingly) before fitting; coefficients are returned on the original scale. Default is TRUE |
sdx |
(numeric vector) Scaling factor that will be used to scale the regression coefficients. When scale = TRUE this is computed internally as the square root of the diagonal of Sigma |
tol |
(numeric) Maximum error between two consecutive solutions of the CD algorithm to declare convergence |
maxiter |
(integer) Maximum number of iterations of the CD algorithm at each value of lambda before convergence is declared |
nsup.max |
(integer) Maximum number of non-zero coefficients allowed in the last solution. Default NULL imposes no limit |
mc.cores |
(integer) Number of cores used. When mc.cores > 1 and Gamma has more than one column, the analysis for each column is run in parallel |
save.at |
(character) Path where regression coefficients are to be saved (this may include a prefix added to the files). Default NULL means coefficients are not saved to files but returned as part of the output object |
fileID |
(character) Suffix added to the file name where regression coefficients are to be saved when save.at is provided. Default is NULL |
precision.format |
(character) Either 'single' or 'double' for the numeric precision and memory footprint (4 or 8 bytes, respectively) of the saved regression coefficients. This is only used when save.at is provided |
sparse |
(logical) Whether Sigma should be treated as sparse; entries with absolute value smaller than eps are considered zero. Default is FALSE |
eps |
(numeric) A numerical zero used to determine whether entries are near-zero. Default is 100 times the machine precision (.Machine$double.eps*100) |
verbose |
(logical) Whether progress should be printed. Default is FALSE |
Finds solutions for the regression coefficients in a linear model

y_i = x_i' β + e_i

where y_i is the response for the i-th observation, x_i = (x_i1,...,x_ip)' is a vector of p predictors assumed to have unit variance, β = (β_1,...,β_p)' is a vector of regression coefficients, and e_i is a residual.
The regression coefficients β are estimated as a function of the variance matrix among predictors (Σ) and the covariance vector between response and predictors (Γ) by minimizing the penalized mean squared error function

-Γ' β + 1/2 β' Σ β + λ J(β)

where λ is the penalization parameter and J(β) is a penalty function given by

1/2 (1-α) ||β||_2^2 + α ||β||_1

where 0 ≤ α ≤ 1, and ||β||_1 = ∑_{j=1}^p |β_j| and ||β||_2^2 = ∑_{j=1}^p β_j^2 are the L1- and (squared) L2-norms, respectively.
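As a point of reference, this objective can be evaluated directly from Sigma and Gamma. The sketch below (with simulated data; objEN is an illustrative helper, not part of SFSI) computes -Γ'β + 1/2 β'Σβ + λJ(β) for a given coefficient vector:

```r
# Illustrative helper (not an SFSI function): evaluate the penalized
# objective -Gamma'b + 0.5 b'Sigma b + lambda*J(b) at a coefficient vector b
set.seed(1)
p <- 5
X <- matrix(rnorm(50 * p), ncol = p)
y <- rnorm(50)
Sigma <- var(X)            # variance matrix among predictors
Gamma <- drop(cov(X, y))   # covariance between response and predictors

objEN <- function(b, Sigma, Gamma, lambda, alpha) {
  J <- 0.5 * (1 - alpha) * sum(b^2) + alpha * sum(abs(b))  # penalty J(b)
  -sum(Gamma * b) + 0.5 * drop(crossprod(b, Sigma %*% b)) + lambda * J
}

b0 <- rep(0, p)
objEN(b0, Sigma, Gamma, lambda = 0.1, alpha = 0.5)  # 0 at b = 0
```

At b = 0 every term vanishes, so the objective equals zero; any solution along the path must attain a value no larger than this.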
The "partial residual" excluding the contribution of the predictor x_ij is

e_i^(j) = y_i - x_i' β + x_ij β_j

and the ordinary least-squares (OLS) coefficient of x_ij on this residual is (up to a constant)

β_j^(OLS) = Γ_j - Σ_j' β + β_j

where Γ_j is the j-th element of Γ and Σ_j is the j-th column of the matrix Σ.
Coefficients are updated for each j = 1,...,p from their current value β_j to a new value β_j(α,λ), given α and λ, by "soft-thresholding" their OLS estimate until convergence, as fully described in Friedman et al. (2007).
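Assuming unit-variance predictors, this update cycle can be sketched in a few lines of R. The soft and cdEN functions below are illustrative helpers following Friedman et al. (2007), not SFSI functions, and give only a bare-bones version of the algorithm that solveEN implements:

```r
# Illustrative CD with soft-thresholding (assumes diag(Sigma) = 1)
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)  # soft-threshold operator

cdEN <- function(Sigma, Gamma, lambda, alpha, tol = 1e-8, maxiter = 1000) {
  p <- length(Gamma)
  b <- rep(0, p)
  for (it in seq_len(maxiter)) {
    bold <- b
    for (j in seq_len(p)) {
      # OLS coefficient of predictor j on the partial residual
      bj_ols <- Gamma[j] - sum(Sigma[, j] * b) + b[j]
      # Elastic-Net update: soft-threshold by alpha*lambda, shrink by L2 part
      b[j] <- soft(bj_ols, alpha * lambda) / (1 + lambda * (1 - alpha))
    }
    if (max(abs(b - bold)) < tol) break  # converged at this lambda
  }
  b
}

Gamma <- c(0.9, -0.2, 0.05)
b <- cdEN(diag(3), Gamma, lambda = 0.1, alpha = 1)
# b is c(0.8, -0.1, 0): soft-thresholded Gamma
```

With Sigma equal to the identity and alpha = 1, coordinates do not interact and the solution reduces to soft-thresholding Gamma entry-wise, which is a convenient sanity check.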
Returns a list object containing the elements:
lambda
: (vector) the sequence of values of the penalty.
beta
: (matrix) regression coefficients for each predictor (in rows) associated to each value of the penalization parameter lambda (in columns).
nsup
: (vector) number of non-zero predictors associated to each value of lambda.
The returned object is of the class 'LASSO', for which methods coef
and fitted
exist. Function 'path.plot' can also be used
Friedman J, Hastie T, Höfling H, Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332.
require(SFSI)
data(wheatHTP)
y = as.vector(Y[,"E1"]) # Response variable
X = scale(X_E1) # Predictors
# Training and testing sets
tst = which(Y$trial %in% 1:10)
trn = seq_along(y)[-tst]
# Calculate covariances in training set
XtX = var(X[trn,])
Xty = cov(X[trn,],y[trn])
# Run the penalized regression
fm = solveEN(XtX,Xty,alpha=0.5,nlambda=100)
# Regression coefficients
dim(coef(fm))
dim(coef(fm, ilambda=50)) # Coefficients associated to the 50th lambda
dim(coef(fm, nsup=25)) # Solution in which about nsup=25 coefficients are non-zero
# Predicted values
yHat1 = predict(fm, X=X[trn,]) # training data
yHat2 = predict(fm, X=X[tst,]) # testing data
# Penalization vs correlation
plot(-log(fm$lambda[-1]),cor(y[trn],yHat1[,-1]), main="training", type="l")
plot(-log(fm$lambda[-1]),cor(y[tst],yHat2[,-1]), main="testing", type="l")
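A common follow-up to these plots is to pick the value of lambda that maximizes predictive accuracy in the testing set. A minimal sketch with simulated stand-ins for fm$lambda and the prediction matrix (the names mirror the example above, but the data here are artificial):

```r
# Simulated stand-ins: a decreasing lambda grid and a matrix of predictions
# (one column per lambda), with accuracy improving along the path
set.seed(2)
lambda <- exp(seq(log(1), log(0.01), length.out = 20))
yTst <- rnorm(30)
yHat2 <- sapply(seq_along(lambda), function(k) yTst + rnorm(30, sd = 1 / k))

accuracy <- cor(yTst, yHat2)[1, ]  # correlation per lambda
best <- which.max(accuracy)
lambda[best]                       # penalty value to report or refit with
```

With real output from solveEN, yHat2 would come from predict(fm, X = X[tst,]) as in the example above, and the chosen column index could be passed to coef(fm, ilambda = best).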