solveEN: Coordinate Descent algorithm to solve Elastic-Net-type problems

View source: R/solveEN.R


Coordinate Descent algorithm to solve Elastic-Net-type problems

Description

Computes the entire Elastic-Net solution path for the regression coefficients, for all values of the penalization parameter, via the Coordinate Descent (CD) algorithm (Friedman et al., 2007). It takes as inputs a variance matrix among predictors and a covariance vector between the response variable and the predictors.

Usage

solveEN(Sigma, Gamma, alpha = 1, lambda = NULL,
        nlambda = 100, lambda.min = .Machine$double.eps^0.5,
        lambda.max = NULL, common.lambda = TRUE, beta0 = NULL,
        nsup.max = NULL, scale = TRUE, sdx = NULL, tol = 1E-5,
        maxiter = 1000, mc.cores = 1L, save.at = NULL,
        precision.format = c("double","single"),
        fileID = NULL, verbose = FALSE)
        

Arguments

Sigma

(numeric matrix) Variance-covariance matrix of predictors

Gamma

(numeric matrix) Covariance between response variable and predictors. If it contains more than one column, the algorithm is applied to each column separately as different response variables

lambda

(numeric vector) Penalization parameter sequence. Default is lambda=NULL; in this case, a decreasing grid of nlambda lambdas is generated, starting from a maximum equal to

max(abs(Gamma)/alpha)

down to a minimum given by lambda.min. If alpha=0, the grid is generated starting from a maximum equal to 5 (see the sketch after this argument list)

nlambda

(integer) Number of lambdas generated when lambda=NULL

lambda.min, lambda.max

(numeric) Minimum and maximum values of lambda generated when lambda=NULL

common.lambda

TRUE or FALSE indicating whether to compute the coefficients for a grid of lambdas common to all columns of Gamma, or for a grid specific to each column of Gamma. Default is common.lambda=TRUE

beta0

(numeric vector) Initial values for the regression coefficients, which the algorithm will update for up to maxiter iterations. If beta0=NULL, a vector of zeros is used. These values serve as starting values for the first lambda value

alpha

(numeric) Value between 0 and 1 giving the weights of the L1- and L2-penalties

scale

TRUE or FALSE indicating whether to scale the matrix Sigma so that variables have unit variance, and to scale Gamma by the standard deviation (sdx) of the corresponding predictor, taken from the diagonal of Sigma

sdx

(numeric vector) Scaling factors used to scale the regression coefficients. When scale=TRUE this vector is set to the square root of the diagonal of Sigma; otherwise the provided values are used, assuming that Sigma and Gamma are already scaled

tol

(numeric) Maximum error between two consecutive solutions of the CD algorithm to declare convergence

maxiter

(integer) Maximum number of iterations of the CD algorithm at each lambda step, if convergence is not reached earlier

nsup.max

(integer) Maximum number of non-zero coefficients allowed in the last solution. Default nsup.max=NULL computes solutions for the entire lambda grid

mc.cores

(integer) Number of cores used. When mc.cores > 1, the analysis is run in parallel for each column of Gamma. Default is mc.cores=1

save.at

(character) Path where regression coefficients are to be saved (this may include a prefix to be added to the file names). Default save.at=NULL does not save the regression coefficients; they are returned in the output object

fileID

(character) Suffix added to the file name where regression coefficients are to be saved. Default fileID=NULL will automatically add sequential integers from 1 to the number of columns of Gamma

precision.format

(character) Either 'single' or 'double' for numeric precision and memory occupancy (4 or 8 bytes, respectively) of the regression coefficients. This is only used when save.at is not NULL

verbose

TRUE or FALSE indicating whether to print progress
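
A minimal sketch of how the default lambda grid (lambda=NULL) described above might be constructed is shown below; the log-spacing is an assumption for illustration only, and the spacing used internally by solveEN may differ:

  # Hypothetical reconstruction of the default grid (log-spacing assumed)
  alpha  <- 0.5
  Gamma  <- c(0.3, -0.8, 0.1)                 # toy covariance vector
  lambda_max <- max(abs(Gamma)/alpha)         # maximum of the grid
  lambda_min <- .Machine$double.eps^0.5       # default lambda.min
  lambda <- exp(seq(log(lambda_max), log(lambda_min), length.out = 100))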

Details

Finds solutions for the regression coefficients in a linear model

yᵢ = xᵢ'β + eᵢ

where yᵢ is the response for the ith observation, xᵢ = (xᵢ₁,...,xᵢₚ)' is a vector of p predictors assumed to have unit variance, β = (β₁,...,βₚ)' is a vector of regression coefficients, and eᵢ is a residual.

The regression coefficients β are estimated as a function of the variance matrix among predictors (Σ) and the covariance vector between response and predictors (Γ) by minimizing the penalized mean squared error function

−Γ'β + (1/2)β'Σβ + λ J(β)

where λ is the penalization parameter and J(β) is a penalty function given by

(1/2)(1−α)||β||₂² + α||β||₁

where 0 ≤ α ≤ 1, and ||β||₁ = ∑ⱼ |βⱼ| and ||β||₂² = ∑ⱼ βⱼ² (with sums over j = 1,...,p) are the L1 and (squared) L2-norms, respectively.
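
Written as R code, the objective above is (a sketch for illustration; Sigma, Gamma, lambda, and alpha are as defined above, and enet_objective is a hypothetical helper name):

  # Evaluate the penalized mean squared error function at a given beta
  enet_objective <- function(beta, Sigma, Gamma, lambda, alpha) {
    J <- 0.5*(1 - alpha)*sum(beta^2) + alpha*sum(abs(beta))  # penalty J(beta)
    -sum(Gamma*beta) + 0.5*c(crossprod(beta, Sigma %*% beta)) + lambda*J
  }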

The "partial residual" excluding the contribution of the predictor xij is

ei(j) = yi - x'i β + xijβj

then the ordinary least-squares (OLS) coefficient of xij on this residual is (up-to a constant)

βj(ols) = Γj - Σ'j β + βj

where Γj is the jth element of Γ and Σj is the jth column of the matrix Σ.

Coefficients are updated for each j = 1,...,p from their current value βⱼ to a new value βⱼ(α,λ), given α and λ, by "soft-thresholding" their OLS estimate, iterating until convergence, as fully described in Friedman et al. (2007).
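
This update can be sketched in plain R as follows; it illustrates the covariance-form CD update for a single lambda, assuming unit-variance predictors (diag(Σ) = 1). It is not the package's internal (compiled) implementation, and soft and cd_enet are hypothetical names:

  # Soft-thresholding operator S(z, d) = sign(z)*max(|z| - d, 0)
  soft <- function(z, d) sign(z)*pmax(abs(z) - d, 0)

  # Coordinate descent for one lambda value (assumes diag(Sigma) = 1)
  cd_enet <- function(Sigma, Gamma, alpha, lambda,
                      beta = rep(0, length(Gamma)),
                      tol = 1e-5, maxiter = 1000) {
    for (iter in seq_len(maxiter)) {
      beta_old <- beta
      for (j in seq_along(beta)) {
        # OLS coefficient on the partial residual (up to a constant)
        b_ols <- Gamma[j] - sum(Sigma[,j]*beta) + beta[j]
        # Soft-threshold by lambda*alpha; shrink by the ridge term
        beta[j] <- soft(b_ols, lambda*alpha)/(1 + lambda*(1 - alpha))
      }
      if (max(abs(beta - beta_old)) < tol) break  # declare convergence
    }
    beta
  }

Warm starts across the lambda grid (passing the solution at one lambda as the starting value for the next) are what make computing the entire path cheap.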

Value

Returns a list object containing the elements:

  • lambda: (vector) all the sequence of values of the penalty.

  • beta: (matrix) regression coefficients for each predictor (in rows) associated to each value of the penalization parameter lambda (in columns).

  • nsup: (vector) number of non-zero predictors associated to each value of lambda.

The returned object is of class 'LASSO', for which methods coef and fitted exist. Function 'path.plot' can also be used.
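
For instance, with a fitted object fm such as the one in the Examples below, these methods can be used along these lines (a brief sketch):

  B <- coef(fm)    # regression coefficients (predictors in rows,
                   # lambda values in columns, as in the beta element)
  path.plot(fm)    # coefficient paths across the lambda grid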

References

Friedman J, Hastie T, Höfling H, Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332.

Hoerl AE, Kennard RW (1970). Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.

Tibshirani R (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58(1), 267–288.

Zou H, Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301–320.

Examples

  require(SFSI)
  data(wheatHTP)
  
  y = as.vector(Y[,"E1"])  # Response variable
  X = scale(X_E1)          # Predictors

  # Training and testing sets
  tst = which(Y$trial %in% 1:10)
  trn = seq_along(y)[-tst]

  # Calculate covariances in training set
  XtX = var(X[trn,])
  Xty = cov(X[trn,],y[trn])
  
  # Run the penalized regression
  fm = solveEN(XtX,Xty,alpha=0.5,nlambda=100) 
  
  # Predicted values
  yHat1 = fitted(fm, X=X[trn,])  # training data
  yHat2 = fitted(fm, X=X[tst,])  # testing data
  
  # Penalization vs correlation
  plot(-log(fm$lambda[-1]),cor(y[trn],yHat1[,-1]), main="training", type="l")
  plot(-log(fm$lambda[-1]),cor(y[tst],yHat2[,-1]), main="testing", type="l")
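
Continuing the example, one way to choose a penalty value from the fitted path is to maximize the testing-set correlation (a sketch using only the objects created above):

  # Lambda maximizing the correlation in the testing set
  best <- which.max(cor(y[tst], yHat2))
  fm$lambda[best]   # selected penalty value
  fm$nsup[best]     # number of non-zero coefficients at that lambda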

