LARS: Least Angle Regression to solve LASSO-type problems

View source: R/LARS.R


Least Angle Regression to solve LASSO-type problems

Description

Computes the entire LASSO solution path for the regression coefficients, starting from zero up to the least-squares estimates, via the Least Angle Regression (LARS) algorithm (Efron et al., 2004). It takes as inputs a variance matrix among predictors and a covariance vector between the response and the predictors.

Usage

LARS(Sigma, Gamma, method = c("LAR","LASSO"),
     nsup.max = NULL, steps.max = NULL, 
     eps = .Machine$double.eps*100, scale = TRUE, 
     sdx = NULL, mc.cores = 1L, save.at = NULL,
     precision.format = c("double","single"),
     fileID = NULL, verbose = 1)

Arguments

Sigma

(numeric matrix) Variance-covariance matrix of predictors

Gamma

(numeric matrix) Covariance between response variable and predictors. If it contains more than one column, the algorithm is applied to each column separately as different response variables

method

(character) Either:

  • 'LAR': Computes the entire coefficient sequence. Values of lambda are calculated at each step.

  • 'LASSO': Similar to 'LAR', but solutions at the steps where a predictor leaves the active set are also returned.

Default is method = 'LAR'
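
A minimal sketch contrasting the two methods on simulated data (all variable names are illustrative):

  # With method='LASSO', steps at which a predictor drops out of
  # the active set add extra solutions, so the lambda sequence is
  # expected to be at least as long as with method='LAR'
  set.seed(1)
  X <- matrix(rnorm(100*20), ncol=20)
  y <- X[,1] - X[,2] + rnorm(100)
  fm1 <- LARS(var(X), cov(X,y), method="LAR")
  fm2 <- LARS(var(X), cov(X,y), method="LASSO")
  c(LAR=length(fm1$lambda), LASSO=length(fm2$lambda))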

nsup.max

(integer) Maximum number of non-zero coefficients in the last LARS solution. Default nsup.max = NULL will calculate solutions for the entire lambda sequence

steps.max

(integer) Maximum number of steps (i.e., solutions) to be computed. Default steps.max = NULL will calculate solutions for the entire lambda sequence

eps

(numeric) A numerical zero. Default is the machine precision

scale

TRUE or FALSE to scale the matrix Sigma so that predictors have unit variance, and to scale Gamma by the standard deviation (sdx) of the corresponding predictor, taken from the diagonal of Sigma

sdx

(numeric vector) Scaling factor used to scale the regression coefficients. When scale = TRUE, this vector is set to the square root of the diagonal of Sigma; otherwise, the provided value is used under the assumption that Sigma and Gamma are already scaled
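
As a sketch of how scale and sdx are expected to interact (assuming, per the descriptions above, that pre-scaled inputs together with an explicit sdx reproduce the internal scaling; X and y as in the sketch above):

  Sigma <- var(X)
  Gamma <- cov(X, y)
  sdx <- sqrt(diag(Sigma))
  fm1 <- LARS(Sigma, Gamma, scale=TRUE)      # internal scaling
  fm2 <- LARS(cov2cor(Sigma), Gamma/sdx,     # assumed-equivalent external scaling
              scale=FALSE, sdx=sdx)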

mc.cores

(integer) Number of cores used. When mc.cores > 1, the analysis is run in parallel for each column of Gamma. Default is mc.cores = 1
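
For instance, with a two-column Gamma (two response variables), each column can be processed on a separate core (X and y as above; forked parallelism may not be available on Windows):

  y2 <- cbind(y, X[,3] + rnorm(100))  # two responses
  Gamma2 <- cov(X, y2)                # one column per response
  fm <- LARS(var(X), Gamma2, mc.cores=2)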

save.at

(character) Path where the regression coefficients are to be saved (this may include a prefix added to the file names). Default save.at = NULL will not save the regression coefficients; instead, they are returned in the output object

fileID

(character) Suffix added to the file name where regression coefficients are to be saved. Default fileID = NULL will automatically add sequential integers from 1 to the number of columns of Gamma

precision.format

(character) Either 'single' or 'double' for numeric precision and memory occupancy (4 or 8 bytes, respectively) of the regression coefficients. This is only used when save.at is not NULL
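
A sketch of writing coefficients to disk instead of returning them (the prefix path is illustrative; X and y as above):

  prefix <- file.path(tempdir(), "lars_beta_")
  fm <- LARS(var(X), cov(X, y), save.at=prefix,
             precision.format="single")
  list.files(tempdir(), pattern="lars_beta_")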

verbose

(numeric) If greater than zero, details on each LARS step will be printed

Details

Finds solutions for the regression coefficients in a linear model

y_i = x_i'β + e_i

where y_i is the response for the i-th observation, x_i = (x_i1,...,x_ip)' is a vector of p predictors assumed to have unit variance, β = (β_1,...,β_p)' is a vector of regression coefficients, and e_i is a residual.

The regression coefficients β are estimated as a function of the variance matrix among predictors (Σ) and the covariance vector between response and predictors (Γ) by minimizing the penalized mean squared error function

-Γ'β + 1/2 β'Σβ + 1/2 λ ||β||_1

where λ is the penalization parameter and ||β||_1 = ∑_{j=1}^p |β_j| is the L1-norm.

The algorithm that finds the solutions for each β_j is fully described in Efron et al. (2004), in which the "current correlation" between the predictor x_ij and the residual e_i = y_i - x_i'β is expressed (up to a constant) as

r_j = Γ_j - Σ_j'β

where Γ_j is the j-th element of Γ and Σ_j is the j-th column of the matrix Σ.
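
Because the sample covariance is bilinear, this identity can be checked numerically against the covariance between the predictors and the residuals; a minimal sketch (variable names are illustrative):

  set.seed(2)
  X <- scale(matrix(rnorm(100*10), ncol=10))
  y <- X[,1] + rnorm(100)
  Sigma <- var(X)
  Gamma <- cov(X, y)
  fm <- LARS(Sigma, Gamma)
  b <- fm$beta[, 5]             # coefficients at the 5th solution
  r1 <- Gamma - Sigma %*% b     # "current correlations" (up to a constant)
  r2 <- cov(X, y - X %*% b)     # covariance with the residuals
  all.equal(as.vector(r1), as.vector(r2))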

Value

Returns a list object with the following elements:

  • lambda: (vector) the sequence of values of the LASSO penalty.

  • beta: (matrix) regression coefficients for each predictor (in rows) associated with each value of the penalization parameter lambda (in columns).

  • nsup: (vector) number of non-zero predictors associated with each value of lambda.

The returned object is of the class 'LASSO', for which methods coef and predict exist. Function 'path.plot' can also be used
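
For example, the coefficient path of a fitted model can be displayed as follows (X and y as in the sketches above):

  fm <- LARS(var(X), cov(X, y), method="LASSO")
  path.plot(fm)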

Author(s)

Adapted from the 'lars' function in package 'lars' (Hastie & Efron, 2013)

References

Efron B, Hastie T, Johnstone I, Tibshirani R (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.

Hastie T, Efron B (2013). lars: Least Angle Regression, Lasso and Forward Stagewise. R package. https://cran.r-project.org/package=lars.

Examples

  require(SFSI)
  data(wheatHTP)
  
  y = as.vector(Y[,"E1"])   # Response variable
  X = scale(X_E1)           # Predictors

  # Training and testing sets
  tst = which(Y$trial %in% 1:10)
  trn = seq_along(y)[-tst]

  # Calculate covariances in training set
  XtX = var(X[trn,])
  Xty = cov(X[trn,],y[trn])
  
  # Run the penalized regression
  fm = LARS(XtX, Xty, method="LASSO")  
  
  # Regression coefficients
  dim(coef(fm))
  dim(coef(fm, ilambda=50)) # Coefficients associated to the 50th lambda
  dim(coef(fm, nsup=25))    # Coefficients with around nsup=25 non-zero entries

  # Predicted values
  yHat1 = predict(fm, X=X[trn,])  # training data
  yHat2 = predict(fm, X=X[tst,])  # testing data
  
  # Penalization vs correlation
  plot(-log(fm$lambda[-1]),cor(y[trn],yHat1[,-1]), main="Training", type="l")
  plot(-log(fm$lambda[-1]),cor(y[tst],yHat2[,-1]), main="Testing", type="l")
