lars2: Least Angle Regression to solve the LASSO-type problem


Description

Computes the entire sequence of LASSO solutions for the regression coefficients, from the zero vector up to the least-squares estimates, via the Least Angle Regression (LARS) algorithm (Efron et al., 2004). It takes as inputs a variance-covariance matrix among predictors and a covariance vector between the response and the predictors.
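
As a quick orientation (not from the original help page), the two inputs can be formed directly from a predictor matrix X and a response vector y. The snippet below is a minimal sketch using simulated data; covXY is a name introduced here for illustration:

  library(SFSI)
  # Minimal sketch: forming the two inputs from simulated data
  set.seed(1)
  X <- scale(matrix(rnorm(100*5), ncol=5))  # 100 x 5 standardized predictors
  y <- rnorm(100)                           # response vector
  P <- var(X)           # variance-covariance matrix among predictors
  covXY <- cov(y, X)    # covariance between response and predictors
  fm <- lars2(P, covXY)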

Usage

lars2(P, cov, method = c("LAR", "LAR-LASSO"), maxDF = NULL,
  eps = .Machine$double.eps, scale = TRUE, verbose = FALSE)

Arguments

P

Variance-covariance matrix among predictors

cov

Covariance vector between response variable and predictors

method

One of:

  • 'LAR': Computes the entire sequence of coefficients. The value of lambda is calculated at each step.

  • 'LAR-LASSO': Similar to 'LAR', but solutions at the points where a predictor leaves the active set are also returned.

Default is method='LAR'.

maxDF

Maximum number of predictors in the last LARS solution. The default maxDF=NULL computes solutions for all the predictors.

eps

An effective zero. Default is the machine precision.

scale

TRUE or FALSE to rescale the matrix P so that all variables have unit variance, and to scale cov by the standard deviation of the corresponding predictor, taken from the diagonal of P (see the sketch at the end of this section)

verbose

TRUE or FALSE to print information on each LARS step
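
The rescaling performed by scale=TRUE can be sketched manually; this is an assumed equivalent of the internal computation, not code taken from the package (it continues the simulated sketch in the Description):

  # Assumed equivalent of 'scale=TRUE':
  d <- sqrt(diag(P))            # predictor standard deviations from P
  P_unit <- P / tcrossprod(d)   # P rescaled to unit variances, as cov2cor(P)
  cov_unit <- drop(covXY) / d   # cov scaled by each predictor's sd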

Details

Finds solutions for the regression coefficients in a linear model

y_i = x_i' β + e_i

where y_i is the response for the ith observation, x_i = (x_i1, ..., x_ip)' is a vector of p predictors assumed to have unit variance, β = (β_1, ..., β_p)' is a vector of regression coefficients, and e_i is a residual.

The regression coefficients β are estimated as a function of the variance-covariance matrix among predictors (P) and the covariance vector between the response and predictors (cov) by minimizing the penalized mean squared error function

-cov'β + 1/2 β'Pβ + 1/2 λ ||β||_1

where λ is the penalization parameter and ||β||_1 = ∑_j |β_j| is the L1-norm.
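
For concreteness, the objective can be written as a small R function; this is an illustrative sketch, not the package's internal implementation:

  # The penalized objective written above, for a given beta and lambda
  obj <- function(beta, P, cov, lambda) {
    -sum(cov * beta) + 0.5 * drop(crossprod(beta, P %*% beta)) +
      0.5 * lambda * sum(abs(beta))
  }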

The algorithm that finds a solution for each β_j is fully described in Efron et al. (2004), in which the "current correlation" between the predictor x_ij and the residual e_i = y_i - x_i' β is expressed (up to a constant) as

r_j = cov_j - P_j' β

where cov_j is the jth element of cov and P_j is the jth column of the matrix P.
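
Since P is symmetric, the whole vector of current correlations for a given solution can be computed in one step (a sketch; beta here is a hypothetical current solution, and covXY comes from the sketch in the Description):

  # Current correlations r_j = cov_j - P_j' β, for all j at once
  beta <- rep(0, ncol(P))               # e.g., the starting solution β = 0
  r <- drop(covXY) - drop(P %*% beta)   # equals covXY at the start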

Value

The returned object is of the class 'SSI': a list whose elements include the values of λ at each LARS step (element lambda, used in the Examples below). The method fitted exists for this class, and the function plotPath can also be used.
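
A brief usage sketch for the returned object, continuing the example in the Description (fitted is also shown in the Examples below; the plotPath call assumes its default usage):

  yHat <- fitted(fm, X = X)   # predictions at each step of the path
  plotPath(fm)                # coefficient paths (assumed default call)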

Author(s)

Marco Lopez-Cruz (lopezcru@msu.edu) and Gustavo de los Campos. Adapted from the 'lars' package (Hastie & Efron, 2013).

References

Efron B, Hastie T, Johnstone I, Tibshirani R (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.

Friedman J, Hastie T, Tibshirani R (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

Hastie T, Efron B (2013). lars: Least Angle Regression, Lasso and Forward Stagewise. https://cran.r-project.org/package=lars.

Tibshirani R (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58(1), 267–288.

Examples

  require(SFSI)
  data(wheatHTP)
  y = as.vector(Y[,"YLD"])  # Response variable
  X = scale(WL)             # Predictors

  # Training and testing sets
  set.seed(1234)  # for reproducibility of the random split
  tst = sample(seq_along(y), ceiling(0.3*length(y)))
  trn = seq_along(y)[-tst]

  # Calculate covariances in training set
  XtX = var(X[trn,])
  Xty = cov(y[trn],X[trn,])
  
  # Run the penalized regression
  fm = lars2(XtX,Xty)                       # method='LAR' (default)
  fm = lars2(XtX,Xty,method="LAR-LASSO")    # LASSO modification (overwrites fm)
  
  # Predicted values
  yHat1 = fitted(fm, X=X[trn,])  # training data
  yHat2 = fitted(fm, X=X[tst,])  # testing data
  
  # Penalization vs correlation
  plot(-log(fm$lambda),cor(y[trn],yHat1)[1,], main="training")
  plot(-log(fm$lambda),cor(y[tst],yHat2)[1,], main="testing")
