# lass0: Variable selection for linear regression with Lasso-Zero

Package lass0: Lasso-Zero for (High-Dimensional) Linear Regression

## Description

Fits a (possibly high-dimensional) linear model with Lasso-Zero. Lasso-Zero aggregates several estimates, each obtained by solving the basis pursuit problem after concatenating a random noise dictionary to the input matrix. The procedure is described in more detail in the paper listed in the References section below.
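The aggregation and thresholding steps can be sketched in base R. This is a hypothetical illustration, not lass0's code: the matrix `Betas` below is simulated rather than obtained by solving basis pursuit, and using the componentwise median as the aggregation rule, followed by hard or soft thresholding at `tau`, reflects our reading of the referenced paper and of the `tau` and `soft.thresholding` arguments.

```r
# Hypothetical illustration only: Betas stands in for the p x M matrix of
# basis pursuit estimates (simulated here, NOT produced by lass0).
set.seed(1)
p <- 6
M <- 30
Betas <- matrix(rnorm(p * M, sd = 0.3), p, M)
Betas[1:2, ] <- Betas[1:2, ] + 2   # two "true" signals

# Assumed aggregation rule: componentwise median over the M estimates
beta_med <- apply(Betas, 1, median)

tau <- 1
# Hard thresholding: keep entries whose magnitude exceeds tau
hard <- ifelse(abs(beta_med) > tau, beta_med, 0)
# Soft thresholding: additionally shrink surviving entries towards zero by tau
soft <- sign(beta_med) * pmax(abs(beta_med) - tau, 0)

selected <- which(hard != 0)
print(selected)
```

The median makes the aggregate robust to the few noise dictionaries for which a given coefficient happens to be large.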

## Usage

```
lass0(X, y, tau, alpha, q = nrow(X), M = 30, sigma = NULL,
      intercept = TRUE, standardizeX = TRUE, standardizeG = NULL,
      qut.MC.output = NULL, GEVapprox = TRUE, parallel = FALSE,
      soft.thresholding = FALSE, ols = TRUE, ...)
```

## Arguments

- `X`: input matrix of dimension `n x p`; each row is an observation vector.
- `y`: response vector of size `n`.
- `tau`: a positive threshold value. If missing, `alpha` must be supplied.
- `alpha`: level of the quantile universal threshold (a number between 0 and 1). If missing, `tau` must be supplied.
- `q`: size of the noise dictionaries. A noise dictionary consists of a Gaussian matrix G of size `n x q` concatenated horizontally to the input matrix X. Default is `q = nrow(X)`.
- `M`: number of noise dictionaries used. Default is `M = 30`.
- `sigma`: standard deviation of the noise. If `sigma = NULL` (default) and `tau` is not supplied, the quantile universal threshold is computed from a pivotal statistic.
- `intercept`: whether an intercept should be fitted. If `TRUE` (default), `y` and the columns of `X` are mean-centered before the analysis, and the intercept is estimated by `mean(y) - colMeans(X) %*% coefficients`.
- `standardizeX`: whether the columns of `X` should be standardized to have unit standard deviation. Default is `TRUE`.
- `standardizeG`: either a positive numeric value giving the desired Euclidean norm of every column of the noise dictionaries, or a logical value indicating whether the columns of the noise dictionaries should be standardized to have unit standard deviation. If `NULL` (default), it is set to `standardizeG = standardizeX`.
- `qut.MC.output`: an object of class `"qut.MC"` (output of the `qut.MC` function) holding the results of the Monte Carlo simulations needed to approximate the quantile universal threshold. By default, `qut.MC.output = NULL`, and `qut.MC` is called unless `tau` is supplied.
- `GEVapprox`: whether to approximate the distribution of the null thresholding statistic by a generalized extreme value (GEV) distribution (ignored if `tau` is supplied). Default is `TRUE`.
- `parallel`: if `TRUE`, use a parallel `foreach` loop both for the computations with the different noise dictionaries and for the Monte Carlo simulations that estimate the quantile universal threshold. A parallel backend must be registered beforehand, e.g. with `doParallel`. Default is `FALSE`.
- `soft.thresholding`: if `TRUE`, the coefficients are soft-thresholded (rather than hard-thresholded) at level `tau`. Default is `FALSE`.
- `ols`: whether to refit the nonzero coefficients by ordinary least squares. Default is `TRUE`.
- `...`: further arguments passed to `qut.MC`.
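A minimal base-R sketch of the preprocessing suggested by the `intercept`, `standardizeX`, `standardizeG` and `q` arguments, followed by the documented intercept formula after a least-squares refit (as with `ols = TRUE`). The selected set `S`, the variable names, and the choice to rescale `G` to unit standard deviation without centering it are assumptions for illustration; lass0's internal steps may differ.

```r
# Hypothetical sketch, not lass0's internal code.
set.seed(2)
n <- 20; p <- 5
q <- n                               # documented default: q = nrow(X)
X <- matrix(rnorm(n * p, mean = 3), n, p)
y <- drop(X[, 1:2] %*% c(2, -1)) + 5 + rnorm(n)

# intercept = TRUE: mean-center y and the columns of X
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# standardizeX = TRUE: rescale the centered columns to unit standard deviation
sdsX <- apply(X, 2, sd)
Xs <- sweep(Xc, 2, sdsX, "/")

# One Gaussian noise dictionary G (n x q); standardizeG = TRUE is assumed
# here to mean dividing each column by its standard deviation
G <- matrix(rnorm(n * q), n, q)
G <- sweep(G, 2, apply(G, 2, sd), "/")
XG <- cbind(Xs, G)                   # augmented n x (p + q) matrix

# Suppose features 1 and 2 were selected (hypothetical selection):
# refit them by least squares on the centered data
S <- c(1, 2)
coefficients <- rep(0, p)
coefficients[S] <- qr.coef(qr(Xc[, S]), yc)

# Intercept as documented: mean(y) - colMeans(X) %*% coefficients
intercept <- mean(y) - drop(colMeans(X) %*% coefficients)
```

Because the refit uses centered data, this intercept coincides with the one `lm(y ~ X[, S])` reports for the same selected columns.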

## Value

An object of class `"lass0"`. It is a list containing the following components:

- `coefficients`: estimated regression coefficients.
- `intercept`: intercept value.
- `fitted.values`: fitted values.
- `residuals`: residuals.
- `selected`: set of selected features.
- `tau`: threshold value.
- `Betas`: matrix of size `p x M` containing the `M` estimates of the regression coefficients (on the standardized scale if `standardizeX = TRUE`).
- `Gammas`: matrix of size `q x M` containing the `M` noise coefficient vectors (on the standardized scale unless `standardizeG = FALSE`).
- `madGammas`: statistic based on the noise coefficients, namely the MAD of all nonzero entries in `Gammas`.
- `sdsX`: standard deviations of all columns of `X`. They can be used to transform `Betas` to the original scale via `Betas / sdsX`.
- `qut.MC.output`: either the list returned by `qut.MC`, or a character string explaining why `qut.MC` was not called.
- `quant.type`: if `tau` is not supplied, the type of quantile used, either `"GEV"` or `"empirical"`. Even when `GEVapprox = TRUE`, the empirical quantile is used if `gev.fit` returns an error.
- `call`: the matched call.
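The relation between the `Betas`, `sdsX` and `madGammas` components can be illustrated with a hypothetical base-R sketch. The matrices below are simulated stand-ins, and using `stats::mad` with its default consistency constant (1.4826) is an assumption: the documentation only states that the statistic is the MAD of the nonzero entries of `Gammas`.

```r
# Hypothetical stand-ins for the Betas, Gammas and sdsX components
set.seed(3)
p <- 4; q <- 6; M <- 5
Betas <- matrix(rnorm(p * M), p, M)
Gammas <- matrix(rnorm(q * M), q, M)
Gammas[abs(Gammas) < 0.5] <- 0      # mimic sparse noise coefficients
sdsX <- c(0.5, 1, 2, 4)

# Back-transform to the original scale: row j is divided by sdsX[j],
# because R recycles the length-p vector down each column of the p x M matrix
Betas_orig <- Betas / sdsX

# MAD of the nonzero noise coefficients (assumed: stats::mad defaults)
madGammas <- mad(Gammas[Gammas != 0])
```

Dividing by `sdsX` works because a coefficient fitted on a column rescaled to unit standard deviation equals the original-scale coefficient multiplied by that column's standard deviation.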

## References

Descloux, P., & Sardy, S. (2018). Model selection with lasso-zero: adding straw to the haystack to better find needles. arXiv preprint arXiv:1805.05133. https://arxiv.org/abs/1805.05133

## See Also

`qut.MC`
## Examples

```
library(lass0)

#### EXAMPLE 1: fast example with 5x10 input matrix and a small number
#### (MCrep = 50) of Monte Carlo replications for computing QUT.
set.seed(201)

## design matrix
n <- 5
p <- 10
X <- matrix(rnorm(n*p), n, p)

## sparse vector
S0 <- 1:2 # support
beta0 <- rep(0, p)
beta0[S0] <- 2

## response:
y <- X[, S0] %*% beta0[S0] + rnorm(n)

## lasso-zero:
lass0.obj <- lass0(X, y, alpha = 0.05, MCrep = 50)
betahat <- lass0.obj$coefficients
plot(lass0.obj)

#### EXAMPLE 2: with 50x100 input matrix
set.seed(202)

## design matrix
n <- 50
p <- 100
X <- matrix(rnorm(n*p), n, p)

## sparse vector
S0 <- 1:3 # support
beta0 <- rep(0, p)
beta0[S0] <- 2

## response:
y <- X[, S0] %*% beta0[S0] + rnorm(n)

## 1) lasso-zero tuned by QUT with unknown noise level
lass0.obj1 <- lass0(X, y, alpha = 0.05)
betahat1 <- lass0.obj1$coefficients
plot(lass0.obj1)

## 2) lasso-zero tuned by QUT with known noise level
lass0.obj2 <- lass0(X, y, alpha = 0.05, sigma = 1)
betahat2 <- lass0.obj2$coefficients

## 3) lasso-zero with fixed threshold tau = 1
lass0.obj3 <- lass0(X, y, tau = 1)
betahat3 <- lass0.obj3$coefficients
```