lass0: Variable selection for linear regression with Lasso-Zero

Description

Fits a (possibly high-dimensional) linear model with Lasso-Zero. Lasso-Zero aggregates several estimates obtained by solving the basis pursuit problem after concatenating random noise dictionaries to the input matrix. The procedure is described in more detail in the paper listed in the References section below.

Usage

lass0(X, y, tau, alpha, q = nrow(X), M = 30, sigma = NULL,
  intercept = TRUE, standardizeX = TRUE, standardizeG = NULL,
  qut.MC.output = NULL, GEVapprox = TRUE, parallel = FALSE,
  soft.thresholding = FALSE, ols = TRUE, ...)

Arguments

X

input matrix of dimension n x p; each row is an observation vector.

y

response vector of size n.

tau

a positive threshold value. If missing, then alpha must be supplied.

alpha

level of the quantile universal threshold (number between 0 and 1). If missing, then tau must be supplied.

q

size of the noise dictionaries. A noise dictionary consists of a Gaussian matrix G of size n x q concatenated horizontally to the input matrix X. Default is q = nrow(X).

M

number of noise dictionaries used.

sigma

standard deviation of the noise. If sigma = NULL (default) and tau = NULL, the quantile universal threshold is computed based on a pivotal statistic.

intercept

whether an intercept should be fitted. If TRUE (default), y and the columns of X are mean-centered before the analysis, and the intercept is estimated by mean(y) - colMeans(X) %*% coefficients.

standardizeX

whether the columns of X should be standardized to have unit standard deviation. Default is TRUE.

standardizeG

either a positive numerical value indicating the desired Euclidean norm of all columns of the noise dictionaries, or a logical value indicating whether the columns of the noise dictionaries should be standardized to have unit standard deviation. If NULL (default), it is set to the value of standardizeX.

qut.MC.output

an object of type "qut.MC" (the output of the qut.MC function), providing the result of the Monte Carlo simulations needed to approximate the quantile universal threshold. By default, qut.MC.output = NULL and the qut.MC function is called unless tau is supplied.

GEVapprox

whether to approximate the distribution of the null thresholding statistic by a GEV distribution (ignored if tau is supplied). Default is TRUE.

parallel

if TRUE, use parallel foreach to run the computations for the different noise dictionaries and the Monte Carlo simulations for estimating the quantile universal threshold. A parallel backend must be registered beforehand, e.g. with doParallel. Default is FALSE.

soft.thresholding

if TRUE, the coefficients are soft thresholded (rather than hard thresholded) at level tau. Default is FALSE.

ols

whether to refit the nonzero coefficients with an ordinary least squares procedure. Default is TRUE.

...

further arguments that can be passed to qut.MC.
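The q, standardizeG and soft.thresholding arguments can be illustrated with a few lines of base R. This is only a sketch of the ideas described above, not the package's internal code: the helper names hard_threshold and soft_threshold and the variable target_norm are made up for illustration, and details such as whether the hard threshold uses a strict inequality, or whether G is centered before scaling, are assumptions.

```r
set.seed(1)
n <- 5; p <- 10; q <- n            # q = nrow(X), as in the default

X <- matrix(rnorm(n * p), n, p)    # input matrix, n x p
G <- matrix(rnorm(n * q), n, q)    # one Gaussian noise dictionary, n x q
XG <- cbind(X, G)                  # extended matrix [X | G], n x (p + q)
dim(XG)

# Rescale every column of G to a given Euclidean norm
# (the numeric form of standardizeG); target_norm is a hypothetical name.
target_norm <- 1
G_scaled <- sweep(G, 2, sqrt(colSums(G^2)), "/") * target_norm
sqrt(colSums(G_scaled^2))          # all equal to target_norm

# Hard thresholding keeps entries larger than tau in absolute value unchanged.
hard_threshold <- function(b, tau) ifelse(abs(b) > tau, b, 0)
# Soft thresholding additionally shrinks the kept entries towards zero by tau.
soft_threshold <- function(b, tau) sign(b) * pmax(abs(b) - tau, 0)

b <- c(-3, -0.5, 0, 0.8, 2)
hard_threshold(b, tau = 1)         # -3  0  0  0  2
soft_threshold(b, tau = 1)         # -2  0  0  0  1
```

Both rules set the same entries to zero; they differ only in the values of the entries that survive.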

Value

An object of class "lass0". It is a list containing the following components:

coefficients

estimated regression coefficients.

intercept

intercept value.

fitted.values

fitted values.

residuals

residuals.

selected

set of selected features.

tau

threshold value.

Betas

matrix of size p x M containing the values of the M estimates for the regression coefficients (on the standardized scale if standardizeX = TRUE).

Gammas

matrix of size q x M containing the values of the M obtained noise coefficient vectors (on the standardized scale unless standardizeG = FALSE).

madGammas

statistic based on the noise coefficients, corresponding to the MAD of all nonzero entries in Gammas.

sdsX

standard deviations of all columns of X. Can be used to transform Betas back to the original scale by computing Betas / sdsX.

qut.MC.output

either the list returned by qut.MC, or a character string explaining why qut.MC was not called.

quant.type

if tau is NULL, indicates the type of quantile used: "GEV" or "empirical" (even when GEVapprox = TRUE, the empirical quantile is used if gev.fit returns an error).

call

matched call.
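The rescaling Betas / sdsX and the intercept formula mean(y) - colMeans(X) %*% coefficients can be checked on stand-in objects in base R. The sketch below does not call lass0; the Betas matrix and the coefficients vector are made-up values with the documented shapes, used only to show how the arithmetic works.

```r
set.seed(2)
n <- 6; p <- 3; M <- 4
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Stand-ins for the sdsX and Betas components (p x M); values are made up.
sdsX  <- apply(X, 2, sd)
Betas <- matrix(rnorm(p * M), p, M)

# Dividing a p x M matrix by a length-p vector recycles the vector down
# each column, so row j is divided by sdsX[j].
Betas_orig <- Betas / sdsX
all.equal(Betas_orig, sweep(Betas, 1, sdsX, "/"))   # TRUE

# Intercept formula from the Arguments section, with made-up coefficients.
coefficients <- c(2, 0, -1)
intercept <- drop(mean(y) - colMeans(X) %*% coefficients)
fitted <- drop(intercept + X %*% coefficients)

# With this intercept, the fitted values have the same mean as y.
all.equal(mean(fitted), mean(y))                    # TRUE
```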

References

Descloux, P., & Sardy, S. (2018). Model selection with lasso-zero: adding straw to the haystack to better find needles. arXiv preprint arXiv:1805.05133. https://arxiv.org/abs/1805.05133

See Also

qut.MC

Examples

#### EXAMPLE 1: fast example with 5x10 input matrix and a small number 
#### (MCrep = 50) of Monte Carlo replications for computing QUT.

library(lass0)
set.seed(201)
## design matrix
n <- 5
p <- 10
X <- matrix(rnorm(n*p), n, p)
## sparse vector
S0 <- 1:2 # support
beta0 <- rep(0, p)
beta0[S0] <- 2
## response:
y <- X[, S0] %*% beta0[S0] + rnorm(n)
## lasso-zero:
lass0.obj <- lass0(X, y, alpha = 0.05, MCrep = 50)
betahat <- lass0.obj$coefficients
plot(lass0.obj)


#### EXAMPLE 2: with 50x100 input matrix


set.seed(202)
## design matrix
n <- 50
p <- 100
X <- matrix(rnorm(n*p), n, p)
## sparse vector
S0 <- 1:3 # support
beta0 <- rep(0, p)
beta0[S0] <- 2
## response:
y <- X[, S0] %*% beta0[S0] + rnorm(n)

## 1) lasso-zero tuned by QUT with unknown noise level
lass0.obj1 <- lass0(X, y, alpha = 0.05)
betahat1 <- lass0.obj1$coefficients
plot(lass0.obj1)

## 2) lasso-zero tuned by QUT with known noise level
lass0.obj2 <- lass0(X, y, alpha = 0.05, sigma = 1)
betahat2 <- lass0.obj2$coefficients

## 3) lasso-zero with fixed threshold tau = 1
lass0.obj3 <- lass0(X, y, tau = 1)
betahat3 <- lass0.obj3$coefficients
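Since the true support S0 is known in these simulations, one way to judge a fit is to compare the indices of the nonzero estimated coefficients with S0. The helper below, support_metrics, is a hypothetical name and not part of the lass0 package; it is a minimal base-R sketch, and the last lines only run if betahat3 and S0 from EXAMPLE 2 are present in the workspace.

```r
# Count correctly and incorrectly selected features from two index vectors.
support_metrics <- function(selected, true_support) {
  c(true_positives  = length(intersect(selected, true_support)),
    false_positives = length(setdiff(selected, true_support)),
    false_negatives = length(setdiff(true_support, selected)))
}

# Stand-alone check with made-up index sets (not lass0 output):
# features 1 and 2 are found, 7 is a false positive, 3 is missed.
support_metrics(selected = c(1, 2, 7), true_support = 1:3)

# Applied to the fixed-threshold fit from EXAMPLE 2, if it has been run.
if (exists("betahat3") && exists("S0")) {
  print(support_metrics(which(betahat3 != 0), S0))
}
```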

lass0 documentation built on Dec. 19, 2019, 1:09 a.m.