lasso.proj: P-values based on lasso projection method

Description Usage Arguments Value Author(s) References Examples

View source: R/lasso-proj.R

Description

Compute p-values based on the lasso projection method, also known as the de-sparsified Lasso, using an asymptotic gaussian approximation to the distribution of the estimator.

Usage

1
2
3
4
5
6
lasso.proj(x, y, family = "gaussian", standardize = TRUE,
           multiplecorr.method = "holm", N = 10000,
           parallel = FALSE, ncores = getOption("mc.cores", 2L),
           betainit = "cv lasso", sigma = NULL, Z = NULL, verbose = FALSE,
           return.Z = FALSE, suppress.grouptesting = FALSE, robust = FALSE,
	   do.ZnZ = FALSE)

Arguments

x

Design matrix (without intercept).

y

Response vector.

family

family

standardize

Should design matrix be standardized to unit column standard deviation.

multiplecorr.method

Either "WY" or any of p.adjust.methods.

N

Number of empirical samples (only used if multiplecorr.method == "WY")

parallel

Should parallelization be used? (logical)

ncores

Number of cores used for parallelization.

betainit

Either a numeric vector, corresponding to a sparse estimate of the coefficient vector, or the method to be used for the initial estimation, "scaled lasso" or "cv lasso".

sigma

Estimate of the standard deviation of the error term. This estimate needs to be compatible with the initial estimate (see betainit) provided or calculated. Otherwise, results will not be correct.

Z

user input, also see return.Z below

verbose

A boolean to enable reporting on the progress of the computations. (Only prints out information when Z is not provided by the user)

return.Z

An option to return the intermediate result which only depends on the design matrix x. This intermediate results can be used when calling the function again and the design matrix is the same as before.

suppress.grouptesting

A boolean to optionally suppress the preparations made for testing groups. This will avoid quite a bit of computation and memory usage. The output will also be smaller.

robust

Uses a robust variance estimation procedure to be able to deal with model misspecification.

do.ZnZ

Use a slightly different way of choosing tuning parameters to compute Z, called Z&Z based on Zhang and Zhang (2014). This choice of tuning parameter results in a slightly higher variance of the estimator. More concretely, it achieves a 25 variance of the estimator (over j=1..ncol(x)) in comparison to tuning with cross-validation.

Value

pval

Individual p-values for each parameter.

pval.corr

Multiple testing corrected p-values for each parameter.

groupTest

Function to perform groupwise tests. Groups are indicated using an index vector with entries in 1,...,p or a list thereof.

clusterGroupTest

Function to perform groupwise tests based on hierarchical clustering. You can either provide a distance matrix and clustering method or the output of hierarchical clustering from the function hclust as for clusterGroupBound. P-values are adjusted for multiple testing.

sigmahat

\widehat{σ} coming from the scaled lasso.

Z

Only different from NULL if the option return.Z is on. This is an intermediate result from the computation which only depends on the design matrix x. These are the residuals of the nodewise regressions.

Author(s)

Ruben Dezeure

References

van de Geer, S., B<c3><bc>hlmann, P., Ritov, Y. and Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42, 1166–1202._

Zhang, C., Zhang, S. (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B 76, 217–242.

B<c3><bc>hlmann, P. and van de Geer, S. (2015) High-dimensional inference in misspecified linear models. Electronic Journal of Statistics 9, 1449–1473.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
x <- matrix(rnorm(100*20), nrow = 100, ncol = 10)
y <- x[,1] + x[,2] + rnorm(100)
fit.lasso <- lasso.proj(x, y)
which(fit.lasso$pval.corr < 0.05) # typically: '1' and '2' and no other

## Group-wise testing of the first two coefficients
fit.lasso$groupTest(1:2)

##Compute confidence intervals
confint(fit.lasso, level = 0.95)


## Hierarchical testing using distance matrix based on
## correlation matrix
out.clust <- fit.lasso$clusterGroupTest()
plot(out.clust)

## Fit the lasso projection method without doing the preparations
## for group testing (saves time and memory)
fit.lasso.faster <- lasso.proj(x, y, suppress.grouptesting = TRUE)

## Use the scaled lasso for the initial estimate
fit.lasso.scaled <- lasso.proj(x, y, betainit = "scaled lasso")
which(fit.lasso.scaled$pval.corr < 0.05)

## Use a robust estimate for the standard error
fit.lasso.robust <- lasso.proj(x, y, robust = TRUE)
which(fit.lasso.robust$pval.corr < 0.05)


## Perform the Z&Z version of the lasso projection method
fit.lasso <- lasso.proj(x, y, do.ZnZ = TRUE)

which(fit.lasso$pval.corr < 0.05) # typically: '1' and '2' and no other

hdi documentation built on May 31, 2017, 2:57 a.m.