lasso.proj: P-values based on lasso projection method

In hdi: High-Dimensional Inference

Description

Compute p-values based on the lasso projection method, also known as the de-sparsified Lasso, using an asymptotic Gaussian approximation to the distribution of the estimator.
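As a rough illustration of what a Gaussian approximation means for the p-values, the sketch below turns point estimates and standard errors into two-sided p-values using base R only. The numbers `bhat` and `se` are made up for illustration and are not output of `lasso.proj`; the function's internal computation may differ in detail.

```r
# Hypothetical point estimates and standard errors for three coefficients
# (illustrative numbers, not produced by lasso.proj).
bhat <- c(1.10, 0.05, -0.80)
se   <- c(0.20, 0.25, 0.30)

# Under the Gaussian approximation, bhat / se is treated as approximately
# standard normal when the true coefficient is zero.
z <- bhat / se

# Two-sided p-values from the standard normal distribution.
pval <- 2 * pnorm(abs(z), lower.tail = FALSE)
print(round(pval, 4))
```

With these assumed numbers, the first and third coefficients have small p-values while the second does not.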

Usage

```
lasso.proj(x, y, family = "gaussian", standardize = TRUE,
           multiplecorr.method = "holm", N = 10000,
           parallel = FALSE, ncores = getOption("mc.cores", 2L),
           betainit = "cv lasso", sigma = NULL, Z = NULL, verbose = FALSE,
           return.Z = FALSE, suppress.grouptesting = FALSE, robust = FALSE,
           do.ZnZ = FALSE)
```

Arguments

`x` Design matrix (without intercept).

`y` Response vector.

`family` Response family; the default is "gaussian".

`standardize` Should the design matrix be standardized to unit column standard deviation? (logical)

`multiplecorr.method` Either "WY" or any of `p.adjust.methods`.

`N` Number of empirical samples (only used if `multiplecorr.method == "WY"`).

`parallel` Should parallelization be used? (logical)

`ncores` Number of cores used for parallelization.

`betainit` Either a numeric vector corresponding to a sparse estimate of the coefficient vector, or the method to be used for the initial estimation: "scaled lasso" or "cv lasso".

`sigma` Estimate of the standard deviation of the error term. This estimate must be compatible with the initial estimate (see `betainit`), whether provided or calculated; otherwise the results will not be correct.

`Z` User-supplied intermediate result; see `return.Z` below.

`verbose` A boolean to enable reporting on the progress of the computations. Information is only printed when `Z` is not provided by the user.

`return.Z` An option to return the intermediate result, which depends only on the design matrix `x`. This intermediate result can be reused when calling the function again with the same design matrix.

`suppress.grouptesting` A boolean to optionally suppress the preparations made for testing groups. This avoids a substantial amount of computation and memory usage, and the output is also smaller.

`robust` Use a robust variance estimation procedure to deal with model misspecification.

`do.ZnZ` Use a slightly different way of choosing tuning parameters to compute `Z`, called Z&Z, based on Zhang and Zhang (2014). This choice of tuning parameter results in a slightly higher variance of the estimator. More concretely, it yields a 25% higher variance of the estimator (over j = 1..ncol(x)) in comparison to tuning with cross-validation.
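To make the `multiplecorr.method` options more concrete, here is a minimal base-R sketch of what corrections named in `p.adjust.methods` do to a vector of raw p-values. The p-values are made up for illustration, and the sketch does not claim to reproduce how `lasso.proj` applies the correction internally.

```r
# Hypothetical raw p-values for five coefficients (made up for illustration).
pval <- c(0.001, 0.012, 0.030, 0.200, 0.800)

# All method names that p.adjust() accepts.
print(p.adjust.methods)

# "holm" is the default value of multiplecorr.method; "BH" is shown for
# comparison as another entry of p.adjust.methods.
pval.holm <- p.adjust(pval, method = "holm")
pval.bh   <- p.adjust(pval, method = "BH")

# Coefficients still significant at the 5% level after Holm correction.
which(pval.holm < 0.05)
```

Holm controls the family-wise error rate and is therefore at least as conservative as the false-discovery-rate correction "BH" on the same input.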

Value

`pval` Individual p-values for each parameter.

`pval.corr` Multiple testing corrected p-values for each parameter.

`groupTest` Function to perform groupwise tests. Groups are indicated using an index vector with entries in 1,...,p or a list thereof.

`clusterGroupTest` Function to perform groupwise tests based on hierarchical clustering. You can either provide a distance matrix and clustering method or the output of hierarchical clustering from the function `hclust` as for `clusterGroupBound`. P-values are adjusted for multiple testing.

`sigmahat` $\widehat{\sigma}$ coming from the scaled lasso.

`Z` Only different from NULL if the option `return.Z` is on. This is an intermediate result from the computation which only depends on the design matrix `x`. These are the residuals of the nodewise regressions.
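Because `Z` depends only on the design matrix, it can be stored from one call and passed to a later call on the same `x`. The sketch below assumes the hdi package is installed; the second response `y2` is a hypothetical example added for illustration.

```r
library(hdi)

set.seed(1)
x  <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y1 <- x[, 1] + x[, 2] + rnorm(100)
y2 <- x[, 3] + rnorm(100)  # hypothetical second response on the same design

# First call: request that Z be returned alongside the usual output.
fit1 <- lasso.proj(x, y1, return.Z = TRUE)

# Second call: supply the stored Z so it does not need to be recomputed.
fit2 <- lasso.proj(x, y2, Z = fit1$Z)

which(fit2$pval.corr < 0.05)
```

This is only worthwhile when the design matrix is exactly the same in both calls, as the argument description requires.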

Author(s)

Ruben Dezeure

References

van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42, 1166–1202.

Zhang, C. and Zhang, S. (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B 76, 217–242.

Bühlmann, P. and van de Geer, S. (2015) High-dimensional inference in misspecified linear models. Electronic Journal of Statistics 9, 1449–1473.

Examples

```
library(hdi)

## Simulate 100 observations of 10 predictors; only the first two matter
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y <- x[, 1] + x[, 2] + rnorm(100)
fit.lasso <- lasso.proj(x, y)
which(fit.lasso$pval.corr < 0.05) # typically: '1' and '2' and no other

## Group-wise testing of the first two coefficients
fit.lasso$groupTest(1:2)

## Compute confidence intervals
confint(fit.lasso, level = 0.95)

## Hierarchical testing using distance matrix based on
## correlation matrix
out.clust <- fit.lasso$clusterGroupTest()
plot(out.clust)

## Fit the lasso projection method without doing the preparations
## for group testing (saves time and memory)
fit.lasso.faster <- lasso.proj(x, y, suppress.grouptesting = TRUE)

## Use the scaled lasso for the initial estimate
fit.lasso.scaled <- lasso.proj(x, y, betainit = "scaled lasso")
which(fit.lasso.scaled$pval.corr < 0.05)

## Use a robust estimate for the standard error
fit.lasso.robust <- lasso.proj(x, y, robust = TRUE)
which(fit.lasso.robust$pval.corr < 0.05)

## Perform the Z&Z version of the lasso projection method
fit.lasso <- lasso.proj(x, y, do.ZnZ = TRUE)
which(fit.lasso$pval.corr < 0.05) # typically: '1' and '2' and no other
```

hdi documentation built on May 31, 2017, 2:57 a.m.