# ROSI: Relevant One-step Selective Inference for the LASSO In selectiveInference: Tools for Post-Selection Inference

## Description

Compute p-values and confidence intervals for the lasso estimate, at a fixed value of the tuning parameter lambda using the "relevant" conditioning event of arxiv.org/1801.09037.

## Usage

  1 2 3 4 5 6 7 8 9 10 11 12 13 ROSI(X, y, soln, lambda, penalty_factor=NULL, dispersion=1, family=c('gaussian', 'binomial'), solver=c('QP', 'glmnet'), construct_ci=TRUE, debiasing_method=c("JM", "BN"), verbose=FALSE, level=0.9, use_debiased=TRUE) 

## Arguments

 X Matrix of predictors (n by p); y Vector of outcomes (length n) soln Estimated lasso coefficients (e.g., from glmnet). This is of length p (so the intercept is not included as the first component). Be careful! This function uses the "standard" lasso objective 1/2 \|y - X β\|_2^2 + λ \|β\|_1. In contrast, glmnet multiplies the first term by a factor of 1/n. So after running glmnet, to extract the beta corresponding to a value lambda, you need to use beta = coef(obj, s=lambda/n)[-1], where obj is the object returned by glmnet (and [-1] removes the intercept, which glmnet always puts in the first component) lambda Value of lambda used to compute beta. See the above warning penalty_factor Penalty factor as used by glmnet. Actual penalty used in solving the problem is λ \cdot ∑_{i=1}^p f_i |β_i| with f being the penalty_factor. Defaults to vector of 1s. dispersion Estimate of dispersion in the GLM. Can be taken to be 1 for logisitic and should be an estimate of the error variance for the Gaussian. family Family used for likelihood. solver Solver used to solve restricted problems needed to find truncation set. Each active variable requires solving a new LASSO problem obtained by zeroing out one coordinate of original problem. The "QP" choice uses coordinate descent for a specific value of lambda, rather than glmnet which would solve for a new path each time. construct_ci Report confidence intervals or just p-values? debiasing_method Which method should be used for debiasing? Choices are "JM" (Javanmard, Montanari) or "BN" (method described in arxiv.org/1703.03282). verbose Print out progress along the way? Default is FALSE. level Confidence level for intervals. use_debiased Use the debiased estimate of the parameter or not. When FALSE, this is the method desribed in arxiv.org/1801.09037. The default TRUE often produces noticably shorter intervals and more powerful tests when p is comparable to n. Ignored when n

p, this agrees with method in arxiv.org/1801.09037.

???

## Value

 active_set Active set of LASSO. pvalues Two-sided P-values for active variables. intervals Confidence intervals estimate Relaxed (i.e. unshrunk) selected estimates. std_err Standard error of relaxed estimates (pre-selection). dispersion Dispersion parameter. lower_trunc Lower truncation point. The estimates should be outside the interval formed by the lower and upper truncation poitns. upper_trunc Lower truncation point. The estimates should be outside the interval formed by the lower and upper truncation poitns. lambda Value of tuning parameter lambda used. penalty_factor Penalty factor used for solving problem. level Confidence level. call The call to fixedLassoInf.

## Author(s)

Jelena Markovic, Jonathan Taylor

## References

Keli Liu, Jelena Markovic, Robert Tibshirani. More powerful post-selection inference, with application to the Lasso. arXiv:1801.09037

Tom Boot, Didier Nibbering. Inference in high-dimensional linear regression models. arXiv:1703.03282

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 library(selectiveInference) library(glmnet) set.seed(43) n = 100 p = 200 s = 2 sigma = 1 x = matrix(rnorm(n*p),n,p) x = scale(x,TRUE,TRUE) beta = c(rep(10, s), rep(0,p-s)) / sqrt(n) y = x %*% beta + sigma*rnorm(n) # first run glmnet gfit = glmnet(x,y,standardize=FALSE) # extract coef for a given lambda; note the 1/n factor! # (and we don't save the intercept term) lambda = 4 * sqrt(n) lambda_glmnet = 4 / sqrt(n) beta = selectiveInference:::solve_problem_glmnet(x, y, lambda_glmnet, penalty_factor=rep(1, p), family="gaussian") # compute fixed lambda p-values and selection intervals out = ROSI(x, y, beta, lambda, dispersion=sigma^2) out # an alternate approximate inverse from Boot and Nibbering out = ROSI(x, y, beta, lambda, dispersion=sigma^2, debiasing_method="BN") out 

selectiveInference documentation built on Sept. 7, 2019, 9:02 a.m.