Functions to perform post-selection inference for forward stepwise regression, least angle regression, the lasso and the many normal means problem. The lasso function also supports logistic regression and the Cox model.
Package: selectiveInference
Type: Package
License: GPL-2
This package provides tools for inference after selection in forward stepwise regression, least angle regression, the lasso, and the many normal means problem. The functions compute p-values and selection intervals that properly account for the inherent selection carried out by the procedure. These have exact finite-sample type I error and coverage under Gaussian errors. For the logistic and Cox families (fixedLassoInf), the coverage is asymptotically valid.
This R package was developed as part of the selective inference software project in Python and R:
https://github.com/selective-inference
Some of the R code in this work is a modification of Python code from this repository. Here is the current selective inference software team:
Yuval Benjamini, Leonard Blier, Will Fithian, Jason Lee, Joshua Loftus, Stephen Reid, Dennis Sun, Yuekai Sun, Jonathan Taylor, Xiaoying Tian, Ryan Tibshirani, Rob Tibshirani
The main functions included in the package are: fs, fsInf, lar, larInf, fixedLassoInf, manyMeans.
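Of the main functions, lar and larInf (least angle regression) do not appear in the Examples below; they follow the same pattern as fs and fsInf. A minimal sketch, mirroring the data-generating setup used in the Examples:

```r
# least angle regression analogue of the forward stepwise example
set.seed(33)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# run least angle regression
larfit = lar(x,y)
# sequential p-values and selection intervals along the LAR path
out.lar = larInf(larfit)
out.lar
```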
Author(s): Ryan Tibshirani, Rob Tibshirani, Jonathan Taylor, Joshua Loftus, Stephen Reid
Maintainer: Rob Tibshirani <tibs@stanford.edu>
Ryan Tibshirani, Jonathan Taylor, Richard Lockhart, and Rob Tibshirani (2014). Exact post-selection inference for sequential regression procedures. arXiv:1401.3889.
Jason Lee, Dennis Sun, Yuekai Sun, and Jonathan Taylor (2013). Exact post-selection inference, with application to the lasso. arXiv:1311.6238.
Stephen Reid, Jonathan Taylor, and Rob Tibshirani (2014). Post-selection point and interval estimation of signal sizes in Gaussian samples. arXiv:1405.3340.
Jonathan Taylor and Robert Tibshirani (2016). Post-selection inference for L1-penalized likelihood models. arXiv:1602.07358.
set.seed(33)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# run forward stepwise
fsfit = fs(x,y)
# compute sequential p-values and confidence intervals
# (sigma estimated from full model)
out.seq = fsInf(fsfit)
out.seq
# compute p-values and confidence intervals after AIC stopping
out.aic = fsInf(fsfit,type="aic")
out.aic
# compute p-values and confidence intervals after 5 fixed steps
out.fix = fsInf(fsfit,type="all",k=5)
out.fix
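Beyond printing, the objects returned by fsInf are lists whose components can be used programmatically. A sketch, assuming the component names (vars, pv, ci) documented on the fsInf help page:

```r
# inspect components of the sequential results computed above
# (component names vars, pv, ci are as documented for fsInf)
out.seq$vars   # variables in the order they entered the model
out.seq$pv     # sequential p-values
out.seq$ci     # selection intervals, one row per step
```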
## NOT RUN---lasso at fixed lambda- Gaussian family
## first run glmnet
# gfit = glmnet(x,y)
## extract coef for a given lambda; note the 1/n factor!
## (and we don't save the intercept term)
# lambda = .1
# beta = coef(gfit, s=lambda/n, exact=TRUE)[-1]
## compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,y,beta,lambda,sigma=sigma)
# out
## NOT RUN---lasso at fixed lambda- logistic family
#set.seed(43)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# x=scale(x,TRUE,TRUE)
#
# beta = c(3,2,rep(0,p-2))
# y = x%*%beta + sigma*rnorm(n)
# y=1*(y>mean(y))
# first run glmnet
# gfit = glmnet(x,y,standardize=FALSE,family="binomial")
# extract coef for a given lambda; note the 1/n factor!
# (and here we DO include the intercept term)
# lambda = .8
# beta = coef(gfit, s=lambda/n, exact=TRUE)
# # compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,y,beta,lambda,family="binomial")
# out
## NOT RUN---lasso at fixed lambda- Cox family
#set.seed(43)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# x=scale(x,TRUE,TRUE)
# beta = c(3,2,rep(0,p-2))
# tim = as.vector(x%*%beta + sigma*rnorm(n))
# tim= tim-min(tim)+1
#status=sample(c(0,1),size=n,replace=T)
# first run glmnet
# gfit = glmnet(x,Surv(tim,status),standardize=FALSE,family="cox")
# extract coef for a given lambda; note the 1/n factor!
# lambda = 1.5
# beta = as.numeric(coef(gfit, s=lambda/n, exact=TRUE))
# compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,tim,beta,lambda,status=status,family="cox")
# out
## NOT RUN---many normal means
# set.seed(12345)
# n = 100
# mu = c(rep(3,floor(n/5)), rep(0,n-floor(n/5)))
# y = mu + rnorm(n)
# out = manyMeans(y, bh.q=0.1)
# out
## NOT RUN---forward stepwise with groups
# set.seed(1)
# n = 20
# p = 40
# x = matrix(rnorm(n*p), nrow=n)
# index = sort(rep(1:(p/2), 2))
# y = rnorm(n) + 2 * x[,1] - x[,4]
# fit = groupfs(x, y, index, maxsteps = 5)
# out = groupfsInf(fit)
# out
## NOT RUN---estimation of sigma for use in fsInf
## (or larInf or fixedLassoInf)
# set.seed(33)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# beta = c(3,2,rep(0,p-2))
# y = x%*%beta + sigma*rnorm(n)
## run forward stepwise
# fsfit = fs(x,y)
## estimate sigma
# sigmahat = estimateSigma(x,y)$sigmahat
## run sequential inference with estimated sigma
# out = fsInf(fsfit,sigma=sigmahat)
# out
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16
Loading required package: intervals
Attaching package: 'intervals'
The following object is masked from 'package:Matrix':
expand
Loading required package: survival
Call:
fsInf(obj = fsfit)
Standard deviation of noise (specified or estimated) sigma = 1.027
Sequential testing results with alpha = 0.100
Step Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 1 2.317 13.406 0.000 2.019 2.605 0.049 0.048
2 2 1.703 12.996 0.000 1.486 1.922 0.048 0.050
3 9 -0.265 -1.683 0.487 -0.782 1.152 0.050 0.050
4 8 -0.175 -1.156 0.260 -4.764 1.532 0.050 0.050
5 10 0.173 1.075 0.755 -12.195 3.056 0.050 0.050
6 4 -0.178 -1.140 0.407 -11.057 7.428 0.050 0.050
7 7 0.158 0.979 0.763 -9.225 2.137 0.050 0.050
8 5 0.128 0.896 0.838 -6.737 0.737 0.050 0.050
9 6 -0.036 -0.225 0.303 -Inf Inf 0.000 0.000
10 3 0.037 0.255 0.121 -1.478 Inf 0.050 0.000
Estimated stopping point from ForwardStop rule = 2
Call:
fsInf(obj = fsfit, type = "aic")
Standard deviation of noise (specified or estimated) sigma = 1.027
Testing results at step = 3, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 2.807 15.850 0.000 2.510 3.099 0.049 0.050
2 1.722 13.093 0.000 1.499 1.942 0.049 0.049
9 -0.265 -1.683 0.556 -0.753 1.502 0.050 0.050
Estimated stopping point from AIC rule = 3
Call:
fsInf(obj = fsfit, k = 5, type = "all")
Standard deviation of noise (specified or estimated) sigma = 1.027
Testing results at step = 5, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 2.788 15.588 0.000 2.170 3.144 0.05 0.05
2 1.721 13.013 0.000 1.518 2.547 0.05 0.05
9 -0.214 -1.334 0.409 -1.815 1.120 0.05 0.00
8 -0.219 -1.393 0.323 -5.715 2.656 0.05 0.05
10 0.173 1.075 0.755 -12.195 3.056 0.05 0.05