Functions to perform post-selection inference for forward stepwise regression, least angle regression, the lasso and the many normal means problem. The lasso function also supports logistic regression and the Cox model.
Package: selectiveInference
Type: Package
License: GPL-2
This package provides tools for inference after selection in forward stepwise regression, least angle regression, the lasso, and the many normal means problem. The functions compute p-values and selection intervals that properly account for the inherent selection carried out by the procedure. These have exact finite-sample type I error and coverage under Gaussian errors. For the logistic and Cox families (fixedLassoInf), the coverage is asymptotically valid.
This R package was developed as part of the selective inference software project in Python and R:
https://github.com/selective-inference
Some of the R code in this work is a modification of Python code from this repository. Here is the current selective inference software team:
Yuval Benjamini, Leonard Blier, Will Fithian, Jason Lee, Joshua Loftus, Stephen Reid, Dennis Sun, Yuekai Sun, Jonathan Taylor, Xiaoying Tian, Ryan Tibshirani, Rob Tibshirani
The main functions included in the package are: fs, fsInf, lar, larInf, fixedLassoInf, manyMeans.
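Of the main functions, lar and larInf (least angle regression) do not appear in the Examples below; they follow the same pattern as fs and fsInf. A minimal sketch, mirroring the data-generating setup used in the Examples:

```r
# least angle regression analogue of the forward stepwise example
set.seed(33)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# run least angle regression
larfit = lar(x,y)
# sequential p-values and selection intervals along the LAR path
out.lar = larInf(larfit)
out.lar
```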
Author(s): Ryan Tibshirani, Rob Tibshirani, Jonathan Taylor, Joshua Loftus, Stephen Reid
Maintainer: Rob Tibshirani <tibs@stanford.edu>
Ryan Tibshirani, Jonathan Taylor, Richard Lockhart, and Rob Tibshirani (2014). Exact post-selection inference for sequential regression procedures. arXiv:1401.3889.
Jason Lee, Dennis Sun, Yuekai Sun, and Jonathan Taylor (2013). Exact post-selection inference, with application to the lasso. arXiv:1311.6238.
Stephen Reid, Jonathan Taylor, and Rob Tibshirani (2014). Post-selection point and interval estimation of signal sizes in Gaussian samples. arXiv:1405.3340.
Jonathan Taylor and Robert Tibshirani (2016). Post-selection inference for L1-penalized likelihood models. arXiv:1602.07358.
set.seed(33)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# run forward stepwise
fsfit = fs(x,y)
# compute sequential p-values and confidence intervals
# (sigma estimated from full model)
out.seq = fsInf(fsfit)
out.seq
# compute p-values and confidence intervals after AIC stopping
out.aic = fsInf(fsfit,type="aic")
out.aic
# compute p-values and confidence intervals after 5 fixed steps
out.fix = fsInf(fsfit,type="all",k=5)
out.fix
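Beyond printing, the objects returned by fsInf are lists whose components can be used programmatically. A sketch, assuming the component names (vars, pv, ci) documented on the fsInf help page:

```r
# inspect components of the sequential results computed above
# (component names vars, pv, ci are as documented for fsInf)
out.seq$vars   # variables in the order they entered the model
out.seq$pv     # sequential p-values
out.seq$ci     # selection intervals, one row per step
```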
## NOT RUN---lasso at fixed lambda- Gaussian family
## first run glmnet
# gfit = glmnet(x,y)
## extract coef for a given lambda; note the 1/n factor!
## (and we don't save the intercept term)
# lambda = .1
# beta = coef(gfit, s=lambda/n, exact=TRUE)[-1]
## compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,y,beta,lambda,sigma=sigma)
# out
## NOT RUN---lasso at fixed lambda- logistic family
#set.seed(43)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# x=scale(x,TRUE,TRUE)
#
# beta = c(3,2,rep(0,p-2))
# y = x%*%beta + sigma*rnorm(n)
# y=1*(y>mean(y))
# first run glmnet
# gfit = glmnet(x,y,standardize=FALSE,family="binomial")
# extract coef for a given lambda; note the 1/n factor!
# (and here we DO include the intercept term)
# lambda = .8
# beta = coef(gfit, s=lambda/n, exact=TRUE)
# # compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,y,beta,lambda,family="binomial")
# out
## NOT RUN---lasso at fixed lambda- Cox family
#set.seed(43)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# x=scale(x,TRUE,TRUE)
# beta = c(3,2,rep(0,p-2))
# tim = as.vector(x%*%beta + sigma*rnorm(n))
# tim= tim-min(tim)+1
#status=sample(c(0,1),size=n,replace=T)
# first run glmnet
# gfit = glmnet(x,Surv(tim,status),standardize=FALSE,family="cox")
# extract coef for a given lambda; note the 1/n factor!
# lambda = 1.5
# beta = as.numeric(coef(gfit, s=lambda/n, exact=TRUE))
# compute fixed lambda p-values and selection intervals
# out = fixedLassoInf(x,tim,beta,lambda,status=status,family="cox")
# out
## NOT RUN---many normal means
# set.seed(12345)
# n = 100
# mu = c(rep(3,floor(n/5)), rep(0,n-floor(n/5)))
# y = mu + rnorm(n)
# out = manyMeans(y, bh.q=0.1)
# out
## NOT RUN---forward stepwise with groups
# set.seed(1)
# n = 20
# p = 40
# x = matrix(rnorm(n*p), nrow=n)
# index = sort(rep(1:(p/2), 2))
# y = rnorm(n) + 2 * x[,1] - x[,4]
# fit = groupfs(x, y, index, maxsteps = 5)
# out = groupfsInf(fit)
# out
## NOT RUN---estimation of sigma for use in fsInf
## (or larInf or fixedLassoInf)
# set.seed(33)
# n = 50
# p = 10
# sigma = 1
# x = matrix(rnorm(n*p),n,p)
# beta = c(3,2,rep(0,p-2))
# y = x%*%beta + sigma*rnorm(n)
## run forward stepwise
# fsfit = fs(x,y)
## estimate sigma
# sigmahat = estimateSigma(x,y)$sigmahat
## run sequential inference with estimated sigma
# out = fsInf(fsfit,sigma=sigmahat)
# out
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16
Loading required package: intervals
Attaching package: 'intervals'
The following object is masked from 'package:Matrix':
expand
Loading required package: survival
Call:
fsInf(obj = fsfit)
Standard deviation of noise (specified or estimated) sigma = 1.027
Sequential testing results with alpha = 0.100
Step Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 1 2.317 13.406 0.000 2.019 2.605 0.049 0.048
2 2 1.703 12.996 0.000 1.486 1.922 0.048 0.050
3 9 -0.265 -1.683 0.487 -0.782 1.152 0.050 0.050
4 8 -0.175 -1.156 0.260 -4.764 1.532 0.050 0.050
5 10 0.173 1.075 0.755 -12.195 3.056 0.050 0.050
6 4 -0.178 -1.140 0.407 -11.057 7.428 0.050 0.050
7 7 0.158 0.979 0.763 -9.225 2.137 0.050 0.050
8 5 0.128 0.896 0.838 -6.737 0.737 0.050 0.050
9 6 -0.036 -0.225 0.303 -Inf Inf 0.000 0.000
10 3 0.037 0.255 0.121 -1.478 Inf 0.050 0.000
Estimated stopping point from ForwardStop rule = 2
Call:
fsInf(obj = fsfit, type = "aic")
Standard deviation of noise (specified or estimated) sigma = 1.027
Testing results at step = 3, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 2.807 15.850 0.000 2.510 3.099 0.049 0.050
2 1.722 13.093 0.000 1.499 1.942 0.049 0.049
9 -0.265 -1.683 0.556 -0.753 1.502 0.050 0.050
Estimated stopping point from AIC rule = 3
Call:
fsInf(obj = fsfit, k = 5, type = "all")
Standard deviation of noise (specified or estimated) sigma = 1.027
Testing results at step = 5, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 2.788 15.588 0.000 2.170 3.144 0.05 0.05
2 1.721 13.013 0.000 1.518 2.547 0.05 0.05
9 -0.214 -1.334 0.409 -1.815 1.120 0.05 0.00
8 -0.219 -1.393 0.323 -5.715 2.656 0.05 0.05
10 0.173 1.075 0.755 -12.195 3.056 0.05 0.05