fsInf: Selective inference for forward stepwise regression


View source: R/funs.fs.R

Description

Computes p-values and confidence intervals for forward stepwise regression

Usage

fsInf(obj, sigma=NULL, alpha=0.1, k=NULL, type=c("active","all","aic"), 
      gridrange=c(-100,100), bits=NULL, mult=2, ntimes=2, verbose=FALSE) 

Arguments

obj

Object returned by the fs function

sigma

Estimate of error standard deviation. If NULL (default), this is estimated using the mean squared residual of the full least squares fit when n >= 2p, and using the standard deviation of y when n < 2p. In the latter case, the user should use the estimateSigma function for a more accurate estimate

alpha

Significance level for confidence intervals (target is miscoverage alpha/2 in each tail)

k

See "type" argument below. Default is NULL, in which case k is taken to be the number of steps computed in the forward stepwise path

type

Type of analysis desired: with "active" (default), p-values and confidence intervals are computed for each predictor as it enters the active set, all the way through k steps; with "all", p-values and confidence intervals are computed for all variables in the active model after k steps; with "aic", the number of steps k is first estimated using a modified AIC criterion, and then the same type of analysis as in "all" is carried out for this particular value of k.

Note that the AIC scheme is defined to choose a number of steps k after which the AIC criterion increases ntimes in a row, where ntimes can be specified by the user (see below). Under this definition, the AIC selection event is characterizable as a polyhedral set, and hence the extra conditioning can be taken into account exactly. Also note that an analogous BIC scheme can be specified through the mult argument (see below)

gridrange

Grid range for constructing confidence intervals, on the standardized scale

bits

Number of bits to be used for p-value and confidence interval calculations. Default is NULL, in which case standard floating point calculations are performed. When not NULL, multiple precision floating point calculations are performed with the specified number of bits, using the R package Rmpfr (if this package is not installed, then a warning is thrown, and standard floating point calculations are pursued). Note: standard double precision uses 53 bits so, e.g., a choice of 200 bits uses about 4 times double precision. The confidence interval computation is sometimes numerically challenging, and the extra precision can be helpful (though computationally more costly). In particular, extra precision might be tried if the values in the output columns of tailarea differ noticeably from alpha/2.

mult

Multiplier for the AIC-style penalty. Hence a value of 2 (default) gives AIC, whereas a value of log(n) would give BIC

ntimes

Number of consecutive steps for which the AIC-style criterion has to increase before the minimizing point is declared

verbose

Print out progress along the way? Default is FALSE
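
The stopping rule described under the type, mult and ntimes arguments can be illustrated with a minimal base-R sketch. This is not the package's internal code: the function names stop_after_increases and aic_style are hypothetical, and the criterion form RSS + mult * sigma^2 * (number of steps) is an assumption chosen so that mult = 2 corresponds to AIC and mult = log(n) to BIC.

```r
# Illustrative only: NOT the implementation used inside fsInf.
# crit[i] is an AIC-style criterion value for the model after i steps.
# Returns the step just before the first run of `ntimes` consecutive
# increases, or length(crit) if no such run occurs.
stop_after_increases <- function(crit, ntimes = 2) {
  run <- 0
  for (i in seq_along(crit)[-1]) {
    if (crit[i] > crit[i - 1]) run <- run + 1 else run <- 0
    if (run == ntimes) return(i - ntimes)
  }
  length(crit)
}

# Assumed criterion form: residual sum of squares plus a penalty per step.
aic_style <- function(rss, sigma, mult = 2) {
  rss + mult * sigma^2 * seq_along(rss)
}

# Made-up residual sums of squares along a path of 6 steps
rss <- c(40, 12, 11.5, 11.4, 11.3, 11.25)
crit <- aic_style(rss, sigma = 1)
khat <- stop_after_increases(crit, ntimes = 2)
khat
```

With these made-up values the criterion falls at step 2 and then rises at steps 3 and 4, so the sketch stops at step 2. A larger ntimes makes the rule more tolerant of short upticks in the criterion.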

Details

This function computes selective p-values and confidence intervals (selection intervals) for forward stepwise regression. The default is to report the results for each predictor after its entry into the model. See the "type" argument for other options. The confidence interval construction involves numerical search and can be fragile: if the observed statistic is too close to either end of the truncation interval (vlo and vup, see references), then one or possibly both endpoints of the interval of desired coverage cannot be computed, and default to +/- Inf. The output tailarea gives the achieved Gaussian tail areas for the reported intervals; these should be close to alpha/2, and can be used for error-checking purposes.
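
The error check on tailarea can be sketched as follows. The helper flag_unreliable_ci and its tolerance tol are hypothetical, and the sketch assumes tailarea is a two-column matrix (lower, upper) with one row per interval. The matrix below is mock data that only resembles the printed output, not a real fsInf result.

```r
# Hypothetical helper: return the row indices of intervals whose realized
# tail areas differ from the target alpha/2 by more than `tol`.
flag_unreliable_ci <- function(tailarea, alpha = 0.1, tol = 0.01) {
  which(apply(abs(tailarea - alpha / 2) > tol, 1, any))
}

# Mock tail areas (lower, upper); rows 3 and 4 mimic intervals with an
# infinite endpoint, where the realized tail area drops to 0.
ta <- rbind(c(0.049, 0.048),
            c(0.050, 0.050),
            c(0.000, 0.000),
            c(0.050, 0.000))
flagged <- flag_unreliable_ci(ta, alpha = 0.1)
flagged
```

Given a fitted object out, the same check would be called as flag_unreliable_ci(out$tailarea, out$alpha). For any flagged rows, rerunning fsInf with a non-NULL bits argument may be worth trying.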

Value

type

Type of analysis (active, all, or aic)

k

Value of k specified in call

khat

When type is "active", this is an estimated stopping point declared by forwardStop; when type is "aic", this is the value chosen by the modified AIC scheme

pv

One-sided p-values for active variables, computed using the sign with which each variable entered the model

ci

Confidence intervals

tailarea

Realized tail areas (lower and upper) for each confidence interval

vlo

Lower truncation limits for statistics

vup

Upper truncation limits for statistics

vmat

Linear contrasts that define the observed statistics

y

Vector of outcomes

vars

Variables in active set

sign

Signs of active coefficients

alpha

Significance level used for the confidence intervals (target miscoverage alpha/2 in each tail)

sigma

Value of error standard deviation (sigma) used

call

The call to fsInf

Author(s)

Ryan Tibshirani, Rob Tibshirani, Jonathan Taylor, Joshua Loftus, Stephen Reid

References

Ryan Tibshirani, Jonathan Taylor, Richard Lockhart, and Rob Tibshirani (2014). Exact post-selection inference for sequential regression procedures. arXiv:1401.3889.

Joshua Loftus and Jonathan Taylor (2014). A significance test for forward stepwise model selection. arXiv:1405.3920.

See Also

fs

Examples

library(selectiveInference)

set.seed(33)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)

# run forward stepwise
fsfit = fs(x,y)

# compute sequential p-values and confidence intervals
# (sigma estimated from full model)
out.seq = fsInf(fsfit)
out.seq

# compute p-values and confidence intervals after AIC stopping
out.aic = fsInf(fsfit,type="aic")
out.aic

# compute p-values and confidence intervals after 5 fixed steps
out.fix = fsInf(fsfit,type="all",k=5)
out.fix
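
A further sketch, not part of the original examples, shows two options mentioned under Arguments: supplying sigma from estimateSigma, and using mult = log(n) for a BIC-style stopping rule. It assumes that estimateSigma(x, y) returns a list with a component sigmahat. The data are regenerated so the block runs on its own, and no output is shown because it has not been run here.

```r
library(selectiveInference)

set.seed(33)
n = 50
p = 10
x = matrix(rnorm(n*p),n,p)
y = x%*%c(3,2,rep(0,p-2)) + rnorm(n)

# run forward stepwise
fsfit = fs(x,y)

# estimate sigma with estimateSigma rather than relying on the default
sigmahat = estimateSigma(x,y)$sigmahat
out.sig = fsInf(fsfit,sigma=sigmahat)
out.sig

# BIC-style stopping: penalty multiplier log(n) in place of 2
out.bic = fsInf(fsfit,type="aic",mult=log(n))
out.bic
```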

Example output

Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

Loading required package: intervals

Attaching package: 'intervals'

The following object is masked from 'package:Matrix':

    expand

Loading required package: survival

Call:
fsInf(obj = fsfit)

Standard deviation of noise (specified or estimated) sigma = 1.027

Sequential testing results with alpha = 0.100
 Step Var   Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
    1   1  2.317  13.406   0.000     2.019    2.605       0.049      0.048
    2   2  1.703  12.996   0.000     1.486    1.922       0.048      0.050
    3   9 -0.265  -1.683   0.487    -0.782    1.152       0.050      0.050
    4   8 -0.175  -1.156   0.260    -4.764    1.532       0.050      0.050
    5  10  0.173   1.075   0.755   -12.195    3.056       0.050      0.050
    6   4 -0.178  -1.140   0.407   -11.057    7.428       0.050      0.050
    7   7  0.158   0.979   0.763    -9.225    2.137       0.050      0.050
    8   5  0.128   0.896   0.838    -6.737    0.737       0.050      0.050
    9   6 -0.036  -0.225   0.303      -Inf      Inf       0.000      0.000
   10   3  0.037   0.255   0.121    -1.478      Inf       0.050      0.000

Estimated stopping point from ForwardStop rule = 2

Call:
fsInf(obj = fsfit, type = "aic")

Standard deviation of noise (specified or estimated) sigma = 1.027

Testing results at step = 3, with alpha = 0.100
 Var   Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
   1  2.807  15.850   0.000     2.510    3.099       0.049      0.050
   2  1.722  13.093   0.000     1.499    1.942       0.049      0.049
   9 -0.265  -1.683   0.556    -0.753    1.502       0.050      0.050

Estimated stopping point from AIC rule = 3

Call:
fsInf(obj = fsfit, k = 5, type = "all")

Standard deviation of noise (specified or estimated) sigma = 1.027

Testing results at step = 5, with alpha = 0.100
 Var   Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
   1  2.788  15.588   0.000     2.170    3.144        0.05       0.05
   2  1.721  13.013   0.000     1.518    2.547        0.05       0.05
   9 -0.214  -1.334   0.409    -1.815    1.120        0.05       0.00
   8 -0.219  -1.393   0.323    -5.715    2.656        0.05       0.05
  10  0.173   1.075   0.755   -12.195    3.056        0.05       0.05

selectiveInference documentation built on Sept. 7, 2019, 9:02 a.m.