Stability Selection
Description
Selection of influential variables or model components with error control.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  ## generic stability selection funcion
stabsel(x, ...)
## a method to fit models with stability selection
## S3 method for class 'matrix'
stabsel(x, y, fitfun = lars.lasso,
args.fitfun = list(), cutoff, q, PFER,
folds = subsample(rep(1, nrow(x)), B = B),
B = ifelse(sampling.type == "MB", 100, 50),
assumption = c("unimodal", "rconcave", "none"),
sampling.type = c("SS", "MB"),
papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)
## essentially a wrapper for data.frames (see details)
## S3 method for class 'data.frame'
stabsel(x, y, intercept = FALSE, ...)

Arguments
x 
a 
y 
a vector or matrix containing the outcome. 
intercept 
logical. If 
fitfun 
a function that takes the arguments 
args.fitfun 
a named list containing additional arguments that are
passed to the fitting function; see also argument 
cutoff 
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used. 
q 
number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample. 
PFER 
upper bound for the perfamily error rate. This specifies the amount of falsely selected baselearners, which is tolerated. See details. 
folds 
a weight matrix with number of rows equal to the number
of observations, see 
assumption 
Defines the type of assumptions on the
distributions of the selection probabilities and simultaneous
selection probabilities. Only applicable for

sampling.type 
use sampling scheme of of Shah & Samworth
(2013), i.e., with complementarty pairs ( 
B 
number of subsampling replicates. Per default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use B complementray pairs in the former case this leads to 2B subsamples. 
papply 
(parallel) apply function, defaults to

verbose 
logical (default: 
FWER 
deprecated. Only for compatibility with older versions, use PFER instead. 
eval 
logical. Determines whether stability selection is
evaluated ( 
... 
additional arguments to parallel apply methods such as

Details
This function implements the stability selection procedure by
Meinshausen and Buehlmann (2010) and the improved error bounds by Shah
and Samworth (2013). For details see also Hofner et al. (2014). The
error bounds are implemented in the function
stabsel_parameters
. Two of the three arguments
cutoff
, q
and PFER
must be specified. The
perfamily error rate (PFER), i.e., the expected number of false
positives E(V), where V is the number of false positives,
is bounded by the argument PFER
.
As controlling the PFER is more conservative as controlling the
familywise error rate (FWER), the procedure also controlls the FWER,
i.e., the probability of selecting at least one noninfluential
variable (or model component) is less than PFER
.
Predefined fitfuns
functions exist but more can be
easily implemented. Note that stepwise regression methods are usually
not advised as they tend to be relatively unstable. See example below.
The function stabsel
for data.frame
s is
essentially just a wrapper to the matrix
function with
the same argments. The only difference is that in a preprocessing
step, the data set is converted to a model matrix using the function
model.matrix
. The additional argument intercept
determines if an explicit intercept should be added to the model
matrix. This is often not neccessary but depends on the fitfun
.
Value
An object of class stabsel
with a special print
method.
The object has the following elements:
phat 
selection probabilities. 
selected 
elements with maximal selection probability greater

max 
maximum of selection probabilities. 
cutoff 
cutoff used. 
q 
average number of selected variables used. 
PFER 
(realized) upper bound for the perfamily error rate. 
specifiedPFER 
specified upper bound for the perfamily error rate. 
p 
the number of effects subject to selection. 
B 
the number of subsamples. 
sampling.type 
the sampling type used for stability selection. 
assumption 
the assumptions made on the selection probabilities. 
call 
the call. 
References
B. Hofner, L. Boccuto and M. Goeker (2014), Controlling false
discoveries in highdimensional situations: Boosting with stability
selection. Technical Report, arXiv:1411.1285.
http://arxiv.org/abs/1411.1285.
N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.
R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.
See Also
stabsel_parameters
for the computation of error bounds,
stabsel.stabsel
for the fast recomputation of
parameters of a fitted stabsel
object,
fitfun
for available fitting functions and
plot.stabsel
for available plot functions
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46  ## make data set available
data("bodyfat", package = "TH.data")
## set seed
set.seed(1234)
####################################################################
### using stability selection with Lasso methods:
if (require("lars")) {
(stab.lasso < stabsel(x = bodyfat[, 2], y = bodyfat[,2],
fitfun = lars.lasso, cutoff = 0.75,
PFER = 1))
(stab.stepwise < stabsel(x = bodyfat[, 2], y = bodyfat[,2],
fitfun = lars.stepwise, cutoff = 0.75,
PFER = 1))
par(mfrow = c(2, 1))
plot(stab.lasso, main = "Lasso")
plot(stab.stepwise, main = "Stepwise Selection")
## > stepwise selection seems to be quite unstable even in this low
## dimensional example!
}
####################################################################
### using stability selection directly on computed boosting models
### from mboost
if (require("mboost")) {
### lowdimensional example
mod < glmboost(DEXfat ~ ., data = bodyfat)
## compute cutoff ahead of running stabsel to see if it is a sensible
## parameter choice.
## p = ncol(bodyfat)  1 (= Outcome) + 1 ( = Intercept)
stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat)  1 + 1,
sampling.type = "MB")
## the same:
stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)
## now run stability selection
(sbody < stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
opar < par(mai = par("mai") * c(1, 1, 1, 2.7))
plot(sbody)
par(opar)
plot(sbody, type = "maxsel", ymargin = 6)
}
