stability | R Documentation |
Compute the stability path of a (possibly randomized) fitting procedure, as introduced by Meinshausen and Bühlmann (2010).
stability(x, y,
          penalty = c("elastic.net", "bounded.reg"),
          subsamples = 100,
          sample.size = floor(n/2),
          randomize = TRUE,
          weakness = 0.5,
          verbose = TRUE,
          folds = replicate(subsamples, sample(1:nrow(x), sample.size), simplify = FALSE),
          mc.cores = 2,
          ...)
x |
matrix of features, possibly sparsely encoded (experimental). Do NOT include intercept. |
y |
response vector. |
penalty |
a string naming the fitting procedure applied to each subsample. Either "elastic.net" or "bounded.reg". |
subsamples |
integer indicating the number of subsamplings used to estimate the selection probabilities. Default is 100. |
sample.size |
integer indicating the size of each subsample. Default is floor(n/2). |
randomize |
logical; should a randomized version of the fitting procedure be used? Default is TRUE. |
weakness |
Coefficient used for randomizing. Default is 0.5. |
verbose |
logical; indicates whether progress should be displayed. Default is TRUE. |
folds |
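list with subsamples entries, each a vector of row indices of x defining one subsample. By default, each entry is drawn with sample(1:nrow(x), sample.size). |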
mc.cores |
the number of cores to use. The default uses 2 cores. |
... |
additional parameters to overwrite the defaults of the fitting procedure. See the corresponding documentation (elastic.net or bounded.reg). |
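To illustrate the folds argument, the default shown in the usage can be reproduced by hand, which makes the subsamples reproducible across calls. This is a minimal sketch in base R; the dimensions, the seed and the placeholder matrix x are assumptions made for the illustration only.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
n <- 100; p <- 20                  # hypothetical dimensions
x <- matrix(rnorm(n * p), n, p)    # placeholder feature matrix
subsamples  <- 50
sample.size <- floor(nrow(x) / 2)

## One vector of row indices per subsample, drawn without replacement,
## mirroring the default value of 'folds' in the usage
folds <- replicate(subsamples, sample(1:nrow(x), sample.size), simplify = FALSE)

length(folds)        # number of subsamples
lengths(folds)[1:3]  # size of the first three subsamples
```

Such a list could then be supplied through the folds argument, for instance to compare the two penalties on identical subsamples.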
An object of class stability.path.
When randomize = TRUE, the penscale argument that weights the penalty tuned by lambda1 is perturbed (divided) for each subsample by a random variable uniformly distributed on [α, 1], where α is the weakness parameter.
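The perturbation described above can be sketched in a few lines of base R. This is not the package's internal code: drawing one uniform variable per feature and starting from weights all equal to 1 are assumptions made for the illustration, and penscale.rand is a hypothetical name.

```r
set.seed(1)
p        <- 10            # hypothetical number of features
weakness <- 0.5           # the alpha of the text
penscale <- rep(1, p)     # unperturbed penalty weights (assumed all equal to 1)

## Draw one uniform variable on [weakness, 1] per feature (an assumption)
## and divide the penalty weights by it
u <- runif(p, min = weakness, max = 1)
penscale.rand <- penscale / u

range(penscale.rand)      # lies within [1, 1/weakness]
```

With weakness = 1 the weights are left unchanged; smaller values allow the penalty of a feature to be inflated by a factor of up to 1/weakness in a given subsample.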
If the user runs the fitting method with the option bulletproof set to FALSE, the algorithm may stop at an early stage of the path. Early stops of the underlying fitting function are handled internally: the results associated with such runs are simply skipped, so as not to bias the stability selection procedure. If this occurs too often, a warning is issued, in which case you should reconsider the grid of lambda1 used for stability selection. If bulletproof is TRUE (the default), there is nothing to worry about, except a possible slowdown whenever switching to the proximal algorithm is required.
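The idea of skipping incomplete runs can be illustrated as follows. This is only a sketch on simulated data, not the package's implementation: the representation of a run as a 0/1 matrix (or NULL after an early stop), the names runs, complete and prob, and the 50% warning threshold are all hypothetical.

```r
set.seed(2)
n.lambda <- 5; p <- 4; subsamples <- 20   # hypothetical sizes

## Simulated per-subsample results: a 0/1 selection matrix (lambda1 x feature),
## or NULL when the run stopped before reaching the end of the lambda1 grid
runs <- lapply(seq_len(subsamples), function(i) {
  if (i %% 5 == 0) return(NULL)           # pretend every fifth run stopped early
  matrix(rbinom(n.lambda * p, 1, 0.3), n.lambda, p)
})

## Keep complete runs only, and warn if too many were skipped
complete <- !vapply(runs, is.null, logical(1))
if (mean(!complete) > 0.5)                # arbitrary threshold
  warning("many runs stopped early; consider another grid of lambda1")

## Selection probabilities: average of the indicators over complete runs
prob <- Reduce(`+`, runs[complete]) / sum(complete)
dim(prob)
```

Dividing by the number of complete runs, rather than by subsamples, keeps the estimated probabilities on the [0, 1] scale regardless of how many runs were dropped.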
N. Meinshausen and P. Bühlmann (2010). Stability Selection, JRSS(B).
stability.path and plot,stability.path-method.
## Simulating multivariate Gaussian with blockwise correlation
## and piecewise constant vector of parameters
beta <- rep(c(0,1,0,-1,0), c(25,10,25,10,25))
Soo <- matrix(0.75,25,25) ## block correlation between zero variables
Sww <- matrix(0.75,10,10) ## block correlation between active variables
Sigma <- bdiag(Soo,Sww,Soo,Sww,Soo) + 0.2
diag(Sigma) <- 1
n <- 100
x <- as.matrix(matrix(rnorm(95*n),n,95) %*% chol(Sigma))
y <- 10 + x %*% beta + rnorm(n,0,10)

## Build a vector of labels for the true nonzeros
labels <- rep("irrelevant", length(beta))
labels[beta != 0] <- c("relevant")
labels <- factor(labels, ordered=TRUE, levels=c("relevant","irrelevant"))

## Call to the stability selection function, 200 subsamples
stab <- stability(x,y, subsamples=200, lambda2=1, min.ratio=1e-2)

## Recover the selected variables for a given cutoff
## and per-family error rate, without producing any plot
stabpath <- plot(stab, cutoff=0.75, PFER=1, plot=FALSE)

cat("\nFalse positives for the randomized Elastic-net with stability selection: ",
    sum(labels[stabpath$selected] != "relevant"))
cat("\nDONE.\n")