Description

To perform model selection, this function scores all covariates in X according to the area under their stability selection paths.

The procedure starts by dynamically defining a grid for the elastic net penalization parameter lambda. To define the grid, we solve the elastic net on the full dataset, which yields n_lambda log-scaled values between lambda_max and lambda_min. lambda_max is the maximum value for which the elastic net support is not empty; lambda_min is derived from lambda_min_ratio, the ratio of lambda_min to lambda_max.

The next step is identical to the original stability selection procedure. For each value of lambda, we solve the same elastic net n_subsample times, each time on a different subsample drawn as a random selection of half of the samples of the original dataset. The empirical frequency of a covariate is the number of times it enters the support, as a fraction of n_subsample.

The stability path associates each value of lambda with the corresponding empirical frequency. The final scores are the areas under the stability path curves. This is a key difference from the original stability selection procedure, where the final score is the maximum empirical frequency along the path. In simulations, our area-based scoring outperformed maximum empirical frequencies.
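The three steps above (lambda grid from a full-data fit, repeated half-sample refits, area under each selection-frequency path) can be sketched as follows. This is a hedged illustration, not the package's implementation: it substitutes scikit-learn's enet_path for the internal elastic net solver, uses synthetic data, and scores paths with a plain sum over the grid as a simplification of the integral.

```python
import numpy as np
from sklearn.linear_model import enet_path

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                        # only the first three covariates carry signal
y = X @ beta + rng.standard_normal(n)

n_subsample, n_lambda = 30, 25
# Step 1: lambda grid from the full-dataset fit: n_lambda log-scaled values
# between lambda_max and lambda_min = lambda_min_ratio * lambda_max (eps here
# plays the role of lambda_min_ratio; note sklearn's l1_ratio is the *L1*
# fraction, so l1_ratio = 0.99 mirrors a small L2 admixture).
lambdas, _, _ = enet_path(X, y, l1_ratio=0.99, n_alphas=n_lambda, eps=1e-2)

# Step 2: refit on random half-samples and record support membership.
freq = np.zeros((n_lambda, p))
for _ in range(n_subsample):
    idx = rng.choice(n, size=n // 2, replace=False)  # half of the samples
    _, coefs, _ = enet_path(X[idx], y[idx], l1_ratio=0.99, alphas=lambdas)
    freq += (coefs != 0).T            # coefs has shape (p, n_lambda)
freq /= n_subsample                   # empirical selection frequencies

# Step 3: score each covariate by the area under its stability path.
scores = freq.sum(axis=0)
print(np.argsort(scores)[::-1][:3])   # the signal covariates should rank first
```

Under this setup the three signal covariates accumulate high frequencies over most of the grid, so their areas dominate those of the noise covariates.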
Arguments

X: input design matrix.
Y: response vector.
weights: nonnegative sample weights.
family: response type, either 'gaussian' or 'binomial'.
n_subsample: number of subsamples for stability selection.
n_lambda: total number of lambda values.
short: whether to compute the AUCs only on the first half of the stability path. We observed better performance for thresholded paths.
lambda_min_ratio: ratio of lambda_min to lambda_max (see Description for a thorough explanation).
eps: elastic net mixing parameter. For a fixed lambda, the L2 penalization is lambda * eps, while the L1 penalization is lambda * (1 - eps). The L2 penalization ensures the uniqueness of the solution; for that reason, we recommend setting eps << 1.
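The penalty split documented for eps can be checked numerically. This is a small sketch with illustrative values; the mapping to scikit-learn's ElasticNet convention is an added note, not part of the package.

```python
# Documented split: L2 penalty = lambda * eps, L1 penalty = lambda * (1 - eps).
lam, eps = 0.3, 1e-2          # illustrative values, with eps << 1 as recommended
l2_penalty = lam * eps        # tiny ridge term, ensures a unique solution
l1_penalty = lam * (1 - eps)  # dominant lasso term, drives sparsity
# In scikit-learn's ElasticNet(alpha, l1_ratio), l1_ratio is the *L1* fraction,
# so this parameterization corresponds to alpha = lam, l1_ratio = 1 - eps.
print(l1_penalty, l2_penalty)
```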
Value

A vector containing the areas under the stability path curves.
References

Slim, L., Chatelain, C., Azencott, C.-A., & Vert, J.-P. (2018). Novel Methods for Epistasis Detection in Genome-Wide Association Studies. bioRxiv.
Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.
Haury, A.-C., Mordelet, F., Vera-Licona, P., & Vert, J.-P. (2012). TIGRESS: Trustful Inference of Gene REgulation using Stability Selection. BMC Systems Biology, 6.
See Also

Other support estimation functions: stabilityBIG()
Examples

# ---- Continuous data ----
n <- 50
p <- 20
X <- matrix(rnorm(n * p), ncol = p)
Y <- X %*% rnorm(p)  # linear response with random coefficients
aucs_cont <- stabilityGLM(X, Y,
  family = "gaussian", n_subsample = 1,
  short = FALSE
)

# ---- Binary data ----
X <- matrix(rnorm(n * p), ncol = p)
# Bernoulli draws with logistic probabilities driven by covariates 1, 7 and 15
Y <- runif(n) < 1 / (1 + exp(-X[, c(1, 7, 15)] %*% rnorm(3)))
weights <- runif(n, min = 0.4, max = 0.8)
aucs_binary <- stabilityGLM(X, Y,
  weights = weights, family = "binomial",
  n_lambda = 50, lambda_min_ratio = 0.05, n_subsample = 1
)