stabilityGLM: Computes the area under the stability path for all covariates
In EpiSlim/epiGWAS: Robust Methods for Epistasis Detection

Description Usage Arguments Details Value References See Also Examples

To perform model selection, this function scores all covariates in X according to the area under their stability selection paths. Our model selection procedure starts by dynamically defining a grid for the elastic net penalization parameter lambda. To define the grid, we solve the full-dataset elastic net. This yields n_lambda log-scaled values between lambda_max and lambda_min. lambda_max is the maximum value for which the elastic net support is not empty. On the other hand, lambda_min can be derived through lambda_min_ratio, which is the ratio of lambda_min to lambda_max. The next step is identical to the original stability selection procedure. For each value of lambda, we solve n_subsample times the same elastic net, though for a different subsample. The subsample is a random selection of half of the samples of the original dataset. The empirical frequency of each covariate entering the support is then the number of times the covariate is selected in the support as a fraction of n_subsample. We obtain the stability path by associating each value of lambda with the corresponding empirical frequency. The final scores are the areas under the stability path curves. This is a key difference with the original stability selection procedure where the final score is the maximum empirical frequency. On simulations, our scoring technique outperformed maximum empirical frequencies.

stabilityGLM(
  X,
  Y,
  weights = rep(1, nrow(X)),
  family = "gaussian",
  n_subsample = 20,
  n_lambda = 100,
  short = TRUE,
  lambda_min_ratio = 0.01,
  eps = 1e-05
)

`X`	input design matrix
`Y`	response vector
`weights`	nonnegative sample weights
`family`	response type. Either 'gaussian' or 'binomial'
`n_subsample`	number of subsamples for stability selection
`n_lambda`	total number of lambda values
`short`	whether to compute the aucs only on the first half of the stability path. We observed better performance for thresholded paths
`lambda_min_ratio`	ratio of lambda_min to lambda_max (see description for a thorough explanation)
`eps`	elastic net mixing parameter.

For a fixed lambda, the L2 penalization is lambda * eps, while the L1 penalization is lambda * (1-eps). The goal of the L2 penalization is to ensure the uniqueness of the solution. For that reason, we recommend setting eps << 1.

a vector containing the areas under the stability path curves

Slim, L., Chatelain, C., Azencott, C.-A., & Vert, J.-P. (2018). Novel Methods for Epistasis Detection in Genome-Wide Association Studies. BioRxiv.

Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.

Haury, A. C., Mordelet, F., Vera-Licona, P., & Vert, J. P. (2012). TIGRESS: Trustful Inference of Gene REgulation using Stability Selection. BMC Systems Biology, 6.

glmnet-package

Other support estimation functions: stabilityBIG()

# ---- Continuous data ----
n <- 50
p <- 20
X <- matrix(rnorm(n * p), ncol = p)
Y <- crossprod(t(X), rnorm(p))
aucs_cont <- stabilityGLM(X, Y,
  family = "gaussian", n_subsample = 1,
  short = FALSE
)

# ---- Binary data ----
X <- matrix(rnorm(n * p), ncol = p)
Y <- runif(n, min = 0, max = 1) < 1 / (1 + exp(-X[, c(1, 7, 15)] %*% rnorm(3)))
weights <- runif(n, min = 0.4, max = 0.8)
aucs_binary <- stabilityGLM(X, Y,
  weights = weights,
  n_lambda = 50, lambda_min_ratio = 0.05, n_subsample = 1
)