This function runs the complete ADAGES procedure in the multiple knockoff setting: it generates multiple knockoff matrices, computes the score statistics and the selection set for each knockoff run, and then aggregates these selection sets with ADAGES to obtain the final selection set.
run.ADAGES(
X,
y,
knockoffs = create.second_order,
statistic = stat.glmnet_coefdiff,
q = 0.2,
K = 5,
offset = 1,
type = "ADAGES",
sets = FALSE
)
X |
n x p matrix or data frame of original variables. |
y |
response vector of length n. |
knockoffs |
function for the knockoff construction. It must take the n x p matrix as input
and return an n x p knockoff matrix. Either choose a knockoff sampler of
the |
statistic |
function that computes the score vector W of length p. It must take the data matrix,
knockoff matrix and response vector as input and return a vector of computed
scores. Either choose one score statistic from the |
q |
nominal level for the FDR control. Default: 0.2. |
K |
number of knockoff runs. Default: 5. |
offset |
either 0 (knockoff) or 1 (knockoff+). Default: 1. |
type |
either "ADAGES" or "ADAGES.mod". Default: "ADAGES". |
sets |
logical; whether the K selection sets of the individual knockoff runs
should be returned. Default: FALSE. |
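The roles of `q` and `offset` can be illustrated with a minimal base-R sketch of the usual knockoff thresholding rule: choose the smallest threshold at which the estimated false discovery proportion, (offset + #{W_j <= -t}) / max(1, #{W_j >= t}), falls below `q`. The helper name `knockoff_threshold_sketch` and the score vector are made up for illustration; this is not the code that `run.ADAGES` executes internally.

```r
# Hypothetical helper: knockoff (offset = 0) / knockoff+ (offset = 1) threshold.
knockoff_threshold_sketch <- function(W, q = 0.2, offset = 1) {
  # Candidate thresholds are the distinct non-zero magnitudes of the scores.
  candidates <- sort(unique(abs(W[W != 0])))
  # Estimated false discovery proportion at each candidate threshold.
  ratio <- vapply(candidates, function(thr) {
    (offset + sum(W <= -thr)) / max(1, sum(W >= thr))
  }, numeric(1))
  ok <- which(ratio <= q)
  # No admissible threshold means that nothing is selected.
  if (length(ok) == 0) Inf else candidates[ok[1]]
}

# Made-up scores for ten variables.
W <- c(3.1, 2.4, -0.2, 1.7, 0, 2.9, -0.1, 2.2, 1.9, 2.6)

t0 <- knockoff_threshold_sketch(W, q = 0.2, offset = 0)  # knockoff
t1 <- knockoff_threshold_sketch(W, q = 0.2, offset = 1)  # knockoff+

selected0 <- which(W >= t0)
selected1 <- which(W >= t1)
```

With `offset = 1` the numerator is larger, so the threshold is never smaller than with `offset = 0`; this is the more conservative knockoff+ variant.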
This function requires the knockoff package to be installed prior to its execution.
The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.
The default score function stat.glmnet_coefdiff is from the knockoff package.
It fits a Lasso regression where the regularization parameter λ is tuned by cross-validation.
Then, the score is computed as the difference
W_j = |Z_j| - |\tilde{Z}_j|,
where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
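As a small worked instance of the score formula, the sketch below computes W from a coefficient vector of length 2p. The coefficient values are made up, and the layout (first p entries for the original variables, last p for their knockoffs) is an assumption for illustration; no Lasso is fitted here.

```r
# Made-up coefficient estimates: first p for X, last p for the knockoffs.
p <- 4
coefs <- c(1.5, 0, -0.8, 0.1,   # original variables
           0.2, 0,  0.3, -0.4)  # knockoff copies

Z       <- abs(coefs[1:p])              # |Z_j|
Z_tilde <- abs(coefs[(p + 1):(2 * p)])  # |Z~_j|

# Coefficient-difference score: large positive values favour the original variable.
W <- Z - Z_tilde
W
```

A positive W_j indicates that the original variable received a larger coefficient than its knockoff, a negative W_j the opposite, and W_j = 0 that neither entered the model.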
ADAGES applies the minimization of the complexity ratio as a criterion to determine the optimal threshold.
ADAGES.mod minimizes the trade-off between the threshold and the model complexity c |S| to determine the optimal threshold.
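To make the role of the threshold c concrete, here is a minimal sketch that assumes the aggregation keeps every variable selected in at least c of the K runs, which is how the threshold is described in the referenced paper. The helper names and the selection sets are hypothetical. The sketch only tabulates |S| and c |S| for each candidate c; it does not reproduce how the package restricts the candidates or picks the optimum.

```r
# Hypothetical helpers; each set is assumed to contain distinct indices in 1..p.
vote_counts <- function(sets, p) tabulate(as.integer(unlist(sets)), nbins = p)

aggregate_sets <- function(sets, p, thr) which(vote_counts(sets, p) >= thr)

# Made-up selection sets from K = 5 knockoff runs on p = 10 variables.
sets <- list(c(1, 2, 3, 7), c(1, 2, 3), c(1, 2, 5), c(1, 3, 8), c(1, 2, 3, 9))
p <- 10
K <- length(sets)

# Size of the aggregated set for every candidate threshold c = 1, ..., K.
sizes <- vapply(seq_len(K),
                function(thr) length(aggregate_sets(sets, p, thr)),
                integer(1))

# Threshold, model size |S|, and the product c * |S| mentioned for ADAGES.mod.
data.frame(c = seq_len(K), size = sizes, tradeoff = seq_len(K) * sizes)

# Aggregated set for one particular threshold, here c = 4.
S4 <- aggregate_sets(sets, p, 4)
```

Small values of c behave like a union of the runs, and c = K like their intersection; the two criteria choose a point between these extremes.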
A list containing the following components:
Shat |
aggregated selection set. |
c |
optimal threshold. |
K |
number of knockoff runs. |
sets |
if specified, individual selection sets of each knockoff run. |
Gui (2020). ADAGES: adaptive aggregation with stability for distributed feature selection. Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference. https://arxiv.org/pdf/2007.10776.pdf
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0,p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))
X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)
# Basic usage with default arguments
res.ADAGES <- run.ADAGES(X, y, sets = TRUE)
res.ADAGES
# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) knockoff::create.second_order(X, method = "equi")
res.ADAGES <- run.ADAGES(X, y, knockoffs = equi.knock, sets = TRUE)
res.ADAGES
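In a simulation like the one above, the true support is known, so a selection can be scored by its false discovery proportion and power. The helper `fdp_power` below is a hypothetical illustration and not part of the package; it assumes that both arguments are vectors of column indices.

```r
# Hypothetical helper: false discovery proportion and power of a selection.
fdp_power <- function(selected, truth) {
  false_disc <- length(setdiff(selected, truth))    # selected but truly null
  true_disc  <- length(intersect(selected, truth))  # selected and truly non-null
  c(fdp   = false_disc / max(1, length(selected)),
    power = true_disc  / max(1, length(truth)))
}

# Toy check with made-up index sets.
truth    <- c(2, 5, 9, 14)
selected <- c(2, 5, 9, 11, 20)
res <- fdp_power(selected, truth)
res
```

With the objects from the example, and assuming `Shat` holds column indices, the call would be `fdp_power(res.ADAGES$Shat, nonzero)`.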