This function runs the complete p-value knockoff procedure: it generates multiple knockoff matrices, estimates the scores, computes intermediate p-values and aggregates them, and finally applies the Benjamini-Hochberg or Benjamini-Yekutieli procedure to obtain the final selection set.
X |
n x p matrix or data frame of original variables. |
y |
response vector of length n. |
knockoffs |
function for the knockoff construction. It must take the n x p matrix as input
and it must return an n x p knockoff matrix. Either choose a knockoff sampler of
the |
statistic |
function that computes the score vector W of length p. It must take the data matrix,
knockoff matrix and response vector as input and output a vector of computed
scores. Either choose one score statistic from the |
q |
nominal level for the FDR control. Default: 0.2. |
B |
number of knockoff runs. Default: 25. |
gamma |
value in the interval (0,1) that defines the quantile used for the aggregation.
If |
offset |
either 0 (knockoff) or 1 (knockoff+). Default: 1. |
method |
the FDR controlling method in the last step. Either |
pvals |
logical; whether the aggregated p-values should be reported. Default: |
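To make the roles of gamma, offset and method more concrete, here is a minimal base-R sketch of the p-value and aggregation steps in the spirit of Nguyen et al. (2020) and Meinshausen et al. (2009). It is an illustration under assumptions, not the package's actual code: the helper names intermediate_pvals and aggregate_pvals are hypothetical, the toy scores are made up, and gamma = 0.3 is an arbitrary value chosen for the example.

```r
# Hypothetical helpers for illustration only; not part of this package.

# Intermediate p-values from one score vector W of length p.
# offset = 1 mirrors the knockoff+ convention, offset = 0 plain knockoff.
intermediate_pvals <- function(W, offset = 1) {
  p <- length(W)
  vapply(W, function(w) {
    if (w <= 0) return(1)                 # non-positive scores get p-value 1
    min(1, (offset + sum(W <= -w)) / p)   # count scores at least as negative
  }, numeric(1))
}

# Quantile aggregation over B runs (rows = runs, columns = variables).
aggregate_pvals <- function(P, gamma = 0.3) {
  apply(P, 2, function(pj) min(1, quantile(pj, gamma, names = FALSE) / gamma))
}

# Toy scores: B = 3 knockoff runs, p = 5 variables (made-up numbers).
W_runs <- rbind(c(3.0, -1.0,  2.0,  0.0, -0.5),
                c(2.5, -0.2,  1.0,  0.1, -1.0),
                c(4.0,  0.3, -0.1, -2.0,  2.0))

P     <- t(apply(W_runs, 1, intermediate_pvals))  # B x p matrix of p-values
p_agg <- aggregate_pvals(P, gamma = 0.3)          # one p-value per variable
p_adj <- p.adjust(p_agg, method = "BH")           # or method = "BY"
Shat  <- which(p_adj <= 0.2)                      # selection at level q = 0.2
```

In this sketch, a smaller gamma makes the aggregated p-value depend on the more favourable runs but divides by a smaller number, which is the usual trade-off in quantile aggregation.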
This function requires the knockoff package to be installed before it is run.
The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package. Although this default sampler is based on ASDP, we recommend using the equi-correlated construction within create.second_order because it performs significantly better. See the example below, which shows how to switch create.second_order to equi-correlated knockoffs.
The default score function stat.glmnet_coefdiff is from the knockoff package.
It fits a Lasso regression where the regularization parameter λ is tuned by cross-validation. The score for the jth variable is then computed as the difference
W_j = |Z_j| - |\tilde{Z}_j|,
where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
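A user-supplied score function can follow the same W_j = |Z_j| - |\tilde{Z}_j| pattern with a different choice of Z. The sketch below uses marginal correlations with the response instead of Lasso coefficients. The name marginal_cor_stat is hypothetical and not part of the knockoff package, and the random matrix X_k only stands in for a knockoff matrix so that the shapes can be checked.

```r
# Hypothetical score function for illustration; not from the knockoff package.
# Takes the data matrix, the knockoff matrix and the response, and returns
# a score vector of length p, with Z_j = cor(X_j, y).
marginal_cor_stat <- function(X, X_k, y) {
  Z   <- as.vector(cor(X, y))     # importance of each original variable
  Z_k <- as.vector(cor(X_k, y))   # importance of each knockoff copy
  abs(Z) - abs(Z_k)               # W_j = |Z_j| - |Z~_j|
}

set.seed(1)
n <- 100; p <- 5
X   <- matrix(rnorm(n * p), n, p)
X_k <- matrix(rnorm(n * p), n, p)   # placeholder, NOT a valid knockoff matrix
y   <- 2 * X[, 1] + rnorm(n)        # only the first variable is relevant

W <- marginal_cor_stat(X, X_k, y)
```

Assuming the statistic argument is called with the data matrix, knockoff matrix and response in that order, as the argument description states, such a function could be passed as statistic = marginal_cor_stat.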
A list containing the following components:
Shat |
aggregated selection set. |
B |
number of knockoff matrices. |
pvals |
if requested via the pvals argument, the vector of aggregated p-values. |
Benjamini and Hochberg (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289-300.
Benjamini and Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29(4), 1165-1188.
Meinshausen, Meier and Buehlmann (2009). p-Values for High-Dimensional Regression. Journal of the American Statistical Association 104(488), 1671-1681.
Nguyen, Chevalier, Thirion and Arlot (2020). Aggregation of Multiple Knockoffs. Proceedings of the 37th International Conference on Machine Learning. https://arxiv.org/abs/2002.09269
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0,p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))
X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)
# Basic usage with default arguments
res.pKO <- run.pKO(X, y, pvals = TRUE)
res.pKO
# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) knockoff::create.second_order(X, method = "equi")
res.pKO <- run.pKO(X, y, knockoffs = equi.knock, pvals = TRUE)
res.pKO