run.pKO: P-value knockoffs
In cKarypidis/multiknockoffs: Multiple knockoff procedures

This function runs the complete p-value knockoff procedure: it generates multiple knockoff matrices, computes the scores for each of them, converts the scores into intermediate p-values and aggregates these across runs. In the last step, the Benjamini-Hochberg or Benjamini-Yekutieli procedure is applied to the aggregated p-values to obtain the final selection set.
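
The p-value steps of this procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `intermediate_pvals` and `aggregate_quantile` are hypothetical, the intermediate p-value formula is the one from Nguyen et al. (2020), the fixed-gamma quantile aggregation follows Meinshausen et al. (2009), the way `offset` enters the numerator is an assumption, and the score matrix `W` is toy data rather than the output of real knockoff runs.

```r
# Intermediate p-values from one score vector W (formula as in Nguyen et al., 2020).
# Assumption: `offset` is the additive constant in the numerator.
intermediate_pvals <- function(W, offset = 1) {
  p <- length(W)
  sapply(W, function(w) {
    if (w > 0) min(1, (offset + sum(W <= -w)) / p) else 1
  })
}

# Quantile aggregation over B runs for a fixed gamma (Meinshausen et al., 2009).
# P is a B x p matrix of intermediate p-values, one row per knockoff run.
aggregate_quantile <- function(P, gamma = 0.3) {
  pmin(1, apply(P, 2, quantile, probs = gamma) / gamma)
}

# Toy scores: B runs, p variables, the first s variables carry a strong signal.
set.seed(1)
B <- 25; p <- 100; s <- 20
W <- matrix(rnorm(B * p, sd = 0.1), nrow = B)
W[, 1:s] <- W[, 1:s] + 3

P     <- t(apply(W, 1, intermediate_pvals))  # B x p matrix
p_agg <- aggregate_quantile(P, gamma = 0.3)  # one aggregated p-value per variable

# Last step: Benjamini-Hochberg at nominal level q.
q <- 0.2
Shat <- which(p.adjust(p_agg, method = "BH") <= q)
```

Replacing `method = "BH"` with `method = "BY"` in `p.adjust` gives the Benjamini-Yekutieli variant of the last step.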

run.pKO(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  q = 0.2,
  B = 25,
  gamma = 0.3,
  offset = 1,
  method = "BH",
  pvals = FALSE
)

`X`	n x p matrix or data frame of original variables.
`y`	response vector of length n.
`knockoffs`	function for the knockoff construction. It must take the n x p data matrix as input and return an n x p knockoff matrix. Either choose a knockoff sampler from the `knockoff` package or define one manually. Default: `create.second_order` (see below).
`statistic`	function that computes the score vector W of length p. It must take the data matrix, the knockoff matrix and the response vector as input and return the vector of computed scores. Either choose a score statistic from the `knockoff` package or define one manually. Default: `stat.glmnet_coefdiff` (see below).
`q`	nominal level for the FDR control. Default: 0.2.
`B`	number of knockoff runs. Default: 25.
`gamma`	value in the interval (0,1) that defines the quantile used for the aggregation. If `gamma = NULL`, the adaptive search by Meinshausen et al. (2009) is used. Default: 0.3.
`offset`	either 0 (knockoff) or 1 (knockoff+). Default: 1.
`method`	the FDR controlling method in the last step. Either `"BH"` (default) or `"BY"`.
`pvals`	logical; whether the aggregated p-values should be reported. Default: `FALSE`.
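
As a rough illustration of the interface that `statistic` expects, the sketch below defines a score function by hand using only base R. The name `stat.marginal_cordiff` and the marginal-correlation score are hypothetical choices made for this example and are not part of `multiknockoffs` or `knockoff`. The matrix `X_k` is only a placeholder with the right dimensions, not a valid knockoff copy.

```r
# Hypothetical user-defined score: difference of absolute marginal correlations.
# Argument order follows the description above: data matrix, knockoff matrix, response.
stat.marginal_cordiff <- function(X, X_k, y) {
  z   <- abs(as.vector(cor(X, y)))    # importance of the original variables
  z_k <- abs(as.vector(cor(X_k, y)))  # importance of the knockoff variables
  z - z_k                             # score vector W of length p
}

set.seed(1)
n <- 50; p <- 4
X   <- matrix(rnorm(n * p), n, p)
X_k <- matrix(rnorm(n * p), n, p)     # placeholder only, NOT a valid knockoff matrix
y   <- X[, 1] + rnorm(n)

W <- stat.marginal_cordiff(X, X_k, y)
```

Assuming the interface described above, such a function would be passed as `run.pKO(X, y, statistic = stat.marginal_cordiff)`.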

The `knockoff` package must be installed before this function is run.

The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.

Although create.second_order uses the ASDP construction by default, we recommend the equi-correlated construction because it performs significantly better. The example below shows how to wrap create.second_order so that it creates equi-correlated knockoffs.

The default score function stat.glmnet_coefdiff is from the knockoff package. It fits a Lasso regression on the original variables and their knockoffs, with the regularization parameter λ tuned by cross-validation. The score for the jth variable is then computed as the difference

W_j = |Z_j| - |\tilde{Z}_j|,

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
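
A small worked example of this score may help. The coefficient values below are made up for illustration and do not come from a fitted model. A large positive W_j indicates that the original variable received a larger coefficient than its knockoff.

```r
# Toy coefficient estimates for p = 4 variables (illustrative values only).
Z       <- c(1.5, 0.0, -0.8,  0.2)  # original variables
Z_tilde <- c(0.1, 0.3,  0.0, -0.2)  # their knockoffs

# Coefficient-difference score: W_j = |Z_j| - |Z~_j|
W <- abs(Z) - abs(Z_tilde)
W  # approximately 1.4, -0.3, 0.8, 0.0
```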

A list containing the following components:

`Shat`	aggregated selection set.
`B`	number of knockoff matrices.
`pvals`	vector of aggregated p-values (only returned if `pvals = TRUE`).

Benjamini and Hochberg (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289-300.

Benjamini and Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29(4), 1165-1188.

Meinshausen, Meier and Buehlmann (2009). p-Values for High-Dimensional Regression. Journal of the American Statistical Association 104(488), 1671-1681.

Nguyen, Chevalier, Thirion and Arlot (2020). Aggregation of Multiple Knockoffs. Proceedings of the 37th International Conference on Machine Learning. https://arxiv.org/abs/2002.09269

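library(knockoff)        # provides create.second_order and stat.glmnet_coefdiff
library(multiknockoffs)  # provides run.pKO
set.seed(1)              # arbitrary seed, only for reproducibility

# Simulation settings: n observations, p variables, s_0 nonzero coefficients
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0,p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))  # AR(1)-type covariance matrix

# Gaussian design and sparse linear model with standard normal noise
X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)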

# Basic usage with default arguments
res.pKO <- run.pKO(X, y, pvals = TRUE)
res.pKO

# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) create.second_order(X, method = "equi")
res.pKO <- run.pKO(X, y, knockoffs = equi.knock, pvals = TRUE)
res.pKO