run.pKO: P-value knockoffs

View source: R/run.pKO.R

Description

This function runs the complete p-value knockoff procedure: it generates multiple knockoff matrices, estimates the scores, computes intermediate p-values and aggregates them across the knockoff runs. In the last step, it applies the Benjamini-Hochberg or Benjamini-Yekutieli procedure to the aggregated p-values to obtain the final selection set.

Usage

run.pKO(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  q = 0.2,
  B = 25,
  gamma = 0.3,
  offset = 1,
  method = "BH",
  pvals = FALSE
)

Arguments

X

n x p matrix or data frame of original variables.

y

response vector of length n.

knockoffs

function for the knockoff construction. It must take the n x p matrix as input and return an n x p knockoff matrix. Either choose a knockoff sampler from the knockoff package or define one manually. Default: create.second_order (see below).

statistic

function that computes the score vector W of length p. It must take the data matrix, the knockoff matrix and the response vector as input and return a vector of computed scores. Either choose a score statistic from the knockoff package or define one manually. Default: stat.glmnet_coefdiff (see below).

q

nominal level for the FDR control. Default: 0.2.

B

number of knockoff runs. Default: 25.

gamma

value in the interval (0,1) that defines the quantile used for the aggregation. If gamma = NULL, the adaptive search by Meinshausen et al. (2009) is used. Default: 0.3.

offset

either 0 (knockoff) or 1 (knockoff+). Default: 1.

method

the FDR controlling method in the last step. Either "BH" (default) or "BY".

pvals

logical; whether the aggregated p-values should be reported. Default: FALSE.
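
The arguments offset, gamma, method and q each enter a different step of the procedure. The following is a minimal sketch of how these steps could fit together, not the package's internal code. It assumes the intermediate p-values of Nguyen et al. (2020) with offset as the additive constant in the numerator, and the fixed-gamma quantile aggregation of Meinshausen et al. (2009). The helper names intermediate_pvals and aggregate_pvals and the toy scores are hypothetical, and R's default interpolating quantile is used for simplicity.

```r
# Hypothetical helpers for illustration; not part of multiknockoffs.

# Intermediate p-values for one score vector W (after Nguyen et al., 2020).
# Assumption: `offset` is the additive constant in the numerator.
intermediate_pvals <- function(W, offset = 1) {
  p <- length(W)
  vapply(W, function(w) {
    if (w > 0) min(1, (offset + sum(W <= -w)) / p) else 1
  }, numeric(1))
}

# Quantile aggregation over B runs for a fixed gamma
# (after Meinshausen et al., 2009). P is a B x p matrix of p-values.
aggregate_pvals <- function(P, gamma = 0.3) {
  apply(P, 2, function(pj) {
    min(1, quantile(pj, probs = gamma, names = FALSE) / gamma)
  })
}

# Toy scores for B = 3 knockoff runs and p = 5 variables (one run per row).
W_runs <- rbind(
  c(4, 3, -1,  0, 2),
  c(5, 2, -1,  1, 3),
  c(3, 4,  0, -2, 2)
)

# apply() returns one column per run, so transpose to get a B x p matrix.
P <- t(apply(W_runs, 1, intermediate_pvals, offset = 1))
p_agg <- aggregate_pvals(P, gamma = 0.3)

# Final step: adjust with BH (or "BY") and keep variables at level q.
q <- 0.2
Shat <- which(p.adjust(p_agg, method = "BH") <= q)
```

With only p = 5 variables, the smallest possible intermediate p-value is 1/5, so this toy input is meant to show the data flow rather than a realistic selection.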

Details

This function requires the installation of the knockoff package prior to its execution.

The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.

Although create.second_order uses the ASDP construction by default, we recommend using its equi-correlated construction instead because it performs significantly better. The Examples section shows how to wrap create.second_order so that it creates equi-correlated knockoffs.

The default score function stat.glmnet_coefdiff is from the knockoff package. It fits a Lasso regression in which the regularization parameter λ is tuned by cross-validation. The score for the jth variable is then computed as the difference

W_j = |Z_j| - |\tilde{Z}_j|

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
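
A user-defined score function can use the same difference form with a different importance measure. The sketch below is one possible example, not a function from either package. The name stat.marginal_cordiff is hypothetical, absolute marginal correlations stand in for the Lasso coefficients, and the argument order (data matrix, knockoff matrix, response) is assumed from the description of the statistic argument.

```r
# Hypothetical custom statistic: W_j = |Z_j| - |Z~_j|, where Z_j is the
# marginal correlation of variable j (or its knockoff) with the response.
stat.marginal_cordiff <- function(X, X_k, y) {
  Z   <- abs(as.vector(cor(X, y)))    # importance of each original variable
  Z_k <- abs(as.vector(cor(X_k, y)))  # importance of each knockoff
  Z - Z_k
}

# Small self-contained check on simulated data.
set.seed(1)
n <- 50; p <- 4
X   <- matrix(rnorm(n * p), n, p)
X_k <- matrix(rnorm(n * p), n, p)  # stand-in only; not a valid knockoff matrix
y   <- 2 * X[, 1] + rnorm(n)

W <- stat.marginal_cordiff(X, X_k, y)
```

Under these assumptions, such a function could be supplied as statistic = stat.marginal_cordiff in the call to run.pKO.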

Value

A list containing the following components:

Shat

aggregated selection set.

B

number of knockoff matrices.

pvals

vector of aggregated p-values; returned only if pvals = TRUE.

References

Benjamini and Hochberg (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289-300.

Benjamini and Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29(4), 1165-1188.

Meinshausen, Meier and Buehlmann (2009). p-Values for High-Dimensional Regression. Journal of the American Statistical Association 104(488), 1671-1681.

Nguyen, Chevalier, Thirion and Arlot (2020). Aggregation of Multiple Knockoffs. Proceedings of the 37th International Conference on Machine Learning. https://arxiv.org/abs/2002.09269

Examples

library(knockoff)        # provides create.second_order and stat.glmnet_coefdiff
library(multiknockoffs)  # provides run.pKO

# Simulate a sparse linear model with Toeplitz-correlated Gaussian features
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0,p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))

X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)                   # indices of the true signals
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)

# Basic usage with default arguments
res.pKO <- run.pKO(X, y, pvals = TRUE)
res.pKO

# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) create.second_order(X, method = "equi")
res.pKO <- run.pKO(X, y, knockoffs = equi.knock, pvals = TRUE)
res.pKO

cKarypidis/multiknockoffs documentation built on Dec. 19, 2021, 12:53 p.m.