run.uKO: Union knockoff filter
In cKarypidis/multiknockoffs: Multiple knockoff procedures


View source: R/run.uKO.R

This function runs the complete union knockoff procedure: it generates K knockoff matrices, computes the score statistics and the selection set for each knockoff run, and takes the union of these selection sets as the final selection set.

run.uKO(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  qk = "decseq",
  q = 0.2,
  K = 5,
  q_seq = NULL,
  offset = 1,
  sets = FALSE
)

`X`	n x p matrix or data frame of original variables.
`y`	response vector of length n.
`knockoffs`	function for the knockoff construction. It must take the n x p data matrix as input and return an n x p knockoff matrix. Either choose a knockoff sampler from the `knockoff` package or define one manually. Default: `create.second_order` (see below).
`statistic`	function that computes the score vector W of length p. It must take the data matrix, the knockoff matrix and the response vector as input and return a vector of computed scores. Either choose a score statistic from the `knockoff` package or define one manually. Default: `stat.glmnet_coefdiff` (see below).
`qk`	sequence of nominal levels. Either choose `"decseq"` (default) for q_{k} = q/2^{k-1} or `"ave"` for q_{k} = q/K.
`q`	nominal level for the FDR control. Default: 0.2.
`K`	number of knockoff runs. Default: 5.
`q_seq`	user-defined sequence of nominal levels; its length must equal the number of knockoff runs `K`. If this argument is specified, `qk` and `q` are ignored.
`offset`	either 0 (knockoff) or 1 (knockoff+). Default: 1.
`sets`	logical; whether the K selection sets of the individual knockoff runs should be returned. Default: `FALSE`.

This function requires the installation of the knockoff package prior to its execution.

The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.

The default score function stat.glmnet_coefdiff is from the knockoff package. It fits a Lasso regression in which the regularization parameter λ is tuned by cross-validation. The score is then computed as

W_j = |Z_j| - |\tilde{Z}_j|

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
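The score definition above can be illustrated with a small base-R sketch. The coefficient values are made-up toy numbers, not the output of a real Lasso fit, and `knockoff_threshold_sketch` is a hypothetical helper written for this illustration; it follows the standard knockoff(+) thresholding rule and is not taken from the package source.

```r
# Toy coefficient estimates for p = 6 variables and their knockoffs
# (illustrative numbers only, not from a fitted model)
Z       <- c(1.2, 0.0, 0.8, 0.1, 0.9, 0.0)
Z_tilde <- c(0.1, 0.3, 0.0, 0.2, 0.0, 0.0)

# Score W_j = |Z_j| - |Z~_j|: large positive values favour the original variable
W <- abs(Z) - abs(Z_tilde)

# Hypothetical helper: smallest candidate threshold whose estimated FDP is <= q.
# offset = 0 corresponds to "knockoff", offset = 1 to "knockoff+".
knockoff_threshold_sketch <- function(W, q, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (cand in candidates) {
    fdp_hat <- (offset + sum(W <= -cand)) / max(1, sum(W >= cand))
    if (fdp_hat <= q) return(cand)
  }
  Inf  # no candidate meets the level, so nothing is selected
}

thr <- knockoff_threshold_sketch(W, q = 0.5, offset = 1)
selected <- which(W >= thr)  # indices of the selected variables
```

The level q = 0.5 is chosen only so that the toy example selects something with six variables; realistic problems use much smaller levels.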

The user must either specify qk together with q to use one of the pre-defined sequences of nominal levels, or supply a custom sequence of nominal levels via q_seq.
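As a rough illustration, the two pre-defined sequences can be reproduced in base R exactly as they are written in the argument description (q_{k} = q/2^{k-1} and q_{k} = q/K, with k = 1, ..., K). This is a sketch of the formulas as documented, not the package's internal code. The sums are printed because Xie and Lederer (2021) bound the FDR of the union by the sum of the per-run levels; that `FDRbound` reports exactly this sum is an assumption here.

```r
q <- 0.2   # overall nominal level (the documented default)
K <- 5     # number of knockoff runs (the documented default)
k <- seq_len(K)

# "decseq": geometrically decreasing levels q / 2^(k - 1)
q_decseq <- q / 2^(k - 1)

# "ave": the same level q / K in every run
q_ave <- rep(q / K, K)

# Sum of the per-run levels for each choice
sum_decseq <- sum(q_decseq)
sum_ave <- sum(q_ave)
print(rbind(decseq = q_decseq, ave = q_ave))
print(c(decseq = sum_decseq, ave = sum_ave))
```

A custom sequence would be passed in the same spirit, for example `q_seq = c(0.1, 0.05, 0.05)` together with `K = 3`, so that the length of `q_seq` matches `K`.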

A list containing the following components:

`Shat`	aggregated selection set.
`K`	number of knockoff runs.
`FDRbound`	theoretical FDR bound.
`sets`	if `sets = TRUE`, the individual selection sets of each knockoff run.
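The relationship between `sets` and `Shat` can be sketched in base R. The snippet assumes that `sets` is a list of integer index vectors, one per knockoff run; that structure is an assumption made for illustration, and the toy sets below are invented.

```r
# Invented selection sets from K = 3 knockoff runs (variable indices)
sets <- list(c(2L, 7L, 11L), c(7L, 15L), integer(0))

# Union aggregation: a variable is selected if any run selects it
Shat <- sort(Reduce(union, sets))

# How often each variable was selected across the runs
freq <- table(unlist(sets))
print(Shat)
print(freq)
```

Inspecting the selection frequencies alongside the union can help judge how stable an individual discovery is across runs.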

Xie and Lederer (2021). Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data. Entropy 23(2), 230. https://www.mdpi.com/1099-4300/23/2/230/xml

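# Simulate a sparse linear model with Toeplitz-correlated Gaussian features
n <- 400; p <- 200; s_0 <- 30          # observations, variables, true signals
amplitude <- 1; mu <- rep(0, p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))

X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)              # indices of the true signals
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)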

# Basic usage with default arguments
res.uKO <- run.uKO(X, y, sets = TRUE)
res.uKO

# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) knockoff::create.second_order(X, method = "equi")
res.uKO <- run.uKO(X, y, knockoffs = equi.knock, sets = TRUE)
res.uKO
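Because the simulation above records the true signal indices in `nonzero`, the realized false discovery proportion and power of a selection can be checked. The helper `fdp_power` below is a hypothetical convenience function written for this page, not part of the package; it is demonstrated on invented index sets so that it runs without the package installed.

```r
# Hypothetical helper: realized FDP and power of a selection set
fdp_power <- function(selected, truth) {
  false_disc <- length(setdiff(selected, truth))    # selected but not a true signal
  true_disc  <- length(intersect(selected, truth))  # correctly selected signals
  c(fdp   = false_disc / max(1, length(selected)),
    power = true_disc  / max(1, length(truth)))
}

# Toy check with invented index sets
toy <- fdp_power(selected = c(1, 4, 9, 12), truth = c(1, 4, 7, 9, 20))
print(toy)
```

With the objects from the example above, the call would be `fdp_power(res.uKO$Shat, nonzero)`. Note that FDR control is a statement about the average FDP over repeated experiments, so a single realized FDP can lie above the bound.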