run.uKO: Union knockoff filter


View source: R/run.uKO.R

Description

This function runs the complete union knockoff procedure: it generates K knockoff matrices, computes the score statistics and the selection set for each knockoff run, and takes the union of these K selection sets as the final selection set.

Usage

run.uKO(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  qk = "decseq",
  q = 0.2,
  K = 5,
  q_seq = NULL,
  offset = 1,
  sets = FALSE
)

Arguments

X

n x p matrix or data frame of original variables.

y

response vector of length n.

knockoffs

function for the knockoff construction. It must take the n x p data matrix as input and return an n x p knockoff matrix. Either choose a knockoff sampler from the knockoff package or define one manually. Default: create.second_order (see below).

statistic

function that computes the score vector W of length p. It must take the data matrix, the knockoff matrix and the response vector as input and return a vector of computed scores. Either choose a score statistic from the knockoff package or define one manually. Default: stat.glmnet_coefdiff (see below).

qk

sequence of nominal levels q_{k}, k = 1, ..., K. Either choose "decseq" (default) for q_{k} = q/2^{k-1} or "ave" for q_{k} = q/K.

q

nominal level for the FDR control. Default: 0.2.

K

number of knockoff runs. Default: 5.

q_seq

user-defined sequence of nominal levels whose length must equal the number of knockoff runs K. If this argument is specified, qk and q are ignored.

offset

either 0 (knockoff) or 1 (knockoff+). Default: 1.

sets

logical; whether the K selection sets of the individual knockoff runs should be returned. Default: FALSE.

Details

This function requires the installation of the knockoff package prior to its execution.

The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.

The default score function stat.glmnet_coefdiff is from the knockoff package. It fits a Lasso regression on the original variables together with their knockoffs, where the regularization parameter λ is tuned by cross-validation. The score for the jth variable is then computed as the difference

W_j = |Z_j| - |\tilde{Z}_j|

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
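A user-defined function passed as statistic has to follow the same pattern: it receives the data matrix, the knockoff matrix and the response, and returns one score per variable. The following is a minimal sketch of such a function in base R. The name stat.marginal_diff is hypothetical, and the statistic (a difference of absolute marginal inner products) is a simpler stand-in chosen for illustration, not the Lasso-based default. The matrix X_k here is plain Gaussian noise standing in for a knockoff matrix.

```r
# Hypothetical custom score: W_j = |X_j' y| - |X~_j' y|.
# This is NOT stat.glmnet_coefdiff; it only illustrates the interface
# (data matrix, knockoff matrix, response) -> score vector of length p.
stat.marginal_diff <- function(X, X_k, y) {
  Z   <- abs(crossprod(X, y))    # importance of each original variable
  Z_k <- abs(crossprod(X_k, y))  # importance of each knockoff copy
  as.vector(Z - Z_k)             # positive values favour the original
}

set.seed(1)
n <- 50; p <- 4
X   <- matrix(rnorm(n * p), n, p)
X_k <- matrix(rnorm(n * p), n, p)  # stand-in only, not a valid knockoff
y   <- 2 * X[, 1] + rnorm(n)

W <- stat.marginal_diff(X, X_k, y)
W_swapped <- stat.marginal_diff(X_k, X, y)  # swapping roles flips the sign
```

Like W_j above, this score is antisymmetric: exchanging a variable with its knockoff changes the sign of the score, which is the property the knockoff filter relies on.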

The user has to specify either qk together with q to apply one of the pre-defined sequences of nominal levels, or q_seq to supply a custom sequence of nominal levels.
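The two pre-defined sequences can be written out directly from the formulas given for qk. The sketch below evaluates them in base R for the default q and K; the variable names and the custom sequence q_custom are illustrative and not part of the package.

```r
q <- 0.2
K <- 5
k <- seq_len(K)

# "decseq": q_k = q / 2^(k - 1), as documented for the qk argument
q_decseq <- q / 2^(k - 1)

# "ave": q_k = q / K for every run
q_ave <- rep(q / K, K)

# A hypothetical custom sequence for q_seq; its length must equal K
q_custom <- c(0.08, 0.06, 0.03, 0.02, 0.01)

sum_decseq <- sum(q_decseq)
sum_ave    <- sum(q_ave)
```

With the formulas as written, the "ave" levels sum to q, whereas the "decseq" levels sum to q(2 - 2^{1-K}), which is larger than q. It may therefore be worth checking the returned FDRbound rather than assuming it equals q.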

Value

A list containing the following components:

Shat

aggregated selection set.

K

number of knockoff runs.

FDRbound

theoretical FDR bound.

sets

if sets = TRUE, the individual selection sets of each knockoff run.
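How these components relate to each other can be sketched in base R. The code below is an illustration under stated assumptions, not the package's implementation: ko_threshold is a hand-written version of the usual knockoff/knockoff+ threshold, union_select is a hypothetical helper, the score vectors are made-up numbers, and FDRbound is assumed to be the sum of the per-run nominal levels, following the union bound in Xie and Lederer (2021).

```r
# Smallest threshold thr > 0 with estimated FDP <= q.
# offset = 1 corresponds to knockoff+, offset = 0 to knockoff.
ko_threshold <- function(W, q, offset = 1) {
  cand <- sort(unique(abs(W[W != 0])))
  ok <- vapply(cand, function(thr) {
    (offset + sum(W <= -thr)) / max(1, sum(W >= thr)) <= q
  }, logical(1))
  if (any(ok)) min(cand[ok]) else Inf
}

# Select per run, then aggregate by the union of the selection sets.
union_select <- function(W_list, q_seq, offset = 1) {
  sets <- Map(function(W, qk) which(W >= ko_threshold(W, qk, offset)),
              W_list, q_seq)
  list(Shat = sort(Reduce(union, sets)),
       K = length(W_list),
       FDRbound = sum(q_seq),  # assumption: bound = sum of the q_k
       sets = sets)
}

# Made-up scores from two runs on p = 10 variables
W1 <- c(5, 4, 3, 2.5, 2, -0.1, 0, 0, 0, 0)
W2 <- c(0, 0, 3, 4, -0.2, 2, 5, 6, 0, 0)
res <- union_select(list(W1, W2), q_seq = c(0.25, 0.25))
```

In this toy example the first run selects variables 1 to 5 and the second selects 3, 4, 6, 7 and 8, so the aggregated set Shat contains variables 1 to 8.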

References

Xie and Lederer (2021). Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data. Entropy 23(2), 230. https://www.mdpi.com/1099-4300/23/2/230/xml

Examples

library(knockoff)        # provides create.second_order and stat.glmnet_coefdiff
library(multiknockoffs)  # provides run.uKO

# Simulate a sparse linear model with Toeplitz-correlated Gaussian features
set.seed(2021)
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0, p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p-1)))

X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)                # indices of the true signals
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)

# Basic usage with default arguments
res.uKO <- run.uKO(X, y, sets = TRUE)
res.uKO

# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) create.second_order(X, method = "equi")
res.uKO <- run.uKO(X, y, knockoffs = equi.knock, sets = TRUE)
res.uKO

cKarypidis/multiknockoffs documentation built on Dec. 19, 2021, 12:53 p.m.