run.ADAGES: ADAGES knockoff filter


View source: R/run.ADAGES.R

Description

This function runs the complete ADAGES procedure in the multiple-knockoff setting: it generates K knockoff matrices, computes the score statistics and the selection set for each knockoff run, and aggregates these K selection sets with ADAGES to obtain the final selection set.

Usage

run.ADAGES(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  q = 0.2,
  K = 5,
  offset = 1,
  type = "ADAGES",
  sets = FALSE
)

Arguments

X

n x p matrix or data frame of original variables.

y

response vector of length n.

knockoffs

function for the knockoff construction. It must take the n x p data matrix as input and return an n x p knockoff matrix. Either choose a knockoff sampler from the knockoff package or define one manually. Default: create.second_order (see below).

statistic

function that computes the score vector W of length p. It must take the data matrix, the knockoff matrix and the response vector as input and return a vector of computed scores. Either choose a score statistic from the knockoff package or define one manually. Default: stat.glmnet_coefdiff (see below).

q

nominal level for the FDR control. Default: 0.2.

K

number of knockoff runs. Default: 5.

offset

either 0 (knockoff) or 1 (knockoff+). Default: 1.

type

either "ADAGES" (default) or "ADAGES.mod" (see below).

sets

logical; whether the K selection sets of the individual knockoff runs should be returned. Default: FALSE.
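
The roles of q and offset can be illustrated with a small base R sketch of the standard knockoff(+) threshold applied to a score vector W. This is an illustrative re-implementation, not the package's code: the function name knockoff_threshold_sketch and the toy scores are assumptions made for this example.

```r
# Knockoff(+) threshold: smallest t among the nonzero |W_j| such that
# (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
# knockoff_threshold_sketch is a hypothetical helper, not a package function.
knockoff_threshold_sketch <- function(W, q = 0.2, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    ratio <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (ratio <= q) return(t)
  }
  Inf  # no candidate satisfies the bound, so nothing is selected
}

# Toy score vector (made-up numbers, for illustration only)
W <- c(5.1, 4.2, -0.3, 3.8, 0, 2.9, -0.1, 3.3, 2.5, 0.2, 4.0, 3.1)

t_plus <- knockoff_threshold_sketch(W, q = 0.2, offset = 1)  # knockoff+
t_zero <- knockoff_threshold_sketch(W, q = 0.2, offset = 0)  # knockoff

selected_plus <- which(W >= t_plus)
selected_zero <- which(W >= t_zero)
```

With offset = 1 the estimated proportion of false discoveries is inflated by one in the numerator, so the threshold is never lower, and the selection never larger, than with offset = 0.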

Details

The knockoff package must be installed before this function is run.

The default knockoff sampler create.second_order is the second-order Gaussian knockoff construction from the knockoff package.

The default score function stat.glmnet_coefdiff is from the knockoff package. It fits a Lasso regression on the original variables and their knockoffs, with the regularization parameter λ tuned by cross-validation. The score for the jth variable is then the difference of the absolute coefficient estimates,

W_j = |Z_j| - |\tilde{Z}_j|

where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively.
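
As a minimal sketch of this formula, the scores can be computed from a coefficient vector of length 2p in which the first p entries belong to the original variables and the last p to their knockoffs. The coefficient values and this ordering are assumptions made for illustration; they are not output of stat.glmnet_coefdiff.

```r
p <- 4

# Hypothetical coefficient estimates from a Lasso fit on the original
# variables followed by their knockoffs (made-up numbers)
coefs <- c(1.8, 0.0, -0.9,  0.1,   # Z_1, ..., Z_p
           0.2, 0.0,  0.0, -0.4)   # Z~_1, ..., Z~_p

Z       <- coefs[1:p]
Z_tilde <- coefs[(p + 1):(2 * p)]

# W_j = |Z_j| - |Z~_j|
W <- abs(Z) - abs(Z_tilde)
W
```

A large positive W_j indicates that the original variable received a larger coefficient than its knockoff, which is evidence that variable j is relevant; a negative W_j means the knockoff was preferred.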

With type = "ADAGES", the optimal threshold is chosen by minimizing the complexity ratio.

With type = "ADAGES.mod", the optimal threshold c is chosen by minimizing the trade-off between the threshold and the model complexity, c |S|.
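
The quantities that enter both criteria can be sketched in base R. The sketch assumes the usual ADAGES construction in which the candidate set for a threshold c contains the variables selected in at least c of the K runs, and it reads "c |S|" as the product of c and the size of that set. The selection sets are made-up, and the range of c over which the package minimizes is not stated here, so no optimum is chosen.

```r
p <- 10

# Hypothetical selection sets from K = 5 knockoff runs (made-up indices)
sets <- list(c(1, 2, 3, 7), c(1, 2, 3), c(1, 2, 5), c(1, 2, 3, 9), c(1, 3))
K <- length(sets)

# freq[j] = number of runs in which variable j was selected
freq <- tabulate(unlist(sets), nbins = p)

# Candidate set for threshold thr: variables selected in at least thr runs
S <- lapply(seq_len(K), function(thr) which(freq >= thr))

sizes    <- lengths(S)          # |S_c| for c = 1, ..., K
tradeoff <- seq_len(K) * sizes  # c * |S_c| (assumed reading of "c |S|")

data.frame(c = seq_len(K), size = sizes, tradeoff = tradeoff)
```

The candidate sets are nested: raising the threshold can only remove variables, so a small c behaves like a union of the K sets and c = K like their intersection.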

Value

A list containing the following components:

Shat

aggregated selection set.

c

optimal threshold.

K

number of knockoff runs.

sets

if sets = TRUE, the individual selection sets of the K knockoff runs.

References

Gui (2020). ADAGES: adaptive aggregation with stability for distributed feature selection. Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference. https://arxiv.org/pdf/2007.10776.pdf

Examples

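library(knockoff)        # provides create.second_order and stat.glmnet_coefdiff
library(multiknockoffs)  # provides run.ADAGES

set.seed(1)  # for reproducibility of the simulated data

# Simulation settings: n observations, p variables, s_0 true signals
n <- 400; p <- 200; s_0 <- 30
amplitude <- 1; mu <- rep(0, p); rho <- 0.25
Sigma <- toeplitz(rho^(0:(p - 1)))  # AR(1)-type covariance matrix

# Gaussian design and sparse linear model with unit-variance noise
X <- MASS::mvrnorm(n, mu, Sigma)
nonzero <- sample(p, s_0)
beta <- amplitude * (1:p %in% nonzero)
y <- X %*% beta + rnorm(n)

# Basic usage with default arguments
res.ADAGES <- run.ADAGES(X, y, sets = TRUE)
res.ADAGES

# Advanced usage with customized knockoff construction (equi-correlated)
equi.knock <- function(X) create.second_order(X, method = "equi")
res.ADAGES <- run.ADAGES(X, y, knockoffs = equi.knock, sets = TRUE)
res.ADAGES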

cKarypidis/multiknockoffs documentation built on Dec. 19, 2021, 12:53 p.m.