knockoff_filter: The Knockoff Filter


View source: R/knockoff_filters.R

Description

This is an adaptation of the knockoff.filter function from the R-package knockoff.

Usage

knockoff_filter(X, y, fdr = 0.2, family = "gaussian",
  knockoffs = knockoffs_seq, statistic = stat_glmnet)

Arguments

X

data.frame (or tibble) with "numeric" and "factor" columns only. The number of columns, ncol(X), must be greater than 2.

y

response vector with length(y) = nrow(X). Accepts "numeric" (family="gaussian") or binary "factor" (family="binomial").

fdr

target false discovery rate. Can be a vector of multiple thresholds.

family

should be "gaussian" if y is numeric, but "binomial" if y is a binary factor variable.

knockoffs

user-specified function to construct knockoffs of X. It must take the data.frame (or tibble) X as input and return a data.frame (or tibble) X_k of corresponding knockoffs. The default is knockoffs=knockoffs_seq; the other option is knockoffs=knockoffs_mx, for X with numeric columns only (see ?knockoffs_seq and ?knockoffs_mx).

statistic

user-specified function that constructs feature statistics used to assess variable importance. By default statistic=stat_glmnet (see ?stat_glmnet).

Details

This function takes an input X with "numeric" columns, "factor" columns, or both. The input y can be either numeric or a binary factor, and the user may specify multiple fdr thresholds. The function performs the knockoff filter, which consists of three steps: 1) simulate knockoffs of X with the input function knockoffs, 2) calculate the importance statistics W with the input function statistic, and 3) calculate the knockoff+ threshold for each target fdr provided. Finally, variables are selected based on W > threshold.
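Step 3 can be illustrated with a minimal sketch in base R. The helper knockoff_plus_threshold and the vector W below are hypothetical and are not part of seqknockoff. The sketch follows the published knockoff+ definition, which selects with W >= threshold; how the package itself handles a statistic exactly equal to the threshold is not verified here.

```r
# Hypothetical helper (not part of seqknockoff): knockoff+ threshold
# for importance statistics W at a target false discovery rate q.
knockoff_plus_threshold <- function(W, q) {
  # Candidate thresholds: distinct nonzero magnitudes of W, in increasing order.
  ts <- sort(unique(abs(W[W != 0])))
  # Estimated false discovery proportion at each candidate threshold;
  # the "+1" in the numerator is what distinguishes knockoff+ from plain knockoff.
  fdp_hat <- vapply(
    ts,
    function(t) (1 + sum(W <= -t)) / max(1, sum(W >= t)),
    numeric(1)
  )
  ok <- which(fdp_hat <= q)
  # No candidate meets the target: return Inf so that nothing is selected.
  if (length(ok) == 0) Inf else ts[ok[1]]
}

# Made-up statistics for illustration only.
W <- c(5, 4, 3, 2.5, 2, 1.5, -1, 0.5, 0.2, -0.1)

thr <- knockoff_plus_threshold(W, q = 0.2)
selected <- which(W >= thr)
```

With only ten variables a stricter target such as q = 0.1 yields an infinite threshold in this example, so no variables are selected. This is expected: the estimated false discovery proportion can never fall below 1 divided by the number of selected variables.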

Value

If length(fdr) = 1, the function returns a vector of selected indices; otherwise it returns a list of selection vectors, one per fdr threshold supplied.
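Because the return type depends on length(fdr), downstream code may find it convenient to normalise the result to a named list. The sketch below assumes only the return shapes described above; as_selection_list is a hypothetical helper, and the selection vectors are made up for illustration.

```r
# Hypothetical helper (not part of seqknockoff): wrap a single selection
# vector in a list and name each element after its fdr threshold.
as_selection_list <- function(S, fdr) {
  if (length(fdr) == 1) S <- list(S)
  names(S) <- paste0("fdr_", fdr)
  S
}

# Made-up selections standing in for knockoff_filter() output.
single <- as_selection_list(c(1L, 2L, 5L), fdr = 0.2)
multi  <- as_selection_list(list(c(1L, 2L), c(1L, 2L, 5L)), fdr = c(0.05, 0.2))
```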

Examples

library(seqknockoff)

set.seed(1)

# Simulate 10 Gaussian covariate predictors:
X <- generate_X(n=1000, p=10, p_b=0, cov_type="cov_equi", rho=0.5)

# Simulate response from the model y = X %*% beta + epsilon, where epsilon ~ N(0,1),
# with the first 5 beta coefficients equal to 8 (all others zero).
y <- generate_y(X, p_nn=5, a=8)

S <- knockoff_filter(X, y, fdr=c(0.05, 0.1, 0.2))

# dichotomize y for logistic regression knockoff filter:
y <- factor(y > median(y))

# Below the family argument gets passed to the statistic = knockoff::stat.glmnet_coefdiff function:
S <- knockoff_filter(X, y, fdr=c(0.05, 0.1, 0.2), family="binomial")

kormama1/seqknockoff documentation built on April 11, 2021, 7:44 a.m.