MFKnockoffs.filter: Model-Free Knockoff Filter


Description

Run the model-free knockoff procedure from start to finish, selecting variables relevant for predicting the outcome of interest.

Usage

MFKnockoffs.filter(X, y, knockoffs = MFKnockoffs.create.approximate_gaussian,
  statistic = MFKnockoffs.stat.glmnet_coef_difference, q = 0.1,
  threshold = c("knockoff+", "knockoff"))

Arguments

X

matrix or data frame of predictors

y

response vector

knockoffs

the method used to construct knockoffs for the X variables. It must be a function taking an n-by-p matrix X as input and returning an n-by-p matrix of knockoff variables.

statistic

the test statistic (by default, a lasso statistic with cross validation). See the Details section for more information.

q

target FDR (false discovery rate)

threshold

either 'knockoff+' or 'knockoff' (default: 'knockoff+').
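
The difference between the two threshold options can be illustrated with a short base-R sketch. It assumes the data-dependent threshold defined in Barber and Candes (2015), cited in the References, where 'knockoff+' adds an offset of 1 to the numerator of the estimated false discovery proportion and 'knockoff' adds 0. The name sketch_threshold and the toy vector W are hypothetical and are not part of the package.

```r
# Sketch of the knockoff / knockoff+ threshold (hypothetical helper, base R only).
# W: vector of knockoff statistics; q: target FDR;
# offset: 1 for "knockoff+", 0 for "knockoff" (assumed from Barber and Candes, 2015).
sketch_threshold <- function(W, q = 0.1, offset = 1) {
  # Candidate thresholds are the distinct nonzero magnitudes of W
  candidates <- sort(unique(abs(W[W != 0])))
  # Estimated false discovery proportion at each candidate threshold
  ratio <- vapply(candidates, function(t) {
    (offset + sum(W <= -t)) / max(1, sum(W >= t))
  }, numeric(1))
  ok <- which(ratio <= q)
  # Smallest candidate meeting the target, or Inf if none does
  if (length(ok) > 0) candidates[ok[1]] else Inf
}

# Toy statistics: nine large positive values and one small negative value
W <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, -1)
t_plus  <- sketch_threshold(W, q = 0.2, offset = 1)
t_plain <- sketch_threshold(W, q = 0.2, offset = 0)
selected_plus <- which(W >= t_plus)
```

Because of the extra offset, the 'knockoff+' threshold is never smaller than the 'knockoff' threshold for the same statistics and target level, so it selects no more variables.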

Details

This function creates the knockoffs, computes the test statistics, and selects variables. It is the main entry point for the model-free knockoff package.

The parameter knockoffs controls how knockoff variables are created. By default, a multivariate normal distribution is fitted to the original variables in X, and the estimated mean vector and covariance matrix are used to generate second-order approximate Gaussian knockoffs. In general, knockoffs should be a function taking an n-by-p matrix of observed variables X and returning an n-by-p matrix of knockoff variables. Two alternative functions for creating knockoffs are provided with this package.

If the rows of X are distributed as a multivariate Gaussian with known parameters, then the function MFKnockoffs.create.gaussian can be used to generate valid Gaussian knockoff variables, as shown in the examples below.

If the design matrix X is assumed to be fixed instead of random, one can create knockoff variables using the function MFKnockoffs.create.fixed. This corresponds to the original framework of the (non-model-free) knockoff filter.

For more information about creating knockoffs, type ??MFKnockoffs.create.
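
The contract for a user-supplied knockoffs function (an n-by-p matrix in, an n-by-p matrix out) can be sketched in base R. The helper below is hypothetical and is valid only under a strong assumption: the rows of X are i.i.d. standard Gaussian with mutually independent columns, in which case a fresh independent draw from the same distribution is a valid knockoff copy. For any other design, one of the package's constructors should be used instead.

```r
# Hypothetical knockoff constructor, not part of the package.
# ASSUMPTION: rows of X are i.i.d. N(0, I_p), i.e. independent standard
# Gaussian columns. Only then is an independent redraw a valid knockoff copy.
independent_gaussian_knockoffs <- function(X) {
  X <- as.matrix(X)
  matrix(rnorm(length(X)), nrow = nrow(X), ncol = ncol(X))
}

set.seed(1)
X <- matrix(rnorm(200 * 100), nrow = 200)
X_k <- independent_gaussian_knockoffs(X)
dim(X_k)  # same shape as X
```

Based on the description of the knockoffs argument, such a function could presumably be passed as knockoffs = independent_gaussian_knockoffs in a call to MFKnockoffs.filter.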

The default test statistic is MFKnockoffs.stat.glmnet_coef_difference. For a complete list of the statistics provided with this package, type ??MFKnockoffs.stat.

It is also possible to provide custom test statistics. An example can be found in the vignette.
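
As a rough illustration of what a custom statistic might look like, the base-R sketch below follows the function(X, X_k, y) signature used in the Examples section and returns one value per variable. The name marginal_difference_stat and the simulated data are hypothetical. The statistic compares the absolute inner product of each original variable with y against that of its knockoff, so swapping a variable with its knockoff flips the sign of the corresponding entry, which is the antisymmetry property that knockoff statistics are required to satisfy.

```r
# Hypothetical custom statistic, not part of the package:
# W_j = |X_j' y| - |X_k_j' y|, large and positive when the original
# variable is more associated with y than its knockoff.
marginal_difference_stat <- function(X, X_k, y) {
  as.vector(abs(crossprod(X, y)) - abs(crossprod(X_k, y)))
}

set.seed(1)
n <- 50; p <- 5
X   <- matrix(rnorm(n * p), n)
X_k <- matrix(rnorm(n * p), n)   # stand-in knockoffs for illustration only
y   <- 3 * X[, 1] + rnorm(n)
W   <- marginal_difference_stat(X, X_k, y)

# Swapping variable 2 with its knockoff flips the sign of W[2] only
X_s <- X; Xk_s <- X_k
X_s[, 2] <- X_k[, 2]; Xk_s[, 2] <- X[, 2]
W_s <- marginal_difference_stat(X_s, Xk_s, y)
```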

Value

An object of class "MFKnockoffs.result". This object is a list containing at least the following components:

X

matrix of original variables

X_k

matrix of knockoff variables

statistic

computed test statistics

threshold

computed selection threshold

selected

named vector of selected variables

References

Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://statweb.stanford.edu/~candes/MF_Knockoffs/index.html

Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853

Examples

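library(MFKnockoffs)

# Simulate n observations of p independent standard Gaussian predictors
p=100; n=200; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
# Linear response with k randomly chosen nonzero coefficients of size 3.5
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)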

# Basic usage with default arguments
result = MFKnockoffs.filter(X, y)
print(result$selected)

# Advanced usage with custom arguments
knockoffs = function(X) MFKnockoffs.create.gaussian(X, mu, Sigma)
k_stat = function(X, X_k, y) MFKnockoffs.stat.glmnet_coef_difference(X, X_k, y, nfolds=5)
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)

MFKnockoffs documentation built on May 2, 2019, 6:33 a.m.