Run the model-free knockoff procedure from start to finish, selecting variables relevant for predicting the outcome of interest.
MFKnockoffs.filter(X, y, knockoffs = MFKnockoffs.create.approximate_gaussian,
  statistic = MFKnockoffs.stat.glmnet_coef_difference, q = 0.1,
  threshold = c("knockoff+", "knockoff"))
X |
matrix or data frame of predictors |
y |
response vector |
knockoffs |
the method used to construct knockoffs for the X variables. It must be a function taking an n-by-p matrix X as input and returning an n-by-p matrix of knockoff variables. |
statistic |
the test statistic (by default, a lasso statistic with cross validation). See the Details section for more information. |
q |
target FDR (false discovery rate) |
threshold |
either 'knockoff+' or 'knockoff' (default: 'knockoff+'). |
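The two threshold options differ only by an offset in the estimated false discovery proportion, as described in Barber and Candes (2015). The following is a minimal base-R sketch of that rule, not the package's own implementation; the function name `knockoff_threshold_sketch` and the example vector `W` are hypothetical and introduced here only for illustration.

```r
# Hypothetical helper (not part of the package API): data-dependent threshold
# T = min{ t : (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q },
# with offset = 1 for "knockoff+" and offset = 0 for "knockoff".
knockoff_threshold_sketch <- function(W, q = 0.1,
                                      threshold = c("knockoff+", "knockoff")) {
  threshold <- match.arg(threshold)
  offset <- if (threshold == "knockoff+") 1 else 0
  # Candidate thresholds: distinct nonzero magnitudes of W, in increasing order
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    ratio <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (ratio <= q) return(t)
  }
  Inf  # no threshold meets the target FDR, so nothing is selected
}

# Made-up statistics: large positive values suggest relevant variables
W <- c(5, 4, 3, 2, -1, 6, 7, 8, 9, 10)
t_plus  <- knockoff_threshold_sketch(W, q = 0.2, threshold = "knockoff+")
t_plain <- knockoff_threshold_sketch(W, q = 0.2, threshold = "knockoff")
selected_plus <- which(W >= t_plus)
```

Because of the extra 1 in the numerator, 'knockoff+' is never less conservative than 'knockoff' for the same statistics and target FDR.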
This function creates the knockoffs, computes the test statistics, and selects variables. It is the main entry point for the model-free knockoff package.
The parameter knockoffs controls how knockoff variables are created. By default, a multivariate normal distribution is fitted to the original variables in X. The estimated mean vector and covariance matrix are used to generate second-order approximate Gaussian knockoffs. In general, knockoffs should be a function taking an n-by-p matrix of observed variables X and returning an n-by-p matrix of knockoff variables.
Two alternative functions for creating knockoffs are provided with this package. If the rows of X are distributed as a multivariate Gaussian with known parameters, then the function MFKnockoffs.create.gaussian can be used to generate valid Gaussian knockoff variables, as shown in the examples below. If the design matrix X is assumed to be fixed instead of random, one can create knockoff variables using the function MFKnockoffs.create.fixed. This corresponds to the original framework of the (non-model-free) knockoff filter.
For more information about creating knockoffs, type ??MFKnockoffs.create.
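As an illustration of the interface expected by the knockoffs argument, here is a minimal base-R sketch of a user-defined knockoff generator. It is not part of the package; the name `independent_gaussian_knockoffs` is hypothetical, and the construction is valid only under the stated assumption that the columns of X are independent standard normal variables, in which case a fresh independent draw from the same distribution satisfies the knockoff exchangeability condition.

```r
# Hypothetical knockoff generator: takes an n-by-p matrix X and returns an
# n-by-p matrix of knockoffs.
# ASSUMPTION: the columns of X are i.i.d. N(0, 1); for correlated or
# non-Gaussian predictors this construction is NOT valid.
independent_gaussian_knockoffs <- function(X) {
  X <- as.matrix(X)
  matrix(rnorm(length(X)), nrow = nrow(X), ncol = ncol(X))
}

set.seed(1)
X_demo <- matrix(rnorm(20 * 5), 20)
X_k_demo <- independent_gaussian_knockoffs(X_demo)
```

A function of this shape could then be passed as `knockoffs = independent_gaussian_knockoffs` in the call to MFKnockoffs.filter.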
The default test statistic is MFKnockoffs.stat.glmnet_coef_difference. For a complete list of the statistics provided with this package, type ??MFKnockoffs.stat.
It is also possible to provide custom test statistics. An example can be found in the vignette.
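A custom statistic can be sketched as a function with the same (X, X_k, y) signature used in the examples on this page, returning one value per variable. The version below is an illustrative assumption, not the vignette's example: the name `marginal_cor_stat` is hypothetical, and it simply compares the absolute marginal correlation of each original variable and of its knockoff with the response.

```r
# Hypothetical custom statistic: W_j = |cor(X_j, y)| - |cor(X_k_j, y)|.
# A large positive W_j suggests that variable j is more associated with y
# than its knockoff is.
marginal_cor_stat <- function(X, X_k, y) {
  y <- as.numeric(y)
  abs(as.numeric(cor(X, y))) - abs(as.numeric(cor(X_k, y)))
}

# Shape check on simulated data. X_k_demo is just an independent draw used
# as a stand-in for knockoffs produced by one of the package's generators.
set.seed(2)
X_demo <- matrix(rnorm(50 * 4), 50)
X_k_demo <- matrix(rnorm(50 * 4), 50)
y_demo <- 3 * X_demo[, 1] + rnorm(50)
W_demo <- marginal_cor_stat(X_demo, X_k_demo, y_demo)

# Swapping a variable with its knockoff flips the sign of its statistic
X_swap <- X_demo; X_k_swap <- X_k_demo
X_swap[, 2] <- X_k_demo[, 2]; X_k_swap[, 2] <- X_demo[, 2]
W_swap <- marginal_cor_stat(X_swap, X_k_swap, y_demo)
```

The sign-flip behaviour checked at the end is the property a knockoff statistic needs for the FDR guarantee, so it is worth verifying for any hand-written statistic before passing it as `statistic =`.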
An object of class "MFKnockoffs.result". This object is a list containing at least the following components:
X |
matrix of original variables |
X_k |
matrix of knockoff variables |
statistic |
computed test statistics |
threshold |
computed selection threshold |
selected |
named vector of selected variables |
Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://statweb.stanford.edu/~candes/MF_Knockoffs/index.html
Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853
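# Simulate n observations of p independent standard normal predictors,
# of which k (chosen at random) have a nonzero effect on the response
p=100; n=200; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)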
# Basic usage with default arguments
result = MFKnockoffs.filter(X, y)
print(result$selected)
# Advanced usage with custom arguments
knockoffs = function(X) MFKnockoffs.create.gaussian(X, mu, Sigma)
k_stat = function(X, X_k, y) MFKnockoffs.stat.glmnet_coef_difference(X, X_k, y, nfolds=5)
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)
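Since the simulation above records the truly relevant variables in nonzero, the quality of a selection can be checked by computing its false discovery proportion. The helper below is a small base-R sketch; the name `fdp` and the toy index vectors are introduced here for illustration and are not taken from this page.

```r
# Hypothetical helper: fraction of selected variables that are not truly
# relevant (defined as 0 when nothing is selected).
fdp <- function(selected, nonzero) {
  sum(!(selected %in% nonzero)) / max(1, length(selected))
}

# Toy check with made-up indices: one of four selections is a false discovery
fdp_demo <- fdp(c(1, 2, 3, 4), c(1, 2, 3))
fdp_empty <- fdp(integer(0), c(1, 2, 3))
```

With the objects from the example, this would be called as `fdp(result$selected, nonzero)`. The FDR targeted by q is the expectation of this quantity, so an individual run can land above or below q.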