
View source: R/knockoff_filter.R

This function runs the Knockoffs procedure from start to finish, selecting variables relevant for predicting the outcome of interest.

```
knockoff.filter(
  X,
  y,
  knockoffs = create.second_order,
  statistic = stat.glmnet_coefdiff,
  fdr = 0.1,
  offset = 1
)
```

| Argument | Description |
| --- | --- |
| `X` | n-by-p matrix or data frame of predictors. |
| `y` | response vector of length n. |
| `knockoffs` | method used to construct knockoffs for the X variables. See the Details section for more information. |
| `statistic` | statistics used to assess variable importance. By default, a lasso statistic with cross-validation is used. See the Details section for more information. |
| `fdr` | target false discovery rate (default: 0.1). |
| `offset` | either 0 or 1 (default: 1). This is the offset used to compute the rejection threshold on the statistics. The value 1 yields a slightly more conservative procedure ("knockoffs+") that controls the false discovery rate (FDR) according to the usual definition, while an offset of 0 controls a modified FDR. |
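
The effect of `offset` can be illustrated with a minimal base-R sketch of a knockoff rejection threshold in the spirit of Barber and Candes (2015). The function name `sketch_threshold` and the toy statistics `W` are hypothetical and exist only for this illustration; the candidate thresholds (0 and the absolute values of `W`) are an assumption of the sketch, which is not meant to reproduce the package's internal code.

```r
# Hypothetical helper (not part of the package): return the smallest candidate
# threshold t whose estimated false discovery proportion is at most fdr.
sketch_threshold = function(W, fdr = 0.1, offset = 1) {
  stopifnot(offset %in% c(0, 1))
  # Candidate thresholds: 0 and the magnitudes of the statistics (assumption)
  ts = sort(c(0, abs(W)))
  # Estimated proportion of false discoveries at each candidate threshold
  ratio = sapply(ts, function(t) (offset + sum(W <= -t)) / max(1, sum(W >= t)))
  ok = which(ratio <= fdr)
  # If no candidate meets the target, nothing is selected
  if (length(ok) > 0) ts[ok[1]] else Inf
}

# Toy importance statistics: large positive values suggest relevant variables
W = c(10, 9, 8, 7, 6, 5, 4, 3, -2, 1.5, -1, 0.5)

t_plus = sketch_threshold(W, fdr = 0.25, offset = 1)
t_zero = sketch_threshold(W, fdr = 0.25, offset = 0)

print(c(offset1 = t_plus, offset0 = t_zero))
print(which(W >= t_plus))  # variables selected with offset = 1
print(which(W >= t_zero))  # variables selected with offset = 0
```

On this toy vector, offset 1 gives a larger threshold than offset 0 and therefore selects fewer variables, which is the sense in which "knockoffs+" is more conservative.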

This function creates the knockoffs, computes the importance statistics, and selects variables. It is the main entry point for the knockoff package.

The parameter `knockoffs` controls how knockoff variables are created.
By default, the model-X scenario is assumed and a multivariate normal distribution
is fitted to the original variables *X*. The estimated mean vector and covariance
matrix are used to generate second-order approximate Gaussian knockoffs.
In general, the function `knockoffs` should take an n-by-p matrix of
observed variables *X* as input and return an n-by-p matrix of knockoffs.
Two default functions for creating knockoffs are provided with this package.

In the model-X scenario, under the assumption that the rows of *X* are distributed
as a multivariate Gaussian with known parameters, the function `create.gaussian`
can be used to generate Gaussian knockoffs, as shown in the examples below.

In the fixed-X scenario, one can create the knockoffs using the function
`create.fixed`. This requires *n ≥ p* and it assumes
that the response *Y* follows a homoscedastic linear regression model.
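
As a hedged illustration of the fixed-X scenario, the sketch below simulates a homoscedastic linear model with *n ≥ p* and passes `create.fixed` as the `knockoffs` argument. The dimensions, effect size, and seed are arbitrary choices made for this example, and the call is guarded so that the data simulation still runs when the knockoff package is not installed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated design with n >= p, as the fixed-X construction requires
p = 100; n = 200; k = 15
X = matrix(rnorm(n * p), n)
nonzero = sample(p, k)
beta = 5.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)  # linear model with constant noise variance

# Run the filter with fixed-X knockoffs if the package is available
if (requireNamespace("knockoff", quietly = TRUE)) {
  result = knockoff::knockoff.filter(X, y, knockoffs = knockoff::create.fixed)
  print(result$selected)
}
```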

For more information about creating knockoffs, type `??create`.

The default importance statistic is `stat.glmnet_coefdiff`.
For a complete list of the statistics provided with this package,
type `??stat`.

It is possible to provide custom functions for the knockoff constructions or the importance statistics. Some examples can be found in the vignette.
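
As a rough sketch of what a custom importance statistic might look like, the function below follows the `function(X, Xk, y)` signature used for `statistic` in the examples on this page and returns one value per variable. The name `stat_marginal_corr` and the choice of a marginal inner-product difference are assumptions made for illustration, not a statistic claimed to ship with the package. The statistic is built so that swapping a variable with its knockoff flips the sign of the corresponding entry.

```r
# Hypothetical custom statistic: difference between the absolute inner product
# of each original variable with y and that of its knockoff with y.
stat_marginal_corr = function(X, Xk, y) {
  abs(drop(crossprod(X, y))) - abs(drop(crossprod(Xk, y)))
}

set.seed(1)  # arbitrary seed, for reproducibility only
p = 20; n = 100; k = 5
X = matrix(rnorm(n * p), n)
beta = 3.5 * (1:p %in% sample(p, k))
y = X %*% beta + rnorm(n)

# Plug the custom statistic into the filter if the package is available
if (requireNamespace("knockoff", quietly = TRUE)) {
  result = knockoff::knockoff.filter(X, y, statistic = stat_marginal_corr)
  print(result$selected)
}
```

A simple statistic like this is cheap to compute but will usually have less power than the default lasso-based statistic; it is shown only to make the expected inputs and output concrete.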

An object of class "knockoff.result". This object is a list containing at least the following components:

| Component | Description |
| --- | --- |
| `X` | matrix of original variables |
| `Xk` | matrix of knockoff variables |
| `statistic` | computed test statistics |
| `threshold` | computed selection threshold |
| `selected` | named vector of selected variables |

Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://web.stanford.edu/group/candes/knockoffs/index.html

Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853

```
library(knockoff)
set.seed(2022)  # arbitrary seed so the simulated data are reproducible

# Simulate n observations of p independent standard normal predictors,
# of which k have a nonzero effect on the response
p=200; n=100; k=15
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)

# Basic usage with default arguments
result = knockoff.filter(X, y)
print(result$selected)

# Advanced usage with custom arguments:
# Gaussian knockoffs with known mean and covariance, and 5-fold cross-validation
knockoffs = function(X) create.gaussian(X, mu, Sigma)
k_stat = function(X, Xk, y) stat.glmnet_coefdiff(X, Xk, y, nfolds=5)
result = knockoff.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)
```
