selection: Generalized Model-Free Knockoff Filter for Controlled...


View source: R/selection.R

Description

This function is the main entry point for the package. It runs the knockoff procedure, which accommodates various covariate distributions and model-free associations using gradient boosting, and returns the selected variables for the input dataset. Users can specify whether to screen variables before selection and how many variables to keep after screening.

Usage

knockoffs.varsel(X, y, X_k, family = Gaussian(),
  q=0.10, knockoff.method = c("sdp","asdp","svm"), knockoff.shrink = T,
  stat = c("RRB", "LCD", "DRS"),
  screen = T, screening.num = nrow(X), screening.knot = 10,
  max.mstop = 100, baselearner = c("bbs", "bols", "btree"), cv.fold = 5,
  threshold=c('knockoff','knockoff+'))

Arguments

X

matrix of predictors

y

response vector, or a survival object with two columns

X_k

knockoff variables (n by p), if pre-specified

family

Binomial(), Binomial(link = "logit", type = "glm"), Gaussian(), Poisson(), CoxPH(), Cindex(), GammaReg(), NBinomial(), Weibull(), Loglog(), Lognormal(), etc. See mboost documentation for details.

q

target FDR (false discovery rate)

knockoff.method

method to construct knockoffs. 'sdp' or 'asdp' samples second-order multivariate Gaussian knockoffs via either SDP or approximate SDP. 'svm' constructs knockoffs by regression, specifically support vector regression

knockoff.shrink

whether to shrink the estimated covariance matrix (default: TRUE, as in the usage above)

stat

statistics measuring variable importance. 'RRB' represents risk reduction in boosting, 'LCD' represents Lasso coefficient difference, and 'DRS' represents difference in R-square (R-squares are obtained from boosting)

screen

whether to screen the variables (default: TRUE)

screening.num

number of variables left after screening (default: sample size)

screening.knot

parameter in screening process

max.mstop

maximum number of boosting iterations

baselearner

base-learners used when fitting models with mboost. 'bols' means linear base-learners, 'bbs' means penalized regression splines with a B-spline basis, and 'btree' means tree base-learners (stumps).

cv.fold

number of cross-validation folds used to choose the number of boosting iterations

threshold

method to calculate the knockoff threshold, either 'knockoff' or 'knockoff+'
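
To make the q and threshold arguments concrete, here is a minimal sketch of the two thresholding rules as defined in Barber and Candes (2015). It is not the package's own code: the function name knockoff_threshold and the example statistics W are hypothetical, and W is assumed to be a vector of importance statistics in which large positive values favour the original variable over its knockoff.

```r
# Sketch of the knockoff / knockoff+ thresholds (Barber and Candes, 2015).
# `knockoff_threshold` is a hypothetical helper, not a function of this package.
knockoff_threshold <- function(W, q = 0.10,
                               threshold = c("knockoff", "knockoff+")) {
  threshold <- match.arg(threshold)
  # knockoff+ adds 1 to the numerator of the estimated false discovery proportion
  offset <- if (threshold == "knockoff+") 1 else 0
  # Candidate thresholds are the distinct non-zero magnitudes of W, ascending
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    fdp_hat <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (fdp_hat <= q) return(t)
  }
  Inf  # no threshold meets the target FDR, so nothing is selected
}

# Made-up statistics for ten variables
W <- c(10, 9, 8, 7, 6, -5, 4, 3, -2, 1)
t_ko   <- knockoff_threshold(W, q = 0.2, threshold = "knockoff")
t_plus <- knockoff_threshold(W, q = 0.2, threshold = "knockoff+")
selected_ko   <- which(W >= t_ko)
selected_plus <- which(W >= t_plus)
```

With these made-up statistics the 'knockoff+' rule is the more conservative of the two: its threshold is larger, so it selects fewer variables. The selected indices are the positions where W is at least the threshold, which matches the kind of index vector described under Value.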

Details

The default family is Gaussian() for a continuous response, Binomial() for a binary response, and CoxPH() for a survival response.
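
As an illustration of what sampling second-order multivariate Gaussian knockoffs involves (the 'sdp' and 'asdp' options of knockoff.method), the sketch below draws knockoffs for a known mean and correlation matrix. It deliberately uses the simpler equicorrelated choice of the diagonal matrix D instead of solving an SDP, so it is a stand-in for the idea, not the package's procedure. The function name equi_gaussian_knockoffs and the simulated data are assumptions made for this example.

```r
# Sketch: Gaussian knockoffs with an equicorrelated diagonal matrix D = s * I.
# Hypothetical helper; assumes Sigma is a correlation matrix (unit diagonal).
equi_gaussian_knockoffs <- function(X, mu, Sigma) {
  n <- nrow(X)
  p <- ncol(X)
  lambda_min <- min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)
  # Stay slightly below min(2 * lambda_min, 1) so the conditional covariance
  # is strictly positive definite
  s <- 0.99 * min(2 * lambda_min, 1)
  D <- diag(s, p)
  Sigma_inv_D <- solve(Sigma, D)              # Sigma^{-1} D
  centered <- sweep(X, 2, mu)                 # subtract column means
  mu_k <- X - centered %*% Sigma_inv_D        # conditional mean of the knockoffs
  Sigma_k <- 2 * D - D %*% Sigma_inv_D        # conditional covariance
  Sigma_k <- (Sigma_k + t(Sigma_k)) / 2       # remove numerical asymmetry
  Z <- matrix(rnorm(n * p), n, p)
  mu_k + Z %*% chol(Sigma_k)                  # rows have covariance Sigma_k
}

set.seed(1)
p <- 5
n <- 200
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))        # AR(1) correlation matrix
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
X_k_example <- equi_gaussian_knockoffs(X, mu = rep(0, p), Sigma = Sigma)
```

The result is an n by p matrix, the shape described for the X_k argument. In practice the mean and covariance would be estimated from X, optionally with shrinkage as controlled by knockoff.shrink.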

Value

A vector containing the selected covariate indices

References

Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://statweb.stanford.edu/~candes/MF_Knockoffs/index.html

Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. http://dx.doi.org/10.1007/s00180-012-0382-5 Available as vignette via: vignette(package = "mboost", "mboost_tutorial")


hanfu-bios/varsel documentation built on March 19, 2018, 10:08 a.m.