This function is the main entry point for the package. It runs the knockoff procedure, which accommodates various covariate distributions and model-free associations using gradient boosting, and returns the variables selected for the input dataset. Users can specify whether to screen variables before selection and how many variables to keep after screening.
knockoffs.varsel(X, y, X_k, family = Gaussian(),
  q = 0.10, knockoff.method = c("sdp", "asdp", "svm"), knockoff.shrink = TRUE,
  stat = c("RRB", "LCD", "DRS"),
  screen = TRUE, screening.num = nrow(X), screening.knot = 10,
  max.mstop = 100, baselearner = c("bbs", "bols", "btree"), cv.fold = 5,
  threshold = c("knockoff", "knockoff+"))
X |
matrix of predictors |
y |
response vector, or a survival object with two columns |
X_k |
knockoff variables (n by p), if pre-specified |
family |
Binomial(), Binomial(link = "logit", type = "glm"), Gaussian(), Poisson(), CoxPH(), Cindex(), GammaReg(), NBinomial(), Weibull(), Loglog(), Lognormal(), etc. See mboost documentation for details. |
q |
target FDR (false discovery rate) |
knockoff.method |
method to construct knockoffs. 'sdp' or 'asdp' samples second-order multivariate Gaussian knockoffs via either SDP or approximate SDP. 'svm' constructs knockoffs by regression, specifically support vector regression |
knockoff.shrink |
whether to shrink the estimated covariance matrix (default: TRUE) |
stat |
statistics measuring variable importance. 'RRB' represents risk reduction in boosting, 'LCD' represents Lasso coefficient difference, and 'DRS' represents difference in R-square (R-squares are obtained from boosting) |
screen |
whether to screen the variables (default: TRUE) |
screening.num |
number of variables left after screening (default: sample size) |
screening.knot |
parameter in screening process |
max.mstop |
maximum number of boosting iterations |
baselearner |
base-learners when fitting models using mboost. 'bols' means linear base-learners, 'bbs' penalized regression splines with a B-spline basis, and 'btree' boosts stumps. |
cv.fold |
number of cross-validation folds used to choose the number of boosting iterations |
threshold |
method to calculate knockoff threshold, either "knockoff" or "knockoff+" |
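To make the 'sdp' and 'asdp' options of knockoff.method more concrete, the following base-R sketch samples second-order Gaussian knockoffs. It is not the package's implementation: it swaps the SDP solution for the simpler equicorrelated choice of the diagonal matrix D, and every name in it (Sigma_true, sds, s, D, V, X_k) is illustrative only.

```r
# Sketch: second-order Gaussian knockoffs with the equicorrelated construction.
# Assumption: this replaces the SDP/ASDP optimisation with a closed-form choice of D.
set.seed(1)
n <- 200
p <- 5

# Simulated predictors with an AR(1)-type correlation structure
Sigma_true <- 0.3^abs(outer(1:p, 1:p, "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_true)

# Estimate first and second moments from the data
mu <- colMeans(X)
Sigma <- cov(X)
sds <- sqrt(diag(Sigma))

# Equicorrelated choice on the correlation scale: s = min(2 * lambda_min, 1),
# shrunk slightly so the conditional covariance stays positive definite
lambda_min <- min(eigen(cov2cor(Sigma), symmetric = TRUE, only.values = TRUE)$values)
s <- 0.999 * min(2 * lambda_min, 1)
D <- diag(s * sds^2)

# Conditional law of the knockoffs given X:
#   mean = X - (X - mu) Sigma^{-1} D,  covariance V = 2D - D Sigma^{-1} D
Sigma_inv_D <- solve(Sigma, D)
cond_mean <- X - sweep(X, 2, mu) %*% Sigma_inv_D
V <- 2 * D - D %*% Sigma_inv_D
V <- (V + t(V)) / 2  # remove numerical asymmetry before the Cholesky factorisation

X_k <- cond_mean + matrix(rnorm(n * p), n, p) %*% chol(V)
```

A matrix built this way has the same n by p shape as X and could, in principle, be supplied through the X_k argument instead of letting the function construct knockoffs itself.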
The default family is Gaussian() for a continuous response, Binomial() for a binary response, and CoxPH() for a survival response.
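The two threshold options differ only by an offset in the estimated false discovery proportion, following Barber and Candes (2015). The sketch below is a minimal base-R illustration of that rule, not the package's own code; the helper name knockoff_threshold and the example statistics W are hypothetical.

```r
# Sketch of the knockoff filter threshold (Barber and Candes, 2015).
# offset = 0 corresponds to 'knockoff', offset = 1 to 'knockoff+'.
# W holds one importance statistic per variable; large positive values
# favour the original variable over its knockoff.
knockoff_threshold <- function(W, q = 0.10, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    # Estimated false discovery proportion at threshold t
    fdp_hat <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (fdp_hat <= q) return(t)
  }
  Inf  # no threshold meets the target FDR, so nothing is selected
}

# Hypothetical importance statistics for 12 variables
W <- c(6, 5, 4.5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 0.3)

t_knockoff <- knockoff_threshold(W, q = 0.25, offset = 0)
t_knockoff_plus <- knockoff_threshold(W, q = 0.25, offset = 1)

# Variables whose statistic reaches the threshold are selected
selected <- which(W >= t_knockoff_plus)
```

The 'knockoff+' threshold is never smaller than the 'knockoff' one, so it is the more conservative choice of the two.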
A vector containing the selected covariate indices
Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://statweb.stanford.edu/~candes/MF_Knockoffs/index.html
Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853
Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. http://dx.doi.org/10.1007/s00180-012-0382-5 Available as vignette via: vignette(package = "mboost", "mboost_tutorial")
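A call on simulated data might look like the sketch below. The data-generating model is made up for illustration, and the sketch assumes that X_k can be left unspecified (it is documented as "if pre-specified") and that the package providing knockoffs.varsel is already loaded. The call is guarded so the snippet still runs when the function is not available.

```r
# Simulated data: 100 observations, 20 predictors, two of which carry signal
set.seed(2024)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

# Assumption: knockoffs.varsel is on the search path and X_k may be omitted.
if (exists("knockoffs.varsel") && requireNamespace("mboost", quietly = TRUE)) {
  selected <- knockoffs.varsel(
    X, y,
    family = mboost::Gaussian(),   # continuous response
    q = 0.10,                      # target FDR
    knockoff.method = "sdp",
    stat = "RRB",
    screen = FALSE,                # p < n here, so skip screening
    baselearner = "bols",
    threshold = "knockoff+"
  )
  print(selected)                  # indices of the selected covariates
}
```

Because the return value is a vector of column indices, X[, selected, drop = FALSE] would extract the selected predictors.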