View source: R/mc.backward.vs.R
mc.backward.vs | R Documentation |
This function implements the backward variable selection approach for BART (see Algorithm 2 in Luo and Daniels (2021) for details). Parallel computation is used within each step of the backward selection approach.
mc.backward.vs(
  x,
  y,
  split.ratio = 0.8,
  probit = FALSE,
  true.idx = NULL,
  xinfo = matrix(0, 0, 0),
  numcut = 100L,
  usequants = FALSE,
  cont = FALSE,
  rm.const = TRUE,
  k = 2,
  power = 2,
  base = 0.95,
  split.prob = "polynomial",
  ntree = 50L,
  ndpost = 1000,
  nskip = 1000,
  keepevery = 1L,
  printevery = 100L,
  verbose = FALSE,
  mc.cores = 2L,
  nice = 19L,
  seed = 99L
)
x |
A matrix or a data frame of predictor values with each row corresponding to an observation and each column corresponding to a predictor. If a predictor is a factor with q levels in a data frame, it is replaced with q dummy variables. |
y |
A vector of response (continuous or binary) values. |
split.ratio |
A number between 0 and 1; the data set (x, y) is split into a training set and a testing set according to this ratio. |
probit |
A Boolean argument indicating whether the response variable is binary (probit=TRUE) or continuous (probit=FALSE, the default). |
true.idx |
(Optional) A vector of indices of the true relevant predictors; if provided, metrics including precision, recall and F1 score are returned. |
xinfo |
A matrix of cut-points with each row corresponding to a predictor and each column corresponding to a cut-point. |
numcut |
The number of possible cut-points; if a single number is given, this is used for all predictors; otherwise, a vector with length equal to ncol(x) is required, where each element gives the number of cut-points for the corresponding predictor. |
usequants |
A Boolean argument indicating how the cut-points in xinfo are generated; if usequants=TRUE, uniform quantiles are used for the cut-points; otherwise, the cut-points are generated uniformly. |
cont |
A Boolean argument indicating whether to assume all predictors are continuous. |
rm.const |
A Boolean argument indicating whether to remove constant predictors. |
k |
The number of prior standard deviations that E(Y|x) = f(x) is away from +/-0.5. The response (y) is internally shifted and scaled to the range from -0.5 to 0.5. |
power |
The power parameter of the polynomial splitting probability for the tree prior. Only used if split.prob="polynomial". |
base |
The base parameter of the polynomial splitting probability for the tree prior if split.prob="polynomial"; if split.prob="exponential", the probability of splitting a node at depth d is base^d. |
split.prob |
A string indicating what kind of splitting probability is used for the tree prior. If split.prob="polynomial", the splitting probability of a node at depth d is base/(1+d)^power (Chipman et al., 2010); if split.prob="exponential", the splitting probability of a node at depth d is base^d (Rockova and Saha, 2019). |
ntree |
The number of trees in the ensemble. |
ndpost |
The number of posterior samples returned. |
nskip |
The number of posterior samples burned in. |
keepevery |
Every keepevery posterior sample is kept to be returned to the user. |
printevery |
As the MCMC runs, a message is printed every printevery iterations. |
verbose |
A Boolean argument indicating whether any messages are printed out. |
mc.cores |
The number of cores to employ in parallel. |
nice |
Set the job niceness. The default niceness is 19 and niceness goes from 0 (highest) to 19 (lowest). |
seed |
Seed required for reproducible MCMC. |
The backward selection starts with the full model containing all the predictors. At each step, it compares the deletion of each remaining predictor
using mean squared error (MSE) if the response variable is continuous (or mean log loss (MLL) if the response variable is binary)
and then deletes the predictor whose removal gives the smallest MSE (or MLL). This process is repeated until only one
predictor remains in the model, ultimately returning ncol(x) "winner" models with model sizes ranging from 1 to ncol(x).
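The greedy elimination loop described above can be sketched independently of BART. The following Python sketch uses a user-supplied loss function as a stand-in for the BART test-set MSE (or MLL); the `toy_loss` below is purely illustrative and not part of the package:

```python
def backward_select(predictors, loss):
    """Greedy backward elimination in the spirit of Algorithm 2 of
    Luo and Daniels (2021): keep one "winner" model per step, each
    time dropping the predictor whose removal minimizes the loss."""
    current = list(predictors)
    winners = [current]          # the full model is the first winner
    errors = [loss(current)]
    while len(current) > 1:
        # evaluate deleting each remaining predictor in turn
        candidates = [[p for p in current if p != drop] for drop in current]
        best_subset = min(candidates, key=loss)
        winners.append(best_subset)
        errors.append(loss(best_subset))
        current = best_subset
    return winners, errors       # ncol(x) models, sizes ncol(x) down to 1

# toy loss: predictors 1..3 are "relevant"; dropping a relevant one
# costs 10, keeping an irrelevant one costs 1 (illustrative only)
toy_loss = lambda s: len(set(s) - {1, 2, 3}) + 10 * len({1, 2, 3} - set(s))
winners, errors = backward_select([1, 2, 3, 4, 5], toy_loss)
print(winners[errors.index(min(errors))])   # [1, 2, 3]
```

In the package itself, each candidate deletion is scored by refitting BART on the training split and measuring the error on the testing split, with the candidate fits run in parallel across mc.cores.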
Given the ncol(x) "winner" models, the one with the largest expected log pointwise predictive density based on leave-one-out
(LOO) cross-validation is selected as the best model. See Section 3.3 in Luo and Daniels (2021) for details.
If true.idx is provided, the precision, recall and F1 scores are returned.
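These metrics are the usual set-based ones, computed from the selected column indices against true.idx; a minimal sketch (in Python, with made-up index vectors for illustration):

```python
def selection_metrics(selected, true_idx):
    """Precision, recall and F1 of a selected predictor set
    against the set of truly relevant predictors."""
    selected, true_idx = set(selected), set(true_idx)
    tp = len(selected & true_idx)                    # correctly selected
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(true_idx) if true_idx else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# e.g. true predictors 1..5, selection picked 1..4 plus a noise column
p, r, f1 = selection_metrics([1, 2, 3, 4, 8], [1, 2, 3, 4, 5])
print(p, r, f1)   # each is 0.8 here (up to floating point)
```

Under this convention, precision penalizes spuriously selected predictors while recall penalizes missed relevant ones.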
The function mc.backward.vs() returns a list with the following components.
best.model.names |
The vector of column names of the predictors selected by the backward selection approach. |
best.model.cols |
The vector of column indices of the predictors selected by the backward selection approach. |
best.model.order |
The step where the best model is located. |
models |
The list of winner models from each step of the backward selection procedure; the length equals ncol(x). |
model.errors |
The vector of MSEs (or MLLs if the response variable is binary) for the ncol(x) winner models. |
elpd.loos |
The vector of LOO scores for the ncol(x) winner models. |
all.models |
The list of all the evaluated models. |
all.model.errors |
The vector of MSEs (or MLLs if the response variable is binary) for all the evaluated models. |
precision |
The precision score for the backward selection approach; only returned if true.idx is provided. |
recall |
The recall score for the backward selection approach; only returned if true.idx is provided. |
f1 |
The F1 score for the backward selection approach; only returned if true.idx is provided. |
all.models.idx |
The vector of Boolean arguments indicating whether the corresponding model in all.models is a winner model from a step of the backward selection procedure. |
Chuji Luo: cjluo@ufl.edu and Michael J. Daniels: daniels@ufl.edu.
Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266-298.
Luo, C. and Daniels, M. J. (2021). "Variable Selection Using Bayesian Additive Regression Trees." arXiv preprint arXiv:2112.13998.
Rockova, V. and Saha, E. (2019). "On theory for BART." In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2839-2848). PMLR.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry (2017). "Erratum to: Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC." Stat. Comput. 27.5, p. 1433.
See also: permute.vs, medianInclusion.vs and abc.vs.
## simulate data (Scenario C.C.1. in Luo and Daniels (2021))
set.seed(123)
data = friedman(100, 5, 1, FALSE)

## parallel::mcparallel/mccollect do not exist on windows
if(.Platform$OS.type=='unix') {
    ## test mc.backward.vs() function
    res = mc.backward.vs(data$X, data$Y, split.ratio=0.8,
                         probit=FALSE, true.idx=c(1:5),
                         ntree=10, ndpost=100, nskip=100, mc.cores=2)
}