abc.vs — R Documentation
Description

This function implements the variable selection approach proposed in Liu, Rockova and Wang (2021). Rockova and van der Pas (2020) introduce a spike-and-forest prior, which wraps the BART prior with a spike-and-slab prior on the model space. Because the marginal likelihood is intractable, Liu, Rockova and Wang (2021) propose an approximate Bayesian computation (ABC) sampling method based on data splitting that helps sample from the model space with a higher ABC acceptance rate.
Usage

abc.vs(
  x,
  y,
  nabc = 1000,
  tolerance = 0.1,
  threshold = 0.25,
  beta.params = c(1, 1),
  beta.theta = NA,
  split.ratio = 0.5,
  probit = FALSE,
  true.idx = NULL,
  analysis = TRUE,
  sparse = FALSE,
  xinfo = matrix(0, 0, 0),
  numcut = 100L,
  usequants = FALSE,
  cont = FALSE,
  rm.const = TRUE,
  k = 2,
  power = 2,
  base = 0.95,
  split.prob = "polynomial",
  ntree = 10L,
  ndpost = 1,
  nskip = 200,
  keepevery = 1L,
  printevery = 100L,
  verbose = FALSE
)
Arguments

x: A matrix or a data frame of predictor values, with each row corresponding to an observation and each column to a predictor. If a predictor is a factor with q levels in a data frame, it is replaced with q dummy variables.

y: A vector of response values (continuous or binary).

nabc: The number of ABC samples, i.e., the number of subsets sampled from the model space.

tolerance: A number between 0 and 1; the tolerance*100% of ABC samples with the lowest RMSE on the testing set are accepted.

threshold: A number between 0 and 1; among the ABC accepted subsets, predictors with MPVIP exceeding threshold are selected.

beta.params: A vector of two positive numbers; the spike-and-slab prior on the model space is assumed to be a beta-binomial prior, i.e., θ ~ Beta(beta.params[1], beta.params[2]).

beta.theta: A number between 0 and 1; the probability that a predictor is included in a model; if beta.theta = NA, it is sampled from the beta prior specified by beta.params.

split.ratio: A number between 0 and 1; at each iteration, the data set is randomly split into a training set containing the proportion split.ratio of the observations and a testing set containing the rest.

probit: A Boolean argument indicating whether the response variable is binary (probit = TRUE) or continuous (probit = FALSE).

true.idx: (Optional) A vector of indices of the truly relevant predictors; if provided, the precision, recall and F1 scores are returned.

analysis: A Boolean argument indicating whether to perform variable selection; if analysis = TRUE, the MPVIPs are computed and predictors are selected.

sparse: A Boolean argument indicating whether to perform DART (sparse = TRUE) or BART (sparse = FALSE).

xinfo: A matrix of cut-points, with each row corresponding to a predictor and each column to a cut-point.

numcut: The number of possible cut-points; if a single number is given, it is used for all predictors; otherwise, a vector of length ncol(x) is required.

usequants: A Boolean argument indicating how the cut-points are chosen; if usequants = FALSE, the cut-points are evenly spaced; otherwise, they are based on the observed quantiles of each predictor.

cont: A Boolean argument indicating whether to assume all predictors are continuous.

rm.const: A Boolean argument indicating whether to remove constant predictors.

k: The number of prior standard deviations by which E(Y|x) = f(x) is bounded away from +/- 0.5. The response (y) is internally scaled to the range from -0.5 to 0.5.

power: The power parameter of the polynomial splitting probability for the tree prior; only used if split.prob = "polynomial".

base: The base parameter of the polynomial splitting probability for the tree prior; only used if split.prob = "polynomial".

split.prob: A string indicating the type of splitting probability used for the tree prior; either "polynomial" (the prior of Chipman et al. (2010)) or "exponential" (the prior of Rockova and van der Pas (2020)).

ntree: The number of trees in the ensemble.

ndpost: The number of posterior samples returned.

nskip: The number of posterior samples burned in.

keepevery: Every keepevery posterior sample is kept to be returned to the user.

printevery: As the MCMC runs, a message is printed every printevery iterations.

verbose: A Boolean argument indicating whether any messages are printed out.
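The beta-binomial spike-and-slab draw described by beta.params and beta.theta can be sketched in base R. This is a hedged illustration of the prior, not the package's internal code: θ is drawn from the beta prior when beta.theta is NA, and each predictor then enters the subset independently with probability θ.

```r
## Illustrative draw from the beta-binomial spike-and-slab prior (assumption:
## theta is drawn once, then predictors are included independently).
set.seed(42)
beta.params <- c(1, 1)   # Beta(1, 1), i.e., uniform prior on theta
beta.theta  <- NA        # NA: sample theta from the beta prior
p <- 10                  # number of candidate predictors

theta <- if (is.na(beta.theta)) {
  rbeta(1, beta.params[1], beta.params[2])
} else {
  beta.theta
}
## indices of predictors included in this sampled subset
subset <- which(rbinom(p, 1, theta) == 1)
```

Fixing beta.theta (e.g., beta.theta = 0.5) skips the beta draw and uses a constant inclusion probability instead.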
Details

At each iteration of the algorithm, the data set is randomly split into a training set and a testing set according to a given split ratio. The algorithm proceeds by sampling a subset from the spike-and-slab prior on the model space, fitting a BART model on the training set using only the predictors in the subset, and computing the root mean squared error (RMSE) on the testing set based on a posterior sample from the fitted BART model. Only those subsets that result in a low RMSE on the testing set are kept for selection. ABC Bayesian forest selects predictors based on their marginal posterior variable inclusion probabilities (MPVIPs), which are estimated by computing the proportion of ABC accepted BART posterior samples that use the predictor at least once. Given the MPVIPs, predictors with MPVIP exceeding a pre-specified threshold are selected.
See Liu, Rockova and Wang (2021) or Section 2.2.4 in Luo and Daniels (2021) for details.
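The accept/reject scheme above can be sketched in a few lines of base R. This is an illustrative simplification, not the package's implementation: lm() stands in for BART, the subset prior is a plain Bernoulli inclusion draw with fixed theta, and MPVIPs are approximated by subset membership rather than by actual use in tree splits.

```r
## Simplified ABC variable-selection loop (lm() standing in for BART).
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 2 * X[, 2] + rnorm(n)          # only predictors 1 and 2 matter

nabc <- 200; tolerance <- 0.1; split.ratio <- 0.5; theta <- 0.5
models <- matrix(0L, nabc, p)
errors <- numeric(nabc)

for (i in seq_len(nabc)) {
  ## 1. sample a subset of predictors (simplified spike-and-slab draw)
  subset <- which(rbinom(p, 1, theta) == 1)
  if (length(subset) == 0) subset <- sample.int(p, 1)
  models[i, subset] <- 1L
  ## 2. random training/testing split
  train <- sample.int(n, floor(split.ratio * n))
  ## 3. fit on the training set with only the sampled predictors
  fit <- lm(y[train] ~ X[train, subset, drop = FALSE])
  pred <- cbind(1, X[-train, subset, drop = FALSE]) %*% coef(fit)
  ## 4. RMSE on the testing set
  errors[i] <- sqrt(mean((y[-train] - pred)^2))
}

## accept the tolerance*100% of subsets with the lowest RMSE
accepted <- order(errors)[seq_len(ceiling(tolerance * nabc))]
## MPVIP approximation: proportion of accepted subsets including each predictor
mpvip <- colMeans(models[accepted, , drop = FALSE])
selected <- which(mpvip > 0.5)               # threshold = 0.5 here
```

In the actual method, each accepted sample is a BART posterior draw, and a predictor counts as "used" only if it appears in at least one tree split.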
Value

The function abc.vs() returns a list with the following components.

theta: The probability that a predictor is included in a model.

models: A matrix with nabc rows and ncol(x) columns; each row is a binary vector indicating which predictors were included in the corresponding subset sampled from the model space.

actual.models: A matrix with nabc rows and ncol(x) columns; each row is a binary vector indicating which predictors were actually used in the BART model fitted for the corresponding ABC sample.

model.errors: The vector of MSEs (or MLLs if the response variable is binary) on the testing set for the nabc subsets.

idx: The vector of indices (in terms of the row numbers of models) of the ABC accepted subsets.

top.models: A matrix whose rows are the ABC accepted subsets, i.e., the rows of models indexed by idx.

top.actual.models: A matrix whose rows are the corresponding rows of actual.models for the ABC accepted subsets.

mip: The vector of marginal posterior variable inclusion probabilities; only returned when analysis = TRUE.

best.model: The vector of predictors selected by ABC Bayesian forest; only returned when analysis = TRUE.

precision: The precision score for the ABC Bayesian forest; only returned when analysis = TRUE and true.idx is provided.

recall: The recall score for the ABC Bayesian forest; only returned when analysis = TRUE and true.idx is provided.

f1: The F1 score for the ABC Bayesian forest; only returned when analysis = TRUE and true.idx is provided.
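For reference, the precision, recall and F1 components follow the usual definitions over the selected and true predictor sets. A small worked example with hypothetical index sets:

```r
## Hypothetical selected set vs. true set (not output from abc.vs()).
selected <- c(1, 2, 6, 7)          # predictors chosen by the procedure
true.idx <- c(1, 2, 6, 7, 8)       # truly relevant predictors

tp        <- length(intersect(selected, true.idx))  # true positives: 4
precision <- tp / length(selected)                  # 4/4 = 1
recall    <- tp / length(true.idx)                  # 4/5 = 0.8
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean
```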
Author(s)

Chuji Luo: cjluo@ufl.edu and Michael J. Daniels: daniels@ufl.edu.
References

Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4, 266–298.

Linero, A. R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." J. Amer. Statist. Assoc. 113, 626–636.

Liu, Y., Rockova, V. and Wang, Y. (2021). "Variable selection with ABC Bayesian forests." J. R. Stat. Soc. Ser. B. Stat. Methodol. 83, 453–481.

Luo, C. and Daniels, M. J. (2021). "Variable selection using Bayesian additive regression trees." arXiv preprint arXiv:2112.13998.

Rockova, V. and van der Pas, S. (2020). "Posterior concentration for Bayesian regression trees and forests." Ann. Statist. 48, 2108–2131.
See Also

permute.vs, medianInclusion.vs and mc.backward.vs.
Examples

## simulate data (Scenario C.M.1. in Luo and Daniels (2021))
set.seed(123)
data = mixone(100, 10, 1, FALSE)

## test abc.vs() function
res = abc.vs(data$X, data$Y, nabc = 100, tolerance = 0.1, threshold = 0.25,
             beta.params = c(1.0, 1.0), split.ratio = 0.5, probit = FALSE,
             true.idx = c(1, 2, 6:8), ntree = 10, ndpost = 1, nskip = 200,
             analysis = TRUE)