mc.wbart | R Documentation |
BART is a Bayesian approach to nonparametric function estimation and inference using a sum of trees.
For a continuous response y and a p-dimensional vector of predictors x = (x_1, ..., x_p)', BART models y and x using
y = f(x) + ε,
where f is a sum of Bayesian regression trees and ε ~ N(0, σ^2).
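Following the notation of Chipman et al. (2010), the sum-of-trees function can be written explicitly. Here m is the number of trees (the ntree argument), T_j is the structure of the j-th tree, M_j is its set of terminal-node values, and g(x; T_j, M_j) returns the terminal-node value of the leaf that x falls into. The symbols g, T_j and M_j are taken from Chipman et al. (2010), not from this page.

```latex
y = f(x) + \varepsilon, \qquad
f(x) = \sum_{j=1}^{m} g(x; T_j, M_j), \qquad
\varepsilon \sim N(0, \sigma^2)
```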
The function mc.wbart() is inherited from the CRAN R package 'BART' and is a variant of the function wbart() with parallel computation.
mc.wbart(x.train, y.train, x.test = matrix(0, 0, 0),
  sparse = FALSE, theta = 0, omega = 1, a = 0.5, b = 1, augment = FALSE, rho = NULL,
  xinfo = matrix(0, 0, 0), numcut = 100L, usequants = FALSE, cont = FALSE, rm.const = TRUE,
  power = 2, base = 0.95, split.prob = "polynomial",
  k = 2, sigmaf = NA, sigest = NA, sigdf = 3, sigquant = 0.9, lambda = NA,
  fmean = mean(y.train), w = rep(1, length(y.train)),
  ntree = 200L, ndpost = 1000L, nskip = 100L, keepevery = 1L, printevery = 100L,
  keeptrainfits = TRUE, transposed = FALSE, verbose = FALSE,
  mc.cores = 2L, nice = 19L, seed = 99L)
x.train |
A matrix or a data frame of predictor values (for training) with each row corresponding to an observation and each column corresponding to a predictor. If a predictor in a data frame is a factor with q levels, it is replaced with q dummy variables. |
y.train |
A vector of continuous response values for training. |
x.test |
A matrix or a data frame of predictor values for testing, which has the same structure as x.train. |
sparse |
A Boolean argument indicating whether to replace the discrete uniform distribution for selecting a split variable with a categorical distribution whose event probabilities follow a Dirichlet distribution (see Linero (2018) for details). |
theta |
Set the θ parameter (the Dirichlet concentration) of the sparse prior; zero means θ is random. |
omega |
Set the ω parameter of the sparse prior; zero means random. |
a |
A sparse parameter of Beta(a, b) hyper-prior where 0.5<=a<=1; a lower value induces more sparsity. |
b |
A sparse parameter of Beta(a, b) hyper-prior; typically, b=1. |
augment |
A Boolean argument indicating whether data augmentation is performed in the variable selection procedure of Linero (2018). |
rho |
A sparse parameter; typically ρ = p where p is the number of predictors. |
xinfo |
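A matrix of cut-points with each row corresponding to a predictor and each column corresponding to a cut-point. The default, matrix(0, 0, 0), indicates that the cut-points are computed from the data (see numcut and usequants). |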
numcut |
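The number of possible cut-points. If a single number is given, it is used for all predictors; otherwise a vector with length equal to ncol(x.train) is required, where the i-th element gives the number of cut-points for the i-th predictor in x.train. |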
usequants |
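A Boolean argument indicating how the cut-points in xinfo are generated: if TRUE, uniform quantiles are used as cut-points; otherwise, the cut-points are spaced uniformly. |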
cont |
A Boolean argument indicating whether to assume all predictors are continuous. |
rm.const |
A Boolean argument indicating whether to remove constant predictors. |
power |
The power parameter of the polynomial splitting probability for the tree prior. Only used if split.prob = "polynomial". |
base |
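The base parameter of the polynomial splitting probability for the tree prior if split.prob = "polynomial"; if split.prob = "exponential", the probability of splitting a node at depth d is base^d. |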
split.prob |
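A string indicating what kind of splitting probability is used for the tree prior. If split.prob = "polynomial", the splitting probability of Chipman et al. (2010) is used; if split.prob = "exponential", the splitting probability of Rockova and Saha (2019) is used. |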
k |
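The number of prior standard deviations that E(Y|x) = f(x) is away from +/-0.5. The response (y.train) is internally scaled to range from -0.5 to 0.5. The bigger k is, the more conservative the fit. |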
sigmaf |
The standard deviation of f. |
sigest |
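A rough estimate of the error standard deviation, the square of which follows an inverse chi-squared prior. If sigest = NA, the rough estimate is the usual least-squares estimate; otherwise the supplied value is used. |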
sigdf |
The degrees of freedom for the error variance prior. |
sigquant |
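The quantile of the error variance prior at which the rough estimate (sigest) is placed. The closer sigquant is to 1, the more aggressive the fit, since more prior weight is put on error standard deviations smaller than the rough estimate. |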
lambda |
The scale parameter of the error variance prior. |
fmean |
BART operates on y.train centered by fmean. |
w |
A vector of weights which multiply the standard deviation. |
ntree |
The number of trees in the ensemble. |
ndpost |
The number of posterior samples returned. |
nskip |
The number of MCMC iterations discarded as burn-in. |
keepevery |
Every keepevery-th posterior sample is kept and returned to the user. |
printevery |
As the MCMC runs, a message is printed every printevery iterations. |
keeptrainfits |
A Boolean argument indicating whether to keep yhat.train. |
transposed |
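A Boolean argument indicating whether the matrices x.train and x.test are already transposed; transposing them before calling mc.wbart() is more memory-efficient for parallel runs. |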
verbose |
A Boolean argument indicating whether any messages are printed out. |
mc.cores |
The number of cores to employ in parallel. |
nice |
Set the job niceness. The default niceness is 19, and niceness goes from 0 (highest priority) to 19 (lowest priority). |
seed |
Seed required for reproducible MCMC. |
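The arguments k, sigmaf, ntree, sigest, sigdf and sigquant above set the scales of the priors. The sketch below illustrates this under stated assumptions: it follows the calibration of Chipman et al. (2010) as used in the BART package, where each terminal-node value has prior standard deviation (max(y) - min(y)) / (2 k sqrt(ntree)) when sigmaf = NA (or sigmaf / sqrt(ntree) otherwise), and λ is chosen so that P(σ < sigest) = sigquant under σ^2 ~ sigdf * λ / χ^2_sigdf. The helpers leaf_sd() and sigma_lambda() are hypothetical illustrations, not functions exported by the package.

```r
# Hypothetical helpers sketching BART prior calibration (Chipman et al., 2010);
# they are NOT functions exported by the package.

# Prior SD of each terminal-node value. The ntree leaf values that add up to
# f(x) are a priori independent N(0, tau^2), so sd(f(x)) = tau * sqrt(ntree).
leaf_sd <- function(y, k = 2, ntree = 200L, sigmaf = NA) {
  if (is.na(sigmaf)) {
    # half the range of y is set equal to k prior standard deviations of f(x)
    (max(y) - min(y)) / (2 * k * sqrt(ntree))
  } else {
    sigmaf / sqrt(ntree)
  }
}

# Scale lambda of the prior sigma^2 ~ sigdf * lambda / chisq(sigdf),
# chosen so that P(sigma < sigest) = sigquant.
sigma_lambda <- function(sigest, sigdf = 3, sigquant = 0.9) {
  qchi <- qchisq(1 - sigquant, df = sigdf)
  sigest^2 * qchi / sigdf
}

set.seed(1)
y <- rnorm(100)
tau <- leaf_sd(y, k = 2, ntree = 200L)
lambda <- sigma_lambda(sigest = sd(y), sigdf = 3, sigquant = 0.9)

# Check: P(sigma^2 < sigest^2) = P(chisq(3) > 3 * lambda / sigest^2)
prob <- pchisq(3 * lambda / sd(y)^2, df = 3, lower.tail = FALSE)
print(c(tau = tau, lambda = lambda, prob = prob))
```

Increasing k shrinks tau and hence the fit, matching the statement that a bigger k is more conservative; raising sigquant lowers λ and puts more prior mass on small σ.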
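When sparse = TRUE, the split variable is drawn from a categorical distribution whose probabilities s = (s_1, ..., s_p) follow a Dirichlet distribution (Linero, 2018). A minimal sketch of one prior draw follows, assuming the DART parameterisation θ/(θ + ρ) ~ Beta(a, b) and s ~ Dirichlet(θ/p, ..., θ/p), which is how a, b and ρ enter in Linero (2018). The function draw_dart_prior() is hypothetical, and the Dirichlet draw uses the standard normalised-gamma construction rather than anything taken from the package.

```r
# Hypothetical helper (not a package function): one draw from the DART
# sparsity prior of Linero (2018), assuming the parameterisation
#   theta / (theta + rho) ~ Beta(a, b),  s ~ Dirichlet(theta/p, ..., theta/p).
draw_dart_prior <- function(p, a = 0.5, b = 1, rho = p) {
  u <- rbeta(1, a, b)              # u = theta / (theta + rho)
  theta <- rho * u / (1 - u)       # solve u = theta / (theta + rho) for theta
  # A Dirichlet draw is a vector of independent gammas divided by their sum
  g <- rgamma(p, shape = theta / p, rate = 1)
  if (sum(g) == 0) {
    # Very small shapes can underflow every gamma to 0; the limiting draw
    # then puts all mass on one predictor chosen uniformly at random
    g[sample.int(p, 1)] <- 1
  }
  list(theta = theta, s = g / sum(g))
}

set.seed(123)
draw <- draw_dart_prior(p = 10, a = 0.5, b = 1)
# A split variable is then drawn from the categorical distribution with probabilities s
split_var <- sample.int(10, size = 1, prob = draw$s)
print(round(draw$s, 3))
print(split_var)
```

Under Beta(a, 1) with a small a, u (and hence θ) is small with high prior probability, which concentrates s on a few predictors; this is the sense in which a lower a induces more sparsity.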
This function is inherited from BART::mc.wbart() and is a variant of the function wbart() with parallel computation. While the original features of BART::wbart() are preserved, two modifications are made.
The first modification is to provide two types of split probability for BART. One split probability is proposed in Chipman et al. (2010) and defined as
p(d) = γ * (1+d)^{-β},
where d is the depth of the node, γ ∈ (0,1) and β ∈ (0,∞). The other split probability is proposed by Rockova and Saha (2019) and defined as
p(d) = γ^d,
where γ ∈ (1/n, 1/2) and n is the number of training observations. BART with the second split probability is proved to achieve optimal posterior contraction (Rockova and Saha, 2019).
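The two split probabilities can be compared numerically. The sketch below evaluates p(d) = γ(1 + d)^(-β) and p(d) = γ^d for the first few depths, assuming that base plays the role of γ and power the role of β. The helper split_prob() is hypothetical, not a package function.

```r
# Hypothetical helper comparing the two tree-prior split probabilities.
#   polynomial  (Chipman et al., 2010):   p(d) = base * (1 + d)^(-power)
#   exponential (Rockova and Saha, 2019): p(d) = base^d
split_prob <- function(d, type = c("polynomial", "exponential"),
                       base = 0.95, power = 2) {
  type <- match.arg(type)
  if (type == "polynomial") base * (1 + d)^(-power) else base^d
}

d <- 0:4
poly <- split_prob(d, "polynomial", base = 0.95, power = 2)
expo <- split_prob(d, "exponential", base = 0.45)  # base must lie in (1/n, 1/2)
print(round(rbind(depth = d, polynomial = poly, exponential = expo), 4))
```

The exponential version always splits the root (p(0) = 1) but decays geometrically, so deep nodes become far less likely than under the polynomial version: at d = 10 these settings give about 0.0079 (polynomial) versus 0.00034 (exponential).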
The second modification is to provide five types of variable importance measures (vip, within.type.vip, pvip, varprob.mean and mi) in the return object, to account for mixed-type predictors.
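Two of these measures are simple summaries of posterior split counts. The sketch below assumes a varcount-style matrix with one row per posterior sample and one column per predictor (the number of splitting rules using each predictor), and computes vip as the posterior mean of each predictor's share of all splits (Chipman et al., 2010) and pvip as the proportion of samples in which a predictor is used at least once (Linero, 2018). The toy matrix is made up for illustration; within.type.vip and mi follow Luo and Daniels (2021) and are not reproduced here, and the package's exact computations may differ.

```r
# Toy split counts: 4 posterior samples (rows) x 3 predictors (columns);
# entry [k, j] = number of splitting rules using predictor j in sample k.
varcount <- matrix(c(6, 3, 1,
                     5, 5, 0,
                     8, 2, 0,
                     4, 4, 2),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(NULL, c("x1", "x2", "x3")))

# VIP: average over samples of each predictor's share of all splits
# (dividing a matrix by a vector of length nrow scales each row)
vip <- colMeans(varcount / rowSums(varcount))

# PVIP: proportion of samples in which each predictor is used at least once
pvip <- colMeans(varcount > 0)

print(rbind(vip = vip, pvip = pvip))
```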
The function mc.wbart() returns an object of type wbart, which is essentially a list consisting of the following components.
sigma |
A vector with |
yhat.train.mean |
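The posterior mean of the fitted values at the training predictors, i.e., the column means of yhat.train. |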
yhat.train |
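A matrix with ndpost rows and nrow(x.train) columns; each row is a posterior sample of the fitted values at the training predictors. |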
yhat.test.mean |
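The posterior mean of the fitted values at the test predictors, i.e., the column means of yhat.test. |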
yhat.test |
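A matrix with ndpost rows and nrow(x.test) columns; each row is a posterior sample of the fitted values at the test predictors. |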
varcount |
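A matrix with ndpost rows and one column per predictor; each row counts the splitting rules that use each predictor in a posterior sample. |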
varprob |
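A matrix with ndpost rows and one column per predictor; each row contains the split probabilities of the predictors in a posterior sample. |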
treedraws |
A list containing the posterior samples of the ensembles (trees structures, split variables and split values); Can be used for prediction. |
proc.time |
The process time of running the function mc.wbart(). |
mu |
BART operates on y.train centered by mu, which equals fmean. |
mr.vecs |
A list of ncol(x.train) sub-lists with each corresponding to a predictor; Each sub-list contains
|
vip |
A vector of variable inclusion proportions (VIP) proposed in Chipman et al. (2010). |
within.type.vip |
A vector of within-type VIPs proposed in Luo and Daniels (2021). |
pvip |
A vector of marginal posterior variable inclusion probabilities (PVIP) proposed in Linero (2018); only useful when DART is fit, i.e., sparse = TRUE. |
varprob.mean |
A vector of posterior split probabilities (PSP) proposed in Linero (2018); only useful when DART is fit, i.e., sparse = TRUE. |
mr.mean |
A matrix with |
mi |
A vector of Metropolis importance (MI) proposed in Luo and Daniels (2021). |
rm.const |
A vector of indicators for the predictors (after dummification) used in BART; a negative indicator means that the corresponding predictor was removed. |
ndpost |
The number of posterior samples returned. |
Chuji Luo: cjluo@ufl.edu and Michael J. Daniels: daniels@ufl.edu.
Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266–298.
Linero, A. R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." J. Amer. Statist. Assoc. 113 626–636.
Luo, C. and Daniels, M. J. (2021). "Variable selection using Bayesian additive regression trees." arXiv preprint arXiv:2112.13998.
Rockova, V. and Saha, E. (2019). "On theory for BART." In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2839–2848). PMLR.
Sparapani, R., Spanbauer, C. and McCulloch, R. (2021). "Nonparametric machine learning and efficient computation with Bayesian additive regression trees: the BART R package." J. Stat. Softw. 97 1–66.
See also: wbart().
## assumes the BartMixVs package, which provides mixone() and mc.wbart(), is installed
library(BartMixVs)

## simulate data (Scenario C.M.1. in Luo and Daniels (2021))
set.seed(123)
data = mixone(100, 10, 1, FALSE)

## parallel::mcparallel/mccollect do not exist on Windows
if(.Platform$OS.type=='unix') {
  ## test mc.wbart() function
  res = mc.wbart(data$X, data$Y, ntree=10, nskip=100, ndpost=100, mc.cores=2)
}