BART is a Bayesian “sum-of-trees” model in which each tree is constrained by a prior to be a weak learner.
For a numeric response y, the model is y = f(x) + ε, where ε ~ N(0, σ^2).
For a binary response y, P(Y = 1 | x) = Φ(f(x)), where Φ denotes the standard normal cdf (probit link).
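As a minimal sketch of the two response models above, the following base R code simulates data from each. The function f, the noise level sigma, and the sample size are hypothetical choices made only for illustration; they are not part of the package.

```r
set.seed(42)
n <- 200
x <- runif(n)

# hypothetical regression function, chosen only for illustration
f <- function(x) sin(2 * pi * x)

# numeric response: y = f(x) + epsilon, epsilon ~ N(0, sigma^2)
sigma <- 0.5
y_numeric <- f(x) + rnorm(n, mean = 0, sd = sigma)

# binary response: P(Y = 1 | x) = pnorm(f(x)), i.e. a probit link
p <- pnorm(f(x))
y_binary <- rbinom(n, size = 1, prob = p)
```

In both cases BART places its sum-of-trees prior on f; only the likelihood connecting f(x) to y differs.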
bart(x.train, y.train, x.test = matrix(0.0, 0, 0),
sigest = NA, sigdf = 3, sigquant = 0.90,
k = 2.0,
power = 2.0, base = 0.95,
binaryOffset = 0.0, weights = NULL,
ntree = 200,
ndpost = 1000, nskip = 100,
printevery = 100, keepevery = 1, keeptrainfits = TRUE,
usequants = FALSE, numcut = 100, printcutoffs = 0,
verbose = TRUE, nchain = 1, nthread = 1, combinechains = TRUE,
keeptrees = FALSE, keepcall = TRUE, sampleronly = FALSE)
bart2(formula, data, test, subset, weights, offset, offset.test = offset,
sigest = NA_real_, sigdf = 3.0, sigquant = 0.90,
k = NULL,
power = 2.0, base = 0.95,
n.trees = 75L,
n.samples = 500L, n.burn = 500L,
n.chains = 4L, n.threads = min(guessNumCores(), n.chains), combineChains = FALSE,
n.cuts = 100L, useQuantiles = FALSE,
n.thin = 1L, keepTrainingFits = TRUE,
printEvery = 100L, printCutoffs = 0L,
verbose = TRUE, keepTrees = FALSE,
keepCall = TRUE, samplerOnly = FALSE, ...)
## S3 method for class 'bart'
plot(x,
plquants = c(0.05, 0.95), cols = c('blue', 'black'),
...)
## S3 method for class 'bart'
predict(object, newdata, offset,
type = c("ev", "ppd", "bart"),
combineChains = TRUE, ...)
extract(object, ...)
## S3 method for class 'bart'
extract(object,
type = c("ev", "ppd", "bart"),
sample = c("train", "test"),
combineChains = TRUE, ...)
## S3 method for class 'bart'
fitted(object,
type = c("ev", "ppd", "bart"),
sample = c("train", "test"),
...)

x.train 
Explanatory variables for training (in sample) data. May be a matrix or a data frame, with rows corresponding to observations and columns to variables. If a variable is a factor in a data frame, it is replaced with dummies. Note that q dummies are created if q > 2 and one dummy is created if q = 2, where q is the number of levels of the factor. 
y.train 
Dependent variable for training (in sample) data. If 
x.test 
Explanatory variables for test (out of sample) data. Should have same column structure as

sigest 
For continuous response models, an estimate of the error variance, σ^2,
used to calibrate an inverse-chi-squared prior used on that parameter. If not supplied,
the least-squares estimate is derived instead. See 
sigdf 
Degrees of freedom for error variance prior. Not applicable when y is binary. 
sigquant 
The quantile of the error variance prior that the rough estimate
( 
k 
For numeric y, 
power 
Power parameter for tree prior. 
base 
Base parameter for tree prior. 
binaryOffset 
Used for binary y. When present, the model is P(Y = 1 | x) = Φ(f(x) + binaryOffset), allowing fits with probabilities shrunk towards values other than 0.5. 
weights 
An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations y | x ~ N(f(x), σ^2 / w), where f(x) is the unknown function. 
ntree, n.trees 
The number of trees in the sum-of-trees formulation. 
ndpost, n.samples 
The number of posterior draws after burn in, 
nskip, n.burn 
Number of MCMC iterations to be treated as burn in. 
printevery, printEvery 
As the MCMC runs, a message is printed every 
keepevery, n.thin 
Every 
keeptrainfits, keepTrainingFits 
If 
usequants, useQuantiles 
When 
numcut, n.cuts 
The maximum number of possible values used in decision rules (see 
printcutoffs, printCutoffs 
The number of cutoff rules to be printed to screen before the MCMC is run. Given a single integer, the same value will be used for all variables. If 0, nothing is printed. 
verbose 
Logical; if 
nchain, n.chains 
Integer specifying how many independent tree sets and fits should be calculated. 
nthread, n.threads 
Integer specifying how many threads to use. Depending on the CPU architecture, using more than the number of chains can degrade performance for small/medium data sets. As such some calculations may be executed single threaded regardless. 
combinechains, combineChains 
Logical; if 
keeptrees, keepTrees 
Logical; must be 
keepcall, keepCall 
Logical; if 
formula 
The same as 
data 
The same as 
test 
The same as 
subset 
A vector of logicals or indices used to subset the data. Can be missing. 
offset 
The same as 
offset.test 
A vector of offsets to be used with test data, in case it is different than the training offset.
If 
object 
An object of class 
newdata 
Test data for prediction. Obeys all the same rules as 
sampleronly, samplerOnly 
Builds the sampler from its arguments and returns it without running it. Useful to use the

x 
Object of class 
plquants 
In the plots, beliefs about f(x) are indicated by plotting the
posterior median and a lower and upper quantile. 
cols 
Vector of two colors. First color is used to plot the median of f(x) and the second color is used to plot the lower and upper quantiles. 
type 
The quantity to be returned by generic functions. Options are 
sample 
Either 
... 
Additional arguments passed on to 
BART is a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior (f, σ) | (x, y) in the numeric y case and just f in the binary y case.
Thus, unlike a lot of other modeling methods in R, bart does not produce a single model object from which fits and summaries may be extracted. The output consists of values f*(x) (and σ* in the numeric case) where * denotes a particular draw. The x is either a row from the training data (x.train) or the test data (x.test).
Decision rules for any tree are of the form x ≤ c vs. x > c for each ‘x’ corresponding to a column of x.train.
usequants determines the means by which the set of possible c is determined. If usequants is TRUE, then the c are a subset of the values interpolated halfway between the unique, sorted values obtained from the corresponding column of x.train. If usequants is FALSE, the cutoffs are equally spaced across the range of values taken on by the corresponding column of x.train.
The number of possible values of c is determined by numcut. If usequants is FALSE, numcut equally spaced cutoffs are used covering the range of values in the corresponding column of x.train. If usequants is TRUE, then for a variable the minimum of numcut and one less than the number of unique elements for that variable are used.
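The two cutoff schemes described above can be sketched in base R as follows. This is an illustration only, not the package's internal code: the helper name candidate_cutoffs is hypothetical, and both the exact placement of the equally spaced points and the way a subset of midpoints is chosen are assumptions.

```r
# Hypothetical helper: candidate cutoffs c for one column of x.train
candidate_cutoffs <- function(x, numcut = 100L, usequants = FALSE) {
  if (usequants) {
    # midpoints between consecutive unique, sorted values
    u <- sort(unique(x))
    mids <- (u[-1L] + u[-length(u)]) / 2
    if (length(mids) > numcut) {
      # assumption: thin to numcut midpoints, evenly spread by index
      idx <- unique(round(seq(1, length(mids), length.out = numcut)))
      mids <- mids[idx]
    }
    mids
  } else {
    # assumption: numcut interior points equally spaced over the range
    r <- range(x)
    r[1L] + seq_len(numcut) * (r[2L] - r[1L]) / (numcut + 1)
  }
}

x <- c(0, 1, 1, 2, 4)
cuts_q <- candidate_cutoffs(x, numcut = 100L, usequants = TRUE)
cuts_u <- candidate_cutoffs(x, numcut = 3L, usequants = FALSE)
```

Here x has four unique values, so the quantile scheme yields three cutoffs (one less than the number of unique elements), while the uniform scheme yields exactly numcut cutoffs regardless of how the data are distributed within the range.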
k
The amount of shrinkage of the node parameters is controlled by k. k can be given as either a fixed, positive number, or as any value that can be used to build a supported hyperprior. At present, only χ_ν s priors are supported, where ν is a degrees of freedom and s is a scale. Both values must be positive; however, the scale can be infinite, which yields an improper prior, interpreted as just the polynomial part of the distribution. If ν is 1 and s is ∞, the prior is “flat”.
For BART on binary outcomes, the degree of overfitting can be highly sensitive to k, so it is encouraged to consider a number of values. The default hyperprior for binary BART, chi(1.25, Inf), has been shown to work well in a large number of datasets; however, cross-validation may be helpful. Running for a short time with a flat prior may be helpful to see the range of values of k that are consistent with the data.
bart and rbart_vi support fitted to return the posterior mean of a predicted quantity, as well as predict to return a set of posterior samples for a different sample. In addition, the extract generic can be used to obtain the posterior samples for the training data or test data supplied during the initial fit.
Using predict with a bart object requires that it be fitted with the option keeptrees/keepTrees as TRUE. Keeping the trees for a fit can require a sizeable amount of memory and is off by default.
All generics return values on the scale of expected value of the response by default. This means that predict, extract, and fitted for binary outcomes return probabilities unless specifically the sum-of-trees component is requested (type = "bart"). This is in contrast to yhat.train/yhat.test that are returned with the fitted model.
Saving and loading fitted BART objects (with save and load) for use with predict requires that R's serialization mechanism be able to access the underlying trees, in addition to being fit with keeptrees/keepTrees as TRUE. For memory purposes, the trees are not stored as R objects unless specifically requested. To do this, one must “touch” the sampler's state object before saving, e.g. for a fitted object bartFit, execute invisible(bartFit$fit$state).
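The workflow described above (fit with keeptrees = TRUE, touch the sampler state, save, load, predict) might look like the sketch below. It assumes the dbarts package is installed; the simulated data, the small ndpost/nskip values, and the temporary file name are illustrative choices, not recommendations.

```r
library(dbarts)

set.seed(99)
n <- 100
x <- matrix(runif(n * 10), n, 10)
y <- 10 * x[, 4] + 5 * x[, 5] + rnorm(n)

# keeptrees = TRUE is required for predict(); short run to keep the sketch fast
bartFit <- bart(x, y, ndpost = 200, nskip = 100,
                keeptrees = TRUE, verbose = FALSE)

# posterior samples for new rows that were not supplied at fit time
x_new <- matrix(runif(5 * 10), 5, 10)
pred_before <- predict(bartFit, x_new)

# "touch" the sampler state so the trees are stored as R objects before saving
invisible(bartFit$fit$state)

tmp <- tempfile(fileext = ".RData")
save(bartFit, file = tmp)
rm(bartFit)

# restore the fit and predict again from the reloaded object
load(tmp)
pred_after <- predict(bartFit, x_new)
```

If the state is not touched before save, the reloaded object may lack the trees that predict needs.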
bart returns a list assigned class bart. For applicable quantities, ndpost / keepevery samples are returned.
In the numeric y case, the list has components:

An array/matrix of posterior samples. The (i, j, k) value is the jth draw of
the posterior of f evaluated at the kth row of 

Same as 

Vector of means of 

Vector of means of 

Matrix of posterior samples of 

Burn-in draws of 

A matrix with number of rows equal to the number of kept draws and each column corresponding to a training variable. Contains the total count of the number of times that variable is used in a tree decision rule (over all trees). 

The rough error standard deviation (σ) used in the prior. 

The input dependent vector of values for the dependent variable.
This is used in 

Optional sampler object which stores the values of the tree splits. Required for using


Information that can be lost if 

Optional matrix of posterior samples of 
In the binary y case, the returned list has the components yhat.train, yhat.test, and varcount as above. In addition the list has a binaryOffset component giving the value used.
Note that in the binary y case, yhat.train and yhat.test are f(x) + binaryOffset. For draws of the probability P(Y = 1 | x), apply the normal cdf (pnorm) to these values.
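A minimal sketch of that transformation in base R follows. The matrix yhat_train is a simulated, hypothetical stand-in for the yhat.train component of a binary fit (rows are posterior draws, columns are observations); no model is actually fit here.

```r
set.seed(1)
n_draws <- 20
n_obs <- 5

# hypothetical stand-in for bartFit$yhat.train: draws of f(x) + binaryOffset
yhat_train <- matrix(rnorm(n_draws * n_obs), nrow = n_draws, ncol = n_obs)

# each draw of f(x) + binaryOffset maps to a draw of P(Y = 1 | x)
prob_draws <- pnorm(yhat_train)

# posterior mean probability for each observation
prob_mean <- colMeans(prob_draws)
```

Note that the mean is taken after applying pnorm; averaging the draws first and transforming afterwards would generally give a different value because pnorm is nonlinear.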
The plot method sets mfrow to c(1,2) and makes two plots. The first plot is the sequence of kept draws of σ including the burn-in draws. Initially these draws will decline as BART finds a fit and then level off when the MCMC has burnt in. The second plot has y on the horizontal axis and posterior intervals for the corresponding f(x) on the vertical axis.
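The quantities shown in the second plot can be sketched in base R as below. This is not the plot method's own code: f_draws is a simulated, hypothetical stand-in for a matrix of posterior draws of f(x) (rows are draws, columns are observations), and plquants mirrors the default argument value.

```r
set.seed(7)
n_draws <- 50
n_obs <- 8
y <- sort(rnorm(n_obs))

# hypothetical stand-in for posterior draws of f(x), centred on y for illustration
f_draws <- matrix(rnorm(n_draws * n_obs, mean = rep(y, each = n_draws), sd = 0.3),
                  nrow = n_draws, ncol = n_obs)

plquants <- c(0.05, 0.95)

# posterior median and lower/upper quantiles for each observation
f_median <- apply(f_draws, 2, median)
f_bounds <- apply(f_draws, 2, quantile, probs = plquants)  # 2 x n_obs matrix
```

Plotting f_median against y, with vertical segments from f_bounds[1, ] to f_bounds[2, ], gives a picture of the same kind as the second panel.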
Hugh Chipman,
Robert McCulloch,
Vincent Dorie.
Chipman, H., George, E., and McCulloch, R. (2009) BART: Bayesian Additive Regression Trees.
Chipman, H., George, E., and McCulloch, R. (2006) Bayesian Ensemble Learning. Advances in Neural Information Processing Systems 19, Scholkopf, Platt and Hoffman, Eds., MIT Press, Cambridge, MA, 265–272.
Both of the above are available at: http://www.robmcculloch.org
Friedman, J.H. (1991) Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.
library(dbarts)

## simulate data (example from Friedman MARS paper)
## y = f(x) + epsilon , epsilon ~ N(0, sigma^2)
## x consists of 10 variables, only first 5 matter
f <- function(x) {
  10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
    10 * x[,4] + 5 * x[,5]
}
set.seed(99)
sigma <- 1.0
n <- 100
x <- matrix(runif(n * 10), n, 10)
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
## run BART
set.seed(99)
bartFit <- bart(x, y)
plot(bartFit)
## compare BART fit to linear model and truth = Ey
lmFit <- lm(y ~ ., data.frame(x, y))
fitmat <- cbind(y, Ey, lmFit$fitted.values, bartFit$yhat.train.mean)
colnames(fitmat) <- c('y', 'Ey', 'lm', 'bart')
print(cor(fitmat))

Running BART with numeric y
number of trees: 200
Prior:
k: 2.000000
degrees of freedom in sigma prior: 3.000000
quantile in sigma prior: 0.900000
scale in sigma prior: 0.002181
power and base for tree prior: 2.000000 0.950000
use quantiles for rule cut points: false
data:
number of training observations: 100
number of test observations: 0
number of explanatory variables: 10
init sigma: 2.756573, curr sigma: 2.756573
Cutoff rules c in x<=c vs x>c
Number of cutoffs: (var: number of possible c):
(1: 100) (2: 100) (3: 100) (4: 100) (5: 100)
(6: 100) (7: 100) (8: 100) (9: 100) (10: 100)
Running mcmc loop:
iteration: 100 (of 1000)
iteration: 200 (of 1000)
iteration: 300 (of 1000)
iteration: 400 (of 1000)
iteration: 500 (of 1000)
iteration: 600 (of 1000)
iteration: 700 (of 1000)
iteration: 800 (of 1000)
iteration: 900 (of 1000)
iteration: 1000 (of 1000)
total seconds in loop: 0.766282
Tree sizes, last iteration:
2 2 2 2 3 2 3 1 2 3 2 3 2 2 3 3 2 3 2 2
2 3 3 2 2 1 3 3 2 3 2 2 5 5 2 2 3 2 2 2
2 2 3 2 3 2 3 2 1 2 2 3 3 3 3 2 3 2 2 2
3 3 1 2 2 3 3 2 2 3 3 4 2 2 2 3 2 1 2 1
4 2 3 2 2 2 2 2 2 5 3 4 2 2 2 2 2 2 4 2
3 2 2 2 2 3 3 2 2 2 3 2 2 3 1 3 2 2 2 2
2 2 2 3 3 2 2 3 2 2 2 3 2 2 3 2 4 2 2 2
3 2 2 2 2 1 5 3 2 2 3 2 1 2 3 2 2 2 2 2
2 2 2 3 3 4 4 2 3 3 2 2 3 2 3 2 2 3 2 2
4 1 2 2 2 2 2 2 2 2 2 2 3 2 2 2 1 2 3 2
Variable Usage, last iteration (var:count):
(1: 37) (2: 23) (3: 24) (4: 37) (5: 27)
(6: 23) (7: 34) (8: 20) (9: 28) (10: 18)
DONE BART
             y        Ey        lm      bart
y    1.0000000 0.9847984 0.8841787 0.9982682
Ey   0.9847984 1.0000000 0.9009389 0.9887574
lm   0.8841787 0.9009389 1.0000000 0.8985059
bart 0.9982682 0.9887574 0.8985059 1.0000000