Fits the BART model against varying k, power, base, and ntree parameters using k-fold or repeated random subsampling cross-validation, sharing burn-in between parameter settings. Results are given as an array of evaluations of a loss function on the held-out sets.
xbart(formula, data, subset, weights, offset, verbose = FALSE, n.samples = 200L,
method = c("k-fold", "random subsample"), n.test = c(5, 0.2),
n.reps = 40L, n.burn = c(200L, 150L, 50L),
loss = c("rmse", "log", "mcr"), n.threads = guessNumCores(), n.trees = 75L,
k = 2, power = 2, base = 0.95, drop = TRUE,
resid.prior = chisq, control = dbartsControl(), sigma = NA_real_)

formula 
An object of class formula following an analogous model description syntax as lm. For backwards compatibility, can also be the bart matrix x.train. 
data 
An optional data frame, list, or environment containing predictors to be used with the
model. For backwards compatibility, can also be the bart vector y.train. 
subset 
An optional vector specifying a subset of observations to be used in the fitting process. 
weights 
An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations y | x ~ N(f(x), σ^2 / w), where f(x) is the unknown function. 
offset 
An optional vector specifying an offset from 0 for the relationship between the underlying function, f(x), and the response y. Only useful for binary responses, in which case the model fit is to assume P(Y = 1 | X = x) = Φ(f(x) + offset), where Φ is the standard normal cumulative distribution function. 
verbose 
A logical determining if additional output is printed to the console. 
n.samples 
A positive integer, setting the number of posterior samples drawn for each fit of training data and used by the loss function. 
method 
Character string, either "k-fold" or "random subsample". 
n.test 
For each fit, the test sample size or proportion. For method "k-fold", an integer giving the number of folds; for "random subsample", the number or proportion of observations to hold out as test data. 
n.reps 
A positive integer setting the number of cross-validation steps that will be taken. For "k-fold", a replication corresponds to fitting each of the K folds in turn, while for "random subsample" each replication is a single fit. 
n.burn 
Between one and three positive integers, specifying 1) the initial burn-in, 2) the burn-in when moving from
one parameter setting to another, and 3) the burn-in between each random subsample replication. The third
value is also the burn-in when moving between folds in "k-fold" cross-validation. 
loss 
Either one of the preset loss functions given as character strings ("rmse", "log", "mcr"), a function with arguments y.test and y.test.hat, or a list of a function and an environment in which to evaluate it. See Details. 
n.threads 
The number of threads to use. Across different sets of parameters (n.trees × k × power × base) and replications, fits are independent and can be computed in parallel. 
n.trees 
A vector of positive integers setting the BART hyperparameter for the number of trees in the
sum-of-trees formulation. See the ntree argument to bart. 
k 
A vector of positive real numbers, setting the BART hyperparameter for the node-mean prior standard deviation. 
power 
A vector of real numbers greater than one, setting the BART hyperparameter for the tree prior's growth probability, given by base / (1 + depth)^power. 
base 
A vector of real numbers in (0, 1), setting the BART hyperparameter for the tree prior's growth probability. 
drop 
Logical, determining if dimensions with a single value are dropped from the result. 
resid.prior 
An expression of the form chisq or chisq(df, quant) that sets the prior used on the residual standard deviation. 
control 
An object inheriting from dbartsControl, created by the dbartsControl function. 
sigma 
A positive numeric estimate of the residual standard deviation. If NA, a linear model is used with all of the predictors to obtain one. 
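The tree prior's growth probability described under power and base can be illustrated directly. The following is a minimal sketch; growth.prob is a hypothetical helper written for illustration and is not part of the package:

```r
## Probability that a node at a given depth attempts a split, under the
## tree prior: base / (1 + depth)^power. With the defaults base = 0.95
## and power = 2, deeper nodes become increasingly unlikely to grow.
growth.prob <- function(depth, base = 0.95, power = 2)
    base / (1 + depth)^power

growth.prob(0:3)
## depths 0-3: 0.950, 0.2375, ~0.1056, ~0.0594
```

Larger power values or smaller base values thus shrink the prior toward shallower trees.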
Cross-validates n.reps replications against the cross product of given hyperparameter vectors n.trees * k * power * base.
For each fit, either one fold is withheld as test data and n.test - 1 folds are used as
training data, or n * n.test observations are withheld as test data and n * (1 - n.test)
used as training. A replication corresponds to fitting all K folds in "k-fold" cross-validation,
or a single fit with "random subsample". The training data is used to fit a model and make
predictions on the test data, which are used together with the test data itself to evaluate the loss
function.
loss functions are either the default of average log-loss for binary outcomes and
root-mean-squared error for continuous outcomes, misclassification rates for binary outcomes, or a
function with arguments y.test and y.test.hat. y.test.hat is of dimensions
equal to length(y.test) * n.samples. A third option is to pass a list of
list(function, evaluationEnvironment), so as to provide default bindings. RMSE is a monotonic
transformation of the average log-loss for continuous outcomes, so specifying log-loss in that case
calculates RMSE instead.
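A custom loss can be sketched as follows. maeLoss is a hypothetical example, not a preset loss; it relies only on the y.test/y.test.hat calling convention described above:

```r
## A hypothetical custom loss: mean absolute error of the posterior-mean
## prediction. y.test.hat has dimensions length(y.test) x n.samples, so
## rowMeans collapses the posterior samples to one prediction per
## held-out observation.
maeLoss <- function(y.test, y.test.hat)
    mean(abs(y.test - rowMeans(y.test.hat)))
```

If the function needs bindings beyond its two arguments, it can instead be supplied as list(maeLoss, environment()), per the list option above.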
An array of dimensions n.reps * length(n.trees) * length(k) * length(power) * length(base).
If drop is TRUE, dimensions of length 1 are omitted. If all hyperparameters
are of length 1, then the result will be a vector of length n.reps. When the result is an
array, the dimnames of the result shall be set to the corresponding hyperparameters.
For method "k-fold", each element is an average across the K fits. For
"random subsample", each element represents a single fit.
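One common follow-up is to average the returned array over replications to rank hyperparameter settings. This sketch assumes xval is the array produced by the call in the Examples section (n.reps = 4 with two n.trees, three k, two power, and three base values, so no dimensions are dropped):

```r
## dim(xval) is 4 x 2 x 3 x 2 x 3; the first dimension indexes
## replications. Average over it to get one loss per hyperparameter
## combination, keeping dimensions 2:5 (n.trees, k, power, base):
perSetting <- apply(xval, 2:5, mean)

## Locate the combination with the smallest average loss; arrayInd
## converts the flat index back into one index per hyperparameter.
best <- arrayInd(which.min(perSetting), dim(perSetting))
```

The dimnames set on the result make it straightforward to map the indices in best back to the hyperparameter values that produced them.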
Vincent Dorie: [email protected]
f <- function(x) {
    10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
        10 * x[,4] + 5 * x[,5]
}
set.seed(99)
sigma <- 1.0
n <- 100
x  <- matrix(runif(n * 10), n, 10)
Ey <- f(x)
y  <- rnorm(n, Ey, sigma)
mad <- function(y.train, y.train.hat)
    mean(abs(y.train - apply(y.train.hat, 1L, mean)))
## low iteration numbers to run quickly
xval <- xbart(x, y, n.samples = 15L, n.reps = 4L, n.burn = c(10L, 3L, 1L),
              n.trees = c(5L, 7L),
              k = c(1, 2, 4),
              power = c(1.5, 2),
              base = c(0.75, 0.8, 0.95), n.threads = 1L,
              loss = mad)
