bart    R Documentation

Run the BART algorithm for supervised learning.

Usage
bart(
  X_train,
  y_train,
  leaf_basis_train = NULL,
  rfx_group_ids_train = NULL,
  rfx_basis_train = NULL,
  X_test = NULL,
  leaf_basis_test = NULL,
  rfx_group_ids_test = NULL,
  rfx_basis_test = NULL,
  num_gfr = 5,
  num_burnin = 0,
  num_mcmc = 100,
  previous_model_json = NULL,
  previous_model_warmstart_sample_num = NULL,
  general_params = list(),
  mean_forest_params = list(),
  variance_forest_params = list()
)
Arguments

X_train
Covariates used to split trees in the ensemble. May be provided either as a dataframe or a matrix. Matrix covariates are assumed to be entirely numeric. Covariates passed as a dataframe are preprocessed based on their variable types (e.g. categorical columns stored as unordered factors are one-hot encoded, while categorical columns stored as ordered factors are passed to the core algorithm as integers, along with metadata indicating that the column is ordered categorical).
y_train
Outcome to be modeled by the ensemble.

leaf_basis_train
(Optional) Bases used to define a regression model in the leaves of each tree. If not provided, each leaf is parameterized by a constant.

rfx_group_ids_train
(Optional) Group labels used for an additive random effects model.

rfx_basis_train
(Optional) Basis for "random-slope" regression in an additive random effects model. If rfx_group_ids_train is provided but a basis is not, a random-intercept model is assumed. See the sketch following this argument list for an example call with random effects.
X_test
(Optional) Test set of covariates used to define "out of sample" evaluation data. May be provided either as a dataframe or a matrix, but the format of X_test must match that of X_train.

leaf_basis_test
(Optional) Test set of bases used to define "out of sample" evaluation data. While a test set is optional, the structure of any provided test set must match that of the training set (i.e. if both X_train and leaf_basis_train are provided, then any test set must include both X_test and leaf_basis_test).

rfx_group_ids_test
(Optional) Test set group labels used for an additive random effects model. We do not currently support test set evaluation for group labels that were not in the training set (but plan to in the near future).

rfx_basis_test
(Optional) Test set basis for "random-slope" regression in an additive random effects model.
num_gfr
Number of "warm-start" iterations run using the grow-from-root algorithm (He and Hahn, 2021). Default: 5.

num_burnin
Number of "burn-in" iterations of the MCMC sampler. Default: 0.

num_mcmc
Number of "retained" iterations of the MCMC sampler. Default: 100.
previous_model_json
(Optional) JSON string containing a previous BART model. This can be used to "continue" a sampler interactively after inspecting the samples or to run parallel chains "warm-started" from existing forest samples. Default: NULL. See the sketch following the Examples for a continuation workflow.

previous_model_warmstart_sample_num
(Optional) Sample number from previous_model_json that is used to warm-start this BART sampler. Default: NULL.
general_params
(Optional) A list of general (non-forest-specific) model parameters. Each parameter has a default value that is processed internally, so this argument list is optional. See the sketch following this argument list for an illustration of how these lists are passed.

mean_forest_params
(Optional) A list of mean forest model parameters. Each parameter has a default value that is processed internally, so this argument list is optional.

variance_forest_params
(Optional) A list of variance forest model parameters. Each parameter has a default value that is processed internally, so this argument list is optional.
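The following is a minimal sketch (not taken from the package documentation) of a call that supplies random effects inputs and a general_params list. The simulated group labels, the all-ones random-intercept basis, and the list element name random_seed are illustrative assumptions; consult the package documentation for the exact parameter names accepted by general_params.

# Hedged sketch: BART with an additive random-intercept effect by group.
# The `random_seed` element of general_params is an assumed name; check the package docs.
n <- 100
p <- 5
X_train <- matrix(runif(n * p), ncol = p)
group_ids_train <- sample(1:3, n, replace = TRUE)        # three illustrative groups
group_effects <- c(-2, 0, 2)[group_ids_train]            # true group-level intercepts
y_train <- 5 * X_train[, 1] + group_effects + rnorm(n)
rfx_basis_train <- matrix(1, nrow = n, ncol = 1)         # all-ones basis => random intercepts

bart_rfx_model <- bart(
  X_train = X_train, y_train = y_train,
  rfx_group_ids_train = group_ids_train,
  rfx_basis_train = rfx_basis_train,
  num_gfr = 10, num_burnin = 0, num_mcmc = 10,
  general_params = list(random_seed = 1234)              # assumed element name
)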
Value

List of sampling outputs and a wrapper around the sampled forests (which can be used for in-memory prediction on new data, or serialized to JSON on disk).
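As a rough illustration of in-memory prediction, the sketch below uses the bart_model and X_test objects created in the Examples further down. It assumes the returned object has a predict method whose new-covariate argument is named X and whose result contains an element named y_hat; both names are assumptions rather than confirmed API.

# Hedged sketch: predict on new covariates with the fitted model object.
# The predict() signature and `y_hat` element name are assumptions.
preds <- predict(bart_model, X = X_test)
y_hat_posterior <- preds$y_hat            # draws of f(X_test), one column per retained sample
y_hat_mean <- rowMeans(y_hat_posterior)   # posterior mean prediction per test observation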
Examples

# Generate synthetic data: a step function of the first covariate plus Gaussian noise
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)

# Split into training and test sets
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]

# Sample the BART model: 10 grow-from-root (warm-start) iterations, no burn-in, 10 retained MCMC iterations
bart_model <- bart(X_train = X_train, y_train = y_train, X_test = X_test,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
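A short follow-up sketch evaluating out-of-sample fit for the model above. It assumes the returned object stores test-set prediction draws in an element named y_hat_test (rows indexing test observations, columns indexing retained samples); that element name is an assumption.

# Hedged follow-up: compare posterior mean test predictions to the held-out outcomes.
# `y_hat_test` is an assumed element name for the test-set prediction draws.
y_hat_test_mean <- rowMeans(bart_model$y_hat_test)
sqrt(mean((y_test - y_hat_test_mean)^2))   # out-of-sample RMSE
plot(y_hat_test_mean, y_test, xlab = "Posterior mean prediction", ylab = "Observed outcome")
abline(0, 1)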
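Finally, a hedged sketch of continuing the sampler via previous_model_json. The serialization helper saveBARTModelToJsonString() is an assumed name (the package's actual helper may differ), and the warm-start sample number is illustrative.

# Hedged sketch: continue sampling from a previously fitted model.
# saveBARTModelToJsonString() is an assumed helper name; check the package for the exact function.
bart_model_json <- saveBARTModelToJsonString(bart_model)
bart_model_continued <- bart(
  X_train = X_train, y_train = y_train, X_test = X_test,
  num_gfr = 0, num_burnin = 0, num_mcmc = 10,
  previous_model_json = bart_model_json,
  previous_model_warmstart_sample_num = 10  # warm-start from the last retained sample (illustrative)
)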