bcf | R Documentation |
Run the Bayesian Causal Forest (BCF) algorithm for regularized causal effect estimation.
bcf(
X_train,
Z_train,
y_train,
propensity_train = NULL,
rfx_group_ids_train = NULL,
rfx_basis_train = NULL,
X_test = NULL,
Z_test = NULL,
propensity_test = NULL,
rfx_group_ids_test = NULL,
rfx_basis_test = NULL,
num_gfr = 5,
num_burnin = 0,
num_mcmc = 100,
previous_model_json = NULL,
previous_model_warmstart_sample_num = NULL,
general_params = list(),
prognostic_forest_params = list(),
treatment_effect_forest_params = list(),
variance_forest_params = list()
)
X_train |
Covariates used to split trees in the ensemble. May be provided either as a dataframe or a matrix. Matrix covariates will be assumed to be all numeric. Covariates passed as a dataframe will be preprocessed based on the variable types (e.g. categorical columns stored as unordered factors will be one-hot encoded, while categorical columns stored as ordered factors will be passed as integers to the core algorithm, along with the metadata that the column is ordered categorical). |
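As a rough illustration of the dataframe conventions described above, the following base R sketch builds a small data frame with one numeric column, one unordered factor, and one ordered factor, and reproduces the described encodings by hand. The column names and the use of `model.matrix` are illustrative assumptions; the package's internal preprocessing may differ in detail.

```r
# Hypothetical covariates: numeric, unordered categorical, ordered categorical
df <- data.frame(
  x_num = c(0.1, 0.5, 0.9, 0.3),
  x_cat = factor(c("a", "b", "c", "a")),
  x_ord = factor(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

# Unordered factor: one indicator column per level (one-hot encoding)
one_hot <- model.matrix(~ x_cat - 1, data = df)

# Ordered factor: integer codes that preserve the level ordering
ord_codes <- as.integer(df$x_ord)

# Combined numeric representation of the three columns
X_encoded <- cbind(x_num = df$x_num, one_hot, x_ord = ord_codes)
print(X_encoded)
```

A plain numeric matrix skips this step entirely, since matrix covariates are assumed to be all numeric.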
Z_train |
Vector of (continuous or binary) treatment assignments. |
y_train |
Outcome to be modeled by the ensemble. |
propensity_train |
(Optional) Vector of propensity scores. If not provided, this will be estimated from the data. |
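When `propensity_train` is omitted, the scores are estimated from the data. This page does not say which estimator is used, so the sketch below substitutes a plain logistic regression (`glm`) purely to show what a user-supplied propensity vector could look like. It is a stand-in, not the package's internal estimator, and all variable names are illustrative.

```r
set.seed(1)
n <- 200
X <- matrix(runif(n * 2), ncol = 2)

# Simulated treatment whose probability depends on the first covariate
true_pi <- plogis(2 * X[, 1] - 1)
Z <- rbinom(n, 1, true_pi)

# Stand-in propensity model: logistic regression of Z on the covariates
fit <- glm(Z ~ X, family = binomial())
pi_hat <- as.numeric(fitted(fit))

# pi_hat has one entry per observation and could be supplied as
# propensity_train = pi_hat
summary(pi_hat)
```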
rfx_group_ids_train |
(Optional) Group labels used for an additive random effects model. |
rfx_basis_train |
(Optional) Basis for "random-slope" regression in an additive random effects model.
If |
X_test |
(Optional) Test set of covariates used to define "out of sample" evaluation data.
May be provided either as a dataframe or a matrix, but the format of |
Z_test |
(Optional) Test set of (continuous or binary) treatment assignments. |
propensity_test |
(Optional) Vector of test set propensity scores. If not provided, this will be estimated from the data. |
rfx_group_ids_test |
(Optional) Test set group labels used for an additive random effects model. Test set evaluation for group labels that were not in the training set is not currently supported, but is planned for the near future. |
rfx_basis_test |
(Optional) Test set basis for "random-slope" regression in an additive random effects model. |
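A minimal sketch of how the random effects inputs might be laid out, assuming integer group labels and a basis matrix with one row per observation. The intercept-plus-slope basis is an illustrative choice, not a requirement stated on this page. The final check reflects the restriction that test set labels must already appear in the training set.

```r
set.seed(2)
n_train <- 8
n_test <- 4

# Group labels: every group seen at test time also appears in training
rfx_group_ids_train <- rep(1:3, length.out = n_train)
rfx_group_ids_test <- c(1, 2, 3, 1)

# Illustrative basis: a constant column (random intercept) plus one
# covariate column (random slope)
rfx_basis_train <- cbind(1, runif(n_train))
rfx_basis_test <- cbind(1, runif(n_test))

# Guard against test labels that were never seen during training
unseen <- setdiff(rfx_group_ids_test, rfx_group_ids_train)
if (length(unseen) > 0) stop("test set contains unseen group labels")
```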
num_gfr |
Number of "warm-start" iterations run using the grow-from-root algorithm (He and Hahn, 2021). Default: 5. |
num_burnin |
Number of "burn-in" iterations of the MCMC sampler. Default: 0. |
num_mcmc |
Number of "retained" iterations of the MCMC sampler. Default: 100. |
previous_model_json |
(Optional) JSON string containing a previous BCF model. This can be used to "continue" a sampler interactively after inspecting the samples or to run parallel chains "warm-started" from existing forest samples. Default: |
previous_model_warmstart_sample_num |
(Optional) Sample number from |
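The "continue a sampler" workflow can be sketched as follows. This assumes that `bcf` is provided by the `stochtree` package and that the package exports `saveBCFModelToJsonString` for serializing a fitted model; both are assumptions not confirmed on this page, so the code is guarded and does nothing if the package is unavailable. The simulated data and variable names are illustrative.

```r
warm_started <- NULL

if (requireNamespace("stochtree", quietly = TRUE)) {
  set.seed(3)
  n <- 100
  X <- matrix(runif(n * 3), ncol = 3)
  pi_x <- 0.25 + 0.5 * X[, 1]
  Z <- rbinom(n, 1, pi_x)
  y <- X[, 1] + 0.5 * Z + rnorm(n)

  # Initial run: a few grow-from-root iterations followed by MCMC
  first_fit <- stochtree::bcf(
    X_train = X, Z_train = Z, y_train = y, propensity_train = pi_x,
    num_gfr = 5, num_burnin = 0, num_mcmc = 10
  )

  # Serialize the fitted model to a JSON string (assumed helper)
  model_json <- stochtree::saveBCFModelToJsonString(first_fit)

  # Second run: no new grow-from-root iterations; the sampler starts
  # from retained sample 1 of the previous model
  warm_started <- stochtree::bcf(
    X_train = X, Z_train = Z, y_train = y, propensity_train = pi_x,
    num_gfr = 0, num_burnin = 0, num_mcmc = 10,
    previous_model_json = model_json,
    previous_model_warmstart_sample_num = 1
  )
}
```

Running several such second-stage calls, each with a different sample number, is one way to obtain the parallel warm-started chains mentioned above.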
general_params |
(Optional) A list of general (non-forest-specific) model parameters. Every parameter has a default value that is applied internally, so only the values to be overridden need to be supplied.
|
prognostic_forest_params |
(Optional) A list of prognostic forest model parameters. Every parameter has a default value that is applied internally, so only the values to be overridden need to be supplied.
|
treatment_effect_forest_params |
(Optional) A list of treatment effect forest model parameters. Every parameter has a default value that is applied internally, so only the values to be overridden need to be supplied.
|
variance_forest_params |
(Optional) A list of variance forest model parameters. Every parameter has a default value that is applied internally, so only the values to be overridden need to be supplied.
|
List of sampling outputs and a wrapper around the sampled forests (which can be used for in-memory prediction on new data, or serialized to JSON on disk).
# Simulate n observations of p uniform covariates
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
# Prognostic function: step function of the first covariate
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
# Propensity score: also a step function of the first covariate
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
# Treatment effect: step function of the second covariate
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
# Draw binary treatment assignments and noisy outcomes
Z <- rbinom(n, 1, pi_x)
noise_sd <- 1
y <- mu_x + tau_x*Z + rnorm(n, 0, noise_sd)
# Hold out 20% of the observations as a test set
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
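# Fit BCF with the true propensity scores: 10 grow-from-root iterations,
# no burn-in, and 10 retained MCMC iterations
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train, X_test = X_test, Z_test = Z_test,
                 propensity_test = pi_test, num_gfr = 10,
                 num_burnin = 0, num_mcmc = 10)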
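The sampling outputs can be summarized against the simulated truth. The sketch below defines a small helper (the name `summarize_draws` is hypothetical) that assumes posterior draws are stored as a matrix with one row per observation and one column per retained sample. It is demonstrated on simulated draws so that it runs on its own.

```r
# Posterior-mean RMSE and 95% interval coverage for a matrix of draws
summarize_draws <- function(draws, truth) {
  post_mean <- rowMeans(draws)
  lower <- apply(draws, 1, quantile, probs = 0.025)
  upper <- apply(draws, 1, quantile, probs = 0.975)
  list(
    rmse = sqrt(mean((post_mean - truth)^2)),
    coverage = mean(lower <= truth & truth <= upper)
  )
}

# Stand-in draws: 4 observations, 200 samples centered on the truth
set.seed(4)
truth <- c(0.5, 1.0, 1.5, 2.0)
draws <- matrix(rnorm(4 * 200, mean = rep(truth, 200), sd = 0.1), nrow = 4)
demo <- summarize_draws(draws, truth)
print(demo)

# With the fitted model from the example, assuming the returned list
# exposes test set treatment effect draws as `tau_hat_test`:
# summarize_draws(bcf_model$tau_hat_test, tau_test)
```

With only 10 retained MCMC iterations, as in the example call, interval estimates will be coarse; a larger `num_mcmc` is advisable for anything beyond a quick check.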