| VCBART_ind | R Documentation |
Fit a varying coefficient model to panel data. Assumes residual errors are independent within and between subjects. See Deshpande et al. (2024) for details about the model and MCMC sampler.
VCBART_ind(Y_train,subj_id_train, ni_train,X_train,
Z_cont_train = matrix(0, nrow = 1, ncol = 1),
Z_cat_train = matrix(0L, nrow = 1, ncol = 1),
X_test = matrix(0, nrow = 1, ncol = 1),
Z_cont_test = matrix(0, nrow = 1, ncol = 1),
Z_cat_test = matrix(0, nrow = 1, ncol = 1),
unif_cuts = rep(TRUE, times = ncol(Z_cont_train)),
cutpoints_list = NULL,
cat_levels_list = NULL,
edge_mat_list = NULL,
graph_split = rep(FALSE, times = ncol(Z_cat_train)),
sparse = TRUE,
M = 50,
mu0 = NULL, tau = NULL, nu = NULL, lambda = NULL,
nd = 1000, burn = 1000, thin = 1,
save_samples = TRUE, save_trees = TRUE,
verbose = TRUE, print_every = floor( (nd*thin + burn)/10))
Y_train |
Vector of continous responses for training data |
ni_train |
Vector containing the number of observations per subject in the training data. |
subj_id_train |
Vector of length |
X_train |
Matrix of covariates for training observations. Do not include intercept as the first column. |
Z_cont_train |
Matrix of continuous modifiers for training data. Note, modifiers must be rescaled to lie in the interval [-1,1]. Default is a 1x1 matrix, which signals that there are no continuous modifiers in the training data. |
Z_cat_train |
Integer matrix of categorical modifiers for training data. Note categorical levels should be 0-indexed. That is, if a categorical modifier has 10 levels, the values should run from 0 to 9. Default is a 1x1 matrix, which signals that there are no categorical modifiers in the training data. |
X_test |
Matrix of covariate for testing observations. Default is a 1x1 matrix, which signals that testing data is not provided. |
Z_cont_test |
Matrix of continuous modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided. |
Z_cat_test |
Integer matrix of categorical modifiers for testing data. Default is a 1x1 matrix, which signals that testing data is not provided. |
unif_cuts |
Vector of logical values indicating whether cutpoints for each continuous modifier should be drawn from a continuous uniform distribution ( |
cutpoints_list |
List of length |
cat_levels_list |
List of length |
edge_mat_list |
List of adjacency matrices if any of the categorical modifiers are network-structured. Default is |
graph_split |
Vector of logicals indicating whether each categorical modifier is network-structured. Default is |
sparse |
Logical, indicating whether or not to perform variable selection in each tree ensemble based on a sparse Dirichlet prior rather than uniform prior; see Linero 2018. Default is |
M |
Number of trees in each ensemble. Default is 50. |
mu0 |
Prior mean for jumps/leaf parameters. Default is 0 for each beta function. If supplied, must be a vector of length |
tau |
Prior standard deviation for jumps/leaf parameters. Default is |
nu |
Degrees of freedom for scaled-inverse chi-square prior on sigma^2. Default is 3. |
lambda |
Scale hyperparameter for scaled-inverse chi-square prior on sigma^2. Default places 90% prior probability that sigma is less than |
nd |
Number of posterior draws to return. Default is 1000. |
burn |
Number of MCMC iterations to be treated as "warmup" or "burn-in". Default is 1000. |
thin |
Number of post-warmup MCMC iteration by which to thin. Default is 1. |
save_samples |
Logical, indicating whether to return all posterior samples. Default is |
save_trees |
Logical, indicating whether or not to save a text-based representation of the tree samples. This representation can be passed to |
verbose |
Logical, inciating whether to print progress to R console. Default is |
print_every |
As the MCMC runs, a message is printed every |
Given p covariates X_{1}, \ldots, X_{p} and r effect modifiers Z_{1}, \ldots, Z_{r}, the varying coefficient model asserts that
E[Y \vert X = x, Z = ] = \beta_0(z) + \beta_1(z) * x_1 + ... \beta_p(z) * X_p.
That is, for any r-vector Z, the relationships between X and Y is linear.
However, the specific relationship is allowed to vary with respect tp Z.
VCBART approximates the covariate effect functions \beta_0(Z), \ldots, \beta_p(Z) using ensembles of regression trees.
This function assumes that the within-subject errors are independent.
A list containing
y_mean |
Mean of the training observations (needed by |
y_sd |
Standard deviation of the training observations (needed by |
x_mean |
Vector of means of columns of |
x_sd |
Vector of standard deviations of |
yhat.train.mean |
Vector containing posterior mean of evaluations of regression function E[y|x,z] on training data. |
betahat.train.mean |
Matrix with |
yhat.train |
Matrix with |
betahat.train |
Array of dimension with |
yhat.test.mean |
Vector containing posterior mean of evaluations of regression function E[y|x,z] on testing data. |
betahat.test.mean |
Matrix with |
yhat.test |
Matrix with |
betahat.test |
Array of size |
sigma |
Vector containing ALL samples of the residual standard deviation, including warmup. |
varcounts |
Array of size |
theta |
If |
trees |
A list (of length |
Deshpande, S.K, Bai, R., Balocchi, C., Starling, J., and Weiss, J. (2026). VCBART: Bayesian trees for varying coefficients. Bayesian Analysis. 21(1):281–308. \Sexpr[results=rd]{tools:::Rd_expr_doi("doi:10.1214/24-BA1470")}
############
# True beta functions
beta0_true <- function(Z){
tmp_Z <- (Z+1)/2
return( 3 * tmp_Z[,1] +
(2 - 5 * (tmp_Z[,2] > 0.5)) * sin(pi * tmp_Z[,1]) -
2 * (tmp_Z[,2] > 0.5))
}
beta1_true <- function(Z){
tmp_Z <- (Z+1)/2
return(sin(2*tmp_Z[,1] + 0.5)/(4*tmp_Z[,1] + 1) + (2*tmp_Z[,1] - 0.5)^3)
}
beta2_true <- function(Z){
tmp_Z <- (Z+1)/2
return( (3 - 3*cos(6*pi*tmp_Z[,1]) * tmp_Z[,1]^2) * (tmp_Z[,1] > 0.6) -
(10 * sqrt(tmp_Z[,1])) * (tmp_Z[,1] < 0.25) )
}
################
# Set problem dimensions
###############
set.seed(417)
n_all <- 500
ni_all <- rep(4, times = n_all) # 4 observations per subject
subj_id_all <- rep(1:n_all, each = 4) # give every subject an id number
N_all <- sum(ni_all) # total number of observations
p <- 2 # number of covariates
R_cont <- 20 # number of continuous modifiers
R_cat <- 0 # number of categorical modifiers
R <- R_cont + R_cat
################
# Generate covariates & modifiers
################
X_all <-
matrix(rnorm(N_all*p, mean = 0, sd = 1), nrow = N_all, ncol = p)
Z_cont_all <-
matrix(runif(N_all * R_cont, min = -1, max = 1), nrow = N_all, ncol = R_cont)
################
# Define true coefficient functions & noise level
###############
beta0_all <- beta0_true(Z_cont_all)
beta1_all <- beta1_true(Z_cont_all)
beta2_all <- beta2_true(Z_cont_all)
beta_all <- cbind(beta0_all, beta1_all, beta2_all)
sigma <- 0.1
################
# Generate response surface & outcomes
###############
mu_all <- beta0_all + X_all[,1] * beta1_all + X_all[,2] * beta2_all
Y_all <- mu_all + sigma * rnorm(n = N_all, mean = 0, sd = 1)
## Token run to ensure installation works
fit <-
VCBART_ind(Y_train = Y_all,
subj_id_train = subj_id_all,
ni_train = ni_all,
X_train = X_all,
Z_cont_train = Z_cont_all,
nd = 5, burn = 5,
verbose = FALSE)
## Longer example
fit <-
VCBART_ind(Y_train = Y_all,
subj_id_train = subj_id_all,
ni_train = ni_all,
X_train = X_all,
Z_cont_train = Z_cont_all,
verbose = FALSE)
oldpar <- par(no.readonly = TRUE)
par(mar = c(3,3,2,1), mgp = c(1.8, 0.5, 0), mfrow = c(1,2))
plot(beta_all, fit$betahat.train.mean,
pch = 16, cex = 0.5,
xlab = "Actual", ylab = "Posterior Mean",
main = "Coefficients")
abline(a = 0, b = 1, col = 'blue')
plot(mu_all, fit$yhat.train.mean,
pch = 16, cex = 0.5,
xlab = "Actual", ylab = "Posterior Mean",
main = "Regression Function E[Y|X,Z]")
abline(a = 0, b = 1, col = 'blue')
par(oldpar)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.