BMTrees_prediction: Bayesian Trees Mixed-Effects Models for Predicting...

View source: R/BMTrees_prediction.R

BMTrees_prediction    R Documentation

Bayesian Trees Mixed-Effects Models for Predicting Longitudinal Outcomes

Description

Predicts outcomes in longitudinal data using Bayesian Trees Mixed-Effects Models (BMTrees) and their semiparametric variants. The function fits the chosen model to the training data and returns posterior predictions for the test data, accounting for subject-level random effects, nonlinear covariate relationships, and potential misspecification of the random-effects and error distributions.

Usage

BMTrees_prediction(
  X_train,
  Y_train,
  Z_train,
  subject_id_train,
  X_test,
  Z_test,
  subject_id_test,
  model = c("BMTrees", "BMTrees_R", "BMTrees_RE", "mixedBART"),
  binary = FALSE,
  nburn = 3000L,
  npost = 4000L,
  skip = 1L,
  verbose = TRUE,
  seed = NULL,
  tol = 1e-20,
  resample = 5,
  ntrees = 200,
  pi_CDP = 0.99
)

Arguments

X_train

A matrix of covariates in the training set.

Y_train

A numeric or logical vector of outcomes in the training set.

Z_train

A matrix of predictors for the random effects in the training set.

subject_id_train

A character vector of subject IDs in the training set.

X_test

A matrix of covariates in the testing set.

Z_test

A matrix of predictors for the random effects in the testing set.

subject_id_test

A character vector of subject IDs in the testing set.
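The covariate matrix, the random-effects predictor matrix, the outcome vector, and the subject IDs all describe the same observations, one row per subject visit. A minimal sketch of assembling the training inputs from a long-format data frame is shown here. The data frame `long_df`, its column names, and the choice of a random intercept plus a random slope on time for `Z_train` are illustrative assumptions, not requirements stated by the package.

```r
# Hypothetical long-format data: 3 subjects, 4 visits each (all names are illustrative).
set.seed(1)
long_df <- data.frame(
  id   = rep(c("s1", "s2", "s3"), each = 4),
  time = rep(0:3, times = 3),
  age  = rep(c(61, 47, 55), each = 4),
  bmi  = round(rnorm(12, mean = 27, sd = 3), 1)
)
long_df$y <- 0.5 * long_df$time + 0.1 * long_df$bmi + rnorm(12, sd = 0.3)

# Covariate matrix for the tree (fixed-effects) part.
X_train <- as.matrix(long_df[, c("time", "age", "bmi")])

# Random-effects predictors: an intercept and time column (an assumed design).
Z_train <- cbind(intercept = 1, time = long_df$time)

# One subject ID per row, stored as character as the argument description asks.
subject_id_train <- as.character(long_df$id)
Y_train <- long_df$y

# Sanity check (assumed requirement): every input covers the same rows.
stopifnot(
  nrow(X_train) == length(Y_train),
  nrow(Z_train) == length(Y_train),
  length(subject_id_train) == length(Y_train)
)
```

The testing inputs would be built the same way from the held-out rows, using the same column order.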

model

A character string specifying the predictive model. Options are "BMTrees", "BMTrees_R", "BMTrees_RE", and "mixedBART". Default: "BMTrees".

binary

Logical. Indicates whether the outcome is binary (TRUE) or continuous (FALSE). Default: FALSE.

nburn

An integer specifying the number of burn-in iterations for the Gibbs sampler. Default: 3000L.

npost

An integer specifying the number of posterior samples to collect. Default: 4000L.

skip

An integer indicating the thinning interval for MCMC samples. Default: 1L.

verbose

Logical. If TRUE, displays MCMC progress. If FALSE, shows a progress bar. Default: TRUE.

seed

An optional integer for setting the random seed to ensure reproducibility. Default: NULL.

tol

A numeric tolerance value to prevent numerical overflow and underflow in the model. Default: 1e-20.

resample

An integer specifying the number of resampling steps for the CDP prior. Default: 5. This parameter is used only when model is "BMTrees" or "BMTrees_R".

ntrees

An integer specifying the number of trees in BART. Default: 200.

pi_CDP

A value between 0 and 1 for calculating the empirical prior in the CDP prior. Default: 0.99.

Value

A list containing posterior samples and predictions:

post_tree_train

Posterior samples of the fixed effects from BART for the training data.

post_Sigma

Posterior samples of the covariance matrix of the random effects.

post_lambda_F

Posterior samples of the lambda parameter in the CDP normal mixture for the random errors.

post_lambda_G

Posterior samples of the lambda parameter in the CDP normal mixture for the random effects.

post_B

Posterior samples of the coefficients in random effects.

post_random_effect_train

Posterior samples of random effects for training data.

post_sigma

Posterior samples of the error standard deviation.

post_expectation_y_train

Posterior expectations of the training outcomes, equal to fixed effects + random effects.

post_expectation_y_test

Posterior expectations of the testing outcomes, equal to fixed effects + random effects.
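Posterior expectation draws are usually reduced to a point prediction and an interval per test observation. The sketch below does this on a simulated stand-in matrix so that it runs without fitting a model. The layout (one row per posterior draw, one column per test observation) is an assumption about post_expectation_y_test and should be checked with dim() on a real fit.

```r
# Stand-in for model$post_expectation_y_test: 40 draws (rows) x 5 test rows (columns).
# The orientation is assumed; transpose first if the real object is stored the other way.
set.seed(42)
true_mean <- c(1.0, 2.5, -0.3, 4.1, 0.0)
post_expectation <- sapply(true_mean, function(m) rnorm(40, mean = m, sd = 0.2))

# Point prediction: posterior mean per test observation.
pred_mean <- colMeans(post_expectation)

# 95% equal-tailed credible interval per test observation.
pred_ci <- apply(post_expectation, 2, quantile, probs = c(0.025, 0.975))

summary_table <- data.frame(
  mean  = pred_mean,
  lower = pred_ci[1, ],
  upper = pred_ci[2, ]
)
print(round(summary_table, 2))
```

With only 40 draws, as in the shortened example run, the interval endpoints are noisy; the default 4000 posterior samples give far more stable quantiles.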

post_predictive_y_train

Posterior predictive distributions for training outcomes, equal to fixed-effects + random effects + predictive residual.

post_predictive_y_test

Posterior predictive distributions for testing outcomes, equal to fixed-effects + random effects + predictive residual.
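The difference between the posterior expectation and the posterior predictive distribution is the added residual draw. The sketch below illustrates that decomposition with simulated inputs and a plain normal residual, which is a simplifying assumption: the CDP-based variants place a normal mixture on the errors, so the package's own residual draws will generally differ. All object names here are hypothetical.

```r
set.seed(7)
n_draws <- 200
n_obs   <- 3

# Hypothetical posterior draws of fixed effects + random effects (draws x observations).
expectation_draws <- matrix(rnorm(n_draws * n_obs, mean = 2, sd = 0.1), n_draws, n_obs)

# Hypothetical posterior draws of the residual standard deviation, one per draw.
sigma_draws <- abs(rnorm(n_draws, mean = 0.5, sd = 0.05))

# Predictive draw = expectation + one simulated residual.
# rnorm() recycles sd down the columns, so row i always uses sigma_draws[i].
residual_draws   <- matrix(rnorm(n_draws * n_obs, mean = 0, sd = sigma_draws), n_draws, n_obs)
predictive_draws <- expectation_draws + residual_draws

# Compare 95% interval widths: predictive intervals also carry residual noise.
width <- function(m) apply(m, 2, function(x) diff(quantile(x, c(0.025, 0.975))))
print(rbind(expectation = width(expectation_draws), predictive = width(predictive_draws)))
```

Intervals from the expectation describe uncertainty about a subject's mean trajectory, while intervals from the predictive draws are the ones to compare against newly observed outcomes.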

post_eta

Posterior samples of location parameters in CDP normal mixture on random errors.

post_mu

Posterior samples of location parameters in CDP normal mixture on random effects.

Note

This function utilizes modified C++ code originally derived from the BART3 package (Bayesian Additive Regression Trees). The original package was developed by Rodney Sparapani and is licensed under GPL-2. Modifications were made by Jungang Zou, 2024.

References

For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3

Examples


data <- simulation_prediction(n_subject = 100, seed = 1234, nonlinear = TRUE,
                              nonrandeff = TRUE, nonresidual = TRUE)

# To keep compilation and checks fast, this example runs only 30 burn-in
# iterations and 40 posterior sampling iterations.
# Increase these to 3000 and 4000, respectively, when running the model for real.
model <- BMTrees_prediction(data$X_train, data$Y_train, data$Z_train,
                            data$subject_id_train, data$X_test, data$Z_test,
                            data$subject_id_test, model = "BMTrees",
                            binary = FALSE, nburn = 30L, npost = 40L, skip = 1L,
                            verbose = TRUE, seed = 1234)
model$post_predictive_y_test
model$post_sigma


SBMTrees documentation built on April 3, 2025, 6:10 p.m.