View source: R/sequential_imputation.R
sequential_imputation | R Documentation |
Implements sequential imputation for missing covariates and outcomes in longitudinal data. The function uses a Bayesian non-parametric framework with mixed-effects models to handle both normal and non-normal random effects and errors. It sequentially imputes missing values by constructing univariate models in a fixed order, ensuring simplicity and consistency with a valid joint distribution.
sequential_imputation(
  X,
  Y,
  Z = NULL,
  subject_id,
  type,
  binary_outcome = FALSE,
  model = c("BMTrees", "BMTrees_R", "BMTrees_RE", "mixedBART"),
  nburn = 0L,
  npost = 3L,
  skip = 1L,
  verbose = TRUE,
  seed = NULL,
  tol = 1e-20,
  resample = 5,
  ntrees = 200,
  reordering = TRUE,
  pi_CDP = 0.99
)
X |
A matrix of covariates containing missing values. |
Y |
A vector of outcomes containing missing values (numeric or logical). |
Z |
A matrix of complete random predictors. |
subject_id |
A vector of subject IDs corresponding to the rows of X and Y. |
type |
A logical vector indicating whether each covariate in |
binary_outcome |
A logical value indicating whether the outcome Y is binary. |
model |
A character vector specifying the imputation model. Options are "BMTrees", "BMTrees_R", "BMTrees_RE", and "mixedBART". |
nburn |
An integer specifying the number of burn-in iterations. Default: 0L |
npost |
An integer specifying the number of sampling iterations. Default: 3L |
skip |
An integer specifying the interval for keeping samples in the sampling phase. Default: 1L |
verbose |
A logical value indicating whether to display progress and MCMC information. Default: TRUE |
seed |
A random seed for reproducibility. Default: NULL |
tol |
A small numerical tolerance to prevent numerical overflow or underflow in the model. Default: 1e-20 |
resample |
An integer specifying the number of resampling steps for the CDP prior. Default: 5 |
ntrees |
An integer specifying the number of trees in BART. Default: 200 |
reordering |
A logical value indicating whether to apply a reordering strategy for sorting covariates. Default: TRUE |
pi_CDP |
A value between 0 and 1 for calculating the empirical prior in the CDP prior. Default: 0.99 |
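As a rough illustration of how npost and skip interact, the sketch below counts the retained draws under the assumption that every skip-th iteration of the sampling phase is kept. Which iterations the function actually retains is not documented here, and kept_iterations is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: indices of the sampling-phase iterations retained
# when every `skip`-th draw is kept (an assumption, not the package's code).
kept_iterations <- function(npost, skip = 1L) {
  stopifnot(npost >= 1L, skip >= 1L, skip <= npost)
  seq.int(from = skip, to = npost, by = skip)
}

# Settings used in the example for this function: npost = 40L, skip = 2L
kept <- kept_iterations(npost = 40L, skip = 2L)
length(kept)  # 20 retained draws, i.e. npost / skip
```

Burn-in iterations (nburn) run before the sampling phase and contribute no draws in this sketch.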
The function builds on the Bayesian Trees Mixed-Effects Model (BMTrees), which extends Mixed-Effects BART by using centralized Dirichlet Process (CDP) Normal Mixture priors. This framework handles non-normal random effects and errors, addresses model misspecification, and captures complex relationships. The function employs a Metropolis-Hastings MCMC method to sequentially impute missing values.
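The idea of imputing in a fixed order with univariate models can be sketched with a deliberately simplified stand-in. The code below uses ordinary linear regression (lm) in a single pass rather than BMTrees, has no random effects and no MCMC, and all names (impute_one, dat) are illustrative assumptions. It only shows the factorization: each variable is imputed from a model conditional on the variables that precede it.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)                  # fully observed covariate
x2 <- 0.5 * x1 + rnorm(n)       # covariate that will have missing values
y  <- 1 + x1 - x2 + rnorm(n)    # outcome that will have missing values
x2[sample(n, 30)] <- NA
y[sample(n, 30)]  <- NA
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Impute `target` from a univariate model given earlier variables.
# lm() drops incomplete rows, so the fit uses observed cases only.
impute_one <- function(dat, formula, target) {
  fit  <- lm(formula, data = dat)
  miss <- is.na(dat[[target]])
  mu   <- predict(fit, newdata = dat[miss, , drop = FALSE])
  # Add residual noise so imputations are draws, not fitted means.
  dat[[target]][miss] <- mu + rnorm(sum(miss), sd = summary(fit)$sigma)
  dat
}

# Fixed order: x2 given x1, then y given x1 and x2.
dat <- impute_one(dat, x2 ~ x1, "x2")
dat <- impute_one(dat, y ~ x1 + x2, "y")
anyNA(dat)  # FALSE
```

Because each conditional model only uses variables earlier in the order, the product of the univariate models defines a joint distribution for the incomplete variables.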
A three-dimensional array of imputed data with dimensions (npost / skip, N, p + 1), where:
N is the number of observations.
p is the number of covariates in X.
The array includes imputed covariates and outcomes.
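A short sketch of working with an array of this shape may help. It builds a stand-in array filled with random numbers (not real imputations) and shows how to pull out one completed data set and how to average over draws. The names imputed, last_draw, and cell_mean are assumptions made for illustration.

```r
set.seed(1)
n_draws <- 20   # npost / skip
N <- 6          # number of observations
p <- 3          # number of covariates

# Stand-in for the returned array; values are random, not real imputations.
imputed <- array(rnorm(n_draws * N * (p + 1)), dim = c(n_draws, N, p + 1))

# One completed data set: the last retained draw, an N x (p + 1) matrix.
last_draw <- imputed[n_draws, , ]

# Mean of every cell across draws (average over the first dimension).
cell_mean <- apply(imputed, c(2, 3), mean)

dim(last_draw)  # 6 4
dim(cell_mean)  # 6 4
```

For multiple-imputation analyses, each slice along the first dimension would typically be analyzed separately and the results pooled, rather than analyzing the cell means.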
This function utilizes modified C++ code originally derived from the BART3 package (Bayesian Additive Regression Trees). The original package was developed by Rodney Sparapani and is licensed under GPL-2. Modifications were made by Jungang Zou, 2024.
For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3
data <- simulation_imputation(n_subject = 100, seed = 1234, nonrandeff = TRUE,
                              nonresidual = TRUE, alligned = FALSE)

# To make it faster to compile and check, we only run 30 iterations for burn-in
# and 40 for posterior sampling phases.
# Please increase to 3000 and 4000 iterations, respectively, when running the model.
model <- sequential_imputation(data$X_mis, data$Y_mis, data$Z, data$subject_id,
                               rep(0, 9), binary_outcome = FALSE, model = "BMTrees",
                               nburn = 30L, npost = 40L, skip = 2L, verbose = TRUE,
                               seed = 1234)
model$imputed_data