sequential_imputation: Sequential Imputation for Missing Data


sequential_imputation    R Documentation

Sequential Imputation for Missing Data

Description

Implements sequential imputation for missing covariates and outcomes in longitudinal data. The function uses a Bayesian nonparametric framework with mixed-effects models to handle both normal and non-normal random effects and errors. It imputes missing values sequentially by specifying a univariate model for each variable in a fixed order, with each model conditioning on the variables that precede it. The product of these models defines a valid joint distribution while keeping each individual model simple.
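For covariates X_1, ..., X_p taken in the imputation order and outcome Y, a fixed order implies the factorization below. Conditioning on Z and on the subject-level random effects is suppressed for brevity, and the exact conditioning sets used inside the package are an assumption here. The second line shows why imputing one covariate is not a draw from a single model: its full conditional also involves every later model in the sequence. This is one reason an MCMC step such as Metropolis-Hastings is needed.

```latex
\begin{aligned}
p(X_1,\dots,X_p,Y) &= p(X_1)\prod_{j=2}^{p} p(X_j \mid X_1,\dots,X_{j-1})\; p(Y \mid X_1,\dots,X_p),\\
p(X_j \mid X_{-j}, Y) &\propto p(X_j \mid X_1,\dots,X_{j-1}) \prod_{k=j+1}^{p} p(X_k \mid X_1,\dots,X_{k-1})\; p(Y \mid X_1,\dots,X_p).
\end{aligned}
```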

Usage

sequential_imputation(
  X,
  Y,
  Z = NULL,
  subject_id,
  type,
  binary_outcome = FALSE,
  model = c("BMTrees", "BMTrees_R", "BMTrees_RE", "mixedBART"),
  nburn = 0L,
  npost = 3L,
  skip = 1L,
  verbose = TRUE,
  seed = NULL,
  tol = 1e-20,
  resample = 5,
  ntrees = 200,
  reordering = TRUE,
  pi_CDP = 0.99
)
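A minimal sketch of inputs shaped as the arguments describe: long-format data with one row per observation, so that the rows of X and Z line up with the elements of Y and subject_id. All values are made up. Using a random intercept plus a random slope on time as Z is an assumption for illustration only.

```r
# Toy long-format data (values are made up): 3 subjects observed at 4 visits.
set.seed(42)
n_subject <- 3L
n_visit <- 4L
n <- n_subject * n_visit

# subject_id: one entry per row; character IDs are allowed.
subject_id <- rep(c("s1", "s2", "s3"), each = n_visit)
time <- rep(seq_len(n_visit) - 1, times = n_subject)

# X: covariates with missing values coded as NA (x1 continuous, x2 binary 0/1).
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
X[c(2, 7), "x1"] <- NA
X[5, "x2"] <- NA

# Y: continuous outcome with missing values, aligned with the rows of X.
Y <- rnorm(n)
Y[c(4, 10)] <- NA

# Z: fully observed random-effect predictors; a random intercept and a
# random slope on time are assumed here for illustration.
Z <- cbind(intercept = 1, time = time)

# One type entry per column of X: x1 continuous (0), x2 binary (1).
type <- c(0, 1)
```

With these objects, a call following the Usage above would be sequential_imputation(X, Y, Z, subject_id, type), leaving the remaining arguments at their defaults.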

Arguments

X

A matrix of covariates containing missing values (NA).

Y

A vector of outcomes, numeric or logical, containing missing values (NA).

Z

A matrix of fully observed (complete) predictors for the random effects. Default: NULL.

subject_id

A vector of subject IDs, one per row of X and element of Y. May be integer or character.

type

A vector with one entry per column of X indicating whether that covariate is binary (1) or continuous (0).

binary_outcome

A logical value indicating whether the outcome Y is binary (TRUE) or continuous (FALSE). Default: FALSE.

model

A character string specifying the imputation model: one of "BMTrees", "BMTrees_R", "BMTrees_RE", or "mixedBART". Default: "BMTrees".

nburn

An integer specifying the number of burn-in iterations. Default: 0.

npost

An integer specifying the number of sampling iterations. Default: 3.

skip

An integer thinning interval: every skip-th draw of the sampling phase is kept. Default: 1 (keep every draw).

verbose

A logical value indicating whether to display progress and MCMC information. Default: TRUE.

seed

A random seed for reproducibility. Default: NULL.

tol

A small numerical tolerance to prevent numerical overflow or underflow in the model. Default: 1e-20.

resample

An integer specifying the number of resampling steps for the CDP prior. Default: 5. Used only when model is "BMTrees" or "BMTrees_R".

ntrees

An integer specifying the number of trees in BART. Default: 200.

reordering

A logical value indicating whether to apply a reordering strategy that sorts the covariates, and hence the sequence of univariate models, before imputation. Default: TRUE.

pi_CDP

A value between 0 and 1 used to calculate the empirical prior of the CDP. Default: 0.99.
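The type argument needs one 0/1 entry per column of X. When binary covariates are coded 0/1, a hypothetical helper such as infer_type() below (not part of the package) can derive type from the observed values. A continuous variable that happens to take only the values 0 and 1 would be misclassified, so the result should be checked by hand.

```r
# Hypothetical helper (not part of SBMTrees): flag a column of X as binary (1)
# when all of its observed (non-NA) values are 0 or 1, otherwise continuous (0).
infer_type <- function(X) {
  vapply(seq_len(ncol(X)), function(j) {
    observed <- X[!is.na(X[, j]), j]
    as.numeric(length(observed) > 0 && all(observed %in% c(0, 1)))
  }, numeric(1))
}

# Example: columns 1 and 3 are 0/1 coded, column 2 is continuous.
X <- cbind(c(0, 1, NA, 1), c(2.5, NA, -0.3, 1.1), c(NA, 0, 0, 1))
infer_type(X)
# [1] 1 0 1
```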

Details

The function builds on the Bayesian Trees Mixed-Effects Model (BMTrees), which extends mixed-effects BART by using centralized Dirichlet process (CDP) normal mixture priors. This framework accommodates non-normal random effects and errors, is more robust to misspecified distributional assumptions, and captures complex (for example, nonlinear or interacting) covariate effects through the tree ensemble. Missing values are imputed sequentially within a Metropolis-Hastings MCMC algorithm.
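To give a feel for the CDP idea, the sketch below draws one random error distribution from a truncated stick-breaking Dirichlet process normal mixture and then shifts its atoms so that the mixture mean is exactly zero, as mixed-effects models usually require for errors and random effects. This is an illustration only: the truncation level K, the concentration alpha, the base measure and the common component SD are assumptions, and the package's actual CDP sampler (including how resample and pi_CDP enter) is not reproduced here.

```r
# Illustrative sketch (not the package's sampler): a mean-zero ("centred")
# normal mixture built from a truncated stick-breaking DP draw.
set.seed(7)
K <- 25        # truncation level (assumption)
alpha <- 1     # DP concentration parameter (assumption)

# Stick-breaking weights w_k = v_k * prod_{l < k} (1 - v_l);
# setting v_K = 1 makes the last stick take the remainder, so sum(w) == 1.
v <- c(rbeta(K - 1, 1, alpha), 1)
w <- v * cumprod(c(1, 1 - v[-K]))

# Component means from a simple normal base measure and a common component SD.
mu <- rnorm(K, mean = 0, sd = 2)
sigma <- 0.5

# Centring: shift every atom by the mixture mean so that sum(w * mu_c) is 0.
mu_c <- mu - sum(w * mu)

# Draw n values (e.g. residuals) from the centred mixture.
draw_cdp <- function(n) {
  k <- sample.int(K, n, replace = TRUE, prob = w)
  rnorm(n, mean = mu_c[k], sd = sigma)
}
e <- draw_cdp(1000)
```

The shift changes only the location of the atoms, not the weights or the shape of the mixture, so the distribution can still be skewed or multimodal while keeping mean zero.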

Value

A list whose element imputed_data is a three-dimensional array of imputed data with dimensions (npost / skip, N, p + 1), where:

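  • npost / skip is the number of retained posterior draws, each a completed dataset.

  • N is the number of observations (rows of X).

  • p is the number of covariates in X; the p + 1 columns hold the imputed covariates and the imputed outcome.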
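Assuming imp holds model$imputed_data (as in the Examples), each slice along the first dimension is one completed dataset. In the sketch below a random array of the documented shape stands in for the real result, and npost is assumed to be a multiple of skip.

```r
# Stand-in for model$imputed_data: a random array with the documented shape
# (npost / skip, N, p + 1). Replace with the real result in practice.
set.seed(1)
npost <- 40L
skip <- 2L
N <- 6L
p <- 3L
n_draws <- npost %/% skip   # 20 retained draws (npost assumed divisible by skip)
imp <- array(rnorm(n_draws * N * (p + 1)), dim = c(n_draws, N, p + 1))

# The m-th completed dataset is an N x (p + 1) matrix.
completed_1 <- imp[1, , ]

# One data frame per retained draw, e.g. to fit an analysis model to each.
completed_list <- lapply(seq_len(dim(imp)[1]), function(m) as.data.frame(imp[m, , ]))

# Posterior-mean imputation: average each cell over the retained draws.
posterior_mean <- apply(imp, c(2, 3), mean)
```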

Note

This function uses modified C++ code originally derived from the BART3 package (Bayesian Additive Regression Trees). The original package was developed by Rodney Sparapani and is licensed under GPL-2. The modifications were made by Jungang Zou in 2024.

References

For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3

Examples


data <- simulation_imputation(n_subject = 100, seed = 1234, nonrandeff = TRUE,
        nonresidual = TRUE, alligned = FALSE)

# To keep checking fast, this example runs only 30 burn-in and 40 posterior
# sampling iterations. In practice, increase these to 3000 and 4000
# iterations, respectively.
# type = rep(0, 9): all nine covariates in data$X_mis are treated as continuous.
model <- sequential_imputation(data$X_mis, data$Y_mis, data$Z, data$subject_id,
        rep(0, 9), binary_outcome = FALSE, model = "BMTrees", nburn = 30L,
        npost = 40L, skip = 2L, verbose = TRUE, seed = 1234)

# npost / skip = 20 retained draws, each a completed dataset.
model$imputed_data


SBMTrees documentation built on April 3, 2025, 6:10 p.m.