simulation_prediction_binary: Simulate Binary Longitudinal Data for Prediction
In SBMTrees: Longitudinal Sequential Imputation and Prediction with Bayesian Trees Mixed-Effects Models for Longitudinal Data

simulation_prediction_binary

R Documentation

Simulate Binary Longitudinal Data for Prediction

Description

Generates synthetic longitudinal data with binary outcomes for evaluating classification and prediction models. The function builds a latent continuous variable from covariates and random effects, then converts it into binary outcomes through the CDF (inverse link function) selected by the `residual` argument.

Usage

simulation_prediction_binary(
  train_prop = 0.7,
  n_subject = 1000,
  n_obs_per_sub = 5,
  seed = NULL,
  nonlinear = FALSE,
  residual = c("normal", "logistic", "t3", "t2"),
  randeff = c("MVN", "MVN_mixture", "skewed_MVN", "MVT3", "MVT2")
)

Arguments

`train_prop`	A numeric value between 0 and 1 indicating the proportion of the population to be used for the training set. Default: `0.7`.
`n_subject`	An integer specifying the total number of subjects in the population. Default: `1000`.
`n_obs_per_sub`	An integer specifying the number of observations per subject. Default: `5`.
`seed`	An optional integer for setting the random seed to ensure reproducibility. Default: `NULL`.
`nonlinear`	A logical value. If `TRUE`, the latent variable is generated using a complex nonlinear function of the covariates. If `FALSE`, it is a linear combination. Default: `FALSE`.
`residual`	A character string specifying the CDF (inverse link function) used to convert the latent variable into probabilities. This corresponds to the error distribution assumed for the latent variable in a Generalized Linear Mixed Model (GLMM): `"normal"`: Uses the standard normal CDF (probit link). `"logistic"`: Uses the logistic CDF (logit link). `"t3"`: Uses the Student's t CDF with 3 degrees of freedom. `"t2"`: Uses the Student's t CDF with 2 degrees of freedom.
`randeff`	A character string specifying the distribution of the random effects added to the latent variable. Options are: `"MVN"`: Multivariate Normal distribution. `"MVN_mixture"`: Mixture of Multivariate Normal distributions. `"skewed_MVN"`: Multivariate Skew-normal distribution. `"MVT3"`: Multivariate t-distribution with 3 degrees of freedom. `"MVT2"`: Multivariate t-distribution with 2 degrees of freedom.

Details

The function simulates a latent continuous variable Y^* from fixed effects (a linear or nonlinear function of X) and random effects (Z * Bi). This latent variable is scaled and then transformed into a probability p using the CDF specified by `residual`.
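
The mapping from a latent value to a probability can be illustrated with base R CDFs. This is a minimal sketch, not the package's internal code: the helper name `latent_to_prob` is hypothetical, and standardizing the latent variable with `scale()` is an assumption made for illustration. Only the four `residual` labels come from this page.

```r
# Hypothetical helper (not part of SBMTrees): map a latent value to a
# probability using the CDF named by `residual`.
latent_to_prob <- function(y_star,
                           residual = c("normal", "logistic", "t3", "t2")) {
  residual <- match.arg(residual)
  switch(residual,
    normal   = pnorm(y_star),       # standard normal CDF (probit link)
    logistic = plogis(y_star),      # logistic CDF (logit link)
    t3       = pt(y_star, df = 3),  # Student's t CDF, 3 degrees of freedom
    t2       = pt(y_star, df = 2)   # Student's t CDF, 2 degrees of freedom
  )
}

set.seed(1)
# Assumed scaling step: center and standardize a made-up latent variable
y_star <- as.numeric(scale(rnorm(10)))

p_normal <- latent_to_prob(y_star, "normal")
p_t2     <- latent_to_prob(y_star, "t2")
```

For the same nonzero latent value, the t CDFs give probabilities closer to 0.5 than the normal CDF does, which is one way the choice of `residual` changes the difficulty of the simulated problem.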

For the training set, the observed outcome `Y_train` is sampled from a Bernoulli distribution with probability p. For the testing set, the function returns the probability p itself as `Y_test`, so that a model's predicted probabilities (propensity scores or risk estimates) can be compared directly against the truth.
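
The difference between the training and testing outcomes can be sketched in a few lines of base R. The probabilities and the split below are made up for illustration; in particular, drawing the split at random per observation is an assumption, since this page does not state how observations are assigned to the training set.

```r
set.seed(42)
n <- 20
p <- runif(n)               # stand-in for the true probabilities
in_train <- runif(n) < 0.7  # assumed random split, for illustration only

# Training outcomes: one Bernoulli draw per observation with probability p
Y_train <- rbinom(sum(in_train), size = 1, prob = p[in_train])

# Testing "outcomes": the true probabilities themselves, not 0/1 draws
Y_test <- p[!in_train]
```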

Value

A list containing the following components:

subject_id_train: A vector of subject IDs for the training set.
Z_train: A matrix of predictors for the random effects (time/intercept) for the training set.
X_train: A matrix of covariates for the training set.
Y_train: A vector of observed binary outcomes (0 or 1) for the training set.
subject_id_test: A vector of subject IDs for the testing set.
Z_test: A matrix of predictors for the random effects (time/intercept) for the testing set.
X_test: A matrix of covariates for the testing set.
Y_test: A vector of true probabilities for the testing set. These represent the ground truth propensity scores (0 to 1) used for evaluation.
X_pop: A matrix of covariates for the entire population.
y_pop: A vector of true probabilities for the entire population.
I: A logical vector indicating which observations belong to the training set.
X_src: Duplicate of X_train, provided for convenience.
Y_src: Vector of true probabilities for the training set (unlike Y_train which is binary).
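
Because `Y_test` holds true probabilities rather than 0/1 labels, predictions can be scored against it directly. The sketch below assumes a hypothetical helper `prob_error` and uses a small mock list with made-up values in place of the function's output; only the component names follow the list above.

```r
# Hypothetical helper: mean squared and mean absolute error between
# predicted probabilities and the true probabilities.
prob_error <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  c(mse = mean((pred - truth)^2), mae = mean(abs(pred - truth)))
}

# Mock object reusing some component names of the returned list (made-up values)
sim <- list(
  Y_train = c(0, 1, 1, 0),          # binary training outcomes
  Y_src   = c(0.2, 0.7, 0.9, 0.4),  # true training probabilities
  Y_test  = c(0.1, 0.5, 0.8)        # true testing probabilities
)

pred_test <- c(0.2, 0.5, 0.6)       # stand-in for a fitted model's predictions
err <- prob_error(pred_test, sim$Y_test)
```

The same helper could be applied to `Y_src` to check in-sample recovery of the training probabilities.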

Examples

# Simulate data with the logistic CDF (logit link) and mixture-of-normals random effects
sim_bin <- simulation_prediction_binary(
  train_prop = 0.7,
  n_subject = 500,
  residual = "logistic",
  randeff = "MVN_mixture",
  seed = 123
)

SBMTrees documentation built on Feb. 6, 2026, 5:08 p.m.
