simulation_imputation_LTFU: Simulate Longitudinal Data with Loss to Follow-up (LTFU) for...

View source: R/simulation_imputation.R

simulation_imputation_LTFU    R Documentation

Simulate Longitudinal Data with Loss to Follow-up (LTFU) for Imputation

Description

Generates synthetic longitudinal data designed to stress-test imputation methods under loss to follow-up (LTFU, also called dropout). The data also contain intermittent missingness, but the parameters are tuned so that subjects permanently leave the study depending on their characteristics at a specific time point.

Usage

simulation_imputation_LTFU(
  NNY = TRUE,
  NNX = TRUE,
  n_subject = 1000,
  seed = NULL
)

Arguments

NNY

A logical value. If TRUE, the outcome Y is generated using non-normal distributions (Skew-t random effects, t-distribution residuals). If FALSE, it uses standard Normal distributions. Default: TRUE.

NNX

A logical value. If TRUE, the covariates X_7 through X_12 are generated using non-normal distributions (Mixture models, Skew-t random effects). If FALSE, they use standard Normal distributions. Default: TRUE.

n_subject

An integer specifying the number of subjects. Default: 1000.

seed

An optional integer for setting the random seed to ensure reproducibility. Default: NULL.
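The effect of the NNY and NNX switches and of seed can be illustrated with a small base R sketch. The helpers draw_residuals and draw_covariate are hypothetical and are not part of SBMTrees; the degrees of freedom and mixture means are arbitrary assumptions, and the Skew-t random effects mentioned above are not reproduced here.

```r
# Hypothetical helper, not part of SBMTrees: residuals are heavy-tailed
# (t-distributed) when non_normal = TRUE and standard Normal otherwise.
draw_residuals <- function(n, non_normal = TRUE, df = 5) {
  if (non_normal) rt(n, df = df) else rnorm(n)
}

# Hypothetical helper: a two-component Normal mixture when non_normal = TRUE,
# a single standard Normal otherwise. Means of -2 and 2 are arbitrary.
draw_covariate <- function(n, non_normal = TRUE) {
  if (!non_normal) return(rnorm(n))
  component <- rbinom(n, 1, 0.5)
  rnorm(n, mean = ifelse(component == 1, 2, -2), sd = 1)
}

# Fixing the seed before each call makes the draws reproducible.
set.seed(42)
first_draw <- draw_residuals(1000, non_normal = TRUE)
set.seed(42)
second_draw <- draw_residuals(1000, non_normal = TRUE)

mixture_draw <- draw_covariate(1000, non_normal = TRUE)
normal_draw <- draw_covariate(1000, non_normal = FALSE)
```

Calling set.seed with the same value before each draw gives identical vectors, which is the behaviour a non-NULL seed argument is meant to provide.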

Details

The data generation process follows simulation_imputation for the covariate structure (time-varying, non-linear, mixed types), but uses coefficients chosen to drive two missingness mechanisms:

1. Loss to Follow-up (LTFU): Dropout is simulated based on the subject's state at time point 3. A logistic model determines the probability of dropout using:

  • The outcome Y at time 3.

  • Covariates X_1, X_2, and X_3 at time 3.

If a subject is selected for LTFU, all their observations for time points 4 and 5 are set to NA.
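The dropout step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data, the column names subject_id and time, and every coefficient in the logistic model are hypothetical.

```r
# Illustrative sketch only: coefficients and column names are hypothetical,
# not the values used inside simulation_imputation_LTFU().
set.seed(1)
n_subject <- 200
n_time <- 5

# Long-format toy data: one row per subject and time point.
toy <- data.frame(
  subject_id = rep(seq_len(n_subject), each = n_time),
  time = rep(seq_len(n_time), times = n_subject),
  X_1 = rnorm(n_subject * n_time),
  X_2 = rnorm(n_subject * n_time),
  X_3 = rnorm(n_subject * n_time)
)
toy$Y <- 1 + 0.5 * toy$X_1 - 0.3 * toy$X_2 + rnorm(nrow(toy))

# The subject's state at time point 3 drives the dropout probability.
at_t3 <- toy[toy$time == 3, ]
linear_pred <- -1 + 0.8 * at_t3$Y + 0.5 * at_t3$X_1 -
  0.4 * at_t3$X_2 + 0.3 * at_t3$X_3
p_dropout <- plogis(linear_pred)
dropped_ids <- at_t3$subject_id[rbinom(n_subject, 1, p_dropout) == 1]

# Subjects lost to follow-up have every measurement at times 4 and 5 set to NA.
toy_ltfu <- toy
lost_rows <- toy_ltfu$subject_id %in% dropped_ids & toy_ltfu$time >= 4
toy_ltfu[lost_rows, c("Y", "X_1", "X_2", "X_3")] <- NA
```

Because the dropout probability depends on Y at time 3, which stays observed, the later values are missing conditionally on observed history in this sketch.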

2. Intermittent Missingness: Variable-specific missingness is applied to X_7 through X_12 using logistic models that depend on the concurrent outcome Y, other covariates, and the previous value of the variable itself (autoregressive missingness).
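A minimal base R sketch of such an autoregressive missingness model for a single covariate is given here. All coefficients are hypothetical, the lagged value is taken from the complete data, and the lag at the first time point is set to 0; these are simplifying assumptions of the sketch rather than documented behaviour of the function.

```r
# Illustrative sketch only: coefficients and column names are hypothetical.
set.seed(2)
n_subject <- 100
n_time <- 5
toy <- data.frame(
  subject_id = rep(seq_len(n_subject), each = n_time),
  time = rep(seq_len(n_time), times = n_subject),
  Y = rnorm(n_subject * n_time),
  X_1 = rnorm(n_subject * n_time),
  X_7 = rnorm(n_subject * n_time)
)

# Previous value of X_7 within each subject; 0 at the first time point
# (an assumption of this sketch).
prev_X_7 <- ave(toy$X_7, toy$subject_id,
                FUN = function(x) c(0, head(x, -1)))

# Missingness probability from the concurrent outcome, another covariate,
# and the lagged value of X_7 itself.
p_missing <- plogis(-1.5 + 0.6 * toy$Y + 0.4 * toy$X_1 + 0.5 * prev_X_7)
is_missing <- rbinom(nrow(toy), 1, p_missing) == 1

# Only X_7 is masked; Y and X_1 stay complete in this sketch.
toy_miss <- toy
toy_miss$X_7[is_missing] <- NA
```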

Value

A list containing the following components:

data_E

A data frame of the complete data (ground truth) without any missing values.

data_M

A data frame of the incomplete data, containing NAs introduced by both intermittent missingness and LTFU.

data_O

A duplicate of data_E used internally for generating missingness probabilities.

Z

A matrix of random predictors (intercept and time slopes) used in generation.

pair

A matrix summarizing the missing data pattern (generated via mice::md.pattern).

Examples

lt_data <- simulation_imputation_LTFU(NNY = TRUE, NNX = TRUE, n_subject = 10, seed = 42)
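One way to inspect the result is to compare the complete and incomplete data frames. The sketch below uses small toy stand-ins so that it runs without the package; missing_rate is a hypothetical helper and the column names are made up. With real output, toy_E and toy_M would be replaced by lt_data$data_E and lt_data$data_M.

```r
# Hypothetical helper, not part of SBMTrees: proportion of NA per column.
missing_rate <- function(df) colMeans(is.na(df))

# Toy stand-ins for the data_E / data_M components (two subjects, five times).
toy_E <- data.frame(
  time = rep(1:5, times = 2),
  Y = c(1.2, 0.8, 1.5, 1.1, 0.9, 2.0, 1.7, 1.9, 2.2, 2.1)
)
toy_M <- toy_E
toy_M$Y[4:5] <- NA  # first subject lost to follow-up after time point 3

# Share of missing values per column in the incomplete data.
rates <- missing_rate(toy_M)

# Ground-truth values at the masked positions, useful for scoring an
# imputation method against the complete data.
masked <- is.na(toy_M$Y) & !is.na(toy_E$Y)
truth_at_masked <- toy_E$Y[masked]
```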

SBMTrees documentation built on Feb. 6, 2026, 5:08 p.m.