simulation_prediction_conti: Simulate Continuous Longitudinal Data for Prediction
In SBMTrees: Longitudinal Sequential Imputation and Prediction with Bayesian Trees Mixed-Effects Models for Longitudinal Data

Description

Generates synthetic longitudinal data with continuous outcomes, designed for evaluating prediction models. The function creates a population of subjects with correlated covariates and outcomes, then splits them into training and testing sets. Random effects and residuals can be drawn from non-normal distributions (skewed, mixture, or heavy-tailed t distributions), and the outcome can depend on the covariates nonlinearly.

Usage

simulation_prediction_conti(
  train_prop = 0.7,
  n_subject = 1000,
  n_obs_per_sub = 5,
  seed = NULL,
  nonlinear = FALSE,
  residual = c("normal", "normal_mixture", "skewed_normal", "t3", "t2"),
  randeff = c("MVN", "MVN_mixture", "skewed_MVN", "MVT3", "MVT2")
)

Arguments

`train_prop`	A numeric value between 0 and 1 indicating the proportion of the population to be used for the training set. Default: `0.7`.
`n_subject`	An integer specifying the total number of subjects in the population. Default: `1000`.
`n_obs_per_sub`	An integer specifying the number of observations per subject. Default: `5`.
`seed`	An optional integer for setting the random seed to ensure reproducibility. Default: `NULL`.
`nonlinear`	A logical value. If `TRUE`, the outcome `Y` is generated using a complex nonlinear function of the covariates. If `FALSE`, `Y` is a linear combination of covariates. Default: `FALSE`.
`residual`	A character string specifying the distribution of the residual errors added to the training outcome. Options are:
	`"normal"`: Standard normal distribution.
	`"normal_mixture"`: Mixture of two normal distributions.
	`"skewed_normal"`: Skew-normal distribution.
	`"t3"`: Student's t-distribution with 3 degrees of freedom.
	`"t2"`: Student's t-distribution with 2 degrees of freedom.
`randeff`	A character string specifying the distribution of the random effects. Options are:
	`"MVN"`: Multivariate normal distribution.
	`"MVN_mixture"`: Mixture of multivariate normal distributions.
	`"skewed_MVN"`: Multivariate skew-normal distribution.
	`"MVT3"`: Multivariate t-distribution with 3 degrees of freedom.
	`"MVT2"`: Multivariate t-distribution with 2 degrees of freedom.
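
To make the `residual` options concrete, the following base R sketch draws from each family. It is only an illustration: the helper `draw_residual()`, the mixture weights, means, standard deviations, and the skewness parameter `alpha` are all assumptions, not the values used inside SBMTrees.

```r
# Hypothetical helper: one draw-generator per documented `residual` option.
# All distribution parameters below are illustrative assumptions.
draw_residual <- function(n,
                          type = c("normal", "normal_mixture",
                                   "skewed_normal", "t3", "t2")) {
  type <- match.arg(type)
  switch(type,
    normal = rnorm(n),
    normal_mixture = {
      # Equal-weight mixture of two normals (assumed means and sds)
      comp <- rbinom(n, 1, 0.5)
      ifelse(comp == 1,
             rnorm(n, mean = -1, sd = 0.5),
             rnorm(n, mean = 1, sd = 0.5))
    },
    skewed_normal = {
      # Skew-normal via its standard stochastic representation:
      # delta * |U0| + sqrt(1 - delta^2) * U1, with U0, U1 standard normal
      alpha <- 5                          # assumed shape parameter
      delta <- alpha / sqrt(1 + alpha^2)
      delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
    },
    t3 = rt(n, df = 3),
    t2 = rt(n, df = 2)
  )
}

set.seed(1)
e_skew <- draw_residual(1000, "skewed_normal")
e_t2   <- draw_residual(10, "t2")
```

Note that a t-distribution with 2 degrees of freedom has infinite variance, so `"t2"` (and, presumably, `"MVT2"`) is the most extreme heavy-tailed setting in the list.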

Details

The function first simulates correlated covariates `X` using a multivariate normal distribution, adding subject-specific random variations. The outcome `Y` is then constructed from `X` (either linearly or nonlinearly) and combined with random effects `Z * Bi` drawn from the distribution specified by `randeff`.
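
The construction described above can be sketched in base R for the linear case with normal random effects. This is a minimal sketch under stated assumptions, not the package's implementation: the number of covariates `p`, the correlation matrix `Sigma`, the coefficients `beta`, and the scale of the subject-level shift are all hypothetical.

```r
set.seed(42)
n_subject <- 50
n_obs_per_sub <- 5
p <- 3                                   # assumed number of covariates

id   <- rep(seq_len(n_subject), each = n_obs_per_sub)
time <- rep(seq_len(n_obs_per_sub), times = n_subject)
n    <- length(id)

# Correlated covariates: standard normal draws times the Cholesky factor
# of an assumed correlation matrix, plus a subject-specific shift.
Sigma <- 0.5 ^ abs(outer(seq_len(p), seq_len(p), "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
subj_shift <- matrix(rnorm(n_subject * p, sd = 0.5), n_subject, p)
X <- X + subj_shift[id, ]

# Random-effects design matrix: intercept and time.
Z <- cbind(1, time)

# One random-effect vector Bi per subject (here independent standard normal).
B <- matrix(rnorm(n_subject * ncol(Z)), n_subject, ncol(Z))

beta     <- c(1, -0.5, 0.25)             # assumed fixed-effect coefficients
fixed    <- drop(X %*% beta)             # linear signal from the covariates
random   <- rowSums(Z * B[id, ])         # row-wise Z_ij' Bi
y_signal <- fixed + random               # conditional mean, no residual yet
```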

The data are split into training and testing sets based on `train_prop`. Crucially, residual noise (specified by `residual`) is added only to `Y_train`. The `Y_test` values represent the conditional mean (fixed plus random effects) and serve as the ground truth for prediction tasks that aim to recover the de-noised signal.
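
The asymmetry between `Y_train` and `Y_test` can be illustrated with a small base R sketch. It assumes that whole subjects are assigned to the training set and that the residual is standard normal; the stand-in signal `y_signal` and the helper `rmse()` are hypothetical, and the package may perform the split differently.

```r
set.seed(7)
n_subject <- 20
n_obs_per_sub <- 5
train_prop <- 0.7

id <- rep(seq_len(n_subject), each = n_obs_per_sub)
y_signal <- rnorm(length(id))            # stand-in for fixed + random effects

# Assumption: the split is made at the subject level.
train_ids <- sample(seq_len(n_subject), size = round(train_prop * n_subject))
I <- id %in% train_ids                   # TRUE for training observations

Y_train <- y_signal[I] + rnorm(sum(I))   # signal plus residual noise
Y_test  <- y_signal[!I]                  # signal only, no residual

rmse <- function(a, b) sqrt(mean((a - b)^2))

# An oracle that knows the conditional mean is exact on the test targets
# but not on the noisy training targets.
oracle_test_rmse  <- rmse(y_signal[!I], Y_test)
oracle_train_rmse <- rmse(y_signal[I], Y_train)
```

Under this setup the test error of a model measures how well it recovers the conditional mean, with no irreducible noise floor.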

Value

A list containing the following components:

subject_id_train: A vector of subject IDs for the training set.
Z_train: A matrix of random-effect predictors (time/intercept) for the training set.
X_train: A matrix of covariates for the training set.
Y_train: A vector of observed outcomes for the training set (Signal + Random Effects + Residual Error).
subject_id_test: A vector of subject IDs for the testing set.
Z_test: A matrix of random-effect predictors (time/intercept) for the testing set.
X_test: A matrix of covariates for the testing set.
Y_test: A vector of "true" outcomes for the testing set (Signal + Random Effects), without residual error.
X_pop: A matrix of covariates for the entire population.
y_pop: A vector of "true" outcomes for the entire population (Signal + Random Effects).
I: A logical vector indicating which observations belong to the training set.
X_src: Duplicate of X_train, provided for convenience.
Y_src: Duplicate of Y_train, provided for convenience.
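
As a rough illustration of how these components might be used, the sketch below fits an ordinary least squares baseline on the training components and scores it against the noise-free test targets. To stay runnable without the package, it builds a stand-in list `sim` that only mimics four of the documented component names; the data, the coefficients `beta`, and the helper `as_df()` are assumptions, and the baseline ignores the subject structure entirely.

```r
set.seed(123)
n_train <- 70
n_test  <- 30
p <- 3
beta <- c(1, -0.5, 0.25)                 # assumed coefficients

X_train <- matrix(rnorm(n_train * p), n_train, p)
X_test  <- matrix(rnorm(n_test * p), n_test, p)

# Stand-in for the returned list: noisy training outcome, noise-free test outcome.
sim <- list(
  X_train = X_train,
  Y_train = drop(X_train %*% beta) + rnorm(n_train),
  X_test  = X_test,
  Y_test  = drop(X_test %*% beta)
)

# Hypothetical helper: matrix to data frame with stable column names.
as_df <- function(X) {
  X <- as.data.frame(X)
  names(X) <- paste0("x", seq_along(X))
  X
}

train_df <- data.frame(Y = sim$Y_train, as_df(sim$X_train))
fit  <- lm(Y ~ ., data = train_df)
pred <- predict(fit, newdata = as_df(sim$X_test))

# Error relative to the de-noised ground truth.
rmse_test <- sqrt(mean((pred - sim$Y_test)^2))
```

With output from `simulation_prediction_conti()`, the same pattern would apply with `sim` replaced by the returned list, and a mixed-effects model would additionally use `subject_id_train` and `Z_train`.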

Examples

sim_data <- simulation_prediction_conti(
  train_prop = 0.7,
  n_subject = 200,
  n_obs_per_sub = 5,
  nonlinear = TRUE,
  residual = "normal",
  randeff = "skewed_MVN",
  seed = 123
)

SBMTrees documentation built on Feb. 6, 2026, 5:08 p.m.
