simulation_prediction: Simulate Longitudinal Data for Prediction

View source: R/simulation_prediction.R


Description

Generates longitudinal data from a fixed population and uses the random seed to draw different training and testing sets. The function supports linear or nonlinear associations, normal or non-normal random effects, and normal or non-normal random errors. The data are split into training and testing sets, with the testing set comprising approximately 40% of the data.

Usage

simulation_prediction(
  n_subject = 800,
  seed = NULL,
  nonlinear = FALSE,
  nonrandeff = FALSE,
  nonresidual = FALSE
)

Arguments

n_subject

Number of subjects in the dataset. Each subject has multiple observations across 6 follow-up time points. Default: 800.

seed

Random seed for reproducibility. Different seeds produce different training/testing splits of the same population. Default: NULL.

nonlinear

Logical value indicating whether the outcome model includes nonlinear associations. Default: FALSE.

nonrandeff

Logical value indicating whether the random effects are non-normal. Default: FALSE.

nonresidual

Logical value indicating whether the residuals are non-normal. Default: FALSE.

Details

The function creates a dataset with individuals observed at 6 follow-up time points. It allows users to specify whether the associations are linear or nonlinear and whether random effects and residuals follow normal or non-normal distributions. Approximately 40% of the data is randomly chosen to form the testing set, while the remaining 60% constitutes the training set.
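The fixed-population, seed-dependent split described above can be illustrated with a short base-R sketch. This is a hypothetical illustration, not the SBMTrees source. The helpers `make_population()` and `split_population()`, the fixed internal seed 2024, the single covariate, the normal random intercept, and the per-row 40% test assignment are all assumptions chosen only to show the idea.

```r
# Hypothetical sketch only: this is NOT the SBMTrees implementation.
# make_population(), split_population(), the fixed seed 2024, the single
# covariate, and the per-row 40% test assignment are illustrative assumptions.

make_population <- function(n_subject = 800, n_time = 6) {
  set.seed(2024)                              # fixed seed: same population on every call
  subject_id <- rep(seq_len(n_subject), each = n_time)
  time <- rep(seq_len(n_time), times = n_subject)
  b <- rnorm(n_subject)                       # normal random intercept per subject
  x <- rnorm(n_subject * n_time)              # one illustrative covariate
  y <- 1 + 0.5 * x + b[subject_id] + rnorm(n_subject * n_time)  # linear outcome model
  data.frame(subject_id = subject_id, time = time, x = x, y = y)
}

split_population <- function(pop, seed = NULL, test_prop = 0.4) {
  if (!is.null(seed)) set.seed(seed)          # user seed only changes the split
  is_test <- runif(nrow(pop)) < test_prop     # each row goes to test with prob. test_prop
  list(train = pop[!is_test, ], test = pop[is_test, ])
}

pop <- make_population(n_subject = 800)
split_a <- split_population(pop, seed = 123)
split_b <- split_population(pop, seed = 456)
nrow(split_a$test) / nrow(pop)                # roughly 0.4
```

Because the population seed is fixed inside `make_population()`, changing only the `seed` passed to `split_population()` reshuffles which observations land in the testing set while the underlying data stay the same.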

Value

A list containing:

Y_test_true

True values of the vector of outcomes in the testing set.

X_train

Matrix of covariates in the training set.

Y_train

Vector of outcomes in the training set.

Z_train

Matrix of random-effects predictors in the training set.

subject_id_train

Vector of subject IDs in the training set.

time_train

Vector of time points in the training set.

X_test

Matrix of covariates in the testing set.

Y_test

Vector of outcomes in the testing set.

Z_test

Matrix of random-effects predictors in the testing set.

subject_id_test

Vector of subject IDs in the testing set.

time_test

Vector of time points in the testing set.
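Y_test_true provides the ground truth against which test-set predictions can be scored. A minimal sketch of one such score follows, assuming root-mean-squared error as the metric. The helper `rmse_test()` is hypothetical and not part of SBMTrees, and the choice of RMSE is an assumption rather than something this page prescribes.

```r
# Hypothetical helper, not part of SBMTrees: root-mean-squared error of
# predicted test-set outcomes against the true values (e.g. data$Y_test_true).
rmse_test <- function(pred, y_true) {
  stopifnot(length(pred) == length(y_true))   # one prediction per test observation
  sqrt(mean((pred - y_true)^2))
}
```

For example, `rmse_test(Y_pred, data$Y_test_true)`, where `Y_pred` is a vector of predictions from whatever model was fitted to the training set.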

See Also

Mvnorm, Chisquare, ampute

Examples

  library(SBMTrees)

  # Generate data with nonlinear associations and non-normal random effects and residuals
  data <- simulation_prediction(
    n_subject = 800,
    seed = 123,
    nonlinear = TRUE,
    nonrandeff = TRUE,
    nonresidual = TRUE
  )

  # Access training data
  X_train <- data$X_train
  Y_train <- data$Y_train
  Z_train <- data$Z_train
  subject_id_train <- data$subject_id_train
  time_train <- data$time_train

  # Access testing data
  X_test <- data$X_test
  Y_test <- data$Y_test
  Z_test <- data$Z_test
  subject_id_test <- data$subject_id_test
  time_test <- data$time_test

  # True outcomes in the testing set, for evaluating predictions
  Y_test_true <- data$Y_test_true


SBMTrees documentation built on April 3, 2025, 6:10 p.m.