View source: R/simulation_prediction.R
simulation_prediction_conti    R Documentation
Generates synthetic longitudinal data with continuous outcomes, designed for evaluating prediction models. The function creates a population of subjects with correlated covariates and outcomes, then splits it into training and testing sets. It provides options for non-normal random effects and residuals (skewed, mixture, and t distributions) as well as a nonlinear relationship between covariates and outcome.
simulation_prediction_conti(
  train_prop = 0.7,
  n_subject = 1000,
  n_obs_per_sub = 5,
  seed = NULL,
  nonlinear = FALSE,
  residual = c("normal", "normal_mixture", "skewed_normal", "t3", "t2"),
  randeff = c("MVN", "MVN_mixture", "skewed_MVN", "MVT3", "MVT2")
)
train_prop
A numeric value between 0 and 1 giving the proportion of the population assigned to the training set. Default: 0.7.
n_subject
An integer specifying the total number of subjects in the population. Default: 1000.
n_obs_per_sub
An integer specifying the number of observations per subject. Default: 5.
seed
An optional integer used to set the random seed for reproducibility. Default: NULL.
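nonlinear
A logical value. If TRUE, the outcome depends nonlinearly on the covariates X; if FALSE, the relationship is linear. Default: FALSE.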
residual
A character string specifying the distribution of the residual errors added to the training outcome. Options are "normal", "normal_mixture", "skewed_normal", "t3", and "t2".
randeff
A character string specifying the distribution of the random effects. Options are "MVN", "MVN_mixture", "skewed_MVN", "MVT3", and "MVT2".
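The residual and randeff options name distribution families rather than exact parameterizations. The base-R sketch below shows one way each family could be drawn. The helpers draw_residuals() and draw_randeff() are hypothetical, and the mixture weights, shifts, skewness settings, and scale matrix are illustrative assumptions, not the values used by simulation_prediction_conti. It also assumes that "t3"/"t2" and "MVT3"/"MVT2" denote (multivariate) t distributions with 3 and 2 degrees of freedom.

```r
# Hypothetical helpers sketching the documented distribution families.
# All parameter values below are illustrative assumptions.

# Draw n residual errors from one of the residual families.
draw_residuals <- function(n, type = c("normal", "normal_mixture",
                                       "skewed_normal", "t3", "t2")) {
  type <- match.arg(type)
  switch(type,
    normal = rnorm(n),
    normal_mixture = {
      # Equal-weight mixture of N(-1, 0.5^2) and N(1, 0.5^2) (assumed)
      centre <- ifelse(rbinom(n, 1, 0.5) == 1, 1, -1)
      rnorm(n, mean = centre, sd = 0.5)
    },
    skewed_normal = {
      # Azzalini's representation of a skew-normal draw, delta = 0.9 (assumed)
      delta <- 0.9
      delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
    },
    t3 = rt(n, df = 3),   # Student t, 3 degrees of freedom (assumed meaning)
    t2 = rt(n, df = 2)    # Student t, 2 degrees of freedom: infinite variance
  )
}

# Draw n subject-level random-effect vectors (one per row), scale matrix Sigma.
draw_randeff <- function(n, Sigma, type = c("MVN", "MVN_mixture", "skewed_MVN",
                                            "MVT3", "MVT2")) {
  type <- match.arg(type)
  q <- ncol(Sigma)
  # Rows of (standard normals) %*% chol(Sigma) are MVN(0, Sigma)
  mvn <- function(m) matrix(rnorm(m * q), m, q) %*% chol(Sigma)
  # Multivariate t: divide each MVN row by sqrt(chi-square / df)
  mvt <- function(m, df) mvn(m) / sqrt(rchisq(m, df) / df)
  switch(type,
    MVN = mvn(n),
    # Each subject's vector is shifted by +1 or -1 in every coordinate (assumed)
    MVN_mixture = mvn(n) + ifelse(rbinom(n, 1, 0.5) == 1, 1, -1),
    # A shared half-normal term per subject skews every margin right (assumed)
    skewed_MVN = mvn(n) + abs(rnorm(n)),
    MVT3 = mvt(n, 3),
    MVT2 = mvt(n, 2)
  )
}
```

For the t variants, Sigma acts as a scale matrix rather than the covariance: with 3 degrees of freedom the covariance is 3 * Sigma, and with 2 degrees of freedom it does not exist, which is what makes these options useful for stress-testing heavy-tailed random effects.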
The function first simulates correlated covariates X from a multivariate normal distribution
and adds subject-specific random variation. The outcome Y is then constructed from X
(linearly or nonlinearly, depending on nonlinear) and combined with the random-effect term
Z * Bi, where the subject-level effects Bi are drawn from the distribution specified by randeff.
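The generating steps described above can be sketched in base R as follows. This is a minimal, hypothetical reconstruction for illustration only: the helper simulate_signal(), the number of covariates, the correlation and coefficient values, the nonlinear form, the intercept-plus-time design for Z, and the plain multivariate normal random effects are all assumptions, not the package's internals.

```r
# Hypothetical sketch of the generating process; every constant is assumed.
simulate_signal <- function(n_subject = 100, n_obs_per_sub = 5,
                            nonlinear = FALSE) {
  p  <- 3                                  # number of covariates (assumed)
  n  <- n_subject * n_obs_per_sub
  id <- rep(seq_len(n_subject), each = n_obs_per_sub)

  # Correlated covariates: exchangeable correlation 0.5 (assumed)
  R <- matrix(0.5, p, p); diag(R) <- 1
  X <- matrix(rnorm(n * p), n, p) %*% chol(R)
  # Subject-specific variation shared by all rows of the same subject
  X <- X + matrix(rnorm(n_subject * p, sd = 0.5), n_subject, p)[id, ]

  # Fixed part of the outcome: linear, or an assumed nonlinear form
  fixed <- if (nonlinear) {
    sin(X[, 1]) + X[, 2]^2 - X[, 3]
  } else {
    drop(X %*% c(1, -0.5, 0.25))
  }

  # Random predictors Z: intercept and time (assumed design)
  time <- rep(seq_len(n_obs_per_sub) - 1, times = n_subject)
  Z <- cbind(1, time)
  # Random effects Bi ~ MVN(0, G), one row per subject
  G <- matrix(c(1, 0.2, 0.2, 0.3), 2, 2)
  B <- matrix(rnorm(n_subject * 2), n_subject, 2) %*% chol(G)
  # Z * Bi evaluated row by row: sum over k of Z[j, k] * B[id[j], k]
  ranef <- rowSums(Z * B[id, ])

  list(id = id, X = X, Z = Z, Y = fixed + ranef)
}
```

Indexing the subject-level draws with [id, ] is what makes every observation of a subject share the same covariate shift and the same Bi, which induces the within-subject correlation characteristic of longitudinal data.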
The data are then split into training and testing sets according to train_prop. Crucially,
residual noise (drawn from the distribution specified by residual) is added only to Y_train.
The Y_test values are the conditional means (fixed effects plus random effects) and serve as
the ground truth for prediction tasks that aim to recover the noise-free signal.
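The split-then-noise step can be sketched as below, again as a hypothetical illustration rather than the package's code. The helper split_and_noise(), the toy data, the Gaussian residual with sd = 1, the name X_test, and the rule of assigning whole subjects to one set are assumptions; the documentation does not state whether train_prop applies to subjects or to observations. The sketch shows that a model is fitted to noisy Y_train but scored against noise-free Y_test.

```r
# Hypothetical helper: split by subject, add residual noise to training only.
split_and_noise <- function(sim, train_prop = 0.7, sigma = 1) {
  subjects  <- unique(sim$id)
  train_ids <- sample(subjects, round(train_prop * length(subjects)))
  is_train  <- sim$id %in% train_ids
  list(
    X_train  = sim$X[is_train, , drop = FALSE],
    # Observed training outcome: signal + random effects + residual error
    Y_train  = sim$Y[is_train] + rnorm(sum(is_train), sd = sigma),
    X_test   = sim$X[!is_train, , drop = FALSE],
    # Test outcome: conditional mean only, used as the ground truth
    Y_test   = sim$Y[!is_train],
    is_train = is_train
  )
}

# Toy population: 10 subjects x 3 observations, noise-free signal Y = 2 * X
set.seed(1)
toy <- list(id = rep(1:10, each = 3), X = matrix(rnorm(30), ncol = 1))
toy$Y <- 2 * toy$X[, 1]
sp <- split_and_noise(toy, train_prop = 0.7)

# Fit on noisy training data, score predictions against the noise-free target
fit  <- lm(sp$Y_train ~ sp$X_train)
pred <- drop(cbind(1, sp$X_test) %*% coef(fit))
rmse <- sqrt(mean((pred - sp$Y_test)^2))
```

Because Y_test carries no residual error, rmse here reflects only how far the fitted model is from the true conditional mean, not the irreducible noise.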
A list containing the following components:
A vector of subject IDs for the training set.
A matrix of random predictors (time/intercept) for the training set.
A matrix of covariates for the training set.
A vector of observed outcomes for the training set (Signal + Random Effects + Residual Error).
A vector of subject IDs for the testing set.
A matrix of random predictors for the testing set.
A matrix of covariates for the testing set.
A vector of "true" outcomes for the testing set (Signal + Random Effects), without residual error.
A matrix of covariates for the entire population.
A vector of "true" outcomes for the entire population (Signal + Random Effects).
A logical vector indicating which observations belong to the training set.
Duplicate of X_train, provided for convenience.
Duplicate of Y_train, provided for convenience.
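# Simulate 200 subjects with 5 observations each, a nonlinear signal,
# normal residuals, and skewed multivariate normal random effects
sim_data <- simulation_prediction_conti(
  train_prop = 0.7,
  n_subject = 200,
  n_obs_per_sub = 5,
  nonlinear = TRUE,
  residual = "normal",
  randeff = "skewed_MVN",
  seed = 123
)

# Inspect the top-level components of the returned list
str(sim_data, max.level = 1)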