View source: R/simulate_prcmlpmm_data.R
simulate_prcmlpmm_data | R Documentation |
This function simulates a survival outcome from longitudinal predictors following the PRC MLPMM model presented in Signorelli et al. (2021). Specifically, the longitudinal predictors are simulated from multivariate latent process mixed models (MLPMMs), and the survival outcome is simulated from a Weibull model in which the time to event depends on the random effects from the MLPMMs.
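The Weibull step of this description can be illustrated with base R alone. The sketch below is not the package's code. It assumes a proportional-hazards Weibull parameterisation with survival function S(t) = exp(-lambda * exp(eta) * t^nu), which may differ from the one used internally. The random effects matrix `u`, the coefficients `gamma` and the censoring range are hypothetical, and the landmark condition is left out for brevity.

```r
# Minimal sketch (base R only): Weibull event times driven by random effects.
# ASSUMPTION: hazard h(t) = lambda * nu * t^(nu - 1) * exp(eta); the
# parameterisation inside simulate_prcmlpmm_data() may be different.
set.seed(1)
n <- 100
lambda <- 0.2
nu <- 2

# hypothetical subject-specific random effects and association coefficients
u <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
gamma <- c(0.5, -0.5)
eta <- as.vector(u %*% gamma)

# inverse-transform sampling: solve S(t) = unif for t
unif <- runif(n)
t.true <- (-log(unif) / (lambda * exp(eta)))^(1 / nu)

# uniform right-censoring on a hypothetical range
t.cens <- runif(n, min = 2, max = 10)
time <- pmin(t.true, t.cens)
event <- as.numeric(t.true <= t.cens)

# proportion of censored subjects in this toy sample
mean(event == 0)
```

Subjects with a larger linear predictor `eta` have a higher hazard and therefore tend to experience the event earlier.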
simulate_prcmlpmm_data(n = 100, p = 5, p.relev = 2,
  n.items = c(3, 2, 3, 4, 1), type = "u", t.values = c(0, 0.5, 1, 2),
  landmark = max(t.values), seed = 1, lambda = 0.2, nu = 2,
  cens.range = c(landmark, 10), base.age.range = c(3, 5), tau.age = 0.2)
n | sample size |
p | number of longitudinal latent processes |
p.relev | number of latent processes that are associated with the survival outcome (min: 1, max: p) |
n.items | number of items that are observed for each latent process of interest. It must be either a scalar, or a vector of length |
type | the type of relation between the longitudinal outcomes and survival time. Two values can be used: 'u' refers to the PRC-MLPMM(U) model, and 'u+b' to the PRC-MLPMM(U+B) model presented in Section 2.3 of Signorelli et al. (2021). See the article for the mathematical details |
t.values | vector specifying the time points at which longitudinal measurements are collected (NB: for simplicity, this function assumes a balanced design; however, |
landmark | the landmark time up until which all individuals survived. Default is equal to |
seed | random seed (defaults to 1) |
lambda | Weibull location parameter, positive |
nu | Weibull scale parameter, positive |
cens.range | range for censoring times. By default, the minimum of this range is equal to the |
base.age.range | range for age at baseline (set it equal to c(0, 0) if you want all subjects to enter the study at the same age) |
tau.age | the coefficient that multiplies baseline age in the linear predictor (as in formulas (7) and (8) from Signorelli et al. (2021)) |
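As a hedged illustration of two of the arguments above, the sketch below passes a scalar `n.items` and sets `base.age.range = c(0, 0)`. It assumes that the function is exported by the pencal package (the package named in the references) and that a scalar `n.items` is applied to every latent process. The column name `baseline.age` is taken from the description of the returned value. The object names `sim.a` and `sim.b` are arbitrary.

```r
# Sketch only: runs when the pencal package (assumed to export
# simulate_prcmlpmm_data) is installed.
if (requireNamespace("pencal", quietly = TRUE)) {
  # scalar n.items: assumed to give each of the p = 3 latent processes 2 items
  sim.a <- pencal::simulate_prcmlpmm_data(n = 30, p = 3, p.relev = 1,
                                          n.items = 2, type = "u", seed = 1)
  names(sim.a$long.data)

  # base.age.range = c(0, 0): all subjects enter the study at the same age
  sim.b <- pencal::simulate_prcmlpmm_data(n = 30, p = 3, p.relev = 1,
                                          n.items = 2, type = "u",
                                          base.age.range = c(0, 0), seed = 1)
  summary(sim.b$surv.data$baseline.age)
}
```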
A list containing the following elements:
a dataframe long.data with data on the longitudinal predictors, comprising a subject id (id), baseline age (base.age), time from baseline (t.from.base) and the longitudinal biomarkers;
a dataframe surv.data with the survival data: a subject id (id), baseline age (baseline.age), the time to event outcome (time) and a binary vector (event) that is 1 if the event is observed, and 0 in case of right-censoring;
perc.cens, the proportion of censored individuals in the simulated dataset.
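One way to inspect the returned list is sketched below. It assumes that the function is exported by the pencal package, and it recomputes the censoring proportion directly from the `event` indicator instead of relying on the name of the third list element. The object names `sim` and `prop.cens` are arbitrary.

```r
# Sketch only: runs when the pencal package (assumed to export
# simulate_prcmlpmm_data) is installed.
if (requireNamespace("pencal", quietly = TRUE)) {
  sim <- pencal::simulate_prcmlpmm_data(n = 50, p = 2, p.relev = 1,
                                        n.items = c(2, 3), type = "u",
                                        t.values = c(0, 0.5, 1, 2),
                                        landmark = 2, seed = 123)

  # top-level structure of the returned list
  str(sim, max.level = 1)

  # first rows of the longitudinal and survival data
  head(sim$long.data)
  head(sim$surv.data)

  # proportion of right-censored subjects, recomputed from event (0 = censored)
  prop.cens <- mean(sim$surv.data$event == 0)
  prop.cens
}
```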
Mirko Signorelli
Signorelli, M. (2024). pencal: an R Package for the Dynamic Prediction of Survival with Many Longitudinal Predictors. To appear in: The R Journal. Preprint: arXiv:2309.15600
Signorelli, M., Spitali, P., Al-Khalili Szigyarto, C., The MARK-MD Consortium, Tsonaka, R. (2021). Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Statistics in Medicine, 40 (27), 6178-6196. DOI: 10.1002/sim.9178
# generate example data
simdata = simulate_prcmlpmm_data(n = 40, p = 6,
                                 p.relev = 3, n.items = c(3, 4, 2, 5, 4, 2),
                                 type = 'u+b', t.values = c(0, 0.5, 1, 2),
                                 landmark = 2, seed = 19931101)
# names of the longitudinal outcomes:
names(simdata$long.data)
# markerx_y is the y-th item for latent process (LP) x
# we have 6 latent processes of interest, and for LP1
# we measure 3 items, for LP2 4, for LP3 2 items, and so on
# visualize trajectories of marker1_1
if (requireNamespace("ptmixed")) {
  ptmixed::make.spaghetti(x = age, y = marker1_1,
                          id = id, group = id,
                          data = simdata$long.data,
                          legend.inset = - 1)
}
# proportion of censored subjects
simdata$censoring.prop
# visualize KM estimate of survival
library(survival)
surv.obj = Surv(time = simdata$surv.data$time,
                event = simdata$surv.data$event)
kaplan <- survfit(surv.obj ~ 1,
                  type = "kaplan-meier")
plot(kaplan)