Recurrent events are common in longitudinal studies. They are often neglected in typical time-to-event analyses, which only take the first occurrence of an event into account. In this scenario, any event that occurs after the first event is disregarded. Sometimes, the first event makes a subsequent event more likely. The second event may be of the same type as the first (such as serial myocardial infarctions [MI]), or of a different type (such as heart failure [HF]) that is associated with the first. Multi-type recurrent events (MTREs) arise naturally in situations where more than one event type is of interest, and may possibly recur; a possible MTRE scenario would be an individual who experiences 2 back-to-back MIs, followed by HF, and subsequently dies. Death in this case would be a terminating event, after which no further events can occur.
Existing packages in R that handle recurrent events include the survival package
that uses the coxph() function to fit Andersen-Gill (AG) models with robust
variances. The coxph() function can also be utilized to fit multiple types of
events, possibly recurrent, by fitting separate AG models for each event type. Frailty models provide an alternative approach for modeling recurrent events. The
frailtypack package can fit joint MTRE models with time-dependent covariates, including up to 2 terminating event types; the possible dependence structure between multiple event types that are
attributed to the unobserved random effects is modeled with one or more
parameters. Usually, the parameter estimate of the dependence structure is
interpreted as respective pairwise correlation measures between 2 or more event
types. The msm package fits multi-state models, which are often used to
model disease states; a typical scenario is a chronic disease with staged
progression, illustrated by the 3-state illness-death model (comprising health, illness, and death states, respectively). Multi-state models can be
adapted to model multiple types of events, including recurrent events and a
terminating event as an absorbing state. If we wish to model MIs and HFs, both
of which can recur, there are many distinct states that can arise from the
simple case of a participant experiencing 2 MIs and 2 HFs. Such a participant
could experience MI → MI → HF → HF; MI → HF → MI →
HF; etc. If we make the reasonable assumption that the transition from the first
MI to the second MI (with no events in between) is different from the transition
between the first MI to the first HF, and so on, then MSM models can rapidly
grow in complexity given the many possible permutations of successive states.
multiRec and common scenarios for its useWe wanted to develop an MTRE model that readily handles the aforementioned scenario of related events, one or more of which can recur, and one or more of which can be terminating. We wanted the model to be flexible and intuitive, capable of handling a wide variety of common scenarios that include multiple events, where the goal is to explicitly model the risk conferred by one event to one or more subsequent events. This has practical uses. For example, it is well known that stroke increases the risk of HF, and vice versa; it is also known that atrial fibrillation (AF) increases the risk of both stroke and HF. Suppose we wish to characterize the bidirectional relationship between stroke and HF. We would want to estimate the risk that a stroke imparts to a future HF, and how much of that risk of HF that arises from the stroke is due to AF, and likewise estimate how much of the risk conferred by HF on future stroke is due to AF. This is of clinical importance to the physician who treats a patient with AF, as good management of their AF could prevent both future stroke and HF. Moreover, the risk of HF that arises from a prior stroke may differ between men and women, or between older vs younger age groups, or between individuals who have diabetes vs those who do not. There are of course many other clinical and demographic variables that may capture subgroups of individuals that experience a disease course differently, including response to therapeutic management. Diabetes itself is known to have a bidirectional relationship with HF; these complex relationships are readily captured by a model that explicitly models the risk of multiple event types that arise from prior events of one or more types, and how that risk is modified by one or more subject characteristics (such as demographics, one or more disease phenotypes, or clinical indices such as biomarkers). The most common diseases are often chronic, with a natural history of complications (including other disease processes). Diabetes is once again a typical example; it is increasing in both prevalence and incidence worldwide and rapidly becoming a public health crisis. Diabetes greatly increases the risk of cardiovascular disease (CVD), chronic kidney disease (CKD), and even certain types of cancer, as well as dementia. A comprehensive risk analysis guides optimal therapy and management of these individuals, and can be used to develop risk scores and refine existing ones for CVD, CKD, and other sequelae.
Another important application of our MTRE model is its ability to address informative censoring, a common challenge in longitudinal studies when participants drop out for reasons that may be related to study factors. By explicitly modeling study dropout as an additional event type, the MTRE framework can characterize the risk factors associated with dropout—such as participant characteristics or the occurrence of other event types that may be influenced by the study intervention (or lack thereof). Treating study dropout as an event also improves the estimation of parameters for other event types, as it mitigates the bias introduced when potentially informative dropout is instead treated as noninformative censoring.
In our previous work [1], we proposed a joint MTRE model and
applied the model to a large cardiovascular clinical trial with 3 types of
recurrent events: MI, stroke, and HF; and a terminating event, death. We
developed an R package called multiRec to fit our MTRE models. The technical
details of our model including the formulation, construction of the full
likelihood, parameter estimation, variance correction via robust standard
errors, model selection and diagnostics, and relative efficiency under common
misspecifications are described in our main manuscript and its appendices (Web
Appendix 1 and 2, respectively). In the following sections, we describe how to
use multiRec to fit MTRE models for common clinical scenarios.
multiRecTo illustrate our MTRE model, we include 2 datasets within the multiRec package,
one with two types of events (multiRecCVD2) and one with four types of events
(multiRecCVD4). Both multiRecCVD2 and multiRecCVD4 were adapted from the nafld1 and nafld3 datasets in the survival package. The nafld1 and nafld3 datasets contain longitudinal information on participants with nonalcoholic fatty liver disease (NAFLD), including multiple recurrent events. We processed and restructured these data using the tmerge() function from survival to construct interval-based datasets (multiRecCVD2 and multiRecCVD4, respectively) suitable for MTRE modeling.
library(multiRec) data(multiRecCVD2) data(multiRecCVD4) head(multiRecCVD2) head(multiRecCVD4)
Any dataset for fitting models in multiRec needs to have the following
structure. The id variable denotes the participant, who can have one or more rows;
each row represents a time interval, bookended by the tstart and tstop
variables. The tstart variable represents the beginning of the time interval,
and the tstop variable represents the end of the time interval at which an event
may occur (only 1 event allowed per time interval), or no event occurs
(considered to be noninformative censoring). Therefore, a minimum of 4 columns
are required for the dataset:
id, which links the rows to the participant;tstart, which marks the start of the time interval; tstop, which marks the end of the time interval; and event, which denotes what type of event occurred (if any). If an event type
is terminating (e.g. death), the row that contains it should be the last row for
that participant, since no further events may occur in that
participant after that interval ends.This type of multi-interval dataset is common in survival analysis and can be
constructed using the tmerge() function in the survival package. The parent
dataset must include event types and corresponding event times for each
participant. The survival package contains instructions for how to use tmerge().
Note that the event is assumed to occur at the end of the interval. Moreover, the names id, tstart, tstop and event are the default ones
and can be overridden by using the idVar=, startTimeVar=, stopTimeVar=,
and eventVar= arguments in multiRec().
Beyond the 4 required columns, additional covariates that may represent
participant characteristics (e.g. demographic variables, clinical indices,
metabolic data, biomarkers, etc.) can be included as columns. For each such
column/variable, the value at each row represents the value at the beginning of
the time interval (denoted by tstart). So for example for participant 1, who is
a male, in the multiRecCVD2 dataset above, the second row starts at time 0.5364
(in years), and so the BMI of 36.4 represents his BMI at time 0.5364. It is
assumed that his BMI remains at 36.4 until the end of that interval, which
occurs at time 1.2405. The beginning of the very next interval (in row 3) picks
up right at that point (time 1.2405), and his BMI is then assumed to be 34.7.
His BMI remains at 34.7 until time 1.59137, and so on.
We will now fit several models to illustrate how to fit MTRE models using
multiRec(). We start with a simple model:
fit = multiRec(afib ~ age + male, # The model for the AF hazard hf ~ age + male + bmi, # The model for the HF hazard data = multiRecCVD2 # The dataset used )
In Model 1, we utilize the dataset multiRecCVD2 to jointly model 2 event types:
AF (denoted by variable name afib) and HF (denoted by variable name hf),
with specific covariates for each event type. Of note, multiRec() requires a model formula
for the hazard of each event type in the dataset. The event name must appear
before the ~, spelled exactly as it appears in the dataset. The terms to the
right of the ~ specify the linear model for the hazard, in the usual R
notation. In the example above, the hazard for AF depends on age and male
and the hazard for HF depends on age, male, and bmi.
The idVar argument specifies the name of the variable that contains the
participant ID (unique to each participant, and which links one or more rows of
observations to that participant).
The multiRec() function accepts several additional parameters, including
link=, which selects the link function by which we model the hazards; the
default is the log link which yields multiplicative hazards.robust= which requests robust standard errors if set to TRUE, and naive
standard errors if set to FALSE. In practice, robust=TRUE is preferred.na.action=, which determines how to handle missing values; the default is
na.omit, which removes missing values.method=, which specifies the optimization method (e.g. BFGS, CG, SANN, or
Nelder-Mead) to be used; the default is BFGS. Please consult the optim()
function for details.SANN.init=, which sets the number of simulated annealing iterations that will
be used to initially refine the starting values of the parameters to be
estimated, after which the optimization method specified in the method=
argument will be called. This may help in cases where multiRec() has trouble
converging.fitDetails=, which includes additional information in the fit object when TRUE.
This is required for model fit diagnostics.To examine the Model 1 results, we simply type fit at the R console to return the fit object:
fit
Keep in mind that the default link is the log link, so all parameter estimates in this model are for the log of the hazard. For AF, we interpret the parameter estimates as follows. The intrinsic risk of AF is the absolute risk that arises from subject characteristics, in this case, age and sex, respectively. Thus, a female of age 65 years would have an absolute risk of AF calculated to be: $\exp(-6.5746+0.0552\times 65+1.0015\times 0)=0.05$; therefore, she would have a 5% chance of experiencing AF in the next year. If the individual was male, and also 65 years of age, his absolute risk of AF would then be: $\exp(-6.5746+0.0552\times 65+1.0015\times 1)=0.137$, and his risk would be $\exp(1.0015)=2.72$ times the risk of his female counterpart, so a relative risk (RR) of 2.72 for males vs females in this cohort in experiencing AF, if all other covariates are held constant. In the same manner, the intrinsic risk of HF for a 65-year-old male with a BMI of 30 would be: $\exp(-1.9163+0.0041\times 65+0.2425\times 1+30\times (-0.0139))=0.16$ in this cohort. Since BMI is not significantly associated with HF in this model, it would be reasonable to refit the model without BMI and estimate the parameters of the more parsimonious model.
In this example, AF and HF can both recur multiple times for the same person. More generally, the model can include terminating events such as death or study dropout, after which no further events are possible.
We now fit a more complex model using the dataset multiRecCVD4, which contains 4 event types: stroke, HF, AF, and death. Among these, stroke, HF, and AF may each recur within a participant, whereas death — being a terminating event — can only occur once. Since this dataset contains 4 event types in total, the first 4 arguments of the multiRec() function must
consist of 4 formulas, one for each of these event types. Note that for the
multiRec() function, all event types specified in the event column of the
dataset must be included in the model as events to be modeled jointly.
fit = multiRec(stroke ~ nPrior.hf(), afib ~ age, hf ~ nPrior.stroke(), death ~ age + nPrior.stroke() + nPrior.afib() + nPrior.hf(), data = multiRecCVD4, robust = TRUE, SANN.init = 50 )
In the model, the nPrior.() functions indicate that the hazard depends on the
number of prior events of the type specified after the period. For example, the
model assumes that the risk of stroke at any time depends on the number of HFs the participant experiences in the past. As another example, the risk of death at any time depends on age and the numbers of prior strokes, AFs and HFs.
To examine the Model 2A results, we again type fit at the R console to return
the fit object:
fit
We begin our interpretation of this model with stroke. The intrinsic risk of stroke, which has no covariates, comes from the intercept term, which is exponentiated (since we are using the log link) to yield $\exp(-3.6222)=0.0267$. Note that this result can also be obtained directly from the 1st row of the 4th column, as the 4th column exponentiates the parameter estimates from the 3rd column. Thus, the absolute risk for stroke in the next year for all participants who have not experienced HF is 0.0267. For each prior HF event, the absolute risk of future stroke increases multiplicatively by 1.5127 (p<0.001). Thus, the risk of stroke for a participant who had two prior HFs is $\exp(-3.6222 + 1.5127\times 2)=0.55$.
On the other hand, the intrinsic risk of HF, which likewise has no covariates in this model, also arises from the intercept term to yield 0.0324. For each prior stroke event, the absolute risk of future HF increases multiplicatively by 1.7127 (p<0.0001). We can see from our model results thus far that prior stroke increases the risk of HF, and HF in turn increases the risk of future stroke.
The interpretation of the absolute risk of AF is straightforward and can be calculated in the same manner as described for Model 1. We now turn to death, which is the terminating event in this model. Death here has both an intrinsic covariate (age) and all 3 prior event types (stroke, AF, and HF). We can see that age significantly increases the risk of death as we expect (relative risk of death is 1.01 for every 1-year increase in age) (p<0.05). The risk of death increases the most from prior HF (relative risk of death is 2.50 for every HF event [p<0.0001]), followed by AF (relative risk of death is 1.80 for every AF event [p<0.0001]), and finally stroke (relative risk of death is 1.53 for every stroke event [p<0.0001]).
Let us now examine the potential role of AF in the association between stroke
and HF. We again utilize the dataset multiRecCVD4 and fit the 4 event types
(stroke, AF, HF, and death); this time, we include prior AF event as a covariate
of stroke and HF, respectively. Of note, for more complex models, the optional method.seed argument can be used to make the stochastic optimization step (simulated annealing) reproducible.
fit = multiRec(stroke ~ nPrior.hf() + nPrior.afib(), afib ~ nPrior.hf() + nPrior.stroke(), hf ~ nPrior.stroke() + nPrior.afib(), death ~ age + nPrior.stroke() + nPrior.afib() + nPrior.hf(~ male, alpha=NA), data = multiRecCVD4, robust = TRUE, SANN.init = 50, method.seed = 1)
Displaying the Model 2B results:
fit
We can see that AF indeed increases the risk of both stroke and HF, respectively; the relative risk of stroke arising from prior AF is 1.36 (p=0.04) and the relative risk of HF from prior AF is 2.26 (p<0.001). Notably, the risk that stroke and HF confer on each other is lower when AF is accounted for in the model: in Model 2B, the relative risk of stroke from HF is 1.44 when AF is accounted for (vs in Model 2A, where the relative risk of stroke from HF is 1.51); and the relative risk of HF from stroke is 1.51 when AF is accounted (vs in Model 2A, where the relative risk of HF from stroke is 1.71). This suggests that AF may be a potential mediator and partially explains the association between stroke and HF. This type of analysis is particularly useful when investigating potential mediation pathways between two or more disease states or occurrences.
We now briefly describe the alpha parameter that we estimate for illustrative
purposes in Model 2B for death as a terminating event. Here, alpha is a power
function that describes whether the relative risk of death that arises from
prior HF stays the same, or increases, or decreases with each additional prior
HF event that has occurred. For example, if alpha is estimated to be 1 (i.e. 1
lies within the 95% confidence interval), then we interpret that to mean that
the power function for HF is linear, so that each HF increases the relative risk
of death by, in this case, 1.94 (for females; for males, the relative risk
arising from HF is higher compared to females, but the attendant p-value is not
significant). So, 2 prior HF events would increase the relative risk of death by
$1.94^2=3.76$. If the alpha is estimated to be <1, then the relative risk of
death from prior HFs diminishes with every successive HF (as an example: the
relative risk of death from 2 prior HFs would be less than $1.94^2=3.76$). If the alpha
is estimated to be >1, then the relative risk of death from prior HFs increases
with each successive HF. The linear power function (alpha=1) provides a
reasonable estimate for relative risk arising from prior events in many or most
cases, in which case alpha can be assumed to be 1 and does not need to be
estimated. Of note, when estimating the alpha parameter for prior events on a
future event, it is important to verify that there are multiple prior events of
that type in multiple participants so that there is enough data for the effect
of those prior events to be properly estimated.
The AIC and BIC provide straightforward, relative comparisons of model fit
between any 2 or more models, and is particularly helpful when comparing
non-nested models (for example, models with different link functions). For
nested models, the likelihood ratio test can be used. The AIC and BIC are
automatically provided in every model fit and is returned as part of the output
for the fit object. In our preceding examples using multiRecCVD4, the AIC and
BIC were higher for Model 2A than for Model 2B, which suggests that Model 2B is
a better fit compared to Model 2A. The resid() function returns martingale
residuals (the default) and if desired, score residuals can be obtained using
the score option. Residuals can be summarized in various ways, including
plots, to investigate outliers. Goodness of fit for a selected model can be
assessed by comparing the observed number of events of each type to the expected
number of events of each type and calculating the chi-squared test statistic for
observed vs expected events. The technical details of a goodness of fit
calculation for an MTRE model are provided in our companion manuscript and its
supplemental material (Web Appendix 3).
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.