simrec: An R-Package for Simulation of Recurrent Event Data

library(knitr)
opts_chunk$set(
  fig.align = 'center', 
  fig.show = 'asis', 
  eval = TRUE,
  fig.width = 6,
  fig.height = 6,
  message = FALSE,
  size = 'small',
  comment = '##',
  prompt = FALSE,
  echo = TRUE, # set to true for the vignette!
  results = 'hold',
  tidy = FALSE)

options(replace.assign = TRUE,
        width = 90
        # prompt="R> "
        )

Abstract {-}

simrec allows simulation of recurrent event data following the multiplicative intensity model described in [@Andersen1982] with the baseline hazard being a function of the total/calendar time. To induce between-subject-heterogeneity a random effect covariate (frailty term) can be incorporated. Furthermore, via simreccomp simulation of data following a multistate model with recurrent event data of one type and a competing event is possible. simrecint gives the possibility to additionally simulate a recruitment time for each individual and cut the data to an interim data set. With simrecPlot and simreccompPlot the data can be plotted.

Keywords: recurrent event data, competing event, frailty, simulation, total-time model

Introduction

The simrec package includes the functions simrec, simreccomp and simrecint and allows simulation of recurrent event data. To induce between-subject-heterogeneity a random effect covariate (frailty term) can be incorporated. Via simreccomp time-to-event data that follow a multistate model with recurrent event data of one type and a competing event can be simulated. Data output is in the counting-process format. simrecint gives the possibility to additionally simulate a recruitment time for each individual and cut the data to an interim data set. With simrecPlot and simreccompPlot the data can be plotted.

The simrec function

Description {-}

The function simrec allows simulation of recurrent event data following the multiplicative intensity model described in Andersen and Gill [@Andersen1982] with the baseline hazard being a function of the total/calendar time. To induce between-subject-heterogeneity a random effect covariate (frailty term) can be incorporated. Data for individual $i$ are generated according to the intensity process $$Y_i(t)\cdot \lambda_0(t)\cdot Z_i \cdot \exp(\beta^t X_i)$$ where $X_i$ defines the covariate vector, and $\beta$ the regression coefficient vector. $\lambda_0(t)$ denotes the baseline hazard, being a function of the total/calendar time $t$, and $Y_i(t)$ the predictable process that equals one as long as individual $i$ is under observation and at risk for experiencing events. $Z_i$ denotes the frailty variable with $(Z_i)_i$ iid with $E(Z_i)=1$ and $Var(Z_i)=\theta$. The parameter $\theta$ describes the degree of between-subject-heterogeneity. Data output is in the counting process format.

simrec(N, fu.min, fu.max, cens.prob = 0, dist.x = "binomial", par.x = 0,
  beta.x = 0, dist.z = "gamma", par.z = 0, dist.rec, par.rec, pfree = 0,
  dfree = 0)

Parameters {-}

Output {-}

The output is a data.frame consisting of the columns:

For each individual there are as many lines as it experiences events, plus one line if being censored. The data format corresponds to the counting process format.

Details {-}

Data are simulated by extending the methods proposed by Bender [@Bender2005] to the multiplicative intensity model. You can read more on this in our work [@Jahn-Eimermacher2015].

Example {-}

library(simrec)
### Example:
### A sample of 10 individuals

N <- 10

### with a binomially distributed covariate with a regression coefficient
### of beta=0.3, and a standard normally distributed covariate with a
### regression coefficient of beta=0.2,

dist.x <- c("binomial", "normal")
par.x <- list(0.5, c(0, 1))
beta.x <- c(0.3, 0.2)

### a gamma distributed frailty variable with variance 0.25

dist.z <- "gamma"
par.z <- 0.25

### and a Weibull-shaped baseline hazard with shape parameter lambda=1
### and scale parameter nu=2.

dist.rec <- "weibull"
par.rec <- c(1,2)

### Subjects are to be followed for two years with 20\% of the subjects
### being censored according to a uniformly distributed censoring time
### within [0,2] (in years).

fu.min <- 2
fu.max <- 2
cens.prob <- 0.2

### After each event a subject is not at risk for experiencing further events
### for a period of 30 days with a probability of 50\%.

dfree <- 30/365
pfree <- 0.5

simdata <- simrec(N, fu.min, fu.max, cens.prob, dist.x, par.x, beta.x,
                  dist.z, par.z, dist.rec, par.rec, pfree, dfree)
print(simdata[1:10,])
DT::datatable(simdata)

The simreccomp function

Description {-}

The function simreccomp allows simulation of time-to-event-data that follow a multistate-model with recurrent events of one type and a competing event. The baseline hazard for the cause-specific hazards are here functions of the total/calendar time. To induce between-subject-heterogeneity a random effect covariate (frailty term) can be incorporated for the recurrent and the competing event.

Data for the recurrent events of the individual $i$ are generated according to the cause-specific hazards $$\lambda_{0r}(t)\cdot Z_{ri} \cdot \exp(\beta_r^t X_i)$$ where $X_i$ defines the covariate vector and $\beta_r$ the regression coefficient vector. $\lambda_{0r}(t)$ denotes the baseline hazard, being a function of the total/calendar time $t$ and $Z_{ri}$ denotes the frailty variables with $(Z_{ri})i$ iid with $E(Z{ri})=1$ and $Var(Z_{ri})=\theta_r$. The parameter $\theta_r$ describes the degree of between-subject-heterogeneity for the recurrent event. Analougously the competing event is generated according to the cause-specific hazard conditionally on the frailty variable and covariates: $$\lambda_{0c}(t)\cdot Z_{ci} \cdot \exp(\beta_c^t X_i)$$ Data output is in the counting process format.

simreccomp(N, fu.min, fu.max, cens.prob = 0, dist.x = "binomial", par.x = 0,
           beta.xr = 0, beta.xc = 0, dist.zr = "gamma", par.zr = 0, a = NULL,
           dist.zc = NULL, par.zc = NULL, dist.rec, par.rec,
           dist.comp, par.comp, pfree = 0, dfree = 0)

Parameters {-}

Output {-}

The output is a data.frame consisting of the columns:

For each individual there are as many lines as it experiences events, plus one line if being censored. The data format corresponds to the counting process format.

Example {-}

library(simrec)
### Example:
### A sample of 10 individuals

N <- 10

### with a binomially distributed covariate and a standard normally distributed
### covariate with regression coefficients of beta.xr=0.3 and beta.xr=0.2,
### respectively, for the recurrent events,
### as well as regression coefficients of beta.xc=0.5 and beta.xc=0.25,
### respectively, for the competing event.

dist.x  <- c("binomial", "normal")
par.x <- list(0.5, c(0, 1))
beta.xr <- c(0.3, 0.2)
beta.xc <- c(0.5, 0.25)

### a gamma distributed frailty variable for the recurrent event with
### variance 0.25 and for the competing event with variance 0.3,

dist.zr <- "gamma"
par.zr <- 0.25

dist.zc <- "gamma"
par.zc <- 0.3

### alternatively the frailty variable for the competing event can be computed
### via a:
a <- 0.5

### Furthermore a Weibull-shaped baseline hazard for the recurrent event with
### shape parameter lambda=1 and scale parameter nu=2,

dist.rec <- "weibull"
par.rec <- c(1, 2)

### and a Weibull-shaped baseline hazard for the competing event with
### shape parameter lambda=1 and scale parameter nu=2

dist.comp <- "weibull"
par.comp <- c(1, 2)

### Subjects are to be followed for two years with 20% of the subjects
### being censored according to a uniformly distributed censoring time
### within [0,2] (in years).

fu.min <- 2
fu.max <- 2
cens.prob <- 0.2

### After each event a subject is not at risk for experiencing further events
### for a period of 30 days with a probability of 50%.

dfree <- 30/365
pfree <- 0.5

simdata1 <- simreccomp(N = N, fu.min = fu.min, fu.max = fu.max, cens.prob = cens.prob,
                       dist.x = dist.x, par.x = par.x, beta.xr = beta.xr,
                       beta.xc = beta.xc, dist.zr = dist.zr, par.zr = par.zr, a = a,
                       dist.rec = dist.rec, par.rec = par.rec, dist.comp = dist.comp,
                       par.comp = par.comp, pfree = pfree, dfree = dfree)
simdata2 <- simreccomp(N = N, fu.min = fu.min, fu.max = fu.max, cens.prob = cens.prob,
                       dist.x = dist.x, par.x = par.x, beta.xr = beta.xr,
                       beta.xc = beta.xc, dist.zr = dist.zr, par.zr = par.zr,
                       dist.zc = dist.zc, par.zc = par.zc, dist.rec = dist.rec,
                       par.rec = par.rec, dist.comp = dist.comp,
                       par.comp = par.comp, pfree = pfree, dfree = dfree)

print(simdata1[1:10, ])
print(simdata2[1:10, ])
DT::datatable(simdata1)
DT::datatable(simdata2)

The simrecint function

Description {-}

With this function previously simulated data (for example simulated by the use of simrec or simreccomp) can be cut to an interim data set. The simulated data must be in patient time (i.e. time since the patient entered the study, for an example see below), and must be in the counting process format. Furthermore, the dataset must have the variables id, start,stop, and status, like data simulated by the use of simrec or simreccomp. Then for every individual additionally a recruitment time is generated in study time (i.e. time since start of the study, for an example see below), which is uniformly distributed on $[0, t_R]$. The timing of the interim analysis $t_I$ is set in study time and data are being cut to all data, that are available at the interim analysis. If you only wish to simulate a recruitment time, $t_I$ can be set to $t_R + fu.max$ or something else beyond the end of the study.

Simulated data in patient time:

\begin{tikzpicture}
\draw[->] (0,0) -- (0,4.5) node [anchor=east,text width=4.1em] {individual};
\draw[->] (0,0) -- (7,0) node [below=2pt] {patient time};
\draw[thick] (0,1) -- (4,1) node[circle,fill=black,inner sep=1.5pt]{}  node at (1,1) [fill=black, inner sep=1.5pt]{}  node at (1.8,1) [fill=black, inner sep=1.5pt]{}  node at (3.2,1) [fill=black, inner sep=1.5pt]{}  node at (0,1)[left=2pt]{1} ;
\draw[thick] (0,2) -- (4,2) node[circle,fill=black,inner sep=1.5pt]{} node at (1,2) [fill=black, inner sep=1.5pt]{} node at (2,2) [fill=black, inner sep=1.5pt]{} node at (3.3,2) [fill=black, inner sep=1.5pt]{} node at (0,2)[left=2pt]{2} ;
\draw[thick] (0,3) -- (4,3) node[circle,fill=black,inner sep=1.5pt]{} node at (2,3) [fill=black, inner sep=1.5pt]{} node at (0,3)[left=2pt]{3} ;
\draw[thick] (0,4) -- (4,4) node[circle,fill=black,inner sep=1.5pt]{} node at (0.9,4) [fill=black, inner sep=1.5pt]{} node at (1.4,4) [fill=black, inner sep=1.5pt]{} node at (0,4)[left=2pt]{4} ;
\end{tikzpicture}
knitr::include_graphics("tikzout_1.png")

Simulated data in study time with time of interim analysis ($t_I$), end of recruitment period ($t_R$) and end of study

\begin{tikzpicture}
\draw[->] (0,0) -- (0,4.5) node [anchor=east,text width=4.1em] {individual};
\draw[->] (0,0) -- (7,0) node [below=2pt] {study time};
\draw[thick] (0,0.8) -- (4,0.8) node[circle,fill=black,inner sep=1.5pt]{}  node at (1,0.8) [fill=black, inner sep=1.5pt]{}  node at (1.6,0.8) [fill=black, inner sep=1.5pt]{}  node at (3.2,0.8) [fill=black, inner sep=1.5pt]{}  node at (0,0.8)[left=2pt]{1} ;
\draw[thick] (0.5,1.6) -- (4.5,1.6) node[circle,fill=black,inner sep=1.5pt]{} node at (1.5,1.6) [fill=black, inner sep=1.5pt]{} node at (2.5,1.6) [fill=black, inner sep=1.5pt]{} node at (3.8,1.6) [fill=black, inner sep=1.5pt]{} node at (0,1.6)[left=2pt]{2} ;
\draw[thick] (1,2.4) -- (5,2.4) node[circle,fill=black,inner sep=1.5pt]{} node at (3.2,2.4) [fill=black, inner sep=1.5pt]{} node at (0,2.4)[left=2pt]{3} ;
\draw[thick] (1.5,3.2) -- (5.5,3.2) node[circle,fill=black,inner sep=1.5pt]{} node at (2.4,3.2) [fill=black, inner sep=1.5pt]{} node at (2.9,3.2) [fill=black, inner sep=1.5pt]{} node at (0,3.2)[left=2pt]{4} ;
\draw[thick] (2,4) -- (6,4) node[circle,fill=black,inner sep=1.5pt]{} node at (2.7,4) [fill=black, inner sep=1.5pt]{} node at (4,4.5)[above]{event} node at (5.4,4) [fill=black, inner sep=1.5pt]{} node at (0,4)[left=2pt]{5} ;
\draw[->] (3.6,4.5) -- (2.9,4.2);
\draw[->] (4.4,4.5) -- (5.2,4.2);
\draw[thick, color=green] (2,-0.1) -- (2, 4.3) node at (2,-0.6)[below]{end of recruitment ($t_R$)};
\draw[thick, color=blue] (1.7,-0.1) -- (1.7, 4.3) node at (1.7,-0.1)[below]{interim analysis ($t_I$)};
\draw[thick, color=red] (6,-0.1) -- (6, 4.3) node at (6, -0.6)[below]{end of study};
\end{tikzpicture}
knitr::include_graphics("tikzout_2.png")
simrecint(data, N, tR, tI)

Parameters {-}

Output {-}

The output is a data.frame consisting of the columns, that were put into, and additionally the following columns:

Individuals that are not already recruited at the interim analysis are left out here.

Example {-}

### Example - see example for simrec
library(simrec)
N         <- 10
dist.x    <- c("binomial", "normal")
par.x     <- list(0.5, c(0,1))
beta.x    <- c(0.3, 0.2)
dist.z    <- "gamma"
par.z     <- 0.25
dist.rec  <- "weibull"
par.rec   <- c(1,2)
fu.min    <- 2
fu.max    <- 2
cens.prob <- 0.2

simdata <- simrec(N, fu.min, fu.max, cens.prob, dist.x, par.x, beta.x, dist.z,
                  par.z, dist.rec, par.rec)

### Now simulate for each patient a recruitment time in [0,tR=2]
### and cut data to the time of the interim analysis at tI=1:

simdataint <- simrecint(simdata, N = N, tR = 2, tI = 1)
print(simdataint)  # only run for small N!
DT::datatable(simdataint)

Plotting the Data

Description {-}

The functions simrecPlot and simreccompPlot allow plotting of recurrent event data, possibly with a competing event.

simrecPlot(simdata, id = "id", start = "start", stop = "stop", status = "status")
simreccompPlot(simdata, id = "id", start = "start", stop = "stop", status = "status")

Parameters {-}

Output {-}

The output is a plot of the data with a bullet ($\bullet$) indicating a recurrent event, a circle ($\circ$) indicating censoring and $\times$ indicating the competing event.

Example {-}

simrecPlot(simdata)
simreccompPlot(simdata1)

References {-}



Try the simrec package in your browser

Any scripts or data that you put into this service are public.

simrec documentation built on Nov. 20, 2020, 5:08 p.m.