In stc04003/reReg: Recurrent Event Regression

knitr::opts_chunk$set(prompt = TRUE, comment = "")

In this vignette, we demonstrate how to use the simGSC() function in reReg package to simulate recurrent event data from a general scale-change model. Since the scale-change model includes the Cox-type model and the accelerated mean model as special cases, simGSC() can also be used to generate data from these sub-models. The simGSC() function allows the censoring time to be non-informative (independent given covariate) or informative about the recurrent event process.

Notations

Let $N(t)$ be the number of recurrent events occurring over the interval $[0, t]$ and $D$ be the failure time of interest subjects to right censoring by $C$. Define the composite censoring time $Y = \min(D, C, \tau)$ and the failure event indicator $\Delta = I{D\le \min(C, \tau)}$, where $\tau$ is the maximum follow-up time. We assume the recurrent event process $N(\cdot)$ is observed up to $Y$. Let $X$ be a $p$-dimensional covariate vector. Consider a random sample of $n$ subjects, the observed data are independent and identically distributed (iid) copies of ${Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i}$, where the subscript $i$ denotes the index of a subject for $i= 1, \ldots, n$. Let $m_i = N_i(Y_i)$ be the number of recurrent events the $i$th subject experienced before time $Y_i$, then the jump times of $N_i(t)$ give the observed recurrent event times $t_{i1}, \ldots, t_{im_i}$ when $m_i > 0$. Thus, the observed data can also be expressed as iid copies of ${Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})}$. The primary interests in recurrent event data analysis often lie in making inference about the recurrent event process and the failure event and understanding the corresponding covariate effects. The built-in dataset, simDat, is a simulated example of a recurrent event data that is generated from the simGSC() function:

library(reReg)
packageVersion("reReg")
data(simDat)

The `simGSC` function {-}

The function simGSC generates the recurrent times from the joint model \begin{equation} \begin{cases} \lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\ h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau], \end{cases} \end{equation} where $\lambda_0(t)$ is the baseline rate function for the recurrent event process, $h_0(t)$ is the baseline hazard function for the failure time, $Z$ is a non-negative subject-specific latent frailty variable. The $p$-dimensional regression coefficients $(\alpha, \eta)$ and $(\beta, \theta)$ correspond to the shape and the size parameters, respectively. The arguments in simGSC are as follow

args(simGSC)

The arguments are as follows

n is the number of individual
summary is a logical value indicating whether a brief data summary will be printed.
para is a list of regression parameters. The list members alpha, beta, eta, and theta correspond to $\alpha$, $\beta$, $\eta$, and $\theta$ in (1), respectively.
xmat is the $n\times p$ design matrix.
censoring is a $n$-dimensional numerical vector to specify the independent censoring time, $C$.
frailty is a $n$-dimensional numerical vector to specify the frailty variable, $Z$.
tau is the maximum follow-up time, $\tau$.
origin is the time origin for each subject.
Lam0 is a single argument function used to specify the baseline cumulative rate function, $\Lambda_0(t)$.
Haz0 is a single argument function used to specify the baseline cumulative hazard function, $H_0(t)$.

At the default setting, the simGSC() function assumes $p = 2$ and the regression parameters to be $\alpha = \eta=(0, 0)^\top$, $\beta = (-1, -1)^\top$, and $\theta = (1, 1)^\top$. When the xmat and the censoring arguments are not specified, the simGSC() function assumes $X_i$ is a two-dimensional vector $X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n$, where $X_{i1}$ is a Bernoulli variable with rate 0.5 and $X_{i2}$ is a standard normal variable. With the default xmat, the censoring time $C$ is generated from an independent uniform distribution in $[0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]$. Thus, the censoring distribution is covariate dependent and is informative when $Z$ is not a constant. On the other hand, when xmat is specified by the users, the censoring time $C$ is generated from an independent uniform distribution $[0, 2 \tau]$. When the frailty argument is not specified, the frailty variable $Z$ is generated from a gamma distribution with a unit mean and a variance of 0.25. The default values for tau and origin are 60 and 0, respectively. When arguments Lam0 and Haz0 are left unspecified, the simGSC() function uses $\Lambda_0(t) = 2\log(1 + t)$ and $H_0(t) = \log(1 + t) / 5$, respectively. This is equivalent to setting Lam0 = function(x) 2 * log(1 + x) and Haz0 = function(x) log(1 + x) / 5. In summary, the default specifications generate the recurrent events and the terminal events from the model: $$\begin{cases} \lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\ h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60]. \end{cases}$$

Examples

`simDat`

The simGSC() function generates simulated data from the above specification and returns a data.frame in the same format as the built-in data set, simDat. Specifically, simDat was generated using the default settings of simGSC().

data(simDat)
set.seed(0); dat <- simGSC(200, summary = TRUE)
identical(dat, simDat)

Simulating from general scale-change models

The simDat is an example that simulates recurrent event data when both the rate function and the hazard function are Cox-type model.

set.seed(0); datCox <- simGSC(200, summary = TRUE)

The hypothesis test proposed by @xu2019generalized shows a strong evidence that the rate function has a Cox structure. ```{R, cache = TRUE} fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2, model = "gsc", B = 200, se = "sand", data = datCox) summary(fit, test = TRUE)

Similarly, the hypothesis test shows a strong evidence that the rate function has
an accelerated mean or accelerated rate structure when the data is simulated under
these.

When rate has an accelerated mean structure:
```{R, cache = TRUE}
set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1)))
summary(update(fit, data = datAM), test = TRUE)

When rate has an accelerated rate structure: {R, cache = TRUE} set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0))) summary(update(fit, data = datAR), test = TRUE)