knitr::opts_chunk$set(prompt = TRUE, comment = "")
In this vignette, we demonstrate how to use the simGSC()
function in reReg
package
to simulate recurrent event data from a general scale-change model.
Since the scale-change model includes the Cox-type model and the
accelerated mean model as special cases, simGSC()
can also be used to generate data from these sub-models.
The simGSC()
function allows the censoring time to be non-informative (independent given covariate)
or informative about the recurrent event process.
Let $N(t)$ be the number of recurrent events occurring over the interval $[0, t]$ and
$D$ be the failure time of interest subjects to right censoring by $C$.
Define the composite censoring time $Y = \min(D, C, \tau)$ and
the failure event indicator $\Delta = I{D\le \min(C, \tau)}$, where
$\tau$ is the maximum follow-up time.
We assume the recurrent event process $N(\cdot)$ is observed up to $Y$.
Let $X$ be a $p$-dimensional covariate vector.
Consider a random sample of $n$ subjects,
the observed data are independent and identically distributed (iid) copies of
${Y_i, \Delta_i, X_i, N_i(t), 0\le t\le Y_i}$,
where the subscript $i$ denotes the index of a subject for $i= 1, \ldots, n$.
Let $m_i = N_i(Y_i)$ be the number of recurrent events the $i$th subject experienced
before time $Y_i$, then the jump times of $N_i(t)$ give the observed recurrent
event times
$t_{i1}, \ldots, t_{im_i}$ when $m_i > 0$.
Thus, the observed data can also be expressed as iid copies
of ${Y_i, \Delta_i, X_i, m_i, (t_{i1}, \ldots, t_{m_i})}$.
The primary interests in recurrent event data analysis often lie in
making inference about the recurrent event process and the failure event
and understanding the corresponding covariate effects.
The built-in dataset, simDat
, is a simulated example of a recurrent event data that is generated from the
simGSC()
function:
library(reReg) packageVersion("reReg") data(simDat)
simGSC
function {-}The function simGSC
generates the recurrent times from the joint model
\begin{equation}
\begin{cases}
\lambda(t) = \displaystyle Z\lambda_0(te^{X^\top\alpha})e^{X^\top\beta},&\
h(t) = \displaystyle Zh_0(te^{X^\top\eta})e^{X^\top\theta}, &t\in[0, \tau],
\end{cases}
\end{equation}
where $\lambda_0(t)$ is the baseline rate function for the recurrent event process,
$h_0(t)$ is the baseline hazard function for the failure time,
$Z$ is a non-negative subject-specific latent frailty variable.
The $p$-dimensional regression coefficients
$(\alpha, \eta)$ and $(\beta, \theta)$
correspond to the shape and the size parameters, respectively.
The arguments in simGSC
are as follow
args(simGSC)
The arguments are as follows
n
is the number of individualsummary
is a logical value indicating whether a brief data summary will be printed.para
is a list of regression parameters.
The list members alpha
, beta
, eta
, and theta
correspond to $\alpha$, $\beta$, $\eta$, and $\theta$ in (1), respectively.xmat
is the $n\times p$ design matrix. censoring
is a $n$-dimensional numerical vector to specify the independent censoring time, $C$.frailty
is a $n$-dimensional numerical vector to specify the frailty variable, $Z$.tau
is the maximum follow-up time, $\tau$.origin
is the time origin for each subject.Lam0
is a single argument function used to specify the baseline cumulative rate function, $\Lambda_0(t)$.Haz0
is a single argument function used to specify the baseline cumulative hazard function, $H_0(t)$.At the default setting, the simGSC()
function assumes $p = 2$ and the regression
parameters to be $\alpha = \eta=(0, 0)^\top$, $\beta = (-1, -1)^\top$, and $\theta = (1, 1)^\top$.
When the xmat
and the censoring
arguments are not specified,
the simGSC()
function assumes $X_i$ is a two-dimensional vector $X_i = (X_{i1}, X_{i2}), i = 1, \ldots, n$,
where $X_{i1}$ is a Bernoulli variable with rate 0.5 and $X_{i2}$ is a standard normal variable.
With the default xmat
, the censoring time $C$ is generated from
an independent uniform distribution in $[0, 2\tau X_{i1} + 2Z^2\tau(1 - X_{i1})]$.
Thus, the censoring distribution is covariate dependent and
is informative when $Z$ is not a constant.
On the other hand, when xmat
is specified by the users,
the censoring time $C$ is generated from an independent uniform distribution $[0, 2 \tau]$.
When the frailty
argument is not specified, the frailty variable $Z$ is generated
from a gamma distribution with a unit mean and a variance of 0.25.
The default values for tau
and origin
are 60 and 0, respectively.
When arguments Lam0
and Haz0
are left unspecified,
the simGSC()
function uses $\Lambda_0(t) = 2\log(1 + t)$
and $H_0(t) = \log(1 + t) / 5$, respectively.
This is equivalent to setting
Lam0 = function(x) 2 * log(1 + x)
and Haz0 = function(x) log(1 + x) / 5
.
In summary, the default specifications generate the recurrent events and the terminal events
from the model:
$$\begin{cases}
\lambda(t) = \displaystyle \frac{2Z}{1 + te^{-X_{i1} - X_{i2}}}, &\
h(t) = \displaystyle \frac{Z}{5(1 + te^{X_{i1} + X_{i2}})}, & t\in[0, 60].
\end{cases}$$
simDat
The simGSC()
function generates simulated data from the above specification and returns a
data.frame
in the same format as the built-in data set, simDat
.
Specifically, simDat
was generated using the default settings of simGSC()
.
data(simDat) set.seed(0); dat <- simGSC(200, summary = TRUE) identical(dat, simDat)
The simDat
is an example that simulates recurrent event data when both the rate function and the hazard function
are Cox-type model.
set.seed(0); datCox <- simGSC(200, summary = TRUE)
The hypothesis test proposed by @xu2019generalized shows a strong evidence that the rate function has a Cox structure. ```{R, cache = TRUE} fit <- reReg(Recur(t.start %to% t.stop, id, event, status) ~ x1 + x2, model = "gsc", B = 200, se = "sand", data = datCox) summary(fit, test = TRUE)
Similarly, the hypothesis test shows a strong evidence that the rate function has an accelerated mean or accelerated rate structure when the data is simulated under these. When rate has an accelerated mean structure: ```{R, cache = TRUE} set.seed(0); datAM <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(-1, -1))) summary(update(fit, data = datAM), test = TRUE)
When rate has an accelerated rate structure:
{R, cache = TRUE}
set.seed(0); datAR <- simGSC(200, para = list(alpha = c(-1, -1), beta = c(0, 0)))
summary(update(fit, data = datAR), test = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.