
`sim.survdata()` randomly generates data frames containing a user-specified number
of observations, time points, and covariates. It generates durations, a variable indicating
whether each observation is right-censored, and "true" marginal effects.
It can accept user-specified coefficients, covariates, and baseline hazard functions, and it can
output data with time-varying covariates or using time-varying coefficients.


`N` |
Number of observations in each generated data frame. Ignored if |

`T` |
The latest time point during which an observation may fail. Failures can occur as early as 1 and as late as T |

`type` |
If "none" (the default) data are generated with no time-varying covariates or coefficients. If "tvc", data are generated with time-varying covariates, and if "tvbeta" data are generated with time-varying coefficients (see details) |

`hazard.fun` |
A user-specified R function with one argument, representing time, that outputs the baseline hazard function.
If |

`num.data.frames` |
The number of data frames to be generated |

`fixed.hazard` |
If |

`knots` |
The number of points to draw while using the flexible-hazard method to generate hazard functions (default is 8).
Ignored if |

`spline` |
If |

`X` |
A user-specified data frame containing the covariates that condition duration. If |

`beta` |
Either a user-specified vector containing the coefficients for the linear part of the duration model, or
a user-specified matrix with rows equal to |

`xvars` |
The number of covariates to generate. Ignored if |

`mu` |
If scalar, all covariates are generated to have means equal to this scalar. If a vector, it specifies the mean of each covariate separately,
and it must be equal in length to |

`sd` |
If scalar, all covariates are generated to have standard deviations equal to this scalar. If a vector, it specifies the standard deviation
of each covariate separately, and it must be equal in length to |

`covariate` |
Specification of the column number of the covariate in the |

`low` |
The low value of the covariate for which to calculate a marginal effect |

`high` |
The high value of the covariate for which to calculate a marginal effect |

`compare` |
The statistic to employ when examining the two new vectors of expected durations (see details). The default is |

`censor` |
The proportion of observations to designate as being right-censored |

`censor.cond` |
Whether to make right-censoring conditional on the covariates (default is |

The `sim.survdata` function generates simulated duration data. It can accept a user-supplied
hazard function, or else it uses the flexible-hazard method described in Harden and Kropko (2018) to generate
a hazard that does not necessarily conform to any parametric hazard function. It can generate data with time-varying
covariates or coefficients. For time-varying covariates (`type="tvc"`), it employs the permutational algorithm
by Sylvestre and Abrahamowicz (2008). For time-varying coefficients (`type="tvbeta"`), the first beta coefficient
that is either supplied by the user or generated by the function is multiplied by the natural log of the
failure time under consideration.
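The time-varying coefficient rule can be illustrated with a minimal base-R sketch. It is not the package's internal code: the names `T_max`, `beta`, and `beta_mat` are hypothetical, and the layout of one row per time point is an assumption made for illustration.

```r
# Illustrative only: build a (T x k) coefficient matrix in which the first
# coefficient is scaled by log(t) and the others are constant over time.
T_max <- 100                      # hypothetical latest failure time
beta  <- c(0.5, -0.2, 0.1)        # hypothetical time-invariant coefficients

# Repeat the coefficient vector once per time point (rows = time points)
beta_mat <- matrix(beta, nrow = T_max, ncol = length(beta), byrow = TRUE)

# Multiply the first coefficient by the natural log of each time point
beta_mat[, 1] <- beta[1] * log(seq_len(T_max))

head(beta_mat, 3)
```

Because log(1) is 0, the first covariate has no effect at t = 1 in this sketch, and its effect grows in magnitude as t increases.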

If `fixed.hazard=TRUE`, one baseline hazard is generated and the same function is used to generate all of the simulated
datasets. If `fixed.hazard=FALSE` (the default), a new hazard function is generated with each simulation iteration.

The flexible-hazard method employed when `hazard.fun` is `NULL` generates a unique baseline hazard by fitting a curve to
randomly-drawn points. This produces a wide variety
of shapes for the baseline hazard, including those that are unimodal, multimodal, monotonically increasing or decreasing, and many other
shapes. The method then generates a density function based on each baseline hazard and draws durations from it in a way that circumvents
the need to calculate the inverse cumulative baseline hazard. Because the shape of the baseline hazard can vary considerably, this approach
matches the Cox model’s inherent flexibility and better corresponds to the assumed data generating process (DGP) of the Cox model. Moreover,
repeating this process over many iterations in a simulation produces simulated samples of data that better reflect the considerable
heterogeneity in data used by applied researchers. This increases the generalizability of the simulation results. See Harden and Kropko (2018)
for more detail.
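The general idea of fitting a curve to random points and then sampling durations directly can be sketched in base R. This is a rough illustration under stated assumptions, not the package's implementation: it assumes the random points are placed on a failure CDF and joined with a monotone spline (`splinefun` with `method = "hyman"`), and all object names are hypothetical. See Harden and Kropko (2018) for the actual procedure.

```r
# Illustrative only: a random baseline built from a monotone spline.
set.seed(1)
T_max <- 100   # hypothetical latest failure time
knots <- 8     # number of random points to draw

# Random interior points, anchored at (0, 0) and (T_max, 1)
x <- c(0, sort(sample(1:(T_max - 1), knots)), T_max)
y <- c(0, sort(runif(knots)), 1)

# Monotone cubic spline through the points, read as a failure CDF
cdf_fun     <- splinefun(x, y, method = "hyman")
t_grid      <- seq_len(T_max)
failure_cdf <- cdf_fun(t_grid)

# Discrete failure PDF (clipped at 0 to guard against rounding error)
failure_pdf <- pmax(diff(c(0, failure_cdf)), 0)

# Survivor function at the start of each period, and the implied hazard
survivor <- 1 - c(0, failure_cdf[-T_max])
hazard   <- failure_pdf / survivor

# Draw durations straight from the PDF; no inverse cumulative hazard needed
durations <- sample(t_grid, size = 1000, replace = TRUE, prob = failure_pdf)
```

Rerunning the sketch with a different seed gives a differently shaped hazard, which is the property the paragraph above describes.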

When generating a marginal effect, the user first specifies a covariate by typing its column number in the `X` matrix
into the `covariate` argument, then specifies the high and low values at which to fix this covariate. The function
calculates the differences in expected duration for each observation when fixing the covariate to the high and low values.
If `compare` is `median`, the function reports the median of these differences, and if `compare` is `mean`, it reports
the mean of these differences. Any function that takes a vector as input and outputs a scalar may be employed.
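The role of `compare` can be shown with a small base-R sketch. The duration model here (`toy_duration`) is a made-up stand-in, not the way the package computes expected durations, and `marginal_effect` is a hypothetical helper written only for this illustration.

```r
# Illustrative only: summarise per-observation differences with `compare`.
set.seed(42)
X    <- matrix(rnorm(200 * 3), ncol = 3)   # hypothetical covariates
beta <- c(0.5, -0.2, 0.1)                  # hypothetical coefficients

# Stand-in for "expected duration": larger linear predictor, shorter duration
toy_duration <- function(X, beta) as.vector(50 * exp(-X %*% beta))

marginal_effect <- function(X, beta, covariate, low, high, compare = median) {
  X_low  <- X; X_low[, covariate]  <- low    # fix the covariate at its low value
  X_high <- X; X_high[, covariate] <- high   # fix the covariate at its high value
  diffs  <- toy_duration(X_high, beta) - toy_duration(X_low, beta)
  compare(diffs)                             # any vector-to-scalar function
}

me_median <- marginal_effect(X, beta, covariate = 1, low = 0, high = 1)
me_mean   <- marginal_effect(X, beta, covariate = 1, low = 0, high = 1,
                             compare = mean)
```

Passing another function, such as `function(d) quantile(d, 0.25)`, works the same way as long as it returns a single number.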

If `censor.cond` is `FALSE`, then a proportion of the observations specified by `censor` is randomly and uniformly
selected to be right-censored. If `censor.cond` is `TRUE`, then censoring depends on the covariates as follows:
new coefficients are drawn from normal distributions with mean 0 and standard deviation of 0.1, and these new
coefficients are used to create a new linear predictor using the `X` matrix. The observations with the largest
(100 x `censor`) percent of the linear predictors are designated as right-censored.
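The two censoring schemes described above can be sketched in base R as follows. This is an illustration of the stated rules rather than the package's own code; the object names (`cens_random`, `gamma`, `lp`, `cens_cond`) are hypothetical.

```r
# Illustrative only: unconditional versus covariate-dependent censoring.
set.seed(7)
N      <- 1000
censor <- 0.1                              # proportion to right-censor
X      <- matrix(rnorm(N * 3), ncol = 3)   # hypothetical covariates
n_cens <- round(censor * N)

# censor.cond = FALSE: pick observations uniformly at random
cens_random <- rep(FALSE, N)
cens_random[sample(N, n_cens)] <- TRUE

# censor.cond = TRUE: draw new coefficients from N(0, 0.1), form a new
# linear predictor, and censor the observations with the largest values
gamma     <- rnorm(ncol(X), mean = 0, sd = 0.1)
lp        <- as.vector(X %*% gamma)
cens_cond <- rank(-lp) <= n_cens

table(random = cens_random, conditional = cens_cond)
```

Both vectors flag the same number of observations; only the second is related to the covariates.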

Returns an object of class "`simSurvdata`", which is a list of length `num.data.frames` with one element
for each iteration of data simulation.
Each element of this list is itself a list with the following components:

`data` | The simulated data frame, including the simulated durations, the censoring variable, and covariates |

`xdata` | The simulated data frame, containing only covariates |

`baseline` | A data frame containing every potential failure time and the baseline failure PDF, baseline failure CDF, baseline survivor function, and baseline hazard function at each time point. |

`xb` | The linear predictor for each observation |

`exp.xb` | The exponentiated linear predictor for each observation |

`betas` | The coefficients, varying over time if `type` is "tvbeta" |

`ind.survive` | An (`N` x `T`) matrix containing the individual survivor function at time t for the individual represented by row n |

`marg.effect` | The simulated marginal change in expected duration comparing the high and low values of the variable specified with `covariate` |

`marg.effect.data` | The `X` matrix and vector of durations for the low and high conditions |

Jonathan Kropko <jkropko@virginia.edu> and Jeffrey J. Harden <jharden2@nd.edu>

Harden, J. J. and Kropko, J. (2018). Simulating Duration Data for the Cox Model.
*Political Science Research and Methods* https://doi.org/10.1017/psrm.2018.19

Sylvestre, M.-P. and Abrahamowicz, M. (2008). Comparison of algorithms to generate event times conditional on time-dependent covariates. *Statistics in Medicine* **27(14)**: 2618–34.

```
library(coxed)    # provides sim.survdata()
library(survival) # provides coxph() and Surv()

simdata <- sim.survdata(N=1000, T=100, num.data.frames=2)
data <- simdata[[1]]$data
model <- coxph(Surv(y, failed) ~ X1 + X2 + X3, data=data)
model$coefficients ## model-estimated coefficients
simdata[[1]]$betas ## "true" coefficients

## User-specified baseline hazard
my.hazard <- function(t){ # log-normal hazard: log(t) has mean log(50) and sd log(10)
  dnorm((log(t) - log(50))/log(10)) /
    (log(10)*t*(1 - pnorm((log(t) - log(50))/log(10))))
}
simdata <- sim.survdata(N=1000, T=100, hazard.fun = my.hazard)

## A simulated data set with time-varying covariates
## Not run:
simdata <- sim.survdata(N=1000, T=100, type="tvc", xvars=5, num.data.frames=1)
summary(simdata$data)
model <- coxph(Surv(start, end, failed) ~ X1 + X2 + X3 + X4 + X5, data=simdata$data)
model$coefficients ## model-estimated coefficients
simdata$betas ## "true" coefficients
## End(Not run)

## A simulated data set with time-varying coefficients
simdata <- sim.survdata(N=1000, T=100, type="tvbeta", num.data.frames = 1)
simdata$betas
```
