sim.survdata: Simulating duration data for the Cox proportional hazards...
In coxed: Duration-Based Quantities of Interest for the Cox Proportional Hazards Model


sim.survdata() randomly generates data frames containing a user-specified number of observations, time points, and covariates. It generates durations, a variable indicating whether each observation is right-censored, and "true" marginal effects. It can accept user-specified coefficients, covariates, and baseline hazard functions, and it can output data with time-varying covariates or with time-varying coefficients.

sim.survdata(N = 1000, T = 100, type = "none", hazard.fun = NULL,
  num.data.frames = 1, fixed.hazard = FALSE, knots = 8,
  spline = TRUE, X = NULL, beta = NULL, xvars = 3, mu = 0,
  sd = 0.5, covariate = 1, low = 0, high = 1, compare = median,
  censor = 0.1, censor.cond = FALSE)

`N`	Number of observations in each generated data frame. Ignored if `X` is not `NULL`
`T`	The latest time point during which an observation may fail. Failures can occur as early as 1 and as late as T
`type`	If "none" (the default) data are generated with no time-varying covariates or coefficients. If "tvc", data are generated with time-varying covariates, and if "tvbeta" data are generated with time-varying coefficients (see details)
`hazard.fun`	A user-specified R function with one argument, representing time, that outputs the baseline hazard function. If `NULL`, a baseline hazard function is generated using the flexible-hazard method as described in Harden and Kropko (2018) (see details)
`num.data.frames`	The number of data frames to be generated
`fixed.hazard`	If `TRUE`, the same hazard function is used to generate each data frame. If `FALSE` (the default), different drawn hazard functions are used to generate each data frame. Ignored if `hazard.fun` is not `NULL` or if `num.data.frames` is 1
`knots`	The number of points to draw while using the flexible-hazard method to generate hazard functions (default is 8). Ignored if `hazard.fun` is not `NULL`
`spline`	If `TRUE` (the default), a spline is employed to smooth the generated cumulative baseline hazard, and if `FALSE` the cumulative baseline hazard is specified as a step function with steps at the knots. Ignored if `hazard.fun` is not `NULL`
`X`	A user-specified data frame containing the covariates that condition duration. If `NULL`, covariates are generated from normal distributions with means given by the `mu` argument and standard deviations given by the `sd` argument
`beta`	Either a user-specified vector containing the coefficients for the linear part of the duration model, or a user-specified matrix with rows equal to `T` for pre-specified time-varying coefficients. If `NULL`, coefficients are generated from normal distributions with means of 0 and standard deviations of 0.1
`xvars`	The number of covariates to generate. Ignored if `X` is not `NULL`
`mu`	If scalar, all covariates are generated to have means equal to this scalar. If a vector, it specifies the mean of each covariate separately, and it must be equal in length to `xvars`. Ignored if `X` is not `NULL`
`sd`	If scalar, all covariates are generated to have standard deviations equal to this scalar. If a vector, it specifies the standard deviation of each covariate separately, and it must be equal in length to `xvars`. Ignored if `X` is not `NULL`
`covariate`	Specification of the column number of the covariate in the `X` matrix for which to generate a simulated marginal effect (default is 1). The marginal effect is the difference in expected duration when the covariate is fixed at a high value and the expected duration when the covariate is fixed at a low value
`low`	The low value of the covariate for which to calculate a marginal effect
`high`	The high value of the covariate for which to calculate a marginal effect
`compare`	The statistic to employ when examining the two new vectors of expected durations (see details). The default is `median`
`censor`	The proportion of observations to designate as being right-censored
`censor.cond`	Whether to make right-censoring conditional on the covariates (default is `FALSE`, but see details)

The sim.survdata function generates simulated duration data. It can accept a user-supplied hazard function, or else it uses the flexible-hazard method described in Harden and Kropko (2018) to generate a hazard that does not necessarily conform to any parametric hazard function. It can generate data with time-varying covariates or coefficients. For time-varying covariates (type="tvc"), it employs the permutational algorithm by Sylvestre and Abrahamowicz (2008). For time-varying coefficients (type="tvbeta"), the first beta coefficient, whether supplied by the user or generated by the function, is multiplied by the natural log of the failure time under consideration.
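
The time-varying coefficient rule can be illustrated with a few lines of base R. This is a minimal sketch of the rule as stated above, not the package's internal code; the coefficient values and the names `T_max` and `beta_tv` are hypothetical.

```r
# Illustrative sketch only; the values and names below are hypothetical.
T_max <- 100                    # latest possible failure time (the T argument)
beta  <- c(0.5, -0.2, 0.1)      # hypothetical time-constant coefficients

# Start with one row per time point, every row equal to beta.
beta_tv <- matrix(beta, nrow = T_max, ncol = length(beta), byrow = TRUE)

# Scale the first coefficient by the natural log of each time point.
beta_tv[, 1] <- beta[1] * log(seq_len(T_max))

head(beta_tv, 3)
```

Because log(1) = 0, the first coefficient is zero at t = 1 and grows in magnitude afterwards. A matrix with `T` rows like this one has the shape that the `beta` argument is documented to accept for pre-specified time-varying coefficients.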

If fixed.hazard=TRUE, one baseline hazard is generated and the same function is used to generate all of the simulated datasets. If fixed.hazard=FALSE (the default), a new hazard function is generated with each simulation iteration.

The flexible-hazard method employed when hazard.fun is NULL generates a unique baseline hazard by fitting a curve to randomly-drawn points. This produces a wide variety of shapes for the baseline hazard, including those that are unimodal, multimodal, monotonically increasing or decreasing, and many other shapes. The method then generates a density function based on each baseline hazard and draws durations from it in a way that circumvents the need to calculate the inverse cumulative baseline hazard. Because the shape of the baseline hazard can vary considerably, this approach matches the Cox model’s inherent flexibility and better corresponds to the assumed data generating process (DGP) of the Cox model. Moreover, repeating this process over many iterations in a simulation produces simulated samples of data that better reflect the considerable heterogeneity in data used by applied researchers. This increases the generalizability of the simulation results. See Harden and Kropko (2018) for more detail.
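
The general idea of the flexible-hazard method can be sketched in base R. The code below is only an approximation under stated assumptions: it fits a monotone spline to randomly drawn points on the baseline failure CDF, derives the PDF and survivor function from it, and draws durations by comparing each observation's survivor curve with a uniform draw. The choice of `splinefun(method = "hyman")`, the way points are drawn, and all variable names are assumptions of this sketch; the package's actual procedure is described in Harden and Kropko (2018).

```r
set.seed(42)
T_max <- 100   # number of time points
knots <- 8     # number of randomly drawn points

# Randomly drawn (time, cumulative failure probability) points, sorted so the
# curve is non-decreasing and anchored at (0, 0) and (T_max, 1).
knot_t <- c(0, sort(sample(seq_len(T_max - 1), knots)), T_max)
knot_p <- c(0, sort(runif(knots)), 1)

# A monotone (Hyman) spline through the points gives a smooth failure CDF.
cdf_fun     <- splinefun(knot_t, knot_p, method = "hyman")
failure_cdf <- pmin(pmax(cdf_fun(seq_len(T_max)), 0), 1)

# Remaining baseline functions on the discrete time grid.
failure_pdf <- diff(c(0, failure_cdf))
survivor    <- 1 - failure_cdf

# Individual survivor curves under proportional hazards: S_i(t) = S_0(t)^exp(xb).
N  <- 5
xb <- rnorm(N, sd = 0.3)                                      # hypothetical linear predictors
ind_survive <- outer(exp(xb), survivor, function(e, s) s^e)   # N x T_max matrix

# Draw one uniform number per observation; the duration is the first time point
# at which that observation's survivor curve falls to or below the draw.
u <- runif(N)
y <- apply(ind_survive <= u, 1, function(below) which(below)[1])
y
```

Rerunning the sketch with a different seed produces a differently shaped baseline, which is the source of the heterogeneity across simulated data sets described above. No inverse of the cumulative baseline hazard is ever computed; durations come from a lookup on the discrete time grid.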

To generate a marginal effect, the user first specifies a covariate by passing its column number in the X matrix to the covariate argument, then specifies the high and low values at which to fix this covariate. The function calculates the difference in expected duration for each observation between fixing the covariate at the high value and fixing it at the low value. If compare is median, the function reports the median of these differences, and if compare is mean, it reports the mean of these differences. Any function that takes a vector as input and outputs a scalar may be employed.
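
The marginal-effect logic can be sketched in base R. To keep the example self-contained it assumes exponentially distributed durations, so that the expected duration has a closed form; this swapped-in model, the coefficient values, and the helper names `expected_duration` and `marg_effect` are hypothetical and are not part of the package.

```r
set.seed(1)
N    <- 500
X    <- data.frame(X1 = rnorm(N, 0, 0.5), X2 = rnorm(N, 0, 0.5))
beta <- c(0.4, -0.3)   # hypothetical coefficients

# Assumption: exponential durations with hazard base_rate * exp(xb),
# so the expected duration is 1 / hazard.
expected_duration <- function(X, beta, base_rate = 0.02) {
  as.vector(1 / (base_rate * exp(as.matrix(X) %*% beta)))
}

# Fix the chosen covariate at its low and high values for every observation.
covariate <- 1; low <- 0; high <- 1
X_low  <- X; X_low[, covariate]  <- low
X_high <- X; X_high[, covariate] <- high

# Observation-level differences in expected duration (high minus low).
diffs <- expected_duration(X_high, beta) - expected_duration(X_low, beta)

# Summarize the differences with any vector-to-scalar function.
marg_effect <- function(d, compare = median) compare(d)
med_effect  <- marg_effect(diffs)                   # compare = median
mean_effect <- marg_effect(diffs, compare = mean)   # compare = mean
c(median = med_effect, mean = mean_effect)
```

With a positive coefficient on the chosen covariate, the hazard rises and every difference is negative, so both summaries report a shorter expected duration at the high value.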

If censor.cond is FALSE then a proportion of the observations specified by censor is randomly and uniformly selected to be right-censored. If censor.cond is TRUE then censoring depends on the covariates as follows: new coefficients are drawn from normal distributions with mean 0 and standard deviation of 0.1, and these new coefficients are used to create a new linear predictor using the X matrix. The observations with the largest (100 x censor) percent of the linear predictors are designated as right-censored.
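
The two censoring schemes can be sketched in base R as follows. This is an illustration of the description above rather than the package's code; in particular, selecting exactly `round(censor * N)` observations and the names `cens_random`, `cens_cond`, and `gamma` are assumptions.

```r
set.seed(7)
N      <- 1000
censor <- 0.1
X      <- matrix(rnorm(N * 3, mean = 0, sd = 0.5), ncol = 3)
n_cens <- round(censor * N)

# censor.cond = FALSE: a uniformly random subset of observations is censored.
cens_random <- seq_len(N) %in% sample(N, n_cens)

# censor.cond = TRUE: draw new coefficients, form a new linear predictor,
# and censor the observations with the largest values of that predictor.
gamma     <- rnorm(ncol(X), mean = 0, sd = 0.1)
lp        <- as.vector(X %*% gamma)
cens_cond <- rank(-lp, ties.method = "first") <= n_cens

c(random = sum(cens_random), conditional = sum(cens_cond))
```

Both schemes censor the same share of observations; they differ only in whether the censored set is related to the covariates.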

Returns an object of class "simSurvdata", which is a list of length num.data.frames with one element for each iteration of data simulation. Each element of this list is itself a list with the following components:

`data`	The simulated data frame, including the simulated durations, the censoring variable, and covariates
`xdata`	The simulated data frame, containing only covariates
`baseline`	A data frame containing every potential failure time and the baseline failure PDF, baseline failure CDF, baseline survivor function, and baseline hazard function at each time point.
`xb`	The linear predictor for each observation
`exp.xb`	The exponentiated linear predictor for each observation
`betas`	The coefficients, varying over time if `type` is "tvbeta"
`ind.survive`	An (`N` x `T`) matrix containing the individual survivor function at time t for the individual represented by row n
`marg.effect`	The simulated marginal change in expected duration comparing the high and low values of the variable specified with `covariate`
`marg.effect.data`	The `X` matrix and vector of durations for the low and high conditions
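
As a brief illustration of how these components might be inspected, the sketch below simulates two small data frames and looks at the first one. It assumes the coxed package is installed (the guard skips the example otherwise); the object names `sims` and `first_sim` are hypothetical, and the smaller `N` is chosen only to keep the run short.

```r
# Runs only if the coxed package is available.
if (requireNamespace("coxed", quietly = TRUE)) {
  sims <- coxed::sim.survdata(N = 200, T = 100, num.data.frames = 2)

  first_sim <- sims[[1]]       # components for the first simulated data frame
  print(names(first_sim))      # lists the components described above
  print(head(first_sim$data))  # durations, censoring indicator, covariates
  print(first_sim$betas)       # "true" coefficients
  print(first_sim$marg.effect) # simulated marginal effect for covariate 1
}
```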

Jonathan Kropko <jkropko@virginia.edu> and Jeffrey J. Harden <jharden2@nd.edu>

Harden, J. J. and Kropko, J. (2018). Simulating Duration Data for the Cox Model. Political Science Research and Methods https://doi.org/10.1017/psrm.2018.19

Sylvestre M.-P., Abrahamowicz M. (2008) Comparison of algorithms to generate event times conditional on time-dependent covariates. Statistics in Medicine 27(14):2618–34.

simdata <- sim.survdata(N=1000, T=100, num.data.frames=2)
require(survival)
data <- simdata[[1]]$data
model <- coxph(Surv(y, failed) ~ X1 + X2 + X3, data=data)
model$coefficients ## model-estimated coefficients
simdata[[1]]$betas ## "true" coefficients

## User-specified baseline hazard
my.hazard <- function(t){ # log-normal hazard with meanlog = log(50), sdlog = log(10) (median of 50)
  dnorm((log(t) - log(50))/log(10)) /
    (log(10)*t*(1 - pnorm((log(t) - log(50))/log(10))))
}
simdata <- sim.survdata(N=1000, T=100, hazard.fun = my.hazard)

## A simulated data set with time-varying covariates
## Not run: 
simdata <- sim.survdata(N=1000, T=100, type="tvc", xvars=5, num.data.frames=1)
summary(simdata$data)
model <- coxph(Surv(start, end, failed) ~ X1 + X2 + X3 + X4 + X5, data=simdata$data)
model$coefficients ## model-estimated coefficients
simdata$betas ## "true" coefficients

## End(Not run)

## A simulated data set with time-varying coefficients
simdata <- sim.survdata(N=1000, T=100, type="tvbeta", num.data.frames = 1)
simdata$betas

coxed documentation built on Aug. 2, 2020, 9:07 a.m.
