idmstan: Bayesian multi-state models via Stan
In csetraynor/rms: Multi-State models for survival data in R and Stan

Description Usage Arguments Details Examples

Bayesian inference for multi-state models (sometimes known as models for competing risk data). Currently, the command fits standard parametric (exponential, Weibull and Gompertz) and flexible parametric (cubic spline-based) hazard models on the hazard scale, with covariates included under assumptions of either proportional or non-proportional hazards. Where relevant, non-proportional hazards are modelled using a flexible cubic spline-based function for the time-dependent effect (i.e. the time-dependent hazard ratio).

idmstan(formula01, formula02, formula12, data, h = 3, basehaz01 = "ms",
  basehaz02 = "ms", basehaz12 = "ms", basehaz_ops01, basehaz_ops02,
  basehaz_ops12, prior01 = rstanarm::normal(),
  prior_intercept01 = rstanarm::normal(),
  prior_aux01 = rstanarm::normal(), prior02 = rstanarm::normal(),
  prior_intercept02 = rstanarm::normal(),
  prior_aux02 = rstanarm::normal(), prior12 = rstanarm::normal(),
  prior_intercept12 = rstanarm::normal(),
  prior_aux12 = rstanarm::normal(), prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = 0.95, ...)

`data`	A data frame containing the variables specified in `formula`.
`formula`	A two-sided formula object describing the model. The left hand side of the formula should be a `Surv()` object. Left censored, right censored, and interval censored data are allowed, as well as delayed entry (i.e. left truncation). See `Surv` for how to specify these outcome types. If you wish to include time-dependent effects (i.e. time-dependent coefficients, also known as non-proportional hazards) in the model then any covariate(s) that you wish to estimate a time-dependent coefficient for should be specified as `tde(varname)` where `varname` is the name of the covariate. See the Details section for more information on how the time-dependent effects are formulated, as well as the Examples section.
`basehaz`	A character string indicating which baseline hazard to use for the event submodel. Current options are: `"ms"`: a flexible parametric model using cubic M-splines to model the baseline hazard. The default locations for the internal knots, as well as the basis terms for the splines, are calculated with respect to time. If the model does not include any time-dependendent effects then a closed form solution is available for both the hazard and cumulative hazard and so this approach should be relatively fast. On the other hand, if the model does include time-dependent effects then quadrature is used to evaluate the cumulative hazard at each MCMC iteration and, therefore, estimation of the model will be slower. `"bs"`: a flexible parametric model using cubic B-splines to model the log baseline hazard. The default locations for the internal knots, as well as the basis terms for the splines, are calculated with respect to time. A closed form solution for the cumulative hazard is not available regardless of whether or not the model includes time-dependent effects; instead, quadrature is used to evaluate the cumulative hazard at each MCMC iteration. Therefore, if your model does not include any time-dependent effects, then estimation using the `"ms"` baseline hazard will be faster. `"exp"`: an exponential distribution for the event times. (i.e. a constant baseline hazard) `"weibull"`: a Weibull distribution for the event times. `"gompertz"`: a Gompertz distribution for the event times.
`basehaz_ops`	A named list specifying options related to the baseline hazard. Currently this can include: `df`: a positive integer specifying the degrees of freedom for the M-splines or B-splines. An intercept is included in the spline basis and included in the count of the degrees of freedom, such that two boundary knots and `df - 4` internal knots are used to generate the cubic spline basis. The default is `df = 6`; that is, two boundary knots and two internal knots. `knots`: An optional numeric vector specifying internal knot locations for the M-splines or B-splines. Note that `knots` cannot be specified if `df` is specified. If `knots` are not specified, then `df - 4` internal knots are placed at equally spaced percentiles of the distribution of uncensored event times. Note that for the M-splines and B-splines - in addition to any internal `knots` - a lower boundary knot is placed at the earliest entry time and an upper boundary knot is placed at the latest event or censoring time. These boundary knot locations are the default and cannot be changed by the user.
`qnodes`	The number of nodes to use for the Gauss-Kronrod quadrature that is used to evaluate the cumulative hazard when `basehaz = "bs"` or when time-dependent effects (i.e. non-proportional hazards) are specified. Options are 15 (the default), 11 or 7.
`prior_intercept`	The prior distribution for the intercept. Note that there will only be an intercept parameter when `basehaz` is set equal to one of the standard parametric distributions, i.e. `"exp"`, `"weibull"` or `"gompertz"`, in which case the intercept corresponds to the parameter log(lambda) as defined in the stan_surv: Survival (Time-to-Event) Models vignette. For the cubic spline-based baseline hazards there is no intercept parameter since it is absorbed into the spline basis and, therefore, the prior for the intercept is effectively specified as part of `prior_aux`. Where relevant, `prior_intercept` can be a call to `normal`, `student_t` or `cauchy`. See the priors help page for details on these functions. Note however that default scale for `prior_intercept` is 20 for `stan_surv` models (rather than 10, which is the default scale used for `prior_intercept` by most rstanarm modelling functions). To omit a prior on the intercept —i.e., to use a flat (improper) uniform prior— `prior_intercept` can be set to `NULL`.
`prior_aux`	The prior distribution for "auxiliary" parameters related to the baseline hazard. The relevant parameters differ depending on the type of baseline hazard specified in the `basehaz` argument. The following applies: `basehaz = "ms"`: the auxiliary parameters are the coefficients for the M-spline basis terms on the baseline hazard. These parameters have a lower bound at zero. `basehaz = "bs"`: the auxiliary parameters are the coefficients for the B-spline basis terms on the log baseline hazard. These parameters are unbounded. `basehaz = "exp"`: there is no auxiliary parameter, since the log scale parameter for the exponential distribution is incorporated as an intercept in the linear predictor. `basehaz = "weibull"`: the auxiliary parameter is the Weibull shape parameter, while the log scale parameter for the Weibull distribution is incorporated as an intercept in the linear predictor. The auxiliary parameter has a lower bound at zero. `basehaz = "gompertz"`: the auxiliary parameter is the Gompertz scale parameter, while the log shape parameter for the Gompertz distribution is incorporated as an intercept in the linear predictor. The auxiliary parameter has a lower bound at zero. Currently, `prior_aux` can be a call to `normal`, `student_t` or `cauchy`. See `priors` for details on these functions. To omit a prior —i.e., to use a flat (improper) uniform prior— set `prior_aux` to `NULL`.
`prior_smooth`	This is only relevant when time-dependent effects are specified in the model (i.e. the `tde()` function is used in the model formula. When that is the case, `prior_smooth` determines the prior distribution given to the hyperparameter (standard deviation) contained in a random-walk prior for the cubic B-spline coefficients used to model the time-dependent coefficient. Lower values for the hyperparameter yield a less a flexible smooth function for the time-dependent coefficient. Specifically, `prior_smooth` can be a call to `exponential` to use an exponential distribution, or `normal`, `student_t` or `cauchy`, which results in a half-normal, half-t, or half-Cauchy prior. See `priors` for details on these functions. To omit a prior —i.e., to use a flat (improper) uniform prior— set `prior_smooth` to `NULL`. The number of hyperparameters depends on the model specification (i.e. the number of time-dependent effects specified in the model) but a scalar prior will be recylced as necessary to the appropriate length.

Time dependent effects (i.e. non-proportional hazards)

By default, any covariate effects specified in the formula are included in the model under a proportional hazards assumption. To relax this assumption, it is possible to estimate a time-dependent coefficient for a given covariate. This can be specified in the model formula by wrapping the covariate name in the tde() function (note that this function is not an exported function, rather it is an internal function that can only be evaluated within the formula of a stan_surv call).

For example, if we wish to estimate a time-dependent effect for the covariate sex then we can specify tde(sex) in the formula, e.g. Surv(time, status) ~ tde(sex) + age + trt. The coefficient for sex will then be modelled using a flexible smooth function based on a cubic B-spline expansion of time.

The flexibility of the smooth function can be controlled in two ways:

First, through control of the prior distribution for the cubic B-spline coefficients that are used to model the time-dependent coefficient. Specifically, one can control the flexibility of the prior through the hyperparameter (standard deviation) of the random walk prior used for the B-spline coefficients; see the prior_smooth argument.
Second, one can increase or decrease the number of degrees of freedom used for the cubic B-spline function that is used to model the time-dependent coefficient. By default the cubic B-spline basis is evaluated using 3 degrees of freedom (that is a cubic spline basis with boundary knots at the limits of the time range, but no internal knots). If you wish to increase the flexibility of the smooth function by using a greater number of degrees of freedom, then you can specify this as part of the tde function call in the model formula. For example, to use cubic B-splines with 7 degrees of freedom we could specify tde(sex, df = 7) in the model formula instead of just tde(sex). See the Examples section below for more details.

In practice, the default tde() function should provide sufficient flexibility for model most time-dependent effects. However, it is worth noting that the reliable estimation of a time-dependent effect usually requires a relatively large number of events in the data (e.g. >1000).

#---------- Proportional hazards

# Simulated data
library("SemiCompRisks")
library("scales")


# Data generation parameters
n <- 1500
beta1.true <- c(0.1, 0.1)
beta2.true <- c(0.2, 0.2)
beta3.true <- c(0.3, 0.3)
alpha1.true <- 0.12
alpha2.true <- 0.23
alpha3.true <- 0.34
kappa1.true <- 0.33
kappa2.true <- 0.11
kappa3.true <- 0.22
theta.true <- 0

# Make design matrix with single binary covariate
x_c <- rbinom(n, size = 1, prob = 0.7)
x_m <- cbind(1, x_c)

# Generate semicompeting data
dat_ID <- SemiCompRisks::simID(x1 = x_m, x2 = x_m, x3 = x_m,
                               beta1.true = beta1.true, 
                               beta2.true = beta2.true, 
                               beta3.true = beta3.true,
                               alpha1.true = alpha1.true, 
                               alpha2.true = alpha2.true, 
                               alpha3.true = alpha3.true,
                               kappa1.true = kappa1.true, 
                               kappa2.true = kappa2.true, 
                               kappa3.true = kappa3.true,
                               theta.true = theta.true,
                               cens = c(240, 360))

dat_ID$x_c <- x_c
colnames(dat_ID)[1:4] <- c("R", "delta_R", "tt", "delta_T")

              
              
formula01 <- Surv(time=rr,event=delta_R)~x_c
formula02 <- Surv(time=tt,event=delta_T)~x_c
formula12 <- Surv(time=difft,event=delta_T)~x_c