idmstan: Bayesian multi-state models via Stan

Description Usage Arguments Details Examples

Description

http://mc-stan.org/about/logo/ Bayesian inference for multi-state models (sometimes known as models for competing risk data). Currently, the command fits standard parametric (exponential, Weibull and Gompertz) and flexible parametric (cubic spline-based) hazard models on the hazard scale, with covariates included under assumptions of either proportional or non-proportional hazards. Where relevant, non-proportional hazards are modelled using a flexible cubic spline-based function for the time-dependent effect (i.e. the time-dependent hazard ratio).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
idmstan(formula01, formula02, formula12, data, h = 3, basehaz01 = "ms",
  basehaz02 = "ms", basehaz12 = "ms", basehaz_ops01, basehaz_ops02,
  basehaz_ops12, prior01 = rstanarm::normal(),
  prior_intercept01 = rstanarm::normal(),
  prior_aux01 = rstanarm::normal(), prior02 = rstanarm::normal(),
  prior_intercept02 = rstanarm::normal(),
  prior_aux02 = rstanarm::normal(), prior12 = rstanarm::normal(),
  prior_intercept12 = rstanarm::normal(),
  prior_aux12 = rstanarm::normal(), prior_PD = FALSE,
  algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = 0.95, ...)

Arguments

data

A data frame containing the variables specified in formula.

formula

A two-sided formula object describing the model. The left hand side of the formula should be a Surv() object. Left censored, right censored, and interval censored data are allowed, as well as delayed entry (i.e. left truncation). See Surv for how to specify these outcome types. If you wish to include time-dependent effects (i.e. time-dependent coefficients, also known as non-proportional hazards) in the model then any covariate(s) that you wish to estimate a time-dependent coefficient for should be specified as tde(varname) where varname is the name of the covariate. See the Details section for more information on how the time-dependent effects are formulated, as well as the Examples section.

basehaz

A character string indicating which baseline hazard to use for the event submodel. Current options are:

  • "ms": a flexible parametric model using cubic M-splines to model the baseline hazard. The default locations for the internal knots, as well as the basis terms for the splines, are calculated with respect to time. If the model does not include any time-dependendent effects then a closed form solution is available for both the hazard and cumulative hazard and so this approach should be relatively fast. On the other hand, if the model does include time-dependent effects then quadrature is used to evaluate the cumulative hazard at each MCMC iteration and, therefore, estimation of the model will be slower.

  • "bs": a flexible parametric model using cubic B-splines to model the log baseline hazard. The default locations for the internal knots, as well as the basis terms for the splines, are calculated with respect to time. A closed form solution for the cumulative hazard is not available regardless of whether or not the model includes time-dependent effects; instead, quadrature is used to evaluate the cumulative hazard at each MCMC iteration. Therefore, if your model does not include any time-dependent effects, then estimation using the "ms" baseline hazard will be faster.

  • "exp": an exponential distribution for the event times. (i.e. a constant baseline hazard)

  • "weibull": a Weibull distribution for the event times.

  • "gompertz": a Gompertz distribution for the event times.

basehaz_ops

A named list specifying options related to the baseline hazard. Currently this can include:

  • df: a positive integer specifying the degrees of freedom for the M-splines or B-splines. An intercept is included in the spline basis and included in the count of the degrees of freedom, such that two boundary knots and df - 4 internal knots are used to generate the cubic spline basis. The default is df = 6; that is, two boundary knots and two internal knots.

  • knots: An optional numeric vector specifying internal knot locations for the M-splines or B-splines. Note that knots cannot be specified if df is specified. If knots are not specified, then df - 4 internal knots are placed at equally spaced percentiles of the distribution of uncensored event times.

Note that for the M-splines and B-splines - in addition to any internal knots - a lower boundary knot is placed at the earliest entry time and an upper boundary knot is placed at the latest event or censoring time. These boundary knot locations are the default and cannot be changed by the user.

qnodes

The number of nodes to use for the Gauss-Kronrod quadrature that is used to evaluate the cumulative hazard when basehaz = "bs" or when time-dependent effects (i.e. non-proportional hazards) are specified. Options are 15 (the default), 11 or 7.

prior_intercept

The prior distribution for the intercept. Note that there will only be an intercept parameter when basehaz is set equal to one of the standard parametric distributions, i.e. "exp", "weibull" or "gompertz", in which case the intercept corresponds to the parameter log(lambda) as defined in the stan_surv: Survival (Time-to-Event) Models vignette. For the cubic spline-based baseline hazards there is no intercept parameter since it is absorbed into the spline basis and, therefore, the prior for the intercept is effectively specified as part of prior_aux.

Where relevant, prior_intercept can be a call to normal, student_t or cauchy. See the priors help page for details on these functions. Note however that default scale for prior_intercept is 20 for stan_surv models (rather than 10, which is the default scale used for prior_intercept by most rstanarm modelling functions). To omit a prior on the intercept —i.e., to use a flat (improper) uniform prior— prior_intercept can be set to NULL.

prior_aux

The prior distribution for "auxiliary" parameters related to the baseline hazard. The relevant parameters differ depending on the type of baseline hazard specified in the basehaz argument. The following applies:

  • basehaz = "ms": the auxiliary parameters are the coefficients for the M-spline basis terms on the baseline hazard. These parameters have a lower bound at zero.

  • basehaz = "bs": the auxiliary parameters are the coefficients for the B-spline basis terms on the log baseline hazard. These parameters are unbounded.

  • basehaz = "exp": there is no auxiliary parameter, since the log scale parameter for the exponential distribution is incorporated as an intercept in the linear predictor.

  • basehaz = "weibull": the auxiliary parameter is the Weibull shape parameter, while the log scale parameter for the Weibull distribution is incorporated as an intercept in the linear predictor. The auxiliary parameter has a lower bound at zero.

  • basehaz = "gompertz": the auxiliary parameter is the Gompertz scale parameter, while the log shape parameter for the Gompertz distribution is incorporated as an intercept in the linear predictor. The auxiliary parameter has a lower bound at zero.

Currently, prior_aux can be a call to normal, student_t or cauchy. See priors for details on these functions. To omit a prior —i.e., to use a flat (improper) uniform prior— set prior_aux to NULL.

prior_smooth

This is only relevant when time-dependent effects are specified in the model (i.e. the tde() function is used in the model formula. When that is the case, prior_smooth determines the prior distribution given to the hyperparameter (standard deviation) contained in a random-walk prior for the cubic B-spline coefficients used to model the time-dependent coefficient. Lower values for the hyperparameter yield a less a flexible smooth function for the time-dependent coefficient. Specifically, prior_smooth can be a call to exponential to use an exponential distribution, or normal, student_t or cauchy, which results in a half-normal, half-t, or half-Cauchy prior. See priors for details on these functions. To omit a prior —i.e., to use a flat (improper) uniform prior— set prior_smooth to NULL. The number of hyperparameters depends on the model specification (i.e. the number of time-dependent effects specified in the model) but a scalar prior will be recylced as necessary to the appropriate length.

Details

Time dependent effects (i.e. non-proportional hazards)

By default, any covariate effects specified in the formula are included in the model under a proportional hazards assumption. To relax this assumption, it is possible to estimate a time-dependent coefficient for a given covariate. This can be specified in the model formula by wrapping the covariate name in the tde() function (note that this function is not an exported function, rather it is an internal function that can only be evaluated within the formula of a stan_surv call).

For example, if we wish to estimate a time-dependent effect for the covariate sex then we can specify tde(sex) in the formula, e.g. Surv(time, status) ~ tde(sex) + age + trt. The coefficient for sex will then be modelled using a flexible smooth function based on a cubic B-spline expansion of time.

The flexibility of the smooth function can be controlled in two ways:

In practice, the default tde() function should provide sufficient flexibility for model most time-dependent effects. However, it is worth noting that the reliable estimation of a time-dependent effect usually requires a relatively large number of events in the data (e.g. >1000).

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#---------- Proportional hazards

# Simulated data
library("SemiCompRisks")
library("scales")


# Data generation parameters
n <- 1500
beta1.true <- c(0.1, 0.1)
beta2.true <- c(0.2, 0.2)
beta3.true <- c(0.3, 0.3)
alpha1.true <- 0.12
alpha2.true <- 0.23
alpha3.true <- 0.34
kappa1.true <- 0.33
kappa2.true <- 0.11
kappa3.true <- 0.22
theta.true <- 0

# Make design matrix with single binary covariate
x_c <- rbinom(n, size = 1, prob = 0.7)
x_m <- cbind(1, x_c)

# Generate semicompeting data
dat_ID <- SemiCompRisks::simID(x1 = x_m, x2 = x_m, x3 = x_m,
                               beta1.true = beta1.true, 
                               beta2.true = beta2.true, 
                               beta3.true = beta3.true,
                               alpha1.true = alpha1.true, 
                               alpha2.true = alpha2.true, 
                               alpha3.true = alpha3.true,
                               kappa1.true = kappa1.true, 
                               kappa2.true = kappa2.true, 
                               kappa3.true = kappa3.true,
                               theta.true = theta.true,
                               cens = c(240, 360))

dat_ID$x_c <- x_c
colnames(dat_ID)[1:4] <- c("R", "delta_R", "tt", "delta_T")

              
              
formula01 <- Surv(time=rr,event=delta_R)~x_c
formula02 <- Surv(time=tt,event=delta_T)~x_c
formula12 <- Surv(time=difft,event=delta_T)~x_c       

csetraynor/rms documentation built on May 9, 2019, 10:40 a.m.