BayesID_AFT: The function to implement Bayesian parametric and...
In SemiCompRisks: Hierarchical Models for Parametric and Semi-Parametric Analyses of Semi-Competing Risks Data

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.

1 2	BayesID_AFT(Formula, data, model = "LN", hyperParams, startValues, mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)

`Formula`	a `Formula` object, with the outcomes on the left of a \sim, and covariates on the right. It is of the form, left truncation time \| interval- (or right-) censored time for non-terminal event \| interval-(or right-) censored time for terminal event \sim covariates for h_1 \| covariates for h_2 \| covariates for h_3: i.e., L \| y_{1L}+y_{1U} \| y_{2L}+y_{2U} ~ x_1 \| x_2 \| x_3.
`data`	a data.frame in which to interpret the variables named in `Formula`.
`model`	The specification of baseline survival distribution: "LN" or "DPM".
`hyperParams`	a list containing lists or vectors for hyperparameter values in hierarchical models. Components include, `theta` (a numeric vector for hyperparameter in the prior of subject-specific frailty variance component), `LN` (a list containing numeric vectors for log-Normal hyperparameters: `LN.ab1`, `LN.ab2`, `LN.ab3`), `DPM` (a list containing numeric vectors for DPM hyperparameters: `DPM.mu1`, `DPM.mu2`, `DPM.mu3`, `DPM.sigSq1`, `DPM.sigSq2`, `DPM.sigSq3`, `DPM.ab1`, `DPM.ab2`, `DPM.ab3`, `Tau.ab1`, `Tau.ab2`, `Tau.ab3`). See Details and Examples below.
`startValues`	a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function `initiate.startValues_AFT`.
`mcmcParams`	a list containing variables required for MCMC sampling. Components include, `run` (a list containing numeric values for setting for the overall run: `numReps`, total number of scans; `thin`, extent of thinning; `burninPerc`, the proportion of burn-in). `storage` (a list containing numeric values for storing posterior samples for subject- and cluster-specific random effects: `nGam_save`, the number of γ to be stored; `nY1_save`, the number of y1 to be stored; `nY2_save`, the number of y2 to be stored; `nY1.NA_save`, the number of y1.NA to be stored). `tuning` (a list containing numeric values relevant to tuning parameters for specific updates in Metropolis-Hastings (MH) algorithm: `betag.prop.var`, the variance of proposal density for β_g; `mug.prop.var`, the variance of proposal density for μ_{g}; `zetag.prop.var`, the variance of proposal density for 1/σ_g^2; `gamma.prop.var`, the variance of proposal density for γ). See Details and Examples below.
`na.action`	how NAs are treated. See `model.frame`.
`subset`	a specification of the rows to be used: defaults to all rows. See `model.frame`.
`path`	the name of directory where the results are saved.

We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}, T_{i2} denote time to non-terminal and terminal event from subject i=1,...,n. We propose to directly model the times of the events via the following AFT model specification:

\log(T_{i1}) = x_{i1}^\topβ_1 + γ_i + ε_{i1}, T_{i1} > 0,

\log(T_{i2}) = x_{i2}^\topβ_2 + γ_i + ε_{i2}, T_{i2} > 0,

\log(T_{i2} - T_{i1}) = x_{i3}^\topβ_3 + γ_i + ε_{i3}, T_{i2} > T_{i1},

where x_{ig} is a vector of transition-specific covariates, β_g is a corresponding vector of transition-specific regression parameters and ε_{ig} is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}. γ_i is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let L_{i} denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i was observed at follow-up times \{c_{i1},…, c_{im_i}\} and let c_i^* denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th} study participant satisfy c_{ij}≤q T_{i1}< c_{ij+1} for some j and c_{ik}≤q T_{i2}< c_{ik+1} for some k, respectively. Then the observed outcomes for the i^{th} study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}.

For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each ε_{ig}. More precisely, ε_{ig} is taken to be an independent draw from a mixture of M_g normal distributions with means and variances (μ_{gr}, σ_{gr}^2), for r \in \{1,…,M_g\}. Since the class-specific (μ_{gr}, σ_{gr}^2) are not known, they are taken to be draws from some common distribution, G_{g0}, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr} denote the probability of belonging to the r^{th} class for transition g and p_g = (p_{g1}, …, p_{gM_g}) the collection of such probabilities. Note, p_g is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n individuals across the M_g classes, p_g is assumed to follow a conjugate symmetric Dirichlet(τ_g/M_g,…,τ_g/M_g) distribution, where τ_g is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:

ε_{ig} | r_{i} \sim Normal(μ_{r_{i}}, σ_{r_{i}}^2),

(μ_{gr}, σ_{gr}^2) \sim G_{g0}, ~~for~ r=1,…,M_g,

r_{i}| p_g \sim Discrete(r_{i} | p_{g1},…,p_{gM_g}),

p_g \sim Dirichlet(τ_g/M_g, …, τ_g/M_g).

Letting M_g approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{τ_g}, b_{τ_g}) hyperprior for τ_g. For regression parameters, we adopt non-informative flat priors on the real line. For γ=\{γ_1, …, γ_n\}, we assume that each γ_i is an independent random draw from a Normal(0, θ) distribution. In the absence of prior knowledge on the variance component θ, we adopt a conjugate inverse-Gamma hyperprior, IG(a_θ, b_θ). Finally, We take the G_{g0} as a normal distribution centered at μ_{g0} with a variance σ_{g0}^2 for μ_{gr} and an IG(a_{σ_g}, b_{σ_g}) for σ_{gr}^2.

For the Bayesian parametric analysis, we build on the log-Normal formulation and take the ε_{ig} to follow independent Normal(μ_g, σ_g^2) distributions, g=1,2,3. For location parameters \{μ_1, μ_2, μ_3\}, we adopt non-informative flat priors on the real line. For \{σ_1^2, σ_2^2, σ_3^2\}, we adopt independent inverse Gamma distributions, denoted IG(a_{σ g}, b_{σ g}). For β_g, γ, and θ, we adopt the same priors as those adopted for the DPM model.

BayesID_AFT returns an object of class Bayes_AFT.

The posterior samples of γ are saved separately in working directory/path. For a dataset with large n, nGam_save should be carefully specified considering the system memory and the storage capacity.

Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>

Lee, K. H., Rondeau, V., and Haneuse, S. (2017), Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.

Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019), SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.

initiate.startValues_AFT, print.Bayes_AFT, summary.Bayes_AFT, predict.Bayes_AFT

## Not run: 
# loading a data set
data(scrData)
scrData$y1L <- scrData$y1U <- scrData[,1]
scrData$y1U[which(scrData[,2] == 0)] <- Inf
scrData$y2L <- scrData$y2U <- scrData[,3]
scrData$y2U[which(scrData[,4] == 0)] <- Inf
scrData$LT <- rep(0, dim(scrData)[1])

form <- Formula(LT | y1L + y1U | y2L + y2U  ~ x1 + x2 + x3 | x1 + x2 | x1 + x2)

#####################
## Hyperparameters ##
#####################

## Subject-specific random effects variance component
##
theta.ab <- c(0.5, 0.05)

## log-Normal model
##
LN.ab1 <- c(0.3, 0.3)
LN.ab2 <- c(0.3, 0.3)
LN.ab3 <- c(0.3, 0.3)

## DPM model
##
DPM.mu1 <- log(12)
DPM.mu2 <- log(12)
DPM.mu3 <- log(12)

DPM.sigSq1 <- 100
DPM.sigSq2 <- 100
DPM.sigSq3 <- 100

DPM.ab1 <-  c(2, 1)
DPM.ab2 <-  c(2, 1)
DPM.ab3 <-  c(2, 1)

Tau.ab1 <- c(1.5, 0.0125)
Tau.ab2 <- c(1.5, 0.0125)
Tau.ab3 <- c(1.5, 0.0125)

##
hyperParams <- list(theta=theta.ab,
LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3),
DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1,
DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2,
DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3))

###################
## MCMC SETTINGS ##
###################

## Setting for the overall run
##
numReps    <- 300
thin       <- 3
burninPerc <- 0.5

## Setting for storage
##
nGam_save <- 10
nY1_save <- 10
nY2_save <- 10
nY1.NA_save <- 10

## Tuning parameters for specific updates
##
##  - those common to all models
betag.prop.var	<- c(0.01,0.01,0.01)
mug.prop.var	<- c(0.1,0.1,0.1)
zetag.prop.var	<- c(0.1,0.1,0.1)
gamma.prop.var	<- 0.01

##
mcmcParams	<- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save),
tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var,
zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var))

#################################################################
## Analysis of Independent Semi-competing risks data ############
#################################################################

###############
## logNormal ##
###############

##
myModel <- "LN"
myPath  <- "Output/01-Results-LN/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_LN
summ.fit_LN <- summary(fit_LN); names(summ.fit_LN)
summ.fit_LN
pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_LN, plot.est="Haz")
plot(pred_LN, plot.est="Surv")

#########
## DPM ##
#########

##
myModel <- "DPM"
myPath  <- "Output/02-Results-DPM/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_DPM
summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM)
summ.fit_DPM
pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_DPM, plot.est="Haz")
plot(pred_DPM, plot.est="Surv")

## End(Not run)