# BayesID_AFT: The function to implement Bayesian parametric and... In SemiCompRisks: Hierarchical Models for Parametric and Semi-Parametric Analyses of Semi-Competing Risks Data

## Description

Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.

## Usage

 1 2 BayesID_AFT(Y, lin.pred, data, model = "LN", hyperParams, startValues, mcmcParams, path=NULL) 

## Arguments

 Y a data.frame containing (interval-censored and/or left-truncated) semi-competing risks outcomes from n subjects. It is of dimension n\times 5: the columns correspond to c_{j}, c_{j+1}, c_{k}, c_{k+1}, L. See Details and Examples below. lin.pred a list containing three formula objects whose right-hand side specifies the covariate terms for the transition g=1,2,3. data a data.frame in which to interpret the variables named in the formulas in lin.pred. model The specification of baseline survival distribution: "LN" or "DPM". hyperParams a list containing lists or vectors for hyperparameter values in hierarchical models. Components include, theta (a numeric vector for hyperparameter in the prior of subject-specific frailty variance component), LN (a list containing numeric vectors for log-Normal hyperparameters: LN.ab1, LN.ab2, LN.ab3), DPM (a list containing numeric vectors for DPM hyperparameters: DPM.mu1, DPM.mu2, DPM.mu3, DPM.sigSq1, DPM.sigSq2, DPM.sigSq3, DPM.ab1, DPM.ab2, DPM.ab3, Tau.ab1, Tau.ab2, Tau.ab3). See Details and Examples below. startValues a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function initiate.startValues_AFT. mcmcParams a list containing variables required for MCMC sampling. Components include, run (a list containing numeric values for setting for the overall run: numReps, total number of scans; thin, extent of thinning; burninPerc, the proportion of burn-in). storage (a list containing numeric values for storing posterior samples for subject- and cluster-specific random effects: nGam_save, the number of γ to be stored; nY1_save, the number of y1 to be stored; nY2_save, the number of y2 to be stored; nY1.NA_save, the number of y1.NA to be stored). tuning (a list containing numeric values relevant to tuning parameters for specific updates in Metropolis-Hastings (MH) algorithm: betag.prop.var, the variance of proposal density for β_g; mug.prop.var, the variance of proposal density for μ_{g}; zetag.prop.var, the variance of proposal density for 1/σ_g^2; gamma.prop.var, the variance of proposal density for γ). See Details and Examples below. path the name of directory where the results are saved.

## Details

We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}, T_{i2} denote time to non-terminal and terminal event from subject i=1,...,n. We propose to directly model the times of the events via the following AFT model specification:

\log(T_{i1}) = x_{i1}^\topβ_1 + γ_i + ε_{i1}, T_{i1} > 0,

\log(T_{i2}) = x_{i2}^\topβ_2 + γ_i + ε_{i2}, T_{i2} > 0,

\log(T_{i2} - T_{i1}) = x_{i3}^\topβ_3 + γ_i + ε_{i3}, T_{i2} > T_{i1},

where x_{ig} is a vector of transition-specific covariates, β_g is a corresponding vector of transition-specific regression parameters and ε_{ig} is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}. γ_i is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let L_{i} denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i was observed at follow-up times \{c_{i1},…, c_{im_i}\} and let c_i^* denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th} study participant satisfy c_{ij}≤q T_{i1}< c_{ij+1} for some j and c_{ik}≤q T_{i2}< c_{ik+1} for some k, respectively. Then the observed outcomes for the i^{th} study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}.

For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each ε_{ig}. More precisely, ε_{ig} is taken to be an independent draw from a mixture of M_g normal distributions with means and variances (μ_{gr}, σ_{gr}^2), for r \in \{1,…,M_g\}. Since the class-specific (μ_{gr}, σ_{gr}^2) are not known, they are taken to be draws from some common distribution, G_{g0}, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr} denote the probability of belonging to the r^{th} class for transition g and p_g = (p_{g1}, …, p_{gM_g}) the collection of such probabilities. Note, p_g is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n individuals across the M_g classes, p_g is assumed to follow a conjugate symmetric Dirichlet(τ_g/M_g,…,τ_g/M_g) distribution, where τ_g is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:

ε_{ig} | r_{i} \sim Normal(μ_{r_{i}}, σ_{r_{i}}^2),

(μ_{gr}, σ_{gr}^2) \sim G_{g0}, ~~for~ r=1,…,M_g,

r_{i}| p_g \sim Discrete(r_{i} | p_{g1},…,p_{gM_g}),

p_g \sim Dirichlet(τ_g/M_g, …, τ_g/M_g).

Letting M_g approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{τ_g}, b_{τ_g}) hyperprior for τ_g. For regression parameters, we adopt non-informative flat priors on the real line. For γ=\{γ_1, …, γ_n\}, we assume that each γ_i is an independent random draw from a Normal(0, θ) distribution. In the absence of prior knowledge on the variance component θ, we adopt a conjugate inverse-Gamma hyperprior, IG(a_θ, b_θ). Finally, We take the G_{g0} as a normal distribution centered at μ_{g0} with a variance σ_{g0}^2 for μ_{gr} and an IG(a_{σ_g}, b_{σ_g}) for σ_{gr}^2.

For the Bayesian parametric analysis, we build on the log-Normal formulation and take the ε_{ig} to follow independent Normal(μ_g, σ_g^2) distributions, g=1,2,3. For location parameters \{μ_1, μ_2, μ_3\}, we adopt non-informative flat priors on the real line. For \{σ_1^2, σ_2^2, σ_3^2\}, we adopt independent inverse Gamma distributions, denoted IG(a_{σ g}, b_{σ g}). For β_g, γ, and θ, we adopt the same priors as those adopted for the DPM model.

## Value

BayesID_AFT returns an object of class Bayes_AFT.

## Note

The posterior samples of γ are saved separately in working directory/path. For a dataset with large n, nGam_save should be carefully specified considering the system memory and the storage capacity.

## Author(s)

Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <[email protected]>

## References

Lee, K. H., Rondeau, V., and Haneuse, S. (2017), Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.

initiate.startValues_AFT, print.Bayes_AFT, summary.Bayes_AFT, plot.Bayes_AFT
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 ## Not run: # loading a data set data(scrData) Y <- matrix(NA, dim(scrData)[1], 5) Y[,1] <- Y[,2] <- scrData[,1] Y[,3] <- Y[,4] <- scrData[,3] Y[which(scrData[,2] == 0),2] <- Inf Y[which(scrData[,4] == 0),4] <- Inf Y[,5] <- rep(0, dim(scrData)[1]) form1 <- as.formula( ~ x1 + x2 + x3) form2 <- as.formula( ~ x1 + x2) form3 <- as.formula( ~ x1 + x2) lin.pred <- list(form1, form2, form3) ##################### ## Hyperparameters ## ##################### ## Subject-specific random effects variance component ## theta.ab <- c(0.5, 0.05) ## log-Normal model ## LN.ab1 <- c(0.3, 0.3) LN.ab2 <- c(0.3, 0.3) LN.ab3 <- c(0.3, 0.3) ## DPM model ## DPM.mu1 <- log(12) DPM.mu2 <- log(12) DPM.mu3 <- log(12) DPM.sigSq1 <- 100 DPM.sigSq2 <- 100 DPM.sigSq3 <- 100 DPM.ab1 <- c(2, 1) DPM.ab2 <- c(2, 1) DPM.ab3 <- c(2, 1) Tau.ab1 <- c(1.5, 0.0125) Tau.ab2 <- c(1.5, 0.0125) Tau.ab3 <- c(1.5, 0.0125) ## hyperParams <- list(theta=theta.ab, LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3), DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1, DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2, DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3)) ################### ## MCMC SETTINGS ## ################### ## Setting for the overall run ## numReps <- 300 thin <- 3 burninPerc <- 0.5 ## Setting for storage ## nGam_save <- 10 nY1_save <- 10 nY2_save <- 10 nY1.NA_save <- 10 ## Tuning parameters for specific updates ## ## - those common to all models betag.prop.var <- c(0.01,0.01,0.01) mug.prop.var <- c(0.1,0.1,0.1) zetag.prop.var <- c(0.1,0.1,0.1) gamma.prop.var <- 0.01 ## mcmcParams <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc), storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save), tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var, zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var)) ################################################################# ## Analysis of Independent Semi-competing risks data ############ ################################################################# ############### ## logNormal ## ############### ## myModel <- "LN" myPath <- "Output/01-Results-LN/" startValues <- vector("list", 2) startValues[[1]] <- initiate.startValues_AFT(Y, lin.pred, scrData, model=myModel) startValues[[2]] <- initiate.startValues_AFT(Y, lin.pred, scrData, model=myModel, theta = 0.20) ## fit_LN <- BayesID_AFT(Y, lin.pred, scrData, model=myModel, hyperParams, startValues, mcmcParams, path=myPath) fit_LN summ.fit_LN <- summary(fit_LN); names(summ.fit_LN) summ.fit_LN plot(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5)) plot(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5), plot.est = "BH") names(fit_LN.plot <- plot(fit_LN, time = seq(0, 35, 1), tseq=seq(0, 30, 5), plot=FALSE)) ######### ## DPM ## ######### ## myModel <- "DPM" myPath <- "Output/02-Results-DPM/" startValues <- vector("list", 2) startValues[[1]] <- initiate.startValues_AFT(Y, lin.pred, scrData, model=myModel) startValues[[2]] <- initiate.startValues_AFT(Y, lin.pred, scrData, model=myModel, theta = 0.23) ## fit_DPM <- BayesID_AFT(Y, lin.pred, scrData, model=myModel, hyperParams, startValues, mcmcParams, path=myPath) fit_DPM summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM) summ.fit_DPM plot(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5)) plot(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5), plot.est = "BH") names(fit_DPM.plot <- plot(fit_DPM, time = seq(0, 35, 1), tseq=seq(0, 30, 5), plot=FALSE)) ## End(Not run)