Description Usage Arguments Details Value Note Author(s) References See Also Examples
Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.
1 2 |
Formula |
a |
data |
a data.frame in which to interpret the variables named in |
model |
The specification of baseline survival distribution: "LN" or "DPM". |
hyperParams |
a list containing lists or vectors for hyperparameter values in hierarchical models. Components include,
|
startValues |
a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function |
mcmcParams |
a list containing variables required for MCMC sampling. Components include,
|
na.action |
how NAs are treated. See |
subset |
a specification of the rows to be used: defaults to all rows. See |
path |
the name of directory where the results are saved. |
We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}, T_{i2} denote time to non-terminal and terminal event from subject i=1,...,n. We propose to directly model the times of the events via the following AFT model specification:
\log(T_{i1}) = x_{i1}^\topβ_1 + γ_i + ε_{i1}, T_{i1} > 0,
\log(T_{i2}) = x_{i2}^\topβ_2 + γ_i + ε_{i2}, T_{i2} > 0,
\log(T_{i2} - T_{i1}) = x_{i3}^\topβ_3 + γ_i + ε_{i3}, T_{i2} > T_{i1},
where x_{ig} is a vector of transition-specific covariates, β_g is a corresponding vector of transition-specific regression parameters and ε_{ig} is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}. γ_i is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let L_{i} denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i was observed at follow-up times \{c_{i1},…, c_{im_i}\} and let c_i^* denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th} study participant satisfy c_{ij}≤q T_{i1}< c_{ij+1} for some j and c_{ik}≤q T_{i2}< c_{ik+1} for some k, respectively. Then the observed outcomes for the i^{th} study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}.
For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each ε_{ig}. More precisely, ε_{ig} is taken to be an independent draw from a mixture of M_g normal distributions with means and variances (μ_{gr}, σ_{gr}^2), for r \in \{1,…,M_g\}. Since the class-specific (μ_{gr}, σ_{gr}^2) are not known, they are taken to be draws from some common distribution, G_{g0}, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr} denote the probability of belonging to the r^{th} class for transition g and p_g = (p_{g1}, …, p_{gM_g}) the collection of such probabilities. Note, p_g is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n individuals across the M_g classes, p_g is assumed to follow a conjugate symmetric Dirichlet(τ_g/M_g,…,τ_g/M_g) distribution, where τ_g is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:
ε_{ig} | r_{i} \sim Normal(μ_{r_{i}}, σ_{r_{i}}^2),
(μ_{gr}, σ_{gr}^2) \sim G_{g0}, ~~for~ r=1,…,M_g,
r_{i}| p_g \sim Discrete(r_{i} | p_{g1},…,p_{gM_g}),
p_g \sim Dirichlet(τ_g/M_g, …, τ_g/M_g).
Letting M_g approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{τ_g}, b_{τ_g}) hyperprior for τ_g. For regression parameters, we adopt non-informative flat priors on the real line. For γ=\{γ_1, …, γ_n\}, we assume that each γ_i is an independent random draw from a Normal(0, θ) distribution. In the absence of prior knowledge on the variance component θ, we adopt a conjugate inverse-Gamma hyperprior, IG(a_θ, b_θ). Finally, We take the G_{g0} as a normal distribution centered at μ_{g0} with a variance σ_{g0}^2 for μ_{gr} and an IG(a_{σ_g}, b_{σ_g}) for σ_{gr}^2.
For the Bayesian parametric analysis, we build on the log-Normal formulation and take the ε_{ig} to follow independent Normal(μ_g, σ_g^2) distributions, g=1,2,3. For location parameters \{μ_1, μ_2, μ_3\}, we adopt non-informative flat priors on the real line. For \{σ_1^2, σ_2^2, σ_3^2\}, we adopt independent inverse Gamma distributions, denoted IG(a_{σ g}, b_{σ g}). For β_g, γ, and θ, we adopt the same priors as those adopted for the DPM model.
BayesID_AFT
returns an object of class Bayes_AFT
.
The posterior samples of γ are saved separately in working directory/path
.
For a dataset with large n, nGam_save
should be carefully specified considering the system memory and the storage capacity.
Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>
Lee, K. H., Rondeau, V., and Haneuse, S. (2017),
Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.
Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019),
SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.
initiate.startValues_AFT
, print.Bayes_AFT
, summary.Bayes_AFT
, predict.Bayes_AFT
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | ## Not run:
# loading a data set
data(scrData)
scrData$y1L <- scrData$y1U <- scrData[,1]
scrData$y1U[which(scrData[,2] == 0)] <- Inf
scrData$y2L <- scrData$y2U <- scrData[,3]
scrData$y2U[which(scrData[,4] == 0)] <- Inf
scrData$LT <- rep(0, dim(scrData)[1])
form <- Formula(LT | y1L + y1U | y2L + y2U ~ x1 + x2 + x3 | x1 + x2 | x1 + x2)
#####################
## Hyperparameters ##
#####################
## Subject-specific random effects variance component
##
theta.ab <- c(0.5, 0.05)
## log-Normal model
##
LN.ab1 <- c(0.3, 0.3)
LN.ab2 <- c(0.3, 0.3)
LN.ab3 <- c(0.3, 0.3)
## DPM model
##
DPM.mu1 <- log(12)
DPM.mu2 <- log(12)
DPM.mu3 <- log(12)
DPM.sigSq1 <- 100
DPM.sigSq2 <- 100
DPM.sigSq3 <- 100
DPM.ab1 <- c(2, 1)
DPM.ab2 <- c(2, 1)
DPM.ab3 <- c(2, 1)
Tau.ab1 <- c(1.5, 0.0125)
Tau.ab2 <- c(1.5, 0.0125)
Tau.ab3 <- c(1.5, 0.0125)
##
hyperParams <- list(theta=theta.ab,
LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3),
DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1,
DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2,
DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3))
###################
## MCMC SETTINGS ##
###################
## Setting for the overall run
##
numReps <- 300
thin <- 3
burninPerc <- 0.5
## Setting for storage
##
nGam_save <- 10
nY1_save <- 10
nY2_save <- 10
nY1.NA_save <- 10
## Tuning parameters for specific updates
##
## - those common to all models
betag.prop.var <- c(0.01,0.01,0.01)
mug.prop.var <- c(0.1,0.1,0.1)
zetag.prop.var <- c(0.1,0.1,0.1)
gamma.prop.var <- 0.01
##
mcmcParams <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save),
tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var,
zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var))
#################################################################
## Analysis of Independent Semi-competing risks data ############
#################################################################
###############
## logNormal ##
###############
##
myModel <- "LN"
myPath <- "Output/01-Results-LN/"
startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)
##
fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)
fit_LN
summ.fit_LN <- summary(fit_LN); names(summ.fit_LN)
summ.fit_LN
pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_LN, plot.est="Haz")
plot(pred_LN, plot.est="Surv")
#########
## DPM ##
#########
##
myModel <- "DPM"
myPath <- "Output/02-Results-DPM/"
startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)
##
fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)
fit_DPM
summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM)
summ.fit_DPM
pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_DPM, plot.est="Haz")
plot(pred_DPM, plot.est="Surv")
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.