bayesCureRateModel-package | R Documentation
A fully Bayesian approach to estimating a general family of cure rate models in the presence of covariates; see Papastamoulis and Milienos (2024) <doi:10.1007/s11749-024-00942-w>. The promotion time can be modelled (a) parametrically, using typical distributional assumptions for time-to-event data (including the Weibull, Exponential, Gompertz and log-Logistic distributions), or (b) semiparametrically, using finite mixtures of distributions. In both cases, user-defined families of distributions are allowed under some specific requirements. Posterior inference is carried out by a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators with Metropolis-Hastings steps based on Langevin diffusion dynamics for the parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme that runs additional chains targeting heated (flattened) versions of the posterior distribution.
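The parallel tempering idea can be illustrated with a small, self-contained R sketch. It is not the package's implementation: the toy bimodal log-target `log_target()`, the heat values, and the helpers `rw_step()` and `propose_swap()` are hypothetical, and a plain random-walk Metropolis step stands in for the package's Gibbs and Langevin updates. Chain m targets \pi(\theta)^{h_m}, and a proposed exchange of the states of adjacent chains j and j+1 is accepted with probability \min\{1, [\pi(\theta_{j+1})/\pi(\theta_j)]^{h_j - h_{j+1}}\}.

```r
# Hypothetical toy log-posterior: an equal mixture of two well-separated normals
log_target <- function(theta) {
  log(0.5 * dnorm(theta, -3, 0.5) + 0.5 * dnorm(theta, 3, 0.5))
}

# Heats: chain 1 is the "cold" chain (heat 1), the others are flattened
heats <- c(1, 0.5, 0.25, 0.1)

# Random-walk Metropolis step targeting pi(theta)^h
# (a stand-in for the package's Gibbs + Langevin updates)
rw_step <- function(theta, h, sd = 1) {
  prop <- theta + rnorm(1, 0, sd)
  log_ratio <- h * (log_target(prop) - log_target(theta))
  if (log(runif(1)) < log_ratio) prop else theta
}

# Propose exchanging the states of chains j and j + 1
propose_swap <- function(state, heats, j) {
  lp_j  <- log_target(state[j])
  lp_j1 <- log_target(state[j + 1])
  # log of [pi(theta_{j+1}) / pi(theta_j)]^(h_j - h_{j+1})
  log_ratio <- (heats[j] - heats[j + 1]) * (lp_j1 - lp_j)
  if (log(runif(1)) < log_ratio) state[c(j, j + 1)] <- state[c(j + 1, j)]
  state
}

set.seed(1)
n_iter <- 5000
state <- rep(0, length(heats))
cold_draws <- numeric(n_iter)
for (t in seq_len(n_iter)) {
  # within-chain update for every heated chain
  for (m in seq_along(heats)) state[m] <- rw_step(state[m], heats[m])
  # one swap proposal between a randomly chosen pair of adjacent chains
  j <- sample(length(heats) - 1, 1)
  state <- propose_swap(state, heats, j)
  # only the cold chain is retained for inference
  cold_draws[t] <- state[1]
}
# fraction of cold-chain draws in the right-hand mode
mean(cold_draws > 0)
```

Only the cold chain is used for posterior summaries; the heated chains exist to help it move between separated modes.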
The main function of the package is cure_rate_MC3. See Details below for a brief description of the model.
Let \boldsymbol{y} = (y_1,\ldots,y_n) denote the observed data, which are either event times or censoring times. Let also \boldsymbol{x}_i = (x_{i1},\ldots,x_{ip})' denote the covariate vector of subject i, i=1,\ldots,n.
Assuming that the n observations are independent, the observed likelihood is defined as
L=L({\boldsymbol \theta}; {\boldsymbol y}, {\boldsymbol x})=\prod_{i=1}^{n}f_P(y_i;{\boldsymbol\theta},{\boldsymbol x}_i)^{\delta_i}S_P(y_i;{\boldsymbol \theta},{\boldsymbol x}_i)^{1-\delta_i},
where \delta_i=1 if the i-th observation is an event time and \delta_i=0 if it is a censoring time. The parameter vector \boldsymbol\theta is decomposed as
\boldsymbol\theta = (\boldsymbol\alpha', \boldsymbol\beta', \gamma,\lambda)
where
\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d)'\in\mathcal A are the parameters of the promotion time distribution, whose cumulative distribution and density functions are denoted by F(\cdot;\boldsymbol\alpha) and f(\cdot;\boldsymbol\alpha), respectively;
\boldsymbol\beta\in\mathbf R^{k} are the regression coefficients, with k denoting the number of columns of the design matrix (which may or may not include a constant term);
\gamma\in\mathbf R and \lambda > 0.
The population survival and density functions are defined as
S_P(y;\boldsymbol\theta,\boldsymbol{x}_i) = \left(1 + \gamma\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}c^{\gamma\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}}F(y;\boldsymbol\alpha)^\lambda\right)^{-1/\gamma}
whereas
f_P(y;\boldsymbol\theta,\boldsymbol{x}_i)=-\frac{\partial S_P(y;\boldsymbol\theta,\boldsymbol{x}_i)}{\partial y}.
Finally, the cure rate depends on the covariates and parameters through
p_0(\boldsymbol{x}_i;\boldsymbol{\theta}) = \left(1 + \gamma\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}c^{\gamma\exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\}}\right)^{-1/\gamma},
where c = e^{e^{-1}}. This is the limit of S_P(y;\boldsymbol\theta,\boldsymbol{x}_i) as y\to\infty, since F(y;\boldsymbol\alpha)\to 1.
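The formulas above can be evaluated directly. Differentiating S_P gives, with \eta_i = \boldsymbol x_i'\boldsymbol\beta,
f_P(y;\boldsymbol\theta,\boldsymbol x_i) = \lambda e^{\eta_i} c^{\gamma e^{\eta_i}} F(y;\boldsymbol\alpha)^{\lambda-1} f(y;\boldsymbol\alpha)\left(1+\gamma e^{\eta_i} c^{\gamma e^{\eta_i}} F(y;\boldsymbol\alpha)^{\lambda}\right)^{-1/\gamma-1}.
The R sketch below codes S_P, f_P, p_0 and the observed log-likelihood. Assumptions: the promotion time is taken to be Weibull via base R's `pweibull()`/`dweibull()` purely for illustration (the package's own Weibull parametrization may differ), \gamma \neq 0, and all function names and parameter values are hypothetical.

```r
# Constant c = e^{e^{-1}} appearing in the model
c_const <- exp(exp(-1))

# Population survival S_P(y) from promotion-time cdf values Fy,
# linear predictor eta = x_i' beta, gamma (assumed nonzero) and lambda > 0
S_P <- function(Fy, eta, gamma, lambda) {
  a <- gamma * exp(eta) * c_const^(gamma * exp(eta))
  (1 + a * Fy^lambda)^(-1 / gamma)
}

# Population density f_P(y) = -dS_P/dy; fy holds promotion-time density values.
# Note that a / gamma equals exp(eta) * c^(gamma * exp(eta)).
f_P <- function(Fy, fy, eta, gamma, lambda) {
  a <- gamma * exp(eta) * c_const^(gamma * exp(eta))
  (a / gamma) * lambda * Fy^(lambda - 1) * fy * (1 + a * Fy^lambda)^(-1 / gamma - 1)
}

# Cure rate p_0(x_i): S_P evaluated at F = 1
p_0 <- function(eta, gamma) {
  a <- gamma * exp(eta) * c_const^(gamma * exp(eta))
  (1 + a)^(-1 / gamma)
}

# Observed log-likelihood: events contribute log f_P, censored cases log S_P
loglik <- function(y, delta, eta, gamma, lambda, shape, scale) {
  Fy <- pweibull(y, shape = shape, scale = scale)
  fy <- dweibull(y, shape = shape, scale = scale)
  sum(delta * log(f_P(Fy, fy, eta, gamma, lambda)) +
        (1 - delta) * log(S_P(Fy, eta, gamma, lambda)))
}

# Usage on hypothetical simulated inputs
set.seed(42)
y <- rexp(6)
delta <- rbinom(6, 1, 0.5)
eta <- rnorm(6)
ll <- loglik(y, delta, eta, gamma = 0.5, lambda = 1.2, shape = 1.5, scale = 2)
```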
The promotion time distribution can be a member of standard families (currently available: Exponential, Weibull, Gamma, Lomax, Gompertz and log-Logistic), in which case \boldsymbol\alpha = (\alpha_1,\alpha_2)\in (0,\infty)^2. The Dagum distribution, which has three parameters (\alpha_1,\alpha_2,\alpha_3)\in(0,\infty)^3, is also available. If these parametric assumptions are not justified, the promotion time can instead follow the more flexible family of finite mixtures of Gamma distributions. For example, consider a mixture of two Gamma distributions of the form
f(y;\boldsymbol \alpha) = \alpha_5 f_{\mathcal G}(y;\alpha_1,\alpha_3) + (1-\alpha_5) f_{\mathcal G}(y;\alpha_2,\alpha_4),
where
f_\mathcal{G}(y;\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp\{-\beta y\}, \quad y>0,
denotes the density of the Gamma distribution with shape \alpha > 0 and rate \beta > 0.
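A minimal R sketch of this two-component mixture, using base R's `dgamma()`/`pgamma()`, which share the shape/rate parametrization of f_\mathcal{G}. The cdf `pmix2()` is included because F(y;\boldsymbol\alpha) is what enters S_P and p_0. The function names and the parameter values are hypothetical.

```r
# Two-component Gamma mixture density f(y; alpha) with
# alpha = (alpha1, alpha2, alpha3, alpha4, alpha5):
# shapes alpha1, alpha2; rates alpha3, alpha4; weight alpha5 of component 1
dmix2 <- function(y, alpha) {
  alpha[5] * dgamma(y, shape = alpha[1], rate = alpha[3]) +
    (1 - alpha[5]) * dgamma(y, shape = alpha[2], rate = alpha[4])
}

# Corresponding cdf F(y; alpha)
pmix2 <- function(y, alpha) {
  alpha[5] * pgamma(y, shape = alpha[1], rate = alpha[3]) +
    (1 - alpha[5]) * pgamma(y, shape = alpha[2], rate = alpha[4])
}

# f_G written out explicitly, matching the formula above (shape a, rate b)
f_G <- function(y, a, b) b^a / gamma(a) * y^(a - 1) * exp(-b * y)

# Hypothetical parameter values
alpha <- c(2, 6, 1, 1.5, 0.3)
dmix2(c(0.5, 1.7, 4), alpha)
pmix2(c(0.5, 1.7, 4), alpha)
```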
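For this model, the parameter vector is
\boldsymbol\alpha = (\alpha_1,\alpha_2,\alpha_3,\alpha_4,\alpha_5)'\in\mathcal A, where \mathcal A = (0,\infty)^4\times (0,1),
with \alpha_1,\alpha_2 the shapes, \alpha_3,\alpha_4 the rates and \alpha_5 the weight of the first component.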
More generally, a mixture of K>2 Gamma distributions can be fitted, and the number of components can be selected using information criteria such as the BIC.
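For comparing mixtures with different K, a textbook BIC can be sketched as below. The parameter count follows from the parametrization above: a K-component Gamma mixture has K shapes, K rates and K-1 free weights (3K-1 parameters, i.e. 5 when K=2), to which the k regression coefficients, \gamma and \lambda are added. The helper names and the log-likelihood values are hypothetical placeholders, and how the package itself computes the BIC (for example, at which point estimate the log-likelihood is evaluated) is not specified here.

```r
# Number of free parameters: alpha (K shapes, K rates, K - 1 weights),
# beta (k coefficients), gamma and lambda
n_params_mix <- function(K, k) (3 * K - 1) + k + 2

# Textbook BIC = -2 * log-likelihood + d * log(n); smaller is preferred
bic <- function(loglik_value, d, n) -2 * loglik_value + d * log(n)

# Hypothetical log-likelihood values for K = 2 and K = 3 (n = 200, k = 3)
bic_K2 <- bic(-512.4, n_params_mix(2, 3), 200)
bic_K3 <- bic(-509.9, n_params_mix(3, 3), 200)
c(K2 = bic_K2, K3 = bic_K3)
```

Here the small gain in log-likelihood for K = 3 does not offset the penalty for three extra parameters, so the BIC favours K = 2.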
The binary vector \boldsymbol{I} = (I_1,\ldots,I_n) contains the (latent) cure indicators, that is, I_i = 1 if the i-th subject is susceptible and I_i = 0 if the i-th subject is cured. \Delta_0 denotes the subset of \{1,\ldots,n\} containing the censored subjects, whereas \Delta_1 = \Delta_0^c is the complementary subset of uncensored subjects. The complete likelihood of the model is
L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i;\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i) \times \prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i;\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i;\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.
Here f_U and S_U denote the probability density and survival function of the susceptibles, respectively, that is,
S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.
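The complete likelihood implies the full conditional of the latent indicators used in the Gibbs step. Subjects in \Delta_1 experienced the event, so they are susceptible and I_i = 1. For i\in\Delta_0, the two candidate terms p_0 and (1-p_0)S_U sum to S_P, hence I_i is Bernoulli with probability (1-p_0(\boldsymbol x_i;\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol x_i)/S_P(y_i;\boldsymbol\theta,\boldsymbol x_i) = 1 - p_0(\boldsymbol x_i;\boldsymbol\theta)/S_P(y_i;\boldsymbol\theta,\boldsymbol x_i). A minimal R sketch of this draw, assuming S_P and p_0 have already been evaluated at the current parameter values (the helper name and input values are hypothetical):

```r
# Draw latent cure indicators given current parameter values.
# SP: S_P(y_i) for each subject; p0: p_0(x_i); delta: 1 = event, 0 = censored
draw_cure_indicators <- function(SP, p0, delta) {
  # uncensored subjects are susceptible with probability 1;
  # censored subjects: P(I_i = 1 | ...) = (S_P - p_0) / S_P
  prob_susceptible <- ifelse(delta == 1, 1, (SP - p0) / SP)
  rbinom(length(SP), size = 1, prob = prob_susceptible)
}

# Hypothetical inputs for four subjects
set.seed(3)
SP    <- c(0.90, 0.60, 0.45, 0.80)
p0    <- c(0.30, 0.40, 0.40, 0.50)
delta <- c(1, 0, 0, 0)
I <- draw_cure_indicators(SP, p0, delta)
I
```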
Author(s): Panagiotis Papastamoulis and Fotios S. Milienos
Maintainer: Panagiotis Papastamoulis <papapast@yahoo.gr>
References: Papastamoulis, P. and Milienos, F. S. (2024). Bayesian inference and cure rate modeling for event history data. TEST. doi:10.1007/s11749-024-00942-w.
See Also: cure_rate_MC3
Examples:
# TOY EXAMPLE (very small numbers... only for CRAN check purposes)
library(bayesCureRateModel)
# simulate toy data
set.seed(10)
n <- 4
# censoring indicators (1 = event observed, 0 = censored)
stat <- rbinom(n, size = 1, prob = 0.5)
# covariates
x <- matrix(rnorm(2*n), n, 2)
# observed response variable
y <- rexp(n)
# define a data frame with the response and the covariates
my_data_frame <- data.frame(y, stat, x1 = x[,1], x2 = x[,2])
# fit a model with Weibull promotion time and the default prior setup,
# using 2 heated chains
fit1 <- cure_rate_MC3(survival::Surv(y, stat) ~ x1 + x2,
data = my_data_frame,
promotion_time = list(distribution = 'weibull'),
nChains = 2,
nCores = 1,
mcmc_cycles = 3, sweep=2)
# print method
fit1
# summary method
summary1 <- summary(fit1)
# WARNING: mcmc_cycles and nChains should take much larger values in practice;
# e.g. a typical run uses mcmc_cycles = 15000, nChains = 12