Description Usage Arguments Details Value Source Examples
Use Dirichlet process mixture/dependent Dirichlet process Weibull model for survival data with/without competing risks. When regression covariates are present, the model is a dependent Dirichlet process model. For competing risks data we only consider two potential causes of events and the user can combine events of secondary interests. In competing risks regression model, the estimates provided focus on the primary cause (cause 1), and the user can switch the event indicator to get the estimates for the secondary cause.
1 2 3 4 5  dpweib(formula,data, high.pct = NULL, predtime = NULL, comp = FALSE,
alpha = 0.05, simultaneous = FALSE, burnin = 8000, iteration = 2000,
alpha00 = 1.354028, alpha0 = 0.03501257, lambda00 = 7.181247,
alphaalpha = 0.2, alphalambda = 0.1, a = 1, b = 1, gamma0 = 1,
gamma1 = 1, thin = 10, betasl = 2.5, addgroup = 2)

formula 
A formula written in regular y \sim x_1+x_2+ … +x_p regression format. y is a Surv object for survival data (including interval censored data) and Hist object for competing risks data. The regression covaraites can be continuous or factors. Since the model is flexible enough, interaction terms are not necessary. 
data 
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which dpweib is called. 
high.pct 
The estimated high percentile (95th) percentile of the datagenerating distribution of the average population given by the user. If the user does not provide this value, we will look into the data. If there is no censoring, we take the 95th percentile of the observed data. If censoring takes less than 15% of the total observations, we use the maximum of the observed time. If the censoring takes more than 15%, we suggest a scaling parameter by first finding the time t corresponding to the observed survival rate at the end of study from the plot of the median of the components (survmedian) generated by our LIO prior on a 0 to 10 scale, then set the scaling parameter to be the largest observation time multiplied by 10/t. 
predtime 
A vector given by the user to specify the time points where the inferences will be made. If the user does not provide it, we will take 40 time points located evenly from the beginning to the high.pct. 
comp 
A logical value indicating whether this is competing risks data or not. The default is FALSE. 
alpha 
1α is the probability for constructing credible intervals. The default α is 0.05. 
simultaneous 
A logical value indicating whether to provide simultaneous credible intervals. The default is FALSE. 
burnin 
Number of burnin iterations. The default is 5000. 
iteration 
Number of iterations. The default is 5000. 
alpha00 
Parameter for the base distribution of λ in noncompeting risks data model and λ_1, λ_2 in competing risks data model. The default is 1.354028. 
alpha0 
Parameter for the base distribution of λ in noncompeting risks data model and λ_1, λ_2 in competing risks data model. The default is 0.03501257. 
lambda00 
Parameter for the base distribution of λ in noncompeting risks data model and λ_1, λ_2 in competing risks data model. The default is 7.181247. 
alphaalpha 
Parameter for the base distribution of α in noncompeting risks data model and α_1, α_2 in competing risks data model. The default is 0.2. 
alphalambda 
Parameter for the base distribution of α in noncompeting risks data model and α_1, α_2 in competing risks data model. THe default value is 0.1. 
a 
Parameter for the gamma prior of the concentration parameter of DP. The default is 1. 
b 
Parameter for the gamma prior of the concentration parameter of DP. The default is 1. 
gamma0 
Parameter for the base distribution of p in competing risks data model. The default value is 1. 
gamma1 
Parameter for the base distribution of p in competing risks data model. The default value is 1. 
thin 
Thinning. The default value is 10. 
betasl 
Parameter for the base distribution of the regression coefficients β in noncompeting risks data model and β_1 and β_2 in competing risks data model. The default value is 2.5. 
addgroup 
Number of new parameters proposed for each cluster assignment. The default is 2 (suggested by Neal). 
For no regression, no competing risks data, the function dpweib implements dirichlet process Weibull mixture model. The basic form of model is the following.
\begin{array}{rl} y_iα_i,λ_i&\sim Weib(t_iα_i,λ_i),\quad i=1,...,n\\ (α_i,λ_i)G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λα_0,λ_0) I_{(f(λ),∞)}(α) Ga(α_{α},λ_{α})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}
wheref(λ)=max(0,\log\{\log(20)/λ\}/\log(25)).
For regression data without competing risks, the method is a mixture of Cox model.
\begin{array}{rl} y_iα_i,λ_i,\boldsymbol{β_i}, \mathbf{Z_i}&\sim Weib(y_iα_i,λ_i\exp(\mathbf{Z_i^T}\boldsymbol{β_i})),\quad i=1,...,n\\ (α_i,λ_i,\boldsymbol{β_i})G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λα_0,λ_0) I_{(f(λ),u)}(α) Ga(α_{α},λ_{α}) q(\boldsymbol{β})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}
The density function corresponding to this Weibull notation is p(y_iα_i,λ_i)=λ_iα_i y_i^{α_i1}e^{λ_i y_i^{α_i}},\quad y_i>0,\quad α_i>0,\quad λ_i>0. [x]=Ga(α,λ) denotes that the density function of x is \displaystyle\frac{λ^{α}}{Γ(α)}x^{α1}e^{λ x}, α>0, λ>0, x>0. q(β) is the base distribution for regression coefficients.The details of the choice of base distribution is described in our coming paper.
In competing risks data, the likelihood for each individual can be written as
L=\{f_1(t_i)\}^{I(c_i=1)}\{f_2(t_i)\}^{I(c_i=2)}\{1F_1(t_i)F_2(t_i)\}^{I(c_i=0)},
where f_1(\cdot) and f_2(\cdot) are the causespecific density functions for cause 1 and 2 and survival function for the ith observation can be expressed as 1F_1(t_i)F_2(t_i). In order to model it, we introduce a parameter p, which is the cumulative incidence function of primary cause at ∞, p=F_1(∞). The likelihood can be written as
L=\{pd_1(t_i)\}^{I(c_i=1)}\{(1p)d_2(t_i)\}^{I(c_i=2)}\{1pD_1(t_i)(1p)D_2(t_i)\}^{I(c_i=0)} .
Here the D_{1}, D_{2}, d_{1}, d_{2} are the normalized baseline cumulative incidence functions and causespecific density functions and are modeled with Weibull mixtures as above, while p is the normalizing parameter for the baseline distribution. When regression covariates are present in a competing risks data, we modify the above likelihood with respect to the value of covaraites, such that
F_1(t\mathbf{Z},\boldsymbol{β_1},p) = 1(1pD_{01}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}.
The causespecific density function for cause 1 is
f_1(t\mathbf{Z},\boldsymbol{β_1},p)=\exp(\mathbf{Z^T}\boldsymbol{β_1})[1pD_{01}(t)]^{\exp(\mathbf{Z^T}\boldsymbol{β_1})1}pd_{01}(t).
The model for the secondary cause is defined as
F_2(t\mathbf{Z},\boldsymbol{β_1},\boldsymbol{β_2},p)=(1p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})} (1(1D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})}),
which leads to the causespecific subdensity function for cause 2 as
f_2(t\mathbf{Z},\boldsymbol{β_2},p)=(1p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}(1D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})1}\exp(\mathbf{Z^T}\boldsymbol{β_2})d_{02}(t).
This function can generate 4 different kinds of output based on the data set given. They all share,
c 
a vector, the cluster assignment in the last iteration, useful for the resumption of MCMC iteration 
nm 
a vector, the number of observations in each cluster from the last iteration, useful for the resumption of MCMC iteration 
emptybasket 
only useful for the resumption of MCMC iteration 
allbaskets 
only useful for the resumption of MCMC iteration 
ngrp 
a vector, the number of clusters in each iteration, useful for the resumption of MCMC iteration 
predtime 
the time points where the inferences are made 
high.pct 
the scaling parameter of observations used in the model 
usertime 
a logic value, whether user provides time for estimation or not 
1α is the probability for constructing credible intervals.
simultaneous 
Whether give simultaneous credible intervals. 
For noncompeting risks data, dpweib can generate two classes of output, dpm and ddp, for data with and without covariates separately. They both have
alpharec 
a matrix, saved samples of αs, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdarec 
a matrix, saved samples of λs, the rows correspond to the iterations saved, the columns correspond to the observations 
lambda0rec 
a matrix, saved samples of λ_0s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdascaled 
a matrix, saved samples of λs under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration 
tl 
the left end point 
tr 
the right end point 
pi 
right censoring indicator 
delta 
exact observation indicator 
For dpm output, it has
S 
a matrix, the estimated survival function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
Spred 
a vector, the estimated survival function at specified time points 
Spredu 
a vector, the estimated pointwise upper credible interval for survival function at specified time points 
Spredl 
a vector, the estimated pointwise lower credible interval for survival function at specified time points 
d 
a matrix, the estimated density function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
dpred 
a vector, the estimated density function at specified time points 
dpredu 
a vector, the estimated pointwise upper credible interval for density function at specified time points 
dpredl 
a vector, the estimated pointwise lower credible interval for density function at specified time points 
h 
a matrix, the estimated hazard function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
hpred 
a vector, the estimated hazard function at specified time points 
hpredu 
a vector, the estimated pointwise upper credible interval for hazard function at specified time points 
hpredl 
a vector, the estimated pointwise lower credible interval for hazard function at specified time points 
When simultaneous is specified TRUE, the function also provides
Sbandu 
a vector, the estimated simultaneous upper credible interval for survival function at specified time points 
Sbandl 
a vector, the estimated simultaneous lower credible interval for survival function at specified time points 
dbandu 
a vector, the estimated simultaneous upper credible interval for density function at specified time points 
dbandl 
a vector, the estimated simultaneous lower credible interval for density function at specified time points 
hbandu 
a vector, the estimated simultaneous upper credible interval for hazard function at specified time points 
hbandl 
a vector, the estimated simultaneous lower credible interval for hazard function at specified time points 
For ddp output, it also has
betarec 
a matrix, saved samples of βs, which is consist of horizontalmerged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. 
x 
the covariate matrix 
xmean 
a vector, the mean for each covariate(including created binary dummy covariates) 
xsd 
a vector, the standized deviation for each covariate, if the covariate is binary, then it is set to be 0.5.(including created binary dummy covariates) 
xscale 
The matrix used to scale log hazard ratio 
loghr 
a matrix, the estimated log hazard ratio for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
loghr.est 
a vector, the estimated log hazard ratio at specified time points 
loghru 
a vector, the estimated pointwise upper credible interval for log hazard ratio at specified time points 
loghrl 
a vector, the estimated pointwise lower credible interval for log hazard ratio at specified time points 
indicator 
a vector, whether a covariate is binary 
covnames 
a vector, the names of covariates 
When simultaneous is specified TRUE, the function also provides
loghrbandu 
a vector, the estimated simultaneous upper credible interval for log hazard ratio at specified time points 
loghrbandl 
a vector, the estimated simultaneous lower credible interval for log hazard ratio at specified time points 
For competing risks data, dpweib can generate two classes of output, dpmcomp and ddpcomp, for data with and without covariate separately. They both have
alpharec1 
a matrix, saved samples of α_1s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdarec1 
a matrix, saved samples of λ_1s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambda0rec1 
a matrix, saved samples of λ_{01}s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdascaled1 
a matrix, saved samples of λ_1s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration 
alpharec2 
a matrix, saved samples of α_2s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdarec2 
a matrix, saved samples of λ_2s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambda0rec2 
a matrix, saved samples of λ_{02}s, the rows correspond to the iterations saved, the columns correspond to the observations 
lambdascaled2 
a matrix, saved samples of λ_2s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration 
prec 
a matrix, saved samples of p, the rows correspond to the iterations saved, the columns correspond to the observations 
t 
the observed time 
event 
the event indicator 
For dpmcomp output, it has
CIF1 
a matrix, the estimated cumulative incidence function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
CIF1.est 
a vector, the estimated cumulative incidence function of cause 1 at specified time points 
CIF1u 
a vector, the estimated pointwise upper credible interval for cumulative incidence function of cause 1 at specified time points 
CIF1l 
a vector, the estimated pointwise lower credible interval for cumulative incidence function of cause 1 at specified time points 
d1 
a matrix, the estimated causespecific density function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
d1.est 
a vector, the estimated causespecific density function of cause 1 at specified time points 
d1u 
a vector, the estimated pointwise upper credible interval for causespecific density function of cause 1 at specified time points 
d1l 
a vector, the estimated pointwise lower credible interval for causespecific density function of cause 1 at specified time points 
h1 
a matrix, the estimated subdistribution hazard function for cause 1 at specified time points, the columns correspond to time points, the rows correspond to saved iterations 
h1.est 
a vector, the estimated subdistribution hazard function of cause 1 at specified time points 
h1u 
a vector, the estimated pointwise upper credible interval for subdistribution hazard function of cause 1 at specified time points 
h1l 
a vector, the estimated pointwise lower credible interval for subdistribution hazard function of cause 1 at specified time points 
CIF2 
a matrix, the estimated cumulative incidence function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
CIF2.est 
a vector, the estimated cumulative incidence function of cause 2 at specified time points 
CIF2u 
a vector, the estimated pointwise upper credible interval for cumulative incidence function of cause 2 at specified time points 
CIF2l 
a vector, the estimated pointwise lower credible interval for cumulative incidence function of cause 2 at specified time points 
d2 
a matrix, the estimated causespecific density function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
d2.est 
a vector, the estimated causespecific density function of cause 2 at specified time points 
d2u 
a vector, the estimated pointwise upper credible interval for causespecific density function of cause 2 at specified time points 
d2l 
a vector, the estimated pointwise lower credible interval for causespecific density function of cause 2 at specified time points 
h2 
a matrix, the estimated subdistribution hazard function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations 
h2.est 
a vector, the estimated subdistribution hazard function of cause 2 at specified time points 
h2u 
a vector, the estimated pointwise upper credible interval for subdistribution hazard function of cause 2 at specified time points 
h2l 
a vector, the estimated pointwise lower credible interval for subdistribution hazard function of cause 2 at specified time points 
When simultaneous is specified TRUE, the function also provides
CIF1bandu 
a vector, the estimated simultaneous upper credible interval for cumulative incidence function of cause 1 at specified time points 
CIF1bandl 
a vector, the estimated simultaneous lower credible interval for cumulative incidence function of cause 1 at specified time points 
d1bandu 
a vector, the estimated simultaneous upper credible interval for causespecific density function of cause 1 at specified time points 
d1bandl 
a vector, the estimated simultaneous lower credible interval for causespecific density function of cause 1 at specified time points 
h1bandu 
a vector, the estimated simultaneous upper credible interval for subdistribution hazard function of cause 1 at specified time points 
h1bandl 
a vector, the estimated simultaneous lower credible interval for subdistribution hazard function of cause 1 at specified time points 
CIF2bandu 
a vector, the estimated simultaneous upper credible interval for cumulative incidence function of cause 2 at specified time points 
CIF2bandl 
a vector, the estimated simultaneous lower credible interval for cumulative incidence function of cause 2 at specified time points 
d2bandu 
a vector, the estimated simultaneous upper credible interval for causespecific density function of cause 2 at specified time points 
d2bandl 
a vector, the estimated simultaneous lower credible interval for causespecific density function of cause 2 at specified time points 
h2bandu 
a vector, the estimated simultaneous upper credible interval for subdistribution hazard function of cause 2 at specified time points 
h2bandl 
a vector, the estimated simultaneous lower credible interval for subdistribution hazard function of cause 2 at specified time points 
For ddpcomp output, it also has
betarec1 
a matrix, saved samples of β_1s, which is consist of horizontalmerged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. 
betarec2 
a matrix, saved samples of β_2s, which is consist of horizontalmerged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. 
xmean 
a vector, the mean for each covariate(including created dummy covariates) 
xsd 
a vector, the standized deviation for each covariate, if the covariate is binary, then it is set to be 0.5(including created dummy covariates). 
x 
the covariate matrix 
xscale 
The matrix used to scale log hazard ratio 
covnames 
a vector, the names of covariates 
loghr.est 
the estimated log subdistribution hazard ratio at specified time points for cause 1 
loghru 
the estimated pointwise upper credible interval for log subdistribution hazard ratio at specified time points for cause 1 
loghrl 
the estimated pointwise lower credible interval for log subdistribution hazard ratio at specified time points for cause 1 
indicator 
a vector, whether a covariate is binary 
When simultaneous is specified TRUE, the function also provides
loghrbandu 
a vector, the estimated simultaneous upper credible interval for log subdistribution hazard ratio at specified time points 
loghrbandl 
a vector, the estimated simultaneous lower credible interval for log subdistribution hazard ratio at specified time points 
Gilks,W.R. and Best,N.G. and Tan,K.K.C. (1995) Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics, 455472 doi:10.2307/2986138
Neal,R.M (2000) Markov chain sampling methods for Dirichlet process mixture models,Journal of computational and graphical statistics, 9, Num 2, 249265 doi: 10.1080/10618600.2000.10474879
Kottas,A. (2006) Nonparametric Bayesian survival analysis using mixtures of Weibull distributions, Journal of Statistical Planning and Inference, 136, Num 3, 578596 doi: 10.1016/j.jspi.2004.08.009
Shi, Y. Martens, M., Banerjee, A. and Laud, P. (2019) Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models. Bayesian Analysis 14, Num 3, 677702. doi:10.1214/18BA1119. https://projecteuclid.org/euclid.ba/1560240023
Shi,Y. and Laud,P. and Neuner,J (2021) A Dependent Dirichlet Process Model for Survival Data With Competing Risks Lifetime Data Analysis 27, 156176. https://doi.org/10.1007/s10985020095060
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95  ## Not run:
library(survival)
library(DPWeibull)
data(veteran)
DPresult1<dpweib(Surv(time,status)~1,data=veteran)
summary(DPresult1)
opar<par(mfrow=c(1,3),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPresult1)
par(opar)
DPresult2<dpweib(Surv(time,status)~factor(trt)+age,data=veteran)
summary(DPresult2)
opar<par(mfrow=c(1,2),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPresult2)
par(opar)
newdata<NULL
newdata$trt<veteran$trt[c(1,70)]
newdata$age<veteran$age[c(2,87)]
newdata<data.frame(newdata)
DPpredict<predict(DPresult2,newdata)
summary(DPpredict)
opar<par(mfrow=c(2,3),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPpredict)
par(opar)
############################################################################
# Competing Risks Data
# Competing Risks Data
library(survival)
library(prodlim)
library(riskRegression)
library(DPWeibull)
data(Paquid)
Paquid<Paquid[1:500,]
DPresult1<dpweib(Hist(time, status)~1,data=Paquid,
predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200))
opar<par(mfrow=c(1,3),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPresult1)
par(opar)
DPresult2<continue(DPresult1,simultaneous=TRUE)
summary(DPresult2)
DPresult3<dpweib(Hist(time, status)~DSST+MMSE,data=Paquid,
predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200))
summary(DPresult3)
opar<par(mfrow=c(1,2),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPresult3)
par(opar)
newdata<NULL
newdata$DSST<Paquid$DSST[c(1,70)]
newdata$MMSE<Paquid$MMSE[c(2,87)]
newdata<data.frame(newdata)
DPpredict<predict(DPresult3,newdata)
summary(DPpredict)
opar<par(mfrow=c(2,3),
mar=c(3.1, 3.1, 3.1, 5.1),
mgp=c(2, 0.5, 0),
oma=c(0, 0, 0, 4))
plot(DPpredict)
par(opar)
###############################################################
# An example of interval censored data
library(KMsurv)
library(survival)
library(DPWeibull)
data("bcdeter")
DPresult<dpweib(Surv(lower, upper, type="interval2") ~ treat, data = bcdeter)
summary(DPresult)
plot(DPresult)
## End(Not run)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.