# dpweib: Dirichlet process mixture/Dependent Dirichlet process model... In DPWeibull: Dirichlet Process Weibull Mixture Model for Survival Data

## Description

Use Dirichlet process mixture/dependent Dirichlet process Weibull model for survival data with/without competing risks. When regression covariates are present, the model is a dependent Dirichlet process model. For competing risks data we only consider two potential causes of events and the user can combine events of secondary interests. In competing risks regression model, the estimates provided focus on the primary cause (cause 1), and the user can switch the event indicator to get the estimates for the secondary cause.

## Usage

 1 2 3 4 5 dpweib(formula,data, high.pct = NULL, predtime = NULL, comp = FALSE, alpha = 0.05, simultaneous = FALSE, burnin = 8000, iteration = 2000, alpha00 = 1.354028, alpha0 = 0.03501257, lambda00 = 7.181247, alphaalpha = 0.2, alphalambda = 0.1, a = 1, b = 1, gamma0 = 1, gamma1 = 1, thin = 10, betasl = 2.5, addgroup = 2) 

## Arguments

 formula A formula written in regular y \sim x_1+x_2+ … +x_p regression format. y is a Surv object for survival data (including interval censored data) and Hist object for competing risks data. The regression covaraites can be continuous or factors. Since the model is flexible enough, interaction terms are not necessary.
 data an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which dpweib is called. high.pct The estimated high percentile (95th) percentile of the data-generating distribution of the average population given by the user. If the user does not provide this value, we will look into the data. If there is no censoring, we take the 95th percentile of the observed data. If censoring takes less than 15% of the total observations, we use the maximum of the observed time. If the censoring takes more than 15%, we suggest a scaling parameter by first finding the time t corresponding to the observed survival rate at the end of study from the plot of the median of the components (survmedian) generated by our LIO prior on a 0 to 10 scale, then set the scaling parameter to be the largest observation time multiplied by 10/t. predtime A vector given by the user to specify the time points where the inferences will be made. If the user does not provide it, we will take 40 time points located evenly from the beginning to the high.pct. comp A logical value indicating whether this is competing risks data or not. The default is FALSE. alpha 1-α is the probability for constructing credible intervals. The default α is 0.05. simultaneous A logical value indicating whether to provide simultaneous credible intervals. The default is FALSE. burnin Number of burnin iterations. The default is 5000. iteration Number of iterations. The default is 5000. alpha00 Parameter for the base distribution of λ in non-competing risks data model and λ_1, λ_2 in competing risks data model. The default is 1.354028. alpha0 Parameter for the base distribution of λ in non-competing risks data model and λ_1, λ_2 in competing risks data model. The default is 0.03501257. lambda00 Parameter for the base distribution of λ in non-competing risks data model and λ_1, λ_2 in competing risks data model. The default is 7.181247. alphaalpha Parameter for the base distribution of α in non-competing risks data model and α_1, α_2 in competing risks data model. The default is 0.2. alphalambda Parameter for the base distribution of α in non-competing risks data model and α_1, α_2 in competing risks data model. THe default value is 0.1. a Parameter for the gamma prior of the concentration parameter of DP. The default is 1. b Parameter for the gamma prior of the concentration parameter of DP. The default is 1. gamma0 Parameter for the base distribution of p in competing risks data model. The default value is 1. gamma1 Parameter for the base distribution of p in competing risks data model. The default value is 1. thin Thinning. The default value is 10. betasl Parameter for the base distribution of the regression coefficients β in non-competing risks data model and β_1 and β_2 in competing risks data model. The default value is 2.5. addgroup Number of new parameters proposed for each cluster assignment. The default is 2 (suggested by Neal).

## Details

For no regression, no competing risks data, the function dpweib implements dirichlet process Weibull mixture model. The basic form of model is the following.

\begin{array}{rl} y_i|α_i,λ_i&\sim Weib(t_i|α_i,λ_i),\quad i=1,...,n\\ (α_i,λ_i)|G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λ|α_0,λ_0) I_{(f(λ),∞)}(α) Ga(α_{α},λ_{α})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}

wheref(λ)=max(0,\log\{\log(20)/λ\}/\log(25)).

For regression data without competing risks, the method is a mixture of Cox model.

\begin{array}{rl} y_i|α_i,λ_i,\boldsymbol{β_i}, \mathbf{Z_i}&\sim Weib(y_i|α_i,λ_i\exp(\mathbf{Z_i^T}\boldsymbol{β_i})),\quad i=1,...,n\\ (α_i,λ_i,\boldsymbol{β_i})|G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λ|α_0,λ_0) I_{(f(λ),u)}(α) Ga(α_{α},λ_{α}) q(\boldsymbol{β})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}

The density function corresponding to this Weibull notation is p(y_i|α_i,λ_i)=λ_iα_i y_i^{α_i-1}e^{-λ_i y_i^{α_i}},\quad y_i>0,\quad α_i>0,\quad λ_i>0. [x]=Ga(α,λ) denotes that the density function of x is \displaystyle\frac{λ^{α}}{Γ(α)}x^{α-1}e^{-λ x}, α>0, λ>0, x>0. q(β) is the base distribution for regression coefficients.The details of the choice of base distribution is described in our coming paper.

In competing risks data, the likelihood for each individual can be written as

L=\{f_1(t_i)\}^{I(c_i=1)}\{f_2(t_i)\}^{I(c_i=2)}\{1-F_1(t_i)-F_2(t_i)\}^{I(c_i=0)},

where f_1(\cdot) and f_2(\cdot) are the cause-specific density functions for cause 1 and 2 and survival function for the ith observation can be expressed as 1-F_1(t_i)-F_2(t_i). In order to model it, we introduce a parameter p, which is the cumulative incidence function of primary cause at , p=F_1(∞). The likelihood can be written as

L=\{pd_1(t_i)\}^{I(c_i=1)}\{(1-p)d_2(t_i)\}^{I(c_i=2)}\{1-pD_1(t_i)-(1-p)D_2(t_i)\}^{I(c_i=0)} .

Here the D_{1}, D_{2}, d_{1}, d_{2} are the normalized baseline cumulative incidence functions and cause-specific density functions and are modeled with Weibull mixtures as above, while p is the normalizing parameter for the baseline distribution. When regression covariates are present in a competing risks data, we modify the above likelihood with respect to the value of covaraites, such that

F_1(t|\mathbf{Z},\boldsymbol{β_1},p) = 1-(1-pD_{01}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}.

The cause-specific density function for cause 1 is

f_1(t|\mathbf{Z},\boldsymbol{β_1},p)=\exp(\mathbf{Z^T}\boldsymbol{β_1})[1-pD_{01}(t)]^{\exp(\mathbf{Z^T}\boldsymbol{β_1})-1}pd_{01}(t).

The model for the secondary cause is defined as

F_2(t|\mathbf{Z},\boldsymbol{β_1},\boldsymbol{β_2},p)=(1-p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})} (1-(1-D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})}),

which leads to the cause-specific subdensity function for cause 2 as

f_2(t|\mathbf{Z},\boldsymbol{β_2},p)=(1-p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}(1-D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})-1}\exp(\mathbf{Z^T}\boldsymbol{β_2})d_{02}(t).

## Value

This function can generate 4 different kinds of output based on the data set given. They all share,

 c a vector, the cluster assignment in the last iteration, useful for the resumption of MCMC iteration nm a vector, the number of observations in each cluster from the last iteration, useful for the resumption of MCMC iteration emptybasket only useful for the resumption of MCMC iteration allbaskets only useful for the resumption of MCMC iteration ngrp a vector, the number of clusters in each iteration, useful for the resumption of MCMC iteration predtime the time points where the inferences are made high.pct the scaling parameter of observations used in the model usertime a logic value, whether user provides time for estimation or not

1-α is the probability for constructing credible intervals.

 simultaneous Whether give simultaneous credible intervals.

For non-competing risks data, dpweib can generate two classes of output, dpm and ddp, for data with and without covariates separately. They both have

 alpharec a matrix, saved samples of αs, the rows correspond to the iterations saved, the columns correspond to the observations lambdarec a matrix, saved samples of λs, the rows correspond to the iterations saved, the columns correspond to the observations lambda0rec a matrix, saved samples of λ_0s, the rows correspond to the iterations saved, the columns correspond to the observations lambdascaled a matrix, saved samples of λs under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration tl the left end point tr the right end point pi right censoring indicator delta exact observation indicator

For dpm output, it has

 S a matrix, the estimated survival function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations Spred a vector, the estimated survival function at specified time points Spredu a vector, the estimated pointwise upper credible interval for survival function at specified time points Spredl a vector, the estimated pointwise lower credible interval for survival function at specified time points d a matrix, the estimated density function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations dpred a vector, the estimated density function at specified time points dpredu a vector, the estimated pointwise upper credible interval for density function at specified time points dpredl a vector, the estimated pointwise lower credible interval for density function at specified time points h a matrix, the estimated hazard function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations hpred a vector, the estimated hazard function at specified time points hpredu a vector, the estimated pointwise upper credible interval for hazard function at specified time points hpredl a vector, the estimated pointwise lower credible interval for hazard function at specified time points

When simultaneous is specified TRUE, the function also provides

 Sbandu a vector, the estimated simultaneous upper credible interval for survival function at specified time points Sbandl a vector, the estimated simultaneous lower credible interval for survival function at specified time points dbandu a vector, the estimated simultaneous upper credible interval for density function at specified time points dbandl a vector, the estimated simultaneous lower credible interval for density function at specified time points hbandu a vector, the estimated simultaneous upper credible interval for hazard function at specified time points hbandl a vector, the estimated simultaneous lower credible interval for hazard function at specified time points

For ddp output, it also has

 betarec a matrix, saved samples of βs, which is consist of horizontal-merged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. x the covariate matrix xmean a vector, the mean for each covariate(including created binary dummy covariates) xsd a vector, the standized deviation for each covariate, if the covariate is binary, then it is set to be 0.5.(including created binary dummy covariates) xscale The matrix used to scale log hazard ratio loghr a matrix, the estimated log hazard ratio for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations loghr.est a vector, the estimated log hazard ratio at specified time points loghru a vector, the estimated pointwise upper credible interval for log hazard ratio at specified time points loghrl a vector, the estimated pointwise lower credible interval for log hazard ratio at specified time points indicator a vector, whether a covariate is binary covnames a vector, the names of covariates

When simultaneous is specified TRUE, the function also provides

 loghrbandu a vector, the estimated simultaneous upper credible interval for log hazard ratio at specified time points loghrbandl a vector, the estimated simultaneous lower credible interval for log hazard ratio at specified time points

For competing risks data, dpweib can generate two classes of output, dpmcomp and ddpcomp, for data with and without covariate separately. They both have

 alpharec1 a matrix, saved samples of α_1s, the rows correspond to the iterations saved, the columns correspond to the observations lambdarec1 a matrix, saved samples of λ_1s, the rows correspond to the iterations saved, the columns correspond to the observations lambda0rec1 a matrix, saved samples of λ_{01}s, the rows correspond to the iterations saved, the columns correspond to the observations lambdascaled1 a matrix, saved samples of λ_1s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration alpharec2 a matrix, saved samples of α_2s, the rows correspond to the iterations saved, the columns correspond to the observations lambdarec2 a matrix, saved samples of λ_2s, the rows correspond to the iterations saved, the columns correspond to the observations lambda0rec2 a matrix, saved samples of λ_{02}s, the rows correspond to the iterations saved, the columns correspond to the observations lambdascaled2 a matrix, saved samples of λ_2s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration prec a matrix, saved samples of p, the rows correspond to the iterations saved, the columns correspond to the observations t the observed time event the event indicator

For dpmcomp output, it has

 CIF1 a matrix, the estimated cumulative incidence function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations CIF1.est a vector, the estimated cumulative incidence function of cause 1 at specified time points CIF1u a vector, the estimated pointwise upper credible interval for cumulative incidence function of cause 1 at specified time points CIF1l a vector, the estimated pointwise lower credible interval for cumulative incidence function of cause 1 at specified time points d1 a matrix, the estimated cause-specific density function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations d1.est a vector, the estimated cause-specific density function of cause 1 at specified time points d1u a vector, the estimated pointwise upper credible interval for cause-specific density function of cause 1 at specified time points d1l a vector, the estimated pointwise lower credible interval for cause-specific density function of cause 1 at specified time points h1 a matrix, the estimated subdistribution hazard function for cause 1 at specified time points, the columns correspond to time points, the rows correspond to saved iterations h1.est a vector, the estimated subdistribution hazard function of cause 1 at specified time points h1u a vector, the estimated pointwise upper credible interval for subdistribution hazard function of cause 1 at specified time points h1l a vector, the estimated pointwise lower credible interval for subdistribution hazard function of cause 1 at specified time points CIF2 a matrix, the estimated cumulative incidence function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations CIF2.est a vector, the estimated cumulative incidence function of cause 2 at specified time points CIF2u a vector, the estimated pointwise upper credible interval for cumulative incidence function of cause 2 at specified time points CIF2l a vector, the estimated pointwise lower credible interval for cumulative incidence function of cause 2 at specified time points d2 a matrix, the estimated cause-specific density function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations d2.est a vector, the estimated cause-specific density function of cause 2 at specified time points d2u a vector, the estimated pointwise upper credible interval for cause-specific density function of cause 2 at specified time points d2l a vector, the estimated pointwise lower credible interval for cause-specific density function of cause 2 at specified time points h2 a matrix, the estimated subdistribution hazard function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations h2.est a vector, the estimated subdistribution hazard function of cause 2 at specified time points h2u a vector, the estimated pointwise upper credible interval for subdistribution hazard function of cause 2 at specified time points h2l a vector, the estimated pointwise lower credible interval for subdistribution hazard function of cause 2 at specified time points

When simultaneous is specified TRUE, the function also provides

 CIF1bandu a vector, the estimated simultaneous upper credible interval for cumulative incidence function of cause 1 at specified time points CIF1bandl a vector, the estimated simultaneous lower credible interval for cumulative incidence function of cause 1 at specified time points d1bandu a vector, the estimated simultaneous upper credible interval for cause-specific density function of cause 1 at specified time points d1bandl a vector, the estimated simultaneous lower credible interval for cause-specific density function of cause 1 at specified time points h1bandu a vector, the estimated simultaneous upper credible interval for subdistribution hazard function of cause 1 at specified time points h1bandl a vector, the estimated simultaneous lower credible interval for subdistribution hazard function of cause 1 at specified time points CIF2bandu a vector, the estimated simultaneous upper credible interval for cumulative incidence function of cause 2 at specified time points CIF2bandl a vector, the estimated simultaneous lower credible interval for cumulative incidence function of cause 2 at specified time points d2bandu a vector, the estimated simultaneous upper credible interval for cause-specific density function of cause 2 at specified time points d2bandl a vector, the estimated simultaneous lower credible interval for cause-specific density function of cause 2 at specified time points h2bandu a vector, the estimated simultaneous upper credible interval for subdistribution hazard function of cause 2 at specified time points h2bandl a vector, the estimated simultaneous lower credible interval for subdistribution hazard function of cause 2 at specified time points

For ddpcomp output, it also has

 betarec1 a matrix, saved samples of β_1s, which is consist of horizontal-merged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. betarec2 a matrix, saved samples of β_2s, which is consist of horizontal-merged blocks. One block corresponds to one observation. Inside each block, the rows correspond to the iterations saved, the columns correspond to the covariates. xmean a vector, the mean for each covariate(including created dummy covariates) xsd a vector, the standized deviation for each covariate, if the covariate is binary, then it is set to be 0.5(including created dummy covariates). x the covariate matrix xscale The matrix used to scale log hazard ratio covnames a vector, the names of covariates loghr.est the estimated log subdistribution hazard ratio at specified time points for cause 1 loghru the estimated pointwise upper credible interval for log subdistribution hazard ratio at specified time points for cause 1 loghrl the estimated pointwise lower credible interval for log subdistribution hazard ratio at specified time points for cause 1 indicator a vector, whether a covariate is binary

When simultaneous is specified TRUE, the function also provides

 loghrbandu a vector, the estimated simultaneous upper credible interval for log subdistribution hazard ratio at specified time points loghrbandl a vector, the estimated simultaneous lower credible interval for log subdistribution hazard ratio at specified time points

## Source

Gilks,W.R. and Best,N.G. and Tan,K.K.C. (1995) Adaptive rejection Metropolis sampling within Gibbs sampling, Applied Statistics, 455-472 doi:10.2307/2986138

Neal,R.M (2000) Markov chain sampling methods for Dirichlet process mixture models,Journal of computational and graphical statistics, 9, Num 2, 249-265 doi: 10.1080/10618600.2000.10474879

Kottas,A. (2006) Nonparametric Bayesian survival analysis using mixtures of Weibull distributions, Journal of Statistical Planning and Inference, 136, Num 3, 578-596 doi: 10.1016/j.jspi.2004.08.009

Shi, Y. Martens, M., Banerjee, A. and Laud, P. (2019) Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models. Bayesian Analysis 14, Num 3, 677-702. doi:10.1214/18-BA1119. https://projecteuclid.org/euclid.ba/1560240023

Shi,Y. and Laud,P. and Neuner,J (2021) A Dependent Dirichlet Process Model for Survival Data With Competing Risks Lifetime Data Analysis 27, 156-176. https://doi.org/10.1007/s10985-020-09506-0

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 ## Not run: library(survival) library(DPWeibull) data(veteran) DPresult1<-dpweib(Surv(time,status)~1,data=veteran) summary(DPresult1) opar<-par(mfrow=c(1,3), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPresult1) par(opar) DPresult2<-dpweib(Surv(time,status)~factor(trt)+age,data=veteran) summary(DPresult2) opar<-par(mfrow=c(1,2), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPresult2) par(opar) newdata<-NULL newdata$trt<-veteran$trt[c(1,70)] newdata$age<-veteran$age[c(2,87)] newdata<-data.frame(newdata) DPpredict<-predict(DPresult2,newdata) summary(DPpredict) opar<-par(mfrow=c(2,3), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPpredict) par(opar) ############################################################################ # Competing Risks Data # Competing Risks Data library(survival) library(prodlim) library(riskRegression) library(DPWeibull) data(Paquid) Paquid<-Paquid[1:500,] DPresult1<-dpweib(Hist(time, status)~1,data=Paquid, predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200)) opar<-par(mfrow=c(1,3), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPresult1) par(opar) DPresult2<-continue(DPresult1,simultaneous=TRUE) summary(DPresult2) DPresult3<-dpweib(Hist(time, status)~DSST+MMSE,data=Paquid, predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200)) summary(DPresult3) opar<-par(mfrow=c(1,2), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPresult3) par(opar) newdata<-NULL newdata$DSST<-Paquid$DSST[c(1,70)] newdata$MMSE<-Paquid$MMSE[c(2,87)] newdata<-data.frame(newdata) DPpredict<-predict(DPresult3,newdata) summary(DPpredict) opar<-par(mfrow=c(2,3), mar=c(3.1, 3.1, 3.1, 5.1), mgp=c(2, 0.5, 0), oma=c(0, 0, 0, 4)) plot(DPpredict) par(opar) ############################################################### # An example of interval censored data library(KMsurv) library(survival) library(DPWeibull) data("bcdeter") DPresult<-dpweib(Surv(lower, upper, type="interval2") ~ treat, data = bcdeter) summary(DPresult) plot(DPresult) ## End(Not run) 

DPWeibull documentation built on Dec. 13, 2021, 1:07 a.m.