dpweib: Dirichlet process mixture/Dependent Dirichlet process model...


Description

Fits a Dirichlet process mixture (DPM) or dependent Dirichlet process (DDP) Weibull model to survival data, with or without competing risks. When regression covariates are present, the model is a dependent Dirichlet process model. For competing risks data, only two causes of failure are modeled; events of secondary interest can be combined into the second cause. In the competing risks regression model, the reported estimates focus on the primary cause (cause 1); to obtain estimates for the secondary cause, recode the event indicator so that it becomes cause 1.

Usage

dpweib(formula, data, high.pct = NULL, predtime = NULL, comp = FALSE,
       alpha = 0.05, simultaneous = FALSE, burnin = 8000, iteration = 2000,
       alpha00 = 1.354028, alpha0 = 0.03501257, lambda00 = 7.181247,
       alphaalpha = 0.2, alphalambda = 0.1, a = 1, b = 1, gamma0 = 1,
       gamma1 = 1, thin = 10, betasl = 2.5, addgroup = 2)

Arguments

formula

A formula in the usual regression form y ~ x_1 + x_2 + ... + x_p. The response y is a Surv object for survival data (including interval-censored data) or a Hist object for competing risks data. Covariates can be continuous or factors. Because the model is already flexible, interaction terms are not necessary.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which dpweib is called.

high.pct

A user-supplied estimate of the 95th percentile of the population-average data-generating distribution. If it is not supplied, it is determined from the data. With no censoring, the 95th percentile of the observed times is used. If less than 15% of the observations are censored, the maximum observed time is used. If more than 15% are censored, we suggest choosing the scaling parameter as follows: from the plot of the median of the components (survmedian) generated by our LIO prior on a 0 to 10 scale, find the time t corresponding to the observed survival rate at the end of the study, then set the scaling parameter to the largest observation time multiplied by 10/t.

predtime

A vector of time points at which inferences are made. If it is not supplied, 40 equally spaced time points are used, running from the beginning of the time axis up to high.pct.

comp

A logical value indicating whether the data are competing risks data. The default is FALSE.

alpha

Credible intervals are constructed with probability 1-α. The default α is 0.05.

simultaneous

A logical value indicating whether to provide simultaneous credible intervals. The default is FALSE.

burnin

Number of burn-in iterations. The default is 8000.

iteration

Number of MCMC iterations after burn-in. The default is 2000.

alpha00

Shape parameter α_{00} of the Ga(α_{00}, λ_{00}) hyperprior on λ_0, which enters the base distribution of λ in the non-competing risks model and of λ_1, λ_2 in the competing risks model. The default is 1.354028.

alpha0

Shape parameter α_0 of the Ga(α_0, λ_0) base distribution of λ in the non-competing risks model and of λ_1, λ_2 in the competing risks model. The default is 0.03501257.

lambda00

Rate parameter λ_{00} of the Ga(α_{00}, λ_{00}) hyperprior on λ_0, which enters the base distribution of λ in the non-competing risks model and of λ_1, λ_2 in the competing risks model. The default is 7.181247.

alphaalpha

Shape parameter α_α of the gamma base distribution of α in the non-competing risks model and of α_1, α_2 in the competing risks model. The default is 0.2.

alphalambda

Rate parameter λ_α of the gamma base distribution of α in the non-competing risks model and of α_1, α_2 in the competing risks model. The default is 0.1.

a

Shape parameter a of the Ga(a, b) prior on the DP concentration parameter ν. The default is 1.

b

Rate parameter b of the Ga(a, b) prior on the DP concentration parameter ν. The default is 1.

gamma0

Parameter for the base distribution of p in the competing risks model. The default is 1.

gamma1

Parameter for the base distribution of p in the competing risks model. The default is 1.

thin

Thinning interval: only every thin-th iteration is saved. The default is 10.

betasl

Parameter for the base distribution of the regression coefficients: β in the non-competing risks model, and β_1 and β_2 in the competing risks model. The default is 2.5.

addgroup

Number of new parameters proposed at each cluster-assignment update. The default is 2, as suggested by Neal (2000).
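The documented defaults for high.pct and predtime can be sketched in base R. This is a minimal illustration, not DPWeibull code: default_high_pct() and default_predtime() are hypothetical helpers, status is assumed to be coded 1 for an event and 0 for a right-censored observation, exactly 15% censoring is assumed to fall in the heavy-censoring branch, and the time t read off the survmedian plot has to be supplied by hand as t_end. Because predtime is only described as starting "from the beginning", the start of the grid is left as an explicit argument.

```r
# Hypothetical helpers (not part of DPWeibull) mirroring the documented
# defaults. Assumes status is 1 for an event and 0 for right censoring.
default_high_pct <- function(time, status, t_end = NULL) {
  cens_frac <- mean(status == 0)
  if (cens_frac == 0) {
    # No censoring: 95th percentile of the observed times
    unname(quantile(time, 0.95))
  } else if (cens_frac < 0.15) {
    # Less than 15% censored: largest observed time
    max(time)
  } else {
    # 15% or more censored (boundary case assumed): t_end is the time on the
    # 0 to 10 scale read off the survmedian plot at the end-of-study survival
    if (is.null(t_end)) stop("t_end is needed when 15% or more are censored")
    max(time) * 10 / t_end
  }
}

# 40 equally spaced prediction times ending at high.pct; the start point
# is an assumption supplied by the caller
default_predtime <- function(high_pct, start) {
  seq(from = start, to = high_pct, length.out = 40)
}

# Usage with simulated, uncensored data
set.seed(1)
time <- rweibull(100, shape = 1.5, scale = 10)
status <- rep(1, 100)
hp <- default_high_pct(time, status)
grid <- default_predtime(hp, start = 0)
```

In dpweib itself these values are computed internally when high.pct and predtime are left as NULL; the sketch only makes the documented decision rule explicit.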

Details

For data without covariates or competing risks, dpweib fits a Dirichlet process Weibull mixture model:

\begin{array}{rl} y_i|α_i,λ_i&\sim Weib(y_i|α_i,λ_i),\quad i=1,...,n\\ (α_i,λ_i)|G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λ|α_0,λ_0) I_{(f(λ),∞)}(α) Ga(α|α_{α},λ_{α})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}

where f(λ)=\max(0,\log\{\log(20)/λ\}/\log(25)).
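A draw from the base measure G_0 can be sketched in base R as below, with λ_0 drawn once from its Ga(α_{00}, λ_{00}) hyperprior and Ga(shape, rate) read as in the gamma density defined later in this section. The mapping of the arguments alpha00, alpha0, lambda00, alphaalpha and alphalambda onto α_{00}, α_0, λ_{00}, α_α and λ_α is an assumption based on their names, and rG0() is a hypothetical helper. The inverse-CDF draw from the truncated gamma is one simple way to respect the constraint α > f(λ), not necessarily how the package samples.

```r
# Truncation point f(lambda): G0 only allows alpha > f(lambda)
f_trunc <- function(lambda) pmax(0, log(log(20) / lambda) / log(25))

# Hypothetical helper: n draws of (alpha, lambda) from G0 given lambda0.
# Assumed mapping: alpha0 = alpha_0, alphaalpha = alpha_alpha,
# alphalambda = lambda_alpha; all gammas use the (shape, rate) form.
rG0 <- function(n, lambda0, alpha0 = 0.03501257,
                alphaalpha = 0.2, alphalambda = 0.1) {
  lambda <- rgamma(n, shape = alpha0, rate = lambda0)
  lower <- f_trunc(lambda)
  # Inverse-CDF sampling on the upper tail: pick a tail probability below
  # P(alpha > lower) and map it back through the gamma quantile function
  tail_prob <- pgamma(lower, shape = alphaalpha, rate = alphalambda,
                      lower.tail = FALSE)
  v <- runif(n, min = 0, max = tail_prob)
  alpha <- qgamma(v, shape = alphaalpha, rate = alphalambda,
                  lower.tail = FALSE)
  data.frame(alpha = alpha, lambda = lambda)
}

set.seed(42)
lambda0 <- rgamma(1, shape = 1.354028, rate = 7.181247)  # alpha00, lambda00
draws <- rG0(1000, lambda0)
```

Working on the upper tail (lower.tail = FALSE) keeps the draw numerically stable when f(λ) is large and almost all of the gamma mass lies below the truncation point.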

For regression data without competing risks, the model is a mixture of Cox (proportional hazards) Weibull models:

\begin{array}{rl} y_i|α_i,λ_i,\boldsymbol{β_i}, \mathbf{Z_i}&\sim Weib(y_i|α_i,λ_i\exp(\mathbf{Z_i^T}\boldsymbol{β_i})),\quad i=1,...,n\\ (α_i,λ_i,\boldsymbol{β_i})|G&\sim G,\quad i=1,...,n\\ G&\sim DP(G_0,ν)\\ G_0&=Ga(λ|α_0,λ_0) I_{(f(λ),u)}(α) Ga(α|α_{α},λ_{α}) q(\boldsymbol{β})\\ λ_0&\sim Ga(α_{00},λ_{00})\\ ν&\sim Ga(a,b)\\ \end{array}

The density function corresponding to this Weibull notation is p(y_i|α_i,λ_i)=λ_iα_i y_i^{α_i-1}e^{-λ_i y_i^{α_i}},\quad y_i>0,\quad α_i>0,\quad λ_i>0. [x]=Ga(α,λ) denotes that the density function of x is \displaystyle\frac{λ^{α}}{Γ(α)}x^{α-1}e^{-λ x}, α>0, λ>0, x>0. q(β) is the base distribution for the regression coefficients; the choice of this base distribution is described in detail in our forthcoming paper.
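The Weibull parameterization above corresponds to R's dweibull() with shape α and scale λ^{-1/α}. The base-R sketch below checks this numerically and, for a finite mixture of such components whose rates are multiplied by exp(zβ_k) as in the regression model, computes the log hazard ratio of a covariate value z against z = 0. It illustrates why a mixture of proportional hazards components can produce a log hazard ratio that changes over time. The function names, the single scalar covariate and the mixture weights are hypothetical choices for illustration.

```r
# Weibull in the parameterization above (lambda multiplies y^alpha)
dweib_dp <- function(y, alpha, lambda) {
  lambda * alpha * y^(alpha - 1) * exp(-lambda * y^alpha)
}
sweib_dp <- function(y, alpha, lambda) exp(-lambda * y^alpha)

# Equivalent to dweibull(shape = alpha, scale = lambda^(-1 / alpha))
y <- c(0.5, 1, 2, 5)
same <- all.equal(dweib_dp(y, 1.5, 0.2),
                  dweibull(y, shape = 1.5, scale = 0.2^(-1 / 1.5)))

# Log hazard ratio at times t for a scalar covariate value z versus z = 0,
# under a finite mixture with weights w and component parameters
# (alpha_k, lambda_k, beta_k); component k has rate lambda_k * exp(z * beta_k)
mix_loghr <- function(t, z, w, alpha, lambda, beta) {
  hazard <- function(zz) {
    lam <- lambda * exp(zz * beta)
    dens <- sapply(t, function(tt) sum(w * dweib_dp(tt, alpha, lam)))
    surv <- sapply(t, function(tt) sum(w * sweib_dp(tt, alpha, lam)))
    dens / surv
  }
  log(hazard(z)) - log(hazard(0))
}

# One component: the log hazard ratio is constant and equal to beta
one <- mix_loghr(c(0.5, 5), z = 1, w = 1, alpha = 1.5, lambda = 0.2, beta = 0.7)
# Two components with different beta: the log hazard ratio varies over time
two <- mix_loghr(c(0.5, 5), z = 1, w = c(0.5, 0.5), alpha = c(1, 2),
                 lambda = c(0.1, 0.3), beta = c(1, -1))
```

Within a single component the hazard ratio is exactly exp(β); once components with different β are mixed, the population hazard ratio is no longer constant in t.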

For competing risks data, the likelihood for each individual can be written as

L=\{f_1(t_i)\}^{I(c_i=1)}\{f_2(t_i)\}^{I(c_i=2)}\{1-F_1(t_i)-F_2(t_i)\}^{I(c_i=0)},

where f_1(\cdot) and f_2(\cdot) are the cause-specific density functions for causes 1 and 2, and the survival function of the ith observation is 1-F_1(t_i)-F_2(t_i). To model this, we introduce a parameter p, the cumulative incidence of the primary cause at infinity, p=F_1(∞). The likelihood can then be written as

L=\{pd_1(t_i)\}^{I(c_i=1)}\{(1-p)d_2(t_i)\}^{I(c_i=2)}\{1-pD_1(t_i)-(1-p)D_2(t_i)\}^{I(c_i=0)} .

Here D_{1} and D_{2} are the normalized baseline cumulative incidence functions and d_{1} and d_{2} the corresponding cause-specific density functions; they are modeled with Weibull mixtures as above, while p is the normalizing parameter for the baseline distribution. When regression covariates are present in competing risks data, the likelihood is modified to depend on the covariates, such that

F_1(t|\mathbf{Z},\boldsymbol{β_1},p) = 1-(1-pD_{01}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}.

The cause-specific density function for cause 1 is

f_1(t|\mathbf{Z},\boldsymbol{β_1},p)=\exp(\mathbf{Z^T}\boldsymbol{β_1})[1-pD_{01}(t)]^{\exp(\mathbf{Z^T}\boldsymbol{β_1})-1}pd_{01}(t).

The model for the secondary cause is defined as

F_2(t|\mathbf{Z},\boldsymbol{β_1},\boldsymbol{β_2},p)=(1-p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})} (1-(1-D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})}),

which leads to the cause-specific subdensity function for cause 2 as

f_2(t|\mathbf{Z},\boldsymbol{β_1},\boldsymbol{β_2},p)=(1-p)^{\exp(\mathbf{Z^T}\boldsymbol{β_1})}(1-D_{02}(t))^{\exp(\mathbf{Z^T}\boldsymbol{β_2})-1}\exp(\mathbf{Z^T}\boldsymbol{β_2})d_{02}(t).
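The competing risks formulas above can be checked numerically. The base-R sketch below assumes, purely for brevity, a single Weibull component (in the parameterization given earlier) for each baseline D_{01} and D_{02} instead of a mixture, and writes η_1 = Z^Tβ_1 and η_2 = Z^Tβ_2 for the linear predictors; all function names are hypothetical. It also evaluates one observation's log-likelihood contribution, with status 1, 2 or 0 for cause 1, cause 2 or censoring. With η_1 = η_2 = 0 the formulas reduce to F_1 = pD_1 and F_2 = (1-p)D_2 of the no-covariate model, and F_1(∞) + F_2(∞) = 1 for any covariate value.

```r
# Single-component Weibull baseline CDF and density (illustrative only)
D0 <- function(t, a, l) 1 - exp(-l * t^a)
d0 <- function(t, a, l) l * a * t^(a - 1) * exp(-l * t^a)

# Cause 1: F1 = 1 - (1 - p D01)^exp(eta1) and its density f1
F1 <- function(t, eta1, p, a1, l1) 1 - (1 - p * D0(t, a1, l1))^exp(eta1)
f1 <- function(t, eta1, p, a1, l1) {
  k <- exp(eta1)
  k * (1 - p * D0(t, a1, l1))^(k - 1) * p * d0(t, a1, l1)
}

# Cause 2: F2 = (1 - p)^exp(eta1) * (1 - (1 - D02)^exp(eta2)) and its density f2
F2 <- function(t, eta1, eta2, p, a2, l2) {
  (1 - p)^exp(eta1) * (1 - (1 - D0(t, a2, l2))^exp(eta2))
}
f2 <- function(t, eta1, eta2, p, a2, l2) {
  k <- exp(eta2)
  (1 - p)^exp(eta1) * (1 - D0(t, a2, l2))^(k - 1) * k * d0(t, a2, l2)
}

# Log-likelihood contribution of one observation (status 1, 2, or 0)
loglik_i <- function(t, status, eta1, eta2, p, a1, l1, a2, l2) {
  if (status == 1) {
    log(f1(t, eta1, p, a1, l1))
  } else if (status == 2) {
    log(f2(t, eta1, eta2, p, a2, l2))
  } else {
    log(1 - F1(t, eta1, p, a1, l1) - F2(t, eta1, eta2, p, a2, l2))
  }
}

pars <- list(eta1 = 0.4, eta2 = -0.2, p = 0.3, a1 = 1.2, l1 = 0.5, a2 = 0.8, l2 = 0.3)
h <- 1e-5
# Central differences of F1 and F2 at t = 1.5 should match f1 and f2
num_f1 <- with(pars, (F1(1.5 + h, eta1, p, a1, l1) -
                      F1(1.5 - h, eta1, p, a1, l1)) / (2 * h))
num_f2 <- with(pars, (F2(1.5 + h, eta1, eta2, p, a2, l2) -
                      F2(1.5 - h, eta1, eta2, p, a2, l2)) / (2 * h))
ll <- with(pars, c(loglik_i(1.5, 1, eta1, eta2, p, a1, l1, a2, l2),
                   loglik_i(1.5, 2, eta1, eta2, p, a1, l1, a2, l2),
                   loglik_i(1.5, 0, eta1, eta2, p, a1, l1, a2, l2)))
```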

Value

Depending on the data, dpweib returns one of four classes of output: dpm, ddp, dpmcomp or ddpcomp. All of them contain

c

a vector, the cluster assignment in the last iteration, useful for the resumption of MCMC iteration

nm

a vector, the number of observations in each cluster from the last iteration, useful for the resumption of MCMC iteration

emptybasket

only useful for the resumption of MCMC iteration

allbaskets

only useful for the resumption of MCMC iteration

ngrp

a vector, the number of clusters in each iteration, useful for the resumption of MCMC iteration

predtime

the time points where the inferences are made

high.pct

the scaling parameter of observations used in the model

usertime

a logical value, whether the user supplied the time points for estimation (predtime)

alpha

1-α is the probability used to construct the credible intervals

simultaneous

a logical value, whether simultaneous credible intervals are provided

For non-competing risks data, dpweib returns one of two classes of output: dpm for data without covariates and ddp for data with covariates. Both contain

alpharec

a matrix, saved samples of αs, the rows correspond to the iterations saved, the columns correspond to the observations

lambdarec

a matrix, saved samples of λs, the rows correspond to the iterations saved, the columns correspond to the observations

lambda0rec

a matrix, saved samples of λ_0s, the rows correspond to the iterations saved, the columns correspond to the observations

lambdascaled

a matrix, saved samples of λs under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration

tl

the left end point

tr

the right end point

pi

right censoring indicator

delta

exact observation indicator

For dpm output, it has

S

a matrix, the estimated survival function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

Spred

a vector, the estimated survival function at specified time points

Spredu

a vector, the upper limit of the pointwise credible interval for the survival function at the specified time points

Spredl

a vector, the lower limit of the pointwise credible interval for the survival function at the specified time points

d

a matrix, the estimated density function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

dpred

a vector, the estimated density function at specified time points

dpredu

a vector, the upper limit of the pointwise credible interval for the density function at the specified time points

dpredl

a vector, the lower limit of the pointwise credible interval for the density function at the specified time points

h

a matrix, the estimated hazard function for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

hpred

a vector, the estimated hazard function at specified time points

hpredu

a vector, the upper limit of the pointwise credible interval for the hazard function at the specified time points

hpredl

a vector, the lower limit of the pointwise credible interval for the hazard function at the specified time points

When simultaneous = TRUE, the function also provides

Sbandu

a vector, the upper limit of the simultaneous credible band for the survival function at the specified time points

Sbandl

a vector, the lower limit of the simultaneous credible band for the survival function at the specified time points

dbandu

a vector, the upper limit of the simultaneous credible band for the density function at the specified time points

dbandl

a vector, the lower limit of the simultaneous credible band for the density function at the specified time points

hbandu

a vector, the upper limit of the simultaneous credible band for the hazard function at the specified time points

hbandl

a vector, the lower limit of the simultaneous credible band for the hazard function at the specified time points

For ddp output, it also has

betarec

a matrix, saved samples of βs, consisting of horizontally concatenated blocks, one block per observation. Within each block, the rows correspond to the saved iterations and the columns to the covariates.

x

the covariate matrix

xmean

a vector, the mean of each covariate (including created binary dummy covariates)

xsd

a vector, the standard deviation of each covariate (including created binary dummy covariates); for a binary covariate it is set to 0.5

xscale

the matrix used to scale the log hazard ratio

loghr

a matrix, the estimated log hazard ratio for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

loghr.est

a vector, the estimated log hazard ratio at specified time points

loghru

a vector, the upper limit of the pointwise credible interval for the log hazard ratio at the specified time points

loghrl

a vector, the lower limit of the pointwise credible interval for the log hazard ratio at the specified time points

indicator

a vector indicating whether each covariate is binary

covnames

a vector, the names of covariates

When simultaneous = TRUE, the function also provides

loghrbandu

a vector, the upper limit of the simultaneous credible band for the log hazard ratio at the specified time points

loghrbandl

a vector, the lower limit of the simultaneous credible band for the log hazard ratio at the specified time points

For competing risks data, dpweib returns one of two classes of output: dpmcomp for data without covariates and ddpcomp for data with covariates. Both contain

alpharec1

a matrix, saved samples of α_1s, the rows correspond to the iterations saved, the columns correspond to the observations

lambdarec1

a matrix, saved samples of λ_1s, the rows correspond to the iterations saved, the columns correspond to the observations

lambda0rec1

a matrix, saved samples of λ_{01}s, the rows correspond to the iterations saved, the columns correspond to the observations

lambdascaled1

a matrix, saved samples of λ_1s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration

alpharec2

a matrix, saved samples of α_2s, the rows correspond to the iterations saved, the columns correspond to the observations

lambdarec2

a matrix, saved samples of λ_2s, the rows correspond to the iterations saved, the columns correspond to the observations

lambda0rec2

a matrix, saved samples of λ_{02}s, the rows correspond to the iterations saved, the columns correspond to the observations

lambdascaled2

a matrix, saved samples of λ_2s under 0 to 10 scale, the rows correspond to the iterations saved, the columns correspond to the observations, only useful for the resumption of MCMC iteration

prec

a matrix, saved samples of p, the rows correspond to the iterations saved, the columns correspond to the observations

t

the observed time

event

the event indicator

For dpmcomp output, it has

CIF1

a matrix, the estimated cumulative incidence function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

CIF1.est

a vector, the estimated cumulative incidence function of cause 1 at specified time points

CIF1u

a vector, the upper limit of the pointwise credible interval for the cumulative incidence function of cause 1 at the specified time points

CIF1l

a vector, the lower limit of the pointwise credible interval for the cumulative incidence function of cause 1 at the specified time points

d1

a matrix, the estimated cause-specific density function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

d1.est

a vector, the estimated cause-specific density function of cause 1 at specified time points

d1u

a vector, the upper limit of the pointwise credible interval for the cause-specific density function of cause 1 at the specified time points

d1l

a vector, the lower limit of the pointwise credible interval for the cause-specific density function of cause 1 at the specified time points

h1

a matrix, the estimated subdistribution hazard function for cause 1 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

h1.est

a vector, the estimated subdistribution hazard function of cause 1 at specified time points

h1u

a vector, the upper limit of the pointwise credible interval for the subdistribution hazard function of cause 1 at the specified time points

h1l

a vector, the lower limit of the pointwise credible interval for the subdistribution hazard function of cause 1 at the specified time points

CIF2

a matrix, the estimated cumulative incidence function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

CIF2.est

a vector, the estimated cumulative incidence function of cause 2 at specified time points

CIF2u

a vector, the upper limit of the pointwise credible interval for the cumulative incidence function of cause 2 at the specified time points

CIF2l

a vector, the lower limit of the pointwise credible interval for the cumulative incidence function of cause 2 at the specified time points

d2

a matrix, the estimated cause-specific density function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

d2.est

a vector, the estimated cause-specific density function of cause 2 at specified time points

d2u

a vector, the upper limit of the pointwise credible interval for the cause-specific density function of cause 2 at the specified time points

d2l

a vector, the lower limit of the pointwise credible interval for the cause-specific density function of cause 2 at the specified time points

h2

a matrix, the estimated subdistribution hazard function for cause 2 for each saved iteration, the columns correspond to time points, the rows correspond to saved iterations

h2.est

a vector, the estimated subdistribution hazard function of cause 2 at specified time points

h2u

a vector, the upper limit of the pointwise credible interval for the subdistribution hazard function of cause 2 at the specified time points

h2l

a vector, the lower limit of the pointwise credible interval for the subdistribution hazard function of cause 2 at the specified time points

When simultaneous = TRUE, the function also provides

CIF1bandu

a vector, the upper limit of the simultaneous credible band for the cumulative incidence function of cause 1 at the specified time points

CIF1bandl

a vector, the lower limit of the simultaneous credible band for the cumulative incidence function of cause 1 at the specified time points

d1bandu

a vector, the upper limit of the simultaneous credible band for the cause-specific density function of cause 1 at the specified time points

d1bandl

a vector, the lower limit of the simultaneous credible band for the cause-specific density function of cause 1 at the specified time points

h1bandu

a vector, the upper limit of the simultaneous credible band for the subdistribution hazard function of cause 1 at the specified time points

h1bandl

a vector, the lower limit of the simultaneous credible band for the subdistribution hazard function of cause 1 at the specified time points

CIF2bandu

a vector, the upper limit of the simultaneous credible band for the cumulative incidence function of cause 2 at the specified time points

CIF2bandl

a vector, the lower limit of the simultaneous credible band for the cumulative incidence function of cause 2 at the specified time points

d2bandu

a vector, the upper limit of the simultaneous credible band for the cause-specific density function of cause 2 at the specified time points

d2bandl

a vector, the lower limit of the simultaneous credible band for the cause-specific density function of cause 2 at the specified time points

h2bandu

a vector, the upper limit of the simultaneous credible band for the subdistribution hazard function of cause 2 at the specified time points

h2bandl

a vector, the lower limit of the simultaneous credible band for the subdistribution hazard function of cause 2 at the specified time points

For ddpcomp output, it also has

betarec1

a matrix, saved samples of β_1s, consisting of horizontally concatenated blocks, one block per observation. Within each block, the rows correspond to the saved iterations and the columns to the covariates.

betarec2

a matrix, saved samples of β_2s, consisting of horizontally concatenated blocks, one block per observation. Within each block, the rows correspond to the saved iterations and the columns to the covariates.

xmean

a vector, the mean of each covariate (including created dummy covariates)

xsd

a vector, the standard deviation of each covariate (including created dummy covariates); for a binary covariate it is set to 0.5

x

the covariate matrix

xscale

the matrix used to scale the log hazard ratio

covnames

a vector, the names of covariates

loghr.est

the estimated log subdistribution hazard ratio at specified time points for cause 1

loghru

the upper limit of the pointwise credible interval for the log subdistribution hazard ratio of cause 1 at the specified time points

loghrl

the lower limit of the pointwise credible interval for the log subdistribution hazard ratio of cause 1 at the specified time points

indicator

a vector indicating whether each covariate is binary

When simultaneous = TRUE, the function also provides

loghrbandu

a vector, the upper limit of the simultaneous credible band for the log subdistribution hazard ratio at the specified time points

loghrbandl

a vector, the lower limit of the simultaneous credible band for the log subdistribution hazard ratio at the specified time points
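Each matrix-valued component above (for example S, d, h or CIF1) holds one row per saved iteration and one column per time point in predtime, while vector components such as Spred, Spredl and Spredu give pointwise estimates and limits at those time points. The base-R sketch below shows one way to compute such summaries from a draws matrix: the posterior mean as the point estimate, equal-tailed quantiles for the pointwise 1-α intervals, and a simultaneous band built from the (1-α) quantile of each draw's maximum standardized deviation. These are common constructions chosen for illustration and are not necessarily what dpweib computes internally; summarize_draws() and the simulated draws are hypothetical.

```r
# Hypothetical helper: summarize a draws matrix (rows = saved iterations,
# columns = time points) into point estimates and credible limits
summarize_draws <- function(draws, alpha = 0.05) {
  est <- colMeans(draws)  # posterior mean (an assumed choice)
  lower <- apply(draws, 2, quantile, probs = alpha / 2, names = FALSE)
  upper <- apply(draws, 2, quantile, probs = 1 - alpha / 2, names = FALSE)
  # Simultaneous band: standardize each draw's deviation from the mean by the
  # pointwise posterior SD, then take the (1 - alpha) quantile of the maxima
  sds <- apply(draws, 2, sd)
  dev <- sweep(abs(sweep(draws, 2, est)), 2, sds, "/")
  crit <- quantile(apply(dev, 1, max), 1 - alpha, names = FALSE)
  list(est = est, lower = lower, upper = upper,
       band_lower = est - crit * sds, band_upper = est + crit * sds)
}

# Simulated draws of an exponential survival curve on a 40-point grid
set.seed(7)
times <- seq(0.5, 10, length.out = 40)
rates <- rgamma(200, shape = 50, rate = 500)
S_draws <- exp(-outer(rates, times))  # 200 x 40 matrix
res <- summarize_draws(S_draws)
```

By construction, at least a fraction 1-α of the saved curves lie entirely inside the simultaneous band, which is why such a band is typically wider than the pointwise intervals.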

Source

Gilks, W.R., Best, N.G. and Tan, K.K.C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics, 455-472. doi:10.2307/2986138

Neal, R.M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249-265. doi:10.1080/10618600.2000.10474879

Kottas, A. (2006). Nonparametric Bayesian survival analysis using mixtures of Weibull distributions. Journal of Statistical Planning and Inference, 136(3), 578-596. doi:10.1016/j.jspi.2004.08.009

Shi, Y., Martens, M., Banerjee, A. and Laud, P. (2019). Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models. Bayesian Analysis, 14(3), 677-702. doi:10.1214/18-BA1119. https://projecteuclid.org/euclid.ba/1560240023

Shi, Y., Laud, P. and Neuner, J. (2021). A Dependent Dirichlet Process Model for Survival Data With Competing Risks. Lifetime Data Analysis, 27, 156-176. https://doi.org/10.1007/s10985-020-09506-0

Examples

## Not run: 
library(survival)
library(DPWeibull)
data(veteran)

DPresult1<-dpweib(Surv(time,status)~1,data=veteran)
summary(DPresult1)
opar<-par(mfrow=c(1,3),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPresult1)
par(opar)

DPresult2<-dpweib(Surv(time,status)~factor(trt)+age,data=veteran)
summary(DPresult2)
opar<-par(mfrow=c(1,2),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPresult2)
par(opar)

# Two covariate profiles for prediction
newdata<-data.frame(trt=veteran$trt[c(1,70)],
                    age=veteran$age[c(2,87)])
DPpredict<-predict(DPresult2,newdata)
summary(DPpredict)
opar<-par(mfrow=c(2,3),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPpredict)
par(opar)

############################################################################
# Competing Risks Data
library(survival)
library(prodlim)
library(riskRegression)
library(DPWeibull)
data(Paquid)

Paquid<-Paquid[1:500,]
DPresult1<-dpweib(Hist(time, status)~1,data=Paquid,
                  predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200))
opar<-par(mfrow=c(1,3),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPresult1)
par(opar)

DPresult2<-continue(DPresult1,simultaneous=TRUE)
summary(DPresult2)

DPresult3<-dpweib(Hist(time, status)~DSST+MMSE,data=Paquid,
                  predtime = seq(from=min(Paquid$time),to=max(Paquid$time),length=200))
summary(DPresult3)
opar<-par(mfrow=c(1,2),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPresult3)
par(opar)

# Two covariate profiles for prediction
newdata<-data.frame(DSST=Paquid$DSST[c(1,70)],
                    MMSE=Paquid$MMSE[c(2,87)])

DPpredict<-predict(DPresult3,newdata)
summary(DPpredict)
opar<-par(mfrow=c(2,3),
          mar=c(3.1, 3.1, 3.1, 5.1),
          mgp=c(2, 0.5, 0),
          oma=c(0, 0, 0, 4))
plot(DPpredict)
par(opar)

###############################################################

# An example of interval censored data
library(KMsurv)
library(survival)
library(DPWeibull)
data("bcdeter")

DPresult<-dpweib(Surv(lower, upper, type="interval2") ~ treat, data = bcdeter)
summary(DPresult)
plot(DPresult)

## End(Not run)

DPWeibull documentation built on Dec. 13, 2021, 1:07 a.m.
