bayessurvreg3 | R Documentation |
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2008).
SUPPLEMENTED IN 06/2013: Interval-censored times might be subject to misclassification. In case of doubly-interval-censored data, the event time might be subject to misclassification. For details, see García-Zattera, Jara and Komárek (2016).
We explain first in more detail a model without doubly censoring. Let T[i,l], i=1,..., N, l=1,..., n[i] be event times for ith cluster and the units within that cluster The following regression model is assumed:
log(T[i,l]) = beta'x[i,l] + b[i] + epsilon[i,l], i=1,..., N, l=1,..., n[i]
where beta is unknown regression parameter vector, x[i,l] is a vector of covariates. b[i] is a cluster-specific random effect (random intercept).
The random effects b[i], i=1,..., N are assumed to be i.i.d. with a univariate density g[b](b). The error terms epsilon[i,l], i=1,..., N, l=1,..., n[i] are assumed to be i.i.d. with a univariate density g[epsilon](e).
Densities g[b] and g[epsilon] are both expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications.
In the following, the density for epsilon is explicitely described. The density for b is obtained in an analogous manner.
epsilon is distributed as sum[j=-K][K] w[j] N(mu[j], sigma^2)
where sigma^2 is the unknown basis variance and mu[j],\;j=-K,..., K is an equidistant grid of knots symmetric around the unknown point gamma and related to the unknown basis variance through the relationship
mu[j] = gamma + j*delta*sigma, j=K,..., K
where delta is fixed constants, e.g. delta=2/3 (which has a justification of being close to cubic B-splines).
epsilon[1] is distributed as alpha + tau * V
where alpha is an unknown intercept term and tau is an unknown scale parameter. V is then standardized error term which is distributed according to the univariate normal mixture, i.e.
V is distributed as sum[j=-K][K] w[j] N(mu[j], sigma^2)
where mu[j], j=-K,..., K is an equidistant grid of fixed knots (means), usually symmetric about the fixed point gamma = 0 and sigma^2 is fixed basis variance. Reasonable values for the numbers of grid points K is K=15 with the distance between the two knots equal to delta=0.3 and for the basis variance sigma^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.
The mixture weights w[j], j=-K,..., K are not estimated directly. To avoid the constraints 0 < w[j] < 1 and sum[j=-K][K] w[j] = 1 transformed weights a[j], j=-K,..., K related to the original weights by the logistic transformation:
a[j] = exp(w[j])/sum[m] exp(w[m])
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.
Bivariate normal distribution
It is assumed that the pair of random intercepts from the onset and
time-to-event part of the model are normally distributed with zero
mean and an unknown covariance matrix D.
A priori, the inverse covariance matrix D^(-1) is addumed to follow a Wishart distribution.
Unknown correlation between the basis G-splines
Each pair of basis G-splines describing the distribution of the random
intercept in the onset part and the time-to-event part of the model is
assumed to be correlated with an unknown correlation coefficient
rho. Note that this is just an experiment and is no
more further supported.
Prior distribution on rho is assumed to be uniform. In the MCMC, the Fisher Z transform of the rho given by
Z = -0.5*log((1-rho)/(1+rho)) = atanh(rho)
is sampled. Its prior is derived from the uniform prior Unif(-1, 1) put on rho.
The Fisher Z transform is updated using the Metropolis-Hastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).
bayessurvreg3(formula, random, formula2, random2, data = parent.frame(), classification, classParam = list(Model = c("Examiner", "Factor:Examiner"), a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1, init.sens = NULL, init.spec = NULL), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, prior.b, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), prior2, prior.beta2, prior.b2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), priorinit.Nb, rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE, a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), dir = getwd()) bayessurvreg3Para(formula, random, formula2, random2, data = parent.frame(), classification, classParam = list(Model = c("Examiner", "Factor:Examiner"), a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1, init.sens = NULL, init.spec = NULL), na.action = na.fail, onlyX = FALSE, nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10), prior, prior.beta, prior.b, init = list(iter = 0), mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), prior2, prior.beta2, prior.b2, init2, mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1, k.overrelax.sigma = 1, k.overrelax.scale = 1, type.update.a.b = "slice", k.overrelax.a.b = 1, k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1), priorinit.Nb, rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1), store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE, r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE, a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), dir = getwd())
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the Intercept is implicitely included in the model by the
estimation of the error distribution. As a consequence If
|
random |
formula for the ‘random’ part of the model.
In the case of doubly-censored data, this is the
is allowed. If omitted, no random part is included in the model. |
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for |
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for |
data |
optional data frame in which to interpret the variables
occuring in the |
classification |
Possible additional columns are ignored. If missing, no misclassification is considered. |
classParam |
a The following components of the
|
na.action |
the user is discouraged from changing the default
value |
onlyX |
if |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item |
prior.b |
a list specifying the prior distribution of the
G-spline defining the distribution of the random intercept in the
regression model given by It is ignored if the argument The item |
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg3 first with its argument |
init |
an optional list with initial values for the MCMC related
to the model given by
|
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term and of the random
intercept from Compared to the mcmc.par argument of the function
In contrast to |
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
prior.b2 |
prior specification for the parameters related to the
random effects from It is ignored if the argument |
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
|
init2 |
an optional list with initial values for the MCMC related
to the model given by |
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to |
priorinit.Nb |
a list specifying the prior of the random intercepts in the case of the AFT model with doubly-interval-censored data and onset, time-to-event random intercepts following bivariate normal distribution. The list should have the following components.
|
rho |
a list specifying possible correlation between the onset random intercept and the time-to-event random intercept in the experimental version of the model. If not given correlation is fixed to 0. It is ignored if the argument The list can have the following components.
|
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayessurvreg3
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indeces of MCMC iterations to which the stored sampled values
correspond.
this file is related to the density of the
error term from the model given by formula
.
Columns labeled k
, Mean.1
,
D.1.1
, where
k = number of mixture components that had probability numerically higher than zero;
Mean.1 = E(epsilon[i,l]);
D.1.1 = var(epsilon[i,l]).
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mixmoment.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mixmoment.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mixmoment.sim
.
this file is related to the density of the
error term from the model given by formula
.
Sampled mixture weights w[k] of mixture components that had probabilities numerically higher than zero.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mweight.sim
.
this file is related to the density of the
error term from the model given by formula
.
Indeces k,
k in {-K, ..., K}
of mixture components that had probabilities numerically higher
than zero. It corresponds to the weights in
mweight.sim
.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as mmean.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as mmean.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as mmean.sim
.
this file is related to the density of the
error term from the model given by formula
.
Characteristics of the sampled G-spline.
This file together with mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled
gamma1
,
sigma1
,
delta1
,
intercept1
,
scale1
,
The meaning of the values in these columns is the following:
gamma1 = the middle knot gamma If ‘Specification’ is 2, this column usually contains zeros;
sigma1 = basis standard deviation sigma of the G-spline. This column contains a fixed value if ‘Specification’ is 2;
delta1 = distance delta between the two knots of the G-spline. This column contains a fixed value if ‘Specification’ is 2;
intercept1 = the intercept term alpha of the G-spline. If ‘Specification’ is 1, this column usually contains zeros;
scale1 = the scale parameter tau of the G-spline. If ‘Specification’ is 1, this column usually contains ones;
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as gspline.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as gspline.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as gspline.sim
.
this file is related to the density of the
error term from the model given by formula
.
Fully created only if store$a = TRUE
. The
file contains the transformed weights
a[k],
k=-K,..., K
of all mixture components, i.e. also of components that had numerically zero
probabilities.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
Fully created only if store$a.b = TRUE
.
The same structure as mlogweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
Fully created only if store$a2 = TRUE
.
The same structure as mlogweight.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
Fully created only if store$a.b2 = TRUE
.
The same structure as mlogweight.sim
.
this file is related to the density of the
error term from the model given by formula
.
Fully created only if store$r = TRUE
. The file
contains the labels of the mixture components into which the
residuals are intrinsically assigned. Instead of indeces on the
scale {-K,..., K}
values from 1 to (2*K+1) are stored here. Function
vecr2matr
can be used to transform it back to
indices from -K to K.
this file is related to the density of the
random intercept from the model given by formula
and
random
.
Fully created only if store$r.b = TRUE
.
The same structure as r.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
Fully created only if store$r2 = TRUE
.
The same structure as r.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
Fully created only if store$r.b2 = TRUE
.
The same structure as r.sim
.
this file is related to the density of the
error term from the model given by formula
.
One column labeled lambda
. These are the
values of the smoothing parameterlambda
(hyperparameters of the prior distribution of the transformed
mixture weights a[k]).
this file is related to the density of the
random intercept from the model given by formula
and
random
.
The same structure as lambda.sim
.
in the case of doubly-censored data. This
file is related to the density of the error term from the model
given by formula2
.
The same structure as lambda.sim
.
in the case of doubly-censored data. This
file is related to the density of the random intercept from the model
given by formula2
and random2
.
The same structure as lambda.sim
.
this file is related to the model given by
formula
.
Sampled values of the regression parameters beta.
The columns are labeled according to the
colnames
of the design matrix.
in the case of doubly-censored data, the same
structure as beta.sim
, however related to the model
given by formula2
.
this file is related to the model given by
formula
and random
.
Fully created only if store$b = TRUE
. It
contains sampled values of random intercepts for all clusters in
the data set. The file has N columns.
fully created only if store$b2 =
TRUE
and in the case of doubly-censored data, the same
structure as b.sim
, however related to the model
given by formula2
and random2
.
this file is related to the model given by formula
.
Fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
fully created only if store$y2 =
TRUE
and in the case of doubly-censored data, the same
structure as Y.sim
, however related to the model
given by formula2
.
This file is related to the residuals of the model
given by formula
.
Columns labeled loglik
, penalty
, and logprw
.
The columns have the following meaning.
loglik = -(sum[i=1][N] n[i]) * (log(sqrt(2*pi)) + log(sigma)) -0.5*sum[i=1][N] sum[l=1][n[i]]( (sigma^2*tau^2)^(-1) * (y[i,l] - x[i,l]'beta - b[i] - alpha - tau*mu[r[i,l]])^2)
where y[i,l] denotes (augmented) (i,l)th true log-event time.
In other words, loglik
is equal to the
conditional log-density
sum[i=1][N] sum[l=1][n[i]] log(p(y[i,l] | r[i,l], beta, b[i], error-G-spline));
penalty: the penalty term
-0.5*sum[k] (Delta a[k])^2
(not multiplied by lambda);
logprw = -2*(sum[i] n[i])*log(sum[k] exp(a[k])) + sum[k[1]] N[k]*a[k], where N[k] is the number of residuals assigned intrinsincally to the kth mixture component.
In other words, logprw
is equal to the conditional
log-density
sum[i=1][N] sum[l=1][n[i]] log(p(r[i,l] | error-G-spline weights)).
This file is related to the random
intercepts from the model given by formula
and
random
.
Columns labeled loglik
, penalty
, and logprw
.
The columns have the following meaning.
loglik = -N * (log(sqrt(2*pi)) + log(sigma)) -0.5*sum[i=1][N]( (sigma^2*tau^2)^(-1) * (b[i] - alpha - tau*mu[r[i]])^2)
where b[i] denotes (augmented) ith random intercept.
In other words, loglik
is equal to the
conditional log-density
sum[i=1][N] log(p(b[i] | r[i], b-G-spline));
The columns penalty
and logprw
have the analogous
meaning as in the case of logposter.sim file.
in the case of doubly-censored data, the same
structure as logposter.sim
, however related to the model
given by formula2
.
in the case of doubly-censored data, the same
structure as logposter_b.sim
, however related to the model
given by formula2
and random2
.
Arnošt Komárek arnost.komarek@mff.cuni.cz
García-Zattera, M. J., Jara, A., and Komárek, A. (2016). A flexible AFT model for misclassified clustered interval-censored data. Biometrics, 72, 473 - 483.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.
## See the description of R commands for ## the cluster specific AFT model ## with the Signal Tandmobiel data, ## analysis described in Komarek and Lesaffre (2007). ## ## R commands available in the documentation ## directory of this package ## - see ex-tandmobCS.R and ## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-tandmobCS.pdf ##
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.