bayessurvreg3: Cluster-specific accelerated failure time model for...

bayessurvreg3R Documentation

Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data with flexibly specified random effects and/or error distribution.

Description

A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.

A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.

The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.

For details, see Komárek (2006) and Komárek and Lesaffre (2008).

SUPPLEMENTED IN 06/2013: Interval-censored times might be subject to misclassification. In case of doubly-interval-censored data, the event time might be subject to misclassification. For details, see García-Zattera, Jara and Komárek (2016).

We explain first in more detail a model without doubly censoring. Let T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i be event times for ith cluster and the units within that cluster The following regression model is assumed:

\log(T_{i,l}) = \beta'x_{i,l} + b_i + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i

where \beta is unknown regression parameter vector, x_{i,l} is a vector of covariates. b_i is a cluster-specific random effect (random intercept).

The random effects b_i,\;i=1,\dots, N are assumed to be i.i.d. with a univariate density g_{b}(b). The error terms \varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i are assumed to be i.i.d. with a univariate density g_{\varepsilon}(e).

Densities g_{b} and g_{\varepsilon} are both expressed as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). We distinguish two, theoretically equivalent, specifications.

In the following, the density for \varepsilon is explicitely described. The density for b is obtained in an analogous manner.

Specification 1

\varepsilon \sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)

where \sigma^2 is the unknown basis variance and \mu_{j},\;j=-K,\dots, K is an equidistant grid of knots symmetric around the unknown point \gamma and related to the unknown basis variance through the relationship

\mu_{j} = \gamma + j\delta\sigma,\quad j=-K,\dots,K,

where \delta is fixed constants, e.g. \delta=2/3 (which has a justification of being close to cubic B-splines).

Specification 2

\varepsilon \sim \alpha + \tau\,V

where \alpha is an unknown intercept term and \tau is an unknown scale parameter. V is then standardized error term which is distributed according to the univariate normal mixture, i.e.

V\sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)

where \mu_{j},\;j=-K,\dots, K is an equidistant grid of fixed knots (means), usually symmetric about the fixed point \gamma=0 and \sigma^2 is fixed basis variance. Reasonable values for the numbers of grid points K is K=15 with the distance between the two knots equal to \delta=0.3 and for the basis variance \sigma^2=0.2^2.

Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.

The mixture weights w_{j},\;j=-K,\dots, K are not estimated directly. To avoid the constraints 0 < w_{j} < 1 and \sum_{j=-K}^{K}\,w_j = 1 transformed weights a_{j},\;j=-K,\dots, K related to the original weights by the logistic transformation:

a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}

are estimated instead.

A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008).

If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.

In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.

Bivariate normal distribution
It is assumed that the pair of random intercepts from the onset and time-to-event part of the model are normally distributed with zero mean and an unknown covariance matrix D.

A priori, the inverse covariance matrix D^{-1} is addumed to follow a Wishart distribution.

Unknown correlation between the basis G-splines
Each pair of basis G-splines describing the distribution of the random intercept in the onset part and the time-to-event part of the model is assumed to be correlated with an unknown correlation coefficient \varrho. Note that this is just an experiment and is no more further supported.

Prior distribution on \varrho is assumed to be uniform. In the MCMC, the Fisher Z transform of the \varrho given by

Z = -\frac{1}{2}\log\Bigl(\frac{1-\varrho}{1+\varrho}\Bigr)=\mbox{atanh}(\varrho)

is sampled. Its prior is derived from the uniform prior \mbox{Unif}(-1,\;1) put on \varrho.

The Fisher Z transform is updated using the Metropolis-Hastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).

Usage

bayessurvreg3(formula, random, formula2, random2,
   data = parent.frame(),
   classification,
   classParam = list(Model = c("Examiner", "Factor:Examiner"),
                     a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
                     init.sens = NULL, init.spec = NULL),
   na.action = na.fail, onlyX = FALSE,
   nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),   
   prior, prior.beta, prior.b, init = list(iter = 0),
   mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                   k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   prior2, prior.beta2, prior.b2, init2,
   mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                    k.overrelax.sigma = 1, k.overrelax.scale = 1,
                    type.update.a.b = "slice", k.overrelax.a.b = 1,
                    k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   priorinit.Nb,
   rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
   store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
                a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), 
   dir)

bayessurvreg3Para(formula, random, formula2, random2,
   data = parent.frame(),
   classification,
   classParam = list(Model = c("Examiner", "Factor:Examiner"),
                     a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
                     init.sens = NULL, init.spec = NULL),
   na.action = na.fail, onlyX = FALSE,
   nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),   
   prior, prior.beta, prior.b, init = list(iter = 0),
   mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
                   k.overrelax.sigma = 1, k.overrelax.scale = 1,
                   type.update.a.b = "slice", k.overrelax.a.b = 1,
                   k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   prior2, prior.beta2, prior.b2, init2,
   mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
                    k.overrelax.sigma = 1, k.overrelax.scale = 1,
                    type.update.a.b = "slice", k.overrelax.a.b = 1,
                    k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
   priorinit.Nb,
   rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
   store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
                r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
                a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE), 
   dir)

Arguments

formula

model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time.

The left-hand side of the formula must be an object created using Surv.

Intercept is implicitely included in the model by the estimation of the error distribution. As a consequence -1 in the model formula does not have any effect on the model specification.

If random is used then the formula must contain an identification of clusters in the form cluster(id), where id is a name of the variable that determines clusters, e.g.

Surv(time, event) gender + cluster(id).
random

formula for the ‘random’ part of the model. In the case of doubly-censored data, this is the random formula for the onset time. With this version of the function only

random = 1

is allowed. If omitted, no random part is included in the model.

formula2

model formula for the regression of the time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for formula applies here.

random2

specification of the ‘random’ part of the model for time-to-event in the case of doubly-censored data. Ignored otherwise. The same structure as for random applies here.

data

optional data frame in which to interpret the variables occuring in the formula, formula2, random, random2 statements.

classification

data.frame with the information for a model which considers misclassification of the event times. It is assumed to have the following columns where the position of columns is important, not their names:

  1. idUnit: variable which determines the rows of classification matrix pertaining to one unit in formula/formula2 data. Number of unique idUnit values must be the same as in formula/formula2 data, classification matrix must be sorted in the same order as formula/formula2 data and having all rows pertaining to one unit in its consecutive rows.

  2. Time: variable with the examination times. It is assumed that the Times are sorted in an increasing order for each idUnit.

  3. Examiner: variable which determines the examiner who performed evaluation at a specific visit. Number of unique Examiner values determines the number of examiners.

  4. Status: 0/1 variable giving the event status according to examiner, 0 = no event, 1 = event.

  5. Factor: possible factor (e.g., tooth in our dental application which may influence the misclassification). Numeric or character variables are converted to a factor. This column is obligatory only if classModel is “Factor:Examiner”.

Possible additional columns are ignored.

If missing, no misclassification is considered.

classParam

a list with additional parameters for the misclassification model. It is ignored if there is no classification argument specified.

The following components of the list classParam are expected.

Model

a character string which specifies the model considered. It can be 1. “Examiner”: sensitivity and specificity depend only on Examiner, 2. “Factor:Examiner”: sensitivity and specificity is for each examiner generally different for different levels of a factor Factor.

a.sens

parameter ‘a’ (shape1) of the beta prior distributions for sensitivities.

b.sens

parameter ‘b’ (shape2) of the beta prior distributions for sensitivities.

a.spec

parameter ‘a’ (shape1) of the beta prior distributions for specificities.

b.spec

parameter ‘b’ (shape2) of the beta prior distributions for specificities.

init.sens

a vector or matrix with initial values of sensitivities. A vector is expected if Model is “Examiner” in which case each component of the vector corresponds to each examiner. A matrix is expected if Model is “Factor:Examiner” in which case rows of the matrix correspond to the values of Factor and columns to examiners.

If not given then the initial sensitivities are sampled from a uniform distribution on (0.8, 0.9).

init.spec

a vector or matrix with initial values of specificities. The structure is the same as for init.sens.

na.action

the user is discouraged from changing the default value na.fail.

onlyX

if TRUE no MCMC sampling is performed and only the design matrix (matrices) are returned. This can be useful to set up correctly priors for regression parameters in the presence of factor covariates.

nsimul

a list giving the number of iterations of the MCMC and other parameters of the simulation.

niter

total number of sampled values after discarding thinned ones, burn-up included;

nthin

thinning interval;

nburn

number of sampled values in a burn-up period after discarding thinned values. This value should be smaller than niter. If not, nburn is set to niter - 1. It can be set to zero;

nwrite

an interval at which information about the number of performed iterations is print on the screen and during the burn-up period an interval with which the sampled values are writen to files;

prior

a list specifying the prior distribution of the G-spline defining the distribution of the error term in the regression model given by formula and random. See prior argument of bayesHistogram function for more detail. In this list also ‘Specification’ as described above is specified.

The item prior$neighbor.system can only be equal to uniCAR here.

prior.b

a list specifying the prior distribution of the G-spline defining the distribution of the random intercept in the regression model given by formula and random. See prior argument of bayesHistogram function for more detail. In this list also ‘Specification’ as described above is specified.

It is ignored if the argument priorinit.Nb is given.

The item prior.b$neighbor.system can only be equal to uniCAR here.

prior.beta

prior specification for the regression parameters, in the case of doubly-censored data for the regression parameters of the onset time, i.e. it is related to formula and random.

This should be a list with the following components:

mean.prior

a vector specifying a prior mean for each beta parameter in the model.

var.prior

a vector specifying a prior variance for each beta parameter.

It is recommended to run the function bayessurvreg3 first with its argument onlyX set to TRUE to find out how the betas are sorted. They must correspond to a design matrix X taken from formula.

init

an optional list with initial values for the MCMC related to the model given by formula and random. The list can have the following components:

iter

the number of the iteration to which the initial values correspond, usually zero.

beta

a vector of initial values for the regression parameters. It must be sorted in the same way as are the columns in the design matrix. Use onlyX=TRUE if you do not know how the columns in the design matrix are created.

a

a vector of length 2K+1 with the initial values of transformed mixture weights for the G-spline defining the distribution of the error term \varepsilon.

lambda

initial values for the Markov random fields precision parameter for the G-spline defining the distribution of the error term \varepsilon.

gamma

an initial values for the middle knot \gamma for the G-spline defining the distribution of the error term \varepsilon.

If ‘Specification’ is 2, this value will not be changed by the MCMC and it is recommended (for easier interpretation of the results) to set init$gamma to zero (default behavior).

If ‘Specification’ is 1 init$gamma should be approximately equal to the mean value of the residuals.

sigma

an initial values of the basis standard deviation \sigma for the G-spline defining the distribution of the error term \varepsilon.

If ‘Specification’ is 2, this value will not be changed by the MCMC and it is recommended to set it approximately equal to the range of standardized data (let say 4 + 4) divided by the number of knots and multiplied by something like 2/3.

If ‘Specification’ is 1 this should be approximately equal to the range of the residuals divided by the number of knots (2K+1) and multiplied again by something like 2/3.

intercept

an initial values of the intercept term \alpha for the G-spline defining the distribution of the error term \varepsilon.

If ‘Specification’ is 1 this value is not changed by the MCMC and the initial value is always changed to zero.

scale

an initial value of the scale parameter \tau for the G-spline defining the distribution of the error term \varepsilon.

If ‘Specification’ is 1 this value is not changed by the MCMC and the initial value is always changed to one.

a.b

a vector of length 2K+1 with the initial values of transformed mixture weights for the G-spline defining the distribution of the random intercept b.

lambda.b

initial values for the Markov random fields precision parameter for the G-spline defining the distribution of the random intercept b.

gamma.b

an initial values for the middle knot \gamma for the G-spline defining the distribution of the random intercept b.

Due to identifiability reasons, this value is always changed to zero and is for neither ‘Specification’ updated by the MCMC.

sigma.b

an initial values of the basis standard deviation \sigma for the G-spline defining the distribution of the random intercept b.

If ‘Specification’ is 2, this value will not be changed by the MCMC and it is recommended to set it approximately equal to the range of standardized data (let say 4 + 4) divided by the number of knots and multiplied by something like 2/3.

If ‘Specification’ is 1 this should be approximately equal to the range of the residuals divided by the number of knots (2K+1) and multiplied again by something like 2/3.

intercept.b

an initial values of the intercept term \alpha for the G-spline defining the distribution of the random intercept b.

Due to identifiability reasons, this value is always changed to zero and is for neither ‘Specification’ updated by the MCMC.

scale.b

an initial value of the scale parameter \tau for the G-spline defining the distribution of the random intercept b.

If ‘Specification’ is 1 this value is not changed by the MCMC and the initial value is always changed to one.

b

a vector of length N of the initial values of random effects b_i,\;i=1,\dots,N for each cluster.

y

a vector of length \sum_{i=1}^N\,n_i with initial values of log-event-times.

r

a vector of length \sum_{i=1}^N\,n_i with initial component labels for each residual. All values must be between -K and K. See argument init of the function bayesHistogram for more details.

r.b

a vector of length N with initial component labels for each random intercept. All values must be between -K and K. See argument init of the function bayesHistogram for more details.

mcmc.par

a list specifying how some of the G-spline parameters related to the distribution of the error term and of the random intercept from formula and random are to be updated. See bayesBisurvreg for more details.

Compared to the mcmc.par argument of the function bayesBisurvreg additional components related to the G-spline for the random intercept can be present, namely

type.update.a.b
k.overrelax.a.b
k.overrelax.sigma.b
k.overrelax.scale.b

In contrast to bayesBisurvreg function arguments mcmc.par$type.update.a and mcmc.par$type.update.a.b can also be equal to "block" in which case all a coefficients are updated in 1 block using the Metropolis-Hastings algorithm.

prior2

a list specifying the prior distribution of the G-spline defining the distribution of the error term in the regression model given by formula2 and random2. See prior argument of bayesHistogram function for more detail.

prior.b2

prior specification for the parameters related to the random effects from formula2 and random2. This should be a list with the same structure as prior.b.

It is ignored if the argument priorinit.Nb is given.

prior.beta2

prior specification for the regression parameters of time-to-event in the case of doubly censored data (related to formula2 and random2). This should be a list with the same structure as prior.beta.

init2

an optional list with initial values for the MCMC related to the model given by formula2 and random2. The list has the same structure as init.

mcmc.par2

a list specifying how some of the G-spline parameters related to formula2 and random2 are to be updated. The list has the same structure as mcmc.par.

priorinit.Nb

a list specifying the prior of the random intercepts in the case of the AFT model with doubly-interval-censored data and onset, time-to-event random intercepts following bivariate normal distribution.

The list should have the following components.

init.D

initial value for the covariance matrix of the onset random intercept and time-to-event random intercept.

It can be specified either as a vector of length 3 giving the lower triangle of the matrix or as a matrix 2 x 2.

df.Di

degrees of freedom \nu for the Wishart prior of the matrix D^{-1}.

Note that it must be higher than 1.

scale.Di

scale matrix S for the Wishart prior of the matrix D^{-1}.

It can be specified either as a vector of length 3 giving the lower triangle of the matrix or as a matrix 2 x 2.

Note that a priori

\mbox{E}(D^{-1}.) = \nu S

rho

a list specifying possible correlation between the onset random intercept and the time-to-event random intercept in the experimental version of the model. If not given correlation is fixed to 0.

It is ignored if the argument priorinit.Nb is given. Ordinary users should not care about this argument.

The list can have the following components.

type.update

character specifying how the Fisher Z transform of the correlation coefficient is updated. Possible values are:

"fixed.zero": correlation coefficient is fixed to 0 and it is not updated.

"normal.around.mode": at each iteration of MCMC, 1 Newton-Raphson step from the current point Z of the full conditional distribution is performed, normal approximation is formed by Taylor expansion and new point Z is sampled from that normal approximation.

Note that this proposal does not work too well if the current point Z lies in the area of low posterior mass. The reason is that even 1 Newton-Raphson step usually leads to the area of high posterior probability mass and the proposal is “too ambisious”.

"langevin". at each iteration of MCMC, new point Z is sampled using the Langevin algorithm. A scale parameter (see below) must cerefully be chosen for this algorithm to ensure that the acceptance rate is about 50–60% (Robert, Casella, 2004, p. 319).

store

a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.

a

if TRUE then all the transformed mixture weights a_{k}, k=-K,\dots,K, related to the G-spline defining the error distribution of formula are stored.

a.b

if TRUE then all the transformed mixture weights a_{k}, k=-K,\dots,K, related to the G-spline defining the distribution of the random intercept from formula and random are stored.

a2

if TRUE and there are doubly-censored data then all the transformed mixture weights a_{k}, k=-K,\dots,K, related to the G-spline defining the error distribution of formula2 are stored.

a.b2

if TRUE then all the transformed mixture weights a_{k}, k=-K,\dots,K, related to the G-spline defining the distribution of the random intercept from formula2 and random2 are stored.

y

if TRUE then augmented log-event times for all observations related to the formula are stored.

y2

if TRUE then augmented log-event times for all observations related to formula2 are stored.

r

if TRUE then labels of mixture components for residuals related to formula are stored.

r.b

if TRUE then labels of mixture components for random intercepts related to formula and random are stored.

r2

if TRUE then labels of mixture components for residuals related to formula2 are stored.

r.b2

if TRUE then labels of mixture components for random intercepts related to formula2 and random2 are stored.

b

if TRUE then the sampled values of the random interceptss related to formula and random are stored.

b2

if TRUE then the sampled values of the random interceptss related to formula2 and random2 are stored.

dir

a string that specifies a directory where all sampled values are to be stored.

Value

A list of class bayessurvreg3 containing an information concerning the initial values and prior choices.

Files created

Additionally, the following files with sampled values are stored in a directory specified by dir argument of this function (some of them are created only on request, see store parameter of this function).

Headers are written to all files created by default and to files asked by the user via the argument store. During the burn-in, only every nsimul$nwrite value is written. After the burn-in, all sampled values are written in files created by default and to files asked by the user via the argument store. In the files for which the corresponding store component is FALSE, every nsimul$nwrite value is written during the whole MCMC (this might be useful to restart the MCMC from some specific point).

The following files are created:

iteration.sim

one column labeled iteration with indeces of MCMC iterations to which the stored sampled values correspond.

mixmoment.sim

this file is related to the density of the error term from the model given by formula.

Columns labeled k, Mean.1, D.1.1, where

k = number of mixture components that had probability numerically higher than zero;

Mean.1 = \mbox{E}(\varepsilon_{i,l});

D.1.1 = \mbox{var}(\varepsilon_{i,l}).

mixmoment_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

The same structure as mixmoment.sim.

mixmoment_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

The same structure as mixmoment.sim.

mixmoment_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

The same structure as mixmoment.sim.

mweight.sim

this file is related to the density of the error term from the model given by formula.

Sampled mixture weights w_{k} of mixture components that had probabilities numerically higher than zero.

mweight_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

The same structure as mweight.sim.

mweight_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

The same structure as mweight.sim.

mweight_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

The same structure as mweight.sim.

mmean.sim

this file is related to the density of the error term from the model given by formula.

Indeces k, k \in\{-K, \dots, K\} of mixture components that had probabilities numerically higher than zero. It corresponds to the weights in mweight.sim.

mmean_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

The same structure as mmean.sim.

mmean_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

The same structure as mmean.sim.

mmean_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

The same structure as mmean.sim.

gspline.sim

this file is related to the density of the error term from the model given by formula.

Characteristics of the sampled G-spline. This file together with mixmoment.sim, mweight.sim and mmean.sim can be used to reconstruct the G-spline in each MCMC iteration.

The file has columns labeled gamma1, sigma1, delta1, intercept1, scale1, The meaning of the values in these columns is the following:

gamma1 = the middle knot \gamma If ‘Specification’ is 2, this column usually contains zeros;

sigma1 = basis standard deviation \sigma of the G-spline. This column contains a fixed value if ‘Specification’ is 2;

delta1 = distance delta between the two knots of the G-spline. This column contains a fixed value if ‘Specification’ is 2;

intercept1 = the intercept term \alpha of the G-spline. If ‘Specification’ is 1, this column usually contains zeros;

scale1 = the scale parameter \tau of the G-spline. If ‘Specification’ is 1, this column usually contains ones;

gspline_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

The same structure as gspline.sim.

gspline_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

The same structure as gspline.sim.

gspline_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

The same structure as gspline.sim.

mlogweight.sim

this file is related to the density of the error term from the model given by formula.

Fully created only if store$a = TRUE. The file contains the transformed weights a_{k}, k=-K,\dots,K of all mixture components, i.e. also of components that had numerically zero probabilities.

mlogweight_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

Fully created only if store$a.b = TRUE.

The same structure as mlogweight.sim.

mlogweight_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

Fully created only if store$a2 = TRUE.

The same structure as mlogweight.sim.

mlogweight_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

Fully created only if store$a.b2 = TRUE.

The same structure as mlogweight.sim.

r.sim

this file is related to the density of the error term from the model given by formula.

Fully created only if store$r = TRUE. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indeces on the scale \{-K,\dots, K\} values from 1 to (2\,K+1) are stored here. Function vecr2matr can be used to transform it back to indices from -K to K.

r_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

Fully created only if store$r.b = TRUE.

The same structure as r.sim.

r_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

Fully created only if store$r2 = TRUE.

The same structure as r.sim.

r_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

Fully created only if store$r.b2 = TRUE.

The same structure as r.sim.

lambda.sim

this file is related to the density of the error term from the model given by formula.

One column labeled lambda. These are the values of the smoothing parameter\lambda (hyperparameters of the prior distribution of the transformed mixture weights a_{k}).

lambda_b.sim

this file is related to the density of the random intercept from the model given by formula and random.

The same structure as lambda.sim.

lambda_2.sim

in the case of doubly-censored data. This file is related to the density of the error term from the model given by formula2.

The same structure as lambda.sim.

lambda_b2.sim

in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by formula2 and random2.

The same structure as lambda.sim.

beta.sim

this file is related to the model given by formula.

Sampled values of the regression parameters \beta.

The columns are labeled according to the colnames of the design matrix.

beta_2.sim

in the case of doubly-censored data, the same structure as beta.sim, however related to the model given by formula2.

b.sim

this file is related to the model given by formula and random.

Fully created only if store$b = TRUE. It contains sampled values of random intercepts for all clusters in the data set. The file has N columns.

b_2.sim

fully created only if store$b2 = TRUE and in the case of doubly-censored data, the same structure as b.sim, however related to the model given by formula2 and random2.

Y.sim

this file is related to the model given by formula.

Fully created only if store$y = TRUE. It contains sampled (augmented) log-event times for all observations in the data set.

Y_2.sim

fully created only if store$y2 = TRUE and in the case of doubly-censored data, the same structure as Y.sim, however related to the model given by formula2.

logposter.sim

This file is related to the residuals of the model given by formula.

Columns labeled loglik, penalty, and logprw. The columns have the following meaning.

loglik = % - (\sum_{i=1}^N\,n_i)\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\}- 0.5\sum_{i=1}^N\sum_{l=1}^{n_i} \Bigl\{ (\sigma^2\,\tau^2)^{-1}\; (y_{i,l} - x_{i,l}'\beta - b_i - \alpha - \tau\mu_{r_{i,l}})^2 \Bigr\}

where y_{i,l} denotes (augmented) (i,l)th true log-event time.

In other words, loglik is equal to the conditional log-density

\sum_{i=1}^N \sum_{l=1}^{n_i}\,\log\Bigl\{p\bigl(y_{i,l}\;\big|\;r_{i,l},\,\beta,\,b_i,\,\mbox{error-G-spline}\bigr)\Bigr\};

penalty: the penalty term

-\frac{1}{2}\sum_{k}\Bigl(\Delta\, a_k\Bigr)^2

(not multiplied by \lambda);

logprw = -2\,(\sum_i n_i)\,\log\bigl\{\sum_{k}a_{k}\bigr\} + \sum_{k}N_{k}\,a_{k}, where N_{k} is the number of residuals assigned intrinsincally to the kth mixture component.

In other words, logprw is equal to the conditional log-density

\sum_{i=1}^N\sum_{l=1}^{n_i} \log\bigl\{p(r_{i,l}\;|\;\mbox{error-G-spline weights})\bigr\}.

logposter_b.sim

This file is related to the random intercepts from the model given by formula and random.

Columns labeled loglik, penalty, and logprw. The columns have the following meaning.

loglik = % - N\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\}- 0.5\sum_{i=1}^N \Bigl\{ (\sigma^2\,\tau^2)^{-1}\; (b_i - \alpha - \tau\mu_{r_{i}})^2 \Bigr\}

where b_{i} denotes (augmented) ith random intercept.

In other words, loglik is equal to the conditional log-density

\sum_{i=1}^N \,\log\Bigl\{p\bigl(b_{i}\;\big|\;r_{i},\,\mbox{b-G-spline}\bigr)\Bigr\};

The columns penalty and logprw have the analogous meaning as in the case of logposter.sim file.

logposter_2.sim

in the case of doubly-censored data, the same structure as logposter.sim, however related to the model given by formula2.

logposter_b2.sim

in the case of doubly-censored data, the same structure as logposter_b.sim, however related to the model given by formula2 and random2.

Author(s)

Arnošt Komárek arnost.komarek@mff.cuni.cz

References

García-Zattera, M. J., Jara, A., and Komárek, A. (2016). A flexible AFT model for misclassified clustered interval-censored data. Biometrics, 72, 473 - 483.

Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.

Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.

Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.

Examples

## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2007).
##
## R commands available in the documentation
## directory of this package
## - see ex-tandmobCS.R and
##   https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-tandmobCS.pdf
##

bayesSurv documentation built on Sept. 30, 2024, 9:34 a.m.