View source: R/bayesHistogram.R
bayesHistogram | R Documentation |
A function to estimate a density of a uni- or bivariate (possibly censored) sample. The density is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and equal variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities in the density specification. Other method functions are available to visualize resulting density estimate.
This function served as a basis for further developed
bayesBisurvreg
, bayessurvreg2
and
bayessurvreg3
functions. However, in contrast to these
functions, bayesHistogram
does not allow for doubly censoring.
Bivariate case:
Let Y_{i,l},\; i=1,\dots,N,\; l=1,2
be
observations for the i
th cluster and the first and the second
unit (dimension). The bivariate observations
Y_i=(Y_{i,1},\,Y_{i,2})',\;i=1,\dots,N
are assumed to be i.i.d. with a~bivariate density
g_{y}(y_1,\,y_2)
. This density is expressed as
a~mixture of Bayesian G-splines (normal densities with equidistant
means and constant variance matrices). We distinguish two,
theoretically equivalent, specifications.
(Y_1,\,Y_2)' \sim
\sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2,\,\sigma_2^2))
where \sigma_1^2,\,\sigma_2^2
are
unknown basis variances and
\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an~equidistant grid of knots symmetric around the
unknown point (\gamma_1,\,\gamma_2)'
and related to the unknown basis variances through the
relationship
\mu_{1,j_1} = \gamma_1 + j_1\delta_1\sigma_1,\quad j_1=-K_1,\dots,K_1,
\mu_{2,j_2} = \gamma_2 + j_2\delta_2\sigma_2,\quad j_2=-K_2,\dots,K_2,
where \delta_1,\,\delta_2
are fixed
constants, e.g. \delta_1=\delta_2=2/3
(which has a~justification of being close to cubic B-splines).
(Y_1,\,Y_2)' \sim (\alpha_1,\,\alpha_2)'+ \bold{S}\,(Y_1,\,Y_2)'
where (\alpha_1,\,\alpha_2)'
is an
unknown intercept term and
\bold{S} \mbox{ is a diagonal matrix with } \tau_1 \mbox{ and }\tau_2 \mbox{ on a diagonal,}
i.e. \tau_1,\,\tau_2
are unknown scale
parameters. (V_1,\,V_2)'
is then
standardized observational vector which is distributed according
to the bivariate normal mixture, i.e.
(V_1,\,V_2)'\sim \sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2}
w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2, \sigma_2^2))
where \mu_{(j_1,j_2)} =
(\mu_{1,j_1},\,\mu_{2,j_2})'
is an~equidistant grid of fixed knots (means), usually
symmetric about the fixed point (\gamma_1,\,\gamma_2)'=(0,
0)'
and
\sigma_1^2,\,\sigma_2^2
are
fixed basis variances. Reasonable values for the numbers of grid
points K_1
and K_2
are
K_1=K_2=15
with the distance between the two
knots equal to \delta=0.3
and for the basis
variances
\sigma_1^2\sigma_2^2=0.2^2.
Univariate case:
It is a~direct simplification of the bivariate case.
bayesHistogram(y1, y2,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, y = FALSE, r = FALSE),
dir)
y1 |
response for the first dimension in the form of a survival
object created using |
y2 |
response for the second dimension in the form of a survival
object created using |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list that identifies prior hyperparameters and prior choices. See the paper Komárek and Lesaffre (2008) and the PhD. thesis Komárek (2006) for more details. Some prior parameters can be guessed by the function itself. If you
want to do so, set such parameters to
|
init |
a list of the initial values to start the McMC. Set to
|
mcmc.par |
a list specifying further details of the McMC simulation. There are default values implemented for all components of this list.
|
store |
a~list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
A list of class bayesHistogram
containing an information
concerning the initial values and prior choices.
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. All
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
one column labeled iteration
with
indeces of MCMC iterations to which the stored sampled values
correspond.
columns labeled k
, Mean.1
, Mean.2
,
D.1.1
, D.2.1
, D.2.2
in the bivariate case and
columns labeled k
, Mean.1
, D.1.1
in the
univariate case, where
k = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(Y_{i,1})
;
Mean.2 =
\mbox{E}(Y_{i,2})
;
D.1.1 =
\mbox{var}(Y_{i,1})
;
D.2.1 =
\mbox{cov}(Y_{i,1},\,Y_{i,2})
;
D.2.2 =
\mbox{var}(Y_{i,2})
.
sampled mixture weights
w_{k_1,\,k_2}
of mixture components that had
probabilities numerically higher than zero.
indeces k_1,\;k_2,
k_1 \in\{-K_1, \dots, K_1\},
k_2 \in\{-K_2, \dots, K_2\}
of mixture components that had probabilities numerically higher
than zero. It corresponds to the weights in
mweight.sim
.
characteristics of the sampled G-spline
(distribution of
(Y_{i,1},\,Y_{i,2})'
).
This file together with mixmoment.sim
,
mweight.sim
and mmean.sim
can be used to reconstruct
the G-spline in each MCMC iteration.
The file has columns labeled gamma1
,
gamma2
, sigma1
, sigma2
, delta1
,
delta2
, intercept1
, intercept2
,
scale1
, scale2
. The meaning of the values in these
columns is the following:
gamma1 = the middle knot \gamma_1
in the
first dimension. If ‘Specification’ is 2, this column
usually contains zeros;
gamma2 = the middle knot \gamma_2
in the
second dimension. If ‘Specification’ is 2, this column
usually contains zeros;
sigma1 = basis standard deviation \sigma_1
of the G-spline in the first dimension. This column contains
a~fixed value if ‘Specification’ is 2;
sigma2 = basis standard deviation \sigma_2
of the G-spline in the second dimension. This column contains
a~fixed value if ‘Specification’ is 2;
delta1 = distance delta_1
between the two knots of the G-spline in
the first dimension. This column contains
a~fixed value if ‘Specification’ is 2;
delta2 = distance \delta_2
between the two knots of the G-spline in
the second dimension. This column contains a~fixed value if
‘Specification’ is 2;
intercept1 = the intercept term \alpha_1
of
the G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains zeros;
intercept2 = the intercept term \alpha_2
of
the G-spline in the second dimension. If ‘Specification’ is 1, this column
usually contains zeros;
scale1 = the scale parameter \tau_1
of the
G-spline in the first dimension. If ‘Specification’ is 1, this column
usually contains ones;
scale2 = the scale parameter \tau_2
of the
G-spline in the second dimension. ‘Specification’ is 1, this column
usually contains ones.
fully created only if store$a = TRUE
. The
file contains the transformed weights
a_{k_1,\,k_2},
k_1=-K_1,\dots,K_1,
k_2=-K_2,\dots,K_2
of all mixture
components, i.e. also of components that had numerically zero
probabilities.
fully created only if store$r = TRUE
. The file
contains the labels of the mixture components into which the
observations are intrinsically assigned. Instead of double indeces
(k_1,\,k_2)
, values from 1 to (2\,K_1+1)\times
(2\,K_2+1)
are stored here. Function
vecr2matr
can be used to transform it back to double
indeces.
either one column labeled lambda
or two
columns labeled lambda1
and lambda2
. These are the
values of the smoothing parameter(s) \lambda
(hyperparameters of the prior distribution of the transformed
mixture weights a_{k_1,\,k_2}
).
fully created only if store$y = TRUE
. It
contains sampled (augmented) log-event times for all observations
in the data set.
columns labeled loglik
, penalty
or penalty1
and
penalty2
, logprw
. The columns have the following
meaning (the formulas apply for the bivariate case).
loglik
=
%
-N\Bigl\{\log(2\pi) + \log(\sigma_1) + \log(\sigma_2)\Bigr\}-
0.5\sum_{i=1}^N\Bigl\{
(\sigma_1^2\,\tau_1^2)^{-1}\; (y_{i,1} - \alpha_1 - \tau_1\mu_{1,\,r_{i,1}})^2 +
(\sigma_2^2\,\tau_2^2)^{-1}\; (y_{i,2} - \alpha_2 - \tau_2\mu_{2,\,r_{i,2}})^2
\Bigr\}
where y_{i,l}
denotes (augmented) (i,l)th
true log-event time. In other words, loglik
is equal to the
conditional log-density
\sum_{i=1}^N\,\log\Bigl\{p\bigl((y_{i,1},\,y_{i,2})\;\big|\;r_{i},\,\mbox{G-spline}\bigr)\Bigr\};
penalty1: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the first dimension not multiplied by
lambda1
;
penalty2: If prior$neighbor.system
= "uniCAR"
:
the penalty term for the second dimension not multiplied by
lambda2
;
penalty: If prior$neighbor.system
is different from "uniCAR"
:
the penalty term not multiplied by lambda
;
logprw =
-2\,N\,\log\bigl\{\sum_{k_1}\sum_{k_2}a_{k_1,\,k_2}\bigr\} +
\sum_{k_1}\sum_{k_2}N_{k_1,\,k_2}\,a_{k_1,\,k_2},
where N_{k_1,\,k_2}
is the number of observations
assigned intrinsincally to the (k_1,\,k_2)
th
mixture component.
In other words, logprw
is equal to the conditional
log-density
\sum_{i=1}^N \log\bigl\{p(r_i\;|\;\mbox{G-spline
weights})\bigr\}.
Arnošt Komárek arnost.komarek@mff.cuni.cz
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337 - 348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Komárek, A. and Lesaffre, E. (2006b). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705 - 767.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.