# credibleball: Compute a Bayesian credible ball around a clustering estimate

In muschellij2/mcclust.ext: Point estimation and credible balls for Bayesian cluster analysis

## Description

Computes a Bayesian credible ball around a clustering estimate to characterize uncertainty in the posterior over clusterings, as represented by MCMC samples of clusterings.

## Usage

```r
credibleball(c.star, cls.draw, c.dist = c("VI","Binder"), alpha = 0.05)

## S3 method for class 'credibleball'
summary(object, ...)

## S3 method for class 'credibleball'
plot(x, data=NULL, dx=NULL, xgrid=NULL, dxgrid=NULL, ...)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `c.star` | vector, a clustering estimate of the `length(c.star)` data points. |
| `cls.draw` | a matrix of the MCMC samples of clusterings of the `ncol(cls.draw)` data points. |
| `c.dist` | the distance function on clusterings to use. Should be one of `"VI"` or `"Binder"`. Defaults to `"VI"`. |
| `alpha` | a number in the unit interval, specifies the Bayesian confidence level of `1-alpha`. Defaults to `0.05`. |
| `object` | an object of class `"credibleball"`. |
| `x` | an object of class `"credibleball"`. |
| `data` | the dataset contained in a `data.frame` with `ncol(cls.draw)` rows of data points. |
| `dx` | for `ncol(data)=1`, the estimated density at the observed data points. |
| `xgrid` | for `ncol(data)=1`, a grid of data points for density estimation. |
| `dxgrid` | for `ncol(data)=1`, the estimated density at the grid of data points. |
| `...` | other inputs to `summary` or `plot`. |

## Details

An advantage of Bayesian cluster analysis is that it provides a posterior over the entire partition space, expressing beliefs in the clustering structure given the data. The credible ball summarizes the uncertainty in the posterior around a clustering estimate `c.star` and is defined as the smallest ball around `c.star` with posterior probability at least `1-alpha`. Possible distance metrics on the partition space are the Variation of Information and the N-invariant Binder's loss (Binder's loss times `2/length(c.star)^2`). The posterior probability is estimated from MCMC posterior samples of clusterings.
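The definition above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `binder_dist`, `vi_dist`, and `ball_radius` are hypothetical, the use of log base 2 in the Variation of Information is an assumption, and the four draws are made-up toy data.

```r
# Sketch only: these helpers are hypothetical and not part of mcclust.ext.

# N-invariant Binder's loss: number of pairs on which the two partitions
# disagree (together in one, apart in the other), times 2 / N^2.
binder_dist <- function(c1, c2) {
  n <- length(c1)
  same1 <- outer(c1, c1, "==")
  same2 <- outer(c2, c2, "==")
  pairs <- upper.tri(same1)
  2 * sum(same1[pairs] != same2[pairs]) / n^2
}

# Variation of Information: 2 * H(c1, c2) - H(c1) - H(c2).
# Assumption: entropies use log base 2.
vi_dist <- function(c1, c2) {
  joint <- table(c1, c2) / length(c1)
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

# Smallest radius whose ball around c.star contains at least a fraction
# 1 - alpha of the MCMC draws (one draw per row of cls.draw).
ball_radius <- function(c.star, cls.draw, dist_fun, alpha = 0.05) {
  d <- apply(cls.draw, 1, function(cl) dist_fun(c.star, cl))
  radius <- sort(d)[max(1, ceiling((1 - alpha) * length(d)))]
  list(radius = radius, dist = d, in_ball = d <= radius)
}

# Toy example: 4 draws of a clustering of 4 data points.
c.star <- c(1, 1, 2, 2)
cls.draw <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 1, 1, 2), c(1, 2, 1, 2))
ball <- ball_radius(c.star, cls.draw, binder_dist, alpha = 0.25)
ball$radius   # 0.375
ball$in_ball  # TRUE TRUE TRUE FALSE
```

Passing `vi_dist` instead of `binder_dist` switches the metric, mirroring the role of the `c.dist` argument.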

The credible ball is summarized via the upper vertical, lower vertical, and horizontal bounds, defined, respectively, as the partitions in the credible ball with the fewest clusters that are most distant from `c.star`, with the most clusters that are most distant from `c.star`, and with the greatest distance from `c.star`.
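As an illustration of these three bounds, the sketch below selects them from a set of partitions already known to lie inside the ball. The helper `ball_bounds` is hypothetical and is not a package function. The toy distances are the N-invariant Binder distances of each row from `c(1, 1, 2, 2)`. Unlike real MCMC output, the toy draws contain no duplicates, so no de-duplication is attempted.

```r
# Sketch only: ball_bounds is a hypothetical helper, not part of mcclust.ext.
# cls.in.ball:  matrix of partitions inside the credible ball (one per row)
# dist.in.ball: distance of each of those partitions from c.star
ball_bounds <- function(cls.in.ball, dist.in.ball) {
  # Number of clusters in each partition
  k <- apply(cls.in.ball, 1, function(cl) length(unique(cl)))
  # Among the candidate rows idx, keep those at maximal distance from c.star
  farthest <- function(idx) idx[dist.in.ball[idx] == max(dist.in.ball[idx])]
  horiz <- farthest(seq_along(dist.in.ball))   # greatest distance overall
  upper <- farthest(which(k == min(k)))        # fewest clusters
  lower <- farthest(which(k == max(k)))        # most clusters
  list(
    c.horiz        = cls.in.ball[horiz, , drop = FALSE],
    c.uppervert    = cls.in.ball[upper, , drop = FALSE],
    c.lowervert    = cls.in.ball[lower, , drop = FALSE],
    dist.horiz     = dist.in.ball[horiz[1]],
    dist.uppervert = dist.in.ball[upper[1]],
    dist.lowervert = dist.in.ball[lower[1]]
  )
}

# Toy ball around c.star = c(1, 1, 2, 2)
cls.in.ball <- rbind(c(1, 1, 2, 2), c(1, 1, 1, 2), c(1, 1, 1, 1), c(1, 2, 3, 3))
dist.in.ball <- c(0, 0.375, 0.5, 0.125)
bounds <- ball_bounds(cls.in.ball, dist.in.ball)
bounds$c.uppervert  # the single-cluster partition 1 1 1 1
bounds$c.lowervert  # the three-cluster partition 1 2 3 3
```

In this toy case the horizontal and upper vertical bounds coincide, since the most distant partition is also the one with the fewest clusters.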

In plots, data points are colored according to cluster membership. For `ncol(data)=1`, the data points are plotted against the density (which is estimated via a call to `density` if not provided). For `ncol(data)=2`, the data points are plotted directly, and for `ncol(data)>2`, the data points are plotted in the space spanned by the first two principal components.
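For data with more than two variables, the principal-component view can be approximated in base R as below. This is a rough sketch on simulated data with a made-up cluster labelling. Using `prcomp` with its default settings (centered, unscaled) is an assumption, and the package's `plot` method may differ.

```r
set.seed(1)

# Hypothetical data: 30 points in 3 dimensions, with made-up cluster labels
data <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
cl <- rep(1:3, each = 10)

# Project onto the first two principal components
# Assumption: default prcomp settings (centered, unscaled)
pcs <- prcomp(data)$x[, 1:2]

# Color each point by its cluster membership
plot(pcs, col = cl, pch = 19, xlab = "PC1", ylab = "PC2")
```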

## Value

| Component | Description |
|-----------|-------------|
| `c.star` | vector, clustering estimate of the `length(c.star)` data points. |
| `c.horiz` | a matrix of horizontal bounds of the credible ball, i.e. partitions in the credible ball with the greatest distance from `c.star`. |
| `c.uppervert` | a matrix of upper vertical bounds of the credible ball, i.e. partitions in the credible ball with the fewest clusters that are most distant from `c.star`. |
| `c.lowervert` | a matrix of lower vertical bounds of the credible ball, i.e. partitions in the credible ball with the most clusters that are most distant from `c.star`. |
| `dist.horiz` | the distance between `c.star` and the horizontal bounds. |
| `dist.uppervert` | the distance between `c.star` and the upper vertical bounds. |
| `dist.lowervert` | the distance between `c.star` and the lower vertical bounds. |

## References

Wade, S. and Ghahramani, Z. (2015) Bayesian cluster analysis: Point estimation and credible balls. Submitted. arXiv:1505.03339.

## See Also

`minVI`, `minbinder.ext`, `maxpear`, and `medv` to obtain a point estimate of clustering based on posterior MCMC samples; and `plotpsm` for a heat map of the posterior similarity matrix.

## Examples

```r
library(mcclust.ext)

# Load the galaxy data fit, predictive density, and MCMC draws of clusterings
data(galaxy.fit)
x <- data.frame(x = galaxy.fit$x)
data(galaxy.pred)
data(galaxy.draw)

# Find representative partition of posterior
psm <- comp.psm(galaxy.draw)
galaxy.VI <- minVI(psm, galaxy.draw, method = "all", include.greedy = TRUE)
summary(galaxy.VI)
plot(galaxy.VI, data = x, dx = galaxy.fit$fx, xgrid = galaxy.pred$x, dxgrid = galaxy.pred$fx)

# Uncertainty in partition estimate
galaxy.cb <- credibleball(galaxy.VI$cl[1, ], galaxy.draw)
summary(galaxy.cb)
plot(galaxy.cb, data = x, dx = galaxy.fit$fx, xgrid = galaxy.pred$x, dxgrid = galaxy.pred$fx)

# Compare with heat map of posterior similarity matrix
plotpsm(psm)
```