
Computes a Bayesian credible ball around a clustering estimate to characterize uncertainty in the posterior, i.e. MCMC samples of clusterings.


`c.star` |
vector, a clustering estimate of the |

`cls.draw` |
a matrix of the MCMC samples of clusterings of the |

`c.dist` |
the distance function on clusterings to use. Should be one of |

`alpha` |
a number in the unit interval, specifies the Bayesian confidence level of |

`object` |
an object of class |

`x` |
an object of class |

`data` |
the dataset contained in a |

`dx` |
for |

`xgrid` |
for |

`dxgrid` |
for |

`...` |
other inputs to |

An advantage of Bayesian cluster analysis is that it provides a posterior over the entire partition space, expressing beliefs in the clustering structure given the data. The credible ball summarizes the uncertainty in the posterior around a clustering estimate `c.star` and is defined as the smallest ball around `c.star` with posterior probability at least `1-alpha`. Possible distance metrics on the partition space are the Variation of Information and the N-invariant Binder's loss (Binder's loss times `2/length(c.star)^2`). The posterior probability is estimated from MCMC posterior samples of clusterings.
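The definition above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `vi_dist`, `binder_dist`, and `ball_radius` are hypothetical, the use of log base 2 in the Variation of Information and the counting of each unordered pair once in Binder's loss are assumptions, and the samples are assumed to be stored one clustering per row.

```r
# Variation of Information between two label vectors
# (log base 2 is an assumption of this sketch).
vi_dist <- function(c1, c2) {
  n <- length(c1)
  joint <- table(c1, c2) / n                 # joint cluster proportions
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

# N-invariant Binder's loss: number of point pairs on which the two
# partitions disagree (each unordered pair counted once, an assumption),
# times 2 / N^2.
binder_dist <- function(c1, c2) {
  n <- length(c1)
  same1 <- outer(c1, c1, "==")
  same2 <- outer(c2, c2, "==")
  disagreements <- sum(same1[upper.tri(same1)] != same2[upper.tri(same2)])
  2 * disagreements / n^2
}

# Smallest radius whose ball around c_star holds at least a 1 - alpha
# share of the sampled clusterings (rows of `draws`).
ball_radius <- function(c_star, draws, dist_fun, alpha = 0.05) {
  d <- apply(draws, 1, function(cl) dist_fun(c_star, cl))
  sort(d)[ceiling((1 - alpha) * length(d))]
}

# Toy data: four sampled clusterings of four points.
c_star <- c(1, 1, 2, 2)
draws <- rbind(c(1, 1, 2, 2),
               c(2, 2, 1, 1),
               c(1, 1, 1, 2),
               c(1, 1, 1, 1))
eps_vi <- ball_radius(c_star, draws, vi_dist, alpha = 0.25)
eps_binder <- ball_radius(c_star, draws, binder_dist, alpha = 0.25)
```

Both distances ignore the labels themselves, so the second draw, which only swaps the two labels, is at distance zero from `c_star`.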

The credible ball is summarized via the upper vertical, lower vertical, and horizontal bounds, defined, respectively, as the partitions in the credible ball with the fewest clusters that are most distant from `c.star`, with the most clusters that are most distant from `c.star`, and with the greatest distance from `c.star`.
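As a rough illustration of how the three bounds could be read off a set of sampled clusterings, the base R sketch below takes the samples and their precomputed distances to `c.star`. The function `ball_bounds` is hypothetical and is not the package's code. It assumes one clustering per row and restricts the search to the sampled partitions that fall inside the ball.

```r
# Hypothetical helper: bounds of the credible ball from sampled clusterings.
# `draws` holds one clustering per row; `d` holds the distance of each row
# to the point estimate, computed beforehand with any partition metric.
ball_bounds <- function(draws, d, alpha = 0.05) {
  eps <- sort(d)[ceiling((1 - alpha) * length(d))]   # radius of the ball
  in_ball <- which(d <= eps)
  d_in <- d[in_ball]
  # number of clusters in each sampled partition inside the ball
  k <- apply(draws[in_ball, , drop = FALSE], 1,
             function(cl) length(unique(cl)))

  # among the candidate positions, keep those farthest from the estimate
  farthest <- function(candidates) {
    candidates[d_in[candidates] == max(d_in[candidates])]
  }
  rows <- function(idx) unique(draws[in_ball[idx], , drop = FALSE])

  list(
    c.horiz     = rows(farthest(seq_along(in_ball))),
    c.uppervert = rows(farthest(which(k == min(k)))),
    c.lowervert = rows(farthest(which(k == max(k))))
  )
}

# Toy data: distances are N-invariant Binder values from (1, 1, 2, 2),
# worked out by hand for this example.
draws <- rbind(c(1, 1, 2, 2),
               c(1, 2, 3, 3),
               c(1, 1, 1, 1),
               c(1, 2, 1, 2))
d <- c(0, 0.125, 0.5, 0.5)
bounds <- ball_bounds(draws, d, alpha = 0.25)
```

In this toy case the horizontal bound contains two partitions, the upper vertical bound is the single-cluster partition, and the lower vertical bound is the three-cluster partition. Note that `unique` only removes identical rows, so two rows encoding the same partition with different labels would both be kept.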

In plots, data points are colored according to cluster membership. For `ncol(data)=1`, the data points are plotted against the density (which is estimated via a call to `density` if not provided). For `ncol(data)=2`, the data points are plotted directly, and for `ncol(data)>2`, the data points are plotted in the space spanned by the first two principal components.
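The choice of plotting coordinates described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper `plot_coords` is hypothetical, the case distinction is taken to be on the number of columns of `data` (one column per variable, as in the data frame used in the example on this page), the density is evaluated at the data points by linear interpolation, and `prcomp` is called with its default centering and no scaling, which may differ from what the package does.

```r
# Hypothetical helper: two plotting coordinates for each data point.
plot_coords <- function(data, dx = NULL) {
  data <- as.data.frame(data)
  p <- ncol(data)
  if (p == 1) {
    x <- data[[1]]
    if (is.null(dx)) {
      # estimate the density and interpolate it at the data points
      kde <- density(x)
      dx <- approx(kde$x, kde$y, xout = x)$y
    }
    data.frame(x = x, y = dx)
  } else if (p == 2) {
    data.frame(x = data[[1]], y = data[[2]])
  } else {
    # scores on the first two principal components
    pcs <- prcomp(data)$x[, 1:2]
    data.frame(x = pcs[, 1], y = pcs[, 2])
  }
}

set.seed(1)
d1 <- data.frame(a = rnorm(20))
d3 <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
cl <- rep(1:2, each = 10)          # toy cluster labels used for coloring
xy1 <- plot_coords(d1)
xy3 <- plot_coords(d3)
# plot(xy3$x, xy3$y, col = cl, pch = 19)
```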

`c.star` |
vector, clustering estimate of the |

`c.horiz` |
A matrix of horizontal bounds of the credible ball, i.e. partitions in the credible ball with the greatest distance to |

`c.uppervert` |
A matrix of upper vertical bounds of the credible ball, i.e. partitions in the credible ball with the fewest clusters that are most distant to |

`c.lowervert` |
A matrix of lower vertical bounds of the credible ball, i.e. partitions in the credible ball with the most clusters that are most distant to |

`dist.horiz` |
the distance between |

`dist.uppervert` |
the distance between |

`dist.lowervert` |
the distance between |

Sara Wade, sara.wade@eng.cam.ac.uk

Wade, S. and Ghahramani, Z. (2015) Bayesian cluster analysis: Point estimation and credible balls. Submitted. arXiv:1505.03339.

`minVI`, `minbinder.ext`, `maxpear`, and `medv` to obtain a point estimate of clustering based on posterior MCMC samples; and `plotpsm` for a heat map of posterior similarity matrix.

```
library(mcclust.ext)

# Galaxy data: fitted density, predictive density on a grid, MCMC clusterings
data(galaxy.fit)
x <- data.frame(x = galaxy.fit$x)
data(galaxy.pred)
data(galaxy.draw)

# Find representative partition of posterior
psm <- comp.psm(galaxy.draw)
galaxy.VI <- minVI(psm, galaxy.draw, method = "all", include.greedy = TRUE)
summary(galaxy.VI)
plot(galaxy.VI, data = x, dx = galaxy.fit$fx,
     xgrid = galaxy.pred$x, dxgrid = galaxy.pred$fx)

# Uncertainty in partition estimate
galaxy.cb <- credibleball(galaxy.VI$cl[1, ], galaxy.draw)
summary(galaxy.cb)
plot(galaxy.cb, data = x, dx = galaxy.fit$fx,
     xgrid = galaxy.pred$x, dxgrid = galaxy.pred$fx)

# Compare with heat map of posterior similarity matrix
plotpsm(psm)
```
