credibleball: Compute a Bayesian credible ball around a clustering estimate


Description

Computes a Bayesian credible ball around a clustering estimate to characterize uncertainty in the posterior over clusterings, as represented by MCMC samples of clusterings.

Usage

credibleball(c.star, cls.draw, c.dist = c("VI","Binder"), alpha = 0.05)

## S3 method for class 'credibleball'
summary(object, ...)
## S3 method for class 'credibleball'
plot(x,data=NULL,dx=NULL,xgrid=NULL,dxgrid=NULL,...)

Arguments

c.star

a vector of cluster labels giving a clustering estimate for the length(c.star) data points.

cls.draw

a matrix of MCMC samples of clusterings, with one row per sample and one column for each of the ncol(cls.draw) data points.

c.dist

the distance function on clusterings to use. Should be one of "VI" or "Binder". Defaults to "VI".

alpha

a number in the unit interval specifying the credible level 1-alpha of the ball. Defaults to 0.05.

object

an object of class "credibleball".

x

an object of class "credibleball".

data

the dataset contained in a data.frame with ncol(cls.draw) rows of data points.

dx

for ncol(data)=1, the estimated density at the observed data points.

xgrid

for ncol(data)=1, a grid of data points for density estimation.

dxgrid

for ncol(data)=1, the estimated density at the grid of data points.

...

other inputs to summary or plot.
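For one-dimensional data, dx, xgrid and dxgrid can be computed once and passed to plot. The following is a minimal sketch, assuming a default kernel estimate from stats::density is acceptable; the exact density call the plot method makes when these inputs are omitted is not documented here, and the helper name density_inputs_sketch is hypothetical.

```r
# Hypothetical helper: build dx, xgrid and dxgrid for a one-column
# dataset from a single kernel density estimate (default bandwidth).
density_inputs_sketch <- function(x) {
  dens <- stats::density(x)              # estimate on a 512-point grid
  list(
    xgrid  = dens$x,                     # grid of evaluation points
    dxgrid = dens$y,                     # estimated density on the grid
    # linearly interpolate the gridded estimate at the observed points
    dx     = stats::approx(dens$x, dens$y, xout = x)$y
  )
}

set.seed(1)
x <- c(rnorm(50, mean = 10), rnorm(50, mean = 20))
d <- density_inputs_sketch(x)
```

With a credibleball object cb, these would be supplied as plot(cb, data = data.frame(x = x), dx = d$dx, xgrid = d$xgrid, dxgrid = d$dxgrid).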

Details

An advantage of Bayesian cluster analysis is that it provides a posterior over the entire partition space, expressing beliefs about the clustering structure given the data. The credible ball summarizes the uncertainty in the posterior around a clustering estimate c.star and is defined as the smallest ball around c.star with posterior probability at least 1-alpha. The available distance metrics on the partition space are the Variation of Information (VI) and the N-invariant Binder's loss (Binder's loss times 2/length(c.star)^2). The posterior probability of the ball is estimated from the MCMC posterior samples of clusterings.
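The two distances and the MCMC estimate of the ball's radius can be sketched in base R. This is a minimal sketch, not the package's code: the helper names vi_sketch, binder_sketch and credible_radius_sketch are hypothetical, VI uses base-2 logarithms (an assumption about the base), Binder's loss assumes unit costs counted over unordered pairs, and the radius is taken as the smallest draw-to-c.star distance whose empirical coverage over the draws reaches 1-alpha.

```r
# VI(c1, c2) = H(c1) + H(c2) - 2 I(c1, c2), log base 2 (assumed)
vi_sketch <- function(c1, c2) {
  p12 <- table(c1, c2) / length(c1)     # joint distribution of labels
  p1 <- rowSums(p12)
  p2 <- colSums(p12)
  h1 <- -sum(p1 * log2(p1))
  h2 <- -sum(p2 * log2(p2))
  nz <- p12 > 0                         # skip empty cells (0 * log 0 = 0)
  mi <- sum(p12[nz] * log2(p12[nz] / outer(p1, p2)[nz]))
  h1 + h2 - 2 * mi
}

# N-invariant Binder's loss: disagreeing unordered pairs times 2 / N^2
binder_sketch <- function(c1, c2) {
  n <- length(c1)
  same1 <- outer(c1, c1, "==")          # co-clustering indicators
  same2 <- outer(c2, c2, "==")
  pairs <- sum(same1 != same2) / 2      # each pair appears twice
  pairs * 2 / n^2
}

# Smallest radius eps with (#draws within eps) / (#draws) >= 1 - alpha
credible_radius_sketch <- function(c.star, cls.draw,
                                   dist_fn = vi_sketch, alpha = 0.05) {
  d <- apply(cls.draw, 1, dist_fn, c2 = c.star)  # one distance per draw
  k <- ceiling((1 - alpha) * length(d))
  eps <- sort(d)[k]
  list(radius = eps, dist = d, in.ball = d <= eps)
}

draws <- rbind(c(1, 1, 2, 2), c(2, 2, 1, 1), c(1, 1, 1, 1), c(1, 2, 3, 4))
credible_radius_sketch(c(1, 1, 2, 2), draws, alpha = 0.05)$radius  # 1
```

Passing dist_fn = binder_sketch switches the sketch to the Binder-based ball, mirroring the role of c.dist.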

The credible ball is summarized by its upper vertical, lower vertical, and horizontal bounds. These are defined, respectively, as the partitions in the credible ball with the fewest clusters that are most distant from c.star, the partitions with the most clusters that are most distant from c.star, and the partitions with the greatest distance from c.star.
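Given each draw's distance to c.star and the radius, the three bounds follow from the definitions above. A minimal sketch, assuming the search is restricted to the MCMC draws (an approximation not stated above); ball_bounds_sketch is a hypothetical name, ties are all kept, and draws that encode the same partition with different labels are not merged.

```r
# Hypothetical helper: horizontal and vertical bounds among the draws
# that lie inside the ball of radius eps around c.star.
ball_bounds_sketch <- function(cls.draw, d, eps) {
  k <- apply(cls.draw, 1, function(cl) length(unique(cl)))  # clusters
  inside <- which(d <= eps)
  # among candidate draws, keep those at maximal distance from c.star
  farthest <- function(idx) idx[d[idx] == max(d[idx])]
  horiz <- farthest(inside)
  upper <- farthest(inside[k[inside] == min(k[inside])])  # fewest clusters
  lower <- farthest(inside[k[inside] == max(k[inside])])  # most clusters
  list(
    c.horiz = cls.draw[horiz, , drop = FALSE], dist.horiz = d[horiz[1]],
    c.uppervert = cls.draw[upper, , drop = FALSE], dist.uppervert = d[upper[1]],
    c.lowervert = cls.draw[lower, , drop = FALSE], dist.lowervert = d[lower[1]]
  )
}

draws <- rbind(c(1, 1, 2, 2), c(2, 2, 1, 1), c(1, 1, 1, 1), c(1, 2, 3, 4))
b <- ball_bounds_sketch(draws, d = c(0, 0, 1, 1), eps = 1)
```

Here both the one-cluster and the four-cluster draw sit at the maximal distance, so both are horizontal bounds, while each is also a vertical bound.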

In plots, data points are colored according to cluster membership. For ncol(data)=1, the data points are plotted against the density (estimated via a call to density if not provided). For ncol(data)=2, the data points are plotted directly, and for ncol(data)>2, they are plotted in the space spanned by the first two principal components.
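The choice of plotting coordinates described above can be sketched as follows. plot_coords_sketch is a hypothetical helper; it assumes prcomp with its default centering and no scaling for the principal components, and a default density call for the one-column case, neither of which is confirmed as what the plot method does.

```r
# Hypothetical helper: 2-D coordinates following the rule above.
plot_coords_sketch <- function(data, dx = NULL) {
  data <- as.matrix(data)
  if (ncol(data) == 1) {
    if (is.null(dx)) {                   # estimate density if not given
      dens <- stats::density(data[, 1])
      dx <- stats::approx(dens$x, dens$y, xout = data[, 1])$y
    }
    cbind(x = data[, 1], density = dx)   # data points vs. density
  } else if (ncol(data) == 2) {
    data                                 # plot the two columns directly
  } else {
    stats::prcomp(data)$x[, 1:2]         # first two principal components
  }
}

set.seed(2)
X <- matrix(rnorm(300), ncol = 3)
xy <- plot_coords_sketch(X)
```

The result can then be drawn with plot(xy, col = cl), where cl holds the cluster labels of the rows.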

Value

c.star

a vector giving the clustering estimate for the length(c.star) data points.

c.horiz

A matrix of horizontal bounds of the credible ball, i.e. partitions in the credible ball with the greatest distance from c.star.

c.uppervert

A matrix of upper vertical bounds of the credible ball, i.e. partitions in the credible ball with the fewest clusters that are most distant from c.star.

c.lowervert

A matrix of lower vertical bounds of the credible ball, i.e. partitions in the credible ball with the most clusters that are most distant from c.star.

dist.horiz

The distance between c.star and the horizontal bounds.

dist.uppervert

The distance between c.star and the upper vertical bounds.

dist.lowervert

The distance between c.star and the lower vertical bounds.

Author(s)

Sara Wade, sara.wade@eng.cam.ac.uk

References

Wade, S. and Ghahramani, Z. (2015) Bayesian cluster analysis: Point estimation and credible balls. Submitted. arXiv:1505.03339.

See Also

minVI, minbinder.ext, maxpear, and medv to obtain a point estimate of clustering based on posterior MCMC samples; and plotpsm for a heat map of the posterior similarity matrix.

Examples

library(mcclust.ext)  # minVI, credibleball, plotpsm
library(mcclust)      # comp.psm

data(galaxy.fit)
x=data.frame(x=galaxy.fit$x)
data(galaxy.pred)
data(galaxy.draw)

# Find a representative partition of the posterior
psm=comp.psm(galaxy.draw)
galaxy.VI=minVI(psm,galaxy.draw,method="all",include.greedy=TRUE)
summary(galaxy.VI)
plot(galaxy.VI,data=x,dx=galaxy.fit$fx,xgrid=galaxy.pred$x,dxgrid=galaxy.pred$fx)

# Uncertainty in the partition estimate, using the first row of
# galaxy.VI$cl as c.star
galaxy.cb=credibleball(galaxy.VI$cl[1,],galaxy.draw)
summary(galaxy.cb)
plot(galaxy.cb,data=x,dx=galaxy.fit$fx,xgrid=galaxy.pred$x,dxgrid=galaxy.pred$fx)

# Compare with a heat map of the posterior similarity matrix
plotpsm(psm)

muschellij2/mcclust.ext documentation built on May 26, 2019, 9:36 a.m.