minbinder.ext: Minimize the posterior expected Binder's loss
In muschellij2/mcclust.ext: Point estimation and credible balls for Bayesian cluster analysis


Finds a representative partition of the posterior by minimizing the posterior expected Binder's loss.

minbinder.ext(psm, cls.draw = NULL,
method = c("avg", "comp", "draws", "laugreen", "greedy", "all"),
max.k = NULL, include.lg = FALSE, include.greedy = FALSE,
start.cl.lg = NULL, start.cl.greedy = NULL, tol = 0.001,
maxiter = NULL, l = NULL, suppress.comment = TRUE)

`psm`	a posterior similarity matrix, which can be obtained from MCMC samples of clusterings through a call to `comp.psm`.
`cls.draw`	a matrix of the MCMC samples of clusterings of the `ncol(cls.draw)` data points that have been used to compute `psm`. Note: `cls.draw` has to be provided if `method="draws"` or `"all"`.
`method`	the optimization method used. Should be one of `"avg"`, `"comp"`, `"draws"`, `"laugreen"`, `"greedy"` or `"all"`. Defaults to `"avg"`.
`max.k`	integer; if `method="avg"` or `"comp"`, the maximum number of clusters up to which the hierarchical clustering is cut. Defaults to `ceiling(nrow(psm)/4)`.
`include.lg`	logical, should method `"laugreen"` be included when `method="all"`? Defaults to FALSE.
`include.greedy`	logical, should method `"greedy"` be included when `method="all"`? Defaults to FALSE.
`start.cl.lg`	clustering used as starting point for `method="laugreen"`. If `NULL`, `start.cl = 1:nrow(psm)` is used.
`start.cl.greedy`	clustering used as starting point for `method="greedy"`. If `NULL`, `start.cl = 1:nrow(psm)` is used.
`tol`	convergence tolerance for `method="laugreen"`.
`maxiter`	integer, maximum number of iterations for `method="greedy"`. Defaults to `2*nrow(psm)`.
`l`	integer, specifies the number of local partitions considered at each iteration for `method="greedy"`. Defaults to `2*nrow(psm)`.
`suppress.comment`	logical, for `method="greedy"`, prints a description of the current state (iteration number, number of clusters, posterior expected loss) at each iteration if set to FALSE. Defaults to TRUE.
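
To make the relationship between `cls.draw` and `psm` concrete, here is a minimal base R sketch of how a posterior similarity matrix can be estimated from a matrix of draws. The helper `psm_sketch` and the toy input `toy.draw` are hypothetical names used only for illustration; in practice `comp.psm` should be used, and its internals may differ.

```r
# Hypothetical helper, not part of mcclust.ext: estimate the posterior
# similarity matrix from a matrix of draws
# (rows = MCMC samples, columns = data points).
psm_sketch <- function(cls.draw) {
  n <- ncol(cls.draw)
  psm <- matrix(0, n, n)
  for (m in seq_len(nrow(cls.draw))) {
    # outer(..., "==") is TRUE where points i and j share a label in draw m
    psm <- psm + outer(cls.draw[m, ], cls.draw[m, ], "==")
  }
  # Entry (i, j): proportion of draws in which i and j are clustered together
  psm / nrow(cls.draw)
}

# Toy input: 3 draws of a clustering of 4 data points
toy.draw <- rbind(c(1, 1, 2, 2),
                  c(1, 1, 1, 2),
                  c(1, 2, 2, 2))
toy.psm <- psm_sketch(toy.draw)
```

Each entry of `toy.psm` is the fraction of draws in which the two points share a cluster, so the matrix is symmetric with ones on the diagonal.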

This function extends minbinder by implementing the greedy search algorithm to minimize the posterior expected Binder's loss.

Binder's loss counts the number of disagreements in all possible pairs of data points. The value returned is the posterior expected N-invariant Binder's loss, which is defined by multiplying Binder's loss by 2 and dividing by N^2, where N is the sample size. It is so called because it depends on the sample size only through the proportion of data points in each cluster intersection.
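
The definition above can be illustrated with a short base R sketch. The helper `binder_loss` is a hypothetical name, not a function of the package, and it assumes both kinds of pairwise disagreement carry equal cost.

```r
# Hypothetical helper, not part of mcclust.ext: Binder's loss between two
# clusterings, assuming equal costs for both kinds of disagreement.
binder_loss <- function(c1, c2) {
  same1 <- outer(c1, c1, "==")   # TRUE where c1 puts i and j together
  same2 <- outer(c2, c2, "==")   # TRUE where c2 puts i and j together
  up <- upper.tri(same1)         # count each pair i < j once
  sum(same1[up] != same2[up])    # pairs on which the clusterings disagree
}

c1 <- c(1, 1, 2, 2)
c2 <- c(1, 1, 1, 2)
N <- length(c1)

B <- binder_loss(c1, c2)   # pairs (1,3), (2,3) and (3,4) disagree
B.inv <- 2 * B / N^2       # N-invariant version: times 2, divided by N^2
```

Here `B` is 3 and `B.inv` is 0.375.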

The function minbinder is called for optimization methods method="avg", "comp", "draws", and "laugreen".
Method "greedy" implements a greedy search algorithm: at each iteration, we consider the l closest ancestors or descendants of the current partition, using the N-invariant Binder's loss as the distance, and move in the direction of minimum posterior expected loss. We recommend trying different starting locations start.cl.greedy and different values of l, which controls the amount of local exploration. Depending on the starting location and l, the method can take some time to converge, so it is only included in method="all" if include.greedy=TRUE. If method="all", the starting location start.cl.greedy defaults to the best clustering found by the other methods. A description of the algorithm at every iteration is printed if suppress.comment=FALSE. If method="all", all minimization methods except "laugreen" and "greedy" are applied by default.
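
When comparing candidate starting locations, it can help to look at their posterior expected loss directly. The following base R sketch assumes the posterior expected N-invariant Binder's loss of a clustering is 2/N^2 times the sum over pairs i < j of the absolute difference between the co-clustering indicator and the corresponding entry of the posterior similarity matrix. The helper `expected_binder`, the toy matrix and the candidate names are all hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical helper, not part of mcclust.ext: posterior expected
# N-invariant Binder's loss of clustering `cl` given a similarity matrix.
expected_binder <- function(cl, psm) {
  N <- length(cl)
  up <- upper.tri(psm)              # each pair i < j once
  same <- outer(cl, cl, "==")       # co-clustering indicator under `cl`
  2 * sum(abs(same[up] - psm[up])) / N^2
}

# Toy posterior similarity matrix for 3 data points
toy.psm <- matrix(c(1,   0.9, 0.1,
                    0.9, 1,   0.2,
                    0.1, 0.2, 1), nrow = 3, byrow = TRUE)

# Candidate starting partitions to compare
candidates <- list(singletons  = 1:3,
                   one.cluster = c(1, 1, 1),
                   split       = c(1, 1, 2))
losses <- sapply(candidates, expected_binder, psm = toy.psm)
best.start <- names(which.min(losses))
```

A candidate with low expected loss, such as `candidates[[best.start]]`, could then be supplied as `start.cl.greedy`; this only chooses a starting point and does not replace the search itself.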

`cl`	clustering with minimal value of expected loss. If `method="all"` a matrix containing the clustering with the smallest value of the expected loss over all methods in the first row and the clusterings of the individual methods in the next rows.
`value`	value of posterior expected loss. A vector corresponding to the rows of `cl` if `method="all"`.
`method`	the optimization method used.
`iter.greedy`	if `method="greedy"`, or `method="all"` and `include.greedy=TRUE`, the number of iterations the method needed to converge.
`iter.lg`	if `method="laugreen"`, or `method="all"` and `include.lg=TRUE`, the number of iterations the method needed to converge.

Sara Wade, sara.wade@eng.cam.ac.uk

Binder, D.A. (1978) Bayesian cluster analysis, Biometrika 65, 31–38.

Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis 4, 367–391.

Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526–558.

Wade, S. and Ghahramani, Z. (2015) Bayesian cluster analysis: Point estimation and credible balls. Submitted. arXiv:1505.03339.

summary.c.estimate and plot.c.estimate to summarize and plot the resulting output from minVI or minbinder.ext; comp.psm to compute the posterior similarity matrix; maxpear, minVI, and medv for other point estimates of the clustering based on the posterior; and credibleball to compute a credible ball characterizing the uncertainty around the point estimate.

library(mcclust.ext)

# Plot the example data, colored by the true clustering
data(ex2.data)
x <- data.frame(ex2.data[, c(1, 2)])
cls.true <- ex2.data$cls.true
plot(x[, 1], x[, 2], xlab = "x1", ylab = "x2")
k <- max(cls.true)
# Cluster 1 keeps the default color; overplot clusters 2..k in color
for (l in 2:k) {
  points(x[cls.true == l, 1], x[cls.true == l, 2], col = l)
}

# Find representative partition of posterior
data(ex2.draw)
psm <- comp.psm(ex2.draw)
ex2.B <- minbinder.ext(psm, ex2.draw, method = "all", include.greedy = TRUE)
summary(ex2.B)
plot(ex2.B, data = x)

# Compare with VI
ex2.VI <- minVI(psm, ex2.draw, method = "all", include.greedy = TRUE)
summary(ex2.VI)
plot(ex2.VI, data = x)