minbinder | R Documentation
Based on a posterior similarity matrix of a sample of clusterings, minbinder finds the clustering that minimizes the posterior expectation of Binder's loss function, while binder computes the posterior expected loss for several provided clusterings.
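The posterior similarity matrix that both functions take as input can be estimated from a sample of clusterings as the fraction of clusterings in which two observations share a cluster. The following base-R sketch illustrates that idea; psm_sketch is a hypothetical helper written for illustration, not the package's comp.psm, and it assumes one clustering per row.

```r
# Hypothetical helper, not the package's comp.psm: estimates the posterior
# similarity matrix as the fraction of sampled clusterings (rows of cls)
# in which observations i and j share a cluster.
psm_sketch <- function(cls) {
  cls <- as.matrix(cls)
  n <- ncol(cls)
  psm <- matrix(0, nrow = n, ncol = n)
  for (s in seq_len(nrow(cls))) {
    # TRUE where observations i and j are in the same cluster in draw s
    psm <- psm + outer(cls[s, ], cls[s, ], "==")
  }
  psm / nrow(cls)
}

# Toy sample: three clusterings of four observations
draws <- rbind(c(1, 1, 2, 2),
               c(1, 1, 1, 2),
               c(1, 1, 2, 2))
round(psm_sketch(draws), 2)
```

Observations 1 and 2 are together in every draw, so their entry is 1; observations 3 and 4 are together in two of three draws, so their entry is 2/3.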
minbinder(psm, cls.draw = NULL, method = c("avg", "comp", "draws", "laugreen", "all"),
          max.k = NULL, include.lg = FALSE, start.cl = NULL, tol = 0.001)
binder(cls, psm)
laugreen(psm, start.cl, tol = 0.001)
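psm | a posterior similarity matrix, usually obtained from a call to comp.psm.
cls, cls.draw | a matrix in which every row corresponds to a clustering of the ncol(cls) objects. cls.draw holds the sampled clusterings that were used to compute psm, while cls can be any clusterings.
method | the minimization method used. Should be one of "avg", "comp", "draws", "laugreen" or "all".
max.k | integer, if method="avg" or "comp", the maximum number of clusters up to which the hierarchical clustering is cut.
include.lg | logical, should method "laugreen" be included when method="all"?
start.cl | clustering used as starting point for method="laugreen".
tol | convergence tolerance for method="laugreen".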
The posterior expected loss is the sum, over all pairs of observations i and j, of the absolute difference between the indicator that i and j are clustered together and the posterior probability that they are in one cluster.
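Written out, the expected loss of a clustering c is the sum over pairs i < j of |I(c_i = c_j) - psm[i, j]|. The following base-R sketch computes it for each row of a clustering matrix. binder_sketch is a hypothetical helper assumed to match what binder returns; it sums over unordered pairs i < j, and summing over all ordered pairs instead would only double every value without changing which clustering is best.

```r
# Hypothetical helper, assumed to mirror binder(): the posterior expected
# Binder loss of each clustering (row of cls), summing
# |I(c_i == c_j) - psm[i, j]| over pairs i < j.
binder_sketch <- function(cls, psm) {
  cls <- matrix(cls, ncol = ncol(psm))  # a single clustering becomes one row
  low <- lower.tri(psm)                 # selects each unordered pair exactly once
  apply(cls, 1, function(cl) {
    together <- outer(cl, cl, "==")     # indicator that i and j share a cluster
    sum(abs(together - psm)[low])
  })
}

psm <- matrix(c(1.0, 0.9, 0.1,
                0.9, 1.0, 0.2,
                0.1, 0.2, 1.0), nrow = 3, byrow = TRUE)
candidates <- rbind(c(1, 1, 2),   # {1,2} {3}
                    c(1, 2, 3),   # all singletons
                    c(1, 1, 1))   # one cluster
binder_sketch(candidates, psm)    # 0.4 1.2 1.8
```

The first candidate pays only for the residual uncertainty in each pair, so it has the smallest expected loss.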
For method="avg" and "comp", 1-psm is used as a distance matrix for hierarchical clustering with average or complete linkage, respectively. The hierarchical clustering is cut for the cluster sizes 1:max.k, and the posterior expected loss is computed for these clusterings.
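The search over cuts of the hierarchical clustering can be sketched in base R with hclust and cutree. hclust_search_sketch and expected_binder are hypothetical names used for illustration only; this is not the package source, and it assumes max.k is supplied explicitly rather than defaulted.

```r
# Hypothetical sketch of the "avg"/"comp" search (not the package source):
# cluster hierarchically on the distance 1 - psm, cut the tree into
# k = 1, ..., max.k clusters, and keep the cut with the smallest
# posterior expected Binder loss.
expected_binder <- function(cl, psm) {
  sum(abs(outer(cl, cl, "==") - psm)[lower.tri(psm)])
}

hclust_search_sketch <- function(psm, max.k, linkage = c("average", "complete")) {
  linkage <- match.arg(linkage)
  tree <- hclust(as.dist(1 - psm), method = linkage)
  ks <- seq_len(min(max.k, nrow(psm)))
  cuts <- sapply(ks, function(k) cutree(tree, k = k))  # one cut per column
  losses <- apply(cuts, 2, expected_binder, psm = psm)
  best <- which.min(losses)
  list(cl = cuts[, best], value = losses[best], k = ks[best])
}

psm <- matrix(c(1.00, 0.90, 0.80, 0.10,
                0.90, 1.00, 0.85, 0.20,
                0.80, 0.85, 1.00, 0.10,
                0.10, 0.20, 0.10, 1.00), nrow = 4, byrow = TRUE)
res <- hclust_search_sketch(psm, max.k = 3, linkage = "average")
res$cl     # observations 1-3 in one cluster, observation 4 alone
res$value  # 0.85
```

Only max.k candidate clusterings are evaluated, which is why this method is fast but may miss the global minimum.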
Method "draws" simply computes the posterior expected loss for each row of cls.draw and takes the minimum.
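A base-R sketch of the "draws" method follows. draws_search_sketch is a hypothetical name, not a package function; the sketch assumes cls.draw is a matrix with one sampled clustering per row and that psm was estimated from those same draws.

```r
# Hypothetical sketch of the "draws" method: evaluate the posterior
# expected Binder loss of every sampled clustering (row of cls.draw) and
# return the one with the smallest value.
draws_search_sketch <- function(cls.draw, psm) {
  low <- lower.tri(psm)
  losses <- apply(cls.draw, 1, function(cl) {
    sum(abs(outer(cl, cl, "==") - psm)[low])
  })
  best <- which.min(losses)        # first draw wins if several tie
  list(cl = cls.draw[best, ], value = losses[best], draw = best)
}

draws <- rbind(c(1, 1, 2, 2),
               c(1, 1, 1, 2),
               c(1, 1, 2, 2))
# Posterior similarity matrix estimated from the same draws
psm <- Reduce(`+`, lapply(seq_len(nrow(draws)),
                          function(s) outer(draws[s, ], draws[s, ], "=="))) / nrow(draws)
res <- draws_search_sketch(draws, psm)
res$draw   # 1
res$value  # 1
```

The result is always one of the sampled clusterings, so its quality depends on how well the sample covers the high-posterior region.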
Method "laugreen" implements the algorithm of Lau and Green (2007), which is based on binary integer programming. Since the method can take some time to converge, it is only used if explicitly requested with method="laugreen", or with method="all" and include.lg=TRUE.
If method="all", all minimization methods except "laugreen" are applied; "laugreen" is added only when include.lg=TRUE.
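The combination step for method="all" can be sketched as running each search and keeping the candidate with the smallest expected loss. all_methods_sketch is a hypothetical name for illustration; it omits "laugreen" (as with include.lg=FALSE) and is not the package source.

```r
# Hypothetical sketch of method = "all" without "laugreen": run the
# average-linkage, complete-linkage and draws searches, then keep whichever
# candidate has the smallest posterior expected Binder loss.
all_methods_sketch <- function(psm, cls.draw, max.k) {
  low <- lower.tri(psm)
  loss <- function(cl) sum(abs(outer(cl, cl, "==") - psm)[low])
  best_of <- function(cands) {               # cands: one clustering per row
    v <- apply(cands, 1, loss)
    list(cl = cands[which.min(v), ], value = min(v))
  }
  cuts_for <- function(linkage) {            # cuts at k = 1, ..., max.k, one per row
    tree <- hclust(as.dist(1 - psm), method = linkage)
    t(sapply(seq_len(min(max.k, nrow(psm))), function(k) cutree(tree, k = k)))
  }
  res <- list(avg = best_of(cuts_for("average")),
              comp = best_of(cuts_for("complete")),
              draws = best_of(as.matrix(cls.draw)))
  values <- sapply(res, function(r) r$value)  # best loss found by each method
  winner <- names(which.min(values))
  list(cl = res[[winner]]$cl, value = res[[winner]]$value,
       method = winner, values = values)
}

draws <- rbind(c(1, 1, 2, 2),
               c(1, 1, 1, 2),
               c(1, 1, 2, 2))
psm <- Reduce(`+`, lapply(seq_len(nrow(draws)),
                          function(s) outer(draws[s, ], draws[s, ], "=="))) / nrow(draws)
res <- all_methods_sketch(psm, draws, max.k = 3)
res$values   # best expected loss per method
```

Returning the per-method values alongside the winner makes it easy to see whether the methods agree.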
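cl | clustering with minimal value of expected loss. If method="all", a matrix of clusterings, one per row.
value | value of posterior expected loss. If method="all", a vector corresponding to the rows of cl.
method | the minimization method used.
iter.lg | if "laugreen" was used, the number of iterations it needed to converge.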
Arno Fritsch, arno.fritsch@tu-dortmund.de
Binder, D.A. (1978) Bayesian cluster analysis, Biometrika 65, 31–38.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526–558.
comp.psm for computing the posterior similarity matrix; maxpear, medv and relabel for other possibilities for processing a sample of clusterings; lp for the linear programming.
library(mcclust)
data(cls.draw2)  # sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8, each = 50)  # the true grouping of the observations
psm2 <- comp.psm(cls.draw2)
mbind2 <- minbinder(psm2)
table(mbind2$cl, tru.class)
# Does hierarchical clustering with Ward's method lead
# to a lower value of Binder's loss?
hclust.ward <- hclust(as.dist(1 - psm2), method = "ward.D")
cls.ward <- t(apply(matrix(1:20), 1, function(k) cutree(hclust.ward, k = k)))
ward2 <- binder(cls.ward, psm2)
min(ward2) < mbind2$value
# Method laugreen is applied to 40 randomly selected observations
ind <- sample(1:400, 40)
mbind.lg <- minbinder(psm2[ind, ind], cls.draw2[, ind], method = "all", include.lg = TRUE)
mbind.lg$value