Compute the agreement between (ensembles of) partitions or hierarchies.
cl_agreement(x, y = NULL, method = "euclidean", ...)

x 
an ensemble of partitions or hierarchies and dissimilarities,
or something coercible to that (see 
y 

method 
a character string specifying one of the built-in
methods for computing agreement, or a function to be taken as
a user-defined method. If a character string, its lower-cased
version is matched against the lower-cased names of the available
built-in methods using 
... 
further arguments to be passed to methods. 
If y is given, its components must be of the same kind as those of x (i.e., components must either all be partitions, or all be hierarchies or dissimilarities).
If all components are partitions, the following built-in methods for measuring agreement between two partitions with respective membership matrices u and v (brought to a common number of columns) are available:
"euclidean"
1 - d / m, where d is the Euclidean dissimilarity of the memberships, i.e., the square root of the minimal sum of the squared differences of u and all column permutations of v, and m is an upper bound for the maximal Euclidean dissimilarity. See Dimitriadou, Weingessel and Hornik (2002).
"manhattan"
1 - d / m, where d is the Manhattan dissimilarity of the memberships, i.e., the minimal sum of the absolute differences of u and all column permutations of v, and m is an upper bound for the maximal Manhattan dissimilarity.
"Rand"
the Rand index (the rate of distinct pairs of objects both in the same class or both in different classes in both partitions), see Rand (1971) or Gordon (1999), page 198. For soft partitions, (currently) the Rand index of the corresponding nearest hard partitions is used.
"cRand"
the Rand index corrected for agreement by chance, see Hubert and Arabie (1985) or Gordon (1999), page 198. Can only be used for hard partitions.
"NMI"
Normalized Mutual Information, see Strehl and Ghosh (2002). For soft partitions, (currently) the NMI of the corresponding nearest hard partitions is used.
"KP"
the Katz-Powell index, i.e., the product-moment correlation coefficient between the elements of the co-membership matrices C(u) = u u' and C(v), respectively, see Katz and Powell (1953). For soft partitions, (currently) the Katz-Powell index of the corresponding nearest hard partitions is used. (Note that for hard partitions, the (i,j) entry of C(u) is one iff objects i and j are in the same class.)
"angle"
the maximal cosine of the angle between the elements of u and all column permutations of v.
"diag"
the maximal co-classification rate, i.e., the maximal rate of objects with the same class ids in both partitions after arbitrarily permuting the ids.
"FM"
the index of Fowlkes and Mallows (1983), i.e., the ratio N_xy / sqrt(N_x N_y) of the number N_xy of distinct pairs of objects in the same class in both partitions and the geometric mean of the numbers N_x and N_y of distinct pairs of objects in the same class in partition x and partition y, respectively. For soft partitions, (currently) the Fowlkes-Mallows index of the corresponding nearest hard partitions is used.
"Jaccard"
the Jaccard index, i.e., the ratio of the numbers of distinct pairs of objects in the same class in both partitions and in at least one partition, respectively. For soft partitions, (currently) the Jaccard index of the corresponding nearest hard partitions is used.
"purity"
the purity of the classes of x with respect to those of y, i.e., \sum_j \max_i n_{ij} / n, where n_{ij} is the joint frequency of objects in class i for x and in class j for y, and n is the total number of objects.
"PS"
Prediction Strength, see Tibshirani and Walter (2005): the minimum, over all classes j of y, of the maximal rate of objects in the same class for x and in class j for y.
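Several of the indices above depend only on the contingency table of the two partitions. The following is a minimal base R sketch of those formulas for hard partitions, not the package's own implementation; the class id vectors x and y and the helper name pair_indices are made up for illustration.

```r
# Hypothetical class id vectors of two hard partitions of six objects.
x <- c(1, 1, 1, 2, 2, 2)
y <- c(1, 1, 2, 2, 3, 3)

# Illustrative helper (not part of any package): pair-counting indices
# and purity computed from the contingency table of x and y.
pair_indices <- function(x, y) {
  tab <- unclass(table(x, y))            # joint frequencies n_ij
  n <- sum(tab)                          # total number of objects
  n_pairs <- choose(n, 2)                # all distinct pairs of objects
  n_xy <- sum(choose(tab, 2))            # pairs in the same class in both
  n_x <- sum(choose(rowSums(tab), 2))    # pairs in the same class in x
  n_y <- sum(choose(colSums(tab), 2))    # pairs in the same class in y
  expected <- n_x * n_y / n_pairs        # chance level used by cRand
  c(Rand = (n_pairs + 2 * n_xy - n_x - n_y) / n_pairs,
    cRand = (n_xy - expected) / ((n_x + n_y) / 2 - expected),
    FM = n_xy / sqrt(n_x * n_y),
    Jaccard = n_xy / (n_x + n_y - n_xy),
    purity = sum(apply(tab, 2, max)) / n) # sum_j max_i n_ij / n
}

res <- pair_indices(x, y)
round(res, 4)
```

Soft partitions are not handled here; following the text, one would first replace them by their nearest hard partitions.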
If all components are hierarchies, available built-in methods for measuring agreement between two hierarchies with respective ultrametrics u and v are as follows.
"euclidean"
1 / (1 + d), where d is the Euclidean dissimilarity of the ultrametrics (i.e., the square root of the sum of the squared differences of u and v).
"manhattan"
1 / (1 + d), where d is the Manhattan dissimilarity of the ultrametrics (i.e., the sum of the absolute differences of u and v).
"cophenetic"
The cophenetic correlation coefficient. (I.e., the product-moment correlation of the ultrametrics.)
"angle"
the cosine of the angle between the ultrametrics.
"gamma"
1 - d, where d is the rate of inversions between the associated ultrametrics (i.e., the rate of pairs (i,j) and (k,l) for which u_{ij} < u_{kl} and v_{ij} > v_{kl}). (This agreement measure is a linear transformation of Kruskal's gamma.)
The measures based on ultrametrics also allow computing agreement with "raw" dissimilarities on the underlying objects (R objects inheriting from class "dist").
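The ultrametric-based formulas can be tried out in base R, where cophenetic() returns the ultrametric of an hclust tree as a "dist" object. This is only a sketch under stated assumptions: the helper name hierarchy_agreements is hypothetical, and counting each pair of objects once (the lower triangle stored in a "dist" object) is an assumption about how the sums are taken, so values need not match the package exactly. The "gamma" method is left out.

```r
# Two hierarchies on the same objects, and their ultrametrics.
d <- dist(USArrests)
u <- cophenetic(hclust(d, "single"))
v <- cophenetic(hclust(d, "complete"))

# Illustrative helper (not part of any package): agreement formulas
# for two ultrametrics or dissimilarities given as "dist" objects.
hierarchy_agreements <- function(u, v) {
  u <- as.numeric(u)   # one entry per pair of objects
  v <- as.numeric(v)
  c(euclidean = 1 / (1 + sqrt(sum((u - v)^2))),
    manhattan = 1 / (1 + sum(abs(u - v))),
    cophenetic = cor(u, v),
    angle = sum(u * v) / sqrt(sum(u^2) * sum(v^2)))
}

res_uv <- hierarchy_agreements(u, v)
# Agreement of a hierarchy with the "raw" dissimilarities it was built from.
res_ud <- hierarchy_agreements(u, d)
res_uv
res_ud
```

Because "euclidean" and "manhattan" are not scale-free, they shrink towards zero when the heights of the two ultrametrics differ a lot, whereas "cophenetic" and "angle" are unaffected by rescaling either ultrametric by a positive constant.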
If a user-defined agreement method is to be employed, it must be a function taking two clusterings as its arguments.
Symmetric agreement objects of class "cl_agreement" are implemented as symmetric proximity objects with self-proximities identical to one, and inherit from class "cl_proximity". They can be coerced to dense square matrices using as.matrix. It is possible to use two-index matrix-style subscripting for such objects; unless this uses identical row and column indices, this results in a (non-symmetric agreement) object of class "cl_cross_agreement".
If y is NULL, an object of class "cl_agreement" containing the agreements between all pairs of components of x. Otherwise, an object of class "cl_cross_agreement" with the agreements between the components of x and the components of y.
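The shape of the two kinds of result can be mimicked with plain matrices. The sketch below is an illustration only and does not use or reproduce the package's classes: agreement_matrix and rand_agreement are hypothetical names, clusterings are represented as class id vectors, and the method argument stands in for a user-defined function of two clusterings.

```r
# Hypothetical user-defined method: takes two clusterings (here, class
# id vectors of hard partitions) and returns their Rand index.
rand_agreement <- function(a, b) {
  tab <- unclass(table(a, b))
  n_pairs <- choose(sum(tab), 2)
  (n_pairs + 2 * sum(choose(tab, 2)) -
     sum(choose(rowSums(tab), 2)) - sum(choose(colSums(tab), 2))) / n_pairs
}

# Illustrative helper: all pairs of x if y is NULL, otherwise the
# components of x (rows) against the components of y (columns).
agreement_matrix <- function(x, y = NULL, method) {
  if (is.null(y)) y <- x
  out <- matrix(NA_real_, length(x), length(y),
                dimnames = list(names(x), names(y)))
  for (i in seq_along(x))
    for (j in seq_along(y))
      out[i, j] <- method(x[[i]], y[[j]])
  out
}

# A made-up ensemble of three partitions of four objects.
ens <- list(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2), c = c(1, 1, 2, 2))
A <- agreement_matrix(ens, method = rand_agreement)          # symmetric, unit diagonal
C <- agreement_matrix(ens[1:2], ens[3], method = rand_agreement)  # cross-agreements
A
C
```

Subscripting the all-pairs matrix with different row and column indices, as in A[1:2, 3, drop = FALSE], gives the same numbers as the cross-agreement call.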
E. Dimitriadou, A. Weingessel and K. Hornik (2002). A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16, 901–912. doi: 10.1142/S0218001402002052.
E. B. Fowlkes and C. L. Mallows (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553–569. doi: 10.1080/01621459.1983.10478008.
A. D. Gordon (1999). Classification (2nd edition). Boca Raton, FL: Chapman & Hall/CRC.
L. Hubert and P. Arabie (1985). Comparing partitions. Journal of Classification, 2, 193–218. doi: 10.1007/bf01908075.
W. M. Rand (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850. doi: 10.2307/2284239.
L. Katz and J. H. Powell (1953). A proposed index of the conformity of one sociometric measurement to another. Psychometrika, 18, 249–256. doi: 10.1007/BF02289063.
A. Strehl and J. Ghosh (2002). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617. http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf.
R. Tibshirani and G. Walter (2005). Cluster validation by Prediction Strength. Journal of Computational and Graphical Statistics, 14/3, 511–528. doi: 10.1198/106186005X59243.
cl_dissimilarity; classAgreement in package e1071.
library("clue")

## An ensemble of partitions.
data("CKME")
pens <- CKME[1 : 20] # for saving precious time ...
summary(c(cl_agreement(pens)))
summary(c(cl_agreement(pens, method = "Rand")))
summary(c(cl_agreement(pens, method = "diag")))
cl_agreement(pens[1:5], pens[6:7], method = "NMI")
## Equivalently, using subscripting.
cl_agreement(pens, method = "NMI")[1:5, 6:7]

## An ensemble of hierarchies.
d <- dist(USArrests)
hclust_methods <-
    c("ward", "single", "complete", "average", "mcquitty")
hclust_results <- lapply(hclust_methods, function(m) hclust(d, m))
names(hclust_results) <- hclust_methods
hens <- cl_ensemble(list = hclust_results)
summary(c(cl_agreement(hens)))
## Note that the Euclidean agreements are *very* small.
## This is because the ultrametrics differ substantially in height:
u <- lapply(hens, cl_ultrametric)
round(sapply(u, max), 3)
## Rescaling the ultrametrics to [0, 1] gives:
u <- lapply(u, function(x) (x - min(x)) / (max(x) - min(x)))
shens <- cl_ensemble(list = lapply(u, as.cl_dendrogram))
summary(c(cl_agreement(shens)))
## Au contraire ...
summary(c(cl_agreement(hens, method = "cophenetic")))
cl_agreement(hens[1:3], hens[4:5], method = "gamma")

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3093 0.3097 0.4229 0.5741 0.9368 1.0000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7191 0.7381 0.7527 0.8369 0.9949 1.0000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5230 0.5235 0.6670 0.7319 0.9960 1.0000
Cross-agreements using normalized mutual information:
[,1] [,2]
[1,] 0.5992095 1.0000000
[2,] 0.5992095 1.0000000
[3,] 1.0000000 0.5992095
[4,] 0.9405846 0.5998537
[5,] 1.0000000 0.5992095
Cross-agreements using normalized mutual information:
[,1] [,2]
[1,] 0.5992095 1.0000000
[2,] 0.5992095 1.0000000
[3,] 1.0000000 0.5992095
[4,] 0.9405846 0.5998537
[5,] 1.0000000 0.5992095
The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.910e-05 2.086e-05 2.122e-04 3.399e-04 3.091e-04 1.971e-03
ward single complete average mcquitty
2177.882 38.528 293.623 152.314 173.112
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.07168 0.09129 0.18230 0.23756 0.35971 0.53318
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4834 0.5694 0.9881 0.8146 0.9953 0.9986
Cross-agreements using rate of inversions:
           average  mcquitty
ward     0.9762118 0.9774923
single   0.8807656 0.8846339
complete 0.9903135 0.9865946