partitionComparison-package: R Documentation
Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The measures fall into three classes: pair comparison (containing the well-known Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) <doi:10.1007/s00357-006-0017-z> and Meila M (2007) <doi:10.1016/j.jmva.2006.11.013>. Partitions are represented by vectors of class labels, which allows a straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.
This package provides a large collection of measures to compare two partitions. Some survey articles for these measures are cited below; the seminal paper for each individual measure is given with the corresponding function definition.
Most functionality is implemented as S4 classes and methods, so the package can easily be adapted to special needs and specifications.
The main class is Partition, which merely wraps an atomic vector of length n for storing the class label of each object. The computation of all measures is designed to work on vectors of class labels.
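To illustrate what working on vectors of class labels means for a pair comparison measure, the following is a minimal base R sketch of the Rand index computed by pair counting. The function name rand_index_sketch is hypothetical and is not part of this package; the package's own implementation may differ.

```r
# Hypothetical helper (not part of partitionComparison): Rand index by pair counting.
# p and q are label vectors of equal length; element i is the class label of object i.
rand_index_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  # n x n logical matrices: TRUE where objects i and j share a label
  same_p <- outer(p, p, "==")
  same_q <- outer(q, q, "==")
  # Consider every unordered pair (i < j) exactly once
  pairs <- upper.tri(same_p)
  # Fraction of pairs on which both partitions agree
  # (together in both, or separated in both)
  mean(same_p[pairs] == same_q[pairs])
}

rand_index_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
# [1] 0.6
```

Of the 10 object pairs in this small example, 2 are grouped together in both partitions and 4 are separated in both, which gives 6/10.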
All partition comparison methods can be called in the same way: <measure method>(p, q), with p and q being the two partitions (as Partition instances).
Often one does not want to explicitly convert a vector of class labels (e.g. the output of another package's function or algorithm) into a Partition instance before using measures from this package. For convenience, the function registerPartitionVectorSignatures dynamically creates versions of all measures that work directly with plain R vectors.
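The general S4 mechanism behind such vector versions can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the class LabelPartition and the generic jaccardSketch are hypothetical stand-ins, and how registerPartitionVectorSignatures works internally is not shown here.

```r
library(methods)

# Hypothetical wrapper class, standing in for a class that wraps a label vector
setClass("LabelPartition", contains = "numeric")

# Hypothetical measure generic with the calling convention <measure>(p, q)
setGeneric("jaccardSketch", function(p, q) standardGeneric("jaccardSketch"))

# Method for wrapped partitions: Jaccard index by pair counting
setMethod("jaccardSketch",
          signature(p = "LabelPartition", q = "LabelPartition"),
          function(p, q) {
            same_p <- outer(p@.Data, p@.Data, "==")
            same_q <- outer(q@.Data, q@.Data, "==")
            pairs <- upper.tri(same_p)
            # Pairs grouped together in both partitions
            n11 <- sum(same_p[pairs] & same_q[pairs])
            # Pairs grouped together in at least one partition
            # (zero if both partitions consist of singletons only; not handled here)
            n_any <- sum(same_p[pairs] | same_q[pairs])
            n11 / n_any
          })

# Additional method for plain vectors: wrap the inputs, then dispatch again
setMethod("jaccardSketch",
          signature(p = "numeric", q = "numeric"),
          function(p, q) {
            jaccardSketch(new("LabelPartition", p), new("LabelPartition", q))
          })

# Both calls give the same result
jaccardSketch(new("LabelPartition", c(0, 0, 0, 1, 1)),
              new("LabelPartition", c(0, 0, 1, 1, 1)))
jaccardSketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
```

Because the method for the wrapper class is the more specific one, wrapped inputs never reach the vector method, so the second method only acts as a thin convenience layer.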
Maintainer: Fabian Ball mail@fabian-ball.de [copyright holder, contributor]
Other contributors:
Andreas Geyer-Schulz andreas.geyer-schulz@kit.edu [copyright holder]
Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006). doi:10.1007/s00357-006-0017-z
Meila M (2007). doi:10.1016/j.jmva.2006.11.013
Useful links:
Report bugs at https://github.com/KIT-IISM-EM/partitionComparison/issues
# Generate some data
set.seed(42)
data <- cbind(x=c(rnorm(50), rnorm(30, mean=5)), y=c(rnorm(50), rnorm(30, mean=5)))
# Run k-means with two/three centers
data.km2 <- kmeans(data, 2)
data.km3 <- kmeans(data, 3)
# Load this library
library(partitionComparison)
# Register the measures to take ANY input
registerPartitionVectorSignatures(environment())
# Compare the clusters
randIndex(data.km2$cluster, data.km3$cluster)
# [1] 0.8101266