mclustcomp: Measures for Comparing Clusterings

View source: R/mclustcomp.R

Description

Given two partitions or clusterings C_1 and C_2, mclustcomp returns the comparison scores for a set of designated methods. The two label vectors must be of the same length and of either numeric or factor type. Currently the methods fall into 3 classes according to the methodological philosophy behind each; see below for the taxonomy.

Usage

mclustcomp(x, y, types = "all", tversky.param = list())

Arguments

x, y

vectors of clustering labels

types

"all" to return scores for every available measure; alternatively, a single score name or a vector of score names can be supplied. See the method tables below for details.

tversky.param

a list of parameters for the Tversky index: alpha and beta are weight parameters, and sym is a logical where FALSE stands for the original asymmetric method and TRUE for a revised variant that symmetrizes the score. The default is (alpha, beta) = (1, 1).

Value

a data frame with a column of score types and a column of the corresponding scores.

Category 1. Counting Pairs

TYPE FULL NAME
'adjrand' Adjusted Rand index.
'chisq' Chi-Squared Coefficient.
'fmi' Fowlkes-Mallows index.
'jaccard' Jaccard index.
'mirkin' Mirkin Metric, or Equivalence Mismatch Distance.
'overlap' Overlap Coefficient, or Szymkiewicz-Simpson coefficient.
'pd' Partition Difference.
'rand' Rand Index.
'sdc' Sørensen–Dice Coefficient.
'smc' Simple Matching Coefficient.
'tanimoto' Tanimoto index.
'tversky' Tversky index.
'wallace1' Wallace Criterion Type 1.
'wallace2' Wallace Criterion Type 2.

Note that the Tanimoto coefficient and Dice's coefficient are special cases of the Tversky index with (alpha, beta) = (1, 1) and (0.5, 0.5), respectively.
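The counting-pairs philosophy can be sketched in base R from the standard definitions. The helper and score functions below are illustrative re-implementations, not the package's own (compiled) code: every pair of observations is classified by whether the two labelings group it together or apart.

```r
# Sketch of the counting-pairs scores (illustrative, not mclustcomp's code).
pair_counts <- function(x, y) {
  ut     <- upper.tri(matrix(0, length(x), length(x)))
  same_x <- outer(x, x, "==")[ut]   # pair grouped together under x?
  same_y <- outer(y, y, "==")[ut]   # pair grouped together under y?
  c(a = sum(same_x & same_y),       # together in both
    b = sum(same_x & !same_y),      # together in x only
    c = sum(!same_x & same_y),      # together in y only
    d = sum(!same_x & !same_y))     # apart in both
}

# Rand index: fraction of pairs on which the two labelings agree.
rand_index <- function(x, y) {
  p <- pair_counts(x, y)
  unname((p["a"] + p["d"]) / sum(p))
}

# Tversky index: (alpha, beta) = (1, 1) recovers Tanimoto/Jaccard,
# and (0.5, 0.5) recovers the Sorensen-Dice coefficient.
tversky_index <- function(x, y, alpha = 1, beta = 1) {
  p <- pair_counts(x, y)
  unname(p["a"] / (p["a"] + alpha * p["b"] + beta * p["c"]))
}
```

With identical labelings every pair agrees, so rand_index(x, x) and tversky_index(x, x) both equal 1.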

Category 2. Set Overlaps/Matching

TYPE FULL NAME
'f' F-Measure.
'mhm' Meila-Heckerman Measure.
'mmm' Maximum-Match Measure.
'vdm' Van Dongen Measure.
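As one example of the set-matching philosophy, the Van Dongen measure compares each cluster in one labeling with its best-overlapping cluster in the other. The base-R sketch below follows the standard formula 2n − Σ_i max_j n_ij − Σ_j max_i n_ij and is illustrative only, not the package's implementation:

```r
# Van Dongen measure from the contingency table of the two labelings:
# 2n minus the best-match overlaps in each direction (0 means identical).
van_dongen <- function(x, y) {
  n   <- length(x)
  tab <- table(x, y)                       # cluster-overlap counts n_ij
  2 * n - sum(apply(tab, 1, max)) - sum(apply(tab, 2, max))
}
```

Identical clusterings give 0, and the score grows as the best matches degrade.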

Category 3. Information Theory

TYPE FULL NAME
'jent' Joint Entropy.
'mi' Mutual Information.
'nmi1' Normalized Mutual Information by Strehl and Ghosh.
'nmi2' Normalized Mutual Information by Fred and Jain.
'nmi3' Normalized Mutual Information by Danon et al.
'nvi' Normalized Variation of Information.
'vi' Variation of Information.
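The information-theoretic scores all derive from the entropies of the empirical label distributions. A base-R sketch from the standard definitions (natural logarithm; illustrative only, not the package's implementation):

```r
# Shannon entropy of a probability vector (natural log), dropping zeros.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Joint entropy, mutual information, and variation of information of
# two label vectors, from their empirical joint distribution:
# I(X;Y) = H(X) + H(Y) - H(X,Y),  VI = H(X,Y) - I(X;Y).
info_scores <- function(x, y) {
  pxy <- table(x, y) / length(x)   # joint label distribution
  mi  <- entropy(rowSums(pxy)) + entropy(colSums(pxy)) - entropy(pxy)
  c(jent = entropy(pxy), mi = mi, vi = entropy(pxy) - mi)
}
```

For identical labelings the variation of information is 0 and the mutual information equals the entropy of the labeling itself.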

References

Strehl and Ghosh (2003)

Meilă (2007)

Goos et al. (2003)

Wagner and Wagner (2007)

Albatineh et al. (2006)

Mirkin (2001)

Rand (1971)

Kuncheva and Hadjitodorov (2004)

Fowlkes and Mallows (1983)

van Dongen (2000)

Jaccard (1912)

Li et al. (2010)

Larsen and Aone (1999)

Meilă and Heckerman (2001)

Cover and Thomas (2006)

Fred and Jain (2003)

Wallace (1983)

Simpson (1943)

Dice (1945)

Segaran (2007)

Tversky (1977)

Danon et al. (2005)

Lancichinetti et al. (2009)

Examples

## example 1. compare two identical clusterings
x = sample(1:5,20,replace=TRUE) # labels from 1 to 5, 20 elements
y = x                           # set two labels x and y equal
mclustcomp(x,y)                 # show all results

## example 2. selection of a few methods
z = sample(1:4,20,replace=TRUE)           # generate a non-trivial clustering
cmethods = c("jaccard","tanimoto","rand") # select 3 methods
mclustcomp(x,z,types=cmethods)            # test with the selected scores

## example 3. tversky.param
tparam = list()                           # create an empty list
tparam$alpha = 2
tparam$beta  = 3
tparam$sym   = TRUE
mclustcomp(x,z,types="tversky")           # default (alpha,beta)=(1,1) is the Tanimoto case
mclustcomp(x,z,types="tversky",tversky.param=tparam)



kisungyou/mclustcomp documentation built on Feb. 9, 2023, 8:50 p.m.