mclustcomp: Measures for Comparing Clusterings

View source: R/mclustcomp.R

Description

Given two partitions or clusterings C_1 and C_2, mclustcomp returns the comparison scores for a set of designated methods. The two label vectors must be of the same length and of either numeric or factor type. Currently the methods fall into 3 classes according to the methodological philosophy behind each; see below for the taxonomy.

Usage

mclustcomp(x, y, types = "all", tversky.param = list())

Arguments

x, y

vectors of clustering labels

types

"all" to return scores for every available measure; alternatively, a single score name or a vector of score names can be supplied. See the method tables below for details.

tversky.param

a list of parameters for the Tversky index: alpha and beta are weight parameters, and sym is a logical where FALSE stands for the original asymmetric method and TRUE for a revised variant that symmetrizes the score. The default is (alpha, beta) = (1, 1).

Value

a data frame with a column of score types and a column of the corresponding scores.

Category 1. Counting Pairs

TYPE FULL NAME
'adjrand' Adjusted Rand index.
'chisq' Chi-Squared Coefficient.
'fmi' Fowlkes-Mallows index.
'jaccard' Jaccard index.
'mirkin' Mirkin Metric, or Equivalence Mismatch Distance.
'overlap' Overlap Coefficient, or Szymkiewicz-Simpson coefficient.
'pd' Partition Difference.
'rand' Rand Index.
'sdc' Sørensen–Dice Coefficient.
'smc' Simple Matching Coefficient.
'tanimoto' Tanimoto index.
'tversky' Tversky index.
'wallace1' Wallace Criterion Type 1.
'wallace2' Wallace Criterion Type 2.

Note that the Tanimoto coefficient and Dice's coefficient are special cases of the Tversky index with (alpha, beta) = (1, 1) and (0.5, 0.5), respectively.
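The counting-pairs philosophy can be sketched in base R from the standard definitions. The helper and score functions below are illustrative re-implementations, not the package's own (compiled) code: every pair of observations is classified by whether the two labelings group it together or apart.

```r
# Sketch of the counting-pairs scores (illustrative, not mclustcomp's code).
pair_counts <- function(x, y) {
  ut     <- upper.tri(matrix(0, length(x), length(x)))
  same_x <- outer(x, x, "==")[ut]   # pair grouped together under x?
  same_y <- outer(y, y, "==")[ut]   # pair grouped together under y?
  c(a = sum(same_x & same_y),       # together in both
    b = sum(same_x & !same_y),      # together in x only
    c = sum(!same_x & same_y),      # together in y only
    d = sum(!same_x & !same_y))     # apart in both
}

# Rand index: fraction of pairs on which the two labelings agree.
rand_index <- function(x, y) {
  p <- pair_counts(x, y)
  unname((p["a"] + p["d"]) / sum(p))
}

# Tversky index: (alpha, beta) = (1, 1) recovers Tanimoto/Jaccard,
# and (0.5, 0.5) recovers the Sorensen-Dice coefficient.
tversky_index <- function(x, y, alpha = 1, beta = 1) {
  p <- pair_counts(x, y)
  unname(p["a"] / (p["a"] + alpha * p["b"] + beta * p["c"]))
}
```

With identical labelings every pair agrees, so rand_index(x, x) and tversky_index(x, x) both equal 1.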

Category 2. Set Overlaps/Matching

TYPE FULL NAME
'f' F-Measure.
'mhm' Meila-Heckerman Measure.
'mmm' Maximum-Match Measure.
'vdm' Van Dongen Measure.
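As one example of the set-matching philosophy, the Van Dongen measure compares each cluster in one labeling with its best-overlapping cluster in the other. The base-R sketch below follows the standard formula 2n − Σ_i max_j n_ij − Σ_j max_i n_ij and is illustrative only, not the package's implementation:

```r
# Van Dongen measure from the contingency table of the two labelings:
# 2n minus the best-match overlaps in each direction (0 means identical).
van_dongen <- function(x, y) {
  n   <- length(x)
  tab <- table(x, y)                       # cluster-overlap counts n_ij
  2 * n - sum(apply(tab, 1, max)) - sum(apply(tab, 2, max))
}
```

Identical clusterings give 0, and the score grows as the best matches degrade.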

Category 3. Information Theory

TYPE FULL NAME
'jent' Joint Entropy.
'mi' Mutual Information.
'nmi1' Normalized Mutual Information by Strehl and Ghosh.
'nmi2' Normalized Mutual Information by Fred and Jain.
'nmi3' Normalized Mutual Information by Danon et al.
'nvi' Normalized Variation of Information.
'vi' Variation of Information.
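The information-theoretic scores all derive from the entropies of the empirical label distributions. A base-R sketch from the standard definitions (natural logarithm; illustrative only, not the package's implementation):

```r
# Shannon entropy of a probability vector (natural log), dropping zeros.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Joint entropy, mutual information, and variation of information of
# two label vectors, from their empirical joint distribution:
# I(X;Y) = H(X) + H(Y) - H(X,Y),  VI = H(X,Y) - I(X;Y).
info_scores <- function(x, y) {
  pxy <- table(x, y) / length(x)   # joint label distribution
  mi  <- entropy(rowSums(pxy)) + entropy(colSums(pxy)) - entropy(pxy)
  c(jent = entropy(pxy), mi = mi, vi = entropy(pxy) - mi)
}
```

For identical labelings the variation of information is 0 and the mutual information equals the entropy of the labeling itself.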

References

Strehl and Ghosh (2003)

Meilă (2007)

Goos et al. (2003)

Wagner and Wagner (2007)

Albatineh et al. (2006)

Mirkin (2001)

Rand (1971)

Kuncheva and Hadjitodorov (2004)

Fowlkes and Mallows (1983)

van Dongen (2000)

Jaccard (1912)

Li et al. (2010)

Larsen and Aone (1999)

Meilă and Heckerman (2001)

Cover and Thomas (2006)

Fred and Jain (2003)

Wallace (1983)

Simpson (1943)

Dice (1945)

Segaran (2007)

Tversky (1977)

Danon et al. (2005)

Lancichinetti et al. (2009)

Examples

## example 1. compare two identical clusterings
x = sample(1:5,20,replace=TRUE) # labels from 1 to 5, 20 elements
y = x                           # set two labels x and y equal
mclustcomp(x,y)                 # show all results

## example 2. selection of a few methods
z = sample(1:4,20,replace=TRUE)           # generate a non-trivial clustering
cmethods = c("jaccard","tanimoto","rand") # select 3 methods
mclustcomp(x,z,types=cmethods)            # test with the selected scores

## example 3. tversky.param
tparam = list()                           # create an empty list
tparam$alpha = 2
tparam$beta  = 3
tparam$sym   = TRUE
mclustcomp(x,z,types="tversky")           # default (alpha,beta)=(1,1) is the Tanimoto case
mclustcomp(x,z,types="tversky",tversky.param=tparam)



kisungyou/mclustcomp documentation built on Feb. 9, 2023, 8:50 p.m.