mapClusterings: Compare two clusterings of the same elements

Description Usage Arguments Details Value See Also Examples

Description

This function solves this problem: Given two clusterings of the same elements, how many elements are in a different cluster, when we map for the best the clusters of the first clustering to the second?

The function finds the best mapping between clusters of cl1 with clusters of cl2 (injective in both directions - but not necessarly surjective), that is such that the number of differently clustered elements is minimum.

An element e is identically clustered if one of these conditions is true:

  1. it belongs to a cluster i in cl1 and j in cl2 and i is associated with j.

  2. e is unclustered in both cl1 and cl2

Differently clustered is the opposite.

Usage

1
mapClusterings(cl1, cl2, verbose=FALSE, use.solve.LSAP=NULL)

Arguments

cl1

A vector of integers such that cl1[i] is the number (>= 1) of the cluster in which i is. 0 is a special value that means the element is not clustered. (This is the format of the cluster field returned by CLAG.clust)

cl2

The same for the other clustering.

verbose

Display information about tested cases.

use.solve.LSAP

Whether to use solve_LSAP function from package clue. If NULL (the default), solve_LSAP will be used only if the package clue is installed.

Details

We propose two methods for computing the number of differently clustered elements: using the solve_LSAP function from package clue or using our branch and bound algorithm. You can choose which one to use by setting the use.solve.LSAP parameter.

The solve_LSAP function is much faster both in practice and theory (it is polynomial) but requires the clue package to be installed. See its own help page for information about how it works.

The branch and bound algorithm we provide works like this: For a cluster i in the first clustering, it tries to associate it with every cluster j not already associated (and also to leave it alone), and for each of this choices explores recursively the choice for cluster i+1. In the worst case it runs in exponential time.

The exploration is optimized by:

Value

The returned value is a list with several members:

ndiff

The number of elements differently clustered.

assoc

Cluster mapping. assoc[i] = j means cluster i in first clustering is associated to cluster j in second clustering. If j = 0, i is not associated to any cluster.

diffclust

A vector of integers of length ndiff giving the indices of elements that are differently clustered.

See Also

compareClusterings

Examples

1
mapClusterings(c(0,1,1,1,2,2,2,3,3,3,3,0), c(3,1,3,3,1,1,1,2,0,2,4,0))

CLAG documentation built on May 2, 2019, 5:49 p.m.