For two clusterings of the same data set, this function calculates the specified similarity statistic from the comemberships of the observations. Two observations are comembers if they are assigned to the same cluster, so the comembership of a clustering is the set of observation pairs that are clustered together.

```
cluster_similarity(labels1, labels2,
                   similarity = c("jaccard", "rand"),
                   method = "independence")
```

`labels1`
a vector of

`labels2`
a vector of

`similarity`
the similarity statistic to calculate

`method`
the model under which the statistic was derived

To calculate the similarity, we compute the 2x2 contingency table, consisting of the following four cells:

- n_11
the number of observation pairs where both observations are comembers in both clusterings

- n_10
the number of observation pairs where the observations are comembers in the first clustering but not the second

- n_01
the number of observation pairs where the observations are comembers in the second clustering but not the first

- n_00
the number of observation pairs where the observations are comembers in neither clustering
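The four cells can be counted directly in base R. The sketch below is illustrative only: `comembership_counts` is a hypothetical helper written for this example, not the package's `comembership_table` function, and it assumes both label vectors have the same length and describe the same observations in the same order.

```r
# Hypothetical helper (not part of the package): count the four
# comembership cells by enumerating every unordered pair of observations.
comembership_counts <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))

  # All unordered pairs of observation indices, one pair per column.
  pairs <- combn(length(labels1), 2)

  # TRUE where the two observations in a pair share a cluster.
  same1 <- labels1[pairs[1, ]] == labels1[pairs[2, ]]
  same2 <- labels2[pairs[1, ]] == labels2[pairs[2, ]]

  c(n_11 = sum(same1 & same2),
    n_10 = sum(same1 & !same2),
    n_01 = sum(!same1 & same2),
    n_00 = sum(!same1 & !same2))
}

# Made-up labels for five observations.
counts <- comembership_counts(c(1, 1, 2, 2, 3), c(1, 1, 1, 2, 2))
print(counts)
```

Enumerating pairs with `combn` takes time and memory proportional to the number of pairs, so this approach is only practical for small data sets.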

Currently, we have implemented the following similarity statistics:

Rand index

Jaccard coefficient
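As a rough illustration, both statistics can be written in terms of the four contingency-table cells using their standard textbook definitions. The function names `rand_index` and `jaccard_coefficient` and the cell counts below are assumptions made up for this sketch; the package's own implementation may differ in detail.

```r
# Standard definitions, written as hypothetical helpers.

# Rand index: proportion of pairs on which the two clusterings agree.
rand_index <- function(n_11, n_10, n_01, n_00) {
  (n_11 + n_00) / (n_11 + n_10 + n_01 + n_00)
}

# Jaccard coefficient: like the Rand index, but ignores pairs that are
# comembers in neither clustering (n_00).
jaccard_coefficient <- function(n_11, n_10, n_01) {
  n_11 / (n_11 + n_10 + n_01)
}

# Illustrative cell counts (made up for this example).
rand_value <- rand_index(n_11 = 1, n_10 = 1, n_01 = 3, n_00 = 5)
jaccard_value <- jaccard_coefficient(n_11 = 1, n_10 = 1, n_01 = 3)
print(c(rand = rand_value, jaccard = jaccard_value))
```

Because the Jaccard coefficient drops n_00 from both numerator and denominator, it is less affected than the Rand index when most pairs are separated in both clusterings.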

To compute the contingency table, we use the `comembership_table` function.

the similarity between the two clusterings

