similarity: Similarity measurement for binary matrices.


View source: R/similarity-functions.R

Description

This function calculates pairwise similarity indices between the rows (samples) of a binary matrix.

Usage

  similarity(x,
    method = c("soerensen-dice", "soerensen", "dice", "jaccard", "simple", "simplematching", "rogers", "tanimoto", "rogers-tanimoto"))

Arguments

x

a binary (0/1) matrix. Rows must contain the samples and columns the features.

method

the similarity index to use; see Details for the definitions.

Details

p = number of binary variables (columns of x), so p = n_{00} + n_{01} + n_{10} + n_{11}
i, j = indices of two observations (rows of x)
I(\cdot) = indicator function: 1 if its condition holds, 0 otherwise
n_{00} = \sum_{k=1}^{p} I(x_{ik} = 0, x_{jk} = 0)
n_{01} = \sum_{k=1}^{p} I(x_{ik} = 0, x_{jk} = 1)
n_{10} = \sum_{k=1}^{p} I(x_{ik} = 1, x_{jk} = 0)
n_{11} = \sum_{k=1}^{p} I(x_{ik} = 1, x_{jk} = 1)
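The four counts can be computed for a single pair of observations with vectorised comparisons in base R. This is a minimal sketch: `pair_counts()` is a hypothetical helper written for illustration and is not part of the similarity package.

```r
# Contingency counts n00, n01, n10, n11 for two binary observations xi and xj.
# pair_counts() is a hypothetical helper, not exported by the similarity package.
pair_counts <- function(xi, xj) {
  stopifnot(length(xi) == length(xj))
  c(n00 = sum(xi == 0 & xj == 0),  # both absent
    n01 = sum(xi == 0 & xj == 1),  # absent in i, present in j
    n10 = sum(xi == 1 & xj == 0),  # present in i, absent in j
    n11 = sum(xi == 1 & xj == 1))  # both present
}

xi <- c(1, 1, 1, 0)
xj <- c(0, 1, 1, 1)
counts <- pair_counts(xi, xj)
counts       # n00 = 0, n01 = 1, n10 = 1, n11 = 2
sum(counts)  # 4, i.e. p = length(xi)
```

Every position k falls into exactly one of the four cases, which is why the counts always sum to p.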

Asymmetric indices (joint absences n_{00} are ignored):

Soerensen/Dice (soerensen-dice) \frac{2n_{11}}{n_{01} + n_{10} + 2n_{11}}
Jaccard (jaccard) \frac{n_{11}}{n_{01} + n_{10} + n_{11}}
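As a sketch of the two asymmetric formulas, they can be evaluated directly from the counts. `dice_index()` and `jaccard_index()` are hypothetical names chosen for illustration; the package's own handling of edge cases is not described here.

```r
# Asymmetric indices from the contingency counts of one pair of rows.
# dice_index() and jaccard_index() are hypothetical helpers for illustration;
# n00 is deliberately not an argument because neither formula uses it.
dice_index <- function(n01, n10, n11) 2 * n11 / (n01 + n10 + 2 * n11)
jaccard_index <- function(n01, n10, n11) n11 / (n01 + n10 + n11)

# Example counts: n01 = 1, n10 = 1, n11 = 2
d <- dice_index(1, 1, 2)     # 4 / 6
j <- jaccard_index(1, 1, 2)  # 2 / 4 = 0.5

# The two indices are monotonically related: D = 2J / (1 + J)
all.equal(d, 2 * j / (1 + j))  # TRUE

# Two rows without any 1s give 0 / 0, which is NaN in R
dice_index(0, 0, 0)  # NaN
```

Because D = 2J / (1 + J) is increasing in J, both indices rank pairs of observations identically; they differ only in scale.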

Symmetric indices (joint absences n_{00} count as agreement, just like joint presences n_{11}):

Simple Matching (simple) \frac{n_{00} + n_{11}}{p}
Rogers and Tanimoto (rogers-tanimoto) \frac{n_{00} + n_{11}}{n_{00} + 2(n_{01} + n_{10}) + n_{11}}
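The symmetric formulas can be sketched the same way. `simple_matching()` and `rogers_tanimoto()` are hypothetical helpers, not functions of the similarity package.

```r
# Symmetric indices: joint absences (n00) count as agreement.
# simple_matching() and rogers_tanimoto() are hypothetical helpers.
simple_matching <- function(n00, n01, n10, n11) {
  (n00 + n11) / (n00 + n01 + n10 + n11)  # denominator is p
}
rogers_tanimoto <- function(n00, n01, n10, n11) {
  (n00 + n11) / (n00 + 2 * (n01 + n10) + n11)  # disagreements weighted twice
}

# Example counts: n00 = 1, n01 = 1, n10 = 0, n11 = 2 (p = 4)
s  <- simple_matching(1, 1, 0, 2)  # 3 / 4 = 0.75
rt <- rogers_tanimoto(1, 1, 0, 2)  # 3 / 5 = 0.6

# Rogers-Tanimoto is a monotone transform of simple matching: RT = S / (2 - S)
all.equal(rt, s / (2 - s))  # TRUE
```

With a = n_{00} + n_{11} agreements and d = n_{01} + n_{10} disagreements, p = a + d, so RT = a / (a + 2d) = a / (2p - a) = S / (2 - S). The two symmetric indices therefore also rank pairs identically, with Rogers and Tanimoto never exceeding Simple Matching.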

Value

An object of class similarity: the lower triangle of the similarity matrix, stored by columns in a vector.
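"Lower triangle stored by columns" is the same layout base R uses for dist objects. The sketch below illustrates it on a small symmetric matrix whose values are made up for illustration; `lt_index()` is a hypothetical helper giving the vector position of pair (i, j), and nothing here relies on functions of the similarity package.

```r
# Illustrative symmetric 4 x 4 similarity matrix (values made up)
m <- matrix(c(1.0, 0.2, 0.5, 0.1,
              0.2, 1.0, 0.3, 0.4,
              0.5, 0.3, 1.0, 0.6,
              0.1, 0.4, 0.6, 1.0), nrow = 4, byrow = TRUE)

# Lower triangle without the diagonal, read column by column:
# (2,1), (3,1), (4,1), (3,2), (4,2), (4,3)
v <- m[lower.tri(m)]
v  # 0.2 0.5 0.1 0.3 0.4 0.6

# Base R's dist objects store their values in the same order
all.equal(v, as.vector(as.dist(m)))  # TRUE

# Position of pair (i, j) with i > j among n observations
# (lt_index() is a hypothetical helper)
lt_index <- function(i, j, n) n * (j - 1) - j * (j - 1) / 2 + i - j
v[lt_index(4, 3, 4)]  # 0.6, i.e. m[4, 3]
```

For n observations the vector holds n(n - 1)/2 values; the diagonal is omitted, presumably because every observation is maximally similar to itself.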

References

Dice, L. R. (1945), “Measures of the amount of ecological association between species”, Ecology 26: 297-302.

Jaccard, P. (1901), “Étude comparative de la distribution florale dans une portion des Alpes et des Jura”, Bulletin de la Société Vaudoise des Sciences Naturelles 37: 547-579.

Rogers, D. J. and Tanimoto, T. T. (1960), “A computer program for classifying plants”, Science 132: 1115-1118.

Sokal, R. R. and Michener, C. D. (1958), “A statistical method for evaluating systematic relationships”, University of Kansas Science Bulletin 38: 1409-1438.

Soerensen, T. (1948), “A method of establishing groups of equal amplitude in plant sociology based on similarity of species content”, Biologiske Skrifter 4: 1-34.

Examples

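library("similarity")

# 4 samples (rows) x 3 binary features (columns)
a <- matrix(c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1),
            ncol = 3, nrow = 4, byrow = TRUE)

# Soerensen/Dice similarity between all pairs of rows
similarity(a, "soerensen")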
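The example can be cross-checked in base R without the package, computing all pairwise counts at once with matrix products. This is a sketch that assumes similarity() follows the formula and the lower-triangle layout given above; it does not call the package and does not reproduce its print method.

```r
# Same data as the example: 4 samples (rows) x 3 binary features (columns)
a <- matrix(c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1),
            ncol = 3, nrow = 4, byrow = TRUE)

# Contingency counts for every pair of rows (i, j) at once
n11 <- tcrossprod(a)          # row i is 1 and row j is 1
n10 <- tcrossprod(a, 1 - a)   # row i is 1, row j is 0
n01 <- tcrossprod(1 - a, a)   # row i is 0, row j is 1

# Soerensen/Dice index for all pairs
dice <- 2 * n11 / (n01 + n10 + 2 * n11)

# Lower triangle stored by columns, as described under Value
dice[lower.tri(dice)]
# 0.5 1.0 0.5 0.5 0.0 0.5
```

Rows 1 and 3 are identical, hence the value 1 for that pair, while rows 2 and 4 share no 1s, hence 0. The matrix-product form avoids an explicit loop over the n(n - 1)/2 pairs.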

sgibb/similarity documentation built on May 29, 2019, 8:04 p.m.