Description

Computes the dissimilarity between n-dimensional vectors.
Usage

metrics(vset, method = 'euclidean', p = 2)
Arguments

vset: a matrix (n x m) where each column is an n-dimensional vector.

method: a character string indicating the distance/dissimilarity method to be used (see Details).

p: power of the Minkowski distance. This parameter is only relevant when the method 'minkowski' has been selected.
Details

Although many of the offered methods compute a proper distance, that is not always the case. For instance, for a non-null vector v, the 'cosine' method gives d(v, 2v) = 0, violating the coincidence axiom. For that reason we prefer the term dissimilarity rather than distance. The methods offered can be grouped into families.
('euclidean', 'manhattan', 'minkowski', 'chebyshev')
Euclidean = sqrt( sum | P_i - Q_i |^2)
Manhattan = sum | P_i - Q_i |
Minkowski = ( sum | P_i - Q_i |^p )^(1/p)
Chebyshev = max | P_i - Q_i |
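The four Minkowski-family formulas above can be sketched in a few lines. This is an illustrative pure-Python version (the function name is invented for the example; it is not the package's R implementation):

```python
import math

def minkowski_family(P, Q, method="euclidean", p=2):
    """Illustrative sketch of the Minkowski-family dissimilarities."""
    d = [abs(pi - qi) for pi, qi in zip(P, Q)]
    if method == "euclidean":    # sqrt( sum |P_i - Q_i|^2 )
        return math.sqrt(sum(di ** 2 for di in d))
    if method == "manhattan":    # sum |P_i - Q_i|
        return sum(d)
    if method == "minkowski":    # ( sum |P_i - Q_i|^p )^(1/p)
        return sum(di ** p for di in d) ** (1.0 / p)
    if method == "chebyshev":    # max |P_i - Q_i|
        return max(d)
    raise ValueError(f"unknown method: {method}")
```

Note that 'minkowski' with p = 2 reduces to 'euclidean', and with p = 1 to 'manhattan'.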
('sorensen', 'soergel', 'lorentzian', 'kulczynski', 'canberra')
Sorensen = sum | P_i - Q_i | / sum (P_i + Q_i)
Soergel = sum | P_i - Q_i | / sum max(P_i , Q_i)
Lorentzian = sum ln(1 + | P_i - Q_i |)
Kulczynski = sum | P_i - Q_i | / sum min(P_i , Q_i)
Canberra = sum ( | P_i - Q_i | / (P_i + Q_i) )
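A pure-Python sketch of this L1 family follows (illustrative only, assuming positive vectors; note that Canberra sums the ratios term by term, while Sorensen, Soergel and Kulczynski divide one sum by another):

```python
import math

def l1_family(P, Q, method):
    """Illustrative sketch of the L1-family dissimilarities (positive vectors)."""
    d = [abs(p - q) for p, q in zip(P, Q)]
    if method == "sorensen":    # sum |P_i - Q_i| / sum (P_i + Q_i)
        return sum(d) / sum(p + q for p, q in zip(P, Q))
    if method == "soergel":     # sum |P_i - Q_i| / sum max(P_i, Q_i)
        return sum(d) / sum(max(p, q) for p, q in zip(P, Q))
    if method == "lorentzian":  # sum ln(1 + |P_i - Q_i|)
        return sum(math.log1p(di) for di in d)
    if method == "kulczynski":  # sum |P_i - Q_i| / sum min(P_i, Q_i)
        return sum(d) / sum(min(p, q) for p, q in zip(P, Q))
    if method == "canberra":    # sum ( |P_i - Q_i| / (P_i + Q_i) )
        return sum(abs(p - q) / (p + q) for p, q in zip(P, Q))
    raise ValueError(f"unknown method: {method}")
```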
('non-intersection', 'wavehedges', 'czekanowski', 'motyka')
Non-intersection = 1 - sum min(P_i , Q_i)
Wave-Hedges = sum ( | P_i - Q_i | / max(P_i , Q_i) )
Czekanowski = sum | P_i - Q_i | / sum (P_i + Q_i)
Motyka = sum max(P_i , Q_i) / sum (P_i + Q_i)
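This intersection family can be sketched as below (illustrative pure Python; the non-intersection formula assumes P and Q are probability vectors, i.e. each sums to 1):

```python
def intersection_family(P, Q, method):
    """Illustrative sketch; 'non-intersection' assumes P and Q each sum to 1."""
    if method == "non-intersection":  # 1 - sum min(P_i, Q_i)
        return 1 - sum(min(p, q) for p, q in zip(P, Q))
    if method == "wavehedges":        # sum ( |P_i - Q_i| / max(P_i, Q_i) )
        return sum(abs(p - q) / max(p, q) for p, q in zip(P, Q))
    if method == "czekanowski":       # sum |P_i - Q_i| / sum (P_i + Q_i)
        return (sum(abs(p - q) for p, q in zip(P, Q))
                / sum(p + q for p, q in zip(P, Q)))
    if method == "motyka":            # sum max(P_i, Q_i) / sum (P_i + Q_i)
        return (sum(max(p, q) for p, q in zip(P, Q))
                / sum(p + q for p, q in zip(P, Q)))
    raise ValueError(f"unknown method: {method}")
```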
('cosine', 'jaccard')
Cosine = - ln( 0.5 (1 + sum (P_i Q_i) / ( sqrt(sum P_i^2) sqrt(sum Q_i^2) )) )
Jaccard = 1 - sum (P_i Q_i) / (sum P_i^2 + sum Q_i^2 - sum (P_i Q_i))
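These two inner-product measures can be sketched as follows (illustrative pure Python, not the package's code). The cosine case also shows the coincidence-axiom failure mentioned in the Details: d(v, 2v) = 0 because the angle between v and 2v is zero:

```python
import math

def inner_product_family(P, Q, method):
    """Illustrative sketch of the 'cosine' and 'jaccard' dissimilarities."""
    dot = sum(p * q for p, q in zip(P, Q))  # sum P_i Q_i
    np2 = sum(p * p for p in P)             # sum P_i^2
    nq2 = sum(q * q for q in Q)             # sum Q_i^2
    if method == "cosine":   # -ln( 0.5 (1 + cosine similarity) )
        return -math.log(0.5 * (1 + dot / (math.sqrt(np2) * math.sqrt(nq2))))
    if method == "jaccard":  # 1 - dot / (sum P_i^2 + sum Q_i^2 - dot)
        return 1 - dot / (np2 + nq2 - dot)
    raise ValueError(f"unknown method: {method}")
```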
('bhattacharyya', 'squared_chord')
Bhattacharyya = - ln sum sqrt(P_i Q_i)
Squared-chord = sum ( sqrt(P_i) - sqrt(Q_i) )^2
('squared_chi')
Squared-Chi = sum ( (P_i - Q_i )^2 / (P_i + Q_i) )
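The Bhattacharyya, squared-chord and squared-chi formulas can be sketched together (illustrative pure Python, assuming non-negative vectors with no coordinate pair summing to zero):

```python
import math

def chord_family(P, Q, method):
    """Illustrative sketch (assumes non-negative vectors)."""
    if method == "bhattacharyya":  # -ln sum sqrt(P_i Q_i)
        return -math.log(sum(math.sqrt(p * q) for p, q in zip(P, Q)))
    if method == "squared_chord":  # sum (sqrt(P_i) - sqrt(Q_i))^2
        return sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(P, Q))
    if method == "squared_chi":    # sum (P_i - Q_i)^2 / (P_i + Q_i)
        return sum((p - q) ** 2 / (p + q) for p, q in zip(P, Q))
    raise ValueError(f"unknown method: {method}")
```

For identical probability vectors all three vanish; for P = (0.25, 0.75) against its reversal the squared-chord value is 2 - sqrt(3).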
('kullback-leibler', 'jeffreys', 'jensen-shannon', 'jensen_difference')
Kullback-Leibler = sum P_i ln(P_i / Q_i)
Jeffreys = sum (P_i - Q_i) ln(P_i / Q_i)
Jensen-Shannon = 0.5 ( sum P_i ln(2 P_i / (P_i + Q_i)) + sum Q_i ln(2 Q_i / (P_i + Q_i)) )
Jensen difference = sum ( 0.5 (P_i ln(P_i) + Q_i ln(Q_i)) - 0.5 (P_i + Q_i) ln(0.5 (P_i + Q_i)) )
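A pure-Python sketch of this entropy family follows (illustrative only; it assumes strictly positive probability vectors, since zero coordinates make the logarithms undefined). Note that the Jensen difference is algebraically the same quantity as Jensen-Shannon, just expanded differently:

```python
import math

def entropy_family(P, Q, method):
    """Illustrative sketch (assumes strictly positive probability vectors)."""
    if method == "kullback-leibler":   # sum P_i ln(P_i / Q_i)
        return sum(p * math.log(p / q) for p, q in zip(P, Q))
    if method == "jeffreys":           # sum (P_i - Q_i) ln(P_i / Q_i)
        return sum((p - q) * math.log(p / q) for p, q in zip(P, Q))
    if method == "jensen-shannon":     # 0.5 (KL(P||M) + KL(Q||M)), M = (P+Q)/2
        return sum(0.5 * p * math.log(2 * p / (p + q)) +
                   0.5 * q * math.log(2 * q / (p + q)) for p, q in zip(P, Q))
    if method == "jensen_difference":  # 0.5 (P ln P + Q ln Q) - ((P+Q)/2) ln((P+Q)/2)
        return sum(0.5 * (p * math.log(p) + q * math.log(q)) -
                   0.5 * (p + q) * math.log(0.5 * (p + q)) for p, q in zip(P, Q))
    raise ValueError(f"unknown method: {method}")
```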
('hamming', 'mismatch', 'mismatchZero', 'binary')
Hamming = (# coordinates where P_i != Q_i) / n
Mismatch = # coordinates where P_i != Q_i
MismatchZero = same as Mismatch, but computed after removing the coordinates where both vectors are zero.
Binary = (# coordinates where one vector has 0 and the other has a non-zero value) / n.
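These coordinate-counting measures can be sketched directly (illustrative pure Python; the function name is invented):

```python
def hamming_family(P, Q, method):
    """Illustrative sketch of the coordinate-counting dissimilarities."""
    n = len(P)
    if method == "hamming":       # fraction of coordinates where P_i != Q_i
        return sum(p != q for p, q in zip(P, Q)) / n
    if method == "mismatch":      # number of coordinates where P_i != Q_i
        return sum(p != q for p, q in zip(P, Q))
    if method == "mismatchZero":  # as 'mismatch', ignoring coordinates where both are zero
        return sum(p != q for p, q in zip(P, Q) if not (p == 0 and q == 0))
    if method == "binary":        # fraction of coordinates where exactly one vector is zero
        return sum((p == 0) != (q == 0) for p, q in zip(P, Q)) / n
    raise ValueError(f"unknown method: {method}")
```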
('taneja', 'kumar-johnson', 'avg')
Taneja = sum ( (P_i + Q_i) / 2 ) ln( (P_i + Q_i) / ( 2 sqrt(P_i Q_i) ) )
Kumar-Johnson = sum ( (P_i^2 - Q_i^2)^2 / ( 2 (P_i Q_i)^1.5 ) )
Avg = 0.5 (sum | P_i - Q_i| + max | P_i - Q_i |)
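This last family combines earlier ingredients (Avg, for instance, averages the Manhattan and Chebyshev values). An illustrative pure-Python sketch, assuming strictly positive vectors for 'taneja' and 'kumar-johnson':

```python
import math

def combination_family(P, Q, method):
    """Illustrative sketch ('taneja'/'kumar-johnson' need strictly positive vectors)."""
    if method == "taneja":         # sum ((P_i+Q_i)/2) ln((P_i+Q_i) / (2 sqrt(P_i Q_i)))
        return sum(0.5 * (p + q) * math.log((p + q) / (2 * math.sqrt(p * q)))
                   for p, q in zip(P, Q))
    if method == "kumar-johnson":  # sum (P_i^2 - Q_i^2)^2 / (2 (P_i Q_i)^1.5)
        return sum((p ** 2 - q ** 2) ** 2 / (2 * (p * q) ** 1.5)
                   for p, q in zip(P, Q))
    if method == "avg":            # 0.5 (sum |P_i - Q_i| + max |P_i - Q_i|)
        d = [abs(p - q) for p, q in zip(P, Q)]
        return 0.5 * (sum(d) + max(d))
    raise ValueError(f"unknown method: {method}")
```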
Value

A matrix with the computed dissimilarity values.
References

Sung-Hyuk Cha (2007). International Journal of Mathematical Models and Methods in Applied Sciences, vol. 1, issue 4.

Luczak et al. (2019). Briefings in Bioinformatics, 20: 1222-1237.

https://r-snippets.readthedocs.io/en/latest/real_analysis/metrics.html
See Also

vcos(), vdis()