dista: Distance between vectors and a matrix - Sum of all pairwise...



Distance between vectors and a matrix - Sum of all pairwise distances in a distance matrix.

Description

Distance between vectors and a matrix - Sum of all pairwise distances in a distance matrix.

Usage

dista(xnew, x, type = "euclidean", k = 0, index = FALSE, 
 trans = TRUE, square = FALSE, p = 0, parallel = FALSE)
total.dista(xnew, x, type = "euclidean", k = 0,
 square = FALSE, p = 0, parallel = FALSE)  

Arguments

xnew

A matrix with some data or a vector.

x

A matrix with the data, where the rows denote observations (vectors) and the columns contain the variables.

type

The type of distance, for example "euclidean" (the default) or "manhattan". See Details for the distances listed there.

k

Should the k smallest distances or their indices be returned? This happens if k > 0.

index

In case k is greater than 0, you have the option to get the indices of the k smallest distances.

trans

Do you want the returned matrix to be transposed? TRUE or FALSE.

square

If type is "euclidean" or "hellinger", setting this argument to TRUE returns the squared distances.

p

The power of the Minkowski metric.

parallel

For methods kullback_leibler, jensen_shannon and itakura_saito, you can run the algorithm in parallel.
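
The k and index arguments can be pictured with a small base R sketch. The toy matrix d below is invented for illustration, and the layout of the result (one row per new point) is an assumption of this sketch; the shape that dista itself returns when k > 0 is not described here.

```r
## toy distances: 2 new points (rows) against 5 points of x (columns)
d <- rbind(c(4.0, 1.5, 3.2, 0.7, 2.9),
           c(0.3, 2.2, 5.1, 1.8, 0.9))
k <- 3

## the k smallest distances of each new point (what k > 0 asks for)
k_dist <- t(apply(d, 1, function(v) sort(v)[1:k]))

## the positions of those k nearest points (what index = TRUE asks for)
k_idx <- t(apply(d, 1, function(v) order(v)[1:k]))

k_dist
k_idx
```

Returning the indices rather than the distances is the usual starting point for nearest-neighbour methods, where only the identity of the closest observations matters.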

Details

The purpose of this function is to calculate the distances between xnew and x without computing the whole distance matrix of xnew and x together. The full matrix also contains the distances within xnew and within x, and these extra calculations can be avoided.
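
The saving can be illustrated with a minimal base R sketch. The helper cross_dist is hypothetical and is not how Rfast implements dista; it only shows that the distances between xnew and x form a small block of the full distance matrix.

```r
## Hypothetical helper, not part of Rfast: Euclidean distances between
## every row of xnew and every row of x, computed directly.
cross_dist <- function(xnew, x) {
  tx <- t(x)  ## variables in rows, observations in columns
  ## for each row v of xnew, distance from v to every column of tx
  t(apply(xnew, 1, function(v) sqrt(colSums((tx - v)^2))))
}

xnew <- as.matrix(iris[1:10, 1:4])
x <- as.matrix(iris[-(1:10), 1:4])

## only the 10 x 140 distances that are needed
d_direct <- cross_dist(xnew, x)

## the full 150 x 150 matrix, of which a single 10 x 140 block is kept
d_full <- as.matrix(dist(rbind(xnew, x)))[1:10, -(1:10)]

dim(d_direct)
max(abs(d_direct - d_full))
```

The direct route evaluates 10 * 140 distances, whereas dist on the stacked data evaluates all 150 * 149 / 2 pairs before most of them are discarded.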

  • euclidean : \sum \sqrt( \sum | P_i - Q_i |^2)

  • manhattan : \sum \sum | P_i - Q_i |

  • minimum : \sum \min | P_i - Q_i |

  • maximum : \sum \max | P_i - Q_i |

  • minkowski : \sum ( \sum | P_i - Q_i |^p)^(1/p)

  • bhattacharyya : \sum - ln \sum \sqrt(P_i * Q_i)

  • hellinger : \sum 2 * \sqrt( 1 - \sum \sqrt(P_i * Q_i))

  • kullback_leibler : \sum \sum P_i * log(P_i / Q_i)

  • jensen_shannon : \sum 0.5 * ( \sum P_i * log(2 * P_i / (P_i + Q_i)) + \sum Q_i * log(2 * Q_i / (P_i + Q_i)))

  • canberra : \sum \sum | P_i - Q_i | / (P_i + Q_i)

  • chi_square X^2 : \sum \sum ( (P_i - Q_i )^2 / (P_i + Q_i) )

  • soergel : \sum \sum | P_i - Q_i | / \sum \max(P_i , Q_i)

  • sorensen : \sum \sum | P_i - Q_i | / \sum (P_i + Q_i)

  • cosine : \sum (P_i * Q_i) / ( \sqrt(\sum P_i^2) * \sqrt(\sum Q_i^2) )

  • wave_hedges : \sum \sum | P_i - Q_i | / \max(P_i , Q_i)

  • motyka : \sum \sum \min(P_i , Q_i) / (P_i + Q_i)

  • harmonic_mean : 2 * \sum (P_i * Q_i) / (P_i + Q_i)

  • jeffries_matusita : \sum \sqrt( 2 - 2 * \sum \sqrt(P_i * Q_i))

  • gower : \sum 1/d * \sum | P_i - Q_i |

  • kulczynski : \sum 1 / \sum | P_i - Q_i | / \sum \min(P_i , Q_i)
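
As a worked example, a few of the formulas above can be evaluated in base R for a single pair of vectors, so only the inner operation of each formula is shown. The vectors P and Q and the power p = 3 are arbitrary choices for illustration, and stats::dist is used as a cross-check only for the distances it also offers.

```r
P <- c(1, 2, 3)
Q <- c(2, 4, 6)
p <- 3  ## Minkowski power, chosen for illustration

## direct translation of the formulas for one pair (P, Q)
d <- c(
  euclidean = sqrt(sum(abs(P - Q)^2)),
  manhattan = sum(abs(P - Q)),
  minimum   = min(abs(P - Q)),
  maximum   = max(abs(P - Q)),
  minkowski = sum(abs(P - Q)^p)^(1 / p),
  canberra  = sum(abs(P - Q) / (P + Q))
)
d

## cross-check against stats::dist where the same distance exists
pq <- rbind(P, Q)
chk <- c(
  euclidean = as.numeric(dist(pq, method = "euclidean")),
  manhattan = as.numeric(dist(pq, method = "manhattan")),
  maximum   = as.numeric(dist(pq, method = "maximum")),
  minkowski = as.numeric(dist(pq, method = "minkowski", p = p)),
  canberra  = as.numeric(dist(pq, method = "canberra"))
)
chk
```

Here the absolute differences are 1, 2 and 3, so the Manhattan distance is 6, the minimum is 1, the maximum is 3 and the Canberra distance is 1/3 + 2/6 + 3/9 = 1.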

Value

A matrix with the distances of each xnew from each vector of x. The number of rows of xnew and the number of rows of x are the dimensions of this matrix.

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris <mtsagris@uoc.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

mahala, Dist, total.dist, total.dista

Examples

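library(Rfast)

## distances of the first 10 iris observations from the remaining 140
xnew <- as.matrix( iris[1:10, 1:4] )
x <- as.matrix( iris[-c(1:10), 1:4] )
a <- dista(xnew, x)
## the same distances taken from the full distance matrix
b <- as.matrix( dist( rbind(xnew, x) ) )
b <- b[ 1:10, -c(1:10) ]
sum( abs(a - b) )  ## total absolute difference between the two results

## see the time
x <- matrix( rnorm(1000 * 4), ncol = 4 )
system.time( dista(xnew, x) )
system.time( as.matrix( dist( rbind(xnew, x) ) ) )

## clean up
x <- b <- a <- xnew <- NULL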

Rfast documentation built on Nov. 9, 2023, 5:06 p.m.