rdist: an R package for distances

Source: R/distance_functions.r



Description

rdist provides a common framework for calculating distances. There are three main functions:

  • rdist computes the pairwise distances between observations in one matrix and returns a dist object,

  • pdist computes the pairwise distances between observations in one matrix and returns a matrix, and

  • cdist computes the distances between observations in two matrices and returns a matrix.

The cdist function in particular is often missing from other distance implementations; for example, stats::dist only computes distances within a single matrix. All calculations involving NA values consistently return NA.
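To make the difference between the three return shapes concrete, here is a minimal base-R sketch for the Euclidean metric only. The helper cdist_sketch is hypothetical and not part of rdist: it fills an nrow(X) by nrow(Y) matrix with row-to-row distances, and the single-matrix pdist-style matrix and rdist-style dist object follow from it.

```r
# Hypothetical helper for illustration only; not rdist's implementation.
# Entry [i, j] is the Euclidean distance between row i of X and row j of Y.
cdist_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  out <- matrix(NA_real_, nrow(X), nrow(Y))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(nrow(Y))) {
      # sum() returns NA if either row contains NA, so NA propagates
      out[i, j] <- sqrt(sum((X[i, ] - Y[j, ])^2))
    }
  }
  out
}

X <- matrix(c(0, 0,
              3, 4,
              1, 1), ncol = 2, byrow = TRUE)
Y <- matrix(c(0, 0,
              6, 8), ncol = 2, byrow = TRUE)

D_xy <- cdist_sketch(X, Y)  # 3 x 2 matrix: rows of X against rows of Y
D_xx <- cdist_sketch(X, X)  # 3 x 3 full matrix: the shape pdist returns
d_xx <- as.dist(D_xx)       # lower triangle only: the shape rdist returns

# stats::dist agrees on the single-matrix case but has no two-matrix form
all.equal(as.vector(d_xx), as.vector(dist(X)))
```

Because sum() returns NA whenever one of its inputs is NA, a row containing NA yields NA distances in this sketch too, in line with the NA behaviour stated for rdist.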

Usage

rdist(X, metric = "euclidean", p = 2L)

pdist(X, metric = "euclidean", p = 2)

cdist(X, Y, metric = "euclidean", p = 2)

Arguments

X, Y

A matrix

metric

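The distance metric to use: one of the measure names listed under Details, or a function that defines a distance between two vectors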

p

The power of the Minkowski distance (only relevant when metric = "minkowski")
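As a rough illustration of what p controls, the sketch below transcribes the Minkowski formula from Details into a hypothetical R helper, minkowski(), which is not the package's code. With p = 1 it reduces to the Manhattan distance, with p = 2 to the Euclidean distance, and as p grows it approaches the maximum (Chebyshev) distance.

```r
# Hypothetical helper illustrating the role of p; not rdist's implementation.
minkowski <- function(v, w, p = 2) {
  sum(abs(v - w)^p)^(1 / p)
}

v <- c(1, 2, 3)
w <- c(4, 6, 3)  # absolute differences: 3, 4, 0

minkowski(v, w, p = 1)   # 7: same as the Manhattan distance
minkowski(v, w, p = 2)   # 5: same as the Euclidean distance
minkowski(v, w, p = 50)  # just above 4, the maximum distance
```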

Details

Available distance measures are (written for two vectors v and w):

  • "euclidean": \sqrt{\sum_i(v_i - w_i)^2}

  • "minkowski": (\sum_i|v_i - w_i|^p)^{1/p}

  • "manhattan": \sum_i(|v_i-w_i|)

  • "maximum" or "chebyshev": \max_i(|v_i-w_i|)

  • "canberra": \sum_i(\frac{|v_i-w_i|}{|v_i|+|w_i|})

  • "angular": \cos^{-1}(cor(v, w))

  • "correlation": \sqrt{\frac{1-cor(v, w)}{2}}

  • "absolute_correlation": \sqrt{1-|cor(v, w)|^2}

  • "hamming": (\sum_i 1_{v_i \neq w_i}) / \sum_i 1

  • "jaccard": (\sum_i 1_{v_i \neq w_i}) / \sum_i 1_{v_i \neq 0 \lor w_i \neq 0}

  • Any function that defines a distance between two vectors.
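The formulas above can be checked by hand with plain R functions of two vectors. The list below is a sketch that transcribes each formula directly; it assumes cor() with its default Pearson method for the correlation-based measures, and it is not how rdist itself computes them.

```r
# Direct transcriptions of the formulas above; a sketch, not rdist's code.
metrics <- list(
  euclidean   = function(v, w) sqrt(sum((v - w)^2)),
  manhattan   = function(v, w) sum(abs(v - w)),
  maximum     = function(v, w) max(abs(v - w)),
  # positions where v_i and w_i are both 0 give 0/0 = NaN in this sketch
  canberra    = function(v, w) sum(abs(v - w) / (abs(v) + abs(w))),
  # assumption: Pearson correlation (the default of stats::cor)
  angular     = function(v, w) acos(cor(v, w)),
  correlation = function(v, w) sqrt((1 - cor(v, w)) / 2),
  absolute_correlation = function(v, w) sqrt(1 - abs(cor(v, w))^2),
  # share of positions that differ
  hamming     = function(v, w) mean(v != w),
  # differing positions among those where v or w is non-zero
  jaccard     = function(v, w) sum(v != w) / sum(v != 0 | w != 0)
)

x <- c(1, 2, 3, 4)
y <- c(2, 4, 1, 3)  # cor(x, y) is 0 for this pair
real_valued <- c("euclidean", "manhattan", "maximum", "canberra",
                 "angular", "correlation", "absolute_correlation")
round(sapply(metrics[real_valued], function(f) f(x, y)), 4)

a <- c(1, 0, 1, 0, 0)
b <- c(1, 1, 0, 0, 0)
metrics$hamming(a, b)  # 0.4: 2 of 5 positions differ
metrics$jaccard(a, b)  # 2/3: 2 differing positions among 3 non-zero ones
```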


blasern/rdist documentation built on Aug. 29, 2023, 12:27 p.m.