disttools"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

disttools provides the functionality needed to rapidly and intuitively retrieve information from large 'dist' objects. This functionality is encoded in the get_dists function. The function's main use cases are outlined below.

Usage

After installing the package, it can be loaded by executing:

# Load the package.
library(disttools)

Below, some example data is randomly generated.

# Create some data to play with.
set.seed(123456789)
mat <- matrix(rnorm(10), ncol = 2)

A 'dist' object can be generated for these points by executing:

# Generate a 'dist' object.
mat_dists <- dist(mat)

Below, a set of pairs of points are specified for distance retrieval.

# Specify index pairs of interest.
indices <- matrix(c(1,2,3,2,4,4), ncol = 2, byrow = TRUE)

Example 1 - Matrix-based retrieval

The function get_dists can be used to access the distance between pairs of points. This can be accomplished via two methods. First, a matrix of index pairs can be passed to the function along with the 'dist' object itself.

# Retrieve distances using the matrix-based method.
get_dists(mat_dists, indices)

Example 2 - Vector-based retrieval

Second, two vectors corresponding to the columns of the index matrix can be passed to the function along with the 'dist' object.

# Create vectors i and j from the above data.
i <- indices[,1]
j <- indices[,2]

# Retrieve distances using the paired vectors method.
get_dists(mat_dists, i, j)

Example 3 - Distance retrieval for all combinations of a subset of points

Sometimes, the distances for all combinations of a set of points are desired. This information can be easily extracted by executing the following:

# Create a matrix of unique index pairs.
index_pairs <- combn(1:3, 2) # Generate the combinations
index_pairs <- t(index_pairs) # Transpose to put the data in tall format.

# Retrieve the distances as above.
get_dists(mat_dists, index_pairs)

Example 4 - Return both indices and distances

It is often desirable to create a matrix or data.frame composed of two columns that indicate the indices being compared and a third column giving the distances between those indices. For convenience, the argument return_indices can be set to TRUE. Doing so results in a three column matrix being returned. It can be converted into a data.frame using the function as.data.frame.

# Retrieve distances using the matrix-based method.
get_dists(mat_dists, indices, return_indices = TRUE)

Disclaimer

The views expressed are those of the author(s) and do not reflect the official policy of the Department of the Army, the Department of Defense or the U.S. Government.



Try the disttools package in your browser

Any scripts or data that you put into this service are public.

disttools documentation built on Feb. 5, 2022, 1:06 a.m.