get_distances: Get Distances

View source: R/distances.R

get_distances R Documentation

Get Distances

Description

Calculate distances between all pairs of cluster fill rates in a data frame using one or more distance measures. The available distance measures are absolute distance, Manhattan distance, Euclidean distance, maximum distance, and cosine distance.

Usage

get_distances(df, distance_measures)

Arguments

df

A dataframe of cluster fill rates created with get_cluster_fill_rates and an added column that contains a writer ID.

distance_measures

A vector of distance measures. Use 'abs' to calculate the absolute difference, 'man' for the Manhattan distance, 'euc' for the Euclidean distance, 'max' for the maximum absolute distance, and 'cos' for the cosine distance. The vector can be a single distance, or any combination of these five distance measures.

Details

The absolute distance between two n-length vectors of cluster fill rates, a and b, is a vector of the same length as a and b. It can be calculated as abs(a-b) where subtraction is performed element-wise, then the absolute value of each element is returned. More specifically, element i of the vector is |a_i - b_i| for i=1,2,...,n.

The Manhattan distance between two n-length vectors of cluster fill rates, a and b, is \sum_{i=1}^n |a_i - b_i|. In other words, it is the sum of the elements of the absolute distance vector.

The Euclidean distance between two n-length vectors of cluster fill rates, a and b, is \sqrt{\sum_{i=1}^n (a_i - b_i)^2}. In other words, it is the square root of the sum of the squared elements of the absolute distance vector.

The maximum distance between two n-length vectors of cluster fill rates, a and b, is \max_{1 \leq i \leq n}{\{|a_i - b_i|\}}. In other words, it is the largest element of the absolute distance vector.

The cosine distance between two n-length vectors of cluster fill rates, a and b, is \sum_{i=1}^n (a_i - b_i)^2 / (\sqrt{\sum_{i=1}^n a_i^2}\sqrt{\sum_{i=1}^n b_i^2}).
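
As a rough illustration, the five formulas above can be reproduced in base R for a single pair of vectors. The vectors a and b below are made-up values, not package data, and this sketch is not the package's internal implementation. The cosine line follows the formula exactly as written in this section, which differs from the textbook cosine distance (one minus the cosine similarity).

```r
# Two hypothetical cluster fill rate vectors of length n = 3
# (illustrative values only, not taken from the package data).
a <- c(0.5, 0.5, 0)
b <- c(0.25, 0.5, 0.25)

# Absolute distance: a vector with one element per cluster, |a_i - b_i|.
abs_dist <- abs(a - b)

# Manhattan distance: sum of the elements of the absolute distance vector.
man_dist <- sum(abs_dist)

# Euclidean distance: square root of the sum of squared differences.
euc_dist <- sqrt(sum((a - b)^2))

# Maximum distance: largest element of the absolute distance vector.
max_dist <- max(abs_dist)

# Cosine distance, following the formula as documented in this section.
cos_dist <- sum((a - b)^2) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
```

Only the absolute distance is a vector; the other four measures each reduce the pair of vectors to a single number.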

Value

A dataframe of distances

Examples


rates <- test[1:3, ]
# calculate maximum and Euclidean distances between the first 3 documents in test.
distances <- get_distances(df = rates, distance_measures = c("max", "euc"))

# calculate Manhattan distances between all documents in test.
distances <- get_distances(df = test, distance_measures = c("man"))


handwriterRF documentation built on April 4, 2025, 5:38 a.m.