textDist: Calculate Text Distance

View source: R/distance.R

textDist    R Documentation

Description

Given two vectors, textDist calculates the text distance between them. Text distance is the proportion of unmatched frequencies, i.e., the number of unmatched frequencies divided by the total number of frequencies in the two vectors. If both vectors are zero vectors, the distance is undefined by this ratio, so the value of the zeroes argument is returned instead; it is 0.5 by default. Given two matrices, the text distance between corresponding columns is calculated.

Usage

textDist(x, y, zeroes = 0.5)

Arguments

x

A numeric vector or matrix

y

A numeric vector or matrix of the same dimension as x

zeroes

Text distance when both vectors are zero vectors; default is .5

Value

When x and y are vectors, the text distance between them. For example, the vectors (1,2,0) and (0,1,1) contain a total of 5 frequencies (1+2+0 in the first, 0+1+1 in the second). At position 1, the values 1 and 0 have nothing in common, so 1 frequency is unmatched. At position 2, the values 2 and 1 match on 1 frequency, so 1 remains unmatched. At position 3, the values 0 and 1 have nothing in common, so 1 frequency is unmatched. That gives 3 unmatched frequencies out of 5, for a text distance of 3/5 = 0.6.

When x and y are matrices, a vector with the text distance between corresponding columns. For two 4x2 matrices, the result has two values: the first is the text distance between the first columns of the matrices, and the second is the text distance between their second columns.

For large data sets, it is recommended to pass matrices, as this is much more efficient than calling textDist once per column.
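The calculation described above can be sketched in a few lines of base R. This is an illustration only, not the phm source: the function name text_dist_sketch is hypothetical, and the formula (sum of absolute differences divided by the sum of all values) is an assumption inferred from the worked example, where it reproduces 3/5.

```r
# Hypothetical re-implementation for illustration; not the phm source code.
# Assumes: distance = sum(|x - y|) / sum(x + y), applied per column for matrices.
text_dist_sketch <- function(x, y, zeroes = 0.5) {
  if (is.matrix(x)) {
    stopifnot(is.matrix(y), all(dim(x) == dim(y)))
    total     <- colSums(x + y)        # total frequencies per column
    unmatched <- colSums(abs(x - y))   # unmatched frequencies per column
  } else {
    stopifnot(length(x) == length(y))
    total     <- sum(x + y)            # total frequencies in both vectors
    unmatched <- sum(abs(x - y))       # unmatched frequencies
  }
  d <- unmatched / total               # 0/0 yields NaN where both inputs are all zero
  d[total == 0] <- zeroes              # replace those cases with the zeroes value
  d
}

# The worked example: 3 unmatched out of 5 frequencies
text_dist_sketch(c(1, 2, 0), c(0, 1, 1))

# Two zero vectors fall back to the zeroes argument
text_dist_sketch(c(0, 0), c(0, 0))
```

Using colSums for the matrix case handles all columns in one vectorized pass, which is one plausible reason the matrix form is faster than a loop over columns.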

Examples

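# text distance between two vectors
textDist(c(1, 2, 0), c(0, 1, 1))
# two 4x2 matrices; the outer parentheses print each matrix as it is assigned
(M1 <- matrix(c(0, 1, 0, 2, 0, 10, 0, 14), 4))
(M2 <- matrix(c(12, 0, 8, 0, 1, 3, 1, 2), 4))
# text distance between corresponding columns of M1 and M2
textDist(M1, M2)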

phm documentation built on June 8, 2022, 1:05 a.m.