chardist: Get the distance matrix of string data

View source: R/chardist.R

Description

This function computes a distance matrix for the character values of a dataset, in the same way that the function dist does for numerical data.
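As a rough illustration of the idea, the sketch below builds this kind of matrix in base R by comparing every pair of rows. It is not the ULT implementation. The function name mismatch_dist is hypothetical, and the sketch assumes the input has no missing values and that distances are taken between rows.

```r
# Hypothetical sketch, not the ULT source: count the differing elements
# between every pair of rows and return the result as a "dist" object.
mismatch_dist <- function(x) {
  x <- as.matrix(x)               # a data frame becomes a matrix
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # number of positions where rows i and j hold different values
      d[i, j] <- d[j, i] <- sum(x[i, ] != x[j, ])
    }
  }
  stats::as.dist(d)               # keep the lower triangle, as dist() does
}

x <- rbind(r1 = c("a", "b", "c"),
           r2 = c("a", "x", "c"),
           r3 = c("z", "x", "y"))
mismatch_dist(x)
```

Here r1 and r2 differ in one position, r1 and r3 in three, and r2 and r3 in two.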

Usage

chardist(x, byrow = TRUE)

Arguments

x

A matrix or a data frame.

byrow

Logical. If TRUE (the default), distances are computed between rows; if FALSE, between columns.
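Computing distances between columns amounts to computing distances between the rows of the transposed data. The sketch below checks this with a hypothetical helper, pairwise_diff, written only for illustration; it reflects the behavior described here, not a reading of the ULT code.

```r
# Hypothetical helper (not part of ULT): number of differing elements
# between every pair of rows (byrow = TRUE) or columns (byrow = FALSE).
pairwise_diff <- function(x, byrow = TRUE) {
  x <- as.matrix(x)
  if (!byrow) x <- t(x)           # columns become rows
  idx <- seq_len(nrow(x))
  # outer() passes vectors of row indices; Vectorize() compares each pair
  d <- outer(idx, idx, Vectorize(function(i, j) sum(x[i, ] != x[j, ])))
  dimnames(d) <- list(rownames(x), rownames(x))
  d
}

m <- matrix(c("a", "b", "c",
              "a", "b", "z"), nrow = 3,
            dimnames = list(NULL, c("V1", "V2")))
pairwise_diff(m, byrow = FALSE)   # V1 and V2 differ in one element
```

Transposing first keeps a single code path for both settings of byrow.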

Details

For each pair of rows (or of columns when byrow=FALSE), the function counts the number of positions at which the elements differ, i.e. a Hamming distance. It also works with non-character data, but it does not take into account how far apart two values are: it only counts how many elements differ. For example, the pairs (1, 2) and (1, 100) each add 1 to the distance.
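The caveat about non-character data can be seen on a small numeric matrix. The mismatch count below is computed directly in base R rather than with chardist, on the assumption that chardist counts differences as described above; the Euclidean distances come from stats::dist for comparison.

```r
x <- rbind(p = c(1, 5),
           q = c(2, 5),
           r = c(100, 5))

# Number of differing elements between row p and each other row
mismatches <- c(q = sum(x["p", ] != x["q", ]),
                r = sum(x["p", ] != x["r", ]))

# Euclidean distances between the same rows, from stats::dist
euclid <- as.matrix(dist(x))["p", c("q", "r")]

mismatches  # q = 1, r = 1: both rows differ from p in one element
euclid      # q = 1, r = 99: dist() reflects how far apart the values are
```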

Value

An object of class "dist".
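Because the result is a standard "dist" object, it can be passed to functions that accept one, such as as.matrix or hclust from the stats package. The sketch below builds a "dist" object by hand from illustrative mismatch counts (the labels and values are made up, not output of chardist) and clusters it.

```r
# Illustrative mismatch counts between three sequences (made-up values)
counts <- matrix(c(0, 1, 3,
                   1, 0, 2,
                   3, 2, 0), nrow = 3,
                 dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3")))
d <- as.dist(counts)                 # stores the lower triangle only

class(d)                             # "dist"
attr(d, "Size")                      # 3
as.matrix(d)                         # back to a full symmetric matrix
hc <- hclust(d, method = "average")  # e.g. average-linkage clustering
hc$height
```

With average linkage, s1 and s2 merge first at height 1, then s3 joins at (3 + 2) / 2 = 2.5.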

Examples

set.seed(1)
A <- sample(letters[1:10], 20, replace = TRUE) # 20 random letters between 'a' and 'j'
A
B <- replace(A, c(3, 7, 9), "k") # Replace three values by 'k', so A and B should differ in 3 positions
length(which(A != B)) # 3, as expected
C <- A # Identical to A: 0 differences with A (and 3 with B)
length(which(A != C)) # 0, as expected
D <- replace(A, c(1:4, 6:8, 11:19), letters[11:26]) # Replace 16 values of A by the letters 'k' to 'z'
length(which(A != D)) # 16, as expected
M<-matrix(c(A,B,C,D),nrow=20,ncol=4)
colnames(M)<-c("A","B","C","D")
DF<-data.frame(A,B,C,D)
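t(M) # The four vectors as rows
chardist(t(M)) # Distances between rows
chardist(M, byrow = FALSE) # The same distances, computed between columns
chardist(DF, byrow = FALSE) # A data frame works as well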
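Since every replacement letter ('k' to 'z') lies outside the range of the sampled letters ('a' to 'j'), the expected distances in this example do not depend on the random draw. A base-R cross-check, independent of chardist, could look like the following; it assumes chardist reports these same counts.

```r
set.seed(1)
A <- sample(letters[1:10], 20, replace = TRUE)
B <- replace(A, c(3, 7, 9), "k")
C <- A
D <- replace(A, c(1:4, 6:8, 11:19), letters[11:26])
M <- cbind(A, B, C, D)   # columns named "A", "B", "C", "D"

# Count the differing elements between every pair of columns of M
expected <- outer(colnames(M), colnames(M),
                  Vectorize(function(i, j) sum(M[, i] != M[, j])))
dimnames(expected) <- list(colnames(M), colnames(M))
expected
```

A-B and B-C should be 3, A-C 0, A-D and C-D 16, and B-D 17 (the 16 positions changed in D plus position 9, where only B was changed).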


jacobmaugoust/ULT documentation built on May 16, 2023, 1:29 p.m.