| dist.minmax | R Documentation |
Function for computing the minmax distance measure between two (or more) vectors. Some scholars (Kestemont et al., 2016) claim that it works well when applied to authorship attribution problems.
dist.minmax(x)
x | a matrix or data table containing at least 2 rows and 2 columns, with the samples (texts) to be compared in rows and the variables in columns. |
The function returns an object of the class dist, containing distances
between each pair of samples. To convert it to a square matrix instead,
use the generic function as.matrix.
Maciej Eder
Kestemont, M., Stover, J., Koppel, M., Karsdorp, F. and Daelemans, W. (2016). Authenticating the writings of Julius Caesar. Expert Systems With Applications, 63: 86-96.
stylo, classify, dist, as.dist, dist.cosine
# first, preparing a table of word frequencies
Iuvenalis_1 = c(3.939, 0.635, 1.143, 0.762, 0.423)
Iuvenalis_2 = c(3.733, 0.822, 1.066, 0.933, 0.511)
Tibullus_1 = c(2.835, 1.302, 0.804, 0.862, 0.881)
Tibullus_2 = c(2.911, 0.436, 0.400, 0.946, 0.618)
Tibullus_3 = c(1.893, 1.082, 0.991, 0.879, 1.487)
dataset = rbind(Iuvenalis_1, Iuvenalis_2, Tibullus_1, Tibullus_2,
Tibullus_3)
colnames(dataset) = c("et", "non", "in", "est", "nec")
# the table of frequencies looks as follows
print(dataset)
# then, applying a distance, in two flavors
dist.minmax(dataset)
as.matrix(dist.minmax(dataset))
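
For readers who want to see what such a measure involves, the following is a minimal sketch in base R, assuming the minmax (Ruzicka) form described by Kestemont et al. (2016): one minus the sum of element-wise minima divided by the sum of element-wise maxima. The helpers minmax_pair and minmax_sketch are hypothetical names introduced here for illustration only; they are not part of stylo, and the actual internals of dist.minmax may differ.

```r
# Hypothetical helper (not part of stylo): minmax (Ruzicka) distance
# between two non-negative numeric vectors of equal length.
minmax_pair = function(a, b) {
    1 - sum(pmin(a, b)) / sum(pmax(a, b))
}

# Hypothetical helper: apply minmax_pair to every pair of rows
# and return the result as an object of the class dist.
minmax_sketch = function(x) {
    x = as.matrix(x)
    n = nrow(x)
    d = matrix(0, nrow = n, ncol = n,
               dimnames = list(rownames(x), rownames(x)))
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            d[i, j] = minmax_pair(x[i, ], x[j, ])
        }
    }
    as.dist(d)
}

# two of the samples from the example above
Iuvenalis_1 = c(3.939, 0.635, 1.143, 0.762, 0.423)
Iuvenalis_2 = c(3.733, 0.822, 1.066, 0.933, 0.511)

# distance for a single pair of vectors
minmax_pair(Iuvenalis_1, Iuvenalis_2)

# distances for a table of samples, as a dist object and as a square matrix
minmax_sketch(rbind(Iuvenalis_1, Iuvenalis_2))
as.matrix(minmax_sketch(rbind(Iuvenalis_1, Iuvenalis_2)))
```

Under this definition, identical vectors yield a distance of 0, and the value grows towards 1 as the overlap between the two frequency profiles shrinks. The sketch assumes non-negative values, as is the case for word frequencies.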