most_similar: Select the most similar texts to a specific text
In idiolect: Forensic Authorship Analysis

most_similar

R Documentation

Description

Selects, from a pool of candidate texts, the n texts most similar to a given sample text, as measured by a chosen similarity coefficient.

Usage

most_similar(sample, pool, coefficient, n)

Arguments

`sample`	A single row of a `quanteda` dfm representing the sample to match.
`pool`	A dfm containing all candidate samples from which the top n are selected.
`coefficient`	The similarity coefficient to use: one of "minmax", "cosine", or "Phi".
`n`	The number of rows to extract from the pool of candidate samples.
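
As a rough illustration of what two of the `coefficient` options measure, the sketch below computes the textbook minmax (Ruzicka) and cosine similarities by hand in base R. The relative-frequency vectors are built manually from two short sentences, and the helper names `minmax_sim` and `cosine_sim` are hypothetical, not part of idiolect. Whether `most_similar` computes its coefficients in exactly this way is an assumption, and "Phi" is not sketched here.

```r
# Relative frequencies over a shared vocabulary (hand-built illustrative data).
vocab <- c("the", "cat", "sat", "on", "mat", "dog", "chair")
x <- setNames(c(2, 1, 1, 1, 1, 0, 0) / 6, vocab)  # "The cat sat on the mat"
y <- setNames(c(2, 0, 1, 1, 0, 1, 1) / 6, vocab)  # "The dog sat on the chair"

# Hypothetical helpers (assumed textbook definitions, not idiolect internals).
# Minmax: sum of feature-wise minima divided by sum of feature-wise maxima.
minmax_sim <- function(a, b) sum(pmin(a, b)) / sum(pmax(a, b))
# Cosine: dot product divided by the product of the vector norms.
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

minmax_sim(x, y)  # 0.5
cosine_sim(x, y)  # 0.75
```

Both measures equal 1 for identical frequency profiles and fall towards 0 as the profiles share fewer features.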

Value

The function returns a dfm containing the n rows of `pool` that are most similar to `sample`, as measured by the chosen `coefficient`.
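
The selection step can be pictured as scoring every row of the pool against the sample and keeping the n highest-scoring rows. The following is a minimal base R sketch of that idea on a plain numeric matrix rather than a dfm. The helper `top_n_rows` and the toy data are hypothetical, and capping n at the pool size is a choice made for this sketch, not documented behaviour of `most_similar`.

```r
# Hypothetical helper (not part of idiolect): rank pool rows by similarity
# to a sample vector and return the n best-matching rows.
top_n_rows <- function(sample_vec, pool_mat, n, sim) {
  # One similarity score per pool row.
  scores <- apply(pool_mat, 1, function(row) sim(sample_vec, row))
  # Indices of the highest scores, capped at the number of available rows.
  keep <- order(scores, decreasing = TRUE)[seq_len(min(n, nrow(pool_mat)))]
  pool_mat[keep, , drop = FALSE]
}

# Assumed textbook minmax similarity.
minmax <- function(a, b) sum(pmin(a, b)) / sum(pmax(a, b))

# Toy relative-frequency profiles over three features.
sample_vec <- c(0.5, 0.3, 0.2)
pool_mat <- rbind(
  near = c(0.5, 0.2, 0.3),
  far  = c(0.0, 0.1, 0.9),
  mid  = c(0.3, 0.3, 0.4)
)

result <- top_n_rows(sample_vec, pool_mat, n = 2, sim = minmax)
rownames(result)  # "near" "mid"
```

Passing the similarity function as an argument mirrors the role of `coefficient`: the ranking logic stays the same while the measure changes.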

Examples

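library(idiolect)

text1 <- "The cat sat on the mat"
text2 <- "The dog sat on the chair"
text3 <- "Violence is the last refuge of the incompetent"
# Build a corpus with one document per text
corp <- quanteda::corpus(c(text1, text2, text3))
# Tokenise and convert to a dfm of relative frequencies (one row per text)
d <- quanteda::tokens(corp) |> quanteda::dfm() |> quanteda::dfm_weight(scheme = "prop")
# Match the first text against the other two and keep the single best match
most_similar(d[1, ], d[-1, ], coefficient = "minmax", n = 1)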

idiolect documentation built on Sept. 11, 2024, 5:34 p.m.