View source: R/entity-resolution.R
calc_similarity | R Documentation |
This function computes similarities between entries in a document-feature matrix (as created by dfm())
and returns a data frame with distinct id combinations.
It relies heavily on the quanteda package.
calc_similarity(data, method, min_sim)
data |
the data as a document-feature matrix (dfm) |
method |
character; the method identifying the similarity or distance measure to be used, see |
min_sim |
numeric; a similarity threshold. Pairs whose similarity falls below this value are not returned; 0.75-0.8 seems reasonable |
A data frame containing the two ids and their similarity value.
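To illustrate the shape of the result described above, here is a minimal base R sketch. It is not the package's implementation: it uses a plain count matrix as a stand-in for a dfm, assumes cosine similarity as the method, and the helper `pairwise_cosine()` and its column names (`id1`, `id2`, `similarity`) are hypothetical.

```r
# Hypothetical stand-in for a dfm: rows are documents (ids), columns are features.
counts <- matrix(
  c(2, 1, 0,
    2, 1, 0,
    0, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("acme", "corp", "ltd"))
)

# Illustrative helper (an assumption, not the package's code):
# cosine similarity between rows, one row per unordered id pair,
# keeping only pairs with similarity >= min_sim.
pairwise_cosine <- function(m, min_sim = 0.75) {
  norms <- sqrt(rowSums(m^2))
  sim <- (m %*% t(m)) / (norms %o% norms)
  # Upper triangle only, so each id combination appears once
  idx <- which(upper.tri(sim), arr.ind = TRUE)
  out <- data.frame(
    id1 = rownames(m)[idx[, "row"]],
    id2 = rownames(m)[idx[, "col"]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
  out[out$similarity >= min_sim, , drop = FALSE]
}

result <- pairwise_cosine(counts, min_sim = 0.75)
print(result)
```

With a real dfm, the call would presumably look like `calc_similarity(data, method = "cosine", min_sim = 0.75)`, where `"cosine"` is an assumed value for `method`.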