calc_similarity: Calculate Similarities

View source: R/entity-resolution.R

calc_similarity    R Documentation

Calculate Similarities

Description

This function computes similarities between the documents of a document-feature matrix (a quanteda dfm()) and returns a data frame with one row per distinct id combination. It relies heavily on the quanteda package.

Usage

calc_similarity(data, method, min_sim)

Arguments

data

a document-feature matrix, as created by dfm(), with doc_id set for each document

method

character; the method identifying the similarity or distance measure to be used, see ?quanteda::textstat_simil

min_sim

numeric; the minimum similarity to report. Pairs with a similarity below this threshold are not returned; 0.75-0.8 seems reasonable

Value

A data frame containing the two ids and their similarity value.
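
To illustrate the shape of the returned value, here is a minimal base R sketch of the same idea: pairwise cosine similarity between the rows of a small document-feature matrix, reduced to distinct id pairs at or above a threshold. It is not the kabrutils implementation, which delegates to quanteda. The helper pairwise_cosine(), the toy matrix, and the column names id_1, id_2 and similarity are all assumptions made for this example.

```r
# Toy document-feature matrix: rows are documents (ids), columns are features.
# The ids p1..p3 and the feature names are invented for this sketch.
m <- matrix(
  c(2, 1, 0, 0,
    2, 1, 0, 1,
    0, 0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("anna", "meier", "bonn", "gmbh"))
)

# Hypothetical helper (not part of kabrutils): cosine similarity between all
# rows, keeping each unordered pair once and dropping pairs below min_sim.
pairwise_cosine <- function(m, min_sim = 0.75) {
  norms <- sqrt(rowSums(m^2))
  sim <- (m %*% t(m)) / (norms %o% norms)

  # upper.tri() excludes the diagonal and the mirrored lower triangle,
  # so every id combination appears at most once.
  idx <- which(upper.tri(sim) & sim >= min_sim, arr.ind = TRUE)

  data.frame(
    id_1 = rownames(sim)[idx[, "row"]],
    id_2 = rownames(sim)[idx[, "col"]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
}

result <- pairwise_cosine(m, min_sim = 0.75)
print(result)
```

With this toy input only the pair p1/p2 passes the 0.75 threshold, so the result has a single row. Lowering min_sim returns more candidate pairs at the cost of more false matches.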


cutterkom/kabrutils documentation built on July 3, 2022, 4:04 p.m.