create_fingerprint: Create Fingerprint of a string

View source: R/clean-text.R

create_fingerprint    R Documentation

Create Fingerprint of a string

Description

This function creates a fingerprint of a string. The fingerprint can be used for de-duplication or as a basis for calculating string similarity or string distance. It is based on normalised tokens and implements one of OpenRefine's clustering methods, specifically Fingerprint Key Collision. See https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth
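
To make the idea concrete, the following base R sketch follows the key collision steps as described in the OpenRefine documentation: trim and lowercase the input, strip punctuation and control characters, split into tokens, de-duplicate and sort the tokens, then join them. The name sketch_fingerprint is hypothetical and this is not the package's implementation; create_fingerprint may differ in details. One simplifying assumption: the conversion of accented characters to ASCII that OpenRefine performs is omitted here.

```r
# Hypothetical illustration only; NOT the code behind create_fingerprint().
# Assumption: no ASCII transliteration of accented characters is performed.
sketch_fingerprint <- function(string, tokens = c("word", "ngram"), n = NULL) {
  tokens <- match.arg(tokens)

  # Normalise: trim, lowercase, drop punctuation and control characters
  x <- tolower(trimws(string))
  x <- gsub("[[:punct:][:cntrl:]]", "", x)

  if (tokens == "word") {
    # Whitespace-separated tokens
    parts <- strsplit(x, "[[:space:]]+")[[1]]
    parts <- parts[nzchar(parts)]
    sep <- " "
  } else {
    if (is.null(n)) stop("`n` must be provided when tokens = \"ngram\"")
    # Character n-grams (shingles) over the string without whitespace
    x <- gsub("[[:space:]]", "", x)
    if (nchar(x) < n) {
      parts <- x
    } else {
      starts <- seq_len(nchar(x) - n + 1)
      parts <- substring(x, starts, starts + n - 1)
    }
    sep <- ""
  }

  # De-duplicate, sort in a locale-independent order, and join
  paste(sort(unique(parts), method = "radix"), collapse = sep)
}

# Two spellings of the same name collapse to the same key
sketch_fingerprint("Max Spohr Verlag") == sketch_fingerprint("Verlag, Max  Spohr")
```

Because token order, case, punctuation and repeated tokens no longer matter, strings that share a key are candidates for being duplicates of each other.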

Usage

create_fingerprint(string, tokens = "word", n = NULL)

Arguments

string

The input string to fingerprint.

tokens

How to generate tokens: "word" for whitespace-separated tokens, "ngram" for character n-grams (shingles).

n

The number of characters in each shingle. If tokens = "ngram", n must be provided.

Value

A character string containing the fingerprint.

Examples

# Word-based fingerprint
create_fingerprint("Max Spohr Verlag", tokens = "word")
# Fingerprint built from character 2-grams
create_fingerprint("Max Spohr Verlag", tokens = "ngram", n = 2)

cutterkom/kabrutils documentation built on July 3, 2022, 4:04 p.m.