fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings

Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <https://joeornstein.github.io/publications/fuzzylink.pdf>.
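The lexical-similarity claim above can be checked with a small sketch. It uses base R's utils::adist() (generalized Levenshtein edit distance) as a stand-in for a lexical string distance metric. This is an assumption made for illustration: adist() is not part of fuzzylink, and the variable names below are hypothetical.

```r
# Illustration only: base R edit distance, not a fuzzylink function.
target <- "Patricia"
candidates <- c("Patrick", "Trish")

# utils::adist() returns a matrix of generalized Levenshtein distances;
# drop() flattens the 1 x 2 result into a plain vector.
d <- drop(utils::adist(target, candidates))
names(d) <- candidates

print(d)

# The candidate with the smallest edit distance is the lexical "best match".
lexical_best <- names(d)[which.min(d)]
print(lexical_best)
```

Under this metric, "Patrick" comes out closer to "Patricia" than "Trish" does, even though "Trish" is the nickname. This is the kind of case where a purely lexical metric is a poor guide to match quality.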

Package details

Author: Joe Ornstein [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-5704-2098>)
Maintainer: Joe Ornstein <jornstein@uga.edu>
License: MIT + file LICENSE
Version: 0.2.4
URL: https://github.com/joeornstein/fuzzylink
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("fuzzylink")

fuzzylink documentation built on Aug. 18, 2025, 5:29 p.m.