fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings

Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <doi:10.1017/pan.2025.10016>.
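The "Patricia" example can be checked with a minimal base R sketch. It uses `adist()` from the `utils` package, which is not part of fuzzylink, and it illustrates only the weakness of lexical distance, not the package's embedding-based method. The variable names are illustrative.

```r
# Names taken from the example in the package description.
full_name  <- "Patricia"
candidates <- c("Patrick", "Trish")

# utils::adist() computes the generalized Levenshtein (edit) distance
# between strings; smaller values mean more lexically similar.
d <- drop(adist(full_name, candidates))
names(d) <- candidates
print(d)

# The lexically closest candidate is the wrong match for a nickname lookup.
closest <- candidates[which.min(d)]
print(closest)
```

Edit distance ranks "Patrick" ahead of "Trish" even though "Trish" is the common nickname for "Patricia", which is the kind of case the description says pretrained embeddings handle better.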

Package details

Author: Joe Ornstein [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-5704-2098>)
Maintainer: Joe Ornstein <jornstein@uga.edu>
License: MIT + file LICENSE
Version: 0.4.1
URL: https://joeornstein.github.io/software/fuzzylink/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("fuzzylink")

fuzzylink documentation built on Feb. 24, 2026, 1:06 a.m.