blocking: Various Blocking Methods for Entity Resolution

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via the 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and integrates with the 'reclin2' package. The package generates shingles from character strings, as well as similarity vectors for record comparison. It also includes metrics for evaluating blocking performance, such as estimates of the false positive rate (FPR) and false negative rate (FNR). For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
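As a rough illustration of what shingling provides, the base R sketch below splits strings into character bigrams and compares two spellings of a name with the Jaccard coefficient. The bigram size (n = 2) is an assumption, not necessarily the package default, and the helpers `make_shingles()` and `jaccard()` are hypothetical: they are not part of the blocking API. Per the description above, the package itself searches such representations with ANN back ends rather than scoring every pair as done here.

```r
# Hypothetical helper (not the blocking API): lower-cased character n-gram
# shingles of a single string, with whitespace removed first.
make_shingles <- function(s, n = 2) {
  s <- tolower(gsub("[[:space:]]+", "", s))
  if (nchar(s) < n) return(s)
  unique(substring(s, 1:(nchar(s) - n + 1), n:nchar(s)))
}

# Hypothetical helper: Jaccard similarity of two shingle sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

x <- make_shingles("Jan Kowalski")
y <- make_shingles("Jan Kowalsky")
jaccard(x, y)  # [1] 0.8181818
```

The two spellings share 9 of their 11 distinct bigrams, so a typo changes only a small part of the shingle set. This is why near-duplicate records tend to end up close to each other and can be placed in the same candidate block.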

Package details

Author: Maciej Beręsewicz [aut, cre] (ORCID: <https://orcid.org/0000-0002-8281-4301>), Adam Struzik [aut, ctr]
Maintainer: Maciej Beręsewicz <maciej.beresewicz@ue.poznan.pl>
License: GPL-3
Version: 1.0.1
URL: https://github.com/ncn-foreigners/blocking https://ncn-foreigners.ue.poznan.pl/blocking/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("blocking")

blocking documentation built on June 18, 2025, 9:16 a.m.