zoomerjoin: Superlatively Fast Fuzzy Joins

Empowers users to fuzzily-merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857>, and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid having to compare every pair of records in each dataset, resulting in fuzzy-merges that finish in linear time.
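
To illustrate the idea behind the description above, here is a minimal base-R sketch of MinHash with banding, the locality sensitive hashing scheme for Jaccard similarity associated with Broder's work. It is not zoomerjoin's implementation (the package does this work in Rust). The helper names (shingles, signature, band_keys), the toy name vectors, the explicit random permutations used as hash functions, and the parameter values (n_bands, band_width, the 0.6 threshold) are all assumptions made for illustration.

```r
# Illustrative MinHash + banding sketch in base R; NOT zoomerjoin's internals.
set.seed(1)

# Split a string into its set of character n-grams (shingles).
shingles <- function(x, n = 2) {
  x <- tolower(x)
  if (nchar(x) < n) return(x)
  unique(substring(x, 1:(nchar(x) - n + 1), n:nchar(x)))
}

# Hypothetical toy data: two small vectors of names to be fuzzily matched.
a <- c("Beniamino Green", "Etienne Bacher", "Andrei Broder")
b <- c("Beniamino Greene", "Etiene Bacher", "Piotr Indyk")

sets_a <- lapply(a, shingles)
sets_b <- lapply(b, shingles)
vocab  <- unique(unlist(c(sets_a, sets_b)))

n_bands    <- 20
band_width <- 2
n_hashes   <- n_bands * band_width

# Each "hash function" is a random permutation of the shingle vocabulary
# (one column per permutation). Real implementations use cheap hash functions.
perms <- replicate(n_hashes, sample(length(vocab)))

# MinHash signature: for each permutation, the smallest rank among the set's shingles.
signature <- function(s) {
  idx <- match(s, vocab)
  apply(perms[idx, , drop = FALSE], 2, min)
}

# Cut the signature into bands and turn each band into a bucket key.
band_keys <- function(sig) {
  bands <- split(sig, rep(seq_len(n_bands), each = band_width))
  paste(seq_len(n_bands), vapply(bands, paste, character(1), collapse = "-"), sep = ":")
}

keys_a <- lapply(sets_a, function(s) band_keys(signature(s)))
keys_b <- lapply(sets_b, function(s) band_keys(signature(s)))

# Only records that share at least one bucket key become candidate pairs,
# so most of the length(a) * length(b) comparisons are never made.
long_a <- data.frame(i = rep(seq_along(a), lengths(keys_a)), key = unlist(keys_a))
long_b <- data.frame(j = rep(seq_along(b), lengths(keys_b)), key = unlist(keys_b))
candidates <- unique(merge(long_a, long_b, by = "key")[, c("i", "j")])

# Verify each candidate with the exact Jaccard similarity and keep close pairs.
jaccard <- function(s, t) length(intersect(s, t)) / length(union(s, t))
candidates$similarity <- vapply(
  seq_len(nrow(candidates)),
  function(k) jaccard(sets_a[[candidates$i[k]]], sets_b[[candidates$j[k]]]),
  numeric(1)
)
matches <- candidates[candidates$similarity >= 0.6, ]
print(data.frame(a = a[matches$i], b = b[matches$j], similarity = matches$similarity))
```

More bands or narrower bands make it likelier that similar records share a bucket, at the cost of more candidate pairs to verify. The exact-similarity check at the end discards any dissimilar pairs that happened to collide.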

Package details

Author: Beniamino Green [aut, cre, cph], Etienne Bacher [ctb] (<https://orcid.org/0000-0002-9271-5075>), The authors of the dependency Rust crates [ctb, cph] (see inst/AUTHORS file for details)
Maintainer: Beniamino Green <beniamino.green@yale.edu>
License: GPL (>= 3)
Version: 0.2.1
URL: https://beniamino.org/zoomerjoin/ https://github.com/beniaminogreen/zoomerjoin
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("zoomerjoin")

zoomerjoin documentation built on April 13, 2025, 9:08 a.m.