seunglee98/fedmatch: Record linkage functions in R

Functions for merging two unlinked datasets. The central function of this package is "merge_plus", which extends base R merge functionality to include fuzzy string matching, match scoring based on the similarity of common variables between the two datasets, filtering based on a calculated match score or a user-supplied function, match evaluation (see match_evaluate), and safe merge checks. Other functions include:

- match_evaluate, which produces standard matching statistics, including percent matched and duplicate ratios;
- tier_match, a wrapper for merge_plus that lets you match two datasets in sequential tiers with gradually looser parameters;
- calculate_weights, which estimates how well a common variable identifies a match or a non-match, based on the record linkage literature;
- clean_strings, a general string cleaning function optimized for company names.

See "match_template.R" in the "examples" folder for a self-contained tutorial on the functionality of this package and a template for your own matching program.
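The matching steps described above can be illustrated in base R. This is a minimal sketch, not fedmatch's code, and it does not call the package's API. The helpers `clean_names()`, `fuzzy_match()`, `match_stats()`, and `fs_weights()` are hypothetical and written only for illustration. The sketch rests on four assumptions:

- Similarity is normalized Levenshtein similarity computed with `utils::adist()`.
- The duplicate ratio is defined, hypothetically, as matched rows per unique matched right-hand record.
- Weights follow the standard Fellegi-Sunter agreement and disagreement formulas, log2(m/u) and log2((1-m)/(1-u)).
- The weights calculate_weights actually estimates may differ from these.

```r
# Illustrative base-R sketch of steps that merge_plus combines.
# NOT fedmatch's implementation; every helper name here is hypothetical.

# Hypothetical company-name cleaner (illustrates the idea behind clean_strings).
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)  # punctuation -> space
  # drop common legal suffixes when they appear as whole words
  x <- gsub("\\b(inc|corp|corporation|co|llc|ltd)\\b", " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))      # collapse and trim whitespace
}

# Fuzzy match: for each left-hand name, take the right-hand candidate with the
# highest normalized Levenshtein similarity, kept only if it reaches `threshold`.
fuzzy_match <- function(left, right, threshold = 0.8) {
  a <- clean_names(left)
  b <- clean_names(right)
  d <- utils::adist(a, b)                    # length(a) x length(b) edit distances
  maxlen <- outer(nchar(a), nchar(b), pmax)  # longer string length for each pair
  sim <- 1 - d / pmax(maxlen, 1)             # 1 means identical after cleaning
  best <- apply(sim, 1, which.max)           # column index of best candidate per row
  score <- sim[cbind(seq_along(a), best)]
  keep <- score >= threshold
  data.frame(left = left[keep], right = right[best[keep]], score = score[keep],
             stringsAsFactors = FALSE)
}

# Hypothetical summary in the spirit of match_evaluate: percent of left-hand
# records matched, and matched rows per unique right-hand record (1 = no duplicates).
match_stats <- function(matches, n_left) {
  c(pct_matched = 100 * length(unique(matches$left)) / n_left,
    dup_ratio   = nrow(matches) / max(1, length(unique(matches$right))))
}

# Standard Fellegi-Sunter weights for one comparison variable, where
# m = P(values agree | true match) and u = P(values agree | non-match).
fs_weights <- function(m, u) {
  c(agree = log2(m / u), disagree = log2((1 - m) / (1 - u)))
}

left  <- c("Acme Inc.", "Globex Corporation", "Initech LLC", "Umbrella Co")
right <- c("ACME", "Globex Corp", "Initrode", "Hooli")

matches <- fuzzy_match(left, right, threshold = 0.8)
print(matches)
print(match_stats(matches, n_left = length(left)))
print(fs_weights(m = 0.9, u = 0.1))
```

With these example names, "Acme Inc." and "Globex Corporation" match exactly once legal suffixes are stripped. "Initech LLC" and "Umbrella Co" fall below the 0.8 threshold. Lowering `threshold` loosens the match, which is the idea behind running matches in sequential tiers with gradually looser parameters.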


Package details

Package repository: GitHub (seunglee98/fedmatch)
Installation: Install the latest version of this package by entering the following in R:
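The sketch below assumes the package is installed from its GitHub repository, seunglee98/fedmatch, using the devtools package. It is a typical sequence for GitHub-hosted R packages, not necessarily the exact command the page originally showed.

```r
# Assumption: install from GitHub via devtools (installed first if missing).
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("seunglee98/fedmatch")
library(fedmatch)
```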
seunglee98/fedmatch documentation built on June 26, 2019, 11:56 a.m.