pm_fuzzy_match: Get fuzzy party name matches

View source: R/pm_fuzzy_match.R

pm_fuzzy_match    R Documentation

Get fuzzy party name matches

Description

This helper function matches party names in new data to a "meta dataset" of other party names using a string distance metric. Its output is a dataframe that can be checked and manually adjusted.

pm_fuzzy_match is a wrapper around fuzzyjoin::stringdist_left_join, which in turn relies on stringdist::stringdist to compute the distances.
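
To make the wrapper relationship concrete, the following is a minimal sketch of what a direct call to fuzzyjoin::stringdist_left_join looks like. It is not the package's actual implementation: the assumption that threshold corresponds to max_dist, and the column name "distance", are chosen here for illustration only.

```r
library(fuzzyjoin)

# Toy data (hypothetical, for illustration only)
survey <- data.frame(party_names = c("Big party", "Loser party"))
meta   <- data.frame(name_party = c("big parties", "losers"))

# Left join on string distance: every row of `survey` is kept,
# and rows without a match within max_dist get NA on the `meta` side.
matches <- stringdist_left_join(
  survey, meta,
  by = c("party_names" = "name_party"),
  method = "osa",
  max_dist = 5,               # assumed counterpart of `threshold`
  distance_col = "distance"   # illustrative column name
)

matches
```

Because this is a left join, unmatched survey names stay in the result, which makes it easy to see which names still need manual attention.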

Usage

pm_fuzzy_match(survey_data, meta_data, by, method = "osa", threshold = 5, ...)

Arguments

survey_data

A dataframe or similar object with unique party names in survey or poll data

meta_data

A dataframe or similar object with unique party names in a reference party-level dataset (e.g., ParlGov)

by

Expression that defines the variables to be matched in the two datasets, e.g. c("name_party_survey" = "name_party_meta")

method

One of the stringdist matching methods ("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex")

threshold

Maximum distance between two names for a match to be kept

...

Additional parameters to pass to fuzzyjoin::stringdist_left_join and stringdist::stringdist
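
The method and threshold arguments interact, because the distance methods do not share a common scale. The sketch below calls stringdist::stringdist directly on one pair of names to illustrate this; the variable names are illustrative and nothing here is taken from the package's own code.

```r
library(stringdist)

a <- "Big party"
b <- "big parties"

# Edit-based methods count operations, so values grow with string length
d_osa <- stringdist(a, b, method = "osa")

# Jaro-Winkler is bounded between 0 and 1
d_jw <- stringdist(a, b, method = "jw")

# Hamming is only defined for strings of equal length, otherwise Inf
d_hamming <- stringdist(a, b, method = "hamming")

c(osa = d_osa, jw = d_jw, hamming = d_hamming)
```

With a bounded method such as "jw", "cosine" or "jaccard", a threshold of 5 would accept every pair, so a value below 1 is likely more appropriate in that case.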

Value

A dataframe of matched strings that respect the distance threshold.

Examples

# Load package
library(partymakeR)

# Create example datasets
dat_survey <- data.frame(
  party_names = c("Big party", "Nationals' assembly", "Loser party"),
  party_id = 1:3)
dat_meta <- data.frame(
  name_party = c("big parties", "Nationalist party", "losers"),
  id_party = letters[1:3])

# Compute match (1 result)
pm_fuzzy_match(survey_data = dat_survey, meta_data = dat_meta,
               c("party_names" = "name_party"), threshold = 5)

# Compute match (6 results)
pm_fuzzy_match(survey_data = dat_survey, meta_data = dat_meta,
               c("party_names" = "name_party"), threshold = 12)

RobertoValli/partymakeR documentation built on June 15, 2022, 2:12 p.m.