View source: R/pm_fuzzy_match.R
pm_fuzzy_match | R Documentation |
This function is a helper for matching party names in new data to a "meta dataset" of other party names using a string distance metric. Its output is a dataframe that can be checked and manually adjusted.
pm_fuzzy_match is essentially a wrapper around stringdist_left_join (from the fuzzyjoin package), which is itself based on stringdist (from the stringdist package).
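To make the wrapper idea concrete, here is a minimal base-R sketch of what a distance-based left join does: compute every pairwise distance, keep pairs at or below a threshold, and retain unmatched survey rows with NA. This is an illustration under assumptions, not the package's code. The helper fuzzy_left_join_sketch is hypothetical, the inclusive boundary (<=) is an assumption, and it uses utils::adist() (plain Levenshtein distance, roughly method = "lv") instead of stringdist's default "osa".

```r
# Hypothetical sketch of a threshold-based fuzzy left join in base R.
# NOT the partymakeR/fuzzyjoin implementation: distances come from
# utils::adist() (Levenshtein), and pairs with distance <= threshold are kept.
fuzzy_left_join_sketch <- function(x, y, x_col, y_col, threshold) {
  # Pairwise edit distances: rows = names in x, columns = names in y
  d <- utils::adist(as.character(x[[x_col]]), as.character(y[[y_col]]))
  rows <- lapply(seq_len(nrow(x)), function(i) {
    hits <- which(d[i, ] <= threshold)
    if (length(hits) == 0) {
      # Left join semantics: keep the unmatched row, with NA for y's columns
      y_na <- y[NA_integer_, , drop = FALSE]
      cbind(x[i, , drop = FALSE], y_na, distance = NA_real_)
    } else {
      # One output row per accepted candidate from y
      cbind(x[rep(i, length(hits)), , drop = FALSE],
            y[hits, , drop = FALSE],
            distance = d[i, hits])
    }
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Same toy data as in the examples of this page
dat_survey <- data.frame(
  party_names = c("Big party", "Nationals' assembly", "Loser party"),
  party_id = 1:3, stringsAsFactors = FALSE)
dat_meta <- data.frame(
  name_party = c("big parties", "Nationalist party", "losers"),
  id_party = letters[1:3], stringsAsFactors = FALSE)

res <- fuzzy_left_join_sketch(dat_survey, dat_meta,
                              "party_names", "name_party", threshold = 5)
print(res)
```

Keeping unmatched rows (rather than dropping them) is what makes the result easy to review by hand: every survey party appears at least once, and the NA rows show exactly which names still need a manual match.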
pm_fuzzy_match(survey_data, meta_data, by, method = "osa", threshold = 5, ...)
survey_data | A dataframe or similar object containing the unique party names from the survey or poll data |
meta_data | A dataframe or similar object containing the unique party names from the reference party-level dataset (e.g., ParlGov) |
by | Named character vector defining the variables to be matched in the two datasets, e.g. c("name_party_survey" = "name_party_meta") |
method | One of the stringdist matching methods ("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex") |
threshold | Maximum distance between two names for the pair to be kept |
... | Additional parameters to pass to |
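The default method = "osa" is the optimal string alignment distance: the Levenshtein distance (insertions, deletions, substitutions) extended with transpositions of adjacent characters, where no substring is edited more than once. The sketch below implements that metric in base R for intuition only; osa_distance is a hypothetical helper, not a function of partymakeR or stringdist, and all edit costs are assumed to be 1.

```r
# Hypothetical base-R sketch of the optimal string alignment (OSA) distance.
# Illustrative only: pm_fuzzy_match delegates the actual computation to stringdist.
osa_distance <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  # d[i + 1, j + 1] = distance between the first i chars of a and first j of b
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # deletion
                             d[i + 1, j] + 1,   # insertion
                             d[i, j] + cost)    # substitution (or match)
      # Transposition of two adjacent characters, e.g. "ab" -> "ba"
      if (i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_distance("ab", "ba")               # 1: a single adjacent transposition
utils::adist("ab", "ba")               # 2: Levenshtein needs two substitutions
osa_distance("Big party", "big parties")
```

The comparison with utils::adist() shows why the choice of method matters for threshold: the same pair of names can get a different distance under "osa" and "lv", so a threshold tuned for one metric does not carry over to another.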
A dataframe of matched party names whose distance is within the threshold.
# Load package
library(partymakeR)

# Create example datasets
dat_survey <- data.frame(
  party_names = c("Big party", "Nationals' assembly", "Loser party"),
  party_id = 1:3)
dat_meta <- data.frame(
  name_party = c("big parties", "Nationalist party", "losers"),
  id_party = letters[1:3])

# Compute match (1 result)
pm_fuzzy_match(survey_data = dat_survey, meta_data = dat_meta,
               by = c("party_names" = "name_party"), threshold = 5)

# Compute match (6 results)
pm_fuzzy_match(survey_data = dat_survey, meta_data = dat_meta,
               by = c("party_names" = "name_party"), threshold = 12)
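Because the number of matches depends heavily on threshold (compare the two calls above), it can help to inspect all pairwise distances before choosing a value. A minimal sketch using only base R; note that utils::adist() computes Levenshtein distance, which can differ from the default "osa" metric, so the counts here are not guaranteed to equal what pm_fuzzy_match returns. The candidate thresholds are arbitrary.

```r
# Sketch: inspect the full distance matrix to pick a threshold.
# utils::adist() gives Levenshtein ("lv") distances, an approximation of "osa".
dat_survey <- data.frame(
  party_names = c("Big party", "Nationals' assembly", "Loser party"),
  party_id = 1:3, stringsAsFactors = FALSE)
dat_meta <- data.frame(
  name_party = c("big parties", "Nationalist party", "losers"),
  id_party = letters[1:3], stringsAsFactors = FALSE)

d <- utils::adist(dat_survey$party_names, dat_meta$name_party)
dimnames(d) <- list(dat_survey$party_names, dat_meta$name_party)
print(d)

# Number of name pairs that would survive each candidate threshold
candidates <- c(5, 8, 12)
setNames(sapply(candidates, function(th) sum(d <= th)), candidates)
```

Reading the matrix row by row shows, for each survey name, how far the closest reference name is and how quickly unrelated names start to qualify as the threshold grows.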