fuzzify_country: Matches country names based on fuzzy matching
In datadrivenenvirolab/ClimActor: Data cleaning workflow for climate actor data

View source: R/name_phonetic_matching_functions.R


Description

fuzzify_country() cleans the "country" column of the user's dataset for names that did not find an exact match in the existing country_dictionary. Because the country dictionary is small and there are far fewer countries than climate actors, fuzzy string matching based on the Levenshtein distance is used instead of the phonetic algorithms used for matching climate actor names.
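To illustrate the general idea of Levenshtein-based matching, here is a minimal sketch using base R's adist(). It is not the package's implementation: the vectors dict_names and user_names are hypothetical examples, and the real function may select or confirm matches differently.

```r
# Hypothetical dictionary names and misspelled user input (illustration only).
dict_names <- c("Germany", "France", "Brazil", "United States of America")
user_names <- c("Germny", "Frnace", "Brasil")

# adist() (utils package, attached by default) returns a matrix of
# generalized Levenshtein distances: one row per user name, one column
# per dictionary name.
dist_mat <- adist(user_names, dict_names, ignore.case = TRUE)

# For each user name, take the dictionary entry with the smallest distance.
best_idx <- apply(dist_mat, 1, which.min)

closest <- data.frame(
  input = user_names,
  match = dict_names[best_idx],
  distance = dist_mat[cbind(seq_along(user_names), best_idx)],
  stringsAsFactors = FALSE
)
print(closest)
```

A small edit distance suggests a likely misspelling of the dictionary entry; a large one suggests the name has no good counterpart and should be reviewed by hand.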

Usage

fuzzify_country(dataset, country_keydict)

Arguments

`dataset`	User-supplied dataset containing the countries to be cleaned
`country_keydict`	Key dictionary to clean actors' countries against

Value

Cleaned dataset with countries standardized against the country dictionary.

Several index vectors are also created to track the countries that need to be matched. The first is a vector of indices of all actors whose country requires cleaning. unmatched_count is a vector of indices of countries that the function did not clean. custom_count is a vector of indices of countries for which the user supplied custom names; these are used to update the country dictionary.
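The bookkeeping described above can be sketched roughly as follows. This is an assumption-laden illustration, not the package's code: the data, the column names, the max_dist cutoff, and the variable names needs_cleaning and unmatched_idx are all hypothetical (unmatched_idx only plays a role similar to unmatched_count).

```r
# Hypothetical dataset and dictionary (illustration only).
dataset <- data.frame(
  name = c("Actor A", "Actor B", "Actor C", "Actor D"),
  country = c("France", "Germny", "Atlantis", "Brazil"),
  stringsAsFactors = FALSE
)
dict_names <- c("Germany", "France", "Brazil")

# Indices of rows whose country has no exact match in the dictionary.
needs_cleaning <- which(!dataset$country %in% dict_names)

# Accept the closest dictionary entry only when it is within max_dist
# edits (an assumed cutoff, not taken from the package).
max_dist <- 2
dist_mat <- adist(dataset$country[needs_cleaning], dict_names)
best_idx <- apply(dist_mat, 1, which.min)
best_dist <- dist_mat[cbind(seq_along(best_idx), best_idx)]
accepted <- best_dist <= max_dist

# Overwrite accepted rows with the standardized dictionary name.
dataset$country[needs_cleaning[accepted]] <- dict_names[best_idx[accepted]]

# Indices of rows that could not be cleaned automatically.
unmatched_idx <- needs_cleaning[!accepted]
```

Keeping indices rather than copies of the rows makes it straightforward to revisit the unmatched entries later, for example to supply custom names by hand.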

datadrivenenvirolab/ClimActor documentation built on April 23, 2024, 7:40 a.m.