phonetify_names: Cleans name using phonetic matching

View source: R/name_phonetic_matching_functions.R

phonetify_names    R Documentation

Cleans name using phonetic matching

Description

phonetify_names() searches for actors' names which are not standardized and standardizes them according to the key dictionary. Users get to choose which name is correct based on a selection narrowed down by using phonetic matching. Users can choose to input custom names should the names not be in the key dictionary.

Usage

phonetify_names(dataset, key.dict)

Arguments

dataset

Dataset containing actors, supplied by the user

key.dict

Key dictionary to clean actors' names against

Details

A combination of 5 different phonetic representations (Metaphone, NYSIIS modified, ONCA modified refined, Phonex, Roger Root) is used in tandem with a variety of string distance metrics (full Damerau-Levenshtein distance, q-gram distance, cosine distance between q-gram profiles, Jaccard distance between q-gram profiles, and Jaro-Winkler distance) to get an accurate match of the actor's name within the supplied key dictionary.
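To make one of these metrics concrete, the following is a minimal base R sketch of the Jaccard distance between q-gram profiles, used to rank candidate names from a small dictionary. It is an illustration only and not the code phonetify_names() runs: the helper names (qgrams, jaccard_distance), the sample names, and the choice of q = 2 are all assumptions made for this example.

```r
# Split a string into its set of overlapping q-grams
# (lower-cased character substrings of length q).
qgrams <- function(x, q = 2) {
  x <- tolower(x)
  n <- nchar(x)
  if (n < q) return(character(0))
  unique(substring(x, 1:(n - q + 1), q:n))
}

# Jaccard distance between two q-gram sets:
# 1 - |intersection| / |union|. Identical strings give 0.
jaccard_distance <- function(a, b, q = 2) {
  ga <- qgrams(a, q)
  gb <- qgrams(b, q)
  union_size <- length(union(ga, gb))
  if (union_size == 0) return(0)
  1 - length(intersect(ga, gb)) / union_size
}

# Hypothetical dictionary entries and a misspelled name to look up.
key_names <- c("New York City", "Newcastle", "York")
query <- "New Yrok City"

# Distance from the query to every dictionary entry, smallest first.
distances <- vapply(key_names, function(k) jaccard_distance(query, k), numeric(1))
ranked <- names(sort(distances))
print(ranked)
```

Here the transposed letters in "New Yrok City" still leave most bigrams in common with "New York City", so that entry ranks first. Combining several such metrics with phonetic encodings, as described above, narrows the list of candidates the user is asked to choose from.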

Value

Cleaned dataset with actors' names standardized against the key dictionary.

A few vectors of indices are also created to store the indices of names that need to be matched. The first is a vector of indices of all actors that require cleaning. unmatched_indices is a vector of indices of names not cleaned by the function. custom_indices is a vector of indices of names for which the user supplied custom actor names, and will be used to update the key dictionary.
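The relationship between these index vectors can be sketched in base R as follows. This is an assumption-laden illustration rather than the function's actual bookkeeping: the column names (name, right), the sample data, and the helper vectors needs_cleaning and matched_indices are hypothetical, and the user's choices are hard-coded instead of being collected interactively.

```r
# Hypothetical data; the column names `name` and `right` are assumptions.
dataset <- data.frame(
  name = c("Paris", "Pariss", "Lagos", "Lagoss", "Xyzzy"),
  stringsAsFactors = FALSE
)
key.dict <- data.frame(right = c("Paris", "Lagos"), stringsAsFactors = FALSE)

# Indices of all names that are not already in the key dictionary
# and therefore require cleaning.
needs_cleaning <- which(!dataset$name %in% key.dict$right)

# Suppose the user accepted a suggested match for row 2 and typed a
# custom name for row 4 (hard-coded here for illustration).
matched_indices <- c(2)
custom_indices <- c(4)

# Whatever required cleaning but was neither matched nor given a
# custom name remains unmatched.
unmatched_indices <- setdiff(needs_cleaning, c(matched_indices, custom_indices))
print(unmatched_indices)
```

In this toy case rows 2, 4 and 5 require cleaning, and only row 5 is left in unmatched_indices for later review.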


datadrivenenvirolab/ClimActor documentation built on April 23, 2024, 7:40 a.m.