pick_bestmatch_index: Fuzzy matching functions

View source: R/pick_bestmatch_index.R

pick_bestmatch_index    R Documentation

Fuzzy matching functions

Description

Functions that fuzzy-match a string against a vector of possible matches. Based on 'stringdist' functions. Note: the 'stringdist' package must be installed to run these functions.

Usage

pick_bestmatch_index(string_vector, string_tomatch, method = "jaccard")

Arguments

string_vector

a character vector of possible values

string_tomatch

a single character value for which a match is requested

method

Method for distance calculation. The default is "jaccard"; see stringdist-metrics for the available methods.
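
To get a feel for what the method argument changes, the distances can be computed directly with stringdist::stringdist(). This is only an illustrative sketch: the variable names are made up here, and it assumes the matching functions pass method through to 'stringdist' unchanged, which this page does not state.

```r
library(stringdist)

# Hypothetical inputs, chosen only for illustration
target <- "A Very Specific Title"
candidates <- c("random_text", "very_specific_title")

# One distance per candidate; a smaller distance means a closer match
d_jaccard <- stringdist(target, candidates, method = "jaccard")
d_lv <- stringdist(target, candidates, method = "lv")

# Jaccard distances lie in [0, 1]; Levenshtein counts character edits,
# so the two methods are on different scales
print(rbind(jaccard = d_jaccard, lv = d_lv))
```

Because the scales differ between methods, scores are only comparable within a single method.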

Value

'pick_bestmatch_index' returns the integer index for the best-scoring match.

'pick_bestmatch_value' returns the character value for the best-scoring match.

'pick_bestmatch_score' returns the calculated score for the best-scoring match.
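
The three return values can be pictured as three views of one distance vector. The sketch below is not the tabletools source: the *_sketch function names are hypothetical, and it assumes that "best-scoring" means the smallest distance returned by stringdist::stringdist().

```r
library(stringdist)

# Hypothetical stand-ins, NOT the package implementation.
# Assumption: the best match is the candidate with the smallest distance.
bestmatch_index_sketch <- function(string_vector, string_tomatch, method = "jaccard") {
  d <- stringdist(string_tomatch, string_vector, method = method)
  which.min(d)  # position of the first minimum; NA distances are ignored
}

bestmatch_value_sketch <- function(string_vector, string_tomatch, method = "jaccard") {
  # Look up the candidate string at the best-matching position
  string_vector[bestmatch_index_sketch(string_vector, string_tomatch, method)]
}

bestmatch_score_sketch <- function(string_vector, string_tomatch, method = "jaccard") {
  d <- stringdist(string_tomatch, string_vector, method = method)
  d[which.min(d)]  # the distance belonging to the best match
}

candidates <- c("shouldnt_match", "extraneous_text", "very_specific_title")
bestmatch_index_sketch(candidates, "A Very Specific Title")
bestmatch_value_sketch(candidates, "A Very Specific Title")
bestmatch_score_sketch(candidates, "A Very Specific Title")
```

Under this assumption, ties resolve to the first candidate with the minimum distance, since which.min() returns the first minimum.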

Examples

single_char <- "A Very Specific Title"
possible_matches <- c("shouldnt_match", "extraneous_text", "random_text", "oiphjhdfkl", "very_specific_title")

pick_bestmatch_index(possible_matches, single_char)
pick_bestmatch_value(possible_matches, single_char)
pick_bestmatch_score(possible_matches, single_char)

# comparison of possible methods:
possible_methods <- c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex")
sapply(possible_methods, function(x) pick_bestmatch_index(possible_matches, single_char, method = x))
sapply(possible_methods, function(x) pick_bestmatch_value(possible_matches, single_char, method = x))
sapply(possible_methods, function(x) pick_bestmatch_score(possible_matches, single_char, method = x))


JMLuther/tabletools documentation built on July 1, 2024, 2:01 p.m.