fuzzy_match_genus: Fuzzy Match Genus Name Against Peru Mammals Database
In perumammals: Taxonomic Backbone and Name Validation Tools for Mammals of Peru

fuzzy_match_genus

R Documentation

Description

Performs fuzzy matching of genus names against the peru_mammals database using Levenshtein string distance, so that slight spelling variations still match. The maximum allowed distance is 1, i.e. a single-character insertion, deletion, or substitution.
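To make the distance threshold concrete, the following minimal sketch uses base R's utils::adist(), which computes a generalized Levenshtein distance. It is a stand-in for illustration only, not necessarily the routine the package calls internally; the query and candidate names are toy values, not a dump of peru_mammals.

```r
# Illustrative only: base R adist() as a stand-in for the package's distance step.
max_dist <- 1                                        # threshold stated in the Description
query <- "Tapiru"                                    # toy misspelling (missing final "s")
candidates <- c("Tapirus", "Tremarctos", "Puma")     # toy candidate genera

# adist() returns a 1 x 3 matrix here; drop() flattens it to a plain vector.
d <- drop(utils::adist(query, candidates))
names(d) <- candidates

# Keep only candidates within one edit of the query.
within_threshold <- d[d <= max_dist]
print(within_threshold)
```

Only "Tapirus" survives the filter, because it differs from the query by a single inserted character.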

This implementation proceeds in three steps to avoid warnings when no matches are found:

1. Perform stringdist_left_join to get all candidates
2. Split the result into valid (finite distance) and invalid (NA distance) rows
3. Process only the valid rows to find the best candidates
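The join, split, and select steps listed above can be sketched in base R as follows. This is an approximation under stated assumptions: the left join is emulated with utils::adist() rather than stringdist_left_join (presumably the fuzzyjoin function of that name), the dist column name is hypothetical, and the data are toy values. Only the Orig.Genus and genus column names come from the Arguments section.

```r
# Toy inputs; the values are illustrative, not taken from peru_mammals.
df <- data.frame(Orig.Genus = c("Tapiru", "Pumma", "Xyzzy"), stringsAsFactors = FALSE)
target_df <- data.frame(genus = c("Tapirus", "Puma", "Tremarctos"), stringsAsFactors = FALSE)
max_dist <- 1

# Step 1: emulate a left join on string distance.
# Rows of dist_mat are inputs, columns are candidate genera.
dist_mat <- utils::adist(df$Orig.Genus, target_df$genus)
hits <- which(dist_mat <= max_dist, arr.ind = TRUE)
matched <- data.frame(
  Orig.Genus = df$Orig.Genus[hits[, "row"]],
  genus = target_df$genus[hits[, "col"]],
  dist = dist_mat[hits],                     # hypothetical column name
  stringsAsFactors = FALSE
)
# Inputs with no candidate keep one row with NA, as a left join would.
unmatched <- setdiff(df$Orig.Genus, matched$Orig.Genus)
candidates <- rbind(
  matched,
  data.frame(
    Orig.Genus = unmatched,
    genus = rep(NA_character_, length(unmatched)),
    dist = rep(NA_real_, length(unmatched)),
    stringsAsFactors = FALSE
  )
)

# Step 2: split into valid (finite distance) and invalid (NA distance) rows.
valid <- candidates[!is.na(candidates$dist), ]
invalid <- candidates[is.na(candidates$dist), ]

# Step 3: pick the closest candidate per input, touching only valid rows.
# The nrow() guard skips the per-group step entirely when nothing matched.
best <- if (nrow(valid) > 0) {
  do.call(rbind, lapply(split(valid, valid$Orig.Genus), function(g) {
    g[which.min(g$dist), ]
  }))
} else {
  valid
}
print(best)
print(invalid)
```

Separating the NA rows before the per-group minimum is what keeps the empty-match case quiet: calling min() on an empty or all-NA group is the usual source of such warnings.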

Usage

fuzzy_match_genus(df, target_df = NULL)

Arguments

`df`	A data frame containing the genus names to be matched. Must include column: Orig.Genus
`target_df`	A data frame representing peru_mammals database. Must include column: genus

Details

If multiple genera match with the same string distance (ambiguous matches), a warning is issued and the first match is automatically selected. To examine ambiguous matches, use get_ambiguous_matches(result, type = "genus").
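The tie-handling rule described above (warn, then keep the first match) can be illustrated with a small base R sketch. The helper resolve_tie() is hypothetical and exists only for this example; it does not reproduce the package's warning text or the attribute that get_ambiguous_matches() reads. The names are toy data chosen so that two candidates sit at distance 1 from the query.

```r
query <- "Puda"                                # toy misspelling
candidates <- c("Pudu", "Puma", "Tapirus")     # toy candidate genera

# Distance from the query to each candidate (base R stand-in).
d <- drop(utils::adist(query, candidates))

# Candidates tied at the smallest distance, within the 1-character limit.
tied <- candidates[d == min(d) & d <= 1]

# Hypothetical helper: warn on ambiguity, then return the first match.
resolve_tie <- function(query, tied) {
  if (length(tied) > 1) {
    warning(
      sprintf("Ambiguous genus match for '%s': %s; using '%s'",
              query, paste(tied, collapse = ", "), tied[1]),
      call. = FALSE
    )
  }
  tied[1]
}

# Capture the warning so the example runs quietly while recording that it fired.
warned <- FALSE
selected <- withCallingHandlers(
  resolve_tie(query, tied),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(tied)
print(selected)
```

Here both "Pudu" and "Puma" are one substitution away from "Puda", so the first one in candidate order is returned and a warning is raised.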

Ambiguous match information is stored as an attribute and includes: