View source: R/pm_fuzzy_match.R
| fuzzy_match_genus | R Documentation |
Performs fuzzy matching of genus names against the peru_mammals database using Levenshtein string distance, so that slight spelling variations still match. The maximum allowed distance is 1, i.e. a single-character insertion, deletion, or substitution.
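As a rough illustration of what a maximum Levenshtein distance of 1 accepts, the sketch below uses base R's utils::adist(), which computes a generalized Levenshtein distance. This is only an assumption-laden stand-in: the package itself may compute distances through a different backend, and the input spellings are made up for the example.

```r
# Illustrative only: base R adist() stands in for the package's distance backend.
target <- "Panthera"
inputs <- c("Panthera", "Pantera", "Panthara", "Pnathera")

# adist() returns a matrix; drop() turns the 4 x 1 result into a plain vector.
d <- drop(utils::adist(inputs, target))
names(d) <- inputs

# Exact match = 0, one missing letter = 1, one substituted letter = 1,
# two swapped letters = 2 (plain Levenshtein has no transposition operation).
print(d)

# Only spellings within distance 1 would qualify as fuzzy matches.
accepted <- names(d)[d <= 1]
print(accepted)
```

Note that a swap of two adjacent letters counts as two edits under plain Levenshtein distance, so such a misspelling would fall outside a threshold of 1.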
To avoid warnings when no matches are found, this implementation separates candidate generation from candidate selection:
1. Perform stringdist_left_join to get all candidates.
2. Split the result into valid (finite distance) and invalid (NA distance) rows.
3. Process only the valid rows to find the best candidates.
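The join, split, and select stages described above can be sketched in base R as follows. This is not the package's code: it replaces stringdist_left_join with utils::adist() plus merge(), and the data frames, the dist column name, and the max_dist variable are hypothetical. Only the column names Orig.Genus and genus come from this page.

```r
# Hypothetical inputs for illustration.
df <- data.frame(Orig.Genus = c("Pantera", "Tapirus", "Xyzzy"),
                 stringsAsFactors = FALSE)
target_df <- data.frame(genus = c("Panthera", "Tapirus", "Puma"),
                        stringsAsFactors = FALSE)
max_dist <- 1

# Stage 1: find every (input, target) pair within max_dist, then left-join
# so that inputs without any candidate are kept with NA distance.
d <- utils::adist(df$Orig.Genus, target_df$genus)
idx <- which(d <= max_dist, arr.ind = TRUE)
candidates <- data.frame(Orig.Genus = df$Orig.Genus[idx[, "row"]],
                         genus = target_df$genus[idx[, "col"]],
                         dist = d[idx],
                         stringsAsFactors = FALSE)
joined <- merge(df, candidates, by = "Orig.Genus", all.x = TRUE)

# Stage 2: split into rows with a usable distance and rows without one.
valid <- joined[!is.na(joined$dist), , drop = FALSE]
invalid <- joined[is.na(joined$dist), , drop = FALSE]

# Stage 3: pick the closest candidate per input, touching only valid rows.
best <- do.call(rbind, lapply(split(valid, valid$Orig.Genus), function(g) {
  g[which.min(g$dist), , drop = FALSE]
}))

print(best)
print(invalid)
```

One plausible reason for the split, stated here as an assumption, is that summaries such as min() emit a warning when applied to an empty set of distances; restricting the selection step to rows with a finite distance sidesteps that case.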
fuzzy_match_genus(df, target_df = NULL)
| df | A data frame containing the genus names to be matched. Must include the column Orig.Genus. |
| target_df | A data frame representing the peru_mammals database. Must include the column genus. |
If multiple genera match with the same string distance (ambiguous matches),
a warning is issued and the first match is automatically selected. To
examine ambiguous matches, use get_ambiguous_matches(result, type = "genus").
Ambiguous match information is stored as an attribute and includes:
Original genus
All matched genera with tied distances
Family information from peru_mammals
Number of species per genus
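To make the tie-handling concrete, here is a minimal base R sketch of detecting tied candidates, keeping the first one, and recording the ties as an attribute. The target table, the input spelling "Lus", and the attribute name "ambiguous_genera" are all hypothetical, and the sketch omits the warning that the real function issues; it is not the package's internal code.

```r
# Hypothetical target table, not the actual peru_mammals data.
target_df <- data.frame(genus = c("Mus", "Sus", "Puma"),
                        family = c("Muridae", "Suidae", "Felidae"),
                        stringsAsFactors = FALSE)
orig <- "Lus"   # made-up misspelling, one substitution away from two genera
max_dist <- 1

# Distance from the input to every target genus.
d <- drop(utils::adist(orig, target_df$genus))

# Candidates within the threshold that share the smallest distance.
tied <- target_df[d <= max_dist & d == min(d), , drop = FALSE]

# Keep the first candidate as the match.
result <- data.frame(Orig.Genus = orig,
                     Matched.Genus = tied$genus[1],
                     fuzzy_genus_dist = min(d),
                     stringsAsFactors = FALSE)

# Record all tied candidates (attribute name is an assumption).
if (nrow(tied) > 1) {
  attr(result, "ambiguous_genera") <- data.frame(Orig.Genus = orig, tied,
                                                 dist = min(d),
                                                 stringsAsFactors = FALSE)
}

print(result)
print(attr(result, "ambiguous_genera"))
```

Because the first tied candidate wins purely by row order, reviewing the stored ties is the only way to confirm that the selected genus is the intended one.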
A tibble with the following additional columns:
fuzzy_match_genus: Logical indicating if genus was matched
fuzzy_genus_dist: Numeric distance for each match (lower = better)
Matched.Genus: The matched genus name