View source: R/threated_match.R
| matching_threatenedperu | R Documentation |
This function matches given species names against the internal database of threatened plant species in Peru. It uses a hierarchical matching strategy that includes direct matching, genus-level matching, fuzzy matching, and suffix matching to maximize successful matches while maintaining accuracy.
matching_threatenedperu(
splist,
source = c("original", "updated"),
quiet = TRUE
)
| splist | A character vector of species names to match. Duplicate names are allowed; results are expanded to match the input. |
| source | Character string specifying which database version to use. Options are "original" and "updated". |
| quiet | Logical, default TRUE. If FALSE, prints informative messages. |
**Duplicate Handling:** When the input contains duplicate names, the function automatically:
- Detects duplicates and creates a tracking column (sorters)
- Processes only unique names (efficient matching)
- Expands results to restore all original positions
- Preserves original input order via sorter column
The duplicate handling uses a 'sorters' column that concatenates all original sorter values for duplicate names (e.g., "1 - 3" for a name appearing at positions 1 and 3), enabling accurate result expansion.
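The deduplicate-then-expand idea described above can be sketched in base R. This is only an illustration, not the package's internal code: the data frame layout, the `match_one()` stand-in, and the column names other than `sorter`/`sorters` are assumptions made for the example.

```r
# Illustrative input with one duplicated name.
splist <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")

# Record each name's original position (the "sorter").
input <- data.frame(sorter = seq_along(splist), name = splist,
                    stringsAsFactors = FALSE)

# Collapse duplicates: one row per unique name, with all of its
# positions concatenated into a 'sorters' string such as "1 - 3".
unique_names <- unique(input$name)
sorters <- vapply(
  unique_names,
  function(nm) paste(input$sorter[input$name == nm], collapse = " - "),
  character(1)
)
deduped <- data.frame(name = unique_names, sorters = unname(sorters),
                      stringsAsFactors = FALSE)

# Hypothetical stand-in for the (expensive) matching step,
# run once per unique name.
match_one <- function(nm) toupper(nm)
deduped$matched <- vapply(deduped$name, match_one, character(1),
                          USE.NAMES = FALSE)

# Expand: split 'sorters' back into positions, repeat each result
# once per position, then restore the original input order.
positions <- strsplit(deduped$sorters, " - ", fixed = TRUE)
expanded <- data.frame(
  sorter  = as.integer(unlist(positions)),
  name    = rep(deduped$name, lengths(positions)),
  matched = rep(deduped$matched, lengths(positions)),
  stringsAsFactors = FALSE
)
expanded <- expanded[order(expanded$sorter), ]
```

Matching only the unique names keeps the costly step proportional to the number of distinct names, while the concatenated positions carry enough information to rebuild a result with one row per input element.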
**Matching Strategy:**
1. Direct exact matching
2. Genus-level matching (exact and fuzzy)
3. Species-level matching within genus
4. Infraspecies-level matching (up to 2 levels for original database)
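A minimal sketch of the first three steps of such a hierarchy, assuming a toy reference table and an edit-distance threshold of 1 for fuzzy genus matching. The table `db`, the helper `match_name()`, and the `max_dist` parameter are hypothetical and chosen for illustration; the package's actual thresholds and infraspecies handling are not shown here.

```r
# Toy reference table; hypothetical, not the package's database.
db <- data.frame(
  genus   = c("Cattleya", "Polylepis"),
  species = c("maxima", "incana"),
  stringsAsFactors = FALSE
)
db$name <- paste(db$genus, db$species)

match_name <- function(x, db, max_dist = 1) {
  # 1. Direct exact match on the full name.
  if (x %in% db$name) return(list(matched = x, step = "direct"))

  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  if (length(parts) < 2) return(list(matched = NA_character_, step = "none"))
  genera <- unique(db$genus)

  # 2. Genus-level match: exact first, then fuzzy by edit distance.
  genus <- if (parts[1] %in% genera) {
    parts[1]
  } else {
    d <- utils::adist(parts[1], genera)[1, ]
    if (min(d) <= max_dist) genera[which.min(d)] else NA_character_
  }
  if (is.na(genus)) return(list(matched = NA_character_, step = "none"))

  # 3. Species-level match restricted to the matched genus.
  hit <- db$genus == genus & db$species == parts[2]
  if (any(hit)) return(list(matched = db$name[hit][1], step = "genus+species"))
  list(matched = NA_character_, step = "none")
}

match_name("Cattleya maxima", db)  # exact name, resolved at step 1
match_name("Catleya maxima", db)   # misspelled genus, resolved via steps 2-3
match_name("Quercus robur", db)    # genus not in the table, no match
```

Restricting the species lookup to the already-matched genus keeps fuzzy matching from pairing an epithet with an unrelated genus.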
**Rank Validation:** The algorithm implements strict rank validation to prevent false positives.
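One way to picture a rank check, under the assumption that a name's rank is the number of name components once rank markers such as "var." are dropped (1 for genus only, 2 for species, and so on). The helpers `name_rank()` and `ranks_agree()` and the epithet "alba" are hypothetical; the package's own rules may differ.

```r
# Assumption: rank = number of name components after removing
# infraspecific rank markers.
name_rank <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  parts <- parts[!tolower(parts) %in% c("subsp.", "var.", "f.")]
  length(parts)
}

# A match counts as rank-consistent only when both ranks are equal.
ranks_agree <- function(input, matched) {
  name_rank(input) == name_rank(matched)
}

ranks_agree("Cattleya maxima", "Cattleya maxima")            # same rank
ranks_agree("Cattleya maxima var. alba", "Cattleya maxima")  # ranks differ
```

Under this reading, an infraspecific input that only matches at species level would be flagged rather than silently accepted.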
A tibble with detailed matching results including:
Integer. Original position in input vector
Character. Original input name (standardized)
Character. Matched name from database or "—"
Character. IUCN threat category or "Not threatened"
Integer. Input taxonomic rank (1-4)
Integer. Matched taxonomic rank
Logical. Whether ranks match exactly
Character. Description of match quality
Logical. Whether a match was found
is_threatened_peru for a simplified interface
get_ambiguous_matches to retrieve ambiguous match details
get_threatened_database to access the raw databases
## Not run:
# Basic usage
species_list <- c("Cattleya maxima", "Polylepis incana")
results <- matching_threatenedperu(species_list, source = "original")
# With duplicates
species_dup <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")
results_dup <- matching_threatenedperu(species_dup)
nrow(results_dup) == 3 # TRUE - preserves duplicates
# Access metadata
attr(results, "match_rate")
# Check for ambiguous matches
get_ambiguous_matches(results, type = "infraspecies")
## End(Not run)