View source: R/algaebase_api_functions.R
parse_scientific_names | R Documentation |
This function processes a character vector of scientific names, splitting them into genus and species components. It handles binomial names (e.g., "Homo sapiens"), removes undesired descriptors (e.g., 'Cfr.', 'cf.', 'colony', 'cells'), and manages cases involving varieties, subspecies, or invalid species names (e.g., 'sp.', 'spp.'). Special characters and whitespace are handled appropriately.
parse_scientific_names(
  scientific_name,
  remove_undesired_descriptors = TRUE,
  remove_subspecies = TRUE,
  remove_invalid_species = TRUE,
  encoding = "UTF-8"
)
scientific_name | A character vector containing scientific names, which may include binomials, additional descriptors, or varieties. |
remove_undesired_descriptors | Logical. If TRUE, undesired descriptors (e.g., 'Cfr.', 'cf.', 'colony', 'cells', etc.) are removed. Default is TRUE. |
remove_subspecies | Logical. If TRUE, subspecies/variety descriptors (e.g., 'var.', 'subsp.', 'f.', etc.) are removed. Default is TRUE. |
remove_invalid_species | Logical. If TRUE, invalid species names (e.g., 'sp.', 'spp.') are removed. Default is TRUE. |
encoding | A string specifying the encoding to be used for the input names (e.g., 'UTF-8'). Default is 'UTF-8'. |
A data.frame with two columns:
genus: Contains the genus names.
species: Contains the species names (empty if unavailable or invalid).
Invalid descriptors like 'sp.', 'spp.', and numeric entries are excluded from the 'species' column.
# Example with a vector of scientific names
scientific_names <- c("Skeletonema marinoi", "Cf. Azadinium perforatum", "Gymnodinium sp.",
                      "Melosira varians", "Aulacoseira islandica var. subarctica")
result <- parse_scientific_names(scientific_names)
# Check the resulting data frame
print(result)
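To make the cleaning steps described above more concrete, the following base-R sketch shows one way such parsing could be approached. It is not the implementation of parse_scientific_names: the helper name sketch_parse_names and its regular expressions are assumptions made for illustration, and it covers only a few of the documented descriptors ('cf.', 'Cfr.', 'var.', 'subsp.', 'f.', 'sp.', 'spp.').

```r
# Hypothetical helper for illustration only; NOT the package's implementation.
sketch_parse_names <- function(x) {
  # Drop uncertainty markers such as "cf." or "Cfr." (case-insensitive)
  x <- gsub("\\b(cfr|cf)\\.", "", x, ignore.case = TRUE, perl = TRUE)
  # Drop infraspecific rank markers and everything that follows them
  x <- sub("\\s+(var|subsp|f)\\..*$", "", x, perl = TRUE)
  # Collapse repeated whitespace and trim both ends
  x <- trimws(gsub("\\s+", " ", x))
  # Split on single spaces; the first token is the genus, the second the species
  parts <- strsplit(x, " ", fixed = TRUE)
  genus <- vapply(parts, function(p) if (length(p) >= 1) p[1] else "", character(1))
  species <- vapply(parts, function(p) if (length(p) >= 2) p[2] else "", character(1))
  # Treat placeholder epithets as missing species
  species[species %in% c("sp.", "spp.")] <- ""
  data.frame(genus = genus, species = species, stringsAsFactors = FALSE)
}

sketch_result <- sketch_parse_names(c(
  "Skeletonema marinoi", "Cf. Azadinium perforatum", "Gymnodinium sp.",
  "Melosira varians", "Aulacoseira islandica var. subarctica"
))
print(sketch_result)
```

The sketch returns a data frame with the same two columns, genus and species, so it can be compared side by side with the output of parse_scientific_names on the same input; any differences reflect cases the real function handles and this simplified version does not.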