check_names: Check species names

Description

check_names checks whether species names are correct according to the Flora e Funga do Brasil database and, if a name is misspelled or not found, searches the database for suggested names.

match_names finds approximate matches for each name in species (the pattern) among the elements of species_to_match. It is used internally by check_names.
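
As a rough illustration of approximate matching, the base R sketch below uses agrep() to look up each query name in a reference vector. The helper approx_match() and the reference vector are hypothetical and exist only for illustration; this is not the code of match_names, whose internals may differ.

```r
# Hypothetical helper, NOT florabr's match_names(): for each name in
# `species`, return the elements of `species_to_match` that base R's
# agrep() reports as approximate matches (character(0) if none).
approx_match <- function(species, species_to_match, max_distance = 0.1) {
  matches <- lapply(species, function(sp) {
    agrep(sp, species_to_match, max.distance = max_distance, value = TRUE)
  })
  names(matches) <- species
  matches
}

# Made-up reference list, for illustration only
reference <- c("Butia catarinensis", "Butia capitata", "Araucaria angustifolia")
res <- approx_match(c("Butia cattarinensis", "Araucaria angustifolia"), reference)
print(res)
```

With the default fraction of 0.1, the misspelled "Butia cattarinensis" is one edit away from "Butia catarinensis" and is returned as a match, while "Butia capitata" is too distant.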

Usage

check_names(
  data,
  species,
  max_distance = 0.1,
  include_subspecies = FALSE,
  include_variety = FALSE,
  kingdom = "Plantae",
  parallel = FALSE,
  ncores = 1,
  progress_bar = FALSE
)

match_names(
  species,
  species_to_match,
  max_distance = 0.1,
  parallel = FALSE,
  ncores = 1,
  progress_bar = FALSE
)

Arguments

data

(data.frame) the data.frame imported with the load_florabr function.

species

(character) names of the species to be checked.

max_distance

(numeric) maximum distance, expressed as a fraction of the length of the species name, allowed when searching for suggestions for misspelled names. It can be any value between 0 and 1; higher values return more suggestions. For more details, see agrep. Default = 0.1.

include_subspecies

(logical) whether to include subspecies. Default = FALSE.

include_variety

(logical) whether to include varieties. Default = FALSE.

kingdom

(character) the kingdom to which the species belong. It can be "Plantae" or "Fungi". Default = "Plantae".

parallel

(logical) whether to run in parallel. Setting this to TRUE is recommended for better performance when checking 100 or more species. Default = FALSE.

ncores

(numeric) number of cores to use for parallel processing. Only applicable if parallel = TRUE. Default = 1.

progress_bar

(logical) whether to display a progress bar during processing. Default = FALSE.

species_to_match

(character) a vector of species names against which the names in species are matched.
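
The base R sketch below shows how a larger max_distance tolerates more spelling errors, assuming the value is interpreted as in agrep(), where a fraction is replaced by the smallest integer not less than that fraction of the pattern length. The misspelled query and the reference vector are made up for illustration.

```r
# Illustration only: a query with 2 substitutions (Bucia, catarinensys)
query <- "Bucia catarinensys"   # 18 characters
reference <- c("Butia catarinensis", "Araucaria angustifolia")

fractions <- c(0.05, 0.2)
# Number of single-character edits each fraction allows for this query
allowed_edits <- ceiling(fractions * nchar(query))
print(allowed_edits)

# Number of reference names matched at each fraction
hits <- vapply(fractions, function(f) {
  length(agrep(query, reference, max.distance = f))
}, integer(1))
print(hits)
```

At 0.05 only one edit is allowed, so the two-error query finds nothing; at 0.2 up to four edits are allowed and "Butia catarinensis" is found.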

Value

a data.frame with the following columns:

  • input_name: the species names provided in the species argument.

  • Spelling: indicates whether the species name is Correct (an exact match with a species name in the Flora e Funga do Brasil), Probably_incorrect (an approximate match), or Not_found (no match with any species).

  • Suggested_name: if Spelling is Correct, the same as input_name. If Spelling is Probably_incorrect, one or more suggested names found within the maximum distance. If Spelling is Not_found, NA.

  • Distance: the integer Levenshtein edit distance, i.e., the number of single-character edits (insertions, deletions, or substitutions) required to transform input_name into Suggested_name.

  • taxonomicStatus: the taxonomic status of the species name ("Accepted" or "Synonym").

  • nomenclaturalStatus: the nomenclatural status of the species name. This information is not available for all species.

  • acceptedName: if the species name is not accepted or is misspelled, the accepted name of the species. If the species name is accepted and correctly spelled, the same as input_name and Suggested_name.

  • family: the family of the species.
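
To make the Spelling, Suggested_name, and Distance columns concrete, here is a hypothetical base R sketch that classifies names against a reference vector and computes the Levenshtein distance with adist(). classify_names() is an illustrative helper, not part of florabr, and returning one row per suggestion is a design choice of this sketch that may not match check_names.

```r
# Hypothetical sketch, NOT florabr's implementation
classify_names <- function(input, reference, max_distance = 0.1) {
  rows <- lapply(input, function(nm) {
    if (nm %in% reference) {
      hits <- nm                      # exact match
      spelling <- "Correct"
    } else {
      hits <- agrep(nm, reference, max.distance = max_distance, value = TRUE)
      spelling <- if (length(hits) > 0) "Probably_incorrect" else "Not_found"
    }
    if (length(hits) == 0) {
      return(data.frame(input_name = nm, Spelling = spelling,
                        Suggested_name = NA_character_, Distance = NA_integer_,
                        stringsAsFactors = FALSE))
    }
    # One row per suggestion; adist() returns the Levenshtein distance
    data.frame(input_name = nm, Spelling = spelling, Suggested_name = hits,
               Distance = as.integer(adist(nm, hits)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up reference list, for illustration only
reference <- c("Butia catarinensis", "Araucaria angustifolia")
out <- classify_names(c("Butia cattarinensis", "Araucaria angustifolia",
                        "Quercus robur"), reference)
print(out)
```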

References

Flora e Funga do Brasil. Jardim Botânico do Rio de Janeiro. Available at: http://floradobrasil.jbrj.gov.br/

Examples

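library(florabr)
# Load the example data shipped with florabr
data("bf_data", package = "florabr")
# "Butia cattarinensis" is a misspelling of Butia catarinensis
spp <- c("Butia cattarinensis", "Araucaria angustifolia")
check_names(data = bf_data, species = spp)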

florabr documentation built on Sept. 11, 2024, 9:10 p.m.