taxon_update: Update scientific names of fungi

taxon_update    R Documentation

Update scientific names of fungi

Description

Validates or updates the scientific names of fungi, and their associated taxonomic classification, based on currently accepted scientific consensus listed in the GBIF Backbone Taxonomy database.

Usage

taxon_update(
  data,
  taxon_col = "scientificName",
  authorship_col = "scientificNameAuthorship",
  show_names = FALSE,
  species_only = TRUE,
  force_accepted = FALSE,
  show_status = TRUE,
  cores = 1
)

Arguments

data

Data.frame (must be in UTF-8 encoding) containing a column of canonical names (e.g. "Pleurotus", "Pleurotus ostreatus") and a column of corresponding authorships (e.g. "(Fr.) P.Kumm.", "(Jacq.) P.Kumm."). Taxa listed in the data.frame can be of any taxonomic rank from kingdom to species; however, there are caveats when updating names for ranks other than species. See Simpson & Schilling (2021).

taxon_col

Character string specifying the name of the column containing canonical names. Default is "scientificName".

authorship_col

Character string specifying the name of the column containing authorship. Default is "scientificNameAuthorship". If the input data set has no authorship column, use NULL. Taxon names and authorship combined in one column (e.g. "Pleurotus ostreatus (Jacq.) P.Kumm.") are currently not supported.

show_names

Logical. Default is FALSE. If TRUE, taxon names are printed on the console as they are submitted as queries to GBIF.

species_only

Logical. Default is TRUE. If TRUE, records not identified to the species level are removed from the data set prior to name updates.

force_accepted

Logical. Default is FALSE. If TRUE, records that lack authorship information are updated to the ACCEPTED full scientific name, if one exists, regardless of whether all potential authorships for the given canonical name would lead to the same ACCEPTED full scientific name.

show_status

Logical. Default is TRUE. If TRUE, percent completion and the number of unique taxa left to process are printed to the console.

cores

Integer. Default is 1. Specifies the number of cores to use for processing. Values greater than 1 use parallel processing (not allowed on Windows systems). Parallel processing is not recommended in a GUI setting. See parallel::mclapply.
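The restriction on cores can be guarded before calling the function. The following is a minimal sketch and not part of fungarium: pick_cores() and run_update() are hypothetical helper names, and capping the request at parallel::detectCores() is an assumption about sensible usage rather than a documented requirement.

```r
# Hypothetical helper (not part of fungarium): choose a safe value for `cores`.
pick_cores <- function(requested = 1L) {
  # Values greater than 1 are not allowed on Windows, so fall back to 1 there.
  if (.Platform$OS.type == "windows") {
    return(1L)
  }
  available <- parallel::detectCores()
  # detectCores() can return NA when the core count cannot be determined.
  if (is.na(available)) {
    return(1L)
  }
  # Never request fewer than 1 core or more than the machine reports.
  as.integer(max(1L, min(requested, available)))
}

# Hypothetical wrapper: GBIF is queried only when this function is called.
run_update <- function(records, requested = 2L) {
  fungarium::taxon_update(records, cores = pick_cores(requested))
}
```

Calling run_update(mn_records) would then run in parallel where the platform allows it and serially elsewhere.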

Details

Queries the GBIF database for each taxon. Note that an internet connection is required to retrieve data from the GBIF database.

If a queried taxon is matched to a GBIF record and that record has "accepted" taxonomic status, the queried name is "validated" (i.e. the output "new_name" is the same as the queried name). If the matched GBIF record is a "synonym", the "accepted" record associated with that synonym is used to "update" the queried taxon (i.e. the output "new_name" is different from the queried name).

If a queried taxon has no GBIF matches or the GBIF match has "doubtful" taxonomic status, the queried taxon is not validated or updated (i.e. the output "new_name" will be blank) and an error code is output. See error in "Value" section.
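The rules above can be sketched as a small lookup. This is an illustration of the documented behavior only, not the package's implementation: classify_outcome() is a hypothetical name, the treatment of statuses other than the three named above is an assumption, and a "synonym" match can still end in an error code (see error6 and error7 in the "Value" section).

```r
# Illustrative only: maps a GBIF taxonomic status to the outcome described above.
# `status` is NA when the query returned no GBIF match.
classify_outcome <- function(status) {
  if (is.na(status)) {
    return("not updated")          # no match: new_name is left blank
  }
  switch(
    tolower(status),
    accepted = "validated",        # new_name is the same as the queried name
    synonym  = "updated",          # new_name comes from the accepted record
    doubtful = "not updated",      # new_name is left blank and an error code is set
    "not updated"                  # assumption: any other status is left unresolved
  )
}

# Apply the rule to each of the cases discussed in this section.
vapply(c("accepted", "synonym", "doubtful", NA), classify_outcome, character(1))
```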

Value

The input data.frame with the following output fields appended:

query_full_name

exact string used in GBIF query

new_name

currently accepted canonical name (may be the same as the name originally listed in the input file, meaning that the original name is currently accepted)

new_author

authorship associated with the currently accepted scientific name

new_full_name

name and authorship combined into one character string. Represents the full scientific name.

new_kingdom

kingdom classification based on the new scientific name

new_phylum

phylum classification based on the new scientific name

new_class

class classification based on the new scientific name

new_order

order classification based on the new scientific name

new_family

family classification based on the new scientific name

new_genus

genus associated with the new scientific name

new_specific_epithet

specific epithet associated with the new scientific name

rank

taxonomic rank

new_species

canonical species name based on the new scientific name. This is useful for getting the species name for varieties or subspecies, which will have the full variety or subspecies name listed in "new_full_name".

taxon_conf

confidence score (0-100) for match quality of the full scientific name. Ex: "Trametes versicolor (L.) Lloyd" in sporocarp dataset matched with "Trametes versicolor (L.) Lloyd" in GBIF database would give a confidence score of 100.

taxon_matchtype

refers to match type of the canonical name. EXACT means a perfect match. Ex: "Trametes versicolor" in sporocarp dataset matched to "Trametes versicolor" in GBIF database. FUZZY means an imperfect match, likely due to spelling errors. Ex: "Trametes versacolor" in sporocarp dataset matched with "Trametes versicolor" in GBIF database.

error

error code for why a name could not be validated or updated:

  error1: name has doubtful taxonomic status (not accepted as a valid taxon and has no valid synonyms).

  error2: name has no authorship listed and all GBIF matches are of a higher taxonomic rank.

  error3: name has no authorship listed and all GBIF matches have doubtful taxonomic status.

  error4: name has no authorship listed and all GBIF matches have different accepted GBIF usage keys (accepted keys correspond to accepted taxa).

  error5: no matches returned from GBIF.

  error6: the synonym of the matched GBIF record has doubtful taxonomic status.

  error7: the synonym of the matched GBIF record is also listed as a synonym (may indicate an error within GBIF).

  error8: name has authorship listed and the best GBIF match is of a higher taxonomic rank.
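Once names have been updated, the error field can be tallied to see why records were left unresolved. The sketch below is hedged: summarize_errors() is a hypothetical helper, the descriptions are paraphrased from the list above, and it assumes that the column holds the code strings exactly as written ("error1" to "error8") and that rows without an error hold NA or an empty string. The toy data.frame is for illustration only and does not represent real GBIF results.

```r
# Short descriptions paraphrased from the error codes listed above.
error_descriptions <- c(
  error1 = "doubtful taxonomic status",
  error2 = "no authorship; all matches of higher rank",
  error3 = "no authorship; all matches doubtful",
  error4 = "no authorship; matches have different accepted keys",
  error5 = "no matches returned from GBIF",
  error6 = "synonym of matched record is doubtful",
  error7 = "synonym of matched record is itself a synonym",
  error8 = "authorship listed; best match of higher rank"
)

# Hypothetical helper: count unresolved records per error code.
# Assumption: rows without an error hold NA or "" in the `error` column.
summarize_errors <- function(updated) {
  codes <- updated$error
  codes <- codes[!is.na(codes) & codes != ""]
  # Fix the factor levels so that codes with zero records still appear.
  counts <- table(factor(codes, levels = names(error_descriptions)))
  data.frame(
    error = names(error_descriptions),
    description = unname(error_descriptions),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy input for illustration only; these are not real GBIF results.
toy <- data.frame(
  error = c(NA, "error5", "", "error1", "error5"),
  stringsAsFactors = FALSE
)
summarize_errors(toy)
```

Applied to real output, the same call would be summarize_errors(mn_updated), where mn_updated is the data.frame returned by taxon_update.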

Note

HTTP errors may indicate issues with the GBIF database (e.g., the taxonomy backbone is being updated). Monitor GBIF system health at https://www.gbif.org.

Sporadic GBIF connection errors may also occur during parallel processing. The cause of this is currently unknown, but doesn't appear to be connected to GBIF system health. If an error does occur when processing a taxon name, that taxon is automatically reprocessed until an error no longer occurs. So far, this solution seems to work well; however, if the error is related to something that can't immediately be fixed (e.g., GBIF system health issues), the code may loop indefinitely. Track function progress output if you believe you may be experiencing this issue. Progress can be tracked in different ways using either show_status or show_names.

Author(s)

Hunter J. Simpson

References

  1. Scott Chamberlain and Eduard Szocs (2013). taxize - taxonomic search and retrieval in R. F1000Research, 2:191. URL: http://f1000research.com/articles/2-191/v2.

  2. Scott Chamberlain, Eduard Szoecs, Zachary Foster, Zebulun Arendsee, Carl Boettiger, Karthik Ram, Ignasi Bartomeus, John Baumgartner, James O'Donnell, Jari Oksanen, Bastian Greshake Tzovaras, Philippe Marchand, Vinh Tran, Maëlle Salmon, Gaopeng Li, and Matthias Grenié. (2020) taxize: Taxonomic information from around the web. R package version 0.9.95. https://github.com/ropensci/taxize

  3. Hunter J. Simpson & Jonathan S. Schilling (2021) Using aggregated field collection data and the novel r package fungarium to investigate fungal fire association, Mycologia, 113:4, 842-855, DOI: 10.1080/00275514.2021.1884816

Examples

library(fungarium)

# Import sample data set
data(agaricales)

# Filter for records from a specific state
mn_records <- agaricales[agaricales$stateProvince == "Minnesota", ]

# Update taxon names (queries GBIF, so an internet connection is required)
mn_updated <- taxon_update(mn_records, show_status = FALSE)

hjsimpso/fungarium documentation built on Aug. 23, 2023, 3:59 p.m.