taxon_update | R Documentation |
Validates or updates the scientific names of fungi, and their associated taxonomic classification, based on currently accepted scientific consensus listed in the GBIF Backbone Taxonomy database.
taxon_update(
data,
taxon_col = "scientificName",
authorship_col = "scientificNameAuthorship",
show_names = FALSE,
species_only = TRUE,
force_accepted = FALSE,
show_status = TRUE,
cores = 1
)
data |
Data.frame (must be in UTF-8 encoding) containing a column of canonical names (e.g. "Pleurotus", "Pleurotus ostreatus") and a column of corresponding authorships (e.g. "(Fr.) P.Kumm.", "(Jacq.) P.Kumm."). Taxa listed in the data.frame can be of any taxonomic rank from kingdom to species; however, there are caveats when updating names for ranks other than species. See Simpson & Schilling (2021). |
taxon_col |
Character string specifying the name of the column containing canonical names. Default is "scientificName". |
authorship_col |
Character string specifying the name of the column containing authorship. Default is "scientificNameAuthorship". If the input data set has no authorship column, use NULL. Taxon names and authorship combined in one column (e.g. "Pleurotus ostreatus (Jacq.) P.Kumm.") are currently not supported. |
show_names |
Logical. Default is FALSE. If TRUE, taxon names are printed on the console as they are submitted as queries to GBIF. |
species_only |
Logical. Default is TRUE. If TRUE, records not identified to the species level are removed from the data set prior to name updates. |
force_accepted |
Logical. Default is FALSE. If TRUE, records that do not have authorship information will be updated as the ACCEPTED full scientific name, if one exists, regardless of whether or not all potential authorships for the given canonical names would lead to the same ACCEPTED full scientific name. |
show_status |
Logical. Default is TRUE. If TRUE, percent completion and the number of unique taxa left to process is printed in the console. |
cores |
Integer. Default is 1. Specifies the number of cores to use for processing. Values greater than 1 use parallel processing (not allowed on Windows systems). Parallel processing is not recommended in a GUI setting. See |
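Because combined name-plus-authorship columns are not supported, such a column has to be split before calling taxon_update. The following is a minimal sketch of one way to do that in base R. The column name fullName and the regular expression are illustrative assumptions, not part of fungarium. The pattern only handles a capitalized genus optionally followed by one lowercase epithet, so infraspecific names and authorships that begin with a lowercase word (e.g. "de Bary" after a bare genus) need manual handling. Representing missing authorship as NA is also an assumption.

```r
# Hypothetical input with name and authorship combined in one column.
raw <- data.frame(
  fullName = c("Pleurotus ostreatus (Jacq.) P.Kumm.",
               "Pleurotus (Fr.) P.Kumm.",
               "Trametes versicolor"),
  stringsAsFactors = FALSE
)

# Group 1: capitalized genus plus an optional lowercase epithet.
# Group 2: everything after that, treated as the authorship.
pattern <- "^([A-Z][a-z-]+(?: [a-z-]+)?)\\s*(.*)$"
full <- trimws(raw$fullName)

canonical <- sub(pattern, "\\1", full, perl = TRUE)
authorship <- sub(pattern, "\\2", full, perl = TRUE)
authorship[authorship == ""] <- NA_character_  # assumption: NA marks missing authorship

# Column names match the taxon_update defaults for taxon_col and authorship_col.
input <- data.frame(
  scientificName = canonical,
  scientificNameAuthorship = authorship,
  stringsAsFactors = FALSE
)

# updated <- taxon_update(input)  # requires an internet connection
```

Inspect the split columns before submitting them, since any name the pattern mis-splits would be queried as written.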
Queries the GBIF database for each taxon. Note that an internet
connection is required to retrieve data from the GBIF database.
If a queried taxon is matched to a GBIF record and that record has "accepted" taxonomic
status, the queried name is "validated" (i.e. the output "new_name" is the same as the queried name).
If the matched GBIF record is a "synonym", the "accepted" record associated with that
synonym is used to "update" the queried taxon (i.e. the output "new_name" is different from the queried name).
If a queried taxon has no GBIF matches or the GBIF match has "doubtful"
taxonomic status, the queried taxon is neither validated nor updated
(i.e. the output "new_name" will be blank) and an error code is output. See "error"
in the "Value" section.
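The three outcomes described above (validated, updated, neither) can be recovered from the output by comparing the queried name with "new_name". The sketch below uses a small hand-made data.frame in place of real taxon_update output. Its values are hypothetical, and whether a blank "new_name" is stored as an empty string or as NA is an assumption, so both are checked.

```r
# Hypothetical stand-in for taxon_update output (no GBIF query is made here).
res <- data.frame(
  scientificName = c("Trametes versicolor", "Coriolus versicolor", "Fakeus nameus"),
  new_name = c("Trametes versicolor", "Trametes versicolor", ""),
  stringsAsFactors = FALSE
)

# Treat both "" and NA as a blank new_name (assumption about the output format).
is_blank <- is.na(res$new_name) | res$new_name == ""

res$outcome <- ifelse(
  is_blank, "unresolved",
  ifelse(res$new_name == res$scientificName, "validated", "updated")
)

table(res$outcome)
```

Note that a fuzzy match that only corrects a misspelling would also be labeled "updated" by this comparison, because the strings differ.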
The input data.frame with the following output fields appended:
query_full_name |
exact string used in GBIF query |
new_name |
currently accepted canonical name (may be the same as the name originally listed in the input file, meaning that the original name is currently accepted) |
new_author |
authorship associated with the currently accepted scientific name |
new_full_name |
name and authorship combined into one character string. Represents the full scientific name. |
new_kingdom |
kingdom classification based on the new scientific name |
new_phylum |
phylum classification based on the new scientific name |
new_class |
class classification based on the new scientific name |
new_order |
order classification based on the new scientific name |
new_family |
family classification based on the new scientific name |
new_genus |
genus associated with the new scientific name |
new_specific_epithet |
specific epithet associated with the new scientific name |
rank |
taxonomic rank |
new_species |
canonical species name based on the new scientific name. This is useful for getting the species name for varieties or subspecies, which will have the full variety or subspecies name listed for "new_full_name". |
taxon_conf |
confidence score (0-100) for match quality of the full scientific name. Ex: "Trametes versicolor (L.) Lloyd" in sporocarp dataset matched with "Trametes versicolor (L.) Lloyd" in GBIF database would give a confidence score of 100. |
taxon_matchtype |
refers to match type of the canonical name. EXACT means a perfect match. Ex: "Trametes versicolor" in sporocarp dataset matched to "Trametes versicolor" in GBIF database. FUZZY means an imperfect match, likely due to spelling errors. Ex: "Trametes versacolor" in sporocarp dataset matched with "Trametes versicolor" in GBIF database. |
error |
error code for why a name could not be validated or updated:
error1: name has doubtful taxonomic status (not accepted as a valid taxon and has no valid synonyms).
error2: name has no authorship listed and all GBIF matches are of a higher taxonomic rank.
error3: name has no authorship listed and all GBIF matches have doubtful taxonomic status.
error4: name has no authorship listed and all GBIF matches have different accepted GBIF usage keys (accepted keys correspond to accepted taxa).
error5: no matches returned from GBIF.
error6: the synonym of the matched GBIF record has doubtful taxonomic status.
error7: the synonym of the matched GBIF record is also listed as a synonym (may indicate an error within GBIF).
error8: name has authorship listed and the best GBIF match is of a higher taxonomic rank. |
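Once the output fields are appended, a quick review pass can tabulate the error codes and flag matches that deserve a manual look. This is a sketch on a hand-made data.frame. The stored form of the codes (strings such as "error5"), the use of "" or NA for records without an error, and the confidence threshold of 95 are all assumptions chosen for illustration.

```r
# Hypothetical stand-in for taxon_update output (values are made up).
res <- data.frame(
  query_full_name = c("Trametes versicolor (L.) Lloyd", "Trametes versacolor",
                      "Fakeus nameus", "Dubius taxon"),
  new_name = c("Trametes versicolor", "Trametes versicolor", "", ""),
  taxon_conf = c(100, 92, NA, NA),
  taxon_matchtype = c("EXACT", "FUZZY", NA, NA),
  error = c("", "", "error5", "error1"),
  stringsAsFactors = FALSE
)

# Count how often each error code occurs, ignoring records without an error.
has_error <- !is.na(res$error) & res$error != ""
error_counts <- table(res$error[has_error])

# Flag fuzzy or lower-confidence matches for manual review.
conf_threshold <- 95  # hypothetical cutoff, adjust to your own tolerance
needs_review <- res[
  res$taxon_matchtype %in% "FUZZY" |
    (!is.na(res$taxon_conf) & res$taxon_conf < conf_threshold),
]

error_counts
needs_review$query_full_name
```

Using %in% rather than == for the match type keeps rows with NA out of the subset instead of returning all-NA rows.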
HTTP errors may indicate issues with the GBIF database (e.g., the taxonomy backbone is being updated). Monitor GBIF system health at https://www.gbif.org.
Sporadic GBIF connection errors may also occur during parallel processing. The cause is currently unknown,
but it does not appear to be connected to GBIF system health. If an error occurs while processing a taxon name, that taxon is automatically reprocessed until no error occurs.
This approach has worked well so far; however, if the error stems from something that cannot be fixed immediately (e.g., GBIF system health issues),
the code may loop indefinitely. Monitor the function's progress output if you suspect this is happening.
Progress can be tracked in different ways using either show_status or show_names.
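For contrast with the retry-until-success behavior described in this note, the sketch below shows a bounded retry pattern in plain R. It is not how taxon_update is implemented and it does not change the function's internal behavior. The helper retry_with_limit and the simulated failing function are hypothetical, and are meant only to illustrate capping the number of attempts in your own code that makes network requests.

```r
# Hypothetical helper: call fun() up to max_tries times, pausing between tries.
retry_with_limit <- function(fun, max_tries = 5, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("Giving up after ", max_tries, " attempts")
}

# Simulated flaky request: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated connection error")
  "ok"
}

out <- retry_with_limit(flaky, max_tries = 5, wait = 0)
```

A cap turns a persistent outage into a visible error instead of an endless loop, at the cost of sometimes giving up on a request that would have succeeded later.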
Hunter J. Simpson
Scott Chamberlain and Eduard Szocs (2013). taxize - taxonomic search and retrieval in R. F1000Research, 2:191. URL: http://f1000research.com/articles/2-191/v2.
Scott Chamberlain, Eduard Szoecs, Zachary Foster, Zebulun Arendsee, Carl Boettiger, Karthik Ram, Ignasi Bartomeus, John Baumgartner, James O'Donnell, Jari Oksanen, Bastian Greshake Tzovaras, Philippe Marchand, Vinh Tran, Maëlle Salmon, Gaopeng Li, and Matthias Grenié. (2020) taxize: Taxonomic information from around the web. R package version 0.9.95. https://github.com/ropensci/taxize
Hunter J. Simpson & Jonathan S. Schilling (2021) Using aggregated field collection data and the novel r package fungarium to investigate fungal fire association, Mycologia, 113:4, 842-855, DOI: 10.1080/00275514.2021.1884816
library(fungarium)
# Import the sample data set
data(agaricales)
# Keep only records from a specific state
mn_records <- agaricales[agaricales$stateProvince == "Minnesota", ]
# Update taxon names (requires an internet connection to query GBIF)
mn_updated <- taxon_update(mn_records, show_status = FALSE)