knitr::opts_chunk$set( collapse = TRUE, comment = "" )
Start by loading the package and download the Dyntaxa database for local use.
library(dyntaxa) # the first time the package is loaded, it attempts to # download and generate Dyntaxa data assets locally for use with the package # and only after this, the various functions can then be used to query the Dyntaxa database # this happens in the dyntaxa_init() function, which can be forced to download # and overwrite the locally persisted data in order to refresh it, like so: dyntaxa_init(force = TRUE)
The data is available locally after the initial download through the dyntaxa_init()
function, which will requires an internet connection in other to retrieve a remote Darwin Core Archive file with open data from the Dyntaxa database. After this initial download of the data, the package can be used off-line without Internet connectivity.
Basic usage examples follow, showing how to look up taxonomic data from the Dyntaxa database related to specific (single) taxonomic names or identifiers:
library(dyntaxa) # what is the identifier for all of Animalia? dyntaxa_id_from_name("Animalia") # what is the rank for a given identifier? dyntaxa_rank_from_id("5000001") # what is the scientific name given a taxon identifier dyntaxa_name_from_id("206046") # show the upstream classification given the identifier for Alces alces dyntaxa_classification(dyntaxa_id_from_name("Alces alces")) # what are immediate children of Cervidae? dyntaxa_children(dyntaxa_id_from_name("Cervidae")) # what are all downstream taxa, not just the children? dyntaxa_downstream(dyntaxa_id_from_name("Cervidae"))
The locally stored data has a full text search index generated, which can be used for taxonomic full text search queries.
library(dplyr) # search all taxa, literature references and vernacular names at once dyntaxa_search_all("blåklocka") # search for hits where the taxon identifier is 220018 # note more than one hit, because we search # all of taxa, literature reference and vernacular name # and there is more than one vernacular name for this taxon dyntaxa_search_all(220018) # so if you wish to return just the distinct set of taxon identifiers and names dyntaxa_search_all(220018) %>% distinct(taxonId, scientificName) # search for hits with vernaculars that contain the string "räv" # and display identifiers and names only dyntaxa_search_vernacular("*räv*") %>% distinct(taxonId, scientificName, vernacularName) # operators AND, OR, NOT can be used in searches # search for hits where the identifier is 220018 dyntaxa_search_all("220018 AND taxonomicStatus:accepted") %>% select(taxonId, scientificName, vernacularName) # search for hits for identifier 220018 dyntaxa_search_id("220018") # search for hits where title contains "Thomas Karlsson" and discard # hits that mention "Databas" in the title dyntaxa_search_all("title:Thomas Karlsson NOT title:Databas") # search for taxa with vernacular names matching # strings beginning with "fjällräv" OR "rödräv" dyntaxa_search_vernacular("^rödräv* OR ^fjällräv*") %>% distinct(taxonId, scientificName, vernacularName)
Sometimes a use case may involve checking that a given list of scientific names is not misspelled. The dyntaxa_resolve()
function can be used for this purpose.
Here is a short example which shows a list of mostly misspelled scientific names and how these can be "cleaned" through the fuzzy matching provided in that function.
# this list have scientific names that are mostly misspelled misnames <- c("Vulpe vulp", "Canis lupu", "Vulpes vulpez", "Apu apu", "Apus apus") dyntaxa_resolve(misnames)
In general, when working with lists (or vectors) of several species names (or identifiers), using packages such as purrr
and dplyr
will enable convenient automated lookups.
The following usage examples shows how to automate a number of tasks for variable-length lists of taxonomic names or identifiers and return results as neat data tables (tibbles):
library(purrr) library(dplyr) # given a list of several species names, lookup the identifiers lookups <- c("Abies procera", "Pinus contorta") df <- tibble(lookups, taxonId = map_chr(lookups, dyntaxa_id_from_name)) df # the opposite - given a list of several identifiers, lookup the names keys <- df$taxonId tibble(keys, scientificName = map_chr(keys, dyntaxa_name_from_id)) # combine classifications from several identifiers res <- map(keys, dyntaxa_classification) names(res) <- keys bind_rows(res, .id = "key") # combine downstream taxa for several identifiers map(keys, dyntaxa_downstream) %>% bind_rows(.id = "key") # taxonomic immediate children for several identifiers map(keys, dyntaxa_children) %>% bind_rows(.id = "key") # search fulltext index for several terms dyntaxa_search_vernacular("blåklocka OR vitsippa") %>% select(taxonId, scientificName, vernacularName) # alternatively, using purrr terms <- paste0("vernacularName:", c("blåklocka", "vitsippa")) map_df(terms, dyntaxa_search_all) # find synonyms for several identifiers keys <- c(260606, 5509) names(keys) <- map_chr(keys, dyntaxa_name_from_id) map(keys, dyntaxa_synonyms) %>% bind_rows(.id = "acceptedName")
The dyntaxa data used in the package can be exported to a darwin core archive file and saved locally.
# exporting data, saving locally dyntaxa_export()
If needed, an R object with the darwin core archive contents can be accessed directly in tabular format.
# accessing an R object directly with all the darwin core archive contents d <- dyntaxa_dwca() dt <- d$taxon_core dr <- d$reference_ext dv <- d$vernacular_ext di <- d$identifier_ext dd <- d$distribution_ext #meta <- d$meta #eml <- d$eml
We can inspect individual records for this data.
Here is the data for one specific taxon record.
dt %>% head(1) %>% t()
Here is data for one specific vernacular names record.
dv %>% head(1) %>% t()
Here is data for one specific literature reference record.
dr %>% head(1) %>% t()
Here are some example statements that can be used to investigate some aspects of the Dyntaxa dataset.
It shows how to calculate various aggregate numbers, such as counts showing the cardinality of various dimensions present in the taxonomic data.
dt %>% count(taxonId) %>% arrange(desc(n)) dt %>% count(parentNameUsageID) %>% arrange(desc(n)) dt %>% count(taxonRank) %>% arrange(desc(n)) dt %>% count(scientificNameAuthorship) %>% arrange(desc(n)) dt %>% count(nomenclaturalStatus) %>% arrange(desc(n)) dt %>% count(taxonomicStatus) %>% arrange(desc(n)) dv %>% count(language, countryCode) dv %>% filter(language == "en") dv %>% count(source) dv %>% count(isPreferredName) dv %>% count(taxonRemarks) %>% arrange(desc(n)) dr %>% count(title) %>% arrange(desc(n)) dr %>% count(date) %>% arrange(desc(date)) dr %>% count(taxonId) %>% arrange(desc(n)) # taxon ids are sometimes strings and sometimes numbers #dr %>% filter(taxonId == "urn:lsid:dyntaxa.se:TaxonName:153491") #dr %>% filter(taxonId == 2002323)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.