In bioatlas/dyntaxa: Dyntaxa - Swedish Taxonomic Database Of Organisms in Sweden

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ""
)

Start by loading the package and download the Dyntaxa database for local use.

library(dyntaxa)

# the first time the package is loaded, it attempts to
# download and generate Dyntaxa data assets locally for use with the package
# and only after this, the various functions can then be used to query the Dyntaxa database
# this happens in the dyntaxa_init() function, which can be forced to download 
# and overwrite the locally persisted data in order to refresh it, like so:
dyntaxa_init(force = TRUE)

The data is available locally after the initial download through the dyntaxa_init() function, which will requires an internet connection in other to retrieve a remote Darwin Core Archive file with open data from the Dyntaxa database. After this initial download of the data, the package can be used off-line without Internet connectivity.

Basic usage

Basic usage examples follow, showing how to look up taxonomic data from the Dyntaxa database related to specific (single) taxonomic names or identifiers:

library(dyntaxa)

# what is the identifier for all of Animalia?
dyntaxa_id_from_name("Animalia")

# what is the rank for a given identifier?
dyntaxa_rank_from_id("5000001")

# what is the scientific name given a taxon identifier
dyntaxa_name_from_id("206046")

# show the upstream classification given the identifier for Alces alces
dyntaxa_classification(dyntaxa_id_from_name("Alces alces"))

# what are immediate children of Cervidae?
dyntaxa_children(dyntaxa_id_from_name("Cervidae"))

# what are all downstream taxa, not just the children?
dyntaxa_downstream(dyntaxa_id_from_name("Cervidae"))

Searching taxonomic data

The locally stored data has a full text search index generated, which can be used for taxonomic full text search queries.

library(dplyr)

# search all taxa, literature references and vernacular names at once
dyntaxa_search_all("blåklocka")


# search for hits where the taxon identifier is 220018
# note more than one hit, because we search 
# all of taxa, literature reference and vernacular name
# and there is more than one vernacular name for this taxon
dyntaxa_search_all(220018)


# so if you wish to return just the distinct set of taxon identifiers and names
dyntaxa_search_all(220018) %>% 
  distinct(taxonId, scientificName)


# search for hits with vernaculars that contain the string "räv"
# and display identifiers and names only
dyntaxa_search_vernacular("*räv*") %>%
  distinct(taxonId, scientificName, vernacularName)


# operators AND, OR, NOT can be used in searches
# search for hits where the identifier is 220018
dyntaxa_search_all("220018 AND taxonomicStatus:accepted") %>% 
  select(taxonId, scientificName, vernacularName)


# search for hits for identifier 220018
dyntaxa_search_id("220018")


# search for hits where title contains "Thomas Karlsson" and discard 
# hits that mention "Databas" in the title
dyntaxa_search_all("title:Thomas Karlsson NOT title:Databas")


# search for taxa with vernacular names matching 
# strings beginning with "fjällräv" OR "rödräv"
dyntaxa_search_vernacular("^rödräv* OR ^fjällräv*") %>% 
  distinct(taxonId, scientificName, vernacularName)

Fuzzy name matching

Sometimes a use case may involve checking that a given list of scientific names is not misspelled. The dyntaxa_resolve() function can be used for this purpose.

Here is a short example which shows a list of mostly misspelled scientific names and how these can be "cleaned" through the fuzzy matching provided in that function.

# this list have scientific names that are mostly misspelled
misnames <- c("Vulpe vulp", "Canis lupu", "Vulpes vulpez", "Apu apu", "Apus apus")

dyntaxa_resolve(misnames)

Automating lookups for lists with taxonomic names and/or identifiers

In general, when working with lists (or vectors) of several species names (or identifiers), using packages such as purrr and dplyr will enable convenient automated lookups.

The following usage examples shows how to automate a number of tasks for variable-length lists of taxonomic names or identifiers and return results as neat data tables (tibbles):

library(purrr)
library(dplyr)

# given a list of several species names, lookup the identifiers
lookups <- c("Abies procera", "Pinus contorta")
df <- tibble(lookups, taxonId = map_chr(lookups, dyntaxa_id_from_name))
df

# the opposite - given a list of several identifiers, lookup the names
keys <- df$taxonId
tibble(keys, scientificName = map_chr(keys, dyntaxa_name_from_id))


# combine classifications from several identifiers
res <- map(keys, dyntaxa_classification)
names(res) <- keys
bind_rows(res, .id = "key")

# combine downstream taxa for several identifiers
map(keys, dyntaxa_downstream) %>% 
  bind_rows(.id = "key")

# taxonomic immediate children for several identifiers
map(keys, dyntaxa_children) %>% bind_rows(.id = "key")

# search fulltext index for several terms
dyntaxa_search_vernacular("blåklocka OR vitsippa") %>% 
  select(taxonId, scientificName, vernacularName)

# alternatively, using purrr
terms <- paste0("vernacularName:", c("blåklocka", "vitsippa"))
map_df(terms, dyntaxa_search_all)

# find synonyms for several identifiers
keys <- c(260606, 5509)
names(keys) <- map_chr(keys, dyntaxa_name_from_id)
map(keys, dyntaxa_synonyms) %>% bind_rows(.id = "acceptedName")

Exporting data as a Darwin Core Archive file

The dyntaxa data used in the package can be exported to a darwin core archive file and saved locally.

# exporting data, saving locally
dyntaxa_export()

Accessing internal data as R object

If needed, an R object with the darwin core archive contents can be accessed directly in tabular format.

# accessing an R object directly with all the darwin core archive contents
d <- dyntaxa_dwca()

dt <- d$taxon_core
dr <- d$reference_ext
dv <- d$vernacular_ext
di <- d$identifier_ext
dd <- d$distribution_ext

#meta <- d$meta
#eml <- d$eml

We can inspect individual records for this data.

Here is the data for one specific taxon record.

dt %>% head(1) %>% t()

Here is data for one specific vernacular names record.

dv %>% head(1) %>% t()

Here is data for one specific literature reference record.

dr %>% head(1) %>% t()

Aggregates and counts

Here are some example statements that can be used to investigate some aspects of the Dyntaxa dataset.

It shows how to calculate various aggregate numbers, such as counts showing the cardinality of various dimensions present in the taxonomic data.

dt %>% count(taxonId) %>% arrange(desc(n)) 
dt %>% count(parentNameUsageID) %>% arrange(desc(n))
dt %>% count(taxonRank) %>% arrange(desc(n))
dt %>% count(scientificNameAuthorship) %>% arrange(desc(n))
dt %>% count(nomenclaturalStatus) %>% arrange(desc(n))
dt %>% count(taxonomicStatus) %>% arrange(desc(n))


dv %>% count(language, countryCode)
dv %>% filter(language == "en")
dv %>% count(source)
dv %>% count(isPreferredName)
dv %>% count(taxonRemarks) %>% arrange(desc(n))

dr %>% count(title) %>% arrange(desc(n))
dr %>% count(date) %>% arrange(desc(date))
dr %>% count(taxonId) %>% arrange(desc(n))

# taxon ids are sometimes strings and sometimes numbers
#dr %>% filter(taxonId == "urn:lsid:dyntaxa.se:TaxonName:153491")
#dr %>% filter(taxonId == 2002323)