taxalight provides a lightweight, lightning-fast way to resolve taxonomic identifiers to taxonomic names, and vice versa, using a Lightning Memory-Mapped Database (LMDB) backend. Compared to taxadb, it has fewer dependencies, fewer functions, and faster performance.
If you just need to resolve scientific names to identifiers and vice versa, taxalight is a fast and simple option. taxalight currently supports names from the Integrated Taxonomic Information System (ITIS), the National Center for Biotechnology Information (NCBI), the Global Biodiversity Information Facility (GBIF), the Catalogue of Life (COL), and the Open Tree Taxonomy (OTT). Like taxadb, taxalight uses stable, versioned annual snapshots from these providers and presents the naming data in the simple, consistent tabular format of the Darwin Core Standard.
You can install the released version of taxalight from CRAN with:

```r
install.packages("taxalight")
```

And the development version from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("cboettig/taxalight")
```
taxalight first needs to download and import the provider naming databases. This can take a while, but only needs to be done once.

```r
library(taxalight)
tl_create("itis")
```
Now we can look up species by names, IDs, or a mix of both. Even vernacular (common) names are recognized as keys. Note, though, that only exact matches are supported! ITIS ("itis") is the default provider, but GBIF, COL, OTT, and NCBI are also available.

```r
tl("Homo sapiens", provider = "itis")
```

```r
# A mix of ITIS IDs, a scientific name, and a vernacular name
id <- c("ITIS:180092", "ITIS:179913", "Dendrocygna autumnalis", "Snow Goose")
tl(id, provider = "itis")
```
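The lookup behaviour described above (IDs, scientific names, and vernacular names all usable as keys, with exact matches only) can be pictured as a plain key-value store. The sketch below is a hypothetical illustration in base R: it uses an environment as a hash table in place of LMDB, and the keys, record values, and `lookup_exact()` helper are invented for illustration. It is not a description of taxalight's internals.

```r
# Hypothetical illustration: an R environment used as a key-value store,
# standing in for the LMDB backend. All keys and values are made up.
store <- new.env(hash = TRUE)

# One record can be reachable under several keys:
# an ID, a scientific name, and a vernacular name.
record <- list(taxonID = "X:1", scientificName = "Genus species",
               vernacularName = "common example")
for (key in c(record$taxonID, record$scientificName, record$vernacularName)) {
  assign(key, record, envir = store)
}

# Exact-match lookup: returns the stored record, or NULL if the key is absent.
lookup_exact <- function(keys, store) {
  setNames(lapply(keys, function(k) get0(k, envir = store, inherits = FALSE)),
           keys)
}

res <- lookup_exact(c("X:1", "common example", "genus species"), store)
str(res)
```

The last key finds nothing only because of its capitalisation, which is why inputs must match the provider's spelling exactly.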
For convenience, we can request just the name or ID as a character vector (paralleling functionality in taxize). If a name is recognized as an accepted name, get_ids() returns the corresponding ID for that provider.

```r
get_ids("Homo sapiens")
get_names("ITIS:179913")
```
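The accepted-name rule above can be sketched against a small Darwin Core-style table. Only the column names (`taxonID`, `scientificName`, `taxonomicStatus`, `acceptedNameUsageID`) follow Darwin Core terms; the rows, IDs, and the `get_ids_sketch()` helper are hypothetical, and returning `NA` for non-accepted or unknown names is an assumption of this sketch rather than documented taxalight behaviour.

```r
# Hypothetical Darwin Core-style table; IDs and names are invented.
dwc <- data.frame(
  taxonID             = c("X:1", "X:2", "X:3"),
  scientificName      = c("Genus species", "Genus oldname", "Genus other"),
  taxonomicStatus     = c("accepted", "synonym", "accepted"),
  acceptedNameUsageID = c("X:1", "X:1", "X:3"),
  stringsAsFactors    = FALSE
)

# Return the provider ID for each name that exactly matches an accepted name;
# synonyms and unknown names become NA (an assumption of this sketch).
get_ids_sketch <- function(names, dwc) {
  accepted <- dwc[dwc$taxonomicStatus == "accepted", ]
  accepted$taxonID[match(names, accepted$scientificName)]
}

out <- get_ids_sketch(c("Genus species", "Genus oldname", "Unknown name"), dwc)
out
```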
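The following benchmark times equivalent queries for four ITIS names with taxadb and with taxalight. It assumes the taxadb and bench packages are installed and that ITIS has already been imported with tl_create("itis").

```r
library(bench)
sp <- c("Dendrocygna autumnalis", "Dendrocygna bicolor", "Chen canagica", "Chen caerulescens")
taxadb::td_create("itis", schema = "dwc")  # one-time taxadb import

# Full records
bench::bench_time(df_tb <- taxadb::filter_name(sp, "itis"))
df_tb
bench::bench_time(df_tl <- taxalight::tl(sp, "itis"))
df_tl
# IDs only
bench::bench_time(id_tb <- taxadb::get_ids(sp, "itis"))
id_tb
bench::bench_time(id_tl <- taxalight::get_ids(sp, "itis"))
id_tl
```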
Under the hood, taxalight consumes a DCAT2/PROV-O-based description of the data provenance, which records how the standard-format tables imported by taxalight (and taxadb) were generated from the original data published by the naming providers. All data and scripts are identified by content-based identifiers, which can be resolved by https://hash-archive.org or by the R package contentid. Unlike a URL, whose content can change or disappear, a content identifier always refers to exactly the same bytes, so the data can be verified and retrieved from any source that holds a copy.
Input data and scripts for transforming the data into the desired format are similarly archived and referenced by content identifiers in the provenance trace.
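To illustrate what a content-based identifier is, the sketch below hashes a file's bytes and formats the digest as a `hash://<algorithm>/<hex>` URI, the style of identifier used by contentid and hash-archive.org. contentid uses SHA-256 by default; base R's `tools::md5sum()` is used here only so the example needs no extra packages, and `content_uri()` is a hypothetical helper, not part of contentid.

```r
# Hypothetical helper: build a hash-URI-style identifier from a file's bytes.
# Uses MD5 from base R's tools package for portability; contentid itself
# defaults to SHA-256.
content_uri <- function(path) {
  paste0("hash://md5/", unname(tools::md5sum(path)))
}

# Files with identical bytes get the same identifier regardless of their
# names or locations; changing a single byte changes the identifier.
a <- tempfile(fileext = ".csv")
b <- tempfile(fileext = ".csv")
c_file <- tempfile(fileext = ".csv")
writeLines(c("taxonID,scientificName", "X:1,Genus species"), a)
writeLines(c("taxonID,scientificName", "X:1,Genus species"), b)
writeLines(c("taxonID,scientificName", "X:1,Genus specie"), c_file)

content_uri(a) == content_uri(b)       # same bytes, same identifier
content_uri(a) == content_uri(c_file)  # different bytes, different identifier
```

Because the identifier depends only on the bytes, a copy fetched from any mirror can be checked against it before use.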