taxadb
is designed to work with a variety of different "backends" -- software that works under the hood to store and retrieve the requested data. taxadb
has an intelligent default method selector which will attempt to use the best method available on your system, which means you can use taxadb
without having to worry about these details. However, becoming familiar with these backends can yield significant improvements in performance.
RSQLite
is the default database backend if no suggested backend is detected. RSQLite
has no external software dependencies and will be automatically installed with taxadb
(it is a hard dependency: an imported rather than a suggested package). The term Lite
indicates that SQLite does not require the separate "server" and "client" software model found in traditional databases such as MySQL. SQLite is widely used in consumer software. RSQLite packages SQLite for R. It enables persistent local storage for R applications but is slower than the alternatives described here, significantly so for certain operations.
MonetDBLite
is a modern alternative to RSQLite
. MonetDBLite
is more powerful than SQLite (it supports a greater array of operations) and can run much faster. Filtering joins in particular can be much faster even than the in-memory operations of dplyr
. Because filtering joins lie at the heart of many taxadb
functions, this can yield substantial improvements in performance. Unfortunately, the R interface, MonetDBLite
, was removed from CRAN in April 2019. The package can still be installed from GitHub by running devtools::install_github("hannesmuehleisen/MonetDBLite-R")
, though this requires the appropriate compilers. The developer plans to replace MonetDBLite with duckdb
(see https://github.com/duckdb/duckdb), but it is not yet feature complete and thus not yet fully compatible with taxadb
. Because installation is more difficult, MonetDBLite
is not a required dependency, but will be used by default if taxadb
detects an existing installation. duckdb
support will be switched on as the first priority in the method waterfall.
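To see which of the backends discussed above are installed on a given machine, a quick check along the following lines may help. This is only a sketch: it reports package availability using base R and does not reproduce the selection logic that taxadb itself applies.

```r
# Backend packages named in this document.
backends <- c("duckdb", "MonetDBLite", "RSQLite")

# requireNamespace() returns TRUE when a package can be loaded and
# FALSE otherwise, without attaching it to the search path.
installed <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)

print(installed)
```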
taxadb
can also be set to run in memory only, without a backend. (Note that this is distinct from using RSQLite
or MonetDBLite
in their in-memory
mode, because it uses only native R data.frame
s to store data.) This will tend to be faster than RSQLite
but slower than MonetDBLite
or duckdb
. In this mode, data will persist over a single session but not between sessions (since memory is cleared when the user quits out of R). Note that many taxonomic tables are quite large when uncompressed, and users with less than 8-16GB of free RAM may find their machine becomes slow or unresponsive in this mode.
Users can override the automatic preferences of taxadb
by setting the environment variable TAXADB_DRIVER
. For example, running Sys.setenv(TAXADB_DRIVER="RSQLite")
will make RSQLite
the default driver, even if MonetDBLite
is installed.
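A minimal sketch of this override is shown below. It uses only base R to set and read back the variable. Setting it before the first taxadb call in a session is a precaution assumed here, not something the text above states.

```r
# Ask taxadb to prefer RSQLite, as described above.
Sys.setenv(TAXADB_DRIVER = "RSQLite")

# Read the value back to confirm that it is set for this session.
driver <- Sys.getenv("TAXADB_DRIVER")
print(driver)
```

Because Sys.setenv() only affects the running session, the variable would need to be set again in each new session (or in a startup file such as .Renviron) to keep the preference.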
The first time taxadb
accesses a data source, it will download and store the full dataset from that provider. Users can trigger a download ahead of time by running td_create()
, e.g. td_create("fb")
will create a local copy of the FishBase taxonomy. If a user does not call td_create()
first, taxadb
simply downloads the data the first time that provider is queried -- e.g. filter_name("Homo sapiens", "gbif")
will first download and install GBIF if that has not been done already. These download and install operations may be slow depending on your internet connection, but need to be performed only once. Downloaded data is stored on your local hard disk and persists between R sessions. The default location follows the conventions of your operating system (see the rappdirs
package). Users can configure this location by setting the environment variable TAXADB_HOME
. For example, all unit tests in the package use temporary storage by setting Sys.setenv(TAXADB_HOME=tempdir())
, which is cleared out after the R session ends.
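The storage and download settings above can be combined as in the following sketch. The directory name taxadb-demo and the download_now switch are hypothetical, introduced only for illustration. The download itself is left switched off because it needs the taxadb package and a network connection.

```r
# Hypothetical throwaway storage location inside the session's temp dir.
storage_dir <- file.path(tempdir(), "taxadb-demo")
dir.create(storage_dir, showWarnings = FALSE, recursive = TRUE)

# Point taxadb at that location for this session.
Sys.setenv(TAXADB_HOME = storage_dir)

# Illustrative switch: set to TRUE to fetch the FishBase taxonomy now
# rather than on first query.
download_now <- FALSE
if (download_now && requireNamespace("taxadb", quietly = TRUE)) {
  taxadb::td_create("fb")
}
```

Anything stored under tempdir() disappears when the R session ends, so a permanent directory is the better choice for data that should persist.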
A user can install all available name providers up front with td_create("all")
. An overview of the available scientific name providers is found in the providers vignette.
taxadb
will work just as well with any DBI
-compatible database backend (Postgres, MariaDB, etc). All taxadb
functions take an argument taxadb_db
, which is just a DBI
connection used by dplyr
. For example, we can create an in-memory RSQLite connection and use that to store data for a single session:
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
taxadb::get_ids("Homo sapiens", taxadb_db = con)
Users can also call the td_connect()
function to connect to taxadb
's default databases. Running td_connect()
with no arguments will return the current default connection. This is a convenient way to confirm that your system is using the database engine you intended it to use. You can also use that connection to interact directly with the taxadb
databases (e.g. using dplyr
or DBI
functions).
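As a rough sketch of that check, the class of the object returned by td_connect() indicates which engine is in use, and DBI::dbListTables() shows which tables have been installed so far. The backend_class variable is introduced here only for illustration, and the block is skipped when taxadb is not installed.

```r
# Stays NA when taxadb is not available on this machine.
backend_class <- NA_character_

if (requireNamespace("taxadb", quietly = TRUE)) {
  # With no arguments, td_connect() returns the current default connection.
  db <- taxadb::td_connect()

  # The class name reflects the database engine behind the connection.
  backend_class <- as.character(class(db))[1]

  # List the tables currently stored in the local taxadb database.
  print(DBI::dbListTables(db))
}

print(backend_class)
```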