This document describes how to use the aquamapsdata
R data package to access curated data through a static database assembled from data sourced from https://aquamaps.org
Immediately after installing the package, a run-once action is needed, in order to download and locally create the SQLite database containing all the AquaMaps data. A small minified limited (< 1MB) variant of the database is included in the package to simplify package development, and can be activated by using default_db("extdata")
.
Approximately 10G disk space is needed locally when remotely downloading the database file. The download is around 2G compressed and therefore a speedy Internet connection is recommended for this initial step.
# install aquamapsdata from GitHub using devtools install.packages("devtools") library("devtools") install_gitub("raquamaps/aquamapsdata", dependencies = TRUE) # initial run-once step required to install remote db locally library(aquamapsdata) download_db(force = TRUE) default_db("sqlite")
Once the database is available locally, it can be queried using a couple of different functions.
Please remember to begin your session by activating the connection to the downloaded database:
library(aquamapsdata) default_db("sqlite")
This package provides data that can be queried with tidyverse tools such as dplyr
.
It also requires some spatial tools (sp
and raster
) to be installed.
library(aquamapsdata) library(dplyr) # This vignette is built using a minified offline db bundled into the package, # so vignettes can be built in the cloud without requiring a # full install and download of the database (time saver) invisible(default_db("extdata")) # NB: normally download_db() would be used first, followed by # default_db("sqlite")
Taxonomy can be searched and queried using fuzzy and exact name searches, returning keys with the internal identifiers used in the database.
Those keys could be said to represent species lists that can be used to retrieve other information, such as environmental envelopes etc.
# fuzzy search allows full text search operators AND, OR, NOT and + # see https://www.sqlitetutorial.net/sqlite-full-text-search/ am_search_fuzzy(search_term = "trevally") %>% pull(key) # exact search without parameters returns all results nrow(am_search_exact()) # exact search giving NULL params shows examples of existing values # here we see what combinations are present in the dataset for # angling, diving, dangerous, highseas, deepwater organisms am_search_exact( angling = NULL, diving = NULL, dangerous = NULL, deepwater = NULL, highseas = NULL, m_invertebrates = NULL) # exact search without NULL params, specifying values hits <- am_search_exact(angling = 1, diving = 1, dangerous = 0) # display results display <- hits %>% mutate(binomen = paste(Genus, Species)) %>% select(SpeciesID, binomen, SpecCode, FBname) knitr::kable(display)
With a species identifier, probability of occurrence within the known native distribution for a species can either be retrieved in raster format or be displayed on a map .
Here we display the computer-generated native map for the Bluespotted trevally:
library(leaflet) library(raster) library(aquamapsdata) # get the identifier for the species key <- am_search_fuzzy("bluespotted")$key ras <- am_raster(key) # show the native habitat map am_map_leaflet(ras, title = "Bluespotted trevally") %>% leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2)
Source: r am_citation("md")
Attribution with citation information and copyright disclaimer should be included. There is a function called am_citation()
which provides this information in "text" or "md" format (suitable for use inside Rmarkdown documents).
# include a citation in text format am_citation()
We can also display a map using several identifiers, for example those associated with the genus "Caranx", and should then provide an aggregation function such as "count".
keys <- am_search_exact(Genus = "Caranx")$SpeciesID ras <- am_raster(keys, fun = "count") am_map_leaflet(ras, title = "Caranx") %>% leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2) am_citation("md")
For biodiversity maps within a specific bounding box, see the function am_csc_from_extent
, which provides biodiversity map data for all species, optionally filtered by a user-defined probability threshold (where 0.5 is the default used at aquamaps.org, see the help for functions am_species_in_csc
and am_species_per_csc
).
A few examples follow below and describe how the data in the database can be queried using the provided functions in the package.
Likely occurrences for a species is provided for locations in a half degree cell grid. Individual cells have characteristics associated, in a Half degree Cell Authority File, with data available through the am_hcaf()
function.
Information about this table is available in the help with fields explained in the am_meta
dataset.
A subset of HCAF records can be retrieved based on a certain criteria or field in that table.
am_hcaf() %>% head(1) %>% collect() %>% names() # compute depth across all cells am_hcaf() %>% summarize(depth = mean(DepthMean, na.rm = TRUE)) %>% collect() %>% pull(depth) # cells with a depth value larger than 4000 deepwater <- am_hcaf() %>% filter(DepthMean > 4000) %>% pull(CsquareCode) # some of the on average deepest locations deepwater
The cell location identifier CsquareCode can be used to look up what species are likely occuring there.
# species likely to occur in deepwater location(s) deepwater_species <- am_species_in_csc(deepwater, min_prob = 0.5) deepwater_species key <- deepwater_species$SpeciesID am_search_exact(SpeciesID = key)
The am_hspen() function provides the input data used to generate a species’environmental envelopes and the envelopes themselves. Information about this table is available in the help with fields explained in the am_meta
dataset.
HSPEN data can be queried for example based on taxonomy for a single species or higher taxa associated with that species, or any other relevant species identifiers.
# use one or more keys for species key <- am_species_in_csc(deepwater, min_prob = 0.5)$SpeciesID am_hspen() %>% filter(SpeciesID == key) %>% head(1) %>% collapse%>% glimpse() # for higher taxa - find the keys associated with higher taxa # am_search_exact(Family = am_search_exact(SpeciesID = key)$Family)
In the beginning of the vignette, the taxonomy name search functions were illustrated with some examples. These allow the user to get a list of available mapped species based on a certain criteria (e.g. taxonomic group), so that biodiversity (or species richness) for that group can be mapped.
We can also use a function to list species occurring in an area; a location identified by a single or multiple CsquareCode identifiers. Another function allows querying for species diversity or richness across a set of cells. Both functions allows for specifying a probability threshold as deemed fit.
A location can be determined based on a user defined bounding box or extent (given by four coordinates). Using am_hcaf() a set of cells can be determined based on other criteria, allowing retrieval of CsquareCode cell identifiers that belong to a specific LME, for example.
# get cell identifiers for a bounding box or extent csc <- am_csc_from_extent(100, 120, -22, -7)$CsquareCode # within in this area, the following species are listed appear, each in n cells am_species_in_csc(csc) # in each cell location, the following number of distinct species are likely # a measure of species diversity or "richness" am_species_per_csc(csc, min_prob = 0.8)
Please note that the database file differs from the complete version available online at https://aquamaps.org in the following respects:
The database is incomplete in terms of species mapped (26,399 /33,518). It is based on AquaMaps’ conservative rule of generating envelopes and predictions for species with >=10 ‘good cells’ and excludes records of data-poor species (i.e. endemic and/or rare species). Please contact the AquaMaps team directly if you want access to the complete dataset.
The map data provided (hcaf_species_native table) give computer-generated predictions. Please contact the AquaMaps team directly if you need to access the latest reviewed/improved species maps.
Map data showing future species distributions for 2050 and 2100 (under different RCP scenarios) are excluded. Please contact the AquaMaps team directly if you are interested in these datasets.
We strongly encourage partnering with the AquaMaps team for larger research projects or publications that would make intensive use of AquaMaps to ensure that you have access to the latest version and/or reviewed maps, that the limitations of the data set are clearly understood and addressed, and that critical maps and/or unlikely results are recognized as such and double-checked for correctness prior to drawing conclusions and/or subsequent publication.
The AquaMaps team can be contacted through Rainer Froese (rfroese@geomar.de) or Kristin Kaschner (Kristin.Kaschner@biologie.uni-freiburg.de).
The dplyr
package can be used to query the various tables available in the database.
Here is a description of tables and fields which are included.
knitr::kable(am_meta)
This section describes how the data was prepared for usage in this R package. It may be of interest maybe not primarily for package users, but for those interested in understanding the data preparation steps involved in preparing the dataset for use in this package.
For data management and preparation, several steps are involved in preparing the dataset used in this package. These steps involve moving the relevant parts of the source data from its primary source into a local SQLite3 database that the package uses.
The source data lives in a MySQL/MariaDB database. If this data is made available in the form of a backup from a raw datadir or, preferably, in the form of a data dump, this can be loaded into a local MariaDB database engine.
With docker-compose
this can be done in one step, using the command docker-compose up -d
and this docker-compose.yml
file:
volumes: db: services: db: image: mariadb:latest ports: - "3306:3306" environment: - MYSQL_ROOT_PASSWORD=your_root_db_password - MYSQL_DATABASE=aquamapsdb - MYSQL_USER=your_db_user - MYSQL_PASSWORD=your_db_password volumes: - db:/var/lib/mysql - ./aquamaps.sql:/docker-entrypoint-initdb.d/aquamaps.sql:ro
After this step, the data is available to access locally through aquamapsdata::am_con()
.
The "metadata" for table and field names and their descriptions is provided through aquamapsdata::am_meta
which is prepared by means of (data-raw/am-meta.R). This metadata is used in package documentation and the am_search_exact()
function allows for taxonomic searches using some of those fields.
A set of functions then allows for syncing the data into an SQLite3 database with full text search support, which gets indexed.
Relevant steps are:
db_sync()
from source connection to target db.am_create_fts()
am_create_indexes()
The function am_search_exact
takes a lot of parameters, which can be combined, to query the taxonomy in a single call.
The am_search_fuzzy
is quick and allows FTS5 search syntax (search terms which can be quoted and also combined with AND, OR, NOT).
These functions returns search results containing keys or identifiers that can be used to retrieve map data in raster format through am_raster()
. With such a raster a leaflet map can be created with am_map_leaflet()
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.