This document describes how to use the aquamapsdata R data package to access curated data through a static database assembled from data sourced from https://aquamaps.org

Immediately after installing the package, a run-once action is needed, in order to download and locally create the SQLite database containing all the AquaMaps data. A small minified limited (< 1MB) variant of the database is included in the package to simplify package development, and can be activated by using default_db("extdata").

Approximately 10G disk space is needed locally when remotely downloading the database file. The download is around 2G compressed and therefore a speedy Internet connection is recommended for this initial step.

# install aquamapsdata from GitHub using devtools

install.packages("devtools") 
library("devtools")
install_gitub("raquamaps/aquamapsdata", dependencies = TRUE)

# initial run-once step required to install remote db locally

library(aquamapsdata)
download_db(force = TRUE)
default_db("sqlite")

Once the database is available locally, it can be queried using a couple of different functions.

Please remember to begin your session by activating the connection to the downloaded database:

library(aquamapsdata)
default_db("sqlite")

Examples of usage

This package provides data that can be queried with tidyverse tools such as dplyr.

It also requires some spatial tools (sp and raster) to be installed.

library(aquamapsdata)
library(dplyr)

# This vignette is built using a minified offline db bundled into the package,
# so vignettes can be built in the cloud without requiring a 
# full install and download of the database (time saver)

invisible(default_db("extdata"))

# NB: normally download_db() would be used first, followed by 
# default_db("sqlite")

Taxonomy can be searched and queried using fuzzy and exact name searches, returning keys with the internal identifiers used in the database.

Those keys could be said to represent species lists that can be used to retrieve other information, such as environmental envelopes etc.

# fuzzy search allows full text search operators AND, OR, NOT and +
# see https://www.sqlitetutorial.net/sqlite-full-text-search/
am_search_fuzzy(search_term = "trevally") %>% pull(key)

# exact search without parameters returns all results
nrow(am_search_exact())

# exact search giving NULL params shows examples of existing values
# here we see what combinations are present in the dataset for 
# angling, diving, dangerous, highseas, deepwater organisms
am_search_exact(
  angling = NULL, diving = NULL, dangerous = NULL, 
  deepwater = NULL, highseas = NULL, m_invertebrates = NULL)

# exact search without NULL params, specifying values
hits <- 
  am_search_exact(angling = 1, diving = 1, dangerous = 0)

# display results
display <- 
  hits %>% mutate(binomen = paste(Genus, Species)) %>%
  select(SpeciesID, binomen, SpecCode, FBname)

knitr::kable(display)

Species maps

With a species identifier, probability of occurrence within the known native distribution for a species can either be retrieved in raster format or be displayed on a map .

Here we display the computer-generated native map for the Bluespotted trevally:

library(leaflet)
library(raster)
library(aquamapsdata)

# get the identifier for the species
key <- am_search_fuzzy("bluespotted")$key
ras <- am_raster(key)

# show the native habitat map
am_map_leaflet(ras, title = "Bluespotted trevally") %>%
  leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2)

Source: r am_citation("md")

Including attribution

Attribution with citation information and copyright disclaimer should be included. There is a function called am_citation() which provides this information in "text" or "md" format (suitable for use inside Rmarkdown documents).

# include a citation in text format

am_citation()

Biodiversity Maps

We can also display a map using several identifiers, for example those associated with the genus "Caranx", and should then provide an aggregation function such as "count".

keys <- am_search_exact(Genus = "Caranx")$SpeciesID

ras <- am_raster(keys, fun = "count")

am_map_leaflet(ras, title = "Caranx") %>%
  leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2)

am_citation("md")

For biodiversity maps within a specific bounding box, see the function am_csc_from_extent, which provides biodiversity map data for all species, optionally filtered by a user-defined probability threshold (where 0.5 is the default used at aquamaps.org, see the help for functions am_species_in_csc and am_species_per_csc).

Other usage examples

A few examples follow below and describe how the data in the database can be queried using the provided functions in the package.

Locations in cells and the half degree cell "authority file" table

Likely occurrences for a species is provided for locations in a half degree cell grid. Individual cells have characteristics associated, in a Half degree Cell Authority File, with data available through the am_hcaf() function.

Information about this table is available in the help with fields explained in the am_meta dataset.

A subset of HCAF records can be retrieved based on a certain criteria or field in that table.

am_hcaf() %>% head(1) %>% collect() %>% names()

# compute depth across all cells
am_hcaf() %>% 
  summarize(depth = mean(DepthMean, na.rm = TRUE)) %>% 
  collect() %>% 
  pull(depth)

# cells with a depth value larger than 4000
deepwater <- 
  am_hcaf() %>% filter(DepthMean > 4000) %>% pull(CsquareCode)

# some of the on average deepest locations
deepwater

The cell location identifier CsquareCode can be used to look up what species are likely occuring there.

# species likely to occur in deepwater location(s)
deepwater_species <- am_species_in_csc(deepwater, min_prob = 0.5)
deepwater_species

key <- deepwater_species$SpeciesID
am_search_exact(SpeciesID = key)

Species preferences or environmental envelope

The am_hspen() function provides the input data used to generate a species’environmental envelopes and the envelopes themselves. Information about this table is available in the help with fields explained in the am_meta dataset.

HSPEN data can be queried for example based on taxonomy for a single species or higher taxa associated with that species, or any other relevant species identifiers.

# use one or more keys for species
key <- am_species_in_csc(deepwater, min_prob = 0.5)$SpeciesID
am_hspen() %>% filter(SpeciesID == key) %>% head(1) %>% collapse%>% glimpse()

# for higher taxa - find the keys associated with higher taxa 
# am_search_exact(Family = am_search_exact(SpeciesID = key)$Family)

Species and locations

In the beginning of the vignette, the taxonomy name search functions were illustrated with some examples. These allow the user to get a list of available mapped species based on a certain criteria (e.g. taxonomic group), so that biodiversity (or species richness) for that group can be mapped.

We can also use a function to list species occurring in an area; a location identified by a single or multiple CsquareCode identifiers. Another function allows querying for species diversity or richness across a set of cells. Both functions allows for specifying a probability threshold as deemed fit.

A location can be determined based on a user defined bounding box or extent (given by four coordinates). Using am_hcaf() a set of cells can be determined based on other criteria, allowing retrieval of CsquareCode cell identifiers that belong to a specific LME, for example.

# get cell identifiers for a bounding box or extent
csc <- am_csc_from_extent(100, 120, -22, -7)$CsquareCode

# within in this area, the following species are listed appear, each in n cells
am_species_in_csc(csc)

# in each cell location, the following number of distinct species are likely
# a measure of species diversity or "richness"
am_species_per_csc(csc, min_prob = 0.8)

Data scope and content

Scope

Please note that the database file differs from the complete version available online at https://aquamaps.org in the following respects:

  1. The database is incomplete in terms of species mapped (26,399 /33,518). It is based on AquaMaps’ conservative rule of generating envelopes and predictions for species with >=10 ‘good cells’ and excludes records of data-poor species (i.e. endemic and/or rare species). Please contact the AquaMaps team directly if you want access to the complete dataset.

  2. The map data provided (hcaf_species_native table) give computer-generated predictions. Please contact the AquaMaps team directly if you need to access the latest reviewed/improved species maps.

  3. Map data showing future species distributions for 2050 and 2100 (under different RCP scenarios) are excluded. Please contact the AquaMaps team directly if you are interested in these datasets.

We strongly encourage partnering with the AquaMaps team for larger research projects or publications that would make intensive use of AquaMaps to ensure that you have access to the latest version and/or reviewed maps, that the limitations of the data set are clearly understood and addressed, and that critical maps and/or unlikely results are recognized as such and double-checked for correctness prior to drawing conclusions and/or subsequent publication.

The AquaMaps team can be contacted through Rainer Froese (rfroese@geomar.de) or Kristin Kaschner (Kristin.Kaschner@biologie.uni-freiburg.de).

Content

The dplyr package can be used to query the various tables available in the database.

Here is a description of tables and fields which are included.

knitr::kable(am_meta)

Data management

This section describes how the data was prepared for usage in this R package. It may be of interest maybe not primarily for package users, but for those interested in understanding the data preparation steps involved in preparing the dataset for use in this package.

For data management and preparation, several steps are involved in preparing the dataset used in this package. These steps involve moving the relevant parts of the source data from its primary source into a local SQLite3 database that the package uses.

Local replication of source database

The source data lives in a MySQL/MariaDB database. If this data is made available in the form of a backup from a raw datadir or, preferably, in the form of a data dump, this can be loaded into a local MariaDB database engine.

With docker-compose this can be done in one step, using the command docker-compose up -d and this docker-compose.yml file:

volumes:
  db:

services:

  db:
    image: mariadb:latest
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=your_root_db_password
      - MYSQL_DATABASE=aquamapsdb
      - MYSQL_USER=your_db_user
      - MYSQL_PASSWORD=your_db_password
    volumes:
      - db:/var/lib/mysql
      - ./aquamaps.sql:/docker-entrypoint-initdb.d/aquamaps.sql:ro

After this step, the data is available to access locally through aquamapsdata::am_con().

The "metadata" for table and field names and their descriptions is provided through aquamapsdata::am_meta which is prepared by means of (data-raw/am-meta.R). This metadata is used in package documentation and the am_search_exact() function allows for taxonomic searches using some of those fields.

Syncing into SQLite3

A set of functions then allows for syncing the data into an SQLite3 database with full text search support, which gets indexed.

Relevant steps are:

Exposing the data

The function am_search_exact takes a lot of parameters, which can be combined, to query the taxonomy in a single call.

The am_search_fuzzy is quick and allows FTS5 search syntax (search terms which can be quoted and also combined with AND, OR, NOT).

These functions returns search results containing keys or identifiers that can be used to retrieve map data in raster format through am_raster(). With such a raster a leaflet map can be created with am_map_leaflet().



raquamaps/aquamapsdata documentation built on Feb. 25, 2021, 10:28 p.m.