The goal of MarineSPEED is to provide a benchmark data set for presence-only species distribution modeling (SDM) in order to facilitate reproducible and comparable SDM research. It contains species occurrences (coordinates) from a wide diversity of marine species and associated environmental data from Bio-ORACLE and MARSPEC. Some additional information about MarineSPEED can be found in the R Shiny viewer at http://marinespeed.org.
Three functions help with exploring the data.
library(marinespeed)

# set a data directory, preferably something different from tempdir to avoid
# unnecessary downloads for every R session
options(marinespeed_datadir = tempdir())

# list all species
species <- list_species()
The first 5 species and their aphia_id (WoRMS species id) are:
knitr::kable(species[1:5,], row.names = FALSE)
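The setup code recommends a data directory other than tempdir(), so that data is not downloaded again in every R session. A minimal sketch of such a setup follows. The folder name marinespeed_data in the home directory is a hypothetical choice, not a package default; any writable directory should work.

```r
# Hypothetical persistent location (an assumption, not a package default)
datadir <- file.path(path.expand("~"), "marinespeed_data")

# Create the directory if it does not exist yet
dir.create(datadir, recursive = TRUE, showWarnings = FALSE)

# Point marinespeed at it via the option used in the setup code
options(marinespeed_datadir = datadir)
```

Placing these lines in a project start-up script or in .Rprofile would apply the setting to every session.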
The species information consists of species identifiers, taxonomic information from the World Register of Marine Species (WoRMS), a visual assessment score for the amount of sampling bias, and the covered latitudinal zones.
# all species information
info <- species_info()
colnames(info)
To loop over the occurrence data of all species, call the lapply_species function. For instance, to count the total number of records in MarineSPEED you would use the following code. As you can see, the function passed to lapply_species expects two parameters: one for the species name and one for the actual occurrences.
get_occ_count <- function(speciesname, occ) {
  nrow(occ)
}
record_counts <- lapply_species(get_occ_count)
sum(unlist(record_counts))
868151
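The same two-parameter callback pattern can return more than a bare count. The sketch below defines a hypothetical helper, summarise_occ, that returns one summary row per species, and checks it on made-up toy occurrences so that it runs without downloading data. The call to lapply_species is left commented out, and combining its result with rbind assumes that it returns a plain list with one element per species.

```r
# Callback with the (speciesname, occ) signature passed to lapply_species();
# returns a one-row summary instead of a bare count.
summarise_occ <- function(speciesname, occ) {
  data.frame(species = speciesname,
             n_records = nrow(occ),
             stringsAsFactors = FALSE)
}

# Toy occurrences with made-up coordinates, only to exercise the callback
toy_occ <- data.frame(longitude = c(2.5, 3.1, 4.0),
                      latitude = c(51.2, 51.4, 52.0))
toy_summary <- summarise_occ("Toy species", toy_occ)
print(toy_summary)

# With marinespeed loaded, the callback could be applied to every species
# (assumes the result is a list with one element per species):
# counts <- do.call(rbind, lapply_species(summarise_occ))
```

Keeping the species name next to each count makes it easier to sort or filter species by sample size afterwards.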
To enable the use of the same cross-validation k-fold datasets, I split the species occurrence data upfront into 5 folds (or 4 and 9 for grid) in 3 different ways.
The code below plots the training (blue) and test (red) occurrences for the first two disc folds of the first two species.
## plot first 2 disc folds for the first 2 species (blue=training, red=test)
## this version writes each plot to a JPEG file
plot_occurrences <- function(speciesname, data, fold) {
  fname <- paste0(sub(" ", "_", speciesname), fold, ".jpeg")
  jpeg(filename = fname)
  title <- paste0(speciesname, " (fold = ", fold, ")")
  plot(data$occurrence_train[,c("longitude", "latitude")], pch=20, col="blue",
       main = title)
  points(data$occurrence_test[,c("longitude", "latitude")], pch=20, col="red")
  dev.off()
}
x <- lapply_kfold_species(plot_occurrences, species=species[1:2,],
                          fold_type = "disc", k = 1:2)
## plot first 2 disc folds for the first 2 species (blue=training, red=test)
plot_occurrences <- function(speciesname, data, fold) {
  title <- paste0(speciesname, " (fold = ", fold, ")")
  plot(data$occurrence_train[,c("longitude", "latitude")], pch=20, col="blue",
       main = title)
  points(data$occurrence_test[,c("longitude", "latitude")], pch=20, col="red")
}
lapply_kfold_species(plot_occurrences, species=species[1:2,],
                     fold_type = "disc", k = 1:2)
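The three-parameter callback used for plotting can also summarise each fold. The sketch below defines a hypothetical helper, fold_sizes, that reports how many training and test records a fold contains. It is checked on a made-up toy list that mimics the occurrence_train and occurrence_test elements used in the plotting code. The call to lapply_kfold_species is left commented out because the shape of its return value is not described here.

```r
# Callback with the (speciesname, data, fold) signature used above;
# reports the number of training and test records in one fold.
fold_sizes <- function(speciesname, data, fold) {
  data.frame(species = speciesname,
             fold = fold,
             n_train = nrow(data$occurrence_train),
             n_test = nrow(data$occurrence_test),
             stringsAsFactors = FALSE)
}

# Toy fold with made-up coordinates, only to exercise the callback
toy_fold <- list(
  occurrence_train = data.frame(longitude = c(1, 2, 3, 4),
                                latitude = c(50, 51, 52, 53)),
  occurrence_test = data.frame(longitude = 5, latitude = 54)
)
toy_sizes <- fold_sizes("Toy species", toy_fold, 1)
print(toy_sizes)

# With marinespeed loaded and `species` from list_species():
# sizes <- lapply_kfold_species(fold_sizes, species = species[1:2, ],
#                               fold_type = "disc", k = 1:2)
# str(sizes)  # inspect the structure before combining the results
```

Checking fold sizes before fitting models is a quick way to spot folds with very few test records.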