MarineSPEED quickstart guide

The goal of MarineSPEED is to provide a benchmark data set for presence-only species distribution modeling (SDM) in order to facilitate reproducible and comparable SDM research. It contains species occurrences (coordinates) from a wide diversity of marine species and associated environmental data from Bio-ORACLE and MARSPEC. Some additional information about MarineSPEED can be found in the R Shiny viewer at http://marinespeed.org.

Exploring the data

Three functions help with exploring the data.

library(marinespeed)

# set a data directory, preferably something different from tempdir to avoid 
# unnecessary downloads for every R session
options(marinespeed_datadir = tempdir())

# list all species
species <- list_species()
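If you want downloaded files to persist between R sessions, you can point marinespeed_datadir at a per-user directory instead of tempdir(). The sketch below is one way to do this. It assumes R >= 4.0.0 (for tools::R_user_dir), and the choice of directory is ours, not something the package prescribes.

```r
# A minimal sketch: store MarineSPEED downloads in a persistent per-user
# data directory instead of tempdir(), so they are reused across sessions.
# Assumption: R >= 4.0.0 (tools::R_user_dir); the location is our own choice.
datadir <- tools::R_user_dir("marinespeed", which = "data")
dir.create(datadir, showWarnings = FALSE, recursive = TRUE)
options(marinespeed_datadir = datadir)

# Confirm which directory will be used
getOption("marinespeed_datadir")
```

This replaces the options(marinespeed_datadir = tempdir()) call shown earlier. Set it before calling list_species() so that every download lands in the same place.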

The first 5 species and their aphia_id (WoRMS species ID) are:

knitr::kable(species[1:5,], row.names = FALSE)

The species information consists of species identifiers, taxonomic information from the World Register of Marine Species (WoRMS), a visual assessment score for the amount of sampling bias, and the latitudinal zones covered by each species.

# all species information
info <- species_info()
colnames(info)

Looping over all species data

To loop over the occurrence data of all species, call the lapply_species function. For instance, to count the total number of records in MarineSPEED you would use the following code. As you can see, the function passed to lapply_species expects two parameters: one for the species name and one for the actual occurrences.

get_occ_count <- function(speciesname, occ) {
  nrow(occ)
}
record_counts <- lapply_species(get_occ_count)
sum(unlist(record_counts))
## [1] 868151
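The same pattern works for any per-species summary. Below is a sketch of a hypothetical callback, occ_summary, that returns one row per species with the record count and latitudinal extent. It assumes that occ is a data frame with numeric longitude and latitude columns, the same columns used by the fold plotting code later in this guide. Testing the callback on a small made-up table first avoids waiting on a full run.

```r
# Hypothetical callback for lapply_species(): summarise one species.
# Assumption: `occ` is a data frame with numeric "longitude" and "latitude"
# columns (the columns used by the fold plotting code in this guide).
occ_summary <- function(speciesname, occ) {
  data.frame(species = speciesname,
             records = nrow(occ),
             min_lat = min(occ$latitude),
             max_lat = max(occ$latitude),
             stringsAsFactors = FALSE)
}

# Check the callback on a small made-up occurrence table
toy_occ <- data.frame(longitude = c(2.1, 3.5, -1.0),
                      latitude  = c(51.2, 50.8, 43.0))
occ_summary("Toy species", toy_occ)

# With the MarineSPEED data available, apply it to every species and
# bind the per-species rows into a single table:
# summaries <- do.call(rbind, lapply_species(occ_summary))
```

Returning a one-row data frame rather than a single number makes it easy to combine the list returned by lapply_species into one table with do.call(rbind, ...).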

Cross-validation

To enable the use of the same cross-validation k-fold datasets, I split the species occurrence data upfront into 5 folds (or 4 and 9 for grid) in 3 different ways:

The code below plots the training (blue) and test (red) occurrences for the first two disc folds of the first two species.

## plot first 2 disc folds for the first 2 species (blue=training, red=test)
plot_occurrences <- function(speciesname, data, fold) {
  title <- paste0(speciesname, " (fold = ", fold, ")")
  plot(data$occurrence_train[,c("longitude", "latitude")], pch=20, col="blue",
       main = title)
  points(data$occurrence_test[,c("longitude", "latitude")], pch=20, col="red")
}

lapply_kfold_species(plot_occurrences, species=species[1:2,],
                     fold_type = "disc", k = 1:2)
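Any function with the same three parameters (species name, fold data, fold number) can be passed to lapply_kfold_species. The sketch below uses a hypothetical callback, fold_sizes, that records how many training and test occurrences each fold contains. This is a quick sanity check before fitting models. It assumes, as in the plotting example above, that data holds the data frames occurrence_train and occurrence_test. The structure of the list returned by lapply_kfold_species is not described here, so inspect it with str() before combining the results.

```r
# Hypothetical callback for lapply_kfold_species(): count the training and
# test occurrences in one fold.
# Assumption: `data` is a list with data frames `occurrence_train` and
# `occurrence_test`, as used by plot_occurrences above.
fold_sizes <- function(speciesname, data, fold) {
  data.frame(species = speciesname,
             fold    = fold,
             n_train = nrow(data$occurrence_train),
             n_test  = nrow(data$occurrence_test),
             stringsAsFactors = FALSE)
}

# Made-up fold data to check the callback offline
toy_fold <- list(
  occurrence_train = data.frame(longitude = c(1, 2, 3, 4),
                                latitude  = c(50, 51, 52, 53)),
  occurrence_test  = data.frame(longitude = 5, latitude = 54)
)
fold_sizes("Toy species", toy_fold, 1)

# With marinespeed, the same call pattern as the plotting example:
# sizes <- lapply_kfold_species(fold_sizes, species = species[1:2, ],
#                               fold_type = "disc", k = 1:2)
# str(sizes)
```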

Lower level functions




marinespeed documentation built on May 1, 2019, 10:26 p.m.