library(knitr)

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(self.contained = FALSE)

Introduction

JNCC has produced species distribution model functions designed to be used with presence records from the NBN Atlas (https://nbnatlas.org/), previously the NBN Gateway.

The functions MultiBNG() and MultiLL() automate the species distribution modelling process when running models for multiple species. Once you specify the folders containing the relevant data and the arguments to pass to the SDMs() function, they iterate through each species in a supplied species list until all have been processed. They can also run models for different species in parallel, which is set through the 'mult_prssr' argument.

The workflow prepares the data according to the type of coordinates used (latitude/longitude vs British National Grid). As with the single species modelling (see the 'single species modelling' vignette), this can incorporate the random occurrence function for low resolution data and will create pseudo-absence points from a supplied mask. The points are then assessed against the environmental predictors in a supplied RasterStack. The ensemble model is customisable: through the models argument you can run any combination of the supported species distribution models, for example the MaxEnt, BioClim, SVM and RF models used in the example in this vignette.

You can also specify how many times each model is run and the proportion of your data to hold back as testing data for evaluating model performance.
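As a rough illustration of what holding back a proportion of records for testing means, the sketch below splits a toy set of presence records using base R. The object names (records, prop_test) and the coordinates are made up for this example, and the package's own splitting routine may work differently.

```r
# Toy presence records; the column names and values are illustrative only
set.seed(42)
records <- data.frame(easting  = runif(20, 400000, 450000),
                      northing = runif(20, 300000, 350000))

# Proportion of records held back for testing (compare prop_test_data)
prop_test <- 0.25

# Randomly choose the rows that form the testing set
n_test   <- round(nrow(records) * prop_test)
test_idx <- sample(seq_len(nrow(records)), size = n_test)

test_data  <- records[test_idx, ]
train_data <- records[-test_idx, ]

nrow(train_data)  # 15 records left for fitting
nrow(test_data)   # 5 records held back for evaluation
```

With 20 records and a proportion of 0.25, five records are set aside for evaluation and the remaining fifteen are used to fit the model.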

The function selects the best performing model for each species and returns the model itself, the distribution predicted by that model as a GeoTIFF file, and a csv file containing the evaluation of each model's performance for every run and every species. These are written to the folder specified in the out_flder argument of the function.
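To give a feel for how a best performing model might be chosen from an evaluation table, here is a minimal sketch in base R. The table layout, the column names (Species, Model, AUC), the scores and the rule of taking the highest mean score are all assumptions made for illustration; the package may use a different metric or selection rule.

```r
# Hypothetical evaluation table: one row per model run.
# Column names and scores are invented for this sketch.
evals <- data.frame(
  Species = "Notonecta_glauca",
  Model   = c("MaxEnt", "MaxEnt", "BioClim", "BioClim", "RF", "RF"),
  AUC     = c(0.81, 0.79, 0.70, 0.74, 0.88, 0.84),
  stringsAsFactors = FALSE
)

# Mean score per model across its runs
mean_auc <- aggregate(AUC ~ Model, data = evals, FUN = mean)

# The model with the highest mean score
best_model <- as.character(mean_auc$Model[which.max(mean_auc$AUC)])
best_model
```

Averaging over runs before comparing models reduces the chance that a single lucky split decides the outcome.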

Modelling using the multi-species functions

The main difference between the multiple species functions 'MultiLL' and 'MultiBNG' and the sdms function is that these are designed to iterate automatically through multiple species with minimal user input. Before using the function you will need to prepare:

  1. The dat_flder - A folder containing your species presence records. These should be txt or csv files exported from the NBN Gateway or NBN Atlas. Each file should contain data for a single species, and its name should match the corresponding entry in your species list in order to be recognised, e.g. 'Triturus cristatus' in the sp_list should have a corresponding data file named 'Triturus cristatus.csv' in the dat_flder.

  2. The bkgd_flder - This is the folder containing your background masks, if you are using these in your models to generate the pseudo-absences for the species. These should be raster files showing the background area in which pseudo-absence points will be placed. Each file should be named after the taxon group it applies to, e.g. 'amphibian', which is matched against the 'taxonGroup' variable in the data. If no background mask is supplied, then pseudo-absences will be generated from the variables layer.

  3. The vars RasterStack - A RasterStack of the environmental parameters to be used as predictor variables for the species range. This will be used to assess all the species you input into the species list.

  4. A sp_list - This is the list of species you wish to model, and the names should correspond to the names of the files found in your dat_flder. Where output files for a species in the list are already present in the specified out_flder, that species will not be modelled again, so as to avoid duplication. The list can be loaded from an Excel file and then converted into a list using the base::unlist() function, or simply created directly as in the example below.
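Before a long run it can be worth checking that every name in the species list has a matching input file and seeing which species would be skipped. The sketch below does this in base R inside a temporary directory. The output file name ('Sigara_dorsalis.tif') and the one-column species table are hypothetical, and the package's own matching logic may differ.

```r
# Work in a fresh temporary directory so nothing in your project is touched
base_dir  <- tempfile("sdm_demo_")
dat_flder <- file.path(base_dir, "Inputs")
out_flder <- file.path(base_dir, "Outputs")
dir.create(dat_flder, recursive = TRUE)
dir.create(out_flder, recursive = TRUE)

# A one-column table (e.g. as read from a spreadsheet) flattened to a vector
sp_table <- data.frame(species = c("Notonecta_glauca", "Sigara_dorsalis",
                                   "Triturus_cristatus"),
                       stringsAsFactors = FALSE)
sp_list <- unlist(sp_table, use.names = FALSE)

# Pretend two species have data files and one already has an output
file.create(file.path(dat_flder, "Notonecta_glauca.csv"))
file.create(file.path(dat_flder, "Sigara_dorsalis.csv"))
file.create(file.path(out_flder, "Sigara_dorsalis.tif"))

# File names without their extensions
available <- tools::file_path_sans_ext(list.files(dat_flder))
done      <- tools::file_path_sans_ext(list.files(out_flder))

missing_data <- setdiff(sp_list, available)                  # no input file
to_model     <- setdiff(intersect(sp_list, available), done) # still to run

missing_data  # "Triturus_cristatus"
to_model      # "Notonecta_glauca"
```

Here one species has no input file and another already has an output, so only 'Notonecta_glauca' remains to be modelled.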

As well as adjusting the model parameters, the number of iterations and which models are run for each species, the function also lets you choose whether to use multiple processors. This calls functions from the parallel package to set up parallel processing whilst the models are running, which helps to prevent overloading your machine.
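The general pattern of spreading species across workers with the parallel package can be sketched as follows. The helper fit_one_species() is a hypothetical stand-in for the per-species modelling step, and whether the multi-species functions use these exact calls internally is an assumption.

```r
library(parallel)

sp_list <- c("Notonecta_glauca", "Sigara_dorsalis")

# Stand-in for the per-species modelling step (hypothetical helper)
fit_one_species <- function(sp) {
  paste("modelled", sp)
}

# Leave one core free so the machine stays responsive;
# detectCores() can return NA, hence the fallback to a single worker
n_cores   <- detectCores()
n_workers <- if (is.na(n_cores)) 1 else max(1, min(n_cores - 1, length(sp_list)))

# Start the workers, run one species per task, then shut the workers down
cl <- makeCluster(n_workers)
results <- parLapply(cl, sp_list, fit_one_species)
stopCluster(cl)

unlist(results)
```

Capping the number of workers at the number of species avoids starting processes that would have nothing to do.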

Example using MultiBNG

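#Load the example datasets shipped with the package
data("ng_data", package = "JNCCsdms")
data("sd_data", package = "JNCCsdms")
data("background", package = "JNCCsdms")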

#Provide a list of species you wish to model
sp_list <- c("Notonecta_glauca", "Sigara_dorsalis")

#Organise an Input folder containing your input species files as .csv
if (!file.exists("Inputs")){
  dir.create("Inputs")
}
utils::write.csv(ng_data, file = "Inputs/Notonecta_glauca.csv")
utils::write.csv(sd_data, file = "Inputs/Sigara_dorsalis.csv")

#Organise a folder containing the background masks from which your pseudo-absences will be generated
if (!file.exists("BGmasks")){
  dir.create("BGmasks")
}

save(background, file = "BGmasks/Hemiptera")

#Create outputs folder
if (!file.exists("Outputs")){
  dir.create("Outputs")
}

# Preparing the variables data using worldclim
#get UK extent
UK <- ggplot2::map_data(map = "world", region = "UK")
max.lat <- ceiling(max(UK$lat))
min.lat <- floor(min(UK$lat))
max.lon <- ceiling(max(UK$long))
min.lon <- floor(min(UK$long))
extent <- raster::extent(x = c(min.lon, max.lon, min.lat, max.lat))

#get variables data
bio <- raster::getData('worldclim', var = 'bio', res = 5, lon = -2, lat = 40)
bio <- bio[[c("bio1", "bio12")]]
names(bio) <- c("Temp", "Prec")

#crop to uk
bio <- raster::crop(bio, extent)

#convert to easting northing
vars <- raster::projectRaster(bio, crs = raster::crs("+init=epsg:27700"))

#load the package
library(JNCCsdms)

#run the function
MultiBNG(sp_list = sp_list, vars,
         out_flder = "Outputs/", dat_flder = "Inputs/", bkgd_flder = "BGmasks/",
         max_tries = 1, datafrom = "NBNatlas", covarRes = 100,
         models = c("MaxEnt", "BioClim", "SVM", "RF"),
         prop_test_data = 0.25, bngCol = "OSGR",
         mult_prssr = FALSE, rndm_occ = TRUE)

#remove the folders created for this example
unlink('./BGmasks', recursive = TRUE)
unlink('./Inputs', recursive = TRUE)
unlink('./wc5', recursive = TRUE)
unlink('./Outputs', recursive = TRUE)


jncc/sdms documentation built on Aug. 13, 2021, 4:21 a.m.