In sgvignali/SDMtune: Species Distribution Model Selection


The other vignettes are based on presence-only methods. Here you will learn how to train a presence-absence model. The following examples use the Artificial Neural Network method [@Venables2002], but you can adapt the code to any of the other supported methods. We use the first 8 environmental variables and the virtualSp dataset, selecting the absence locations instead of the background locations.

library(SDMtune)
library(zeallot)

# Prepare data
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd",
                    full.names = TRUE)

predictors <- terra::rast(files)
p_coords <- virtualSp$presence
a_coords <- virtualSp$absence
data <- prepareSWD(species = "Virtual species",
                   p = p_coords,
                   a = a_coords,
                   env = predictors[[1:8]])

# Split data in training and testing datasets
c(train, test) %<-% trainValTest(data,
                                 test = 0.2,
                                 seed = 25)

cat("# Training  : ", nrow(train@data))
cat("\n# Testing   : ", nrow(test@data))

# Create folds
folds <- randomFolds(train,
                     k = 4,
                     seed = 25)
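To make the fold structure concrete, here is a minimal base-R sketch of random k-fold assignment. It illustrates the general idea only and is not SDMtune's internal implementation; the helper `assign_folds` and the toy sample size of 20 are hypothetical.

```r
# Hypothetical helper: randomly assign n observations to k folds of
# (nearly) equal size. Illustration only, not SDMtune's internal code.
assign_folds <- function(n, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

fold_id <- assign_folds(n = 20, k = 4, seed = 25)

# For each fold i, rows with fold_id == i are held out for testing
# and the remaining rows are used for training.
for (i in 1:4) {
  test_rows <- which(fold_id == i)
  train_rows <- which(fold_id != i)
  cat("Fold", i, ": train =", length(train_rows),
      ", test =", length(test_rows), "\n")
}
```

Each observation is used for testing exactly once and for training in the other k - 1 folds.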

Train the model

We first train the model with default settings and 10 neurons in the hidden layer:

set.seed(25)
model <- train("ANN",
               data = train,
               size = 10,
               folds = folds)

model
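For readers who want to see what the `size` argument controls, the sketch below fits a single-hidden-layer network directly with the nnet package on made-up data. This is only an illustration under assumptions: the toy data frame, its column names, and the choice of `entropy = TRUE` are ours, and the way SDMtune configures the network internally may differ.

```r
library(nnet)

set.seed(25)
# Made-up data: two predictors and a 0/1 presence-absence response
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$pa <- as.integer(toy$x1 + toy$x2 + rnorm(n, sd = 0.5) > 0)

# Single-hidden-layer network with 10 hidden units
fit <- nnet(pa ~ x1 + x2,
            data = toy,
            size = 10,       # number of neurons in the hidden layer
            decay = 0.01,    # weight decay (regularization)
            maxit = 100,     # maximum number of iterations
            entropy = TRUE,  # assumption: logistic output for a 0/1 response
            trace = FALSE)

# Predicted probabilities of presence, between 0 and 1
p <- predict(fit, newdata = toy, type = "raw")
range(p)
```

The arguments `size`, `decay` and `maxit` are the same hyperparameters that are tuned later in this vignette.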

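Let's check the training and testing AUC. Because the model was trained with cross-validation, both values are averaged across the four folds: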

auc(model)
auc(model, test = TRUE)
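As a reminder of what the AUC measures, the following base-R sketch computes it from ranks (the Mann-Whitney formulation): the proportion of presence-absence pairs in which the presence location receives the higher predicted score. The helper `rank_auc` and the scores are made up for illustration and are not part of SDMtune.

```r
# Hypothetical helper: AUC from the Mann-Whitney rank-sum statistic.
# pred: predicted scores; pa: 1 for presence, 0 for absence.
rank_auc <- function(pred, pa) {
  n_p <- sum(pa == 1)
  n_a <- sum(pa == 0)
  r <- rank(pred)  # average ranks are used for ties
  (sum(r[pa == 1]) - n_p * (n_p + 1) / 2) / (n_p * n_a)
}

# Made-up scores for three presences and three absences
pred <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
pa   <- c(1,   1,   1,   0,   0,   0)

rank_auc(pred, pa)
```

In this toy example 8 of the 9 presence-absence pairs are ordered correctly, so the AUC is 8/9, about 0.889.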

Tune model hyperparameters

To check which hyperparameters can be tuned, we use the function getTunableArgs:

getTunableArgs(model)

We use the function optimizeModel to tune the hyperparameters:

h <- list(size = 10:50,
          decay = c(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5),
          maxit = c(50, 100, 300, 500))

om <- optimizeModel(model,
                    hypers = h,
                    metric = "auc",
                    seed = 25)
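To get a feeling for the size of the search space, the short base-R sketch below counts the possible combinations of the values listed in h. It only does arithmetic on the list and does not call SDMtune. The count shows why an exhaustive grid search would be expensive here; optimizeModel instead uses a genetic algorithm to explore a subset of the combinations.

```r
# Same hyperparameter values as in the list h above
h <- list(size = 10:50,
          decay = c(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5),
          maxit = c(50, 100, 300, 500))

# Number of values per hyperparameter
lengths(h)

# Total number of possible combinations
n_comb <- prod(lengths(h))
n_comb
#> [1] 1148

# With 4-fold cross-validation, an exhaustive search would fit this many models
n_comb * 4
#> [1] 4592
```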

The best model is:

best_model <- om@models[[1]]
om@results[1, ]

Evaluate the final model

We now train a model with the configuration found by the function optimizeModel, without cross-validation and using all the training data, and we evaluate it on the held-out testing dataset:

set.seed(25)
final_model <- train("ANN",
                     data = train,
                     size = om@results[1, "size"],
                     decay = om@results[1, "decay"],
                     maxit = om@results[1, "maxit"])

plotROC(final_model,
        test = test)