knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.align = "center")
options(knitr.table.format = "html")
library(kableExtra)

# Recreate the objects used in the previous articles
data <- SDMtune:::t
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd",
                    full.names = TRUE)
predictors <- terra::rast(files)
In the previous articles you have learned how to prepare the data for the analysis, how to train a model, how to make predictions and how to evaluate a model using SDMtune. In this article you will learn different evaluation strategies to achieve a better estimate of the model performance.
First we load the SDMtune package:
library(SDMtune)
It is good practice to split the species locations into two parts, using one part to train the model and the remaining part to evaluate it. We can use the trainValTest() function for this purpose. Let's say we want to use 80% of the species locations to train our model and 20% as a testing dataset to evaluate it:
library(zeallot)  # For unpacking assignment

c(train, test) %<-% trainValTest(data,
                                 test = 0.2,
                                 only_presence = TRUE,
                                 seed = 25)

maxnet_model <- train("Maxnet", data = train)
The only_presence argument is used to split only the presence locations, not the background locations. We can now evaluate the model using the testing dataset that has not been used to train it:
cat("Training auc: ", auc(maxnet_model)) cat("Testing auc: ", auc(maxnet_model, test = test))
We can plot the ROC curve for both the training and testing datasets with:
plotROC(maxnet_model, test = test)
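Other evaluation metrics can be computed in the same way. As a minimal sketch, assuming the maxnet_model and test objects created above, the True Skill Statistic (TSS) is obtained with tss():

# Compute the TSS on the training and testing datasets
cat("Training TSS: ", tss(maxnet_model))
cat("Testing TSS: ", tss(maxnet_model, test = test))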
This approach is valid when we have a large dataset. In our case, with only `r nrow(train@data[train@pa == 1, ])` observations, the evaluation depends strongly on how we split the presence locations. Let's run a small experiment in which we perform different train/test splits and compute the testing AUC:
# Create an empty data.frame
output <- data.frame(matrix(NA, nrow = 10, ncol = 3))
colnames(output) <- c("seed", "trainAUC", "testAUC")

set.seed(25)
seeds <- sample.int(1000, 10)  # Create 10 different random seeds

for (i in seq_along(seeds)) {  # Loop through the seeds
  # Make the train/test split
  c(train, test) %<-% trainValTest(data,
                                   test = 0.2,
                                   seed = seeds[i],
                                   only_presence = TRUE)

  # Train the model
  m <- train("Maxnet", data = train)

  # Populate the output data.frame
  output[i, 1] <- seeds[i]
  output[i, 2] <- auc(m)
  output[i, 3] <- auc(m, test = test)
}
The testing AUC varies from `r round(min(output[, 3]), 3)` to `r round(max(output[, 3]), 3)`.
# Print the output
output
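The spread of the testing AUC can be summarized directly from the output data frame created above (a minimal sketch using base R):

# Range and standard deviation of the testing AUC across the 10 splits
range(output$testAUC)
sd(output$testAUC)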
When dealing with a small dataset, a better approach is cross validation.
To perform cross validation in SDMtune we pass the folds argument to the train() function. First we have to create the folds. There are several ways to create them; here we explain how to make a random partition of 4 folds using the function randomFolds():
folds <- randomFolds(data, k = 4, only_presence = TRUE, seed = 25)
The output of the function is a list containing two matrices, the first for the training and the second for the testing locations. Each column of a matrix represents a fold, with TRUE for the locations included in and FALSE for those excluded from the partition.
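A quick sketch to inspect this structure, assuming the folds object created above (the two list elements are accessed here by position):

# Overview of the list returned by randomFolds()
str(folds, max.level = 2)

# Number of locations included in each of the 4 training folds
colSums(folds[[1]])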
Let's perform a 4-fold cross validation using the Maxnet method (note that we use the full dataset):
cv_model <- train("Maxnet", data = data, folds = folds)
cv_model
The output in this case is an SDMmodelCV object. It contains the four trained models in the models slot and the fold partitions in the folds slot. We can compute the AUC of an SDMmodelCV object using:
cat("Training AUC: ", auc(cv_model)) cat("Testing AUC: ", auc(cv_model, test = TRUE))
This returns the AUC value averaged across the four different models.
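The individual models can also be inspected. As a minimal sketch, assuming the cv_model object trained above, the training AUC of each of the four models stored in the models slot is:

# Training AUC of each model stored in the "models" slot
vapply(cv_model@models, auc, numeric(1))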
Try to repeat the analysis using the default_model that we created in the Train a model article (a possible approach is sketched below).
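A possible solution sketch, assuming default_model was trained with the Maxent method and default arguments (adjust the method and arguments to match your own default_model):

# Cross validate a model with the same settings assumed for default_model
default_cv_model <- train("Maxent", data = data, folds = folds)
cat("Training AUC: ", auc(default_cv_model))
cat("Testing AUC: ", auc(default_cv_model, test = TRUE))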
The train() function also accepts folds created with two other packages: ENMeval [@Muscarella2014] and blockCV [@Valavi2019]. These folds are converted internally into the format required by SDMtune. Both packages have specific functions to create fold partitions that are spatially or environmentally independent.
Block partition using the package ENMeval:
library(ENMeval)

block_folds <- get.block(occ = data@coords[data@pa == 1, ],
                         bg.coords = data@coords[data@pa == 0, ])

model <- train(method = "Maxnet",
               data = data,
               fc = "l",
               reg = 0.8,
               folds = block_folds)
Checkerboard1 partition using the package ENMeval:
cb_folds <- get.checkerboard1(occ = data@coords[data@pa == 1, ],
                              env = predictors,
                              bg.coords = data@coords[data@pa == 0, ],
                              aggregation.factor = 4)

model <- train(method = "Maxnet",
               data = data,
               fc = "l",
               reg = 0.8,
               folds = cb_folds)
Spatial block partition using the package blockCV:
library(blockCV)

# Create an sf object from the coordinates and the presence/background vector
sf_df <- sf::st_as_sf(cbind(data@coords, pa = data@pa),
                      coords = c("X", "Y"),
                      crs = terra::crs(predictors, proj = TRUE))

# Spatial blocks
spatial_folds <- cv_spatial(x = sf_df,
                            column = "pa",
                            rows_cols = c(8, 10),
                            k = 5,
                            hexagon = FALSE,
                            selection = "systematic")

model <- train(method = "Maxent",
               data = data,
               fc = "l",
               reg = 0.8,
               folds = spatial_folds)
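Whichever partitioning strategy is used, the resulting SDMmodelCV object is evaluated in the same way. For example, a sketch assuming the model object trained with the spatial folds above:

# Testing AUC averaged across the spatially independent folds
auc(model, test = TRUE)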
In this article you have learned how to evaluate a model with a held-out testing dataset created with trainValTest(), why a single train/test split can be unreliable with small datasets, how to create random fold partitions with randomFolds(), how to use folds created with the packages ENMeval and blockCV, and how to cross validate a model and compute the AUC of an SDMmodelCV object.

In the next article you will learn how to display the variable importance and how to plot the response curves.