knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 6, dpi=100, out.width = "95%" )
kuenm2 has a set of functions that help explore variability and uncertainty of suitability results obtaining from projections to distinct scenarios. In short, the following analyses can be performed:
projection_variability().single_mop() and projection_mop().At this point it is assumed that kuenm2 is installed (if not, see the Main guide). Load kuenm2 and any other required packages, and define a working directory (if needed).
Note: functions from other packages (i.e., not from base R or kuenm2) used in this guide will be displayed as package::function().
# Load packages library(kuenm2) library(terra) # Current directory getwd() # Define new directory #setwd("YOUR/DIRECTORY") # uncomment and modify if setting a new directory # Saving original plotting parameters original_par <- par(no.readonly = TRUE)
When used in projects in which multiple projections were performed, these analyses require a model_projections object, which is the output of project_selected(). Let's load data and produce the required objects to continue with comparisons and variability and uncertainty estimations.
For more details about model projections, see "Project Models to a Single Scenario" and "Project Models to Multiple Scenarios".
# Import calib_results_maxnet data("fitted_model_maxnet", package = "kuenm2") # Import path to raster files with future variables provided as example # The data is located in the "inst/extdata" folder. in_dir <- system.file("extdata", package = "kuenm2") # Import raster layers (same used to calibrate and fit final models) var <- rast(system.file("extdata", "Current_variables.tif", package = "kuenm2")) # Get soilType soiltype <- var$SoilType # Organize and structure WorldClim files # Create folder to save structured files out_dir_future <- file.path(tempdir(), "Future_raw") # Here, in a temporary directory # Organize organize_future_worldclim(input_dir = in_dir, # Path to the raw variables from WorldClim output_dir = out_dir_future, name_format = "bio_", # Name format static_variables = var$SoilType, # Static variables progress_bar = FALSE, overwrite = TRUE) # Create a "Current_raw" folder in a temporary directory # and copy the rawvariables there. out_dir_current <- file.path(tempdir(), "Current_raw") dir.create(out_dir_current, recursive = TRUE) # Save current variables in temporary directory terra::writeRaster(var, file.path(out_dir_current, "Variables.tif"), overwrite = TRUE) # Prepare projections using fitted models to check variables pr <- prepare_projection(models = fitted_model_maxnet, present_dir = out_dir_current, # Directory with present-day variables future_dir = out_dir_future, # Directory with future variables future_period = c("2041-2060", "2081-2100"), future_pscen = c("ssp126", "ssp585"), future_gcm = c("ACCESS-CM2", "MIROC6")) # Project selected models to multiple scenarios ## Create a folder to save projection results # Here, in a temporary directory out_dir <- file.path(tempdir(), "Projection_results/maxnet") dir.create(out_dir, recursive = TRUE) ## Project selected models to multiple scenarios p <- project_selected(models = fitted_model_maxnet, projection_data = pr, out_dir = out_dir, write_replicates = TRUE, progress_bar = FALSE, # Do not print progress bar overwrite = TRUE)
When projecting models, predictions can vary according to different replicates, model parameterizations, and GCMs. The projection_variability() function quantifies variability from these sources, offering valuable insights into what makes predictions fluctuate the most and where.
The projection_variability() function requires a model_projections object, which is generated using project_selected(). By default, it uses the median consensus to summarize variance across selected models and GCMs. Alternatively, users can specify the mean to use it as a representative instead.
To be able to assess variability from replicates, make sure that write_replicates = TRUE is set when running project_selected().
Below, we demonstrate how to calculate variance from these different sources (replicates, models, and GCMs) and save the results to the designated out_dir directory.
# Create a directory to save results v <- projection_variability(model_projections = p, write_files = TRUE, output_dir = out_dir, verbose = FALSE, overwrite = TRUE)
The output is an object named variability_projections, a list containing SpatRaster layers that represent the variance deriving from replicates, models, and GCMs for each scenario (including the present). These results can help detect areas where prediction variability is higher.
For example, for the present time scenario, the variance mainly comes from differences among replicates.
# Variability for the present terra::plot(v$Present, range = c(0, 0.15))
In the most pessimistic scenario (SSP5-8.5) for 2081–2100, variability is not too high and comes primarily from replicates and GCMs.
# Variability for Future_2081-2100_ssp585 terra::plot(v$`Future_2081-2100_ssp585`, range = c(0, 0.1))
Because write_files = TRUE was set, the variability_projections object includes the file path where the results were saved. You can use this path with the import_results() function to load the results when needed.
# See the folder where the results were saved v$root_directory #> [1] "Temp/Projection_results/maxnet/variance"
As an example, we will import the results for the 2041–2060 period and the SSP1-2.6 scenario. In this scenario, the variability mainly originates from differences among the selected models (see below).
# Importing results v_2041_2060_ssp126 <- import_results(projection = v, present = FALSE, # Do not import results from the present time future_period = "2041-2060", future_pscen = "ssp126") # Plot terra::plot(v_2041_2060_ssp126, range = c(0, 0.1))
The variability_projections object is a list that contains resulting SpatRaster layers (if return_raster = TRUE) and the directory path where the results were saved (if write_files = TRUE).
If the results were not saved to disk during the initial run, you can save the SpatRaster layers afterward using the writeRaster() function. For example, to save the variability map for one of the future scenarios:
writeRaster(v$`Future_2081-2100_ssp585`, file.path(out_dir, "Future_2081-2100_ssp585.tif"))
If the results were saved to disk, the variability_projections object is automatically stored in a folder named variance within the specified output_dir. You can read this object to the R environment in future sessions using the readRDS() function:
v <- readRDS(file.path(out_dir, "variance/variability_projections.rds"))
This object can then be used to import the results with the import_results() function.
When projecting models to new regions or time periods, it is common to encounter conditions that are non-analogous to those used to train models.
For example, the maximum annual mean temperature (bio_1) in our model's calibration data was $22.7^\circ C$:
max(fitted_model_maxnet$calibration_data$bio_1)
In contrast, in future scenarios, conditions are projected to become warmer. To illustrate this, let's import environmental variables from one of the GCMs (ACCESS-CM2) under the future scenario SSP5-8.5 and examine the maximum temperature:
# Import variables from the 2081-2100 period (SSP585, GCM MIROC6) future_ACCESS_CM2 <- rast(file.path(out_dir_future, "2081-2100/ssp585/ACCESS-CM2/Variables.tif")) # Range of values in bio_1 terra::minmax(future_ACCESS_CM2$bio_1) # Plot terra::plot(future_ACCESS_CM2$bio_1, breaks = c(-Inf, 22.7, Inf)) # Highlight regions with values above 22.7ºC
Note that temperatures are expected to exceed the current maximum temperature in most of the area under future conditions.
The functions single_mop() and projection_mop() compute the Mobility-Oriented Parity (MOP) metric (Owens et al. 2013) with recent updates implemented in the mop package (Cobos et al. 2024). This metric calculates dissimilarity between training and projection conditions, and detect areas with conditions outside training ranges. Areas identified outside training ranges and those with large dissimilarity (distances) values are where model predictions are a product of extrapolation. Depending on how the model is predicting outside ranges (see response curves), interpreting those results can be risky because of uncertainties in prediction behavior (hence the term extrapolation risks).
The MOP analysis requires the following inputs:
fitted_models returned by the fit_selected() function, or an object of class prepared_data returned by prepare_data(). These objects contain the environmental data used during model training.single_mop(), a SpatRaster with the environmental variables representing a single scenario for model projections. projection_mop(), an object of class projection_data returned by the prepare_projection() function, with the paths to raster layers representing environmental conditions of multiple scenarios for model projections.Important note: Most likely, MOP needs to consider only the variables used in the selected models. For that reason, using a fitted_models object and setting subset_variables = TRUE is preferred.
By default, projection_mop() does not compute distances and performs a basic type of MOP, which highlights only regions with conditions outside training ranges. Alternatively, you can set:
type = "simple" to compute an additional layer with the number of variables with values outside training conditions per location.type = "detailed" to add multiple layers that identify exactly which variables exhibit conditions outside training ranges, and how.calculate_distance = TRUE to compute multivariate distances from training to all conditions in the scenarios of projections.Below, we perform a detailed MOP to identify areas with extrapolation risk in the future scenarios for which predictions were made:
# Create a folder to save MOP results out_dir_mop <- file.path(tempdir(), "MOPresults") dir.create(out_dir_mop, recursive = TRUE) # Run MOP kmop <- projection_mop(data = fitted_model_maxnet, projection_data = pr, subset_variables = TRUE, calculate_distance = TRUE, out_dir = out_dir_mop, type = "detailed", overwrite = TRUE, progress_bar = FALSE)
This application returns a mop_projections object, which contains the paths to the directories where the results were saved. This object can be used with the import_results() function to load the results.
The results of the MOP analysis provide multiple perspectives on extrapolation risks. Each component a different aspect of the dissimilarity between training and projection environmental conditions.
When importing results, you can specify the periods and scenarios (e.g., "2081-2100" and "ssp585"), as well as the type of MOP results to load. By default, all available MOP types are imported, these include: distances (if calculated), basic, simple, towards_high_end, towards_low_end, towards_high_combined, and towards_low_combined.
Below, we explore all MOP results for the SSP5-8.5 scenario and the 2081–2100 period:
# Import results from MOP runs mop_ssp585_2100 <- import_results(projection = kmop, future_period = "2081-2100", future_pscen = "ssp585") # See types of results names(mop_ssp585_2100)
The element distances represents Euclidean or Mahalanobis distances depending on what is defined in the argument distance. Large distance values indicate greater dissimilarity from training conditions, highlighting areas with high extrapolation risk.
terra::plot(mop_ssp585_2100$distances)
The element basic identifies areas where at least one environmental variable is outside training conditions. A value of '1' indicates the presence of non-analogous conditions in that specific area and scenario.
terra::plot(mop_ssp585_2100$basic)
The results in simple quantify the number of environmental variables in the projected area that outside training conditions.
terra::plot(mop_ssp585_2100$simple)
The results in towards_high_end and towards_low_end identify which areas show non-analogous conditions for each of the variables independently. An important detailed considered here is that the results show conditions above the maximum (towards high) and below the minimum (towards low) values in training data.
# Non-analogous conditions towards high values in the ACCESS-CM2 scenario terra::plot(mop_ssp585_2100$towards_high_end$`Future_2081-2100_ssp585_ACCESS-CM2`) # Non-analogous conditions towards low values in the MIROC6 scenario terra::plot(mop_ssp585_2100$towards_low_end$`Future_2081-2100_ssp585_ACCESS-CM2`, main = names(mop_ssp585_2100$towards_low_end$`Future_2081-2100_ssp585_ACCESS-CM2`))
The towards_high_combined and towards_low_combined combine the previous results but keep the identity of the variables detected outside ranges. Specifically, towards_high_combined identifies variables with values exceeding those observed in training data, whereas towards_low_combined shows variables with values below the training range.
# Variables with conditions above training range terra::plot(mop_ssp585_2100$towards_high_combined) # Variables with conditions below training range terra::plot(mop_ssp585_2100$towards_low_combined)
By default, the functions that run the MOP assign NA to cells with values within the training range. Alternatively, you can assign a value of 0 to these cells by setting na_in_range = FALSE when running projection_mop().
# Create a folder to save MOP results, now assigning 0 to cells within the range out_dir_mop_zero <- file.path(tempdir(), "MOPresults_zero") dir.create(out_dir_mop_zero, recursive = TRUE) # Run MOP kmop_zero <- projection_mop(data = fitted_model_maxnet, projection_data = pr, subset_variables = TRUE, na_in_range = FALSE, # Assign 0 to cells within range calculate_distance = TRUE, out_dir = out_dir_mop_zero, type = "detailed", overwrite = TRUE, progress_bar = FALSE)
Let's explore how this setting affects the simple and detailed MOP outputs:
# Import MOP for the scenario ssp585 in 2081-2100 mop_ssp585_2100_zero <- import_results(projection = kmop_zero, future_period = "2081-2100", future_pscen = "ssp585") # Compare with the MOP that assigns NA to cells within the calibration range # Simple MOP terra::plot(c(mop_ssp585_2100$simple$`Future_2081-2100_ssp585_ACCESS-CM2`, mop_ssp585_2100_zero$simple$`Future_2081-2100_ssp585_ACCESS-CM2`), main = c("Within range as NA", "Within range as 0")) # Detailed MOP terra::plot(c(mop_ssp585_2100$towards_high_combined$`Future_2081-2100_ssp585_ACCESS-CM2`, mop_ssp585_2100_zero$towards_high_combined$`Future_2081-2100_ssp585_ACCESS-CM2`), main = c("Within range as NA", "Within range as 0"), plg=list(cex=0.6))
While MOP helps us identify areas with non-analogous environmental conditions, whether risks of extrapolation exist depend on how model is behaving under non-analogous conditions. Model response curves are of great help to check model behavior under distinct conditions.
For example, consider a future scenario represented by the ACCESS-CM2 GCM and the SSP5-8.5 for 2081–2100. Here, values of bio_1 (Annual Mean Temperature), bio_12 (Annual Precipitation), and bio_15 (Precipitation Seasonality) exceed the upper limits observed in training data (see below).
# Non-analogous conditions towards high values in the ACCESS-CM2 scenario terra::plot(mop_ssp585_2100$towards_high_combined$`Future_2081-2100_ssp585_ACCESS-CM2`)
Now, let's check model response curves for these variables. To better visualize how the model responds to environmental conditions in this future scenario, we can set the plotting limits using this scenario’s layers as new_data:
# Plotting response curves to check extrapolation towards the high end of variables par(mfrow = c(1,3)) # Set plot grid response_curve(models = fitted_model_maxnet, variable = "bio_1", new_data = future_ACCESS_CM2) response_curve(models = fitted_model_maxnet, variable = "bio_12", new_data = future_ACCESS_CM2) response_curve(models = fitted_model_maxnet, variable = "bio_15", new_data = future_ACCESS_CM2)
In the response curves for bio_1, bio_12, and bio_15, higher values correspond to lower suitability, reaching zero near the upper limit of training data (indicated by a dashed line). Beyond this upper limit, suitability remains close to zero, thus, we are not concerned about extrapolation risks towards higher values for these variables.
Next, let's investigate the variables with values below the lower limit of training data:
# Non-analogous conditions towards low values in the ACCESS-CM2 scenario par(mfrow = c(1, 2)) # Set grid terra::plot(mop_ssp585_2100$towards_low_combined$`Future_2081-2100_ssp585_ACCESS-CM2`) ## It's bio 7. Plot response curve: response_curve(models = fitted_model_maxnet, variable = "bio_7", new_data = future_ACCESS_CM2)
In some regions of the projection scenario, bio_7 (Temperature Annual Range) exhibits values below the lower limit of the calibration data. The response curve indicates that suitability continues to increase when extrapolating to these lower values. This is a situation of concern due to extrapolation risks because we are uncertain about when suitability should start to decrease.
This example highlights why we strongly recommend interpreting MOP results alongside the response curves.
# Reset plotting parameters par(original_par)
When projection_mop() is run, it automatically saves the mop_projections object as an RDS file in out_dir. You can load this object in R at any time using the readRDS() function:
# Check for RDS files in the directory where we saved the MOP results list.files(path = out_dir_mop, pattern = "rds") # Import the mop_projections file kmop <- readRDS(file.path(out_dir_mop, "mop_projections.rds"))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.