knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "man/figures/README-",
  # out.width = "100%",
  fig.width = 6,
  fig.height = 6,
  # dpi = 60,
  echo = TRUE,
  warning = FALSE,
  eval = TRUE
)
library(knitr)
Species distribution modeling (SDM) has become a standard tool in multiple research areas, including ecology, conservation biology, biogeography, paleobiogeography, and epidemiology. SDM is an area of active theoretical and methodological research. The flexsdm package provides users with the ability to manipulate and parameterize models in a variety of ways that meet their unique research needs.
This flexibility enables users to define their own complete or partial modeling procedures specific to their modeling situations (e.g., number of variables, number of records, different algorithms and ensemble methods, algorithm tuning, etc.).
In this vignette, users will learn about the post-modeling set of functions in the flexsdm package. These functions were designed with the aim of assisting the flexsdm user in predicting, evaluating, and correcting SDMs.
These are the functions created for model prediction, evaluation and correction:
Post-modeling functions
sdm_predict() Spatial predictions from individual and ensemble models
sdm_summarize() Merge model performance tables
interp() Raster interpolation between SDM predictions for two time periods
extra_eval() Measure model extrapolation
extra_correct() Constrain suitability values under a given extrapolation value
msdm_priori() Create spatial predictor variables to reduce overprediction from species distribution models
msdm_posteriori() Methods to correct overprediction of species distribution models based on occurrences and suitability patterns
Install the flexsdm package. You can install the released version of flexsdm from GitHub with:
# devtools::install_github('sjevelazco/flexsdm')
library(flexsdm)
library(dplyr)
library(terra)
Decide where on your computer you would like to store the inputs and outputs of your project (this will be your main directory). Use an existing one or use dir.create() to create your main directory. Then specify whether or not to include folders for projections, calibration areas, algorithms, ensembles, and thresholds. For more details, see Vignette 01_pre_modeling.
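For example, here is a minimal base-R sketch (the folder names are illustrative; flexsdm also provides a directory helper, described in Vignette 01_pre_modeling):

# Create a main project directory and a few illustrative subfolders with base R
project_dir <- file.path(tempdir(), "my_sdm_project") # replace with your own path
dir.create(project_dir, showWarnings = FALSE)

# Optional subfolders; include only those your workflow needs
for (f in c("calibration_area", "algorithm", "ensemble", "threshold", "projection")) {
  dir.create(file.path(project_dir, f), showWarnings = FALSE)
}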
In this tutorial, we will be using the "spp" example dataset, which includes pr_ab (presence = 1, absence = 0) and location (x, y) data for 3 plant species found in California, along with a raster of environmental data. You can load these data into your local R environment by using the code below:
data("spp") somevar <- system.file("external/somevar.tif", package = "flexsdm") somevar <- terra::rast(somevar)
If you want to replace the spp dataset with your own data, make sure that it contains coordinates and species presence (1) / absence (0) records, and that you have a raster with environmental data.
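For reference, this is a minimal sketch of what such a dataset could look like (hypothetical species name and made-up coordinates, expressed in the same projected CRS as your environmental raster):

# A hypothetical occurrence dataset with the same columns as spp
my_occ <- dplyr::tibble(
  species = c("my_sp", "my_sp", "my_sp"),
  x = c(-300000, -280000, -260000), # projected coordinates matching your raster CRS
  y = c(210000, 215000, 198000),
  pr_ab = c(1, 1, 0) # presence = 1, absence = 0
)
my_occ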
First, prepare the occurrences, environmental conditions and partitions
# Select only one species
some_sp <- spp %>%
  filter(species == "sp3")

# Extract the environmental conditions from the raster for sp3
some_sp <- sdm_extract(
  data = some_sp,
  x = "x",
  y = "y",
  env_layer = somevar
)

# Make a partition defining the method, folds and replicates
some_sp <- part_random(
  data = some_sp,
  pr_ab = "pr_ab",
  method = c(method = "rep_kfold", folds = 3, replicates = 5)
)
Next, fit different models
# Fit and validate a [generalized linear model](https://sjevelazco.github.io/flexsdm/reference/fit_glm.html)
mglm <- fit_glm(
  data = some_sp,
  response = "pr_ab",
  predictors = c("CFP_1", "CFP_2", "CFP_3", "CFP_4"),
  partition = ".part",
  poly = 2
)

# Fit and validate a [random forest model](https://sjevelazco.github.io/flexsdm/reference/fit_raf.html)
mraf <- fit_raf(
  data = some_sp,
  response = "pr_ab",
  predictors = c("CFP_1", "CFP_2", "CFP_3", "CFP_4"),
  partition = ".part"
)

# Fit and validate a [generalized boosted regression model](https://sjevelazco.github.io/flexsdm/reference/fit_gbm.html)
mgbm <- fit_gbm(
  data = some_sp,
  response = "pr_ab",
  predictors = c("CFP_1", "CFP_2", "CFP_3", "CFP_4"),
  partition = ".part"
)
# Fit and ensemble the models. To choose the arguments that best fit your own data,
# see all options available in [fit_ensemble](https://sjevelazco.github.io/flexsdm/reference/fit_ensemble.html)
mensemble <- fit_ensemble(
  models = list(mglm, mraf, mgbm),
  ens_method = "meansup",
  thr = NULL,
  thr_model = "max_sens_spec",
  metric = "TSS"
)
You can also fit a model using an Ensemble of Small Models (ESM) approach. In this example, we fit without threshold specification and with k-fold cross-validation.
msmall <- esm_gam(
  data = some_sp,
  response = "pr_ab",
  predictors = c("CFP_1", "CFP_2", "CFP_3", "CFP_4"),
  partition = ".part",
  thr = NULL
)
Finally, we can predict different kinds of models from the same data (some_sp). sdm_predict() can be used to predict from one or more models fitted with fit_ or tune_ functions. The output will be a list of SpatRaster objects with continuous and/or binary predictions.
# Predict using a single model (here the mglm model)
# and a threshold type for binary predictions
ind_p <- sdm_predict(
  models = mglm,
  pred = somevar,
  thr = "max_fpb",
  con_thr = FALSE,
  predict_area = NULL
)

# Inspect the object. It contains a SpatRaster with 2 layers: glm and max_fpb,
# the continuous and binary predictions from the model
ind_p

# Plot to see these layers
ind_p_rst <- terra::rast(ind_p)
plot(ind_p_rst)

# Predict a list of more than one model, specifying a threshold type
list_p <- sdm_predict(
  models = list(mglm, mraf, mgbm),
  pred = somevar,
  thr = "max_fpb",
  con_thr = FALSE,
  predict_area = NULL
)

# Inspect the object. It's a list with 3 SpatRaster, one for each model,
# each of which contains 2 layers: the continuous and thresholded binary predictions
list_p

# Plot to see these layers
list_p_rst <- terra::rast(list_p)
plot(list_p_rst)

# Predict an ensemble model. This is only possible using one fit_ensemble object;
# it is not possible to include e.g. list(fit_ensemble1, fit_ensemble2) in the models argument.
ensemble_p <- sdm_predict(
  models = mensemble,
  pred = somevar,
  thr = "max_fpb",
  con_thr = FALSE,
  predict_area = NULL
)

# Inspect the object. It contains a SpatRaster with 2 layers: mensemble and max_fpb,
# the continuous and binary predictions from the ensemble model
ensemble_p

# Plot to see these layers
ensemble_p_rst <- terra::rast(ensemble_p)
plot(ensemble_p_rst)

# Predict an ensemble of small models
small_p <- sdm_predict(
  models = msmall,
  pred = somevar,
  thr = "max_fpb",
  con_thr = FALSE,
  predict_area = NULL
)

# Inspect the object. It contains a SpatRaster with 2 layers: msmall and max_fpb,
# the continuous and binary predictions from the ESM
small_p

# Plot to see these layers
small_p_rst <- terra::rast(small_p)
plot(small_p_rst)
This function combines the model performance tables of all input models. It requires a list of one or more models fitted with fit_ or tune_ functions, a fit_ensemble output, or an esm_ family function output.
# Load abies data
data(abies)
abies

# We will partition the data with the k-fold method
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "kfold", folds = 5)
)
Build some models to use for the performance table merge
# Build a generalized additive model and a generalized linear model using fit_ family functions
gam_t1 <- fit_gam(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  partition = ".part",
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen")
)

glm_t1 <- fit_glm(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  partition = ".part",
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen"),
  poly = 0,
  inter_order = 0
)

# Build a tuned model using tune_ family functions
# Prepare the grid object to use in the grid argument
tune_grid <- expand.grid(
  mtry = seq(1, 7, 1),
  ntree = seq(300, 1500, 200)
)

# Build a tuned random forest model
rf_t1 <- tune_raf(
  data = abies2,
  response = "pr_ab",
  predictors = c(
    "aet", "cwd", "tmin", "ppt_djf",
    "ppt_jja", "pH", "awc", "depth"
  ),
  predictors_f = c("landform"),
  partition = ".part",
  grid = tune_grid,
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen"),
  metric = "TSS"
)

rf_t1$performance
# Note in rf_t1$performance that the best model was the one with mtry = 3 and threshold = 'max_sens_spec'
Finally, merge the three sdm performance tables.
merge_df <- sdm_summarize(models = list(gam_t1, glm_t1, rf_t1))
merge_df
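To compare models at a glance, you can subset the merged table to the identifier and mean performance columns. This is a minimal sketch; the column names TSS_mean and AUC_mean are assumed, so check names(merge_df) for the exact names returned by your flexsdm version:

# Inspect the column names of the merged performance table
names(merge_df)

# Keep the model identifier and mean TSS and AUC columns, if present
# (any_of() silently skips columns that do not exist)
merge_df %>%
  dplyr::select(dplyr::any_of(c("model", "TSS_mean", "AUC_mean")))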
This function is useful for calculating projected suitability values between two time periods, using simple interpolation between two raster objects with suitability values. This is useful if, for example, an SDM has been projected to a future or past time period (using maps of predictor variables, such as climate variables, for different time periods) and the user requires an estimate of suitability for intermediate time periods. For example, this may be needed as input for other types of models used in risk analysis.
This function returns a SpatRaster when dir_save is NULL. If the user specifies dir_save, the function instead saves the interpolated raster files in the given directory.
library(terra)
library(dplyr)

f <- system.file("external/suit_time_step.tif", package = "flexsdm")
abma <- terra::rast(f)
plot(abma)
This function will create an object of interpolated values with n annual layers ranging from the initial to the final year. The resolution and dimensions of the resulting object will remain the same as those of the initial and final maps. In this example, there are nine annual (2011-2019) interpolated maps generated from the initial (2010) and final (2020) prediction maps. A cell with a starting value of 1 and an ending value of 0 would change in steps of (1-0)/(2020-2010) = 0.1, and be given interpolated values of 0.9, 0.8, 0.7, ..., 0.1.
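As a quick sanity check of that arithmetic, here is a base-R sketch for a single hypothetical cell (the values 1 and 0 are the made-up start and end suitabilities from the example above):

# Hypothetical single-cell check of the interpolation arithmetic
y1 <- 2010 # initial year
y2 <- 2020 # final year
v1 <- 1 # suitability in the initial year
v2 <- 0 # suitability in the final year

# Annual step of the linear interpolation
step <- (v2 - v1) / (y2 - y1)

# Interpolated values for the intermediate years 2011-2019
v1 + step * (1:9)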
int <- interp(
  r1 = abma[[1]], # set the raster of initial year
  r2 = abma[[2]], # set the raster of final year
  y1 = 2010, # set the numeric initial year
  y2 = 2020, # set the numeric final year
  rastername = "Abies",
  dir_save = NULL
)

# Layers in the abma SpatRaster
names(abma)
# plot(abma)

# Layers in the int SpatRaster
int
plot(int)
This function measures the extent of model extrapolation by comparing the data used for model calibration with the area of model projection, using the approach proposed by Velazco et al. (in prep.).
library(dplyr)
library(terra)

data(spp)
f <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(f)

# Inspect the unique values for species
spp$species %>% unique()

# Subset spp data into a tibble with only the coordinates of sp3 presences (pr_ab == 1)
sp <- spp %>%
  dplyr::filter(species == "sp3", pr_ab == 1) %>%
  dplyr::select(x, y)

# Define the accessible area for sp3 based on a buffer around each point, with a width
# related to dispersal ability or some other ecological criterion
ca <- calib_area(
  sp,
  x = "x",
  y = "y",
  method = c("buffer", width = 30000),
  crs = crs(somevar)
)

# Plot the SpatRaster, occurrences and accessible area
plot(somevar$CFP_1)
points(sp)
plot(ca, add = T)
The accessible area defines the calibration area used to extract the environmental conditions.
somevar_ca <- somevar %>%
  crop(., ca) %>%
  mask(., ca)

# Plot environmental conditions of the calibration area
plot(somevar_ca)

xp <- extra_eval(
  training_data = somevar_ca,
  projection_data = somevar,
  n_cores = 1,
  aggreg_factor = 3
)

# Plot the SpatRaster object with the extrapolation values measured in percentage
plot(xp)
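The extrapolation surface can then be used to constrain suitability predictions, which is what the extra_correct() function listed above is for. The code below is only a rough manual sketch, not the flexsdm API: it reuses the mglm model fitted earlier in this vignette, applies an arbitrary 50% extrapolation cutoff, and the object names cont_pred, xp_res, and constrained are illustrative.

# Continuous prediction from the glm fitted earlier in this vignette
cont_pred <- terra::rast(
  sdm_predict(models = mglm, pred = somevar, thr = NULL, con_thr = FALSE, predict_area = NULL)
)[[1]]

# Match the geometry of the extrapolation layer to the prediction
# (extra_eval was run with aggreg_factor = 3, so resolutions may differ)
xp_res <- terra::resample(xp, cont_pred)

# Set suitability to zero where extrapolation exceeds an arbitrary 50% cutoff
constrained <- terra::ifel(xp_res > 50, 0, cont_pred)
plot(constrained)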
This function creates geographical predictor variables that, together with environmental variables, can be used to construct constrained species distribution models. It returns a SpatRaster object, which has to be used together with environmental variables to construct species distribution models. The 'xy' approach creates a single pair of raster layers that can be used for all species that share the same study region, whereas 'cml', 'min', and 'ker' create a species-specific raster layer.
library(dplyr) library(terra) data("spp") somevar <- system.file("external/somevar.tif", package = "flexsdm") somevar <- terra::rast(somevar) # Select the presences of one species (sp3) occ <- spp %>% dplyr::filter(species == "sp3", pr_ab == 1) # Select a raster layer to be used as a basic raster a_variable <- somevar[[1]] plot(a_variable) points(occ %>% dplyr::select(x, y))
Next, use different methods according to your data.
# Use the xy method
m_xy <- msdm_priori(
  data = occ,
  x = "x",
  y = "y",
  method = "xy",
  env_layer = a_variable
)

plot(m_xy)

# Explore the object. This method assumes that spatial structure can partially explain
# species distributions (Bahn & McGill, 2007). Therefore, the result is two raster layers
# containing the latitude and longitude of pixels, respectively. This method can be used
# for all species that share the same study region.
m_xy
m_cml <- msdm_priori(
  data = occ,
  x = "x",
  y = "y",
  method = "cml",
  env_layer = a_variable
)

plot(m_cml)

# Explore the object. This method assumes that pixels closer to presences are more likely
# to be included in species distributions. The result is a raster layer containing the sum
# of Euclidean geographic distances from each pixel to all occurrences of a species.
m_cml
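The remaining species-specific approaches mentioned above ('min' and 'ker') use the same call, changing only the method argument. For example, a minimal sketch with the 'ker' (kernel) method:

# Use the ker method, reusing the same occurrences and basic raster as above
m_ker <- msdm_priori(
  data = occ,
  x = "x",
  y = "y",
  method = "ker",
  env_layer = a_variable
)

plot(m_ker)
m_ker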
These methods are designed to reduce overprediction of species distribution models based on an a posteriori method (see Mendes et al., 2020), i.e., a combination of the patterns of species occurrences and predicted suitability.
First, prepare the data
library(dplyr) library(terra) data("spp") somevar <- system.file("external/somevar.tif", package = "flexsdm") somevar <- terra::rast(somevar) # Prepare data for modeling a species set.seed(10) occ <- spp %>% dplyr::filter(species == "sp2") %>% # filter using only sp2 sdm_extract( data = ., x = "x", y = "y", env_layer = somevar, filter_na = TRUE ) %>% # extract variables values from the raster layer part_random(., pr_ab = "pr_ab", method = c(method = "kfold", folds = 10) ) # add columns with partition
Next, we fit a model
m_glm <- fit_glm(
  data = occ,
  response = "pr_ab",
  predictors = names(somevar),
  partition = ".part",
  thr = "equal_sens_spec"
)
Next, let's predict this model and plot it on a map
# Predict this model
m_pred <- sdm_predict(models = m_glm, pred = somevar, thr = NULL, con_thr = FALSE)
# Predicting individual models

plot(m_pred[[1]])
Finally, perform corrections to reduce model overprediction.
# Using the mcp method. The Minimum Convex Polygon (mcp) method excludes from SDMs those
# climatically suitable pixels that do not intercept a minimum convex polygon, with interior
# angles smaller than 180 degrees, enclosing all occurrences of a species.
m_mcp <- msdm_posteriori(
  records = occ,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  method = "mcp",
  cont_suit = m_pred[[1]],
  thr = "equal_sens_spec",
  buffer = NULL
)

plot(m_mcp)

# Using the bmcp method. The Buffered Minimum Convex Polygon (bmcp) method is similar to 'mcp',
# except for the inclusion of a buffer zone surrounding the minimum convex polygon.
m_bmcp <- msdm_posteriori(
  records = occ,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  method = "bmcp",
  cont_suit = m_pred[[1]],
  thr = "equal_sens_spec",
  buffer = 30000,
  crs = crs(m_pred[[1]])
)

plot(m_bmcp)

# Using the obr method. The Occurrences Based Restriction (obr) method assumes that suitable
# patches intercepting species occurrences are more likely part of the species distribution
# than suitable patches that do not intercept any occurrence.
m_obr <- msdm_posteriori(
  records = occ,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  method = "obr",
  cont_suit = m_pred[[1]],
  thr = "equal_sens_spec",
  buffer = NULL
)

plot(m_obr)

# Using the pres method. The occurrences-based restriction (pres) method only retains those
# pixels in suitability patches that intercept occurrences.
m_pres <- msdm_posteriori(
  records = occ,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  method = "pres",
  cont_suit = m_pred[[1]],
  thr = "equal_sens_spec",
  buffer = NULL
)

plot(m_pres)

# Using the lq method. The Lower Quantile (lq) method reduces to zero the suitability of any
# pixel that falls within a patch whose suitability is below the lower quartile of patch
# suitability, i.e., the least suitable patches are withdrawn from the model.
m_lq <- msdm_posteriori(
  records = occ,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  method = "lq",
  cont_suit = m_pred[[1]],
  thr = "equal_sens_spec",
  buffer = NULL
)

plot(m_lq)
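To compare the corrections visually, the corrected outputs can be combined into a single multi-layer SpatRaster and plotted together. This is a minimal sketch; it assumes the msdm_posteriori() outputs above share the same grid (the number of layers per output may vary with the flexsdm version):

# Combine the corrected predictions into one multi-layer SpatRaster and plot them
corrected <- c(m_mcp, m_bmcp, m_obr, m_pres, m_lq)
plot(corrected)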
This vignette is still under construction and subject to changes.