model_fit: Ecological niche models fit, prediction, projection and...
In Model-R/modleR: A Workflow for Ecological Niche Models

do_any

R Documentation

Ecological niche models fit, prediction, projection and evaluation using one or several algorithms

Description

do_any reads the output from setup_sdmdata and computes ecological niche models for a species with an algorithm specified by the user. It fits the model, predicts it onto the current environmental layers and calculates basic statistics for model evaluation. In addition to commonly adopted metrics such as AUC and TSS, this package also calculates partial ROC \insertCite{peterson_rethinking_2008,cobos_kuenm_2019}{modleR}. For details on model evaluation see \insertCite{phillips_maximum_2006;textual}{modleR} and \insertCite{peterson_ecological_2011;textual}{modleR}. do_any fits one algorithm at a time; do_many calls do_any internally and can be used to run several algorithms at once. Given that there are "no silver bullets in correlative ecological niche modeling" \insertCite{qiao_no_2015}{modleR}, the choice of algorithm is left to the user. See Details for a description of how each algorithm supported in this package is implemented.
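The AUC and TSS metrics mentioned above can be illustrated with a minimal base-R sketch. It is not the dismo evaluate function this package relies on: the scores are made up, AUC is computed as the probability that a random presence outscores a random absence, and TSS is sensitivity + specificity - 1, maximised over the observed scores. The function names auc_rank, tss_at and tss_max are hypothetical.

```r
# Hypothetical suitability scores at presence and absence test points
pres_scores <- c(0.9, 0.8, 0.75, 0.6, 0.4)
abs_scores  <- c(0.7, 0.5, 0.3, 0.2, 0.1, 0.05)

# AUC: probability that a random presence scores higher than a random
# absence, with ties counted as one half (Mann-Whitney formulation)
auc_rank <- function(pres, abse) {
  diffs <- outer(pres, abse, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# TSS at one threshold: sensitivity + specificity - 1
tss_at <- function(pres, abse, thr) {
  mean(pres >= thr) + mean(abse < thr) - 1
}

# Maximum TSS, using every observed score as a candidate threshold
tss_max <- function(pres, abse) {
  thr <- sort(unique(c(pres, abse)))
  tss <- vapply(thr, function(t) tss_at(pres, abse, t), numeric(1))
  list(threshold = thr[which.max(tss)], tss = max(tss))
}

auc_rank(pres_scores, abs_scores)  # 0.9
tss_max(pres_scores, abs_scores)   # threshold 0.4, TSS about 0.667
```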

Usage

do_any(species_name, predictors, models_dir = "./models",
  algorithm = c("bioclim"), project_model = FALSE,
  proj_data_folder = "./data/proj", mask = NULL, write_rda = FALSE,
  png_partitions = FALSE, write_bin_cut = FALSE,
  dismo_threshold = "spec_sens", equalize = TRUE, sensitivity = 0.9,
  proc_threshold = 0.5, ...)

do_many(species_name, bioclim = FALSE, domain = FALSE, glm = FALSE,
  mahal = FALSE, maxent = FALSE, maxnet = FALSE, rf = FALSE,
  svmk = FALSE, svme = FALSE, brt = FALSE, ...)

Arguments

`species_name`	A character string with the species name. Because the species name will be used as a directory name, avoid non-ASCII characters, spaces and punctuation marks. The recommended format is "Genus_species". See the names in `example_occs` for examples
`predictors`	A Raster or RasterStack object with the environmental raster layers
`models_dir`	Folder path to save the output files. Defaults to "`./models`"
`algorithm`	Character string of length 1 specifying the algorithm to be fit: "`bioclim`", "`brt`", "`domain`", "`glm`", "`maxent`", "`maxnet`", "`mahal`", "`svme`", "`svmk`", "`rf`"
`project_model`	Logical, whether to project the models onto the variable sets in the `proj_data_folder` directory
`proj_data_folder`	Path to the directory with projections, containing one or more folders with the projection datasets (e.g. "./env/proj/proj1"). This directory should only contain raster files corresponding to the environmental variables. If there is more than one projection, each projection should be in its own directory (e.g. "./env/proj/proj1" and "./env/proj/proj2"), and equivalent raster files in different subdirectories must have the same names (e.g. "./env/proj/proj1/layer1.asc" and "./env/proj/proj2/layer1.asc")
`mask`	A SpatialPolygonsDataFrame to be used to mask the models. This mask can be used if the final area of interest is smaller than the area used for model fitting, to save disk space
`write_rda`	Logical, whether .rda objects with the fitted models will be written
`png_partitions`	Logical, whether png files will be written
`write_bin_cut`	Logical, whether binary and cut model files (.tif, .png) should be written
`dismo_threshold`	Character string indicating threshold (cut-off) to transform model predictions to a binary score as in `threshold`: "`kappa`", "`spec_sens`", "`no_omission`", "`prevalence`", "`equal_sens_spec`", "`sensitivity`". Default value is "`spec_sens`"
`equalize`	Logical, whether the number of presences and absences should be equalized in randomForest and brt
`sensitivity`	Numeric, value between 0 and 1 giving the sensitivity used to calculate the threshold when `dismo_threshold = "sensitivity"`. Defaults to 0.9 as in the dismo package
`proc_threshold`	Numeric, value from 0 to 100 that will be used as E (the acceptable omission rate, in percent) for partial ROC calculations in `kuenm_proc`. Default is `proc_threshold = 0.5`
`...`	Other arguments from `kuenm_proc`
`bioclim`	Execute bioclim algorithm from the dismo implementation with `bioclim` function
`domain`	Execute domain from the dismo implementation with `domain` function
`glm`	Execute GLM as suggested by the dismo documentation with `glm` and `step`
`mahal`	Execute Mahalanobis distance from the dismo implementation with `mahal`
`maxent`	Execute Maxent algorithm from the dismo implementation with `maxent` function
`maxnet`	Execute Maxent algorithm from the maxnet implementation with `maxnet` function
`rf`	Execute Random forest algorithm from randomForest package with function `tuneRF` as suggested by the dismo documentation
`svmk`	Execute Support Vector Machines (SVM) algorithm from kernlab package with `ksvm` function
`svme`	Execute Support Vector Machines (SVM) algorithm from e1071 package with `best.tune` function
`brt`	Execute Boosted Regression Trees with `gbm.step` from dismo
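The threshold rules accepted by dismo_threshold can be sketched in base R. This is an illustration with made-up scores, not dismo's threshold function: spec_sens maximises sensitivity + specificity, no_omission is the lowest presence score, and for sensitivity we assume the rule is "highest threshold whose sensitivity still reaches the target". The last lines show how binary and "cut" values could be derived when write_bin_cut = TRUE, assuming a cut model keeps suitability above the threshold and sets the rest to 0.

```r
# Hypothetical suitability scores at presence and absence test points
pres_scores <- c(0.9, 0.8, 0.75, 0.6, 0.4)
abs_scores  <- c(0.7, 0.5, 0.3, 0.2, 0.1, 0.05)

candidates <- sort(unique(c(pres_scores, abs_scores)))
sens <- vapply(candidates, function(t) mean(pres_scores >= t), numeric(1))
spec <- vapply(candidates, function(t) mean(abs_scores < t), numeric(1))

# "spec_sens": threshold with the highest sensitivity + specificity
thr_spec_sens <- candidates[which.max(sens + spec)]

# "no_omission": highest threshold at which no presence is omitted
thr_no_omission <- min(pres_scores)

# "sensitivity" (assumed form): highest threshold keeping sensitivity >= target
# (0.8 here for illustration; the documented default is 0.9)
target <- 0.8
thr_sensitivity <- max(candidates[sens >= target])

# Binary and "cut" versions of a few hypothetical predicted values
suit <- c(0.95, 0.55, 0.35, 0.1)
bin_map <- as.integer(suit >= thr_spec_sens)       # 1 1 0 0
cut_map <- ifelse(suit >= thr_spec_sens, suit, 0)  # 0.95 0.55 0 0
```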

Details

See below for a description of how the algorithms supported in this package are implemented.

Bioclim: Specified by algorithm = "bioclim", uses the bioclim function in the dismo package \insertCite{hijmans_dismo_2017}{modleR}. Bioclim, the first species distribution modelling package, is the climate-envelope model implemented by Henry Nix \insertCite{nix_biogeographic_1986}{modleR}. It is based on climate interpolation methods and, despite its limitations, it is still used in ecological niche modeling, especially for exploration and teaching purposes \insertCite{@see also @booth_bioclim_2014}{modleR}. In this package it is implemented by the function bioclim, and evaluated and predicted using evaluate and predict, also from the dismo package.
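The envelope idea behind Bioclim can be sketched in base R. This is a simplified illustration, not dismo's bioclim code: each variable's value at a new site is converted to its percentile among the training (occurrence) values, the upper tail is folded onto the lower tail, and the most limiting variable, rescaled to 0-1, gives the score. The data and the function name bioclim_score are hypothetical.

```r
# Hypothetical environmental values at occurrence (training) points
train <- data.frame(temp = c(18, 20, 21, 22, 23, 25),
                    prec = c(900, 1000, 1100, 1200, 1300, 1500))
# Two new sites: one in the middle of the envelope, one too warm
new <- data.frame(temp = c(21.5, 30), prec = c(1150, 1100))

bioclim_score <- function(train, new) {
  per_var <- sapply(names(train), function(v) {
    p <- ecdf(train[[v]])(new[[v]])  # percentile of each new value
    pmin(p, 1 - p)                   # 10th and 90th percentiles treated alike
  })
  per_var <- matrix(per_var, nrow = nrow(new))
  2 * apply(per_var, 1, min)         # most limiting variable, rescaled to 0-1
}

bioclim_score(train, new)  # 1 0
```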
Boosted Regression Trees (BRT): Specified by algorithm = "brt", uses the gbm.step function from the dismo package, which runs the cross-validation procedure of \insertCite{hastie_elements_2001;textual}{modleR} \insertCite{@see also @elith_working_2009}{modleR}. BRT combines a regression modeling technique with boosting, a method for combining many simple models. It is implemented by the function gbm.step as a regression with a Bernoulli response distribution, and evaluated and predicted using evaluate and predict from the dismo package.
Domain: Specified by algorithm = "domain", uses the domain function from the dismo package. It computes point-to-point similarity based on the Gower distance between environmental variables \insertCite{carpenter_domain_1993}{modleR}. The dismo authors \insertCite{hijmans_dismo_2017}{modleR} advise using it with caution because it does not perform well compared to other algorithms \insertCite{elith_novel_2006,hijmans_ability_2006}{modleR}. We add that it is a slow algorithm. In this package it is implemented by the function domain, and evaluated and predicted using evaluate and predict, also from the dismo package.
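A minimal sketch of a Domain-style similarity, assuming the textbook Gower formulation: per-variable absolute differences are divided by the range of that variable across the training points, averaged over variables, and the similarity of a site is 1 minus its distance to the closest training point. dismo's domain may differ in details, and the data and the name domain_similarity are hypothetical. Computing a distance to every training point for every site is consistent with the note above that the algorithm is slow.

```r
# Hypothetical training (occurrence) values and new sites
train <- data.frame(temp = c(10, 20, 30), prec = c(100, 200, 300))
new   <- data.frame(temp = c(20, 25),     prec = c(200, 100))

domain_similarity <- function(train, new) {
  train <- as.matrix(train)
  new   <- as.matrix(new)
  rng <- apply(train, 2, function(x) diff(range(x)))  # range of each variable
  apply(new, 1, function(site) {
    # Range-scaled Gower distance from this site to every training point
    gower <- rowMeans(sweep(abs(sweep(train, 2, site)), 2, rng, "/"))
    1 - min(gower)  # similarity to the closest training point
  })
}

domain_similarity(train, new)  # 1.000 0.625
```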
Generalized Linear Model (GLM): Specified by algorithm = "glm", fits a GLM with presences and absences as the response variable, following a binomial error distribution. It performs stepwise model selection by AIC in both directions (backward and forward), adding or dropping one predictor at a time among the variables in the RasterStack. In this package it is implemented with the functions glm, to fit the model, and step, to choose a model by AIC. The model is evaluated with the evaluate function from dismo and predicted with the predict function from the raster package, both with argument type = "response" to return values on the scale of the response variable.
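A self-contained sketch of the glm plus step procedure on simulated data, using only the stats package. The simulated data, the intercept-only starting model and the scope formula are assumptions for illustration; the starting model used internally by this package may differ, and here predictions are made on a data frame instead of a RasterStack.

```r
set.seed(42)
n <- 200
# Simulated predictors; "noise" has no effect on the response
dat <- data.frame(temp  = rnorm(n, mean = 20, sd = 4),
                  prec  = rnorm(n, mean = 1000, sd = 200),
                  noise = rnorm(n))
# Simulated presence (1) / absence (0) driven by temp and prec only
eta <- -0.5 + 0.8 * as.vector(scale(dat$temp)) - 0.6 * as.vector(scale(dat$prec))
dat$pa <- rbinom(n, size = 1, prob = plogis(eta))

# Start from the intercept-only model and let step() add or drop terms by AIC
null_model <- glm(pa ~ 1, family = binomial, data = dat)
best <- step(null_model, scope = ~ temp + prec + noise,
             direction = "both", trace = 0)

formula(best)  # selected model
# Predictions on the probability scale, as with type = "response"
pred <- predict(best, newdata = dat, type = "response")
```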
Mahalanobis: Specified by algorithm = "mahal", uses the mahal function from the dismo package. It corresponds to a distribution model based on the Mahalanobis distance, a measure of the distance between a point P and a distribution D \insertCite{mahalanobis_generalized_1936}{modleR}. In this package it is implemented by the function mahal, and evaluated and predicted using evaluate and predict, also from the dismo package.
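The distance itself is available in base R as stats::mahalanobis, which returns the squared distance. The sketch below, with hypothetical training values, computes it from the mean and covariance of the occurrence data; it does not reproduce how dismo's mahal turns distances into a suitability score.

```r
# Hypothetical environmental values at occurrence (training) points
train <- data.frame(temp = c(18, 20, 21, 22, 23, 25),
                    prec = c(900, 1100, 1000, 1250, 1200, 1450))
center <- colMeans(train)  # multivariate mean: temp 21.5, prec 1150
covmat <- cov(train)       # covariance of the training values

# Two new sites: one at the training mean, one far outside the training cloud
new <- data.frame(temp = c(21.5, 35), prec = c(1150, 500))
d2 <- mahalanobis(new, center, covmat)  # squared Mahalanobis distances
d2  # first value is 0, second is much larger
```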
Maximum Entropy (Maxent): Specified either by algorithm = "maxent" or algorithm = "maxnet", corresponding to the implementations in the dismo \insertCite{hijmans_dismo_2017}{modleR} and maxnet \insertCite{phillips_maxnet_2017}{modleR} packages, respectively. Maxent is a machine learning method for modeling species distributions from incomplete data, allowing ENM with presence-only data \insertCite{phillips_maximum_2006}{modleR}. If algorithm = "maxent", the model is fit by the function maxent, and evaluated and predicted using evaluate and predict, also from the dismo package. If algorithm = "maxnet", the model is fit by the function maxnet from the maxnet package, evaluated using evaluate from the dismo package with argument type = "logistic", and predicted using the predict function from the raster package.
Random Forest: Specified by algorithm = "rf", uses the tuneRF function from the randomForest package \insertCite{liaw_classification_2002}{modleR}. It is a machine learning regression method based on an ensemble of decision trees. In this package tuneRF is run with doBest = TRUE, so that the final forest is fit with the optimal number of variables available for splitting at each tree node (i.e. mtry). The Random Forest model is evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.
Support Vector Machines (SVM): Specified either by algorithm = "svme" or algorithm = "svmk", corresponding to the implementations in the e1071 \insertCite{meyer_e1071_2017}{modleR} and kernlab \insertCite{karatzoglou_kernlab_2004}{modleR} packages, respectively. SVMs are supervised learning models that use learning algorithms for classification and regression analysis. In the e1071 package, SVM is implemented through the function best.tune with method set to "svm", which uses an RBF (radial basis function) kernel for classification. In the kernlab package, SVM is implemented through the function ksvm, also with an RBF kernel (in this case the default kernel, "rbfdot"). We expect both implementations to differ only in performance. Both svme and svmk are evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.
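Both SVM implementations rely on the RBF kernel, which in kernlab's rbfdot parameterisation is k(x, x') = exp(-sigma * ||x - x'||^2) (e1071 calls the same constant gamma). The base-R sketch below only computes that kernel matrix for a few hypothetical points to show what the kernel measures; it does not fit an SVM, and the name rbf_kernel is hypothetical.

```r
# RBF kernel matrix for the rows of X: exp(-sigma * squared Euclidean distance)
rbf_kernel <- function(X, sigma) {
  X <- as.matrix(X)
  sq <- rowSums(X^2)
  d2 <- outer(sq, sq, "+") - 2 * tcrossprod(X)  # pairwise squared distances
  exp(-sigma * pmax(d2, 0))                     # guard against tiny negatives
}

# Three hypothetical points in a two-variable environmental space
X <- matrix(c(0, 0,
              1, 0,
              0, 2), ncol = 2, byrow = TRUE)
K <- rbf_kernel(X, sigma = 0.5)
K  # 1 on the diagonal; similarity decays with distance, e.g. K[1, 2] = exp(-0.5)
```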

Value

Returns a data frame with key threshold values and evaluation statistics of each algorithm (FNR, FPR, TSSmax, AUC, pROC, FScore, Jaccard dissimilarity, etc.) for the selected threshold.

Writes to disk a .tif model for each partition of each algorithm.

Writes to disk a .csv file with thresholds and evaluation statistics of each algorithm for a given threshold.

Writes to disk a .csv file with evaluation statistics for all threshold values.
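How such statistics follow from a confusion matrix at the selected threshold can be sketched with hypothetical counts. This is an illustration, not the package's own code: the F-score here is the usual F1 (equivalently the Sørensen index 2TP / (2TP + FP + FN)), and the Jaccard index TP / (TP + FP + FN) is a similarity, so a dissimilarity would be 1 minus it. The exact definitions reported by the package are not reproduced here.

```r
# Hypothetical confusion-matrix counts at the selected threshold
tp <- 40; fn <- 10; fp <- 20; tn <- 80

fnr <- fn / (tp + fn)             # false negative (omission) rate
fpr <- fp / (fp + tn)             # false positive rate
tss <- (1 - fnr) + (1 - fpr) - 1  # sensitivity + specificity - 1
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f_score <- 2 * precision * recall / (precision + recall)  # = 2TP / (2TP + FP + FN)
jaccard <- tp / (tp + fp + fn)    # Jaccard similarity index
c(FNR = fnr, FPR = fpr, TSS = tss, FScore = f_score, Jaccard = jaccard)
```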

References

\insertAllCited{}

Examples

## Not run: 
# run setup_sdmdata first from one species in example_occs data
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars,
                          clean_uni = TRUE)

# run bioclim for one species
sp_any <- do_any(species_name = sp,
                 predictors = example_vars,
                 algorithm = "bioclim")

# run do_many
sp_many <- do_many(species_name = sp,
                   predictors = example_vars,
                   bioclim = TRUE)
                   
## End(Not run)

Model-R/modleR documentation built on Aug. 24, 2023, 6:50 p.m.