model_fit: Ecological niche models fit, prediction, projection and...


Description

do_any reads the output from setup_sdmdata and computes ecological niche models for a species using an algorithm specified by the user. It fits the model, predicts it onto the current environmental layers and calculates basic statistics for model evaluation. In addition to commonly adopted metrics such as AUC and TSS, this package also calculates partial ROC \insertCite{peterson_rethinking_2008,cobos_kuenm_2019}{modleR}. For details on model evaluation see \insertCite{phillips_maximum_2006;textual}{modleR} and \insertCite{peterson_ecological_2011;textual}{modleR}. do_any fits one algorithm at a time. do_many calls do_any internally and can be used to run multiple algorithms at once. Given that there are "no silver bullets in correlative ecological niche modeling" \insertCite{qiao_no_2015}{modleR}, the choice of algorithm is left to the user. See Details for a description of how each algorithm supported in this package is implemented.

Usage

do_any(species_name, predictors, models_dir = "./models",
  algorithm = c("bioclim"), project_model = FALSE,
  proj_data_folder = "./data/proj", mask = NULL, write_png = FALSE,
  write_bin_cut = FALSE, dismo_threshold = "spec_sens",
  conf_mat = TRUE, equalize = TRUE, proc_threshold = 0.5, ...)

do_many(species_name, bioclim = FALSE, domain = FALSE, glm = FALSE,
  mahal = FALSE, maxent = FALSE, maxnet = FALSE, rf = FALSE,
  svmk = FALSE, svme = FALSE, brt = FALSE, ...)

Arguments

species_name

A character string with the species name. Because the species name will be used as a directory name, avoid non-ASCII characters, spaces and punctuation marks. The recommended format is "Genus_species". See the names in example_occs for an example

predictors

A Raster or RasterStack object with the environmental raster layers

models_dir

Folder path to save the output files. Defaults to "./models"

algorithm

Character string of length 1 specifying the algorithm to be fit: "bioclim", "brt", "domain", "glm", "maxent", "maxnet", "mahal", "svme", "svmk", "rf"

project_model

Logical, whether to perform a projection

proj_data_folder

Path to the directory with projections, containing one or more folders with the projection datasets (e.g. "./env/proj/proj1"). Each projection directory should contain only raster files corresponding to the environmental variables. If there is more than one projection, each projection should be in its own directory (e.g. "./env/proj/proj1" and "./env/proj/proj2"), and equivalent raster files in different subdirectories must have the same names (e.g. "./env/proj/proj1/layer1.asc" and "./env/proj/proj2/layer1.asc")

mask

A SpatialPolygonsDataFrame to be used to mask the models. This mask can be used if the final area of interest is smaller than the area used for model fitting, to save disk space

write_png

Logical, whether png files will be written

write_bin_cut

Logical, whether binary and cut model files (.tif, .png) should be written

dismo_threshold

Character string indicating the threshold (cut-off) used to transform model predictions into a binary score, as in the threshold function from dismo: "kappa", "spec_sens", "no_omission", "prevalence", "equal_sens_spec", "sensitivity". Default value is "spec_sens"

conf_mat

Logical, whether confusion tables should be written to disk

equalize

Logical, whether the number of presences and absences should be equalized for the random forest ("rf") and boosted regression trees ("brt") algorithms

proc_threshold

Numeric, value from 0 to 100 that will be used as (E) for partialROC calculations in kuenm_proc. Default is proc_threshold = 0.5, as given in Usage

...

Other arguments from kuenm_proc

bioclim

Execute bioclim algorithm from the dismo implementation with bioclim function

domain

Execute domain from the dismo implementation with domain function

glm

Execute GLM as suggested by the dismo documentation with glm and step

mahal

Execute Mahalanobis distance from the dismo implementation with mahal

maxent

Execute Maxent algorithm from the dismo implementation with maxent function

maxnet

Execute Maxent algorithm from the maxnet implementation with maxnet function

rf

Execute Random forest algorithm from randomForest package with function tuneRF as suggested by the dismo documentation

svmk

Execute Support Vector Machines (SVM) algorithm from kernlab package with ksvm function

svme

Execute Support Vector Machines (SVM) algorithm from e1071 package with best.tune function

brt

Execute Boosted Regression Trees with gbm.step from dismo

Details

See below for a description of how each algorithm supported in this package is implemented.

Bioclim

Specified by algorithm = "bioclim"; uses the bioclim function from the dismo package \insertCite{hijmans_dismo_2017}{modleR}. Bioclim is the climate-envelope model implemented by Henry Nix \insertCite{nix_biogeographic_1986}{modleR}, the first species distribution modelling package. It is based on climate interpolation methods and, despite its limitations, it is still used in ecological niche modeling, especially for exploration and teaching purposes \insertCite{@See also @booth_bioclim_2014}{modleR}. In this package the model is fit with the function bioclim, and evaluated and predicted using evaluate and predict, also from the dismo package.
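The envelope idea behind Bioclim can be illustrated in base R. This is only a rough sketch under stated assumptions: the data and the helper bioclim_score are hypothetical, and the percentile folding shown here approximates the general approach rather than reproducing the exact computation in dismo.

```r
# Hypothetical environmental values at known presence points
presences <- cbind(temp = 11:19, prec = seq(100, 180, by = 10))

# Hypothetical helper: envelope score of one site against the presence data
bioclim_score <- function(site, presences) {
  # Percentile of the site within the presence values, one per variable
  p <- vapply(seq_along(site),
              function(j) ecdf(presences[, j])(site[j]),
              numeric(1))
  # Fold the upper tail onto the lower tail, keep the most limiting
  # variable, and rescale so that scores lie between 0 and 1
  2 * min(pmin(p, 1 - p))
}

central_site <- c(temp = 15, prec = 140)  # close to the median of both variables
outside_site <- c(temp = 30, prec = 140)  # temperature outside the envelope

score_central <- bioclim_score(central_site, presences)
score_outside <- bioclim_score(outside_site, presences)
```

A site scores high only when every variable is near the middle of the presence distribution, which is why a single out-of-range variable is enough to bring the score to zero.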

Boosted Regression Trees (BRT)

Specified by algorithm = "brt"; uses the gbm.step function from the dismo package. It runs the cross-validation procedure of \insertCite{hastie_elements_2001;textual}{modleR} \insertCite{@See also @elith_working_2009}{modleR}. BRT is a regression modeling technique combined with boosting, a method for combining many simple models. It is implemented by the function gbm.step as a regression with the response variable set to a Bernoulli distribution, and evaluated and predicted using evaluate and predict from the dismo package.

Domain

Specified by algorithm = "domain"; uses the domain function from the dismo package. It computes point-to-point similarity based on the Gower distance between environmental variables \insertCite{carpenter_domain_1993}{modleR}. \insertCite{hijmans_dismo_2017;textual}{modleR} state that one should use it with caution because it does not perform well compared to other algorithms \insertCite{elith_novel_2006,hijmans_ability_2006}{modleR}. We add that it is a slow algorithm. In this package the model is fit with the function domain, and evaluated and predicted using evaluate and predict, also from the dismo package.
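The Gower-based similarity described above can be sketched in base R. The data and the helper gower_similarity are hypothetical, and this follows the textbook formulation (similarity to the nearest presence point), which may differ in detail from the dismo implementation.

```r
# Hypothetical presence points (rows) and two candidate sites
presences <- matrix(c(10, 100,
                      12, 120,
                      14, 110),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("temp", "prec")))
sites <- rbind(inside = c(12, 110), outside = c(30, 400))
colnames(sites) <- colnames(presences)

# Hypothetical helper: Gower similarity of one site to its nearest presence
gower_similarity <- function(site, presences) {
  # Range of each variable across the presence points
  ranges <- apply(presences, 2, function(v) diff(range(v)))
  # Absolute difference to each presence point, standardized by the range
  diffs <- sweep(abs(sweep(presences, 2, site)), 2, ranges, "/")
  # Gower distance to each presence point is the mean over variables
  d <- rowMeans(diffs)
  # Similarity to the nearest presence point, floored at zero
  max(0, 1 - min(d))
}

scores <- apply(sites, 1, gower_similarity, presences = presences)
```

Every candidate site is compared with every presence point, which gives some intuition for why the algorithm is slow on large rasters.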

Generalized Linear Model (GLM)

Specified by algorithm = "glm"; runs a GLM that models presences and absences as the response variable with a binomial error distribution. It performs stepwise model selection based on AIC, both backward and forward, over the predictor variables in the RasterStack. In this package it is implemented using the functions glm and step, which fit a model and then choose a model by AIC in a stepwise procedure. The model is evaluated using the evaluate function from dismo and predicted using the predict function from the raster package, both with argument type = "response" to return values on the scale of the response variable.
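A minimal base R sketch of this glm plus step workflow follows. It uses simulated data, every variable name is hypothetical, and it shows the general technique rather than the code run inside do_any.

```r
# Simulated presence/absence data; all names are hypothetical
set.seed(42)
n <- 200
env <- data.frame(temp  = rnorm(n, 20, 5),
                  prec  = rnorm(n, 1000, 200),
                  noise = rnorm(n))
# Occurrence probability depends on temp and prec only
linear_predictor <- -1 + 0.3 * (env$temp - 20) + 0.005 * (env$prec - 1000)
env$pa <- rbinom(n, size = 1, prob = plogis(linear_predictor))

# The null and full models bound the stepwise search
null_model <- glm(pa ~ 1, data = env, family = binomial)
full_model <- glm(pa ~ temp + prec + noise, data = env, family = binomial)

# Stepwise selection by AIC, allowing terms to be added and dropped
best_model <- step(null_model,
                   scope = list(lower = formula(null_model),
                                upper = formula(full_model)),
                   direction = "both", trace = 0)

# type = "response" returns probabilities instead of values on the logit scale
suitability <- predict(best_model, newdata = env, type = "response")
```

Without type = "response", predict would return values on the link (logit) scale, which are not bounded between 0 and 1.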

Mahalanobis

Specified by algorithm = "mahal"; uses the mahal function from the dismo package. It corresponds to a distribution model based on the Mahalanobis distance, a measure of the distance between a point P and a distribution D \insertCite{mahalanobis_generalized_1936}{modleR}. In this package the model is fit with the function mahal, and evaluated and predicted using evaluate and predict, also from the dismo package.
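The distance itself can be computed with the mahalanobis function from the stats package, as in the sketch below. The data are simulated and all names are hypothetical. dismo turns this distance into a suitability score, and that transformation is not reproduced here.

```r
# Simulated environmental values at presence points; names are hypothetical
set.seed(1)
presences <- cbind(temp = rnorm(50, 20, 2),
                   prec = rnorm(50, 1000, 100))

# The distribution D is summarized by its centroid and covariance matrix
center <- colMeans(presences)
covariance <- cov(presences)

# Two candidate points P: the centroid itself and a distant site
sites <- rbind(centroid = center,
               far = center + c(10, 500))

# stats::mahalanobis returns the squared distance of each row to the centroid
d2 <- mahalanobis(sites, center = center, cov = covariance)
```

Because the covariance matrix enters the calculation, the distance accounts for both the scale of each variable and the correlation between them.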

Maximum Entropy (Maxent)

Specified either by algorithm = "maxent" or algorithm = "maxnet", corresponding to the implementations in the dismo \insertCite{hijmans_dismo_2017}{modleR} and maxnet \insertCite{phillips_maxnet_2017}{modleR} packages respectively. Maxent is a machine learning method for modeling species distributions based on incomplete data, allowing ENM with presence-only data \insertCite{phillips_maximum_2006}{modleR}. If algorithm = "maxent", the model is fit by the function maxent, and evaluated and predicted using evaluate and predict, also from the dismo package. If algorithm = "maxnet", the model is fit by the function maxnet from the maxnet package, evaluated using evaluate from the dismo package with argument type = "logistic", and predicted using the predict function from the raster package.

Random Forest

Specified by algorithm = "rf"; uses the tuneRF function from the randomForest package \insertCite{liaw_classification_2002}{modleR}. It corresponds to machine learning regression based on decision trees. This package calls tuneRF with doBest = TRUE, so that the forest is fit with the optimal number of variables available for splitting at each tree node (i.e. mtry). The Random Forest model is evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.

Support Vector Machines (SVM)

Specified either by algorithm = "svme" or algorithm = "svmk", corresponding to the implementations in the e1071 \insertCite{meyer_e1071_2017}{modleR} and kernlab \insertCite{karatzoglou_kernlab_2004}{modleR} packages respectively. SVMs are supervised learning models that use learning algorithms for classification and regression analysis. In the e1071 package, SVM is implemented through the function best.tune with method set to "svm", which uses an RBF kernel (radial basis function kernel) for classification. In the kernlab package, SVM is implemented through the function ksvm, also with an RBF kernel (in this case the default kernel, "rbfdot"). We expect both implementations to differ only in performance. Both svme and svmk are evaluated with the evaluate function from dismo and predicted with the predict function from the raster package.

Value

Returns a data frame with some key threshold values and evaluation statistics of each algorithm (omission, TSSmax, AUC, pROC etc.)

Writes to disk a .tif model for each partition of each algorithm

Writes to disk a .csv file with evaluation statistics of each algorithm
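To make statistics such as TSSmax concrete, the sketch below computes TSS (sensitivity + specificity - 1) across candidate thresholds and keeps the maximum, which is also the idea behind the "spec_sens" threshold. The data are simulated and the helper tss_at is hypothetical. It is not the code used by do_any.

```r
# Simulated observations (1 = presence, 0 = absence) and model predictions
set.seed(7)
observed <- c(rep(1, 50), rep(0, 50))
predicted <- c(rbeta(50, 4, 2), rbeta(50, 2, 4))

# Hypothetical helper: TSS obtained when cutting predictions at one threshold
tss_at <- function(threshold, observed, predicted) {
  predicted_presence <- predicted >= threshold
  sensitivity <- sum(predicted_presence & observed == 1) / sum(observed == 1)
  specificity <- sum(!predicted_presence & observed == 0) / sum(observed == 0)
  sensitivity + specificity - 1
}

# Evaluate every distinct predicted value as a candidate threshold
thresholds <- sort(unique(predicted))
tss <- vapply(thresholds, tss_at, numeric(1),
              observed = observed, predicted = predicted)

# The threshold that maximizes sensitivity + specificity, and its TSS
best <- which.max(tss)
result <- c(threshold = thresholds[best], TSSmax = tss[best])
```

A threshold of 0 classifies every site as a presence, giving sensitivity 1 and specificity 0, so TSS is 0 at that extreme.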

References

\insertAllCited

See Also

bioclim in dismo package

domain in dismo package

do_many

evaluate in dismo package

maxent in dismo package

maxnet in maxnet package

mahal in dismo package

predict in dismo package

predict in raster package

Examples

# run setup_sdmdata first from one species in example_occs data
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars)

# run bioclim for one species
sp_any <- do_any(species_name = sp,
                 predictors = example_vars,
                 algorithm = "bioclim")

# run do_many
sp_many <- do_many(species_name = sp,
                   predictors = example_vars,
                   bioclim = TRUE,
                   maxnet = TRUE)

Model-R/modleR documentation built on Dec. 3, 2019, 4:54 p.m.