do_any | R Documentation |
do_any reads the output from setup_sdmdata and computes ecological niche
models for a species based on an algorithm specified by the user. It fits the
model, predicts it onto the current environmental layers and calculates basic
statistics for model evaluation. In addition to commonly adopted metrics such
as AUC and TSS, this package also calculates partial ROC
\insertCite{peterson_rethinking_2008,cobos_kuenm_2019}{modleR}. For details
on model evaluation see
\insertCite{phillips_maximum_2006;textual}{modleR} and
\insertCite{peterson_ecological_2011;textual}{modleR}. do_any runs one
algorithm at a time. do_many calls do_any internally and can be used to run
multiple algorithms at once.
Given that there are "no silver bullets in correlative ecological
niche modeling" \insertCite{qiao_no_2015}{modleR}, the choice of which
algorithm to run is up to the user. See Details for a description of
how each algorithm supported in this package is implemented.
do_any(species_name, predictors, models_dir = "./models",
  algorithm = c("bioclim"), project_model = FALSE,
  proj_data_folder = "./data/proj", mask = NULL, write_rda = FALSE,
  png_partitions = FALSE, write_bin_cut = FALSE,
  dismo_threshold = "spec_sens", equalize = TRUE, sensitivity = 0.9,
  proc_threshold = 0.5, ...)

do_many(species_name, bioclim = FALSE, domain = FALSE, glm = FALSE,
  mahal = FALSE, maxent = FALSE, maxnet = FALSE, rf = FALSE,
  svmk = FALSE, svme = FALSE, brt = FALSE, ...)
species_name |
A character string with the species name. Because the species
name will be used as a directory name, avoid non-ASCII characters, spaces and
punctuation marks.
We recommend the "Genus_species" format. See names in
|
predictors |
A Raster or RasterStack object with the environmental raster layers |
models_dir |
Folder path to save the output files. Defaults to
" |
algorithm |
Character string of length 1 specifying the algorithm to
be fit: " |
project_model |
Logical, whether to project the models to variable sets
in |
proj_data_folder |
Path to the directory containing one or more folders with the projection datasets (e.g. "./env/proj/proj1"). Each folder should contain only the raster files corresponding to the environmental variables. If there is more than one projection, each projection should be in its own directory (e.g. "./env/proj/proj1" and "./env/proj/proj2") and equivalent raster files in different subdirectories must have the same names (e.g. "./env/proj/proj1/layer1.asc" and "./env/proj/proj2/layer1.asc") |
mask |
A SpatialPolygonsDataFrame to be used to mask the models. This mask can be used if the final area of interest is smaller than the area used for model fitting, to save disk space |
write_rda |
Logical, whether .rda objects with the fitted models will be written |
png_partitions |
Logical, whether png files will be written |
write_bin_cut |
Logical, whether binary and cut model files (.tif, .png) should be written |
dismo_threshold |
Character string indicating threshold (cut-off) to
transform model predictions to a binary score as in
|
equalize |
Logical, whether the number of presences and absences should be equalized in randomForest and brt |
sensitivity |
Numeric, value from 0 to 0.9 to indicate the sensitivity value to calculate the threshold. Defaults to 0.9 as in dismo package |
proc_threshold |
Numeric, value from 0 to 100 that will be used as (E)
for partialROC calculations in |
... |
Other arguments from |
bioclim |
Execute bioclim algorithm from the dismo implementation
with |
domain |
Execute domain from the dismo implementation with
|
glm |
Execute GLM as suggested by the dismo documentation with
|
mahal |
Execute Mahalanobis distance from the dismo implementation
with |
maxent |
Execute Maxent algorithm from the dismo implementation
with |
maxnet |
Execute Maxent algorithm from the maxnet implementation
with |
rf |
Execute Random forest algorithm from randomForest package
with function |
svmk |
Execute Support Vector Machines (SVM) algorithm from
kernlab package with |
svme |
Execute Support Vector Machines (SVM) algorithm from e1071
package with |
brt |
Execute Boosted Regression Trees with
|
See below for a description on the implementation of the algorithms supported in this package.
Specified by algorithm = "bioclim", it uses the bioclim function in the dismo
package \insertCite{hijmans_dismo_2017}{modleR}.
Bioclim is the climate-envelope model implemented by Henry Nix
\insertCite{nix_biogeographic_1986}{modleR}, the first species distribution
modelling package. It is based on climate interpolation methods and, despite
its limitations, it is still used in ecological niche modeling, especially for
exploration and teaching purposes
\insertCite{@see also @booth_bioclim_2014}{modleR}. In this package it is
implemented by the function bioclim, and is evaluated and predicted using
evaluate and predict, also from the dismo package.
Specified by algorithm = "brt", it uses the gbm.step function from the dismo
package, which runs the cross-validation procedure of
\insertCite{hastie_elements_2001;textual}{modleR}
\insertCite{@see also @elith_working_2009}{modleR}. BRT is a regression
modeling technique combined with boosting, a method for combining many simple
models. It is implemented by the function gbm.step as a regression with the
response variable set to a Bernoulli distribution, and is evaluated and
predicted using evaluate and predict from the dismo package.
Specified by algorithm = "domain", it uses the domain function from the dismo
package. It computes point-to-point similarity based on the Gower distance
between environmental variables
\insertCite{carpenter_domain_1993}{modleR}.
\insertCite{hijmans_dismo_2017;textual}{modleR} state that one should use it
with caution because it does not perform well compared to other algorithms
\insertCite{elith_novel_2006,hijmans_ability_2006}{modleR}. We add that it is
a slow algorithm. In this package it is implemented by the function domain,
and is evaluated and predicted using evaluate and predict, also from the
dismo package.
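The similarity idea behind Domain can be sketched in base R. This is only an illustration under stated assumptions: the presence data and variable names are simulated and hypothetical, the range of each variable is taken over the presence points, and the score of a site is taken as its highest Gower similarity to any presence point. The dismo implementation may differ in such details.

```r
# Hypothetical environmental values at 30 presence points
set.seed(7)
pres <- cbind(temp = runif(30, 15, 25), prec = runif(30, 800, 1200))

# Range of each variable, used to standardize the differences
rng <- apply(pres, 2, function(v) diff(range(v)))

# Gower similarity of one site to its most similar presence point
gower_sim <- function(site, pres, rng) {
  # absolute difference per variable, divided by that variable's range
  d <- sweep(abs(sweep(pres, 2, site)), 2, rng, "/")
  # Gower distance is the mean over variables; similarity is 1 - distance
  max(1 - rowMeans(d))
}

sim_at_presence <- gower_sim(pres[1, ], pres, rng)     # site equal to a presence point
sim_far <- gower_sim(c(temp = 40, prec = 3000), pres, rng)  # site far from all points
```

A site that coincides with a presence point gets a similarity of 1, and the value decreases (here without truncation at zero) as the site moves away from all presence points in environmental space.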
Specified by algorithm = "glm", it fits a GLM with presences and absences as
the response variable, following a binomial error distribution. It runs a
stepwise model selection based on AIC, both backward and forward, with the
predictor variables in the RasterStack as candidates. In this package it is
implemented using the functions glm and step to fit a model and choose a
model by AIC in a stepwise procedure. The model is evaluated using the
evaluate function from dismo and predicted using the predict function from
the raster package, both with argument type = "response" to return values on
the scale of the response variable.
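The glm plus step workflow described above can be sketched with base R only. The data are simulated and the variable names (temp, prec, noise) are hypothetical; starting the search from the full model is also an assumption, and the scope used inside this package may be set differently.

```r
# Simulated presence/absence data with three candidate predictors
set.seed(42)
n <- 200
env <- data.frame(temp = rnorm(n), prec = rnorm(n), noise = rnorm(n))
# Presence probability depends on temp and prec only
env$pa <- rbinom(n, size = 1, prob = plogis(1.5 * env$temp - env$prec))

# Binomial GLM with all candidate predictors
full <- glm(pa ~ temp + prec + noise, family = binomial, data = env)

# Stepwise selection by AIC, adding and dropping terms
best <- step(full, direction = "both", trace = 0)

# Predictions on the scale of the response (probabilities)
suit <- predict(best, newdata = env, type = "response")
summary(suit)
```

With type = "response" the predictions are probabilities between 0 and 1 rather than values on the logit scale, which is what makes them usable as suitability scores.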
Specified by algorithm = "mahal", it uses the mahal function from the dismo
package. It corresponds to a distribution model based on the Mahalanobis
distance, a measure of the distance between a point P and a distribution D
\insertCite{mahalanobis_generalized_1936}{modleR}. In this package it is
implemented by the function mahal, and is evaluated and predicted using
evaluate and predict, also from the dismo package.
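As a rough illustration of the distance itself, the base R function mahalanobis (stats package) returns the squared Mahalanobis distance of each row to a given center and covariance. The presence data below are simulated and the variable names are hypothetical; how dismo turns the distance into a suitability score is not shown here.

```r
# Hypothetical environmental values at 50 presence points
set.seed(1)
pres <- cbind(temp = rnorm(50, 20, 2), prec = rnorm(50, 1000, 100))

# The distribution D is summarized by its centroid and covariance matrix
center <- colMeans(pres)
covmat <- cov(pres)

# Two candidate sites: one at the centroid, one far from it
sites <- rbind(near = center, far = center + c(10, 500))

# Squared Mahalanobis distance of each site to the presence distribution
d2 <- mahalanobis(sites, center = center, cov = covmat)
d2
```

Because the covariance matrix is taken into account, the distance reflects the spread and correlation of the variables rather than their raw units.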
Specified either by algorithm = "maxent" or algorithm = "maxnet",
corresponding to the implementations in the dismo
\insertCite{hijmans_dismo_2017}{modleR} and maxnet
\insertCite{phillips_maxnet_2017}{modleR} packages respectively. Maxent is a
machine learning method for modeling species distributions based on
incomplete data, allowing ENM with presence-only data
\insertCite{phillips_maximum_2006}{modleR}. If algorithm = "maxent", the model
is fit by the function maxent, and is evaluated and predicted using evaluate
and predict, also in the dismo package. If algorithm = "maxnet", the model is
fit by the function maxnet from the maxnet package, evaluated using evaluate
from the dismo package with argument type = "logistic" and predicted using
the predict function from the raster package.
Specified by algorithm = "rf", it uses the tuneRF function from the
randomForest package \insertCite{liaw_classification_2002}{modleR}. It
corresponds to machine learning regression based on decision trees. In this
package, tuneRF searches for the optimal number of variables available for
splitting at each tree node (i.e. mtry) and, with parameter doBest = TRUE,
the model is fit using that value. The Random Forest model is evaluated with
the evaluate function from dismo and predicted with the predict function
from the raster package.
Specified either by algorithm = "svme" or algorithm = "svmk", corresponding
to the implementations in the e1071
\insertCite{meyer_e1071_2017}{modleR} and kernlab
\insertCite{karatzoglou_kernlab_2004}{modleR} packages respectively. SVM are
supervised learning models that use learning algorithms for classification
and regression analysis. In the e1071 package SVM is implemented through the
function best.tune with method set to "svm", which uses an RBF kernel (radial
basis function kernel) for classification. In the kernlab package SVM is
implemented through the function ksvm, also with an RBF kernel (in this case
the default kernel "rbfdot"). We expect both implementations to differ
only in performance. Both svme and svmk are evaluated with the evaluate
function from dismo and predicted with the predict function from the raster
package.
Returns a data frame with key threshold values and evaluation statistics of each algorithm (FNR, FPR, TSSmax, AUC, pROC, FScore, Jaccard dissimilarity etc.) for the selected threshold.
Writes to disk a .tif model for each partition of each algorithm.
Writes to disk a .csv file with thresholds and evaluation statistics of each algorithm for a given threshold.
Writes to disk a .csv file with evaluation statistics for all threshold values.
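To make the threshold-dependent statistics concrete, the sketch below computes FNR, FPR and TSS by hand for a single threshold. The observed and predicted values are made up for illustration, and the variable names are hypothetical; this is not the code the package uses to build its output table.

```r
# Hypothetical observed presences (1) / absences (0) and predicted suitability
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05)
threshold <- 0.5

# Binary prediction and confusion matrix cells
bin <- as.integer(pred >= threshold)
tp <- sum(bin == 1 & obs == 1)
fn <- sum(bin == 0 & obs == 1)
fp <- sum(bin == 1 & obs == 0)
tn <- sum(bin == 0 & obs == 0)

sens <- tp / (tp + fn)   # true positive rate
spec <- tn / (tn + fp)   # true negative rate
fnr <- 1 - sens          # false negative rate
fpr <- 1 - spec          # false positive rate
tss <- sens + spec - 1   # true skill statistic
```

Repeating this over all candidate thresholds and keeping the largest TSS gives the kind of value reported as TSSmax.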
bioclim in dismo package
domain in dismo package
do_many
evaluate in dismo package
maxent in dismo package
maxnet in maxnet package
mahal in dismo package
predict in dismo package
predict in raster package
## Not run:
# run setup_sdmdata first from one species in example_occs data
sp <- names(example_occs)[1]
sp_coord <- example_occs[[1]]
sp_setup <- setup_sdmdata(species_name = sp,
                          occurrences = sp_coord,
                          predictors = example_vars,
                          clean_uni = TRUE)

# run bioclim for one species
sp_any <- do_any(species_name = sp,
                 predictors = example_vars,
                 algorithm = "bioclim")

# run do_many
sp_many <- do_many(species_name = sp,
                   predictors = example_vars,
                   bioclim = TRUE)

## End(Not run)