In andrefaa/ENM_TheMetaLand: Create and process Ecological Niche, including several pre- and post-processing methods

The logic behind ENMTML

We structured ENMTML as a single function with multiple arguments, which, once filled, require a single Ctrl+R to fit, project, evaluate models and present them to users in a clear and simple way.

The main function (ENMTML) has several arguments, which users need to specify according to their modeling needs.

As we know this is not a simple task, our paper indicates the papers that proposed those methods, together with a more detailed (but brief) explanation of each.

How to run?

ENMTML(pred_dir, 
       proj_dir = NULL, 
       result_dir = NULL,
       occ_file, 
       sp, 
       x, 
       y, 
       min_occ = 10,
       thin_occ = NULL, 
       eval_occ = NULL, 
       colin_var = NULL,
       imp_var = FALSE, 
       sp_accessible_area = NULL, 
       pseudoabs_method,
       pres_abs_ratio = 1, 
       part, 
       save_part = FALSE, 
       save_final = TRUE,
       algorithm, 
       thr, 
       msdm = NULL, 
       ensemble = NULL,
       extrapolation = FALSE, 
       cores = 1)

See possible input options below
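Before the function can be called, the inputs have to exist on disk. As a minimal sketch, assuming a hypothetical data frame whose columns are named species, x and y (any names work, as long as they are passed to the sp, x and y arguments), a tab-delimited occurrence file can be written with base R:

```r
# Hypothetical occurrence records; names and coordinates are made up for illustration.
# Real data need at least min_occ unique records per species.
occ <- data.frame(
  species = c("Sp_a", "Sp_a", "Sp_b"),
  x = c(-47.9, -48.2, -55.1),   # longitude
  y = c(-15.8, -16.1, -12.4)    # latitude
)

# occ_file must be a tab-delimited TXT file.
occ_path <- file.path(tempdir(), "occurrences.txt")
write.table(occ, occ_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to confirm the column layout.
occ_check <- read.delim(occ_path, stringsAsFactors = FALSE)
names(occ_check)
```

With this file, the call would use occ_file = occ_path, sp = "species", x = "x" and y = "y".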

Function Arguments

pred_dir: character. Directory path with predictors (file formats supported are ASC, BIL, TIFF or TXT).
proj_dir: character. Directory path containing folders with predictors for different regions or time periods used to project models (file formats supported are ASC, BIL, TIFF or TXT).
result_dir: character. Directory path to the folder in which model results will be recorded:
NULL: Results will be recorded in a default Result folder, at the same level as the pred_dir folder. Usage
result_dir=NULL.
Simple name: A folder with the specified name will be created at the same level as the pred_dir folder. Usage
result_dir="MyFolderName".
Complete path: A folder will be created at the specified path. Usage
result_dir="C:/Users/mypc/Documents/MyFolderName".
occ_file: character. Path to the tab-delimited TXT file, which must contain at least three columns with information about species names, and the longitude and latitude of species occurrences.
sp: character. Name of the column with information about species names.
x: character. Name of the column with information about longitude.
y: character. Name of the column with information about latitude.
min_occ: integer. Minimum number of unique occurrences (species with fewer than this number will be excluded).
thin_occ: character. Perform spatial filtering (thinning, based on the spThin package) on the presences. For this argument it is necessary to provide a vector whose elements are named 'method', or 'method' and 'distance' (more information below). Three thinning methods are available (default NULL):
MORAN: Distance defined by Moran Variogram. Usage
thin_occ=c(method='MORAN').
CELLSIZE: Distance defined by 2x cellsize (Haversine Transformation). Usage
thin_occ=c(method='CELLSIZE').
USER-DEFINED: User-defined distance. For this option it is necessary to provide a vector with two values. Usage
thin_occ=c(method='USER-DEFINED', distance='300'). The second value refers to the distance in km that will be used for thinning. So distance='300' means that all records within a radius of 300 km will be deleted.
eval_occ: character. Path to a tab-delimited TXT file with species names, latitude and longitude. These three columns must have the same column names as the database used in the occ_file argument. This external occurrence database will be used for external model validation (i.e., it will not be used for model fitting) (default NULL).
colin_var: character. Method to reduce variable collinearity:
PCA: Perform a Principal Component Analysis on predictors and use Principal Components as environmental variables. Usage
colin_var=c(method='PCA').
VIF: Variance Inflation Factor. Usage
colin_var=c(method='VIF').
PEARSON: Select variables by Pearson correlation, a threshold of maximum correlation must be specified by user. Usage
colin_var=c(method='PEARSON', threshold='0.7').
imp_var: logical. Calculate variable importance and produce data for response curves for the selected algorithms? (default FALSE)
sp_accessible_area: character. Restrict, for each species, the accessible area, i.e., the area used for model fitting. It is necessary to provide a vector for this argument. Three methods are implemented:
BUFFER (type 1): The area used for model fitting is delimited by a buffer with a width equal to the maximum distance among pairs of occurrences for each species. Usage
sp_accessible_area=c(method='BUFFER', type='1').
BUFFER (type 2): The area used for model fitting is delimited by a buffer with a width defined by the user in km. Note that this buffer width will be used for all species. Usage
sp_accessible_area=c(method='BUFFER', type='2', width='300').
MASK: The area used for model fitting is delimited by the polygon in which a species' occurrences fall. For instance, it is possible to delimit the calibration area based on an ecoregion shapefile. For this option it is necessary to provide the path to the file that will be used as mask. The following file formats can be loaded: '.bil', '.asc', '.tif', '.shp', and '.txt'. Usage
sp_accessible_area=c(method='MASK', filepath='C:/Users/mycomputer/ecoregion/olson.shp').
pseudoabs_method: character. Pseudo-absence allocation method. It is necessary to provide a vector for this argument. Only one method can be chosen. The following methods are implemented:
RND: Random allocation of pseudo-absences throughout the area used for model fitting. Usage
pseudoabs_method=c(method='RND').
ENV_CONST: Pseudo-absences are environmentally constrained to a region with lower suitability values predicted by a Bioclim model. Usage
pseudoabs_method=c(method='ENV_CONST').
GEO_CONST: Pseudo-absences are allocated far from occurrences based on a geographical buffer. For this method it is necessary to provide a second value which expresses the buffer width in km. Usage
pseudoabs_method=c(method='GEO_CONST', width='50').
GEO_ENV_CONST: Pseudo-absences are constrained environmentally (based on a Bioclim model) but distributed geographically far from occurrences based on a geographical buffer. For this method it is necessary to provide a second value which expresses the buffer width in km. Usage
pseudoabs_method=c(method='GEO_ENV_CONST', width='50').
GEO_ENV_KM_CONST: Pseudo-absences are constrained by a three-level procedure; it is similar to GEO_ENV_CONST, with an additional step which distributes the pseudo-absences in the environmental space using k-means cluster analysis. For this method it is necessary to provide a second value which expresses the buffer width in km. Usage
pseudoabs_method=c(method='GEO_ENV_KM_CONST', width='50').
pres_abs_ratio: numeric. Presence-Absence ratio (values between 0 and 1).
part: character. Partition method for model validation. Only one method can be chosen. It is necessary to provide a vector for this argument. The following methods are implemented:
BOOT: Random bootstrap partition. Usage
part=c(method='BOOT', replicates='2', proportion='0.7'). replicates refers to the number of replicates; it assumes a value >=1. proportion refers to the proportion of occurrences used for model fitting; it assumes a value >0 and <=1. In this example proportion='0.7' means that 70% of the data will be used for model training and 30% for model testing.
KFOLD: Random partition in k-fold cross-validation. Usage
part=c(method= 'KFOLD', folds='5'). folds refers to the number of folds for data partitioning; it assumes a value >=1.
BANDS: Geographic partition structured as bands arranged in a latitudinal way (type 1) or longitudinal way (type 2). Usage
part=c(method= 'BANDS', type='1'). type refers to the disposition of the bands.
BLOCK: Geographic partition structured as a checkerboard (a.k.a. block cross-validation). Usage
part=c(method= 'BLOCK').
save_part: logical. If TRUE, the function will save .tif files of partial models, i.e. the models created by each occurrence partition (default FALSE).
save_final: logical. If TRUE, the function will save .tif files of the final model, i.e. the model fitted with all occurrence data (default TRUE).
algorithm: character. Algorithm to construct ecological niche models (it is possible to use more than one method):
BIO: Bioclim
MAH: Mahalanobis
DOM: Domain
ENF: Ecological-Niche Factor Analysis
MXS: Maxent simple (only linear and quadratic features, based on MaxNet package)
MXD: Maxent default features (all features, based on MaxNet package)
SVM: Support Vector Machine
GLM: Generalized Linear Model
GAM: Generalized Additive Model
BRT: Boosted Regression Tree
RDF: Random Forest
MLK: Maximum Likelihood
GAU: Gaussian Process
Usage algorithm=c('BIO', 'SVM', 'GLM', 'GAM', 'GAU').
thr: character. Threshold used for presence-absence predictions. It is possible to use more than one threshold type. It is necessary to provide a vector for this argument:
LPT: The highest threshold at which there is no omission. Usage
thr=c(type='LPT').
MAX_TSS: Threshold at which the sum of the sensitivity and specificity is the highest. Usage
thr=c(type='MAX_TSS').
MAX_KAPPA: The threshold at which kappa is the highest ("max kappa"). Usage
thr=c(type='MAX_KAPPA').
SENSITIVITY: A threshold value specified by the user. Usage
thr=c(type='SENSITIVITY', sens='0.6'). 'sens' is the suitability value that will be used to binarize the models. Note that this method applies the same 'sens' value to all algorithms and species.
JACCARD: The threshold at which Jaccard is the highest. Usage
thr=c(type='JACCARD').
SORENSEN: The threshold at which Sorensen is highest. Usage
thr=c(type='SORENSEN').

When more than one threshold type is used, it is necessary to concatenate the names of the threshold types, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'JACCARD')). When the SENSITIVITY threshold is used in combination with others, it is necessary to specify the desired sensitivity value, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'SENSITIVITY'), sens='0.8').
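The usage strings above are ordinary named character vectors built with c(). The sketch below only shows what base R constructs from them; how ENMTML reads these vectors internally is not covered here, and the values are simply the ones used in the usage examples.

```r
# Each of these arguments is a named character vector.
thin_occ         <- c(method = "USER-DEFINED", distance = "300")
colin_var        <- c(method = "PEARSON", threshold = "0.7")
pseudoabs_method <- c(method = "GEO_CONST", width = "50")
part             <- c(method = "BOOT", replicates = "2", proportion = "0.7")

# Several thresholds: nesting c() makes R number the names (type1, type2, ...).
thr <- c(type = c("LPT", "MAX_TSS", "SENSITIVITY"), sens = "0.8")

names(thr)
# [1] "type1" "type2" "type3" "sens"

# Values are stored as text, so numbers must be converted before use.
as.numeric(part[["proportion"]])
# [1] 0.7
```

Because every element is stored as text, quoting numeric values ('0.7', '300') and leaving them unquoted produce the same vector once c() mixes them with the method name.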

msdm: character. Include spatial restrictions in model projection. These methods restrict ecological niche models so that they predict less potential area and come closer to species distribution models. They are classified into 'a Priori' and 'a Posteriori' methods. The former encompasses methods that include geographical layers as predictors during model fitting, whereas a Posteriori methods constrain models based on occurrence and suitability patterns. Only one method can be specified for this argument; when the MCP-B method is used, msdm is filled in a different way, see below (default NULL):

a Priori methods (layer created and added as a predictor at the moment of model fitting):
XY: Create two layers, latitude and longitude. Usage
msdm=c(method='XY').
MIN: Create a layer with information of the distance from each cell to the closest occurrence. Usage
msdm=c(method='MIN').
CML: Create a layer with information of the summed distance from each cell to all occurrences. Usage
msdm=c(method='CML').
KER: Create a layer with a Gaussian-Kernel on the occurrence data. Usage
msdm=c(method='KER').

a Posteriori methods:
OBR: Occurrence based restriction, uses the distance between points to exclude far suitable patches (Mendes et al., in prep). Usage
msdm=c(method='OBR').
LR: Lower Quantile, selects the nearest 25% patches (Mendes et al., in prep). Usage
msdm=c(method='LR').
PRES: Selects only the patches with confirmed occurrence data (Mendes et al., in prep). Usage
msdm=c(method='PRES').
MCP: Excludes suitable cells outside the Minimum Convex Polygon (MCP) built from the occurrence data. Usage
msdm=c(method='MCP').
MCP-B: Creates a buffer (with a width defined by the user in km) around the MCP. Usage
msdm=c(method='MCP-B', width=100). In this case width=100 means that a buffer 100 km wide will be created around the MCP.
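As an illustration of what the MCP restriction means geometrically, the sketch below builds a minimum convex polygon with base R's chull() from hypothetical coordinates and tests whether a cell centre falls inside it. This is not ENMTML's implementation: inside_convex is a hypothetical helper, coordinates are treated as planar, and the MCP-B buffer is not covered.

```r
# Hypothetical occurrence coordinates (x = longitude, y = latitude).
occ <- cbind(x = c(0, 4, 4, 0, 2), y = c(0, 0, 3, 3, 1))

# chull() returns the indices of the points on the convex hull, in order.
hull <- occ[chull(occ), , drop = FALSE]

# Hypothetical helper: a point lies inside a convex polygon when it is on the
# same side of every edge, i.e. all edge cross products share one sign.
inside_convex <- function(px, py, poly) {
  n <- nrow(poly)
  nxt <- c(2:n, 1)
  cr <- (poly[nxt, 1] - poly[, 1]) * (py - poly[, 2]) -
        (poly[nxt, 2] - poly[, 2]) * (px - poly[, 1])
  all(cr >= 0) || all(cr <= 0)
}

inside_convex(2, 2, hull)  # TRUE: this cell would be kept
inside_convex(5, 1, hull)  # FALSE: this cell would be excluded

# The corresponding argument is just a named vector; 100 is stored as "100".
msdm <- c(method = "MCP-B", width = 100)
```

Under MCP, suitable cells for which such a test fails are the ones removed from the prediction; MCP-B relaxes this by the buffer width.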

ensemble: character. Method used to ensemble different algorithms. It is possible to use more than one method. A vector must be provided for this argument. For the SUP, W_MEAN or PCA_SUP methods it is necessary to provide an evaluation metric in the ensemble argument (i.e., AUC, Kappa, TSS, Jaccard, Sorensen or Fpb), see below (default NULL):
MEAN: Simple average of the different models. Usage
ensemble=c(method='MEAN').
W_MEAN: Weighted average of models based on their performance. An evaluation metric must be provided. Usage
ensemble=c(method='W_MEAN', metric='TSS').
SUP: Average of the best models (e.g., TSS over the average). An evaluation metric must be provided. Usage
ensemble=c(method='SUP', metric='TSS').
PCA: Performs a Principal Component Analysis (PCA) and returns the first axis. Usage
ensemble=c(method='PCA').
PCA_SUP: PCA of the best models (e.g., TSS over the average). An evaluation metric must be provided. Usage
ensemble=c(method='PCA_SUP', metric='Fpb').
PCA_THR: PCA performed only with those cells with suitability values above the selected threshold. Usage
ensemble=c(method='PCA_THR').

When more than one ensemble method is used, it is necessary to concatenate the names of the ensemble methods within the argument, e.g., ensemble=c(method=c('MEAN', 'PCA')) or ensemble=c(method=c('MEAN', 'W_MEAN', 'PCA_SUP'), metric='Fpb').
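To make the difference between MEAN, W_MEAN and SUP concrete, here is a small base R sketch on hypothetical suitability values and hypothetical TSS scores. The weighting used for W_MEAN (each score divided by the sum of scores) is an assumption made for illustration; the package may weight models differently.

```r
# Hypothetical suitability predictions for four cells from three algorithms.
pred <- cbind(BIO = c(0.2, 0.6, 0.8, 0.1),
              SVM = c(0.4, 0.7, 0.9, 0.2),
              GLM = c(0.3, 0.5, 0.7, 0.3))

# Hypothetical evaluation scores (TSS) for each algorithm.
tss <- c(BIO = 0.5, SVM = 0.8, GLM = 0.6)

# MEAN: simple average across algorithms.
ens_mean <- rowMeans(pred)

# W_MEAN: average weighted by performance (assumed weights: tss / sum(tss)).
ens_w_mean <- as.vector(pred %*% (tss / sum(tss)))

# SUP: average of the models whose score is above the mean score.
best <- tss > mean(tss)
ens_sup <- rowMeans(pred[, best, drop = FALSE])

# Matching argument, as in the usage examples.
ensemble <- c(method = c("MEAN", "W_MEAN", "SUP"), metric = "TSS")
```

With these numbers only SVM scores above the mean TSS, so the SUP ensemble reduces to the SVM prediction.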

extrapolation: logical. If TRUE the function will calculate extrapolation based on Mobility-Oriented Parity analysis (MOP) for current conditions. If the argument proj_dir is used, the extrapolation layers for other regions or time periods will also be calculated (default FALSE).
cores: numeric. Define the number of CPU cores to run modeling procedures in parallel (default 1).
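Putting the arguments together, a complete call might look like the sketch below. The paths and column names are hypothetical placeholders, the argument values are taken from the usage examples above, and the call itself only runs if the ENMTML package is installed and the predictor folder exists.

```r
# All paths and column names below are hypothetical placeholders.
enm_args <- list(
  pred_dir         = "C:/Users/mypc/Documents/predictors",
  result_dir       = "MyFolderName",
  occ_file         = "C:/Users/mypc/Documents/occurrences.txt",
  sp               = "species",
  x                = "x",
  y                = "y",
  min_occ          = 10,
  thin_occ         = c(method = "CELLSIZE"),
  colin_var        = c(method = "PCA"),
  pseudoabs_method = c(method = "RND"),
  pres_abs_ratio   = 1,
  part             = c(method = "KFOLD", folds = "5"),
  algorithm        = c("BIO", "SVM", "GLM"),
  thr              = c(type = "MAX_TSS"),
  ensemble         = c(method = "MEAN"),
  cores            = 1
)

# Run only when the package and the predictor folder are actually available.
if (requireNamespace("ENMTML", quietly = TRUE) && dir.exists(enm_args$pred_dir)) {
  do.call(ENMTML::ENMTML, enm_args)
}
```

Keeping the arguments in a list makes it easy to store the settings next to the results and to rerun the same configuration with a single change.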

What are my results?

Within the result_dir folder you will find several sub-folders: Algorithm, Ensemble (decision-based), Projection (decision-based), Extrapolation (decision-based), BLOCK (decision-based), Extent Masks (decision-based).

There are also some .txt files (some of them will only be created under certain modeling settings):
Evaluation_Table.txt: Contains the results of model evaluation, with several metrics
InfoModeling.txt: Information about the chosen modeling parameters
Number_Unique_Occurrences.txt: Number of unique occurrences for each species
Occurrences_Cleaned.txt: Dataset produced after selecting a single occurrence per grid cell (unique occurrences)
Occurrences_Filtered.txt: Dataset produced after occurrences were corrected for spatial sampling bias (thinned occurrences)
Thresholds_Algorithm.txt: Information about the thresholds used to create the presence-absence maps for each algorithm (presence-absence maps are created from the threshold of complete models)
Thresholds_Ensemble.txt: Information about the thresholds used to create the presence-absence maps for ensembled models
Moran_&_Mess: Contains information about autocorrelation and environmental similarity between the datasets used to fit and evaluate the model
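The result tables can be loaded back into R for further inspection. The sketch below assumes the .txt files are tab-delimited with a header row, which is not stated in this document; read_result_table is a hypothetical helper, and since the column names are not listed here, they should be checked with names() after reading.

```r
# Hypothetical helper: read one of the result tables if it exists.
# Assumes a tab-delimited file with a header row.
read_result_table <- function(result_dir, file = "Evaluation_Table.txt") {
  path <- file.path(result_dir, file)
  if (!file.exists(path)) {
    return(NULL)  # some tables are only created under certain settings
  }
  read.delim(path, stringsAsFactors = FALSE)
}

# Example with a hypothetical results folder; returns NULL if the file is absent.
eval_tab <- read_result_table("C:/Users/mypc/Documents/MyFolderName")
if (!is.null(eval_tab)) {
  print(names(eval_tab))
}
```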

CITATION:

Andrade, A.F.A., Velazco, S.J.E., De Marco Jr, P., 2020. ENMTML: An R package for a straightforward construction of complex ecological niche models. Environmental Modelling & Software 125, 104615. https://doi.org/10.1016/j.envsoft.2019.104615

Test the package and give us feedback here or send an e-mail to andrefaandrade@gmail.com or sjevelazco@gmail.com!

andrefaa/ENM_TheMetaLand documentation built on Nov. 15, 2023, 10:19 a.m.
