ENMTML: Create and process Ecological Niche Models

View source: R/ENMTML.R

ENMTML R Documentation

Create and process Ecological Niche Models

Description

Create and process Ecological Niche Models

Usage

ENMTML(
  pred_dir,
  proj_dir = NULL,
  result_dir = NULL,
  occ_file,
  sp,
  x,
  y,
  min_occ = 10,
  thin_occ = NULL,
  eval_occ = NULL,
  colin_var = NULL,
  imp_var = FALSE,
  sp_accessible_area = NULL,
  pseudoabs_method,
  pres_abs_ratio = 1,
  part,
  save_part = FALSE,
  save_final = TRUE,
  algorithm,
  thr,
  msdm = NULL,
  ensemble = NULL,
  extrapolation = FALSE,
  cores = 1
)

Arguments

pred_dir

character. Directory path with predictors (supported file formats are ASC, BIL, TIFF, or TXT).

proj_dir

character. Directory path containing folders with predictors for different regions or time periods used to project the models (supported file formats are ASC, BIL, TIFF, or TXT).

result_dir

character. Directory path with the folder in which model results will be recorded.

  • NULL: Results will be recorded in a default Result folder, at the same level as the pred_dir folder.

  • Simple name: A folder with the specified name will be created at the same level as the pred_dir folder (e.g. result_dir="MyFolderName").

  • Complete path: A folder will be created at the specified path (e.g. result_dir="C:/Users/mypc/Documents/MyFolderName").

occ_file

character. Path to a tab-delimited TXT file containing at least three columns with the species names and the latitude and longitude of the species occurrences.

sp

character. Name of the column with information about species names.

x

character. Name of the column with information about longitude.

y

character. Name of the column with information about latitude.

min_occ

integer. Minimum number of unique occurrences (species with fewer than this number will be excluded).

thin_occ

character. Perform spatial filtering (thinning, based on the spThin package) on the presences. For this argument it is necessary to provide a vector whose elements are named 'method', or 'method' and 'distance' (more information below). Three thinning methods are available (default NULL); a short example follows the list:

  • MORAN: Distance defined by Moran Variogram, usage thin_occ=c(method='MORAN').

  • CELLSIZE: Distance defined by 2x cellsize (Haversine Transformation), usage thin_occ=c(method='CELLSIZE').

  • USER-DEFINED: User defined distance. For this option it is necessary to provide a vector with two values. Usage thin_occ=c(method='USER-DEFINED', distance='300'). The second numeric value refers to the distance in km that will be used for thinning. So distance=300 means that all records within a radius of 300 km will be deleted.
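
As an illustration (the object names and the 250 km distance below are only examples, not recommendations), the thin_occ vector can be built beforehand and passed to ENMTML:

  thin_moran <- c(method = 'MORAN')
  thin_user  <- c(method = 'USER-DEFINED', distance = '250')
  # e.g. ENMTML(..., thin_occ = thin_user, ...)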

eval_occ

character. Path to a tab-delimited TXT file with species names, latitude and longitude; these three columns must have the same column names as the database used in the occ_file argument. This external occurrence database will be used for external model validation (i.e., it will not be used for model fitting). (default NULL)

colin_var

character. Method used to reduce collinearity among the predictor variables (see the sketch after this list):

  • PCA: Perform a Principal Component Analysis on predictors and use Principal Components as environmental variables, usage colin_var=c(method='PCA').

  • VIF: Variance Inflation Factor; usage colin_var=c(method='VIF').

  • PEARSON: Select variables by Pearson correlation; a maximum correlation threshold must be specified by the user, usage colin_var=c(method='PEARSON', threshold='0.7').
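
A quick sketch (the 0.7 threshold below is only an illustrative value):

  colin_pca     <- c(method = 'PCA')
  colin_pearson <- c(method = 'PEARSON', threshold = '0.7')
  # e.g. ENMTML(..., colin_var = colin_pearson, ...)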

imp_var

logical. Compute variable importance and response-curve data for the selected algorithms? (default FALSE)

sp_accessible_area

character. Restrict the accessible area for each species, i.e., the area used for model fitting. A vector must be provided for this argument. Three methods are implemented (see the sketch after this list):

  • BUFFER (type 1): The area used for model fitting is delimited by a buffer whose width equals the maximum distance among pairs of occurrences of each species. Usage sp_accessible_area=c(method='BUFFER', type='1').

  • BUFFER (type 2): The area used for model fitting is delimited by a buffer with a width (in km) defined by the user. Note that this buffer width will be used for all species. Usage sp_accessible_area=c(method='BUFFER', type='2', width='300').

  • MASK: Delimits the area used for model fitting with the polygons within which a species' occurrences fall. For instance, the calibration area can be delimited by an ecoregion shapefile. For this option it is necessary to provide the path to the file that will be used as a mask. The following file formats can be loaded: '.bil', '.asc', '.tif', '.shp', and '.txt'. Usage sp_accessible_area=c(method='MASK', filepath='C:/Users/mycomputer/ecoregion/olson.shp').

  • USER-DEFINED: Users can provide their own accessible-area masks. In this case the program requires a folder with species-specific masks, one for each species, and each mask name must match the species name in the occurrence file. For this option it is necessary to provide the path to the folder containing the accessible areas. The following file formats can be loaded: '.bil', '.asc', '.tif', '.shp', and '.txt'. Usage sp_accessible_area=c(method='USER-DEFINED', filepath='C:/Users/mycomputer/accessibleareafolder').
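
A minimal sketch (the 500 km width and the file path below are illustrative placeholders):

  area_buffer <- c(method = 'BUFFER', type = '2', width = '500')
  area_mask   <- c(method = 'MASK', filepath = 'C:/Users/mycomputer/ecoregion/olson.shp')
  # e.g. ENMTML(..., sp_accessible_area = area_mask, ...)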

pseudoabs_method

character. Pseudo-absence allocation method. A vector must be provided for this argument, and only one method can be chosen. The following methods are implemented (see the sketch after this list):

  • RND: Random allocation of pseudo-absences throughout the area used for model fitting. Usage pseudoabs_method=c(method='RND').

  • ENV_CONST: Pseudo-absences are environmentally constrained to a region with lower suitability values predicted by a Bioclim model. Usage pseudoabs_method=c(method='ENV_CONST').

  • GEO_CONST: Pseudo-absences are allocated far from occurrences based on a geographical buffer. For this method it is necessary to provide a second value, which expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_CONST', width='50').

  • GEO_ENV_CONST: Pseudo-absences are constrained environmentally (based on a Bioclim model) but distributed geographically far from occurrences based on a geographical buffer. For this method it is necessary to provide a second value, which expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_ENV_CONST', width='50').

  • GEO_ENV_KM_CONST: Pseudo-absences are constrained by a three-step procedure; it is similar to GEO_ENV_CONST with an additional step that distributes the pseudo-absences in the environmental space using k-means cluster analysis. For this method it is necessary to provide a second value, which expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_ENV_KM_CONST', width='50').
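
For example (the 50 km buffer width is only illustrative), the vector can be stored and reused:

  pa_random   <- c(method = 'RND')
  pa_geoconst <- c(method = 'GEO_CONST', width = '50')
  # e.g. ENMTML(..., pseudoabs_method = pa_geoconst, pres_abs_ratio = 1, ...)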

pres_abs_ratio

numeric. Presence-Absence ratio (values between 0 and 1)

part

character. Partition method for model validation. Only one method can be chosen, and a vector must be provided for this argument. The following methods are implemented (see the sketch after this list):

  • BOOT: Random bootstrap partition. Usage part=c(method='BOOT', replicates='2', proportion='0.7'). 'replicates' refers to the number of replicates; it assumes a value >=1. 'proportion' refers to the proportion of occurrences used for model fitting; it assumes a value >0 and <=1. In this example proportion='0.7' means that 70% of the data will be used for model training and 30% for model testing.

  • KFOLD: Random partition in k-fold cross-validation. Usage part=c(method='KFOLD', folds='5'). 'folds' refers to the number of folds for data partitioning; it assumes a value >=1.

  • BANDS: Geographic partition structured as bands arranged latitudinally (type 1) or longitudinally (type 2). Usage part=c(method='BANDS', type='1'). 'type' refers to the orientation of the bands.

  • BLOCK: Geographic partition structured as a checkerboard (a.k.a. block cross-validation). Usage part=c(method= 'BLOCK').
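
A short sketch (the replicate and fold numbers below are illustrative):

  part_boot  <- c(method = 'BOOT', replicates = '5', proportion = '0.7')
  part_kfold <- c(method = 'KFOLD', folds = '5')
  # e.g. ENMTML(..., part = part_boot, ...)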

save_part

logical. If TRUE, the function will save .tif files of the partial models, i.e., the models created for each occurrence partition (default FALSE).

save_final

logical. If TRUE, the function will save .tif files of the final models, i.e., those fitted with all occurrence data (default TRUE).

algorithm

character. Algorithm used to construct the ecological niche models (more than one algorithm can be used):

  • BIO: Bioclim

  • MAH: Mahalanobis

  • DOM: Domain

  • ENF: Ecological Niche Factor Analysis

  • MXS: Maxent Simple (only linear and quadratic features, based on MaxNet package)

  • MXD: Maxent Default (all features, based on MaxNet package)

  • SVM: Support Vector Machine

  • SVM-B: Support Vector Machine (using Background instead of Pseudo-Absences)

  • GLM: Generalized Linear Model

  • GAM: Generalized Additive Model

  • BRT: Boosted Regression Tree

  • RDF: Random Forest

  • MLK: Maximum Likelihood

  • GAU: Gaussian Process

thr

character. Threshold used for presence-absence predictions. More than one threshold type can be used, and a vector must be provided for this argument:

  • LPT: The highest threshold at which there is no omission. Usage thr=c(type='LPT').

  • MAX_TSS: Threshold at which the sum of the sensitivity and specificity is the highest. Usage thr=c(type='MAX_TSS').

  • MAX_KAPPA: The threshold at which kappa is the highest ("max kappa"). Usage thr=c(type='MAX_KAPPA').

  • SENSITIVITY: A threshold value specified by the user. Usage thr=c(type='SENSITIVITY', sens='0.6'). 'sens' refers to the suitability value at which models will be binarized. Note that this method uses the same 'sens' value for all algorithms and species.

  • JACCARD: The threshold at which Jaccard is the highest. Usage thr=c(type='JACCARD').

  • SORENSEN: The threshold at which Sorensen is highest. Usage thr=c(type='SORENSEN').

If more than one threshold type is used, it is necessary to concatenate the names of the threshold types, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'JACCARD')). When the SENSITIVITY threshold is used in combination with others, the desired sensitivity value must be specified, e.g. thr=c(type=c('LPT', 'MAX_TSS', 'SENSITIVITY'), sens='0.8').
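
The same choices written as reusable objects (the sens value of 0.8 is illustrative):

  thr_single   <- c(type = 'MAX_TSS')
  thr_multiple <- c(type = c('LPT', 'MAX_TSS', 'SENSITIVITY'), sens = '0.8')
  # e.g. ENMTML(..., thr = thr_multiple, ...)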

msdm

character. Include spatial restrictions in the model projection. These methods restrict ecological niche models so that they predict less potential area, bringing them closer to species distribution models. They are classified as 'a priori' and 'a posteriori' methods. The former encompasses methods that add geographical layers as predictors during model fitting, whereas the a posteriori methods constrain models based on occurrence and suitability patterns. This argument takes a single method; when the MCP-B method is used, msdm is filled in a different way (see below and the sketch after the method lists):

a priori methods (the created layer is added as a predictor at the moment of model fitting):

  • XY: Create two layers, a latitude layer and a longitude layer. Usage msdm=c(method='XY').

  • MIN: Create a layer with information of the distance from each cell to the closest occurrence. Usage msdm=c(method='MIN').

  • CML: Create a layer with information of the summed distance from each cell to all occurrences. Usage msdm=c(method='CML').

  • KER: Create a layer with a Gaussian-Kernel on the occurrence data. Usage msdm=c(method='KER').

a posteriori methods:

  • OBR: Occurrence-based restriction; uses the distance between points to exclude distant suitable patches (Mendes et al., in prep). Usage msdm=c(method='OBR').

  • LR: Lower quantile; selects the nearest 25% of patches (Mendes et al., in prep). Usage msdm=c(method='LR').

  • PRES: Select only the patches with confirmed occurrence data (Mendes et al., in prep). Usage msdm=c(method='PRES').

  • MCP: Excludes suitable cells outside the Minimum Convex Polygon (MCP) built based on occurrences data. Usage msdm=c(method='MCP').

  • MCP-B: Creates a buffer (with a width in km defined by the user) around the MCP. Usage msdm=c(method='MCP-B', width=100). In this case width=100 means that a buffer 100 km wide will be created around the MCP.
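
A sketch contrasting the two families of methods (the 100 km width is illustrative):

  msdm_apriori     <- c(method = 'KER')                 # a priori: Gaussian-kernel layer
  msdm_aposteriori <- c(method = 'MCP-B', width = 100)  # a posteriori: buffered MCP
  # e.g. ENMTML(..., msdm = msdm_aposteriori, ...)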

ensemble

character. Method used to ensemble the different algorithms. More than one method can be used, and a vector must be provided for this argument. For the SUP, W_MEAN and PCA_SUP methods it is necessary to provide an evaluation metric in the ensemble argument (i.e., AUC, Kappa, TSS, Jaccard, Sorensen or Fpb); see below (default NULL):

  • MEAN: Simple average of the different models. Usage ensemble=c(method='MEAN').

  • W_MEAN: Weighted average of models based on their performance. An evaluation metric must be provided. Usage ensemble=c(method='W_MEAN', metric='TSS').

  • SUP: Average of the best models (e.g., those whose TSS is above the average). An evaluation metric must be provided. Usage ensemble=c(method='SUP', metric='TSS').

  • PCA: Performs a Principal Component Analysis (PCA) and returns the first axis. Usage ensemble=c(method='PCA').

  • PCA_SUP: PCA of the best models (e.g., those whose evaluation metric is above the average). An evaluation metric must be provided. Usage ensemble=c(method='PCA_SUP', metric='Fpb').

  • PCA_THR: PCA performed only with those cells with suitability values above the selected threshold. Usage ensemble=c(method='PCA_THR').

If more than one ensemble method is used, it is necessary to concatenate the names of the ensemble methods within the argument, e.g., ensemble=c(method=c('MEAN', 'PCA')) or ensemble=c(method=c('MEAN', 'W_MEAN', 'PCA_SUP'), metric='Fpb').
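
Written as reusable objects (the choice of TSS as the metric is illustrative):

  ens_mean     <- c(method = 'MEAN')
  ens_combined <- c(method = c('W_MEAN', 'PCA'), metric = 'TSS')
  # e.g. ENMTML(..., ensemble = ens_combined, ...)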

extrapolation

logical. If TRUE, the function will calculate extrapolation based on Mobility-Oriented Parity (MOP) analysis for current conditions. If the proj_dir argument is used, extrapolation layers will also be calculated for the other regions or time periods.

cores

numeric. Define the number of CPU cores to run modeling procedures in parallel (default 1).

Examples

require(ENMTML)
require(raster)

##%######################################################%##
#                                                          #
####           Directories and data creation            ####
#                                                          #
##%######################################################%##
# The ENMTML package comes with some bioclimatic variables that are
# used to test the ENMTML function.
# To simulate the files and folders needed by ENMTML,
# several folders with some data will be created.

# First, a folder will be created within the working directory
getwd() # working directory of the R session
d_ex <- file.path(getwd(), 'ENMTML_example')
d_ex
dir.create(d_ex)

# Some ENMTML data sets will be saved to the ENMTML_example folder
# Virtual species occurrences
data("occ")
d_occ <- file.path(d_ex, 'occ.txt')
utils::write.table(occ, d_occ, sep = '\t', row.names = FALSE)
# Five bioclimatic variables for current conditions
data("env")
d_env <- file.path(d_ex, 'current_env_var')
dir.create(d_env)
raster::writeRaster(env, file.path(d_env, names(env)), bylayer=TRUE, format='GTiff')
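# Optional sanity check (assumes the GeoTIFF layers were written above):
# read the written files back into a stack and inspect them
chk_env <- raster::stack(list.files(d_env, pattern = '\\.tif$', full.names = TRUE))
chk_env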
# Five bioclimatic variables for future conditions
# (for more details see predictors_future help)
data("env_fut")
d_fut <- file.path(d_ex, 'future_env_var')
dir.create(d_fut)
d0 <- file.path(d_fut, names(env_fut))
sapply(d0, dir.create)

raster::writeRaster(env_fut$`2080_4.5`, file.path(d0[1],
            names(env_fut$`2080_4.5`)), bylayer=TRUE, format='GTiff')
raster::writeRaster(env_fut$`2080_8.5`, file.path(d0[2],
            names(env_fut$`2080_8.5`)), bylayer=TRUE, format='GTiff')

# Polygon of terrestrial ecoregions
data("ecoregions")
d_eco <- file.path(d_ex, 'ecoregions')
dir.create(d_eco)
d_eco <- file.path(d_eco, paste0('eco','.shp'))
shapefile(ecoregions, d_eco)

# shell.exec(d_ex) # open the directory and folders created
rm(list = c('d0', 'd_ex', 'ecoregions', 'env', 'env_fut', 'occ'))

# Now we have the minimum data needed to create models with the ENMTML package:
# a directory with environmental rasters and a .txt file with occurrences


##%######################################################%##
#                                                          #
####           Constructing ENMs with ENMTML            ####
#                                                          #
##%######################################################%##
args(ENMTML)

# ENMTML provides a variety of tools to build different models
# depending on the modeling objectives.
# Here a single modeling procedure is shown.
# For more examples and model exploration
# see <https://github.com/andrefaa/ENMTML>

# Models will be fitted for five virtual species under current conditions
# (set proj_dir = d_fut to also project them onto the future layers).
# Please read the ENMTML arguments.

# The following objects contain the file and directory paths that will be used
d_occ # file path with species occurrences
d_env # directory path with current environmental conditions (raster in tiff format)
d_fut # directory path with folders with future environmental conditions (raster in tiff format)
d_eco # file path with shapefile used to constrain models


ENMTML(
 pred_dir = d_env,
 proj_dir = NULL,
 result_dir = NULL,
 occ_file = d_occ,
 sp = 'species',
 x = 'x',
 y = 'y',
 min_occ = 10,
 thin_occ = NULL,
 eval_occ = NULL,
 colin_var = c(method='PCA'),
 imp_var = FALSE,
 sp_accessible_area = c(method='BUFFER', type='2', width='500'),
 pseudoabs_method = c(method = 'RND'),
 pres_abs_ratio = 1,
 part=c(method= 'KFOLD', folds='2'),
 save_part = FALSE,
 save_final = TRUE,
 algorithm = c('SVM', 'RDF', 'MXD'),
 thr = c(type='MAX_TSS'),
 msdm = NULL,
 ensemble = c(method='PCA'),
 extrapolation = FALSE,
 cores = 1
)

# The ENMTML function will create a folder named Result at the same
# level as the directory specified in the pred_dir argument

d_env # Directory used to define environmental variables
d_rslt <- file.path(dirname(d_env), 'Result')
d_rslt
# shell.exec(d_rslt) # for Windows users
# List of txt files and subdirectories
list.files(d_rslt)
list.dirs(d_rslt)
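
# A further sketch (not run here): assuming the objects created above still
# exist, the same models could also be projected onto the future layers in
# d_fut and constrained with a minimum convex polygon; the argument values
# below are illustrative, not recommendations.
# ENMTML(
#  pred_dir = d_env,
#  proj_dir = d_fut,
#  occ_file = d_occ,
#  sp = 'species', x = 'x', y = 'y',
#  min_occ = 10,
#  colin_var = c(method='PCA'),
#  sp_accessible_area = c(method='MASK', filepath=d_eco),
#  pseudoabs_method = c(method='RND'),
#  pres_abs_ratio = 1,
#  part = c(method='KFOLD', folds='2'),
#  algorithm = c('SVM', 'RDF', 'MXD'),
#  thr = c(type='MAX_TSS'),
#  msdm = c(method='MCP'),
#  ensemble = c(method='PCA'),
#  extrapolation = TRUE,
#  cores = 1
# )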


