kuenm_cal_swd: Complete Maxent model calibration in SWD format

Description

kuenm_cal_swd performs the whole process of model calibration (i.e., candidate model creation and evaluation) using Maxent in SWD format. Models are created with multiple parameter combinations, including distinct regularization multiplier values, various feature classes, and different sets of environmental variables represented by csv files that contain the background. Evaluation is done in terms of statistical significance (partial ROC), prediction ability (omission rates), and model complexity (AICc). After evaluation, this function selects the best models based on user-defined criteria.

Usage

kuenm_cal_swd(occ.joint, occ.tra, occ.test, back.dir, batch,
              out.dir.models, reg.mult, f.clas = "all",
              max.memory = 1000, args = NULL, maxent.path,
              selection = "OR_AICc", threshold = 5,
              rand.percent = 50, iterations = 500,
              kept = TRUE, out.dir.eval)
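A call might be assembled as in the sketch below. Every file and folder name is a hypothetical placeholder, and it is an assumption here that the package is installed under the name kuenm and that maxent.path points to the folder holding maxent.jar. The call is guarded so that nothing runs unless those pieces are actually in place.

```r
# Hypothetical inputs: none of these file or folder names come from the package.
cal_args <- list(
  occ.joint      = "occ_joint.csv",
  occ.tra        = "occ_train.csv",
  occ.test       = "occ_test.csv",
  back.dir       = "background",
  batch          = "candidate_models",
  out.dir.models = "Candidate_models",
  reg.mult       = c(0.5, 1, 2),
  f.clas         = c("l", "lq", "lqp"),
  max.memory     = 1000,
  args           = NULL,
  maxent.path    = "/path/to/maxent",  # assumed: folder that holds maxent.jar
  selection      = "OR_AICc",
  threshold      = 5,
  rand.percent   = 50,
  iterations     = 500,
  kept           = TRUE,
  out.dir.eval   = "Calibration_results"
)

# Only attempt the calibration when the package, Maxent, and inputs exist.
ready <- requireNamespace("kuenm", quietly = TRUE) &&
  file.exists(file.path(cal_args$maxent.path, "maxent.jar")) &&
  all(file.exists(cal_args$occ.joint, cal_args$occ.tra, cal_args$occ.test)) &&
  dir.exists(cal_args$back.dir)

if (ready) {
  cal <- do.call(kuenm::kuenm_cal_swd, cal_args)
}
```

Keeping the arguments in a list makes it easy to record the settings used for a calibration run alongside its results.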

Arguments

occ.joint

(character) the name of the csv file with training and testing occurrences combined; columns must be: species, longitude, latitude, and two or more columns representing distinct variables. See details.

occ.tra

(character) the name of the csv file with the training occurrences; columns as in occ.joint.

occ.test

(character) the name of the csv file with the evaluation occurrences; columns as in occ.joint.

back.dir

(character) the name of the folder containing one or more csv files representing one or more sets of predictor variables for a background. Columns in background files must be: background, longitude, latitude, and two or more columns representing distinct variables. See details.

batch

(character) name of the batch file (bash for Unix) with the code to create all candidate models for calibration.

out.dir.models

(character) name of the folder that will contain all calibration model subfolders.

reg.mult

(numeric vector) regularization multiplier(s) to be evaluated.

f.clas

(character) feature classes, selected either from five predefined combination sets or manually. Combination sets are: "all", "basic", "no.t.h", "no.h", and "no.t". Default = "all". The "basic" set is "l", "lq", "lqp", "lqpt", "lqpth". The sets "no.t.h", "no.h", and "no.t" exclude t and/or h. See details for all available combinations of feature classes.

max.memory

(numeric) maximum memory (in megabytes) to be used by Maxent while creating the models. Default = 1000.

args

(character) additional arguments that can be passed to Maxent. See the Maxent help for more information on how to write these arguments, default = NULL. Note that some arguments cannot be changed here because they are part of the parameters of the function already (e.g., "betamultiplier" or "plots"). See details for other options.

maxent.path

(character) the path where the maxent.jar file is located on your computer.

selection

(character) model selection criterion, can be "OR_AICc", "AICc", or "OR"; OR = omission rates. Default = "OR_AICc", which means that among models that are statistically significant and that present omission rates below the threshold, those with delta AICc up to 2 will be selected. See details for other selection criteria.

threshold

(numeric) the percentage of training data omission error allowed (E); default = 5.

rand.percent

(numeric) the percentage of data to be used for the bootstrapping process when calculating partial ROCs; default = 50.

iterations

(numeric) the number of times that the bootstrap is going to be repeated; default = 500.

kept

(logical) if FALSE, all candidate models will be erased after evaluation, default = TRUE.

out.dir.eval

(character) name of the folder where evaluation results will be written.

Details

Java must be installed on the computer, and maxent.jar must be stored in a known location. Java can be obtained from https://java.com/es/download/manual.jsp. Users of Linux and Mac need the entire Java Development Kit, available at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html. Maxent can be downloaded from https://biodiversityinformatics.amnh.org/open_source/maxent/

To prepare occurrence and background csv files as needed for using this function, use prepare_swd. Occurrence csv files must contain information and be organized as in the example below:

Species Longitude Latitude bio_1 bio_12 bio_15
My_species -79.24999 37.91667 113 1085 11
My_species -79.41666 35.41667 155 1173 16
My_species -76.41666 37.91667 142 1060 12

Background csv files must contain information and be organized as in the example below:

background Longitude Latitude bio_1 bio_12 bio_15
background -79.24999 37.91667 113 1085 11
background -79.41666 35.41667 155 1173 16
background -76.41666 37.91667 142 1060 12
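As a minimal base R sketch of the two layouts above (prepare_swd remains the intended way to build these files), the example rows can be written to csv as follows. The temporary folder and the file names occ_joint.csv and Set_1.csv are hypothetical, chosen only for illustration.

```r
# Occurrence table in SWD layout, using the example rows shown above.
occ <- data.frame(
  Species   = "My_species",
  Longitude = c(-79.24999, -79.41666, -76.41666),
  Latitude  = c(37.91667, 35.41667, 37.91667),
  bio_1     = c(113, 155, 142),
  bio_12    = c(1085, 1173, 1060),
  bio_15    = c(11, 16, 12)
)

# Background table: same layout, with "background" in the first column.
back <- occ
names(back)[1] <- "background"
back$background <- "background"

# Write both tables into a temporary folder (hypothetical file names).
tmp <- tempfile("swd_example_")
dir.create(tmp)
dir.create(file.path(tmp, "background"))
write.csv(occ, file.path(tmp, "occ_joint.csv"), row.names = FALSE)
write.csv(back, file.path(tmp, "background", "Set_1.csv"), row.names = FALSE)

# Read the files back to confirm the column order.
occ_check  <- read.csv(file.path(tmp, "occ_joint.csv"))
back_check <- read.csv(file.path(tmp, "background", "Set_1.csv"))
```

Each csv file placed in the background folder would stand for one set of predictor variables.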

Below all potential combinations of feature classes are shown. Manual selection can be done by creating a vector of one or more of the combinations of this list. l = linear, q = quadratic, p = product, t = threshold, and h = hinge. "l", "q", "p", "t", "h", "lq", "lp", "lt", "lh", "qp", "qt", "qh", "pt", "ph", "th", "lqp", "lqt", "lqh", "lpt", "lph", "lth", "qpt", "qph", "qth", "pth", "lqpt", "lqph", "lqth", "lpth", "qpth", and "lqpth".
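The list above can be reproduced in base R, which is a convenient way to validate a manual selection before passing it to f.clas. This is only an illustrative sketch; the object names all_fc and my_fc are hypothetical and not part of the package.

```r
# The five feature classes: linear, quadratic, product, threshold, hinge.
features <- c("l", "q", "p", "t", "h")

# All non-empty combinations, from single classes up to all five together.
all_fc <- unlist(lapply(seq_along(features), function(k) {
  combn(features, k, FUN = paste, collapse = "")
}))

# A manual selection is simply a character vector drawn from that list.
my_fc <- c("lq", "lqp", "qh")
stopifnot(all(my_fc %in% all_fc))
```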

Other selection criteria are described below: If the "AICc" criterion is chosen, all significant models with delta AICc up to 2 will be selected. If "OR" is chosen, the first 10 significant models with the lowest omission rates will be selected.
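The three criteria can be illustrated with a toy table of candidate models. This is a sketch under stated assumptions, not the package's implementation: the column names, the 0.05 significance cutoff, the use of "at or below" for the omission threshold, and the recalculation of delta AICc within the filtered subset are all assumptions made for illustration, and the values are invented.

```r
# Invented evaluation results for six hypothetical candidate models.
cand <- data.frame(
  model    = paste0("M_", 1:6),
  p_proc   = c(0.00, 0.00, 0.20, 0.00, 0.01, 0.00),  # partial ROC p-value
  omission = c(0.03, 0.04, 0.02, 0.10, 0.02, 0.01),  # omission rate (proportion)
  aicc     = c(1002.5, 1000.0, 998.0, 999.0, 1001.0, 1010.0)
)

select_models <- function(x, selection = "OR_AICc", threshold = 5, n_or = 10) {
  # Keep only statistically significant models (assumed cutoff: 0.05).
  sig <- x[x$p_proc <= 0.05, ]
  if (nrow(sig) == 0) return(sig)
  if (selection == "OR") {
    # The n_or significant models with the lowest omission rates.
    return(head(sig[order(sig$omission), ], n_or))
  }
  if (selection == "OR_AICc") {
    # threshold is a percentage, omission a proportion.
    sig <- sig[sig$omission <= threshold / 100, ]
    if (nrow(sig) == 0) return(sig)
  }
  # Delta AICc relative to the best remaining model (an assumption).
  sig$delta_aicc <- sig$aicc - min(sig$aicc)
  sig[sig$delta_aicc <= 2, ]
}

sel_or_aicc <- select_models(cand, "OR_AICc")
sel_aicc    <- select_models(cand, "AICc")
sel_or      <- select_models(cand, "OR")
```

In this toy table, M_3 has the lowest AICc but is never selected because it is not statistically significant, which shows why significance is applied as the first filter.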

Further arguments are included as follows: args = "biasfile=COMPLETE_PATH\bias.asc biastype=3" in Windows, or args = "biasfile=COMPLETE_PATH/bias.asc biastype=3" in Unix-based systems. If the path contains spaces, it must be quoted: args = "biasfile=\"COMPLETE PATH\bias.asc\" biastype=3" in Windows, or args = "biasfile=\"COMPLETE PATH/bias.asc\" biastype=3" in Unix-based systems.
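A small helper can build this string and add the inner quotes only when the path contains spaces. The function bias_args and the example paths are hypothetical and not part of the package. Note that inside an R string literal a backslash must be written as "\\", so on Windows it is often simpler to type the path with forward slashes.

```r
# Hypothetical helper: builds the args string for a bias file.
bias_args <- function(path, biastype = 3) {
  # Wrap the path in escaped double quotes when it contains a space.
  if (grepl(" ", path, fixed = TRUE)) {
    path <- paste0("\"", path, "\"")
  }
  paste0("biasfile=", path, " biastype=", biastype)
}

args_plain  <- bias_args("/data/layers/bias.asc")
args_spaces <- bias_args("/data/my layers/bias.asc")
```

The resulting string would then be passed through the args argument of kuenm_cal_swd.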

Other options that can be included in args are all "Flags" from the following list:

Flag | Abbrv | Type | Default | Meaning

  • maximumbackground | MB | integer | 10000 | If the number of background points / grid cells is larger than this number, then this number of cells is chosen randomly for background points.

  • togglelayertype | t | string | | Toggle continuous/categorical for environmental layers whose names begin with this prefix (default: all continuous).

  • biasfile | | file | | Sampling is assumed to be biased according to the sampling distribution given in this grid file. Values in this file must not be zero or negative. MaxEnt will factor out the bias. We recommend creating this file as a kernel density of geographic points representing all localities where sampling of similar organisms has been performed (multiply this layer by 1000 and round it to reduce the number of decimals). IMPORTANT: A biasfile must be included with its entire path, as indicated above.

  • biastype | | integer | | If biasfile is defined, this integer needs to be defined depending on the type of bias added. If the bias file is prepared as recommended, biastype=3.

  • writebackgroundpredictions | | boolean | FALSE | Write .csv file with predictions at background points.

  • maximumiterations | m | integer | 500 | Stop training after this many iterations of the optimization algorithm.

  • convergencethreshold | c | double | 0.00001 | Stop training when the drop in log loss per iteration drops below this number.

  • threads | | integer | 1 | Number of processor threads to use. Matching this number to the number of cores on your computer speeds up some operations, especially variable jackknifing.

  • logfile | | string | maxent.log | File name to be used for writing debugging information about a run in output directory.

  • cache | | boolean | TRUE | Make a .mxe cached version of ascii files, for faster access.

  • defaultprevalence | | double | 0.5 | Default prevalence of the species: probability of presence at ordinary occurrence points. See Elith et al., Diversity and Distributions, 2011 for details.

Other more advanced arguments are (use these ones only if you understand them completely):

  • lq2lqptthreshold | | integer | 80 | Number of samples at which product and threshold features start being used.

  • l2lqthreshold | | integer | 10 | Number of samples at which quadratic features start being used.

  • hingethreshold | | integer | 15 | Number of samples at which hinge features start being used.

  • beta_threshold | | double | -1 | Regularization parameter to be applied to all threshold features; negative value enables automatic setting.

  • beta_categorical | | double | -1 | Regularization parameter to be applied to all categorical features; negative value enables automatic setting.

  • beta_lqp | | double | -1 | Regularization parameter to be applied to all linear, quadratic and product features; negative value enables automatic setting.

  • beta_hinge | | double | -1 | Regularization parameter to be applied to all hinge features; negative value enables automatic setting.

Value

A folder named out.dir.models with all the subfolders to save Maxent results when running the .bat file (.sh for Unix). A .bat file (.sh for Unix) containing the Java commands to run the calibration models; it will run automatically, or on some computers a dialog box will ask whether running is allowed.

A list with three data.frames containing results from the calibration process and a scatterplot of all models based on the AICc values and omission rates. In addition, a folder, in the working directory, containing a csv file with information about models meeting the user-defined selection criterion, another csv file with a summary of the evaluation and selection process, an extra csv file containing all the statistics of model performance (pROC, AICc, and omission rates) for all candidate models, a png scatterplot of all models based on the AICc values and omission rates, and an HTML file summarizing all the information produced after evaluation to help with further interpretation.


manubio13/ku.enm documentation built on Jan. 5, 2024, 5:55 a.m.