R/kuenm_cal.R

#' Creation of candidate models for calibration
#'
#' @description kuenm_cal creates and executes a batch file (bash for Unix) for generating candidate models in Maxent
#' to test multiple parameter combinations, including distinct regularization multiplier values,
#' various feature classes, and different sets of environmental variables.
#'
#' @param occ.joint (character) is the name of the csv file with all the occurrences; columns must be: species, longitude, latitude.
#' @param occ.tra (character) is the name of the csv file with the calibration occurrences; columns equal to occ.joint.
#' @param M.var.dir (character) is the name of the folder containing other folders with different sets of environmental variables.
#' @param batch (character) name of the batch file (bash for Unix) with the code to create all candidate models.
#' @param out.dir (character) name of the folder that will contain all calibration model subfolders.
#' @param max.memory (numeric) maximum memory (in megabytes) to be used by maxent while creating the models. Default = 1000.
#' @param reg.mult (numeric vector) regularization multiplier(s) to be evaluated.
#' @param f.clas (character) feature classes can be selected from five different combination sets or manually.
#' Combination sets are: "all", "basic", "no.t.h", "no.h", and "no.t". Default = "all".
#' basic = "l", "lq", "lqp", "lqpt", "lqpth". Combinations "no.t.h", "no.h", and "no.t", exclude t and/or h.
#' See details for all the available potential combinations of feature classes.
#' @param args (character) additional arguments that can be passed to Maxent. See the Maxent help for more information
#' on how to write these arguments, default = NULL. Note that some arguments cannot be changed here because they are
#' part of the parameters of the function already (e.g., "betamultiplier" or "plots"). See details for other options.
#' @param maxent.path (character) the path where the maxent.jar file is located on your computer.
#' @param wait (logical) if TRUE, R will wait until all the Maxent models are created. If FALSE, model creation
#' runs as a separate process and R can be used at the same time. This may be useful for evaluating
#' candidate models in parallel. Default = TRUE.
#' @param run (logical) if TRUE, the batch file runs after its creation; if FALSE, it is only created and must be
#' run manually. Default = TRUE.
#'
#' @return A folder named out.dir with all the subfolders in which Maxent results are saved when the .bat file
#' (.sh for Unix) runs, and a .bat file (.sh for Unix) containing the Java commands that run the calibration models.
#' The file runs automatically if run = TRUE; on some computers a dialog box will ask whether running is allowed.
#'
#' @details Java needs to be installed on the computer and the folder containing maxent.jar must be known (see maxent.path).
#' Java can be obtained from \url{https://java.com/es/download/manual.jsp}. Users of Linux and Mac need the entire
#' Java Development Kit, available at \url{http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html}.
#' Maxent can be downloaded from \url{https://biodiversityinformatics.amnh.org/open_source/maxent/}.
#'
#' Below all potential combinations of feature classes are shown. Manual selection can be done by creating
#' a vector of one or more of the combinations of this list. l = linear, q = quadratic, p = product,
#' t = threshold, and h = hinge.
#' "l", "q", "p", "t", "h", "lq", "lp", "lt", "lh", "qp", "qt", "qh", "pt", "ph",
#' "th", "lqp", "lqt", "lqh", "lpt", "lph", "lth", "qpt", "qph", "qth", "pth",
#' "lqpt", "lqph", "lqth", "lpth", "qpth", and "lqpth".
#'
#' The way to include further arguments is as follows:
#' args = "biasfile=COMPLETE_PATH\\bias.asc biastype=3" on Windows,
#' or args = "biasfile=COMPLETE_PATH/bias.asc biastype=3" on Unix-based systems.
#' If the path contains spaces, write it as:
#' args = "biasfile=\"COMPLETE PATH\\bias.asc\" biastype=3" on Windows, or
#' args = "biasfile=\"COMPLETE PATH/bias.asc\" biastype=3" on Unix-based systems.
#'
#' Other options that can be included in args are all "Flags" from the following
#' list:
#'
#' Flag | Abbrv | Type | Default | Meaning
#' - maximumbackground | MB | integer | 10000 | If the number of background points / grid cells is larger than this number, then this number of cells is chosen randomly for background points.
#' - togglelayertype | t | string | | Toggle continuous/categorical for environmental layers whose names begin with this prefix (default: all continuous).
#' - biasfile | | file | | Sampling is assumed to be biased according to the sampling distribution given in this grid file. Values in this file must not be zero or negative. MaxEnt will factor out the bias. We recommend creating this file as a kernel density of geographic points representing all localities where sampling of similar organisms has been performed (multiply this layer by 1000 and round it to reduce the number of decimals). IMPORTANT: A biasfile must be included with its entire path, as indicated above.
#' - biastype | | integer | | If biasfile is defined, this integer needs to be defined depending on the type of bias added. If the bias file is prepared as recommended, biastype=3.
#' - writebackgroundpredictions | | boolean | FALSE | Write .csv file with predictions at background points.
#' - maximumiterations | m | integer | 500 | Stop training after this many iterations of the optimization algorithm.
#' - convergencethreshold | c | double | 0.00001 | Stop training when the drop in log loss per iteration drops below this number.
#' - threads | | integer | 1 | Number of processor threads to use. Matching this number to the number of cores on your computer speeds up some operations, especially variable jackknifing.
#' - logfile | | string | maxent.log | File name to be used for writing debugging information about a run in output directory.
#' - cache | | boolean | TRUE | Make a .mxe cached version of ascii files, for faster access.
#' - defaultprevalence | | double | 0.5 | Default prevalence of the species: probability of presence at ordinary occurrence points. See Elith et al., Diversity and Distributions, 2011 for details.
#'
#' Other more advanced arguments are (use these ones only if you understand them completely):
#' - lq2lqptthreshold | | integer | 80 | Number of samples at which product and threshold features start being used.
#' - l2lqthreshold | | integer | 10 | Number of samples at which quadratic features start being used.
#' - hingethreshold | | integer | 15 | Number of samples at which hinge features start being used.
#' - beta_threshold | | double | -1 | Regularization parameter to be applied to all threshold features; negative value enables automatic setting.
#' - beta_categorical | | double | -1 | Regularization parameter to be applied to all categorical features; negative value enables automatic setting.
#' - beta_lqp | | double | -1 | Regularization parameter to be applied to all linear, quadratic and product features; negative value enables automatic setting.
#' - beta_hinge | | double | -1 | Regularization parameter to be applied to all hinge features; negative value enables automatic setting.
#'
#' @usage
#' kuenm_cal(occ.joint, occ.tra, M.var.dir, batch, out.dir, max.memory = 1000,
#'           reg.mult, f.clas = "all", args = NULL, maxent.path,
#'           wait = TRUE, run = TRUE)
#'
#' @export
#'
#' @examples
#' # To replicate this example, download the data from the following link:
#' # https://kuscholarworks.ku.edu/bitstream/handle/1808/26376/ku.enm_example_data.zip?sequence=3&isAllowed=y
#'
#' # Variables with information to be used as arguments.
#' occ_joint <- "aame_joint.csv"
#' occ_tra <- "aame_train.csv"
#' M_var_dir <- "M_variables"
#' batch_cal <- "Candidate_models"
#' out_dir <- "Candidate_Models"
#' reg_mult <- c(seq(0.1, 1, 0.1), seq(2, 6, 1), 8, 10)
#' f_clas <- "all"
#' maxent_path <- "YOUR/DIRECTORY/WITH/MAXENT"
#' wait <- FALSE
#' run <- TRUE
#'
#' kuenm_cal(occ.joint = occ_joint, occ.tra = occ_tra, M.var.dir = M_var_dir, batch = batch_cal,
#'           out.dir = out_dir, reg.mult = reg_mult, f.clas = f_clas, maxent.path = maxent_path,
#'           wait = wait, run = run)
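#'
#' # A further, hedged sketch: manual feature classes plus extra Maxent arguments,
#' # reusing the variables defined above. The bias layer path is a placeholder
#' # (an assumption); replace COMPLETE_PATH with the full path to your own file.
#' # With run = FALSE the batch file is only written, not executed.
#' f_clas_manual <- c("l", "lq", "lqp")
#' args_bias <- "biasfile=COMPLETE_PATH/bias.asc biastype=3"
#'
#' kuenm_cal(occ.joint = occ_joint, occ.tra = occ_tra, M.var.dir = M_var_dir, batch = batch_cal,
#'           out.dir = out_dir, reg.mult = c(0.5, 1, 2), f.clas = f_clas_manual, args = args_bias,
#'           maxent.path = maxent_path, wait = TRUE, run = FALSE)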

kuenm_cal <- function(occ.joint, occ.tra, M.var.dir, batch, out.dir, max.memory = 1000,
                      reg.mult, f.clas = "all", args = NULL, maxent.path,
                      wait = TRUE, run = TRUE) {

  #Checking potential issues
  if (!file.exists(occ.joint)) {
    stop(paste(occ.joint, "does not exist in the working directory, check file name",
               "\nor extension, example: species_joint.csv"))
  }
  if (!file.exists(occ.tra)) {
    stop(paste(occ.tra, "does not exist in the working directory, check file name",
               "\nor extension, example: species_train.csv"))
  }
  if (missing(M.var.dir)) {
    stop("Argument M.var.dir is not defined.")
  }
  if (!dir.exists(M.var.dir)) {
    stop(paste(M.var.dir, "does not exist in the working directory, check folder name",
               "\nor its existence."))
  }
  if (length(list.dirs(M.var.dir, recursive = FALSE)) == 0) {
    stop(paste(M.var.dir, "does not contain any subdirectory with environmental variables,",
               "\neach set of variables must be in a subdirectory inside",
               paste(M.var.dir, ".", sep = "")))
  }
  if (missing(reg.mult)) {
    warning(paste("Argument reg.mult is not defined, a default set will be used:",
                  "\n0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 2.0 3.0 4.0 5.0 6.0 8.0 10.0"))
    reg.mult <- c(seq(0.1, 1, 0.1), seq(2, 6, 1), 8, 10)
  }
  if (!is.numeric(reg.mult)) {
    stop("Argument reg.mult must be numeric.")
  }
  if (missing(batch)) {
    warning(paste("Argument batch is not defined, the default name candidate_models",
                  "\nwill be used."))
    batch <- "candidate_models"
  }
  if (missing(out.dir)) {
    warning(paste("Argument out.dir is not defined, the default name Candidate_Models",
                  "\nwill be used."))
    out.dir <- "Candidate_Models"
  }
  if (missing(maxent.path)) {
    stop(paste("Argument maxent.path is not defined, it is necessary for executing",
               "\nthe Maxent software."))
  }

  #Slash
  if(.Platform$OS.type == "unix") {
    sl <- "/"
    dl <- "/"
  } else {
    sl <- "\\"
    dl <- "\\\\"
  }

  #Data
  ##Environmental variables sets
  m <- dir(M.var.dir)
  ms <- paste(gsub("/", dl, paste(getwd(), M.var.dir, sep = sl)), sl, m, sep = "")
  env <- paste("environmentallayers=", paste("\"", ms, "\"", sep = ""), sep = "")

  ##Species occurrences
  oc <- occ.joint
  samp <- paste("samplesfile=", gsub("/", dl, paste("\"", paste(getwd(), oc, sep = sl),
                                                    "\"", sep = "")), sep = "")
  occ <- occ.tra
  samp1 <- paste("samplesfile=", gsub("/", dl, paste("\"", paste(getwd(), occ, sep = sl),
                                                     "\"", sep = "")), sep = "")

  #Maxent settings
  ##Feature classes combinations
  fea <- feature_classes(f.clas)

  #output directories
  dir.create(out.dir)
  out.dir <- gsub("/", dl, paste(getwd(), out.dir, sep = sl))

  #Getting ram to be used
  ram <- paste("-mx", max.memory, "m", sep = "")

  #Fixed commands
  ##Initial command
  in.comm <- paste("java", ram,
                   paste("-jar",
                         gsub("/", dl,
                              paste("\"", paste(maxent.path, "maxent.jar", sep = sl), "\"", sep = ""))),
                   sep = " ")

  ##Autofeature
  a.fea <- "autofeature=false"

  ##Other maxent settings
  fin.com <- "extrapolate=false doclamp=false replicates=1 replicatetype=Crossvalidate responsecurves=false jackknife=false plots=false pictures=false outputformat=raw warnings=false visible=false redoifexists autorun\n"
  fin.com1 <- "extrapolate=false doclamp=false replicates=1 replicatetype=Crossvalidate responsecurves=false jackknife=false plots=false pictures=false outputformat=logistic warnings=false visible=false redoifexists autorun\n"

  #Final code
  if(.Platform$OS.type == "unix") {
    cat("\nCreating directories and maxent batch file, please wait...\n")
    sink(paste(batch, ".sh", sep = "")) # beginning file preparation
    cat("#! /bin/bash\n")
  } else {
    pb <- winProgressBar(title = "Progress bar", min = 0, max = length(reg.mult), width = 300) #progress bar
    sink(paste(batch, ".bat", sep = "")) # beginning file preparation
  }

  for (i in 1:length(reg.mult)) {
    Sys.sleep(0.1)
    if(.Platform$OS.type != "unix") {
      setWinProgressBar(pb, i, title = paste( round(i / length(reg.mult) * 100, 0), "% finished"))
    }

    for (j in 1:length(fea)) {
      for (k in 1:length(ms)) {
        subfol <- paste("outputdirectory=", paste("\"", out.dir, sl,
                        paste("M", reg.mult[i], "F", names(fea)[j], m[k], "all", sep = "_"),
                        "\"", sep = ""), sep = "")
        dir.create(paste(out.dir, sl,
                         paste("M", reg.mult[i], "F", names(fea)[j], m[k], "all", sep = "_"), sep = ""))
        reg.m <- paste("betamultiplier=", reg.mult[i], sep = "")
        cat(paste(in.comm, env[k], samp, subfol, reg.m, a.fea, fea[j], args, fin.com, sep = " "))

        subfol1 <- paste("outputdirectory=", paste("\"", out.dir, sl,
                         paste("M", reg.mult[i], "F", names(fea)[j], m[k], "cal", sep = "_"),
                         "\"", sep = ""), sep = "")
        dir.create(paste(out.dir, sl,
                         paste("M", reg.mult[i], "F", names(fea)[j], m[k], "cal", sep = "_"), sep = ""))
        cat(paste(in.comm, env[k], samp1, subfol1, reg.m, a.fea, fea[j], args, fin.com1, sep = " "))
      }
    }
  }
  sink()
  if(.Platform$OS.type != "unix") {
    suppressMessages(close(pb))
  }

  cat("\nIf asked and run = TRUE, allow running as administrator.")

  if(run == TRUE){
    if(.Platform$OS.type == "unix") {
      batfile_path <- file.path(getwd(), paste(batch, ".sh", sep = "")) # shell script
      r_wd <- getwd() # original working directory
      setwd(maxent.path) # temporarily change the working directory

      # shQuote() keeps the call valid when the path contains spaces
      system(paste("bash", shQuote(batfile_path)), wait = wait)

    } else {
      batfile_path <- file.path(getwd(), paste(batch, ".bat", sep = "")) # batch file
      r_wd <- getwd() # original working directory
      setwd(maxent.path) # temporarily change the working directory

      system2(batfile_path, wait = wait, invisible = FALSE)
    }
    setwd(r_wd) # restore the original working directory
  }

  cat("\nProcess finished\n")

  if(.Platform$OS.type == "unix") {
    cat(paste("A maxent shell script for creating", i * j * k, "calibration models has been written", sep = " "))
  } else {
    cat(paste("A maxent batch file for creating", i * j * k, "calibration models has been written", sep = " "))
  }

  cat(paste("\nCheck your working directory!!!", getwd(), sep = "    "))
}
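
# Illustrative sketch, not part of ku.enm: the helper name and its arguments are
# hypothetical. It mirrors how kuenm_cal() above appears to name the candidate
# model subfolders, "M_<reg.mult>_F_<feature classes>_<variable set>_<all|cal>",
# assuming names(fea) are the feature class labels (e.g., "l", "lq").
candidate_folder_names <- function(reg.mult, fea.names, var.sets) {
  # expand.grid() varies its first factor fastest, matching the loop nesting
  # above: variable sets innermost, then feature classes, then multipliers.
  grid <- expand.grid(set = var.sets, fc = fea.names, rm = reg.mult,
                      stringsAsFactors = FALSE)
  base <- paste("M", grid$rm, "F", grid$fc, grid$set, sep = "_")
  # Interleave the "all" (joint occurrences) and "cal" (training occurrences) folders.
  as.vector(rbind(paste(base, "all", sep = "_"), paste(base, "cal", sep = "_")))
}

# Usage (kept as a comment so nothing runs when the file is sourced):
# candidate_folder_names(reg.mult = c(0.5, 1), fea.names = c("l", "lq"),
#                        var.sets = c("Set1", "Set2"))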
manubio13/ku.enm documentation built on Jan. 5, 2024, 5:55 a.m.