simplify: Identify an uncorrelated, useful subset of Maxent predictors
In johnbaums/rmaxent: Tools for working with Maxent in R

Description

Given a candidate set of predictor variables, this function identifies a subset that meets specified multicollinearity criteria. Subsequently, backward stepwise variable selection is used to iteratively drop the variable that contributes least to the model, until the contribution of each variable meets a specified minimum, or until a predetermined minimum number of predictors remains.
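
The two stages described above can be sketched in plain R. This is a minimal illustration under stated assumptions, not the package's implementation: `drop_correlated()`, `backward_select()`, the toy weights `w`, and `toy_contrib()` are hypothetical names, absolute Pearson correlation is assumed, and the contribution function is a stand-in for refitting Maxent and reading off percent contribution or permutation importance.

```r
# Stage 1 (hypothetical helper): greedily drop variables until no pair of
# remaining variables has absolute correlation above cor_thr.
drop_correlated <- function(x, cor_thr) {
  keep <- colnames(x)
  while (length(keep) > 1) {
    cm <- abs(cor(x[, keep, drop = FALSE]))
    diag(cm) <- 0
    if (max(cm) <= cor_thr) break
    # Locate the most strongly correlated pair ...
    worst <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- keep[worst]
    # ... and drop the member that is, on average, more correlated with the
    # remaining variables (an assumed tie-breaking rule).
    drop <- pair[which.max(colMeans(cm[, pair]))]
    keep <- setdiff(keep, drop)
  }
  keep
}

# Stage 2 (hypothetical helper): backward stepwise selection. contrib_fun must
# return a named numeric vector of percent contributions for the given variables.
backward_select <- function(vars, contrib_fun, pct_thr, k_thr) {
  repeat {
    pct <- contrib_fun(vars)
    # Stop once every variable contributes enough, or k_thr variables remain.
    if (length(vars) <= k_thr || min(pct) >= pct_thr) break
    vars <- setdiff(vars, names(pct)[which.min(pct)])
  }
  vars
}

# Toy stand-in for a model refit: fixed weights rescaled to sum to 100.
w <- c(a = 50, b = 30, c = 15, d = 4, e = 1)
toy_contrib <- function(vars) 100 * w[vars] / sum(w[vars])

selected <- backward_select(names(w), toy_contrib, pct_thr = 5, k_thr = 2)
selected_k4 <- backward_select(names(w), toy_contrib, pct_thr = 5, k_thr = 4)

# Toy predictors: x2 is nearly a copy of x1, x3 is independent.
set.seed(1)
x1 <- rnorm(200)
x <- cbind(x1 = x1, x2 = x1 + rnorm(200, sd = 0.05), x3 = rnorm(200))
uncorrelated <- drop_correlated(x, cor_thr = 0.7)
```

Contributions are recomputed after every removal because dropping one variable changes the relative contribution of the others. With `k_thr = 4`, the loop stops after a single removal even though one remaining variable is still below `pct_thr`.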

Usage

simplify(
  occ,
  bg,
  path,
  species_column = "species",
  response_curves = TRUE,
  logistic_format = TRUE,
  type = "PI",
  cor_thr,
  pct_thr,
  k_thr,
  features = "lpq",
  replicates = 1,
  quiet = TRUE
)

Arguments

`occ`	A `data.frame` of predictor values at presence localities, where columns are predictors and rows are samples. It must also contain a column identifying the species being modelled, named as given by `species_column`; its values can be character strings, numeric IDs, etc. The set of values in this species column must be identical to the set in the corresponding column of `bg`.
`bg`	A `data.frame` of predictor values at background localities, where columns are predictors and rows are samples. It must also contain a column identifying the species being modelled, named as given by `species_column`; its values can be character strings, numeric IDs, etc. The set of values in this species column must be identical to the set in the corresponding column of `occ`.
`path`	The directory in which an output subdirectory will be created for each species listed in the `species_column` column of `occ` (and `bg`). If missing, a temporary directory will be used.
`species_column`	The column of `occ` (and `bg`) that contains values indicating which species the samples belong to (e.g., this might be species name, or species ID).
`response_curves`	Logical value indicating whether response curves should be included in Maxent model html output.
`logistic_format`	Logical value indicating whether maxentResults.csv should report logistic value thresholds (`TRUE`) or cloglog value thresholds (`FALSE`). This has no effect for versions of Maxent prior to 3.4.
`type`	The variable contribution metric to use when dropping variables. This can be `'PC'` (percent contribution) or `'PI'` (permutation importance; the default). See the Maxent tutorial for additional details.
`cor_thr`	The maximum allowable pairwise correlation between predictor variables (calculated across presence and background localities).
`pct_thr`	The minimum allowable percent variable contribution (where contribution type is specified by `type`). This should be specified as a value between 0 and 100.
`k_thr`	The minimum number of variables to be kept in the model.
`features`	Features to include. Specify as a string comprising one or more of `'l'` (linear), `'p'` (product), `'q'` (quadratic), `'t'` (threshold), and `'h'` (hinge). E.g., `features='lpq'` (equivalently, `features='plq'`). The default is `'lpq'`.
`replicates`	The number of cross-validation replicates to perform. When cross-validation is used, the average (over folds) of the variable contribution metric is used.
`quiet`	Logical value indicating whether progress messages should be suppressed (`TRUE`) or printed (`FALSE`).
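
The requirements on `occ` and `bg` can be checked before calling `simplify()`. The sketch below is illustrative only: `check_swd()` is a hypothetical helper rather than part of rmaxent, the toy data are made up, and the expectation that both tables carry the same predictor columns is an assumption. The last lines show one way to pool presence and background rows when computing the pairwise correlations that `cor_thr` is compared against.

```r
# Hypothetical helper (not part of rmaxent): basic consistency checks on the
# presence and background tables.
check_swd <- function(occ, bg, species_column = "species") {
  stopifnot(is.data.frame(occ), is.data.frame(bg),
            species_column %in% names(occ),
            species_column %in% names(bg))
  occ_preds <- setdiff(names(occ), species_column)
  bg_preds <- setdiff(names(bg), species_column)
  list(
    # The species values in occ and bg must form the same set.
    same_species = setequal(occ[[species_column]], bg[[species_column]]),
    # Assumption: both tables are expected to hold the same predictors.
    same_predictors = setequal(occ_preds, bg_preds)
  )
}

# Toy inputs with two species and two predictors.
occ <- data.frame(species = c("sp1", "sp1", "sp2"),
                  temp = c(21.5, 23.0, 18.2),
                  rain = c(1200, 1350, 800))
bg <- data.frame(species = rep(c("sp1", "sp2"), each = 3),
                 temp = c(15.1, 19.4, 25.3, 12.8, 17.6, 22.9),
                 rain = c(600, 950, 1500, 400, 700, 1100))
chk <- check_swd(occ, bg)

# Pairwise correlation computed across presence and background rows combined.
preds <- c("temp", "rain")
pooled <- rbind(occ[, preds], bg[, preds])
cor_mat <- cor(pooled)
```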

Details

If `path` is provided, subdirectories will be created within `path`, with names equal to the values provided in the `species_column` column of `occ`. Within each species subdirectory, two further directories are created: "full" contains the Maxent output for the model fit with the full uncorrelated subset of variables, while "final" contains the Maxent output for the model fit after backward selection, i.e. with the subset of those variables that each contribute at least `pct_thr`% to the model (or with the variables remaining when the `k_thr` limit is reached). The fitted MaxEnt R objects for the full and final models are also saved into these directories, each as "model.rds". These can be read back into R with `readRDS()`.
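
As a sketch of how the saved objects might be read back, the snippet below builds the two file locations from the layout described above. `model_files()` is a hypothetical helper, and it assumes the species value is used verbatim as the subdirectory name. So that the snippet runs without Maxent, it writes placeholder objects into a mock directory tree under `tempdir()`; after a real `simplify()` run, the files would already exist and only the `readRDS()` step would be needed.

```r
# Hypothetical helper: expected locations of the saved model objects for one
# species, following the "full" / "final" layout described above.
model_files <- function(path, species) {
  c(full = file.path(path, species, "full", "model.rds"),
    final = file.path(path, species, "final", "model.rds"))
}

# Mock output tree with placeholder objects standing in for fitted models.
out <- file.path(tempdir(), "simplify_demo")
files <- model_files(out, "bradypus")
for (f in files) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  saveRDS(list(placeholder = basename(dirname(f))), f)
}

# Read both objects back into a named list (names "full" and "final").
models <- lapply(files, readRDS)
```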

Value

The final fitted MaxEnt object.

Examples

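# Below we modify the example given at ?dismo::maxent:
if (require(dismo) && require(rJava) &&
    file.exists(system.file('java/maxent.jar', package='dismo'))) {
  # Load the example predictor rasters shipped with dismo, excluding the
  # categorical 'biome' layer, and standardise them
  fnames <- list.files(system.file('ex', package="dismo"), '\\.grd$',
                       full.names=TRUE)
  fnames <- grep('biome', fnames, value=TRUE, invert=TRUE)
  predictors <- scale(stack(fnames))
  # Read the presence coordinates, dropping the first (species name) column
  occurrence <- system.file('ex/bradypus.csv', package='dismo')
  occ <- read.table(occurrence, header=TRUE, sep=',')[,-1]
  # Use every cell with complete predictor data as a background locality
  bg <- xyFromCell(predictors, Which(!is.na(sum(predictors)), cells=TRUE))
  # Extract predictor values and add the species indicator column
  occ_swd <- data.frame(species='bradypus', extract(predictors, occ))
  bg_swd <- data.frame(species='bradypus', extract(predictors, bg))
  # Cap pairwise correlation at 0.7, then drop variables until each has a
  # permutation importance of at least 5% or only 4 variables remain
  m <- simplify(occ_swd, bg_swd, cor_thr=0.7, pct_thr=5, k_thr=4, quiet=FALSE)
}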

johnbaums/rmaxent documentation built on Oct. 11, 2024, 11:14 a.m.