simplify: Identify an uncorrelated, useful subset of Maxent predictors


View source: R/simplify.R

Description

Given a candidate set of predictor variables, this function identifies a subset that meets specified multicollinearity criteria. Subsequently, backward stepwise variable selection is used to iteratively drop the variable that contributes least to the model, until the contribution of each variable meets a specified minimum, or until a predetermined minimum number of predictors remains.
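The backward stepwise stage described above can be sketched in base R. This is only an illustration of the stopping rules, not the package's implementation: backward_select, contribution_fun, toy_weights and toy_contrib are hypothetical names, and contribution_fun stands in for refitting a Maxent model and extracting each variable's contribution on a 0 to 100 scale. The multicollinearity stage is not shown, since its exact procedure is not described here.

```r
# Illustrative sketch only. `contribution_fun` is a hypothetical stand-in for
# refitting the model and returning a named numeric vector of variable
# contributions (0-100), one element per variable in `vars`.
backward_select <- function(vars, contribution_fun, pct_thr, k_thr) {
  repeat {
    # Stop once the minimum number of variables has been reached.
    if (length(vars) <= k_thr) break
    contrib <- contribution_fun(vars)
    # Stop once every remaining variable meets the minimum contribution.
    if (min(contrib) >= pct_thr) break
    # Otherwise drop the variable that contributes least, then refit.
    vars <- setdiff(vars, names(contrib)[which.min(contrib)])
  }
  vars
}

# Toy contribution function: fixed weights rescaled to sum to 100.
toy_weights <- c(a = 50, b = 30, c = 15, d = 4, e = 1)
toy_contrib <- function(vars) 100 * toy_weights[vars] / sum(toy_weights[vars])

kept <- backward_select(names(toy_weights), toy_contrib, pct_thr = 5, k_thr = 2)
kept
```

With these toy weights, raising k_thr to 4 would halt the loop earlier, even though one of the remaining variables still falls below pct_thr.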

Usage

simplify(
  occ,
  bg,
  path,
  species_column = "species",
  response_curves = TRUE,
  logistic_format = TRUE,
  type = "PI",
  cor_thr,
  pct_thr,
  k_thr,
  features = "lpq",
  replicates = 1,
  quiet = TRUE
)

Arguments

occ

A data.frame with predictor values for presence localities, where columns are predictors and rows are samples. It must also contain a column indicating the species being modelled (e.g., a character string or numeric ID), named as specified by the argument species_column. The set of values in this species indicator column must be identical to the set in the corresponding column of bg.

bg

A data.frame with predictor values for background localities, where columns are predictors and rows are samples. It must also contain a column indicating the species being modelled (e.g., a character string or numeric ID), named as specified by the argument species_column. The set of values in this species indicator column must be identical to the set in the corresponding column of occ.

path

The output path within which output subdirectories will be created for each species given in the column of occ (and bg) specified by species_column. If missing, a temporary directory will be used.

species_column

The column of occ (and bg) that contains values indicating which species the samples belong to (e.g., this might be species name, or species ID).

response_curves

Logical value indicating whether response curves should be included in Maxent model html output.

logistic_format

Logical value indicating whether maxentResults.csv should report logistic value thresholds (TRUE) or cloglog value thresholds (FALSE). This has no effect for versions of Maxent prior to 3.4.

type

The variable contribution metric to use when dropping variables. This can be 'PC' (percent contribution) or 'PI' (permutation importance; the default). See the Maxent tutorial for additional details.

cor_thr

The maximum allowable pairwise correlation between predictor variables (calculated across presence and background localities).

pct_thr

The minimum allowable percent variable contribution (where contribution type is specified by type). This should be specified as a value between 0 and 100.

k_thr

The minimum number of variables to be kept in the model.

features

Features to include. Specify as a string comprising one or more of 'l' (linear), 'p' (product), 'q' (quadratic), 't' (threshold), and 'h' (hinge). E.g., features='lpq' (equivalently, features='plq'). The default is 'lpq'.

replicates

The number of cross-validation replicates to perform. When cross-validation is used, the average (over folds) of the variable contribution metric is used.

quiet

Logical value indicating whether progress messages should be suppressed (TRUE) or printed (FALSE).
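Before calling simplify(), it can help to check the inputs against the requirements above. The sketch below uses made-up data frames (bio1 and bio12 are placeholder predictor names) and assumes Pearson correlation on absolute values, which is R's cor() default and may differ from what the function computes internally.

```r
# Toy SWD-style data frames (made-up values, for illustration only).
set.seed(1)
occ <- data.frame(species = "sp1", bio1 = rnorm(20), bio12 = rnorm(20))
bg  <- data.frame(species = "sp1", bio1 = rnorm(100), bio12 = rnorm(100))
species_column <- "species"

# The species indicator columns of occ and bg must hold the same set of values.
stopifnot(setequal(occ[[species_column]], bg[[species_column]]))

# Pairwise correlations across presence and background rows combined.
# Assumption: Pearson correlation, compared on absolute value.
predictors <- setdiff(names(occ), species_column)
cors <- cor(rbind(occ[predictors], bg[predictors]))
max_abs_cor <- max(abs(cors[upper.tri(cors)]))
max_abs_cor  # compare against the cor_thr you intend to use
```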

Details

If path is provided, subdirectories will be created within path, with names equal to the values provided in the species_column column of occ. Within these species subdirectories, two additional directories will be created: "full" contains the Maxent output corresponding to the model using the full uncorrelated subset of variables, while "final" contains the Maxent output corresponding to the model fit with the subset of those variables that each contribute at least pct_thr% to the model. Additionally, the MaxEnt R objects for the full and final fitted models are saved into these directories, each with the name "model.rds". These can be read back into R with readRDS().
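The saved models can be located by following the directory layout described above. In this sketch, "maxent_output" and "bradypus" are placeholders for the path and species value used in your own call; the files are only read if a previous simplify() run created them.

```r
# Placeholders: substitute the `path` and species value from your own call.
path <- "maxent_output"
species <- "bradypus"

# Layout described above: <path>/<species>/{full,final}/model.rds
full_rds  <- file.path(path, species, "full", "model.rds")
final_rds <- file.path(path, species, "final", "model.rds")

# Read the fitted models back in only if the files exist.
if (file.exists(full_rds)) full_model <- readRDS(full_rds)
if (file.exists(final_rds)) final_model <- readRDS(final_rds)
```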

Value

The final fitted MaxEnt object.

Examples

# Below we modify the example given at ?dismo::maxent:
if (require(dismo) && require(rJava) &&
    file.exists(system.file('java/maxent.jar', package='dismo'))) {
  fnames <- list.files(system.file('ex', package="dismo"), '\\.grd$', 
                       full.names=TRUE)
  fnames <- grep('biome', fnames, value=TRUE, invert=TRUE)
  predictors <- scale(stack(fnames))
  occurrence <- system.file('ex/bradypus.csv', package='dismo')
  occ <- read.table(occurrence, header=TRUE, sep=',')[,-1]
  bg <- xyFromCell(predictors, Which(!is.na(sum(predictors)), cells=TRUE))
  occ_swd <- data.frame(species='bradypus', extract(predictors, occ))
  bg_swd <- data.frame(species='bradypus', extract(predictors, bg))
  m <- simplify(occ_swd, bg_swd, cor_thr=0.7, pct_thr=5, k_thr=4, quiet=FALSE)
}

johnbaums/rmaxent documentation built on July 3, 2020, 5:36 p.m.