summaryByCrossValid: Summarize distribution/niche model cross-validation object

View source: R/summaryByCrossValid.r

summaryByCrossValid {enmSdm}    R Documentation

Summarize distribution/niche model cross-validation object

Description

This function summarizes models calibrated using the trainByCrossValid function. It returns aspects of the best models across k-folds (the particular aspects reported depend on the kind of model used).
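A minimal sketch of the typical workflow (an illustration only; it assumes a data frame data whose first column is the presence/absence response, as in the Examples below):

folds <- dismo::kfold(data, 3)               # assign rows to 3 folds
out <- trainByCrossValid(data, folds=folds)  # returns a crossValid object
summaryByCrossValid(out)                     # summarize best models across folds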

Usage

summaryByCrossValid(
  x,
  trainFxName = "trainGlm",
  metric = "cbiTest",
  decreasing = TRUE
)

Arguments

x

An object of class crossValid (which is also a list). Note that the object must include a sublist named tuning.

trainFxName

Character, name of the function used to train the SDM (examples: 'trainGlm', 'trainMaxEnt', 'trainBrt').

metric

Metric by which to select the best model in each k-fold. This can be any of the columns that appear in the data frames in x$tuning (or any columns added manually), but typically it is one of the following plus either Train, Test, or Delta (e.g., 'logLossTrain', 'logLossTest', or 'logLossDelta'); see the sketch at the end of this section:

  • 'logLoss': Log loss.

  • 'cbi': Continuous Boyce Index (CBI). Calculated with contBoyce.

  • 'auc': Area under the receiver operating characteristic curve (AUC). Calculated with aucWeighted.

  • 'tss': Maximum value of the True Skill Statistic. Calculated with tssWeighted.

  • 'msss': Sensitivity and specificity calculated at the threshold that maximizes sensitivity (true presence prediction rate) plus specificity (true absence prediction rate).

  • 'mdss': Sensitivity (se) and specificity (sp) calculated at the threshold that minimizes the difference between sensitivity and specificity.

  • 'minTrainPres': Sensitivity and specificity at the greatest threshold at which all training presences are classified as "present".

  • 'trainSe95' and/or 'trainSe90': Sensitivity at the threshold that ensures either 95% or 90% of training presences are classified as "present".

decreasing

Logical. If TRUE (default), for each k-fold sort models by the value listed in metric in decreasing order (the highest value connotes "best", the lowest "worst"). If FALSE, sort in increasing order so the lowest value of metric is treated as best.
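To make the metric and decreasing arguments concrete, a brief sketch (out is assumed to be a crossValid object from trainByCrossValid, as in the Examples; a given column such as 'aucTest' exists only if that metric was computed during cross-validation):

stopifnot('tuning' %in% names(out))   # the required sublist (see argument x)

# rank each fold's models by test-sample AUC (higher is better)
summaryByCrossValid(out, metric='aucTest')

# log loss is better when lower, so sort in increasing order
summaryByCrossValid(out, metric='logLossTest', decreasing=FALSE)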

Value

Data frame with statistics on the best set of models across k-folds. Depending on the model algorithm, this could be (a short sketch follows this list):

  • BRTs (boosted regression trees): Learning rate, tree complexity, and bag fraction.

  • GLMs (generalized linear models): Frequency of use of each term in the best models.

  • Maxent: Frequency of times each specific combination of feature classes was used in the best models plus mean master regularization multiplier for each feature set.

  • NSs (natural splines): Data frame, one row per fold and one column per predictor, with values representing the maximum degrees of freedom used for each variable in the best model of each fold.
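For example, with the default trainFxName = 'trainGlm', the summary tabulates how often each model term appears among the best models across folds. A quick way to inspect whatever is returned (a sketch; the exact columns depend on the training function):

summ <- summaryByCrossValid(out)   # data frame of statistics on best models
str(summ)                          # columns differ by model algorithm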

See Also

trainByCrossValid

Examples

## Not run: 
set.seed(123)
### contrived example
# generate training/testing data
n <- 10000
x1 <- seq(-1, 1, length.out=n) + rnorm(n)
x2 <- seq(10, 0, length.out=n) + rnorm(n)
x3 <- rnorm(n)
y <- 2 * x1 + x1^2 - 10 * x2 - x1 * x2
y <- statisfactory::probitAdj(y, 0)
y <- y^3
presAbs <- runif(n) < y
data <- data.frame(presAbs=presAbs, x1=x1, x2=x2, x3=x3)

model <- trainGlm(data)
summary(model)

folds <- dismo::kfold(data, 3)
out <- trainByCrossValid(data, folds=folds, verbose=1)

summaryByCrossValid(out)

str(out, 1)
head(out$tuning[[1]])
head(out$tuning[[2]])
head(out$tuning[[3]])

# can do the following for each fold (there are 3)
lapply(out$models[[1]], coefficients)
sapply(out$models[[1]], logLik)
sapply(out$models[[1]], AIC)

# select model for k = 1 with greatest CBI
top <- which.max(out$tuning[[1]]$cbiTest)
summary(out$models[[1]][[top]])

# in fold k = 1, which models perform well but are not overfit?
plot(out$tuning[[1]]$cbiTrain, out$tuning[[1]]$cbiTest, pch='.',
		main='Model Numbers for k = 1')
abline(0, 1, col='red')
numModels <- nrow(out$tuning[[1]])
text(out$tuning[[1]]$cbiTrain, out$tuning[[1]]$cbiTest, labels=1:numModels)
usr <- par('usr')
# bottom-right corner: models that score much better on training than test data
x <- usr[1] + 0.9 * (usr[2] - usr[1])
y <- usr[3] + 0.1 * (usr[4] - usr[3])
text(x, y, labels='overfit', col='red', xpd=NA)
# top-left corner: models that score better on test than training data
x <- usr[1] + 0.1 * (usr[2] - usr[1])
y <- usr[3] + 0.9 * (usr[4] - usr[3])
text(x, y, labels='suspicious', col='red', xpd=NA)

## End(Not run)
