trainByCrossValid: R Documentation
This function is an extension of any of the trainXYZ functions for calibrating species distribution and ecological niche models. It uses a trainXYZ function to calibrate and evaluate a suite of models using cross-validation. Each model is evaluated against withheld data, and the results can be used to determine the optimal settings for a "final" model calibrated with all available data. The function returns a set of models and/or a table with statistics on each model. The statistics are various measures of model accuracy, calculated separately against training sites and test sites.
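The cross-validation loop described above can be sketched in a few lines of base R. This is a minimal illustration, not enmSdmX's code: it assumes the usual k-fold convention (rows whose fold value equals k are withheld when model k is trained), and `cvSketch`, `glmTrainer`, and `aucSketch` are hypothetical names. `glmTrainer` stands in for a trainXYZ function, and AUC is computed with the rank-based (Mann-Whitney) formula rather than with evalAUC.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC, i.e. the probability that
# a random presence scores higher than a random non-presence (ties count 1/2)
aucSketch <- function(pred, y) {
  r <- rank(pred)
  nPos <- sum(y == 1)
  nNeg <- sum(y == 0)
  (sum(r[y == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Hypothetical stand-in for a trainXYZ function: a logistic GLM
glmTrainer <- function(data, resp, preds, ...) {
  form <- reformulate(preds, response = resp)
  glm(form, data = data, family = binomial(), ...)
}

# Hypothetical k-fold loop: train on all folds but one, then evaluate the
# model separately against its training sites and the withheld (test) sites
cvSketch <- function(data, resp, preds, folds, trainFx, ...) {
  perFold <- list()
  for (k in sort(unique(folds))) {
    trainData <- data[folds != k, ]
    testData <- data[folds == k, ]
    model <- trainFx(trainData, resp, preds, ...)  # '...' is forwarded to trainFx
    predTrain <- predict(model, newdata = trainData, type = 'response')
    predTest <- predict(model, newdata = testData, type = 'response')
    perFold[[length(perFold) + 1]] <- data.frame(
      fold = k,
      aucTrain = aucSketch(predTrain, trainData[[resp]]),
      aucTest = aucSketch(predTest, testData[[resp]])
    )
  }
  do.call(rbind, perFold)
}

# Toy data: the response depends on x1 only
set.seed(1)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x1))
toy <- data.frame(presBg = y, x1 = x1, x2 = x2)
folds <- sample(rep(1:3, length.out = n))  # "vector" form: one fold ID per row

res <- cvSketch(toy, 'presBg', c('x1', 'x2'), folds, glmTrainer)
res
```

Passing the fitting function as an argument keeps the loop independent of the algorithm, which mirrors the way trainFx and ... are used here.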
trainByCrossValid(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
folds = predicts::folds(data),
trainFx = enmSdmX::trainGLM,
...,
weightEvalTrain = TRUE,
weightEvalTest = TRUE,
na.rm = FALSE,
outputModels = TRUE,
verbose = 0
)
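data
: Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites.
resp
: Character or integer. Name or column index of the response variable. Default is to use the first column in data.
preds
: Character vector or integer vector. Names or column indices of the predictors. Default is to use the second and subsequent columns in data.
folds
: Either a numeric vector, or a matrix or data frame, that specifies how the rows in data are assigned to folds.
trainFx
: Function, name of the trainXYZ function to use (default trainGLM). The examples use trainBRT, trainGAM, trainGLM, trainMaxEnt, trainMaxNet, trainNS, and trainRF.
...
: Arguments to pass to the trainXYZ function (for example, regMult for trainMaxEnt).
weightEvalTrain
: Logical (default TRUE). If TRUE, site weights (if any) are used when calculating evaluation statistics against training sites.
weightEvalTest
: Logical (default TRUE). If TRUE, site weights (if any) are used when calculating evaluation statistics against test sites.
na.rm
: Logical (default FALSE). If TRUE, remove NA predictions before calculating evaluation statistics. If FALSE, NAs propagate, so an affected evaluation statistic will most likely also be NA.
outputModels
: If TRUE (default), include the trained models in the models element of the output. If FALSE, the models are omitted.
verbose
: Numeric. If 0, show no progress updates. If > 0, show minimal progress updates for this function only. If > 1, show detailed progress for this function. If > 2, show detailed progress for this function plus detailed progress for the trainXYZ function.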
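The defaults for resp and preds assume the response is in the first column of data and the predictors are in the remaining columns. The base-R sketch below shows how those defaults resolve on a small toy data frame and how a "vector" form of folds partitions the rows. It assumes the usual k-fold convention, in which rows carrying fold value k are withheld when the k-th model is trained; the data frame `toy` and the other object names are illustrative only.

```r
# Toy presence/background data laid out as the defaults expect:
# response first, predictors after it
toy <- data.frame(
  presBg = c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  bio1 = c(22, 24, 23, 18, 15, 20, 17, 25, 19),
  bio12 = c(1500, 1800, 1600, 900, 700, 1200, 800, 2000, 1000)
)

# How the default 'resp' and 'preds' arguments resolve
resp <- names(toy)[1]
preds <- names(toy)[2:ncol(toy)]

# "Vector" form of 'folds': one fold ID per row
folds <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)

# Assumed convention: for fold k, rows NOT in fold k are used for training
# and rows in fold k are withheld for testing
for (k in sort(unique(folds))) {
  trainRows <- which(folds != k)
  testRows <- which(folds == k)
  cat('fold', k, ': train rows', trainRows, '| test rows', testRows, '\n')
}

# Each row has exactly one fold ID, so it is withheld by exactly one model
testCounts <- table(folds)
testCounts
```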
In some cases models do not converge (e.g., boosted regression trees and generalized additive models sometimes suffer from this issue). Such models are skipped, and a data frame listing the k-fold and the number of each skipped model within that fold is returned in the $meta element of the output. If all models converged, this data frame will be empty.
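The skip-and-record behavior described above can be sketched in base R as follows. This is not enmSdmX's code: a logistic GLM capped at one IRLS iteration (`glm.control(maxit = 1)`) is used only to force non-convergence, and the helper `fitCandidates` is hypothetical. Candidate models whose `converged` flag is FALSE are skipped and their fold and model number are recorded. When every model converges, the record is an empty data frame.

```r
# Hypothetical helper: fit candidate GLMs (differing only in 'maxit') in each
# fold, skip the ones that fail to converge, and record which were skipped
fitCandidates <- function(data, folds, maxitValues) {
  kept <- list()
  skipped <- data.frame(k = integer(0), modelNum = integer(0))
  for (k in sort(unique(folds))) {
    trainData <- data[folds != k, ]
    for (i in seq_along(maxitValues)) {
      # suppress the "algorithm did not converge" warning; check the flag instead
      fit <- suppressWarnings(glm(
        y ~ x, data = trainData, family = binomial(),
        control = glm.control(maxit = maxitValues[i])
      ))
      if (fit$converged) {
        kept[[length(kept) + 1]] <- fit
      } else {
        skipped <- rbind(skipped, data.frame(k = k, modelNum = i))
      }
    }
  }
  list(models = kept, skipped = skipped)
}

set.seed(2)
n <- 150
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x))
folds <- rep(1:3, length.out = n)

res <- fitCandidates(dat, folds, maxitValues = c(25, 1))
res$skipped                   # model 2 (maxit = 1) is skipped in every fold

allConverged <- fitCandidates(dat, folds, maxitValues = 25)
nrow(allConverged$skipped)    # 0: empty when every model converges
```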
A list object with several named elements:
meta
: Meta-data on the model call.
folds
: The folds object.
models
: (only if outputModels is TRUE) A list of model objects, one per data fold.
tuning
: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of:
'logLoss'
: Log loss. Higher (less negative) values imply better fit.
'cbi'
: Continuous Boyce Index (CBI). Calculated with evalContBoyce.
'auc'
: Area under the receiver operating characteristic curve (AUC). Calculated with evalAUC.
'tss'
: Maximum value of the True Skill Statistic. Calculated with evalTSS.
'msss'
: Sensitivity and specificity calculated at the threshold that maximizes sensitivity (true presence prediction rate) plus specificity (true absence prediction rate).
'mdss'
: Sensitivity (se) and specificity (sp) calculated at the threshold that minimizes the difference between sensitivity and specificity.
'minTrainPres'
: Sensitivity (se) and specificity (sp) at the greatest threshold at which all training presences are classified as "present".
'trainSe95' and/or 'trainSe90'
: Sensitivity (se) and specificity (sp) at the threshold that ensures either 95 or 90 percent of all training presences are classified as "present" (training sensitivity = 0.95 or 0.9).
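The statistics listed above can be illustrated with a short base-R sketch. It is not enmSdmX's implementation (the package uses evalContBoyce, evalAUC, evalTSS, and related functions, and may weight sites); the function `evalSketch` is hypothetical, CBI is omitted, and log loss is written as the mean log-likelihood so that, as stated above, higher (less negative) values are better. Candidate thresholds are the observed predictions, and a site counts as "present" when its prediction is at or above the threshold. The argument `trainPres` (predictions at training presences) defaults to `pres` for simplicity.

```r
# Hypothetical helper (not enmSdmX code): evaluation statistics from predictions
# at presences ('pres') and non-presence/background sites ('bg'), on [0, 1]
evalSketch <- function(pres, bg, trainPres = pres, eps = 1e-15) {
  # Log loss as the mean log-likelihood: higher (less negative) is better
  y <- c(rep(1, length(pres)), rep(0, length(bg)))
  p <- pmin(pmax(c(pres, bg), eps), 1 - eps)
  logLoss <- mean(y * log(p) + (1 - y) * log(1 - p))

  # AUC via the Mann-Whitney U statistic (ties count one half)
  nP <- length(pres)
  nB <- length(bg)
  r <- rank(c(pres, bg))
  auc <- (sum(r[seq_len(nP)]) - nP * (nP + 1) / 2) / (nP * nB)

  # Sensitivity and specificity at threshold t
  seSp <- function(t) c(se = mean(pres >= t), sp = mean(bg < t))

  thresholds <- sort(unique(c(pres, bg)))
  ss <- sapply(thresholds, seSp)  # 2-row matrix with rows 'se' and 'sp'
  tssAll <- ss['se', ] + ss['sp', ] - 1

  # Greatest threshold at which at least 'target' of training presences are "present"
  seThreshold <- function(target) {
    cand <- sort(unique(trainPres))
    ok <- vapply(cand, function(t) mean(trainPres >= t) >= target, logical(1))
    max(cand[ok])
  }

  list(
    logLoss = logLoss,
    auc = auc,
    tss = max(tssAll),
    msss = seSp(thresholds[which.max(tssAll)]),                          # max se + sp
    mdss = seSp(thresholds[which.min(abs(ss['se', ] - ss['sp', ]))]),  # min |se - sp|
    minTrainPres = seSp(seThreshold(1)),
    trainSe95 = seSp(seThreshold(0.95)),
    trainSe90 = seSp(seThreshold(0.90))
  )
}

pres <- c(0.9, 0.8, 0.7, 0.6, 0.3)
bg <- c(0.5, 0.4, 0.35, 0.2, 0.1, 0.05)
ev <- evalSketch(pres, bg)
ev$auc   # 0.9
ev$msss  # se = 0.8, sp = 1 (threshold 0.6)
```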
Fielding, A.H. and Bell, J.F. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
Le Rest, K., Pinaud, D., Monestiez, P., Chadoeuf, J., and Bretagnolle, V. 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation. Global Ecology and Biogeography 23:811-820. doi:10.1111/geb.12161
Radosavljevic, A. and Anderson, R.P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography 41:629-643. doi:10.1111/jbi.12227
summaryByCrossValid, trainBRT, trainGAM, trainGLM, trainMaxEnt, trainMaxNet, trainNS, trainRF
# The example below shows a very basic modeling workflow. It has been
# designed to run fast, not to produce accurate, defensible models.
# The general idea is to calibrate a series of models and evaluate them
# against a withheld set of data. One can then use the settings of the
# top models to better select a "final" model.
## Not run:
# Running the entire set of commands can take a few minutes. This can
# be sped up by increasing the number of cores used. The examples below use
# one core, but you can change the cores value according to your machine's
# capabilities.
library(enmSdmX)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create background sites (using just 1000 to speed things up!)
bgEnv <- terra::spatSample(madClim, 3000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
# using "vector" form of "folds" argument
folds <- predicts::folds(env, 3) # just 3 folds (for speed)
### calibrate models
####################
cores <- 1 # increase this to go faster, if your computer handles it
## MaxEnt
mxx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxEnt,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxEnt feature sets and regularization across folds
summaryByCrossValid(mxx)
## MaxNet
mnx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxNet,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxNet feature sets and regularization across folds
summaryByCrossValid(mnx)
## generalized linear models
glx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGLM,
verbose = 1,
cores = cores
)
# summarize GLM terms in best models
summaryByCrossValid(glx)
## generalized additive models
gax <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGAM,
verbose = 1,
cores = cores
)
# summarize GAM terms in best models
summaryByCrossValid(gax)
## natural splines
nsx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainNS,
df = 1:2,
verbose = 1,
cores = cores
)
# summarize NS terms in best models
summaryByCrossValid(nsx)
## boosted regression trees
brtx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainBRT,
learningRate = c(0.001, 0.0001), # too few values for reliable model(?)
treeComplexity = c(2, 4), # too few values for reliable model, but fast
minTrees = 1000,
maxTrees = 1500, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = 1,
cores = cores
)
# summarize BRT parameters across best models
summaryByCrossValid(brtx)
## random forests
rfx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainRF,
verbose = 1,
cores = cores
)
# summarize RF parameters in best models
summaryByCrossValid(rfx)
## End(Not run)
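The example comments suggest using the top models from each fold to select settings for a "final" model. One simple way to do that is sketched below in base R on toy tuning tables; it is an illustration only, not a description of how summaryByCrossValid works, and the column names `regMult` and `cbiTest` are assumptions. Within each fold the candidate with the highest test CBI is taken as the top model, and the setting chosen most often across folds is kept.

```r
# Toy per-fold tuning tables (hypothetical columns), one row per candidate model
tuning <- list(
  data.frame(regMult = c(1, 2, 3), cbiTest = c(0.61, 0.72, 0.70)),
  data.frame(regMult = c(1, 2, 3), cbiTest = c(0.58, 0.66, 0.69)),
  data.frame(regMult = c(1, 2, 3), cbiTest = c(0.63, 0.75, 0.71))
)

# Setting of the top-ranked candidate (highest test CBI) in each fold
topPerFold <- vapply(
  tuning,
  function(tab) tab$regMult[which.max(tab$cbiTest)],
  numeric(1)
)

# Most frequent top setting across folds (ties go to the smallest value)
votes <- table(topPerFold)
finalRegMult <- as.numeric(names(votes)[which.max(votes)])
finalRegMult  # 2; a final model would then be calibrated on all data with this setting
```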