View source: R/BIOMOD_EnsembleModeling.R
BIOMOD_EnsembleModeling | R Documentation |
This function allows to combine a range of models built with the
BIOMOD_Modeling
function in one (or several) ensemble model. Modeling
uncertainty can be assessed as well as variables importance, ensemble predictions can be
evaluated against original data, and created ensemble models can be projected over new
conditions (see Details).
BIOMOD_EnsembleModeling(
bm.mod,
models.chosen = "all",
em.by = "PA+run",
em.algo,
metric.select = "all",
metric.select.thresh = NULL,
metric.select.table = NULL,
metric.select.dataset = NULL,
metric.eval = c("KAPPA", "TSS", "ROC"),
var.import = 0,
EMci.alpha = 0.05,
EMwmean.decay = "proportional",
nb.cpu = 1,
seed.val = NULL,
do.progress = TRUE
)
bm.mod |
a |
models.chosen |
a |
em.by |
a |
em.algo |
a |
metric.select |
a |
metric.select.thresh |
(optional, default |
metric.select.table |
(optional, default |
metric.select.dataset |
(optional, default |
metric.eval |
a |
var.import |
(optional, default |
EMci.alpha |
(optional, default |
EMwmean.decay |
(optional, default |
nb.cpu |
(optional, default |
seed.val |
(optional, default |
do.progress |
(optional, default |
Concerning models sub-selection (models.chosen
) :
Applying get_built_models
function to the bm.mod
object gives the names
of the single models created with the BIOMOD_Modeling
function. The
models.chosen
argument can take either a sub-selection of these single model names, or
the all
default value, to decide which single models will be used for the ensemble
model building.
Concerning models assembly rules (em.by
) :
Single models built with the BIOMOD_Modeling
function can be combined in 5
different ways to obtain ensemble models :
each combination of pseudo-absence and repetition datasets is done, merging algorithms together
each combination of pseudo-absence and algorithm datasets is done, merging repetitions together
pseudo-absence datasets are considered individually, merging algorithms and repetitions together
algorithm datasets are considered individually, merging pseudo-absence and repetitions together
all single models are combined into one
Hence, depending on the chosen method, the number of ensemble models built will vary.
If no evaluation data was given to the BIOMOD_FormatingData
function,
some ensemble model evaluations may be biased due to difference in data used for single
model evaluations.
Be aware that all of these combinations are allowed, but some may not make sense
depending mainly on how pseudo-absence datasets have been built and whether all of them
have been used for all single models or not (see PA.nb.absences
and models.pa
parameters in BIOMOD_FormatingData
and BIOMOD_Modeling
functions
respectively).
Concerning evaluation metrics :
metric(s) must be chosen among the ones used within the
BIOMOD_Modeling
function to build the bm.mod
object, unless
metric.select = 'user.defined'
and therefore values will be provided through the
metric.select.table
parameter.
Each selected metric will be used at different steps of the ensemble modeling function to :
remove low quality single models having a score lower than
metric.select.thresh
perform the binary transformation if em.algo = 'EMca'
weight models if em.algo = 'EMwmean'
Note that metrics are not combined together, and one ensemble model is built for each metric provided.
if metric.select = 'user.defined'
, this parameter allows
to use evaluation metrics other than those calculated within biomod2. It must be a
data.frame
containing as many columns as models.chosen
with matching names,
and as many rows as evaluation metrics to be used. The number of rows must match the length
of metric.select.thresh
, and values will be compared to those defined in
metric.select.thresh
to remove low quality single models from the ensemble
model building.
by default, validation datasets will be used, unless no
validation is available (no cross-validation) in which case calibration datasets
will be used
Concerning ensemble algorithms :
6 modeling techniques are currently available :
median of probabilities over the selected models
Less sensitive to outliers than the mean
mean of probabilities over the selected models
weighted mean of probabilities over the selected models
Probabilities are weighted according to their model evaluation scores obtained when
building the bm.out
object with the BIOMOD_Modeling
function (better a
model is, more importance it has in the ensemble) and summed.
The EMwmean.decay
is the ratio between a weight and the next or previous one.
The formula is : W = W(-1) * EMwmean.decay
.
For example, with the value of 1.6
and 4
weights wanted, the relative
importance of the weights will be 1 / 1.6 / 2.56 (=1.6*1.6) / 4.096 (=2.56*1.6)
from
the weakest to the strongest, and gives 0.11 / 0.17 / 0.275 / 0.445
considering that
the sum of the weights is equal to one. The lower the EMwmean.decay
, the smoother
the differences between the weights enhancing a weak discrimination between models.
If EMwmean.decay = 'proportional'
, the weights are assigned to each model
proportionally to their evaluation scores. The discrimination is fairer than using the
decay method where close scores can have strongly diverging weights, while the
proportional method would assign them similar weights.
It is also possible to define the EMwmean.decay
parameter as a function that will be
applied to single models scores and transform them into weights.
For example, if EMwmean.decay = function(x) {x^2}
, the squared of evaluation
score of each model will be used to weight the models predictions.
committee averaging over the selected models
Probabilities are first transformed into binary data according to the threshold defined
when building the bm.out
object with the BIOMOD_Modeling
function
(maximizing the evaluation metric score over the calibration dataset). The committee
averaging score is obtained by taking the average of these binary predictions.
It is built on the analogy of a simple vote :
each single model votes for the species being either present (1
) or absent
(0
)
the sum of 1
is then divided by the number of single models voting
The interesting feature of this measure is that it gives both a prediction and a measure of
uncertainty. When the prediction is close to 0
or 1
, it means that all models
agree to predict 0
or 1
respectively. When the prediction is around
0.5
, it means that half the models predict 1
and the other half 0
.
Note that this is for binary data only.
confidence interval around the mean of probabilities of the selected models
It creates 2 ensemble models :
LOWER : there is less than 100 * EMci.alpha / 2
% of chance to get
probabilities lower than the given ones
UPPER : there is less than 100 * EMci.alpha / 2
% of chance to get
probabilities upper than the given ones
These intervals are calculated with the following function :
I_c = [ \bar{x} - \frac{t_\alpha sd }{ \sqrt{n} };
\bar{x} + \frac{t_\alpha sd }{ \sqrt{n} }]
coefficient of variation (sd / mean
) of probabilities over the selected
models
This is the only ensemble model that might not be over the same scale than the others, as CV is a measure of uncertainty rather a measure of probability of occurrence. It will be evaluated like all other ensemble models although its interpretation will be obviously different. If the CV gets a high evaluation score, it means that the uncertainty is high where the species is observed (which might not be a good feature of the model). The lower is the score, the better are the models.
A BIOMOD.ensemble.models.out
object containing models outputs, or links to saved
outputs.
Models outputs are stored out of R (for memory storage reasons) in 2 different
folders created in the current working directory :
a models folder, named after the resp.name
argument of
BIOMOD_FormatingData
, and containing all ensemble models
a hidden folder, named .BIOMOD_DATA
, and containing outputs related
files (original dataset, calibration lines, pseudo-absences selected, predictions,
variables importance, evaluation values...), that can be retrieved with
get_[...]
or load
functions, and used by other biomod2 functions, like
BIOMOD_EnsembleForecasting
Wilfried Thuiller, Damien Georges, Robin Engler
BIOMOD_FormatingData
, bm_ModelingOptions
,
bm_CrossValidation
, bm_VariablesImportance
,
BIOMOD_Modeling
, BIOMOD_EnsembleForecasting
,
bm_PlotEvalMean
, bm_PlotEvalBoxplot
,
bm_PlotVarImpBoxplot
, bm_PlotResponseCurves
Other Main functions:
BIOMOD_EnsembleForecasting()
,
BIOMOD_FormatingData()
,
BIOMOD_LoadModels()
,
BIOMOD_Modeling()
,
BIOMOD_Projection()
,
BIOMOD_RangeSize()
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
## ----------------------------------------------------------------------- #
file.out <- paste0(myRespName, "/", myRespName, ".AllModels.models.out")
if (file.exists(file.out)) {
myBiomodModelOut <- get(load(file.out))
} else {
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName,
resp.var = myResp,
resp.xy = myRespXY,
expl.var = myExpl)
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'AllModels',
models = c('RF', 'GLM'),
CV.strategy = 'random',
CV.nb.rep = 2,
CV.perc = 0.8,
OPT.strategy = 'bigboss',
metric.eval = c('TSS', 'ROC'),
var.import = 3,
seed.val = 42)
}
## ----------------------------------------------------------------------- #
# Model ensemble models
myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
models.chosen = 'all',
em.by = 'all',
em.algo = c('EMmean', 'EMca'),
metric.select = c('TSS'),
metric.select.thresh = c(0.7),
metric.eval = c('TSS', 'ROC'),
var.import = 3,
seed.val = 42)
myBiomodEM
# Get evaluation scores & variables importance
get_evaluations(myBiomodEM)
get_variables_importance(myBiomodEM)
# Represent evaluation scores
bm_PlotEvalMean(bm.out = myBiomodEM, dataset = 'calibration')
bm_PlotEvalBoxplot(bm.out = myBiomodEM, group.by = c('algo', 'algo'))
# # Represent variables importance
# bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'algo', 'algo'))
# bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'algo', 'merged.by.PA'))
# bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('algo', 'expl.var', 'merged.by.PA'))
# # Represent response curves
# bm_PlotResponseCurves(bm.out = myBiomodEM,
# models.chosen = get_built_models(myBiomodEM),
# fixed.var = 'median')
# bm_PlotResponseCurves(bm.out = myBiomodEM,
# models.chosen = get_built_models(myBiomodEM),
# fixed.var = 'min')
# bm_PlotResponseCurves(bm.out = myBiomodEM,
# models.chosen = get_built_models(myBiomodEM, algo = 'EMmean'),
# fixed.var = 'median',
# do.bivariate = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.