BIOMOD_EnsembleModeling
BIOMOD_EnsembleModeling combines models and makes ensemble predictions built with BIOMOD_Modeling. The ensemble predictions can also be evaluated against the original data given to BIOMOD_Modeling. biomod2 offers a range of options to build ensemble models and predictions and to assess modeling uncertainty. The resulting ensemble models can then be used to project distributions over space and time, just like classical biomod2 models.
BIOMOD_EnsembleModeling( modeling.output,
chosen.models = 'all',
em.by = 'PA_dataset+repet',
eval.metric = 'all',
eval.metric.quality.threshold = NULL,
models.eval.meth = c('KAPPA','TSS','ROC'),
prob.mean = TRUE,
prob.cv = FALSE,
prob.ci = FALSE,
prob.ci.alpha = 0.05,
prob.median = FALSE,
committee.averaging = FALSE,
prob.mean.weight = FALSE,
prob.mean.weight.decay = 'proportional',
VarImport = 0)
modeling.output
a "BIOMOD.models.out" object returned by BIOMOD_Modeling
chosen.models
a character vector (either 'all' or a sub-selection of model names) that defines the models kept for building the ensemble models (useful for removing some non-preferred models)
em.by
Character. Flag defining the way the models will be combined to build the ensemble models. Available values are 'PA_dataset+repet' (default), 'PA_dataset+algo', 'PA_dataset', 'algo' and 'all'
eval.metric
vector of names of the evaluation metrics used to build the ensemble models. It is involved in formal models exclusion if eval.metric.quality.threshold is defined, and in models weighting for the weighted-mean ensemble model
eval.metric.quality.threshold
If not NULL, the minimum scores below which models will be excluded from the ensemble-model building
models.eval.meth
the evaluation methods used to evaluate ensemble models (see BIOMOD_Modeling)
prob.mean
Logical. Estimate the mean probabilities across predictions
prob.cv
Logical. Estimate the coefficient of variation across predictions
prob.ci
Logical. Estimate the confidence interval around the prob.mean ensemble model
prob.ci.alpha
Numeric. Significance level for estimating the confidence interval. Default = 0.05
prob.median
Logical. Estimate the median of probabilities
committee.averaging
Logical. Estimate the committee averaging across predictions
prob.mean.weight
Logical. Estimate the weighted sum of probabilities
prob.mean.weight.decay
Defines the relative importance of the weights. A high value will strongly discriminate the 'good' models from the 'bad' ones (see the details section). If this parameter is set to 'proportional' (default), the attributed weights are proportional to the evaluation scores given by eval.metric
VarImport
Number of permutations used to estimate variable importance
Models sub-selection (chosen.models)
Useful to exclude some models that were built in the previous step (modeling.output). The vector of available model names can be obtained by applying get_built_models to your modeling.output object, which makes the selection of models easier. The default value ('all') keeps all available models.
Models assembly rules (em.by)
Please refer to EnsembleModelingAssembly vignette that is dedicated to this parameter.
5 different ways to combine models can be considered. You can build ensemble models considering:
Dataset used for models building (Pseudo Absences dataset and repetitions done): 'PA_dataset+repet'
Dataset used and statistical models : 'PA_dataset+algo'
Pseudo-absences selection dataset : 'PA_dataset'
Statistical models : 'algo'
A total consensus model : 'all'
The value chosen for this parameter will control the number of ensemble models built.
If no evaluation data was given at the BIOMOD_FormatingData step, the evaluation of some ensemble models may be a bit unfair because the data used to evaluate the ensemble models could differ from the data used to evaluate the BIOMOD_Modeling models (in particular, some data used for 'basal models' calibration can be re-used for ensemble-model evaluation). Keep this in mind! (see the EnsembleModelingAssembly vignette for extra details)
Evaluation metrics
eval.metric
The metrics selected here are necessarily ones chosen at the BIOMOD_Modeling step.
If you select several, ensembles will be built according to each of them.
The chosen metrics will be used at different stages in this function:
to remove 'bad models' (those having a score lower than eval.metric.quality.threshold, see below)
to make the binary transformation needed for committee averaging computation
to weight the models in the probability weighted mean model
to test (and/or evaluate) your ensemble models' forecasting ability (at this step, each ensemble model will be evaluated according to each selected evaluation metric)
eval.metric.quality.threshold
You have to give as many thresholds as eval.metric entries you have selected. If you have selected several evaluation metrics, take care to ensure that the order of the thresholds matches the order of eval.metric.
All models having a score lower than these quality thresholds will not be kept for building ensemble-models.
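In plain R, this exclusion rule amounts to a simple filter on a score vector. A minimal sketch with hypothetical model names and TSS scores (not taken from any biomod2 output):

```r
# Hypothetical TSS scores for four formal models (illustration only)
scores <- c(RF = 0.91, GBM = 0.85, SRE = 0.55, CTA = 0.72)

# Quality threshold, as in eval.metric.quality.threshold
threshold <- 0.7

# Models kept for the ensemble: score at or above the threshold
kept <- names(scores)[scores >= threshold]
```

Here `kept` contains "RF", "GBM" and "CTA"; the low-scoring "SRE" model would be excluded from the ensemble building.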
Ensemble-models algorithms
Mean of probabilities (prob.mean)
This ensemble-model corresponds to the mean probabilities over the selected models.
Coefficient of variation of probabilities (prob.cv)
This ensemble model corresponds to the coefficient of variation (i.e. sd / mean) of the probabilities over the selected models. This model is not scaled. It will be evaluated like all other ensemble models although its interpretation is obviously different: CV is a measure of uncertainty rather than a measure of probability of occurrence. If the CV model gets a high evaluation score, it means that the uncertainty is high where the species is observed (which is not a good feature of the models). The lower the score, the better the models. CV is a nice complement to the mean probability.
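The sd / mean computation can be sketched in base R on a matrix of hypothetical probabilities (sites in rows, models in columns; values invented for illustration):

```r
# Predictions (probabilities) of 3 hypothetical models over 4 sites
pred <- matrix(c(0.90, 0.80, 0.85,   # models agree: high probability
                 0.20, 0.30, 0.25,   # models agree: low probability
                 0.60, 0.10, 0.90,   # models strongly disagree
                 0.50, 0.50, 0.50),  # perfect agreement
               ncol = 3, byrow = TRUE)

# Coefficient of variation per site: sd across models / mean across models
cv <- apply(pred, 1, sd) / rowMeans(pred)
```

The third site gets a much larger CV than the first (models disagree there), and the fourth site gets exactly 0 (models agree perfectly).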
Confidence interval (prob.ci & prob.ci.alpha)
This is the confidence interval around the mean probability (see above). This is also a nice complement to the mean probability.
Two ensemble models will be built if prob.ci is TRUE:
The upper one (there is less than a 100*prob.ci.alpha/2 % chance of getting probabilities higher than the given ones)
The lower one (there is less than a 100*prob.ci.alpha/2 % chance of getting probabilities lower than the given ones)
These intervals are calculated with the following formula:
I_c = [ \bar{x} - \frac{t_\alpha \, sd}{\sqrt{n}} ;\ \bar{x} + \frac{t_\alpha \, sd}{\sqrt{n}} ]
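The formula can be sketched in base R for a single site, assuming t_α is the two-sided Student-t quantile at level alpha (biomod2's exact internals may differ); the probability values are hypothetical:

```r
# Probabilities of 4 hypothetical models at one site
x <- c(0.6, 0.7, 0.8, 0.7)
alpha <- 0.05
n <- length(x)

# Two-sided t quantile, then the half-width t_alpha * sd / sqrt(n)
t_alpha <- qt(1 - alpha / 2, df = n - 1)
half <- t_alpha * sd(x) / sqrt(n)

# Lower and upper bounds around the mean probability
ci <- c(lower = mean(x) - half, upper = mean(x) + half)
```

For these values the interval is roughly [0.570, 0.830] around the mean of 0.7.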
Median of probabilities (prob.median)
This ensemble model corresponds to the median probability over the selected models. The median is less sensitive to outliers than the mean. In practical terms, calculating the median requires more time and memory than the mean (or even the weighted mean), as all predictions must be loaded before the median can be extracted. This may need to be considered for large datasets.
Models committee averaging (committee.averaging)
To build this model, the probabilities from the selected models are first transformed into binary data according to the thresholds defined at the BIOMOD_Modeling step (maximizing the evaluation metric score over the 'testing dataset'). The committee averaging score is then the average of these binary predictions. It is built on the analogy of a simple vote: each model votes for the species being either present or absent, and for each site the sum of ones is divided by the number of models. The interesting feature of this measure is that it gives both a prediction and a measure of uncertainty. When the prediction is close to 0 or 1, all models agree to predict 0 or 1 respectively. When the prediction is around 0.5, half the models predict 1 and the other half 0.
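The binarize-then-average step can be sketched in base R; both the probabilities and the per-model thresholds below are hypothetical (in biomod2 the thresholds come from the BIOMOD_Modeling evaluation):

```r
# Probabilities of 3 hypothetical models over 3 sites
pred <- matrix(c(0.9, 0.8, 0.7,
                 0.6, 0.4, 0.7,
                 0.1, 0.2, 0.3),
               ncol = 3, byrow = TRUE)

# Hypothetical per-model binarization thresholds
thresholds <- c(0.5, 0.5, 0.6)

# Binarize each model's column against its threshold, then average
# the 0/1 votes per site (row)
votes <- sweep(pred, 2, thresholds, FUN = ">=") * 1
ca <- rowMeans(votes)
```

The first site gets a score of 1 (all three models vote "present"), the second 2/3 (two of three vote "present"), and the third 0 (all vote "absent").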
Weighted mean of probabilities (prob.mean.weight & prob.mean.weight.decay)
This algorithm returns the mean weighted (more precisely, the weighted sum) by the selected evaluation method scores (the better a model is, the more weight it has in the ensemble). The scores come from the BIOMOD_Modeling step.
prob.mean.weight.decay is the ratio between a weight and the preceding one. The formula is: W = W(-1) * prob.mean.weight.decay. For example, with a value of 1.6 and 4 weights wanted, the relative importance of the weights will be 1 / 1.6 / 2.56 (=1.6*1.6) / 4.096 (=2.56*1.6) from the weakest to the strongest model, which gives 0.108 / 0.173 / 0.277 / 0.443 once rescaled so that the weights sum to one. The lower prob.mean.weight.decay is, the smoother the differences between the weights, leading to a weaker discrimination between models.
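The geometric-decay weighting can be sketched in base R; the evaluation scores are hypothetical and only their ranking matters here:

```r
# Hypothetical evaluation scores of 4 models
scores <- c(0.60, 0.70, 0.80, 0.90)
decay <- 1.6

# Each weight is the previous one times the decay ratio, ordered by
# model rank: 1, 1.6, 2.56, 4.096 from weakest to strongest
w_raw <- decay^(rank(scores) - 1)

# Rescale so the weights sum to one
w <- w_raw / sum(w_raw)
```

This reproduces the worked example above: the normalized weights are approximately 0.108, 0.173, 0.277 and 0.443.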
The value 'proportional' (default) is also possible for prob.mean.weight.decay: the weights are then directly proportional to the evaluation scores. The advantage is that the discrimination is fairer than with a fixed decay value. With a fixed decay, close scores can strongly diverge in the weights they are awarded, whereas the proportional method will consider them as being fairly similar in prediction quality and award them similar weights.
It is also possible to give a function as the prob.mean.weight.decay argument. In this case, the given function will be applied to the models' scores to transform them into the weights used for building the weighted-mean ensemble model. For instance, if you specify function(x){x^2} as prob.mean.weight.decay, the squared evaluation score of each model will be used to weight the formal models' predictions.
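The function-based variant can be sketched the same way; the scores below are hypothetical, and the rescaling to a unit sum is assumed for illustration:

```r
# Hypothetical TSS scores of three models
scores <- c(0.5, 0.7, 0.9)

# A user-supplied decay function, as in the x^2 example above
decay_fun <- function(x) { x^2 }

# Squared scores, rescaled to sum to one, become the weights
w <- decay_fun(scores) / sum(decay_fun(scores))
```

Squaring amplifies the gap between models: the best model's weight (0.81 / 1.55 ≈ 0.52) is more than three times the weakest model's (0.25 / 1.55 ≈ 0.16), whereas proportional weighting would give a ratio of only 0.9 / 0.5 = 1.8.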
A "BIOMOD.EnsembleModeling.out" object. This object can later be given to BIOMOD_EnsembleForecasting if you want to make projections with the ensemble models.
You can access the evaluation scores with the get_evaluations function and the names of the built models with the get_built_models function (see example).
Models are now combined by repetition; other ways of combining them (e.g. by model, all together, ...) will be available soon
Damien Georges & Wilfried Thuiller with participation of Robin Engler
BIOMOD_Modeling, BIOMOD_Projection, BIOMOD_EnsembleForecasting
# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
package="biomod2"), row.names = 1)
head(DataSpecies)
# the name of studied species
myRespName <- 'GuloGulo'
# the presence/absences data for our species
myResp <- as.numeric(DataSpecies[,myRespName])
# the XY coordinates of species data
myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]
# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl = raster::stack( system.file( "external/bioclim/current/bio3.grd",
package="biomod2"),
system.file( "external/bioclim/current/bio4.grd",
package="biomod2"),
system.file( "external/bioclim/current/bio7.grd",
package="biomod2"),
system.file( "external/bioclim/current/bio11.grd",
package="biomod2"),
system.file( "external/bioclim/current/bio12.grd",
package="biomod2"))
# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespXY,
resp.name = myRespName)
# 2. Defining Models Options using default options.
myBiomodOption <- BIOMOD_ModelingOptions()
# 3. Doing Modelisation
myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
models = c('SRE','CTA','RF'),
models.options = myBiomodOption,
NbRunEval=1,
DataSplit=80,
Yweights=NULL,
VarImport=3,
models.eval.meth = c('TSS'),
SaveObj = TRUE,
rescal.all.models = FALSE,
do.full.models = FALSE)
# 4. Doing Ensemble Modelling
myBiomodEM <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by = 'all',
eval.metric = c('TSS'),
eval.metric.quality.threshold = c(0.7),
models.eval.meth = c('TSS','ROC'),
prob.mean = TRUE,
prob.cv = FALSE,
prob.ci = FALSE,
prob.ci.alpha = 0.05,
prob.median = FALSE,
committee.averaging = FALSE,
prob.mean.weight = TRUE,
prob.mean.weight.decay = 'proportional' )
# print summary
myBiomodEM
# get evaluation scores
get_evaluations(myBiomodEM)
Loading required package: sp
Loading required package: raster
Loading required package: parallel
Loading required package: reshape
Loading required package: ggplot2
biomod2 3.3-7 loaded.
Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
X_WGS84 Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca PteropusGiganteus
1 -94.5 82.00001 0 0 0 0
2 -91.5 82.00001 0 1 0 0
3 -88.5 82.00001 0 1 0 0
4 -85.5 82.00001 0 1 0 0
5 -82.5 82.00001 0 1 0 0
6 -79.5 82.00001 0 1 0 0
TenrecEcaudatus VulpesVulpes
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
-=-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=
> No pseudo absences selection !
! No data has been set aside for modeling evaluation
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Loading required library...
Checking Models arguments...
Creating suitable Workdir...
> No weights : all observations will have the same weight
-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=
5 environmental variables ( bio3 bio4 bio7 bio11 bio12 )
Number of evaluation repetitions : 1
Models selected : SRE CTA RF
Total number of model runs : 3
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=- Run : GuloGulo_AllData
-=-=-=--=-=-=- GuloGulo_AllData_RUN1
Model=Surface Range Envelop
Evaluating Model stuff...
Evaluating Predictor Contributions...
Model=Classification tree
5 Fold Cross-Validation
Evaluating Model stuff...
Evaluating Predictor Contributions...
Model=Breiman and Cutler's random forests for classification and regression
Evaluating Model stuff...
Evaluating Predictor Contributions...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=
! all models available will be included in ensemble.modeling
> Evaluation & Weighting methods summary :
TSS over 0.7
> mergedAlgo_mergedRun_mergedData ensemble modeling
! Models projections for whole zonation required...
> Projecting GuloGulo_AllData_RUN1_SRE ...
> Projecting GuloGulo_AllData_RUN1_CTA ...
> Projecting GuloGulo_AllData_RUN1_RF ...
> Mean of probabilities...
Evaluating Model stuff...
> Prababilities wegthing mean...
original models scores = 0.761 0.896 0.913
final models weights = 0.296 0.349 0.355
Evaluating Model stuff...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.EnsembleModeling.out' -=-=-=-=-=-=-=-=-=-=-=-=
sp.name : GuloGulo
expl.var.names : bio3 bio4 bio7 bio11 bio12
models computed:
GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData, GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
$GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
TSS 0.940 472 96.974 97.154
ROC 0.993 471 96.974 97.154
$GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
TSS 0.947 442.0 97.579 97.044
ROC 0.993 440.5 97.731 97.044
Warning message:
system call failed: Cannot allocate memory