# BIOMOD_EnsembleModeling: Create and evaluate an ensemble set of models and predictions

In biomod2: Ensemble Platform for Species Distribution Modeling

## Description

BIOMOD_EnsembleModeling combines the models built with BIOMOD_Modeling and makes ensemble predictions. These ensemble predictions can also be evaluated against the original data given to BIOMOD_Modeling. biomod2 offers a range of options for building ensemble models and predictions and for assessing modeling uncertainty. The resulting ensemble models can then be used to project distributions over space and time, just like classical biomod2 models.

## Usage

```r
BIOMOD_EnsembleModeling(
  modeling.output,
  chosen.models = 'all',
  em.by = 'PA_dataset+repet',
  eval.metric = 'all',
  eval.metric.quality.threshold = NULL,
  models.eval.meth = c('KAPPA', 'TSS', 'ROC'),
  prob.mean = TRUE,
  prob.cv = FALSE,
  prob.ci = FALSE,
  prob.ci.alpha = 0.05,
  prob.median = FALSE,
  committee.averaging = FALSE,
  prob.mean.weight = FALSE,
  prob.mean.weight.decay = 'proportional',
  VarImport = 0)
```

## Arguments

• modeling.output: a "BIOMOD.models.out" object returned by BIOMOD_Modeling

• chosen.models: a character vector (either 'all' or a sub-selection of model names) defining the models kept for building the ensemble models (useful for removing some non-preferred models)

• em.by: character. Flag defining how the models are combined to build the ensemble models. Available values are 'PA_dataset+repet' (default), 'PA_dataset+algo', 'PA_dataset', 'algo' and 'all'

• eval.metric: vector of names of the evaluation metrics used to build the ensemble models. It is involved in formal model exclusion if eval.metric.quality.threshold is defined, and/or in building ensemble models that depend on formal model evaluation scores (e.g. weighted mean and committee averaging). If 'all', the same evaluation metrics as those of modeling.output are selected automatically

• eval.metric.quality.threshold: if not NULL, the minimum scores below which models will be excluded from ensemble-model building

• models.eval.meth: the evaluation methods used to evaluate the ensemble models (see the models.eval.meth section of BIOMOD_Modeling for more detailed information)

• prob.mean: logical. Estimate the mean probability across predictions

• prob.cv: logical. Estimate the coefficient of variation across predictions

• prob.ci: logical. Estimate the confidence interval around prob.mean

• prob.ci.alpha: numeric. Significance level for estimating the confidence interval. Default = 0.05

• prob.median: logical. Estimate the median of probabilities

• committee.averaging: logical. Estimate the committee averaging across predictions

• prob.mean.weight: logical. Estimate the weighted sum of probabilities

• prob.mean.weight.decay: defines the relative importance of the weights. A high value strongly discriminates the 'good' models from the 'bad' ones (see the Details section). If set to 'proportional' (default), the attributed weights are proportional to the evaluation scores given by eval.metric

• VarImport: number of permutations used to estimate variable importance

## Details

1. Models sub-selection (chosen.models)

Useful to exclude some of the models selected in the previous step (modeling.output). The vector of model names can be obtained by applying get_built_models to your modeling.output object, which makes model selection easier. The default value ('all') keeps all available models.

2. Models assembly rules (em.by)

Please refer to EnsembleModelingAssembly vignette that is dedicated to this parameter.

Five different ways of combining models can be considered. You can build ensemble models considering:

• Dataset used for model building (pseudo-absence dataset and repetitions done): 'PA_dataset+repet'

• Dataset used and statistical models: 'PA_dataset+algo'

• Pseudo-absence selection dataset: 'PA_dataset'

• Statistical models: 'algo'

• A total consensus model: 'all'

The value chosen for this parameter controls the number of ensemble models built. If no evaluation data was given at the BIOMOD_FormatingData step, some ensemble-model evaluations may be somewhat unfair, because the data used for evaluating the ensemble models can differ from those used to evaluate the BIOMOD_Modeling models (in particular, some data used for calibrating the 'basal models' can be re-used for ensemble-model evaluation). Keep this in mind! (See the EnsembleModelingAssembly vignette for extra details.)

3. Evaluation metrics

• eval.metric

The metrics selected here must be among the ones chosen at the BIOMOD_Modeling step. If you select several, ensemble models will be built according to each of them. The chosen metrics are used at different stages in this function:

1. to remove 'bad' models (those having a score lower than eval.metric.quality.threshold, see below)

2. to make the binary transformation needed for committee averaging computation

3. to weight the models in the probability weighted mean model

4. to test (and/or evaluate) the forecasting ability of your ensemble models (at this step, each ensemble model is evaluated according to each evaluation metric)

• eval.metric.quality.threshold

You must give as many thresholds as the number of eval.metric you selected. If you selected several evaluation metrics, make sure the order of the thresholds matches the order of eval.metric. All models scoring lower than these quality thresholds are excluded from ensemble-model building.
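The exclusion rule above can be written as a short language-agnostic sketch (shown here in Python; the function name, scores and thresholds are illustrative, not biomod2 code): a model is kept only if its score meets the threshold for every selected metric, with metrics and thresholds in matching order.

```python
def filter_models(scores, thresholds):
    """Keep only models whose score for every selected metric meets that
    metric's threshold (metrics and thresholds in matching order)."""
    kept = []
    for name, per_metric in scores.items():
        if all(s >= t for s, t in zip(per_metric, thresholds)):
            kept.append(name)
    return kept

# hypothetical scores for two metrics (TSS, ROC) with thresholds (0.7, 0.8)
scores = {"RF": [0.85, 0.93], "SRE": [0.55, 0.72]}
kept = filter_models(scores, [0.7, 0.8])  # -> ["RF"]
```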

4. Ensemble-models algorithms

1. Mean of probabilities (prob.mean)

This ensemble-model corresponds to the mean probabilities over the selected models.

2. Coefficient of variation of Probabilities (prob.cv)

This ensemble model corresponds to the coefficient of variation (i.e. sd / mean) of the probabilities over the selected models. This model is not scaled. It is evaluated like all other ensemble models, although its interpretation is obviously different: CV is a measure of uncertainty rather than a measure of probability of occurrence. If the CV gets a high evaluation score, it means that the uncertainty is high where the species is observed (which is usually not a good feature of the models). The lower the score, the better the models. CV is a nice complement to the mean probability.
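As a minimal sketch of the sd / mean computation described above (Python for illustration; the function name and probability values are made up), sites where the models agree get a low CV, and sites where they disagree get a high one:

```python
import statistics

def coefficient_of_variation(probs):
    """Uncertainty measure for one site: sd / mean of the per-model probabilities."""
    return statistics.stdev(probs) / statistics.mean(probs)

cv_agree = coefficient_of_variation([0.70, 0.72, 0.71])     # models agree: low CV
cv_disagree = coefficient_of_variation([0.10, 0.90, 0.50])  # models disagree: high CV
```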

3. Confidence interval (prob.ci & prob.ci.alpha)

This is the confidence interval around the mean probability (see above), and is also a nice complement to the mean probability. Two ensemble models are built if prob.ci is TRUE:

• The upper one (there is less than a 100*prob.ci.alpha/2 % chance of obtaining probabilities higher than the given ones)

• The lower one (there is less than a 100*prob.ci.alpha/2 % chance of obtaining probabilities lower than the given ones)

These intervals are calculated with the following formula:

I_c = \left[ \bar{x} - \frac{t_{\alpha} \, sd}{\sqrt{n}} \;;\; \bar{x} + \frac{t_{\alpha} \, sd}{\sqrt{n}} \right]
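A minimal Python sketch of this formula for a single site (the function name, probability values, and the hard-coded Student-t quantile are illustrative assumptions, not biomod2 code):

```python
import math
import statistics

def ci_bounds(probs, t_alpha):
    """Lower/upper confidence-interval bounds around the mean probability,
    per the formula above; t_alpha is the Student-t quantile for the
    chosen significance level (assumed given)."""
    n = len(probs)
    mean = statistics.mean(probs)
    sd = statistics.stdev(probs)  # sample standard deviation
    half = t_alpha * sd / math.sqrt(n)
    return mean - half, mean + half

# three model probabilities; t_alpha = 4.303 is the two-sided t quantile
# for alpha = 0.05 with 2 degrees of freedom
lo, hi = ci_bounds([0.62, 0.71, 0.68], t_alpha=4.303)
```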

4. Median of probabilities (prob.median)

This ensemble model corresponds to the median of the probabilities over the selected models. The median is less sensitive to outliers than the mean. In practical terms, however, computing the median requires more time and memory than the mean (or even the weighted mean), as it requires loading all predictions before extracting the median. Keep this in mind for large datasets.

5. Models committee averaging (committee.averaging)

For this model, the probabilities from the selected models are first transformed into binary data according to the thresholds defined at the BIOMOD_Modeling step (maximizing the evaluation metric score over the 'testing dataset'). The committee averaging score is then the average of the binary predictions. It is built on the analogy of a simple vote: each model votes for the species being either present or absent, and for each site the sum of ones is divided by the number of models. The interesting feature of this measure is that it gives both a prediction and a measure of uncertainty. When the prediction is close to 0 or 1, all models agree to predict 0 or 1, respectively. When the prediction is around 0.5, half the models predict 1 and the other half 0.
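The voting logic can be sketched as follows (Python for illustration; function name, probabilities and cutoffs are made up, and each model is binarized with its own score-maximizing cutoff as described above):

```python
def committee_average(model_probs, thresholds):
    """Average of binary votes: each model's probability is binarized
    with that model's own cutoff, then the ones are averaged."""
    votes = [1 if p >= t else 0 for p, t in zip(model_probs, thresholds)]
    return sum(votes) / len(votes)

# three models, identical cutoffs for simplicity: two vote 'present', one 'absent'
ca = committee_average([0.8, 0.6, 0.3], [0.5, 0.5, 0.5])  # -> 2/3
```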

6. Weighted mean of probabilities (prob.mean.weight & prob.mean.weight.decay)

This algorithm returns the weighted mean (or, more precisely, the weighted sum) of probabilities, weighted by the selected evaluation-metric scores (the better a model is, the more importance it has in the ensemble). The scores come from the BIOMOD_Modeling step.

prob.mean.weight.decay is the ratio between a weight and the previous one; that is, W = W(-1) * prob.mean.weight.decay. For example, with a value of 1.6 and 4 weights, the relative importance of the weights from the weakest to the strongest model is 1 / 1.6 / 2.56 (= 1.6 * 1.6) / 4.096 (= 2.56 * 1.6), which gives approximately 0.108 / 0.173 / 0.277 / 0.443 once normalized so that the weights sum to one. The lower prob.mean.weight.decay is, the smoother the differences between the weights, yielding a weaker discrimination between models.
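The decay arithmetic above can be reproduced with a short sketch (Python for illustration; the function name is made up):

```python
def decay_weights(n_models, decay):
    """Relative weights w_i = decay * w_{i-1}, normalized to sum to 1,
    ordered from weakest to strongest model."""
    raw = [decay ** i for i in range(n_models)]
    total = sum(raw)
    return [w / total for w in raw]

# 4 weights with decay 1.6: raw 1 / 1.6 / 2.56 / 4.096,
# normalized to roughly 0.108 / 0.173 / 0.277 / 0.443
weights = decay_weights(4, 1.6)
```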

The value 'proportional' (default) is also possible for prob.mean.weight.decay: the weights are then awarded proportionally to each model's evaluation score. The advantage is that the discrimination is fairer than with a numeric decay. With a decay, close scores can diverge strongly in the weights they are awarded, whereas the proportional method considers them as fairly similar in prediction quality and awards them similar weights. It is also possible to pass a function as the prob.mean.weight.decay argument. In that case, the given function is applied to the model scores to transform them into the weights used for building the weighted-mean ensemble model. For instance, if you specify function(x){x^2} as prob.mean.weight.decay, the square of each model's evaluation score is used to weight the formal models' predictions.
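As a sketch of the proportional scheme (Python for illustration; the function name is made up), feeding in the TSS scores 0.761 / 0.896 / 0.913 reported in the example output of this page yields the weights 0.296 / 0.349 / 0.355 shown there:

```python
def proportional_weights(scores):
    """Weights proportional to evaluation scores, normalized to sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# TSS scores from this page's example run: SRE, CTA, RF
w = proportional_weights([0.761, 0.896, 0.913])
# -> roughly 0.296 / 0.349 / 0.355, matching "final models weights" in the example
```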

## Value

A "BIOMOD.EnsembleModeling.out" object. This object can later be given to BIOMOD_EnsembleForecasting if you want to make projections with the ensemble models.

You can access the evaluation scores with the get_evaluations function and the names of the built models with the get_built_models function (see example).

## Note

Models are currently combined by repetition; other ways of combining them (e.g. by model, all together, ...) will be available soon.

## Author(s)

Damien Georges & Wilfried Thuiller with participation of Robin Engler

## See Also

BIOMOD_Modeling, BIOMOD_Projection, BIOMOD_EnsembleForecasting

## Examples

```r
# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package = "biomod2"), row.names = 1)
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absences data for our species
myResp <- as.numeric(DataSpecies[, myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[, c("X_WGS84", "Y_WGS84")]

# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl = raster::stack(
  system.file("external/bioclim/current/bio3.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio4.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio7.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio11.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio12.grd", package = "biomod2"))

# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)

# 2. Defining Models Options using default options
myBiomodOption <- BIOMOD_ModelingOptions()

# 3. Doing Modelisation
myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
                                    models = c('SRE', 'CTA', 'RF'),
                                    models.options = myBiomodOption,
                                    NbRunEval = 1,
                                    DataSplit = 80,
                                    Yweights = NULL,
                                    VarImport = 3,
                                    models.eval.meth = c('TSS'),
                                    SaveObj = TRUE,
                                    rescal.all.models = FALSE,
                                    do.full.models = FALSE)

# 4. Doing Ensemble Modelling
myBiomodEM <- BIOMOD_EnsembleModeling(modeling.output = myBiomodModelOut,
                                      chosen.models = 'all',
                                      em.by = 'all',
                                      eval.metric = c('TSS'),
                                      eval.metric.quality.threshold = c(0.7),
                                      models.eval.meth = c('TSS', 'ROC'),
                                      prob.mean = TRUE,
                                      prob.cv = FALSE,
                                      prob.ci = FALSE,
                                      prob.ci.alpha = 0.05,
                                      prob.median = FALSE,
                                      committee.averaging = FALSE,
                                      prob.mean.weight = TRUE,
                                      prob.mean.weight.decay = 'proportional')

# print summary
myBiomodEM

# get evaluation scores
get_evaluations(myBiomodEM)
```

### Example output

Loading required package: sp

Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
X_WGS84  Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca PteropusGiganteus
1   -94.5 82.00001                0        0            0                 0
2   -91.5 82.00001                0        1            0                 0
3   -88.5 82.00001                0        1            0                 0
4   -85.5 82.00001                0        1            0                 0
5   -82.5 82.00001                0        1            0                 0
6   -79.5 82.00001                0        1            0                 0
TenrecEcaudatus VulpesVulpes
1               0            0
2               0            0
3               0            0
4               0            0
5               0            0
6               0            0
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files

-=-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=

> No pseudo absences selection !
! No data has been set aside for modeling evaluation

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...

Creating suitable Workdir...

> No weights : all observations will have the same weight

-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=

5  environmental variables ( bio3 bio4 bio7 bio11 bio12 )
Number of evaluation repetitions : 1
Models selected : SRE CTA RF

Total number of model runs : 3

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=- Run :  GuloGulo_AllData

-=-=-=--=-=-=- GuloGulo_AllData_RUN1

Model=Surface Range Envelop
Evaluating Model stuff...
Evaluating Predictor Contributions...

Model=Classification tree
5 Fold Cross-Validation
Evaluating Model stuff...
Evaluating Predictor Contributions...

Model=Breiman and Cutler's random forests for classification and regression
Evaluating Model stuff...
Evaluating Predictor Contributions...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=

! all models available will be included in ensemble.modeling
> Evaluation & Weighting methods summary :
TSS over 0.7

> mergedAlgo_mergedRun_mergedData ensemble modeling
! Models projections for whole zonation required...
> Projecting GuloGulo_AllData_RUN1_SRE ...
> Projecting GuloGulo_AllData_RUN1_CTA ...
> Projecting GuloGulo_AllData_RUN1_RF ...

> Mean of probabilities...
Evaluating Model stuff...
> Prababilities wegthing mean...
original models scores =  0.761 0.896 0.913
final models weights =  0.296 0.349 0.355
Evaluating Model stuff...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.EnsembleModeling.out' -=-=-=-=-=-=-=-=-=-=-=-=

sp.name : GuloGulo

expl.var.names : bio3 bio4 bio7 bio11 bio12

models computed:
GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData, GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
$GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
    Testing.data Cutoff Sensitivity Specificity
TSS        0.940    472      96.974      97.154
ROC        0.993    471      96.974      97.154

$GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
TSS        0.947  442.0      97.579      97.044
ROC        0.993  440.5      97.731      97.044



biomod2 documentation built on May 31, 2017, 2:55 a.m.