# BIOMOD_EnsembleModeling: Create and evaluate an ensemble set of models and predictions

In biomod2: Ensemble Platform for Species Distribution Modeling

## Description

BIOMOD_EnsembleModeling combines the models built with BIOMOD_Modeling and makes ensemble predictions. These ensemble predictions can also be evaluated against the original data given to BIOMOD_Modeling. biomod2 offers a range of options for building ensemble models and predictions and for assessing modeling uncertainty. The resulting ensemble models can then be used to project distributions over space and time, just like classical biomod2 models.

## Usage

```r
BIOMOD_EnsembleModeling(
  modeling.output,
  chosen.models = 'all',
  em.by = 'PA_dataset+repet',
  eval.metric = 'all',
  eval.metric.quality.threshold = NULL,
  models.eval.meth = c('KAPPA', 'TSS', 'ROC'),
  prob.mean = TRUE,
  prob.cv = FALSE,
  prob.ci = FALSE,
  prob.ci.alpha = 0.05,
  prob.median = FALSE,
  committee.averaging = FALSE,
  prob.mean.weight = FALSE,
  prob.mean.weight.decay = 'proportional',
  VarImport = 0)
```

## Arguments

• modeling.output: a "BIOMOD.models.out" object returned by BIOMOD_Modeling

• chosen.models: a character vector (either 'all' or a sub-selection of model names) defining the models kept for building the ensemble models (useful for removing some non-preferred models)

• em.by: character. Flag defining how the models are combined to build the ensemble models. Available values are 'PA_dataset+repet' (default), 'PA_dataset+algo', 'PA_dataset', 'algo' and 'all'

• eval.metric: vector of names of the evaluation metrics used to build the ensemble models. It is involved in formal model exclusion if eval.metric.quality.threshold is defined, and/or in building ensemble models that depend on formal model evaluation scores (e.g. weighted mean and committee averaging). If 'all', the same evaluation metrics as those of modeling.output are selected automatically

• eval.metric.quality.threshold: if not NULL, the minimum scores below which models will be excluded from ensemble-model building

• models.eval.meth: the evaluation methods used to evaluate the ensemble models (see the models.eval.meth section of BIOMOD_Modeling for more detailed information)

• prob.mean: logical. Estimate the mean probability across predictions

• prob.cv: logical. Estimate the coefficient of variation across predictions

• prob.ci: logical. Estimate the confidence interval around prob.mean

• prob.ci.alpha: numeric. Significance level for estimating the confidence interval. Default = 0.05

• prob.median: logical. Estimate the median of probabilities

• committee.averaging: logical. Estimate the committee averaging across predictions

• prob.mean.weight: logical. Estimate the weighted sum of probabilities

• prob.mean.weight.decay: defines the relative importance of the weights. A high value strongly discriminates the 'good' models from the 'bad' ones (see the Details section). If set to 'proportional' (default), the attributed weights are proportional to the evaluation scores given by eval.metric

• VarImport: number of permutations used to estimate variable importance

## Details

1. Models sub-selection (chosen.models)

Useful to exclude some of the models selected in the previous step (modeling.output). The vector of model names can be obtained by applying get_built_models to your modeling.output object, which makes model selection easier. The default value ('all') keeps all available models.

2. Models assembly rules (em.by)

Please refer to EnsembleModelingAssembly vignette that is dedicated to this parameter.

Five different ways of combining models can be considered. You can build ensemble models considering:

• Dataset used for model building (pseudo-absence dataset and repetitions done): 'PA_dataset+repet'

• Dataset used and statistical models: 'PA_dataset+algo'

• Pseudo-absence selection dataset: 'PA_dataset'

• Statistical models: 'algo'

• A total consensus model: 'all'

The value chosen for this parameter controls the number of ensemble models built. If no evaluation data was given at the BIOMOD_FormatingData step, some ensemble-model evaluations may be somewhat unfair, because the data used for evaluating the ensemble models can differ from those used to evaluate the BIOMOD_Modeling models (in particular, some data used for calibrating the 'basal models' can be re-used for ensemble-model evaluation). Keep this in mind! (See the EnsembleModelingAssembly vignette for extra details.)

3. Evaluation metrics

• eval.metric

The metrics selected here must be among the ones chosen at the BIOMOD_Modeling step. If you select several, ensemble models will be built according to each of them. The chosen metrics are used at different stages in this function:

1. to remove 'bad' models (those having a score lower than eval.metric.quality.threshold, see below)

2. to make the binary transformation needed for committee averaging computation

3. to weight the models in the probability weighted mean model

4. to test (and/or evaluate) the forecasting ability of your ensemble models (at this step, each ensemble model is evaluated according to each evaluation metric)

• eval.metric.quality.threshold

You must give as many thresholds as the number of eval.metric you selected. If you selected several evaluation metrics, make sure the order of the thresholds matches the order of eval.metric. All models scoring lower than these quality thresholds are excluded from ensemble-model building.
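The exclusion rule above can be written as a short language-agnostic sketch (shown here in Python; the function name, scores and thresholds are illustrative, not biomod2 code): a model is kept only if its score meets the threshold for every selected metric, with metrics and thresholds in matching order.

```python
def filter_models(scores, thresholds):
    """Keep only models whose score for every selected metric meets that
    metric's threshold (metrics and thresholds in matching order)."""
    kept = []
    for name, per_metric in scores.items():
        if all(s >= t for s, t in zip(per_metric, thresholds)):
            kept.append(name)
    return kept

# hypothetical scores for two metrics (TSS, ROC) with thresholds (0.7, 0.8)
scores = {"RF": [0.85, 0.93], "SRE": [0.55, 0.72]}
kept = filter_models(scores, [0.7, 0.8])  # -> ["RF"]
```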

4. Ensemble-models algorithms

1. Mean of probabilities (prob.mean)

This ensemble-model corresponds to the mean probabilities over the selected models.

2. Coefficient of variation of Probabilities (prob.cv)

This ensemble model corresponds to the coefficient of variation (i.e. sd / mean) of the probabilities over the selected models. This model is not scaled. It is evaluated like all other ensemble models, although its interpretation is obviously different: CV is a measure of uncertainty rather than a measure of probability of occurrence. If the CV gets a high evaluation score, it means that the uncertainty is high where the species is observed (which is usually not a good feature of the models). The lower the score, the better the models. CV is a nice complement to the mean probability.
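As a minimal sketch of the sd / mean computation described above (Python for illustration; the function name and probability values are made up), sites where the models agree get a low CV, and sites where they disagree get a high one:

```python
import statistics

def coefficient_of_variation(probs):
    """Uncertainty measure for one site: sd / mean of the per-model probabilities."""
    return statistics.stdev(probs) / statistics.mean(probs)

cv_agree = coefficient_of_variation([0.70, 0.72, 0.71])     # models agree: low CV
cv_disagree = coefficient_of_variation([0.10, 0.90, 0.50])  # models disagree: high CV
```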

3. Confidence interval (prob.ci & prob.ci.alpha)

This is the confidence interval around the mean probability (see above), and is also a nice complement to the mean probability. Two ensemble models are built if prob.ci is TRUE:

• The upper one (there is less than a 100*prob.ci.alpha/2 % chance of obtaining probabilities higher than the given ones)

• The lower one (there is less than a 100*prob.ci.alpha/2 % chance of obtaining probabilities lower than the given ones)

These intervals are calculated with the following formula:

I_c = \left[ \bar{x} - \frac{t_{\alpha} \, sd}{\sqrt{n}} \;;\; \bar{x} + \frac{t_{\alpha} \, sd}{\sqrt{n}} \right]
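A minimal Python sketch of this formula for a single site (the function name, probability values, and the hard-coded Student-t quantile are illustrative assumptions, not biomod2 code):

```python
import math
import statistics

def ci_bounds(probs, t_alpha):
    """Lower/upper confidence-interval bounds around the mean probability,
    per the formula above; t_alpha is the Student-t quantile for the
    chosen significance level (assumed given)."""
    n = len(probs)
    mean = statistics.mean(probs)
    sd = statistics.stdev(probs)  # sample standard deviation
    half = t_alpha * sd / math.sqrt(n)
    return mean - half, mean + half

# three model probabilities; t_alpha = 4.303 is the two-sided t quantile
# for alpha = 0.05 with 2 degrees of freedom
lo, hi = ci_bounds([0.62, 0.71, 0.68], t_alpha=4.303)
```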

4. Median of probabilities (prob.median)

This ensemble model corresponds to the median of the probabilities over the selected models. The median is less sensitive to outliers than the mean. In practical terms, however, computing the median requires more time and memory than the mean (or even the weighted mean), as it requires loading all predictions before extracting the median. Keep this in mind for large datasets.

5. Models committee averaging (committee.averaging)

For this model, the probabilities from the selected models are first transformed into binary data according to the thresholds defined at the BIOMOD_Modeling step (maximizing the evaluation metric score over the 'testing dataset'). The committee averaging score is then the average of the binary predictions. It is built on the analogy of a simple vote: each model votes for the species being either present or absent, and for each site the sum of ones is divided by the number of models. The interesting feature of this measure is that it gives both a prediction and a measure of uncertainty. When the prediction is close to 0 or 1, all models agree to predict 0 or 1, respectively. When the prediction is around 0.5, half the models predict 1 and the other half 0.
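The voting logic can be sketched as follows (Python for illustration; function name, probabilities and cutoffs are made up, and each model is binarized with its own score-maximizing cutoff as described above):

```python
def committee_average(model_probs, thresholds):
    """Average of binary votes: each model's probability is binarized
    with that model's own cutoff, then the ones are averaged."""
    votes = [1 if p >= t else 0 for p, t in zip(model_probs, thresholds)]
    return sum(votes) / len(votes)

# three models, identical cutoffs for simplicity: two vote 'present', one 'absent'
ca = committee_average([0.8, 0.6, 0.3], [0.5, 0.5, 0.5])  # -> 2/3
```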

6. Weighted mean of probabilities (prob.mean.weight & prob.mean.weight.decay)

This algorithm returns the weighted mean (or, more precisely, the weighted sum) of probabilities, weighted by the selected evaluation-metric scores (the better a model is, the more importance it has in the ensemble). The scores come from the BIOMOD_Modeling step.

prob.mean.weight.decay is the ratio between a weight and the previous one; that is, W = W(-1) * prob.mean.weight.decay. For example, with a value of 1.6 and 4 weights, the relative importance of the weights from the weakest to the strongest model is 1 / 1.6 / 2.56 (= 1.6 * 1.6) / 4.096 (= 2.56 * 1.6), which gives approximately 0.108 / 0.173 / 0.277 / 0.443 once normalized so that the weights sum to one. The lower prob.mean.weight.decay is, the smoother the differences between the weights, yielding a weaker discrimination between models.
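The decay arithmetic above can be reproduced with a short sketch (Python for illustration; the function name is made up):

```python
def decay_weights(n_models, decay):
    """Relative weights w_i = decay * w_{i-1}, normalized to sum to 1,
    ordered from weakest to strongest model."""
    raw = [decay ** i for i in range(n_models)]
    total = sum(raw)
    return [w / total for w in raw]

# 4 weights with decay 1.6: raw 1 / 1.6 / 2.56 / 4.096,
# normalized to roughly 0.108 / 0.173 / 0.277 / 0.443
weights = decay_weights(4, 1.6)
```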

The value 'proportional' (default) is also possible for prob.mean.weight.decay: the weights are then awarded proportionally to each model's evaluation score. The advantage is that the discrimination is fairer than with a numeric decay. With a decay, close scores can diverge strongly in the weights they are awarded, whereas the proportional method considers them as fairly similar in prediction quality and awards them similar weights. It is also possible to pass a function as the prob.mean.weight.decay argument. In that case, the given function is applied to the model scores to transform them into the weights used for building the weighted-mean ensemble model. For instance, if you specify function(x){x^2} as prob.mean.weight.decay, the square of each model's evaluation score is used to weight the formal models' predictions.
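As a sketch of the proportional scheme (Python for illustration; the function name is made up), feeding in the TSS scores 0.761 / 0.896 / 0.913 reported in the example output of this page yields the weights 0.296 / 0.349 / 0.355 shown there:

```python
def proportional_weights(scores):
    """Weights proportional to evaluation scores, normalized to sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# TSS scores from this page's example run: SRE, CTA, RF
w = proportional_weights([0.761, 0.896, 0.913])
# -> roughly 0.296 / 0.349 / 0.355, matching "final models weights" in the example
```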

## Value

A "BIOMOD.EnsembleModeling.out" object. This object can later be given to BIOMOD_EnsembleForecasting if you want to make projections with the ensemble models.

You can access the evaluation scores with the get_evaluations function and the names of the built models with the get_built_models function (see example).

## Note

Models are currently combined by repetition; other ways of combining them (e.g. by model, all together, ...) will be available soon.

## Author(s)

Damien Georges & Wilfried Thuiller with participation of Robin Engler

## See Also

BIOMOD_Modeling, BIOMOD_Projection, BIOMOD_EnsembleForecasting

## Examples

```r
# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package = "biomod2"), row.names = 1)
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absences data for our species
myResp <- as.numeric(DataSpecies[, myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[, c("X_WGS84", "Y_WGS84")]

# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl = raster::stack(
  system.file("external/bioclim/current/bio3.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio4.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio7.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio11.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio12.grd", package = "biomod2"))

# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)

# 2. Defining Models Options using default options
myBiomodOption <- BIOMOD_ModelingOptions()

# 3. Doing Modelisation
myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
                                    models = c('SRE', 'CTA', 'RF'),
                                    models.options = myBiomodOption,
                                    NbRunEval = 1,
                                    DataSplit = 80,
                                    Yweights = NULL,
                                    VarImport = 3,
                                    models.eval.meth = c('TSS'),
                                    SaveObj = TRUE,
                                    rescal.all.models = FALSE,
                                    do.full.models = FALSE)

# 4. Doing Ensemble Modelling
myBiomodEM <- BIOMOD_EnsembleModeling(modeling.output = myBiomodModelOut,
                                      chosen.models = 'all',
                                      em.by = 'all',
                                      eval.metric = c('TSS'),
                                      eval.metric.quality.threshold = c(0.7),
                                      models.eval.meth = c('TSS', 'ROC'),
                                      prob.mean = TRUE,
                                      prob.cv = FALSE,
                                      prob.ci = FALSE,
                                      prob.ci.alpha = 0.05,
                                      prob.median = FALSE,
                                      committee.averaging = FALSE,
                                      prob.mean.weight = TRUE,
                                      prob.mean.weight.decay = 'proportional')

# print summary
myBiomodEM

# get evaluation scores
get_evaluations(myBiomodEM)
```

### Example output

Loading required package: sp

Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
X_WGS84  Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca PteropusGiganteus
1   -94.5 82.00001                0        0            0                 0
2   -91.5 82.00001                0        1            0                 0
3   -88.5 82.00001                0        1            0                 0
4   -85.5 82.00001                0        1            0                 0
5   -82.5 82.00001                0        1            0                 0
6   -79.5 82.00001                0        1            0                 0
TenrecEcaudatus VulpesVulpes
1               0            0
2               0            0
3               0            0
4               0            0
5               0            0
6               0            0
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files

-=-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=

> No pseudo absences selection !
! No data has been set aside for modeling evaluation

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Checking Models arguments...

Creating suitable Workdir...

> No weights : all observations will have the same weight

-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=

5  environmental variables ( bio3 bio4 bio7 bio11 bio12 )
Number of evaluation repetitions : 1
Models selected : SRE CTA RF

Total number of model runs : 3

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=- Run :  GuloGulo_AllData

-=-=-=--=-=-=- GuloGulo_AllData_RUN1

Model=Surface Range Envelop
Evaluating Model stuff...
Evaluating Predictor Contributions...

Model=Classification tree
5 Fold Cross-Validation
Evaluating Model stuff...
Evaluating Predictor Contributions...

Model=Breiman and Cutler's random forests for classification and regression
Evaluating Model stuff...
Evaluating Predictor Contributions...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=

! all models available will be included in ensemble.modeling
> Evaluation & Weighting methods summary :
TSS over 0.7

> mergedAlgo_mergedRun_mergedData ensemble modeling
! Models projections for whole zonation required...
> Projecting GuloGulo_AllData_RUN1_SRE ...
> Projecting GuloGulo_AllData_RUN1_CTA ...
> Projecting GuloGulo_AllData_RUN1_RF ...

> Mean of probabilities...
Evaluating Model stuff...
> Prababilities wegthing mean...
original models scores =  0.761 0.896 0.913
final models weights =  0.296 0.349 0.355
Evaluating Model stuff...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=-=-=-=-=-=-=-=-=-= 'BIOMOD.EnsembleModeling.out' -=-=-=-=-=-=-=-=-=-=-=-=

sp.name : GuloGulo

expl.var.names : bio3 bio4 bio7 bio11 bio12

models computed:
GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData, GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
$GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
    Testing.data Cutoff Sensitivity Specificity
TSS        0.940    472      96.974      97.154
ROC        0.993    471      96.974      97.154

$GuloGulo_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
Testing.data Cutoff Sensitivity Specificity
TSS        0.947  442.0      97.579      97.044
ROC        0.993  440.5      97.731      97.044



biomod2 documentation built on May 31, 2017, 2:55 a.m.