BIOMOD_Modeling: Run a range of species distribution models


View source: R/BIOMOD_Modeling.R

Description

This function calibrates and evaluates a range of species distribution modeling techniques for a given species. Calibration is done on the whole sample or on a random subset of it. The predictive power of the different models is estimated using a range of evaluation metrics.

Usage

BIOMOD_Modeling( data, 
                 models = c('GLM','GBM','GAM','CTA','ANN',
                            'SRE','FDA','MARS','RF','MAXENT.Phillips', 
                            "MAXENT.Tsuruoka"), 
                 models.options = NULL, 
                 NbRunEval=1, 
                 DataSplit=100, 
                 Yweights=NULL,
                 Prevalence=NULL,
                 VarImport=0, 
                 models.eval.meth = c('KAPPA','TSS','ROC'), 
                 SaveObj = TRUE,
                 rescal.all.models = FALSE,
                 do.full.models = TRUE,
                 modeling.id = as.character(format(Sys.time(), '%s')),
                 ...)

Arguments

data

BIOMOD.formated.data object returned by BIOMOD_FormatingData

models

Vector of model names chosen among 'GLM', 'GBM', 'GAM', 'CTA', 'ANN', 'SRE', 'FDA', 'MARS', 'RF', 'MAXENT.Phillips' and 'MAXENT.Tsuruoka'

models.options

BIOMOD.models.options object returned by BIOMOD_ModelingOptions

NbRunEval

Number of evaluation runs

DataSplit

Percentage of the data used to calibrate the models; the remaining part is used for testing

Yweights

Numeric vector of weights for the response points (observations)

Prevalence

Either NULL (default) or a numeric value between 0 and 1 used to build weighted response weights (see Details)

VarImport

Number of permutations used to estimate variable importance

models.eval.meth

Vector of names of evaluation metrics chosen among 'KAPPA', 'TSS', 'ROC', 'FAR', 'SR', 'ACCURACY', 'BIAS', 'POD', 'CSI' and 'ETS'

SaveObj

Logical; whether to keep all results and outputs on the hard drive (strongly recommended)

rescal.all.models

If TRUE, all model predictions are rescaled with a binomial GLM

do.full.models

If TRUE, models calibrated and evaluated on the whole dataset are also built

modeling.id

Character; the ID (name) of the modeling procedure. By default, a number derived from the current system time (see Usage).

...

Further arguments:

  • DataSplitTable : a matrix, data.frame or 3D array filled with TRUE/FALSE values specifying which part of the data must be used for model calibration (TRUE) and which for model validation (FALSE). Each column corresponds to a 'RUN'. If supplied, the arguments NbRunEval, DataSplit and do.full.models are ignored.
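The DataSplitTable format can be illustrated with a small base-R sketch. make_split_table is a hypothetical helper written for this example, not part of biomod2; it draws DataSplit % of the rows at random for calibration in each run, and it assumes the rows are in the same order as the observations in the formatted data.

```r
# Hypothetical helper (not a biomod2 function): build a logical
# calibration/evaluation table with one column per run.
# TRUE = row used for calibration, FALSE = row kept for evaluation.
make_split_table <- function(n_obs, nb_runs, data_split = 80, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_calib <- round(n_obs * data_split / 100)  # rows used for calibration
  runs <- lapply(seq_len(nb_runs), function(i) {
    calib <- rep(FALSE, n_obs)
    calib[sample.int(n_obs, n_calib)] <- TRUE  # random subset, no repeats
    calib
  })
  tab <- do.call(cbind, runs)  # n_obs x nb_runs logical matrix
  colnames(tab) <- paste0("RUN", seq_len(nb_runs))
  tab
}

# 10 observations, 2 runs, 80 % of the data used for calibration
split_tab <- make_split_table(n_obs = 10, nb_runs = 2, data_split = 80, seed = 42)
colSums(split_tab)  # 8 calibration rows in each run
```

A matrix built this way, with one row per observation of the BIOMOD.formated.data object, would then be passed through ... as DataSplitTable = split_tab.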

Details

  1. data

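    If you have decided to add pseudo-absences to your original dataset (see BIOMOD_FormatingData), NbPseudoAbsences * (NbRunEval + 1) models will be created for each modeling technique (the + 1 corresponding to the full model built when do.full.models = TRUE).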

  2. models

    The set of models to be calibrated on the data. 11 modeling techniques are currently available:

    • GLM : Generalized Linear Model (glm)

    • GAM : Generalized Additive Model (gam from the gam or mgcv package, or bam from mgcv; see BIOMOD_ModelingOptions for details on algorithm selection)

    • GBM : Generalized Boosting Model, also called Boosted Regression Trees (gbm)

    • CTA: Classification Tree Analysis (rpart)

    • ANN: Artificial Neural Network (nnet)

    • SRE: Surface Range Envelope, also called BIOCLIM

    • FDA: Flexible Discriminant Analysis (fda)

    • MARS: Multivariate Adaptive Regression Splines (earth)

    • RF: Random Forest (randomForest)

    • MAXENT.Phillips: Maximum Entropy (http://www.cs.princeton.edu/~schapire/maxent/)

    • MAXENT.Tsuruoka: low-memory multinomial logistic regression (maxent)

  3. NbRunEval & DataSplit

    As explained in the BIOMOD_FormatingData help file, the common practice is to split the original dataset into two subsets: one to calibrate the models and another to evaluate them. Here this process (calibration and evaluation) can be repeated N times (NbRunEval times). The proportion of data kept for calibration is set by the DataSplit argument; the remaining 100 - DataSplit % is used to evaluate the models. This form of cross-validation gives a reasonably robust test of the models when independent data are not available. If do.full.models = TRUE, each technique is also calibrated on the complete original dataset. All the models produced by BIOMOD and their related information are saved on the hard drive.

  4. Yweights & Prevalence

    These arguments give more or less weight to particular observations. If both are kept NULL (Yweights = NULL, Prevalence = NULL), each observation (presence or absence) has the same weight, regardless of the number of presences and absences. If Prevalence = 0.5, absences are weighted so that they count as much as presences overall (i.e. the weighted sum of presences equals the weighted sum of absences). If Prevalence is set below or above 0.5, absences or presences, respectively, are given more weight. In the particular case where pseudo-absences have been generated by BIOMOD_FormatingData (PA.nb.rep > 0), weights are by default (Prevalence = NULL) calculated so that prevalence is 0.5, meaning that presences have the same importance as absences when calibrating the models. Automatically created Yweights are integers, to avoid problems with some modeling techniques. Note that Prevalence is always ignored if Yweights are defined.

  5. models.eval.meth

    The available evaluation methods are:

    • ‘ROC’ : Relative Operating Characteristic

    • ‘KAPPA’ : Cohen's Kappa (Heidke skill score)

    • ‘TSS’ : True skill statistic (Hanssen and Kuipers discriminant, Peirce's skill score)

    • ‘FAR’ : False alarm ratio

    • ‘SR’ : Success ratio

    • ‘ACCURACY’ : Accuracy (fraction correct)

    • ‘BIAS’ : Bias score (frequency bias)

    • ‘POD’ : Probability of detection (hit rate)

    • ‘CSI’ : Critical success index (threat score)

    • ‘ETS’ : Equitable threat score (Gilbert skill score)

    Some of them are rescaled so that they all have an optimum at 1. You can choose one or more evaluation metrics (as a vector). By default, only 'KAPPA', 'TSS' and 'ROC' evaluations are done. Please refer to the CAWCR website (http://www.cawcr.gov.au/projects/verification/#Methods_for_dichotomous_forecasts) for a detailed description of each metric.

  6. SaveObj

    If this argument is set to FALSE, it may prevent the evaluation of the ‘ensemble modelled’ models in later steps. We strongly recommend always keeping this argument TRUE, even though it requires free space on the hard drive.

  7. rescal.all.models

    This parameter is quite experimental and we advise not to use it. It should reduce the amplitude of the projection scale. Some categorical models have to be rescaled in every case (‘FDA’, ‘ANN’), but it may be useful to rescale all computed models so that they produce comparable predictions (0-1000 scale). This is particularly useful for ensemble forecasting, to remove the effect of prediction scale (models whose projections span a wider range would otherwise have more influence on the ensemble forecasting results).

  8. do.full.models

    Building models with all the available information may be useful in some particular cases (e.g. rare species with few presence points). The main drawback is that, unless you provide separate data for model evaluation, these models are evaluated on the same data used for calibration, which leads to over-optimistic evaluation scores. Be careful when interpreting these '_Full' models.
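The number of model runs implied by items 1, 3 and 8 above can be checked with simple arithmetic. count_model_runs is a hypothetical helper, not a biomod2 function; it assumes NbRunEval runs per pseudo-absence dataset (a single dataset when no pseudo-absences are drawn) plus one '_Full' run per dataset when do.full.models = TRUE. With the settings used in the Examples section (2 models, NbRunEval = 2, do.full.models = FALSE, no pseudo-absences) it gives 4, the 'Total number of model runs' reported in the example output.

```r
# Hypothetical helper (not part of biomod2): expected number of fitted models.
# Assumes NbRunEval runs per pseudo-absence set (one set if none were drawn)
# plus one '_Full' run per set when do.full.models = TRUE.
count_model_runs <- function(n_models, nb_run_eval, nb_pa = 0, do_full = TRUE) {
  n_sets <- max(nb_pa, 1)                            # datasets to model
  runs_per_set <- nb_run_eval + as.integer(do_full)  # evaluation runs (+ Full)
  n_models * n_sets * runs_per_set
}

# Settings of the example: models = c('SRE','RF'), NbRunEval = 2, do.full.models = FALSE
count_model_runs(n_models = 2, nb_run_eval = 2, do_full = FALSE)
# 4 (2 models x 2 runs)
```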
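The Prevalence weighting rule of item 4 (Yweights & Prevalence) can be sketched as follows. This is an illustration under stated assumptions, not biomod2's internal code: prevalence_weights is a hypothetical helper that gives every presence a weight of 1 and scales the absences so that the weighted share of presences equals the requested prevalence. Unlike the automatic biomod2 weights described above, it does not convert the weights to integers.

```r
# Hypothetical helper (not biomod2's internal code): weights such that
# sum(w[presences]) / sum(w) == prevalence.
prevalence_weights <- function(resp, prevalence = 0.5) {
  stopifnot(all(resp %in% c(0, 1)), prevalence > 0, prevalence < 1)
  n_pres <- sum(resp == 1)
  n_abs  <- sum(resp == 0)
  w <- numeric(length(resp))
  w[resp == 1] <- 1
  # scale absences so that sum(w_abs) / sum(w_pres) == (1 - prevalence) / prevalence
  w[resp == 0] <- (n_pres / n_abs) * (1 - prevalence) / prevalence
  w
}

resp <- c(1, 1, 0, 0, 0, 0, 0, 0)  # 2 presences, 6 absences
w <- prevalence_weights(resp, prevalence = 0.5)
sum(w[resp == 1]) / sum(w)  # 0.5: presences and absences weigh the same overall
```

A prevalence below 0.5 makes each absence heavier, and one above 0.5 makes it lighter, matching the behaviour described in item 4. A vector built this way could be supplied as Yweights, in which case Prevalence is ignored.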
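To make the threshold-based metrics of item 5 concrete, the sketch below computes them from a 2 x 2 contingency table (hits, false alarms, misses, correct negatives) using the standard dichotomous forecast-verification formulas. dichotomous_scores is a hypothetical helper, not the biomod2 implementation: it takes already-binarised predictions, whereas biomod2 also chooses the cut-off used to binarise continuous predictions and may rescale some scores, as noted above. ROC is left out because it needs continuous predictions.

```r
# Hypothetical helper (not biomod2's code): raw dichotomous scores from
# binary observations and binary predictions.
dichotomous_scores <- function(obs, pred) {
  hits   <- sum(obs == 1 & pred == 1)  # a: predicted and observed presence
  falarm <- sum(obs == 0 & pred == 1)  # b: false alarms
  misses <- sum(obs == 1 & pred == 0)  # c: missed presences
  cneg   <- sum(obs == 0 & pred == 0)  # d: correct negatives
  n <- hits + falarm + misses + cneg
  pod  <- hits / (hits + misses)       # probability of detection
  pofd <- falarm / (falarm + cneg)     # probability of false detection
  acc  <- (hits + cneg) / n
  # agreement expected by chance (used by Kappa)
  p_e <- ((hits + misses) * (hits + falarm) + (falarm + cneg) * (misses + cneg)) / n^2
  # hits expected by chance (used by ETS)
  hits_rand <- (hits + misses) * (hits + falarm) / n
  c(
    KAPPA    = (acc - p_e) / (1 - p_e),
    TSS      = pod - pofd,
    FAR      = falarm / (hits + falarm),
    SR       = hits / (hits + falarm),
    ACCURACY = acc,
    BIAS     = (hits + falarm) / (hits + misses),
    POD      = pod,
    CSI      = hits / (hits + misses + falarm),
    ETS      = (hits - hits_rand) / (hits + misses + falarm - hits_rand)
  )
}

obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
round(dichotomous_scores(obs, pred), 3)  # e.g. TSS = 0.583, CSI = 0.6
```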
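Item 7 describes rescaling predictions with a binomial GLM. A minimal base-R sketch of that idea follows, assuming (this is not biomod2's exact code) that a logistic regression of the 0/1 observations on the raw model output is fitted and its predicted probabilities are mapped to the 0-1000 scale; rescale_with_glm is a hypothetical helper.

```r
# Hypothetical helper (not biomod2's exact code): calibrate raw model
# output against 0/1 observations with a binomial GLM, then express the
# predicted probabilities on a 0-1000 integer scale.
rescale_with_glm <- function(raw_pred, obs, new_pred = raw_pred) {
  fit <- glm(obs ~ raw_pred, family = binomial)
  p <- predict(fit, newdata = data.frame(raw_pred = new_pred), type = "response")
  round(1000 * unname(p))
}

raw <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)  # raw model output
obs <- c(0,   0,   1,   0,   0,   1,   0,   1,   1,   1)    # observed 0/1
rescale_with_glm(raw, obs)  # integers between 0 and 1000
```

Because the logistic link is monotonic, this keeps the ranking of sites produced by a model (when the fitted slope is positive) while putting every model on the same 0-1000 scale.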

Value

A BIOMOD.models.out object

See "BIOMOD.models.out" for details.

Additional objects are stored outside R, in two different directories, to save memory. They are created by the function directly at the root of the working directory set in R ("models" directory). This directory contains each calibrated model for each repetition and pseudo-absence run. A hidden folder ‘.DATA_BIOMOD’ contains some files (predictions, a copy of the original dataset, the chosen pseudo-absences...) used by other functions such as BIOMOD_Projection or BIOMOD_EnsembleModeling.

The models are currently stored as objects that can only be read in R. To load them back (the same applies to all objects stored on the hard disk), use the load function.

Author(s)

Wilfried Thuiller, Damien Georges, Robin Engler

See Also

BIOMOD_FormatingData, BIOMOD_ModelingOptions, BIOMOD_Projection

Examples

# load biomod2 (it also attaches raster, which provides stack())
library(biomod2)

# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package="biomod2"))
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absences data for our species 
myResp <- as.numeric(DataSpecies[,myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]


# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl <- stack(system.file("external/bioclim/current/bio3.grd", package = "biomod2"),
                system.file("external/bioclim/current/bio4.grd", package = "biomod2"),
                system.file("external/bioclim/current/bio7.grd", package = "biomod2"),
                system.file("external/bioclim/current/bio11.grd", package = "biomod2"),
                system.file("external/bioclim/current/bio12.grd", package = "biomod2"))

# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
                                                                     
# 2. Defining Models Options using default options.
myBiomodOption <- BIOMOD_ModelingOptions()

# 3. Running the models

myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
                                    models = c('SRE','RF'),
                                    models.options = myBiomodOption,
                                    NbRunEval = 2,
                                    DataSplit = 80,
                                    VarImport = 0,
                                    models.eval.meth = c('TSS','ROC'),
                                    do.full.models = FALSE,
                                    modeling.id = "test")
                                       
## print a summary of the modeling output
myBiomodModelOut

Example output

Loading required package: sp
Loading required package: raster
Loading required package: parallel
Loading required package: reshape
Loading required package: ggplot2
biomod2 3.3-7 loaded.

Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
  X X_WGS84  Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca PteropusGiganteus
1 1   -94.5 82.00001                0        0            0                 0
2 2   -91.5 82.00001                0        1            0                 0
3 3   -88.5 82.00001                0        1            0                 0
4 4   -85.5 82.00001                0        1            0                 0
5 5   -82.5 82.00001                0        1            0                 0
6 6   -79.5 82.00001                0        1            0                 0
  TenrecEcaudatus VulpesVulpes
1               0            0
2               0            0
3               0            0
4               0            0
5               0            0
6               0            0
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files

-=-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=

> No pseudo absences selection !
      ! No data has been set aside for modeling evaluation
NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Loading required library...

Checking Models arguments...

Creating suitable Workdir...

	> No weights : all observations will have the same weight


-=-=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=-=

 5  environmental variables ( bio3 bio4 bio7 bio11 bio12 )
Number of evaluation repetitions : 2
Models selected : SRE RF 

Total number of model runs : 4 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


-=-=-=- Run :  GuloGulo_AllData 


-=-=-=--=-=-=- GuloGulo_AllData_RUN1 

Model=Surface Range Envelop
	Evaluating Model stuff...
Model=Breiman and Cutler's random forests for classification and regression
	Evaluating Model stuff...

-=-=-=--=-=-=- GuloGulo_AllData_RUN2 

Model=Surface Range Envelop
	Evaluating Model stuff...
Model=Breiman and Cutler's random forests for classification and regression
	Evaluating Model stuff...
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.out -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Modeling id : test

Species modeled : GuloGulo

Considered variables : bio3 bio4 bio7 bio11 bio12


Computed Models :  GuloGulo_AllData_RUN1_SRE GuloGulo_AllData_RUN1_RF 
GuloGulo_AllData_RUN2_SRE GuloGulo_AllData_RUN2_RF


Failed Models :  none

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

biomod2 documentation built on May 29, 2017, 9:33 a.m.