Models: Running the species-climate envelop models available in...
In BIOMOD: Ensemble platform for species distribution modeling

Description Usage Arguments Details Value Author(s) See Also Examples

This function allows to train various species-climate modelling algorithms. It enables also to run three evaluation techniques (TSS, Kappa, Roc curve) to estimate the predictive performance of the models produced.

Models(GLM = FALSE, TypeGLM = "simple", Test = "AIC", GBM = FALSE, No.trees = 5000, GAM = FALSE, Spline = 3, CTA = FALSE, CV.tree = 50, ANN = FALSE, CV.ann = 5, SRE = FALSE, quant=0.025, FDA = FALSE, MARS = FALSE, RF = FALSE, NbRunEval = 1, DataSplit = 100, NbRepPA=0, strategy="sre", coor=NULL, distance=0, nb.absences=NULL, Yweights = NULL, VarImport = 0, Roc = FALSE, Optimized.Threshold.Roc = FALSE, Kappa = FALSE, TSS = FALSE, KeepPredIndependent = FALSE)

`GLM`	type True to run the Generalised Linear Model, or False to switch it off
`TypeGLM`	the complexity of the terms of the GLM : 'simple', 'poly', 'quad' standing for linear, polynomial or quadratic respectively (see manual for further details).
`Test`	the stepwise procedure for the GLM, either use the AIC or BIC criteria.
`GBM`	type True to run the Generalised Boosting Model, or False to switch it off
`No.trees`	the maximum number of trees for the GBM
`GAM`	type True to run the Generalised Additive Model, or False to switch it off
`Spline`	the degree of smoothing of the spline function for the GAM (3 by default)
`CTA`	type True to run the Classification and regression Tree Analysis, or False to switch it off
`CV.tree`	number of cross-validation in CTA to estimate the length of the optimal tree
`ANN`	type True to run the Artificial Neural Network, or False to switch it off
`CV.ann`	number of cross validation for the ANN
`SRE`	type True to run the Surface Range Envelope, or False to switch it off
`quant`	the value defines the extreme percentiles of the response data not to be used by the SRE for calibration (see `sre` for more details)
`FDA`	type True to run the Flexible Discriminant Analysis, or False to switch it off
`MARS`	type True to run the Multiple Adaptive Regression Splines, or False to switch it off
`RF`	type True to run the Random Forest, or False to switch it off
`NbRunEval`	the number of evaluation runs to proceed before the final 100 percent data model is run (see the manual or the details section below for further explanations)
`DataSplit`	to determine the ratio used for splitting the original database in calibration and evaluation subsets. The value given (between 0 and 100) is the percentage allocated to calibration
`NbRepPA`	The number of repetitions for the selection of pseudo absences
`strategy`	the strategy to use for selecting pseudo absences. Can be either "circles", "squares", "per", "random" or "sre". See `pseudo.abs` for details.
`coor`	a two columned matrix giving the coordinates of your data points. It is needed for the "per", "circles" and "squares" strategies
`distance`	a value giving the distance to use for the "per", "circles" and "squares" strategy of the pseudo absences selection. See `pseudo.abs` for details.
`nb.absences`	the number of pseudo absences wanted to run the models with. They are randomly selected from the pool of pseudo absences available selected by the given strategy.
`Yweights`	a N-columns matrix, N being the number of species, giving the weights that the user can set for the response variables. This is similar to an index of detectability for each site, which allows users to give stronger weights to more reliable presences or absences
`VarImport`	if True, this parameter enables a direct comparison of the explanatory variable importance across models (see the manual for further explanations)
`Roc`	If True, the Area Under the ROC curve score will be evaluated for all the models selected and for each species
`Optimized.Threshold.Roc`	Type True to determine a threshold with the Roc method optimising the percentage of presences and absences correctly predicted (this is not automatic as the Roc curve is a threshold independent method).
`Kappa`	If True, the Kappa score will be evaluated and a threshold produced for all the models selected and for each species
`TSS`	If True, the True Skill Statistic score will be evaluated and a threshold produced for all the models selected and for each species
`KeepPredIndependent`	If true, the predictions on the independent data will be saved. Otherwise, only the predictive accuracy on the evaluation set are conserved (NbRunEval)

The dataset used is DataBIOMOD produced by the Initial.State function. A prior run of this function is therefore necessary. The two arguments explained below are crucial as they determine the number of runs that will be done for each species. Be carefull not to over multiply thevtotal number of runs in order to keep a reasonable running time.

NbRepPA : BIOMOD now enables to calibrate the models with only a limited number of absences from the full dataset given in Initial.State. If NbRepPA is equal to 0 the calibration procedure will be the standard one, i.e. taking all the data available. Else (NbRepPA > 0), an inner run of the pseudo.abs function will be done for each species, with the strategy chosen, to determine the pool of pseudo-absences available for calibration. The number of absences taken for calibration is determined by the nb.absences argument. For example, if there are 50000 pseudo-absences available and nb.absences=2000, those 2000 absences will be randomly selected from the preselected 50000 pseudo-absences. This random selection is repeated NbRepPA-times to constitute the runs PA1, PA2, and so forth.

NbRunEval : as already explained in the Initial.State help file, the common trend is to split the original dataset into two subsets for the calibration and evaluation steps of the models construction. Here we provide the possibility to do multiple runs with this standard procedure (NbRunEval times) to have an evaluation on independent-like data, whilst building the final model on 100 percent of the given data. Since march 2009, all the models produced by BIOMOD and their related informations are saved on the hard disk. This is to enable the possibility of assessing the stochastic effect of the splitting procedure of the calibration step by being able to render projections with those models.

Yweights : When NbRepPA is set to a value higher than zero, some weights are awarded automatically. For all the runs across all the species, the weights are determined to obtain a prevalence of 0.5, meaning that the presences will have the same importance than the absences in the calibration process of the models.

No values are returned but several objects are created :

`Evaluation.results.Kappa`
`Evaluation.results.Roc`
`Evaluation.results.TSS`	a list of matrices (one for each run) storing the result of the evaluation procedures for Kappa / TSS / Roc for each model : the score of the model on the evaluation and calibration data, the threshold to be used for converting the probabilities into binary or filtered data, the sensitivity and specificity of the model-threshold combination
`VarImportance`	a list of matrices (one per species) containing the results of the importance of the variables analysis on the final model. The highest the value, the more influence the variable has on the model. A value of this 0 assumes no influence of that variable on the model. Note that this technique does not account for interactions between the variables.
`Models.information`	Crucial information for the models for rendering projections (but not of any interest for the user)
`Biomod.PA.data`	if NbRepPA>0, it stores the available data for each species after the inner `pseudo.abs` run.
`Biomod.PA.sample`	if NbRepPA>0, it stores the data that has been selected for each run from Biomod.PA.data for calibrating the models.

Additionnal objects are stored out of R in two different directories for memory storage purposes. They are created by the function directly on the root of your working directory set in R ("models" and "pred" directories). The first one contains the models, the second one the predictions on current data. Other objects will later be produced in the "pred" directory by other functions. Additionnal directories will be created by the the Projection function.

The models are currently stored as objects to be read exclusively in R. To load them back (the same stands for all objects stored on the hard disk) use the load function (see examples section below).

The predictions are stored as arrays (see the examples) with 4 dimensions. Please, see the "Biomod Manual.pdf" (inside the BIOMOD package) or the examples below for further details.

Wilfried Thuiller, Bruno Lafourcade, Robin Engler

Initial.State, Projection, CurrentPred, PredictionBestModel, response.plot, level.plot

## Not run: 
data(Sp.Env)
data(CoorXY)
#use fix(Sp.Env) to visualise the dataset

#This command is necessary for the run of BIOMOD as a new dataframe is produced for the Models function
Initial.State(Response=Sp.Env[,12:14], Explanatory=Sp.Env[,4:10], 
IndependentResponse=NULL, IndependentExplanatory=NULL)


#Here are done 2 PA runs and 2 repetitions for each run. Here we will have 36 runs per species. This will hence take several minutes.
## NOT RUN ON ALL MODELS
#Models(GLM = TRUE, TypeGLM = "quad", Test = "AIC", GBM = TRUE, No.trees = 3000, GAM = TRUE, CTA = TRUE, CV.tree = 50, 
#   ANN = TRUE, CV.ann = 2, SRE = TRUE, quant=0.05, FDA = TRUE, MARS = TRUE, RF = TRUE, NbRunEval = 2, DataSplit = 80,
#   Yweights=NULL, Roc=TRUE, Optimized.Threshold.Roc=TRUE, Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,
#   NbRepPA=2, strategy="circles", coor=CoorXY, distance=2, nb.absences=1000)

Models(GLM = FALSE, TypeGLM = "quad", Test = "AIC", GBM = FALSE, No.trees = 3000, GAM = FALSE, CTA = TRUE, CV.tree = 50, 
   ANN = FALSE, CV.ann = 2, SRE = FALSE, quant=0.05, FDA = TRUE, MARS = FALSE, RF = FALSE, NbRunEval = 2, DataSplit = 80,
   Yweights=NULL, Roc=TRUE, Optimized.Threshold.Roc=TRUE, Kappa=TRUE, TSS=TRUE, KeepPredIndependent = FALSE, VarImport=5,
   NbRepPA=2, strategy="circles", coor=CoorXY, distance=2, nb.absences=1000)


#view all the produced objects with ls()
ls()

#Take a look at the format in which the predictions are stored
load("pred/Pred_Sp290")
dim(Pred_Sp290)
dimnames(Pred_Sp290)[-1]

Pred_Sp290[1:20,,,1]
#please check the UserGuide for a full explanation of the structure of the array.
Biomod.Manual("BiomodTutorial")
Biomod.Manual("Biomod_Presentation_Manual")
Biomod.Manual("Biomod_Practical_Building_Models")

#You can load back a model and visualise it with the load() function :
load("models/Sp290_FDA_PA1")
Sp290_FDA_PA1

load("models/Sp290_FDA_PA1")
Sp290_FDA_PA1
str(Sp290_FDA_PA1)

#to see the results of the evaluation procedures
Evaluation.results.Kappa


#check which variable has the most impact on the models (only for the 100 percent model)
VarImportance


#This next function enables to plot the response curves of the model
load("models/Sp290_CTA_PA1")
response.plot(Sp290_CTA_PA1, Sp.Env[4:10])


#Another series of functions enable to perform further steps, see ?CurrentPred and ?PredictionBestModel.


## End(Not run)