mopaTrain: Easy species distribution modeling and cross validation
In mopa: Species Distribution MOdeling with Pseudo-Absences


Species distribution modeling and k-fold cross validation for a set of presence/absence data per species, optionally considering different background extents. Supported algorithms are "glm", "svm", "maxent", "mars", "rf", "cart.rpart" and "cart.tree".

mopaTrain(y, x, k = 10, algorithm = c("glm", "svm", "maxent", "mars", "rf",
  "cart.rpart", "cart.tree"), algorithm.args = NULL, weighting = FALSE,
  threshold = NULL, diagrams = FALSE, tuneRF.args = NULL)

`y`	Object returned by function `pseudoAbsences`, or a data frame or list(s) of data frames with coordinates in the first two columns and presence/absence (1 = presence, 0 = absence) in the third column.
`x`	RasterStack or list of RasterStacks of variables for modeling, a.k.a. baseline environment/climatology.
`k`	Integer. Number of folds for cross validation. Default is 10.
`algorithm`	Character string of the algorithms for modeling. Options are the following: "glm", "svm", "maxent", "mars", "rf", "cart.rpart" and "cart.tree" (see Details).
`algorithm.args`	Further arguments to be passed to the selected algorithm for modeling (the functions involved are described in Details).
`weighting`	Logical for model fitting with weighted presence/absences. Applicable for algorithms "glm", "mars", "rf", "cart.tree" and "cart.rpart". Default is FALSE. Processing time increases considerably if the weighting option is selected with the "mars" algorithm (see `earth`).
`threshold`	Cut value between 0 and 1 to calculate the confusion matrix. Default is NULL (see Details).
`diagrams`	Logical. Only applied if `y` contains data for different background extents (see `backgroundRadius` and `pseudoAbsences`). Should diagrams of AUC extent fitting be printed? Default is FALSE.
`tuneRF.args`	List of arguments from function `tuneRF`. Only used when algorithm = "rf".

This function calculates the AUC with the function auc from package PresenceAbsence. Note: Package SDMTools must be detached.

If threshold is not specified, the value that maximizes the TSS (true skill statistic) is used to calculate the confusion matrix.
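
The threshold rule above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the data are synthetic, and the names `tss_at`, `cuts` and `best_cut` are hypothetical. TSS is taken here as sensitivity + specificity - 1.

```r
# Synthetic, illustrative data: observed presence/absence and predicted probabilities
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
pred <- c(0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1, 0.6, 0.55)

# TSS = sensitivity + specificity - 1 for a given cut value (hypothetical helper)
tss_at <- function(cut, obs, pred) {
  pa   <- as.integer(pred >= cut)
  sens <- sum(pa == 1 & obs == 1) / sum(obs == 1)
  spec <- sum(pa == 0 & obs == 0) / sum(obs == 0)
  sens + spec - 1
}

# Evaluate TSS over a grid of candidate cut values and keep the best one
cuts     <- seq(0, 1, by = 0.01)
tss      <- vapply(cuts, tss_at, numeric(1), obs = obs, pred = pred)
best_cut <- cuts[which.max(tss)]

# Confusion matrix at the selected threshold
conf_mat <- table(observed = obs, predicted = as.integer(pred >= best_cut))
```

When several cut values tie for the maximum TSS, `which.max` returns the first one; how mopaTrain resolves ties is not stated here.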

If y contains data for different background extents (see backgroundRadius and pseudoAbsences), mopaTrain performs the species distribution modeling for each background extent and fits the resulting AUCs (one per extent) to three non-linear models (Michaelis-Menten, exponential2 and exponential3). The model with the lowest error is automatically selected to extract the Vm coefficient (equation 1 in Iturbide et al., 2015). The minimum extent at which the AUC surpasses the Vm value is then selected as the threshold extent (see Figure 3 in Iturbide et al., 2015), and the SDM fitted at that extent is the one returned by mopaFitting. If argument diagrams is set to TRUE, a fitted model plot (as in Fig. 3 in Iturbide et al., 2015) is printed in the plotting environment.
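
As a rough sketch of the extent-selection idea, the following fits only the Michaelis-Menten curve, assuming its standard form AUC = Vm * extent / (K + extent), using the self-starting model `SSmicmen` from package stats. The extents and AUC values are synthetic, the variable names are hypothetical, and the two exponential models and the error-based model choice are omitted.

```r
# Synthetic, illustrative values: background extents (km) and one AUC per extent
extent <- seq(10, 300, by = 10)
auc    <- 0.9 * extent / (4 + extent) + 0.01 * sin(seq_along(extent))

# Michaelis-Menten fit with the self-starting model from package stats
fit <- nls(auc ~ SSmicmen(extent, Vm, K))
vm  <- coef(fit)[["Vm"]]

# Smallest extent whose observed AUC exceeds the asymptote Vm (NA if none does)
hits <- which(auc > vm)
threshold_extent <- if (length(hits) > 0) min(extent[hits]) else NA_real_
```

Setting diagrams = TRUE in mopaTrain prints the corresponding fitted curve; a quick manual equivalent for this sketch would be `plot(extent, auc)` followed by `lines(extent, fitted(fit))`.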

mopaTrain uses the algorithm implementations of the following functions and R packages:

"mars" function earth from package earth
"rf" function ranger from package ranger
"maxent" function maxent from package dismo
"cart.rpart" function rpart from package rpart
"svm" function best.svm from package e1071
"cart.tree" function tree from package tree
"glm" function glm from package stats

For example, when applying "glm", further arguments from function glm can be passed to mopaTrain by using algorithm.args.

A list of six components is returned for each species in y:

$model fitted model using all data for training
$auc AUC statistic in the cross validation
$kappa kappa statistic in the cross validation
$tss true skill statistic in the cross validation
$fold.models fitted models of each data partition for cross validation
$ObsPred cross model prediction (e.g. for further assessment of model accuracy)
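
How a full-data model, per-fold models and pooled cross-validated predictions relate to each other can be sketched in base R. This is a simplified illustration with synthetic data and hypothetical names (`fold_models`, `obs_pred`, `auc_cv`), not the package's implementation; in particular, the AUC is computed here with the rank-based (Mann-Whitney) formula rather than with auc from package PresenceAbsence.

```r
set.seed(1)

# Synthetic presence/absence data with two predictors
n   <- 200
dat <- data.frame(v1 = rnorm(n), v2 = rnorm(n))
dat$pa <- rbinom(n, 1, plogis(1.5 * dat$v1 - dat$v2))

# Random assignment of each record to one of k folds
k    <- 10
fold <- sample(rep(seq_len(k), length.out = n))

# Model fitted with all data (analogous to $model)
model <- glm(pa ~ v1 + v2, family = binomial, data = dat)

# One model per partition (analogous to $fold.models) and
# out-of-fold predictions (analogous to $ObsPred)
fold_models <- vector("list", k)
obs_pred    <- data.frame(obs = dat$pa, pred = NA_real_)
for (i in seq_len(k)) {
  test <- fold == i
  fold_models[[i]] <- glm(pa ~ v1 + v2, family = binomial, data = dat[!test, ])
  obs_pred$pred[test] <- predict(fold_models[[i]], newdata = dat[test, ],
                                 type = "response")
}

# Rank-based AUC on the pooled out-of-fold predictions
r  <- rank(obs_pred$pred)
n1 <- sum(obs_pred$obs == 1)
n0 <- sum(obs_pred$obs == 0)
auc_cv <- (sum(r[obs_pred$obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```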

M. Iturbide

Iturbide, M., Bedia, J., Herrera, S., del Hierro, O., Pinto, M., Gutierrez, J.M., 2015. A framework for species distribution modelling with improved pseudo-absence generation. Ecological Modelling. DOI:10.1016/j.ecolmodel.2015.05.018.

mopaPredict, pseudoAbsences, backgroundGrid, OCSVMprofiling, backgroundRadius, extractFromModel

## Load presence data
data(Oak_phylo2)

## Load climate data
destfile <- tempfile()
data.url <- "https://raw.githubusercontent.com/SantanderMetGroup/mopa/master/data/biostack.rda"
download.file(data.url, destfile)
load(destfile, verbose = TRUE)

## Spatial reference
r <- biostack$baseline[[1]]
## Create background grid
bg <- backgroundGrid(r)

## Generate pseudo-absences
RS_random <- pseudoAbsences(xy = Oak_phylo2, background = bg$xy,
                            exclusion.buffer = 0.083*5, prevalence = -0.5, kmeans = FALSE)
## Model training
fittedRS <- mopaTrain(y = RS_random, x = biostack$baseline, 
                      k = 10, algorithm = "glm", weighting = TRUE)
## Extract fitted models
mods <- extractFromModel(models = fittedRS, value = "model")