In biomod2: Ensemble Platform for Species Distribution Modeling

BIOMOD_FormatingData

R Documentation

Format input data, and select pseudo-absences if wanted, for usage in biomod2

Description

This function gathers all the input data needed to run biomod2 models (XY coordinates, presences/absences, explanatory variables, and the same for evaluation data if available). If no absence data is available, it can also select pseudo-absences using one of several strategies (see Details).

Usage

BIOMOD_FormatingData(
  resp.name,
  resp.var,
  expl.var,
  dir.name = ".",
  resp.xy = NULL,
  eval.resp.var = NULL,
  eval.expl.var = NULL,
  eval.resp.xy = NULL,
  PA.nb.rep = 0,
  PA.nb.absences = 1000,
  PA.strategy = NULL,
  PA.dist.min = 0,
  PA.dist.max = NULL,
  PA.sre.quant = 0.025,
  PA.user.table = NULL,
  na.rm = TRUE,
  filter.raster = FALSE
)

Arguments

`resp.name`	a `character` corresponding to the species name
`resp.var`	a `vector`, a `SpatVector` without associated data (if presence-only), or a `SpatVector` object containing binary data (`0` : absence, `1` : presence, `NA` : indeterminate) for a single species that will be used to build the species distribution model(s). Note that older formats from sp are still supported, such as `SpatialPoints` (if presence-only) or a `SpatialPointsDataFrame` object containing binary data.
`expl.var`	a `matrix`, `data.frame`, `SpatVector` or `SpatRaster` object containing the explanatory variables (in columns or layers) that will be used to build the species distribution model(s). Note that older formats from raster and sp are still supported, such as `RasterStack` and `SpatialPointsDataFrame` objects.
`dir.name`	(optional, default `.`) A `character` corresponding to the modeling folder
`resp.xy`	(optional, default `NULL`) If `resp.var` is a `vector`, a 2-columns `matrix` or `data.frame` containing the corresponding `X` and `Y` coordinates that will be used to build the species distribution model(s)
`eval.resp.var`	(optional, default `NULL`) A `vector`, a `SpatVector` without associated data (if presence-only), or a `SpatVector` object containing binary data (`0` : absence, `1` : presence, `NA` : indeterminate) for a single species that will be used to evaluate the species distribution model(s) with independent data. Note that older formats from sp are still supported, such as `SpatialPoints` (if presence-only) or a `SpatialPointsDataFrame` object containing binary data.
`eval.expl.var`	(optional, default `NULL`) A `matrix`, `data.frame`, `SpatVector` or `SpatRaster` object containing the explanatory variables (in columns or layers) that will be used to evaluate the species distribution model(s) with independent data. Note that older formats from raster and sp are still supported, such as `RasterStack` and `SpatialPointsDataFrame` objects.
`eval.resp.xy`	(optional, default `NULL`) If `eval.resp.var` is a `vector`, a 2-column `matrix` or `data.frame` containing the corresponding `X` and `Y` coordinates that will be used to evaluate the species distribution model(s) with independent data
`PA.nb.rep`	(optional, default `0`) If pseudo-absence selection, an `integer` corresponding to the number of sets (repetitions) of pseudo-absence points that will be drawn
`PA.nb.absences`	(optional, default `1000`) If pseudo-absence selection, and `PA.strategy = 'random'` or `PA.strategy = 'sre'` or `PA.strategy = 'disk'`, an `integer` corresponding to the number of pseudo-absence points that will be selected for each pseudo-absence repetition (true absences included). It can also be a `vector` of the same length as `PA.nb.rep` containing `integer` values corresponding to the different numbers of pseudo-absences to be selected
`PA.strategy`	(optional, default `NULL`) If pseudo-absence selection, a `character` defining the strategy that will be used to select the pseudo-absence points. Must be `random`, `sre`, `disk` or `user.defined` (see Details)
`PA.dist.min`	(optional, default `0`) If pseudo-absence selection and `PA.strategy = 'disk'`, a `numeric` defining the minimal distance to presence points used to make the `disk` pseudo-absence selection (in meters, see Details)
`PA.dist.max`	(optional, default `NULL`) If pseudo-absence selection and `PA.strategy = 'disk'`, a `numeric` defining the maximal distance to presence points used to make the `disk` pseudo-absence selection (in meters, see Details)
`PA.sre.quant`	(optional, default `0.025`) If pseudo-absence selection and `PA.strategy = 'sre'`, a `numeric` between `0` and `0.5` defining the half-quantile used to make the `sre` pseudo-absence selection (see Details)
`PA.user.table`	(optional, default `NULL`) If pseudo-absence selection and `PA.strategy = 'user.defined'`, a `matrix` or `data.frame` with as many rows as `resp.var` values, as many columns as `PA.nb.rep`, and containing `TRUE` or `FALSE` values defining which points will be used to build the species distribution model(s) for each repetition (see Details)
`na.rm`	(optional, default `TRUE`) A `logical` value defining whether points having one or several missing values for explanatory variables should be removed from the analysis or not
`filter.raster`	(optional, default `FALSE`) If `expl.var` is of raster type, a `logical` value defining whether `resp.var` is to be filtered when several points occur in the same raster cell
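
The shape expected for `PA.user.table` (one row per `resp.var` value, one column per repetition, `TRUE`/`FALSE` entries) can be illustrated with a small base R sketch. The data, the number of repetitions and the number of candidates switched on per repetition are all hypothetical; this only shows one way to build such a table, not how biomod2 handles it internally.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical response: 1 = presence, NA = pseudo-absence candidate
my_resp_PA <- c(rep(1, 20), rep(NA, 80))
n_rep <- 2          # plays the role of PA.nb.rep
n_pa_per_rep <- 30  # candidates selected in each repetition (assumption)

# Start with presences TRUE in every repetition, everything else FALSE
is_presence <- !is.na(my_resp_PA) & my_resp_PA == 1
pa_table <- matrix(rep(is_presence, n_rep),
                   nrow = length(my_resp_PA), ncol = n_rep,
                   dimnames = list(NULL, paste0("PA", seq_len(n_rep))))

# Switch on a different random subset of candidates in each column
cand_idx <- which(is.na(my_resp_PA))
for (j in seq_len(n_rep)) {
  chosen <- cand_idx[sample.int(length(cand_idx), n_pa_per_rep)]
  pa_table[chosen, j] <- TRUE
}

# One row per resp.var value, one logical column per repetition
pa_table <- as.data.frame(pa_table)
```

A table built this way would be passed as `PA.user.table` together with `PA.strategy = 'user.defined'`.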

Details

This function gathers and formats all input data needed to run biomod2 models. It supports different kinds of inputs (e.g. matrix, SpatVector, SpatRaster) and provides different methods to select pseudo-absences if needed.

Concerning explanatory variables and XY coordinates :

if a SpatRaster, RasterLayer or RasterStack is provided for expl.var or eval.expl.var,
biomod2 will extract the corresponding values at the XY coordinates provided:
- either through resp.xy or eval.resp.xy respectively
- or through resp.var or eval.resp.var, if provided as SpatVector or SpatialPointsDataFrame
Be sure to give the objects containing XY coordinates in the same projection system as the raster objects!
if a data.frame or matrix is provided for expl.var or eval.expl.var,
biomod2 will simply merge it (cbind) with resp.var without considering XY coordinates.
Be sure to give explanatory and response values in the same row order!
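
Because tabular inputs are combined by position, it can help to align the rows explicitly before calling the function. The base R sketch below assumes both tables carry a hypothetical `site_id` column; biomod2 does not require or use such a column, it only serves here to reorder the explanatory table.

```r
# Hypothetical inputs: both tables carry a site identifier (an assumption)
resp_tab <- data.frame(site_id = c("s1", "s2", "s3", "s4"),
                       presence = c(1, 0, 1, NA))
expl_tab <- data.frame(site_id = c("s3", "s1", "s4", "s2"),
                       bio_3 = c(31, 28, 40, 35),
                       bio_12 = c(900, 650, 1200, 700))

# Reorder the explanatory table so that row i matches row i of the response
idx <- match(resp_tab$site_id, expl_tab$site_id)
stopifnot(!anyNA(idx))
expl_aligned <- expl_tab[idx, ]
rownames(expl_aligned) <- NULL

# Sanity checks before using the columns as resp.var / expl.var
stopifnot(nrow(expl_aligned) == nrow(resp_tab),
          identical(expl_aligned$site_id, resp_tab$site_id))

my_resp <- resp_tab$presence
my_expl <- expl_aligned[, c("bio_3", "bio_12")]
```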

Concerning pseudo-absence selection (see bm_PseudoAbsences) :

if both presence and absence data are available, and there are enough absences: set PA.nb.rep = 0 and no pseudo-absences will be selected.
if no absence data is available, several pseudo-absence repetitions are recommended (to estimate the effect of pseudo-absence selection), as well as a high number of pseudo-absence points.
Be sure not to select more pseudo-absence points than the maximum number of pixels in the studied area!
it is now possible to create several pseudo-absence repetitions with different numbers of points, but all repetitions share the same sampling strategy.
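
The two constraints above (one count per repetition, and no more points than available candidates) can be checked before formatting the data. This is a minimal base R sketch; `n_candidates` is a hypothetical figure standing in for the number of background pixels or NA points in the study area.

```r
# Hypothetical settings, mirroring the argument names of BIOMOD_FormatingData
PA.nb.rep <- 4
PA.nb.absences <- c(1000, 500, 500, 200)

# Assumed number of candidate background cells or points in the study area
n_candidates <- 2500

# Either a single value for all repetitions, or one value per repetition
stopifnot(length(PA.nb.absences) %in% c(1, PA.nb.rep))

# No repetition should ask for more points than there are candidates
too_many <- PA.nb.absences > n_candidates
if (any(too_many)) {
  warning("Repetition(s) ", paste(which(too_many), collapse = ", "),
          " request more pseudo-absences than available candidates")
}
```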

Response variable

biomod2 models a single species at a time (no multi-species). Hence, resp.var must be a uni-dimensional object (either a vector, a one-column matrix or data.frame, a SpatVector without associated data or a SpatialPoints (if presence-only), or a SpatVector or SpatialPointsDataFrame object containing binary data), containing values among:

1 : presences
0 : true absences (if any)
NA : no information point (might be used to select pseudo-absences if any)

If no true absences are available, pseudo-absence selection must be done.
If resp.var is a non-spatial object (vector, matrix or data.frame), XY coordinates must be provided through resp.xy.
If pseudo-absence points are to be selected, NA points must be provided in order to select pseudo-absences among them.
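
As an illustration of the coding described above, a presence/absence vector can be recoded so that every non-presence becomes an NA candidate. The vector below is hypothetical, and whether recorded absences should be discarded this way is a modelling choice, not something the function imposes.

```r
# Hypothetical presence/absence vector (1 = presence, 0 = absence)
my_resp <- c(1, 0, 0, 1, 1, 0, NA, 1)

# Treat everything that is not a presence as "no information",
# so that it becomes a pseudo-absence candidate
my_resp_PA <- ifelse(my_resp == 1, 1, NA)

# Count each category, keeping NA visible
table(my_resp_PA, useNA = "ifany")
n_presences  <- sum(my_resp_PA == 1, na.rm = TRUE)
n_candidates <- sum(is.na(my_resp_PA))
```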

Explanatory variables

Factorial (categorical) variables are allowed, but some pseudo-absence strategies or models may then be skipped (e.g. sre).

Evaluation data

Although biomod2 provides tools to automatically divide the dataset into calibration and validation parts during the modeling process (see the CV.[..] parameters of the BIOMOD_Modeling function, or the bm_CrossValidation function), it is also possible (and strongly advised) to directly provide two independent datasets, one for calibration/validation and one for evaluation.
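
To show how two datasets map onto the arguments, the base R sketch below holds out part of a hypothetical occurrence table. A random split is only a stand-in here: a genuinely independent survey is what the advice above has in mind, and the 30 % share is an arbitrary assumption.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical occurrence table
occ <- data.frame(x = runif(100, 0, 10),
                  y = runif(100, 40, 50),
                  presence = rbinom(100, 1, 0.4))

# Hold out 30 % of the rows as evaluation data
eval_rows <- sample(nrow(occ), size = round(0.3 * nrow(occ)))
calib_set <- occ[-eval_rows, ]
eval_set  <- occ[eval_rows, ]

# These pieces map onto the function arguments as follows:
#   resp.var      = calib_set$presence
#   resp.xy       = calib_set[, c("x", "y")]
#   eval.resp.var = eval_set$presence
#   eval.resp.xy  = eval_set[, c("x", "y")]
```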

Pseudo-absence selection (see bm_PseudoAbsences)

If no true absences are available, pseudo-absences must be selected from the background data, that is, data for which there is no information on whether the species of interest occurs or not. The background corresponds either to the remaining pixels of expl.var (if provided as a SpatRaster or RasterStack) or to the points identified as NA in resp.var (if expl.var is provided as a matrix or data.frame).
Several methods are available to do this selection :

random: all points of initial background are pseudo-absence candidates. PA.nb.absences are drawn randomly, for each PA.nb.rep requested.
sre: pseudo-absences have to be selected in conditions (combination of explanatory variables) that differ in a defined proportion (PA.sre.quant) from those of presence points. A Surface Range Envelope model is first run over the species of interest (see bm_SRE), and pseudo-absences are selected outside this envelope.
This case is appropriate when the species' entire climatic niche has been sampled, otherwise it may lead to over-optimistic model evaluations and predictions!
disk: pseudo-absences are selected within circles around presence points defined by the PA.dist.min and PA.dist.max distance values (in meters). This makes it possible to select pseudo-absence points that are neither too close to presences (avoiding the same niche and pseudo-replication) nor too far from them (localized sampling strategy).
user.defined: pseudo-absences are defined in advance and given as a data.frame through the PA.user.table parameter.
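
The ideas behind the sre and disk strategies can be sketched in base R. This is not the bm_PseudoAbsences implementation: it assumes a single explanatory variable, hypothetical projected coordinates in meters, plain Euclidean distances, and made-up distance thresholds, purely to show what "outside the envelope" and "within a distance band" mean.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical presences and background candidates: projected XY (meters)
# plus one explanatory variable
pres <- data.frame(x = runif(30, 0, 1000), y = runif(30, 0, 1000),
                   temp = rnorm(30, mean = 10, sd = 2))
cand <- data.frame(x = runif(500, 0, 5000), y = runif(500, 0, 5000),
                   temp = rnorm(500, mean = 12, sd = 5))

# sre-like filter: keep candidates outside the presence envelope
PA.sre.quant <- 0.025
envelope <- quantile(pres$temp, probs = c(PA.sre.quant, 1 - PA.sre.quant))
outside_env <- cand$temp < envelope[1] | cand$temp > envelope[2]

# disk-like filter: distance to the nearest presence within a band
PA.dist.min <- 200
PA.dist.max <- 2000
nearest <- vapply(seq_len(nrow(cand)), function(i) {
  min(sqrt((pres$x - cand$x[i])^2 + (pres$y - cand$y[i])^2))
}, numeric(1))
in_ring <- nearest >= PA.dist.min & nearest <= PA.dist.max

# Draw up to n_draw pseudo-absences among the retained candidates
draw <- function(keep, n) {
  idx <- which(keep)
  idx[sample.int(length(idx), min(n, length(idx)))]
}
n_draw <- 50
pa_sre  <- cand[draw(outside_env, n_draw), ]
pa_disk <- cand[draw(in_ring, n_draw), ]
```

With several explanatory variables, the envelope test would be repeated per variable, and a candidate would count as inside only if it falls within the range of every variable.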

Value

A BIOMOD.formated.data object that can be used to build species distribution model(s) with the BIOMOD_Modeling function.
print/show, plot and summary functions are available to inspect the created object.

Author(s)

Damien Georges, Wilfried Thuiller

Examples

library(biomod2)
library(terra)

# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)

# Select the name of the studied species
myRespName <- 'GuloGulo'

# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])

# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]

# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)



# ---------------------------------------------------------------#
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
myBiomodData
summary(myBiomodData)
plot(myBiomodData)


# ---------------------------------------------------------------#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
# 
# # Format Data with pseudo-absences : random method
# myBiomodData.r <- BIOMOD_FormatingData(resp.var = myResp.PA,
#                                        expl.var = myExpl,
#                                        resp.xy = myRespXY,
#                                        resp.name = myRespName,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 1000,
#                                        PA.strategy = 'random')
# 
# # Format Data with pseudo-absences : disk method
# myBiomodData.d <- BIOMOD_FormatingData(resp.var = myResp.PA,
#                                        expl.var = myExpl,
#                                        resp.xy = myRespXY,
#                                        resp.name = myRespName,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 500,
#                                        PA.strategy = 'disk',
#                                        PA.dist.min = 5,
#                                        PA.dist.max = 35)
# 
# # Format Data with pseudo-absences : SRE method
# myBiomodData.s <- BIOMOD_FormatingData(resp.var = myResp.PA,
#                                        expl.var = myExpl,
#                                        resp.xy = myRespXY,
#                                        resp.name = myRespName,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 1000,
#                                        PA.strategy = 'sre',
#                                        PA.sre.quant = 0.025)
# 
# # Format Data with pseudo-absences : user.defined method
# myPAtable <- data.frame(PA1 = ifelse(myResp == 1, TRUE, FALSE),
#                         PA2 = ifelse(myResp == 1, TRUE, FALSE))
# for (i in 1:ncol(myPAtable)) myPAtable[sample(which(myPAtable[, i] == FALSE), 500), i] = TRUE
# myBiomodData.u <- BIOMOD_FormatingData(resp.var = myResp.PA,
#                                        expl.var = myExpl,
#                                        resp.xy = myRespXY,
#                                        resp.name = myRespName,
#                                        PA.strategy = 'user.defined',
#                                        PA.user.table = myPAtable)
# 
# myBiomodData.r
# myBiomodData.d
# myBiomodData.s
# myBiomodData.u
# plot(myBiomodData.r)
# plot(myBiomodData.d)
# plot(myBiomodData.s)
# plot(myBiomodData.u)


# ---------------------------------------------------------------#
# # Select multiple sets of pseudo-absences
#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
# 
# # Format Data with pseudo-absences : random method
# myBiomodData.multi <- BIOMOD_FormatingData(resp.var = myResp.PA,
#                                            expl.var = myExpl,
#                                            resp.xy = myRespXY,
#                                            resp.name = myRespName,
#                                            PA.nb.rep = 4,
#                                            PA.nb.absences = c(1000, 500, 500, 200),
#                                            PA.strategy = 'random')
# myBiomodData.multi
# summary(myBiomodData.multi)
# plot(myBiomodData.multi)

biomod2 documentation built on June 22, 2024, 10:56 a.m.