View source: R/BIOMOD_FormatingData.R
BIOMOD_FormatingData | R Documentation |
This function gathers together all input data needed (xy, presences/absences, explanatory variables, and the same for evaluation data if available) to run biomod2 models. It allows to select pseudo-absences if no absence data is available, with different strategies (see Details).
BIOMOD_FormatingData(
resp.name,
resp.var,
expl.var,
dir.name = ".",
resp.xy = NULL,
eval.resp.var = NULL,
eval.expl.var = NULL,
eval.resp.xy = NULL,
PA.nb.rep = 0,
PA.nb.absences = 1000,
PA.strategy = NULL,
PA.dist.min = 0,
PA.dist.max = NULL,
PA.sre.quant = 0.025,
PA.user.table = NULL,
na.rm = TRUE,
filter.raster = FALSE
)
resp.name |
a |
resp.var |
a |
expl.var |
a |
dir.name |
(optional, default |
resp.xy |
(optional, default |
eval.resp.var |
(optional, default |
eval.expl.var |
(optional, default |
eval.resp.xy |
(optional, default |
PA.nb.rep |
(optional, default |
PA.nb.absences |
(optional, default |
PA.strategy |
(optional, default |
PA.dist.min |
(optional, default |
PA.dist.max |
(optional, default |
PA.sre.quant |
(optional, default |
PA.user.table |
(optional, default |
na.rm |
(optional, default |
filter.raster |
(optional, default |
This function gathers and formats all input data needed to run biomod2 models. It
supports different kind of inputs (e.g. matrix
,
SpatVector
, SpatRaster
)
and provides different methods to select pseudo-absences if needed.
Concerning explanatory variables and XY coordinates :
if SpatRaster
, RasterLayer
or RasterStack
provided for expl.var
or eval.expl.var
,
biomod2 will extract
the corresponding values from XY coordinates provided :
either through resp.xy
or eval.resp.xy
respectively
or resp.var
or eval.resp.var
, if provided as
SpatVector
or SpatialPointsDataFrame
Be sure to give the objects containing XY coordinates in the same projection system than the raster objects !
if data.frame
or matrix
provided for expl.var
or
eval.expl.var
,
biomod2 will simply merge it (cbind
)
with resp.var
without considering XY coordinates.
Be sure to give explanatory and response values in the same row order !
Concerning pseudo-absence selection (see bm_PseudoAbsences
) :
if both presence and absence data are available, and there is enough absences :
set PA.nb.rep = 0
and no pseudo-absence will be selected.
if no absence data is available, several pseudo-absence repetitions
are recommended (to estimate the effect of pseudo-absence selection), as well as high
number of pseudo-absence points.
Be sure not to select more pseudo-absence points than maximum number of pixels in
the studied area !
it is possible now to create several pseudo-absence repetitions with different
number of points, BUT with the same sampling strategy.
biomod2 models single species at a time (no multi-species). Hence, resp.var
must be a uni-dimensional object (either a vector
, a one-column matrix
,
data.frame
, a SpatVector
(without associated
data - if presence-only), a SpatialPoints
(if presence-only),
a SpatialPointsDataFrame
or SpatVector
object),
containing values among :
1
: presences
0
: true absences (if any)
NA
: no information point (might be used to select pseudo-absences if any)
If no true absences are available, pseudo-absence selection must be done.
If resp.var
is a non-spatial object (vector
, matrix
or
data.frame
), XY coordinates must be provided through resp.xy
.
If pseudo-absence points are to be selected, NA
points must be provided in order to
select pseudo-absences among them.
Factorial variables are allowed, but might lead to some pseudo-absence strategy or models
omissions (e.g. sre
).
Although biomod2 provides tools to automatically divide dataset into calibration and
validation parts through the modeling process (see CV.[..]
parameters in
BIOMOD_Modeling
function ; or bm_CrossValidation
function
), it is also possible (and strongly advised) to directly provide two independent
datasets, one for calibration/validation and one for evaluation
bm_PseudoAbsences
)If no true absences are available, pseudo-absences must be selected from the
background data, meaning data there is no information whether the species of
interest occurs or not. It corresponds either to the remaining pixels of the expl.var
(if provided as a SpatRaster
or RasterSatck
)
or to the points identified as NA
in resp.var
(if expl.var
provided as a matrix
or data.frame
).
Several methods are available to do this selection :
all points of initial background are pseudo-absence candidates.
PA.nb.absences
are drawn randomly, for each PA.nb.rep
requested.
pseudo-absences have to be selected in conditions (combination of explanatory
variables) that differ in a defined proportion (PA.sre.quant
) from those of
presence points. A Surface Range Envelop model is first run over the species of
interest (see bm_SRE
), and pseudo-absences are selected outside this envelop.
This case is appropriate when all the species climatic niche has been sampled,
otherwise it may lead to over-optimistic model evaluations and predictions !
pseudo-absences are selected within circles around presence points defined by
PA.dist.min
and PA.dist.max
distance values (in meters). It allows to select
pseudo-absence points that are not too close to (avoid same niche and pseudo-replication)
or too far (localized sampling strategy) from presences.
pseudo-absences are defined in advance and given as data.frame
through the PA.user.table
parameter.
A BIOMOD.formated.data
object that can be used to build species distribution
model(s) with the BIOMOD_Modeling
function.
print/show
,
plot
and
summary
functions
are available to have a summary of the created object.
Damien Georges, Wilfried Thuiller
bm_PseudoAbsences
, BIOMOD_Modeling
Other Main functions:
BIOMOD_EnsembleForecasting()
,
BIOMOD_EnsembleModeling()
,
BIOMOD_LoadModels()
,
BIOMOD_Modeling()
,
BIOMOD_Projection()
,
BIOMOD_RangeSize()
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
# ---------------------------------------------------------------#
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespXY,
resp.name = myRespName)
myBiomodData
summary(myBiomodData)
plot(myBiomodData)
# ---------------------------------------------------------------#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
#
# # Format Data with pseudo-absences : random method
# myBiomodData.r <- BIOMOD_FormatingData(resp.var = myResp.PA,
# expl.var = myExpl,
# resp.xy = myRespXY,
# resp.name = myRespName,
# PA.nb.rep = 4,
# PA.nb.absences = 1000,
# PA.strategy = 'random')
#
# # Format Data with pseudo-absences : disk method
# myBiomodData.d <- BIOMOD_FormatingData(resp.var = myResp.PA,
# expl.var = myExpl,
# resp.xy = myRespXY,
# resp.name = myRespName,
# PA.nb.rep = 4,
# PA.nb.absences = 500,
# PA.strategy = 'disk',
# PA.dist.min = 5,
# PA.dist.max = 35)
#
# # Format Data with pseudo-absences : SRE method
# myBiomodData.s <- BIOMOD_FormatingData(resp.var = myResp.PA,
# expl.var = myExpl,
# resp.xy = myRespXY,
# resp.name = myRespName,
# PA.nb.rep = 4,
# PA.nb.absences = 1000,
# PA.strategy = 'sre',
# PA.sre.quant = 0.025)
#
# # Format Data with pseudo-absences : user.defined method
# myPAtable <- data.frame(PA1 = ifelse(myResp == 1, TRUE, FALSE),
# PA2 = ifelse(myResp == 1, TRUE, FALSE))
# for (i in 1:ncol(myPAtable)) myPAtable[sample(which(myPAtable[, i] == FALSE), 500), i] = TRUE
# myBiomodData.u <- BIOMOD_FormatingData(resp.var = myResp.PA,
# expl.var = myExpl,
# resp.xy = myRespXY,
# resp.name = myRespName,
# PA.strategy = 'user.defined',
# PA.user.table = myPAtable)
#
# myBiomodData.r
# myBiomodData.d
# myBiomodData.s
# myBiomodData.u
# plot(myBiomodData.r)
# plot(myBiomodData.d)
# plot(myBiomodData.s)
# plot(myBiomodData.u)
# ---------------------------------------------------------------#
# # Select multiple sets of pseudo-absences
#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
#
# # Format Data with pseudo-absences : random method
# myBiomodData.multi <- BIOMOD_FormatingData(resp.var = myResp.PA,
# expl.var = myExpl,
# resp.xy = myRespXY,
# resp.name = myRespName,
# PA.nb.rep = 4,
# PA.nb.absences = c(1000, 500, 500, 200),
# PA.strategy = 'random')
# myBiomodData.multi
# summary(myBiomodData.multi)
# plot(myBiomodData.multi)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.