s2bak: Build sightings-only or S2 species distribution models for...

s2bak R Documentation

Build sightings-only or S2 species distribution models for multiple species.

Description

The fit.s2bak.so function fits an SDM for each provided species within the same system, using a specified SDM approach (the default is a GAM from the mgcv package). The SDMs can be fitted in parallel; the default is 1 core.

The fit.s2bak.s2 function fits SDMs using species sightings, background sites and survey sites, differentiating between them using a binary survey_var predictor, denoting sightings-only (1) or survey (0).
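As a rough illustration of the binary predictor described above, the sketch below stacks sightings/background sites and survey sites into one data.frame with an "so" column. The covariate name (temp) and the values are made up, and this is not the package's internal code; the package creates the column itself.

```r
# Hypothetical toy covariates; names and values are illustrative only.
sightings_sites <- data.frame(temp = c(10.2, 12.5, 9.8), so = 1) # sightings and background
survey_sites <- data.frame(temp = c(11.0, 13.1), so = 0)         # survey sites

# One combined dataset in which 'so' separates the two data sources.
combined <- rbind(sightings_sites, survey_sites)
table(combined$so)
```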

Storing SDMs in the output may be memory intensive, particularly with large datasets and many species. To reduce memory issues, readout can be combined with version = "short": the fitted models are then not returned in the output but are instead saved to the directory specified in readout.

The fit.s2bak.bak function fits a bias-adjustment kernel (BaK) for a fitted sightings-only SDM. It provides three models: location bias, species bias and the final bias-adjustment kernel. The user can specify the modelling function for the species and location biases, while the final bias-adjustment kernel is a generalized linear model (glm) that combines the model predictions with the output from the other two models.

The fit.s2bak function builds S2BaK from top to bottom: it fits SO models for all species, S2 models for species with survey data, and a BaK model for adjusted predictions. Parallelization is supported, but the default is 1 core.

Assumes that all columns/variables in 'data' are relevant for the location bias model.

Usage

fit.s2bak.s2(
  formula,
  data_obs,
  data_surv = NA,
  obs,
  surv = NA,
  sdm.fun,
  background = NA,
  nbackground = 10000,
  overlapBackground = TRUE,
  survey_var = "so",
  addSurvey = FALSE,
  index = NA,
  ncores = 1,
  readout = NA,
  version = c("full", "short")[1],
  ...
)

fit.s2bak.so(
  formula,
  data_obs,
  obs,
  sdm.fun,
  background = NA,
  nbackground = 10000,
  overlapBackground = TRUE,
  index = NA,
  ncores = 1,
  readout = NA,
  version = c("full", "short")[1],
  ...
)

fit.s2bak.bak(
  formula_site,
  formula_species,
  predictions,
  data_surv,
  surv,
  trait,
  bak.fun,
  predict.bak.fun,
  truncate = c(1e-04, 0.9999),
  index = NA,
  bak.arg = list()
)

fit.s2bak(
  formula,
  formula_survey = NA,
  formula_site,
  formula_species,
  data_obs,
  data_surv = NA,
  obs,
  surv = NA,
  trait,
  sdm.fun,
  predict.fun,
  bak.fun,
  predict.bak.fun,
  truncate = c(1e-04, 0.9999),
  background = NA,
  nbackground = 10000,
  overlapBackground = TRUE,
  bak.arg = list(),
  addSurvey = TRUE,
  index = NA,
  ncores = 1,
  readout = NA,
  version = c("full", "short")[1],
  ...
)

Arguments

formula

Formula for the model functions. Assumes the structure follows "Y ~ X". Alternatively, a named list of formulas can be provided, with names corresponding to species names. In this case, each species is fitted using its corresponding formula. The response variable can have any name, as the function names the column accordingly.

For s2bak.s2 models, the survey_var should be specified in the formula. Otherwise, addSurvey can be set to TRUE to add it as an additional predictor.
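A named list of formulas, as described above, might look like the following sketch. The species names (sp_a, sp_b) and covariates (temp, precip) are hypothetical, and the lookup only illustrates the idea of matching a formula to a species by name.

```r
# Hypothetical species names and covariates.
formulas <- list(
  sp_a = pres ~ temp + precip,
  sp_b = pres ~ temp
)

# Pick the formula for one species by its name.
species <- "sp_b"
f <- formulas[[species]]
all.vars(f)
```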

data_obs

A data.frame containing the covariates used for fitting s2bak.so and s2bak.s2 models with sightings data. The index of the data.frame linking sites to observations should correspond to the indices in obs.

data_surv

A data.frame containing the covariates used for fitting s2bak.s2 models with survey data. The index of the data.frame linking sites to survey presences should correspond to the indices in surv. Default is NA, as survey data is not necessary to fit s2bak.so models.

obs

A data.frame of species observations, with a column for species name (must be labelled 'species') and a column of site indices marking presences. If the index column name is not found in 'data', row number is assumed.

surv

A data.frame of species presences for the survey data used to fit s2bak.s2 models (optional otherwise), with a column for species name (must be labelled 'species') and a column of site indices marking presences. If the index column name is not found in 'data', row number is assumed. An additional binary predictor, so, is added to the formula(s), denoting whether a site is sightings-only (1) or survey data (0). If left as NA, the SDMs are fitted as presence-only models with the function of choice.

sdm.fun

Model (as function) used for fitting. The function must take the formula as its first argument, and 'data' as the parameter for the dataset (including presences and background sites within the data.frame).
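A function meeting these requirements could be a thin wrapper such as the sketch below. The wrapper name sdm_glm and the toy data are hypothetical; the only points being illustrated are that the formula comes first and that the dataset is passed as 'data'.

```r
# Hypothetical wrapper: formula first, dataset passed as 'data',
# extra arguments forwarded through '...'.
sdm_glm <- function(formula, data, ...) {
  stats::glm(formula, data = data, family = stats::binomial(), ...)
}

# Toy presence (1) / background (0) data with overlapping temperatures.
toy <- data.frame(
  pres = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  temp = c(8, 9, 10, 11, 12, 9, 10, 11, 12, 13)
)
fit <- sdm_glm(pres ~ temp, data = toy)
coef(fit)
```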

background

Background sites (pseudo-absences) used to fit the presence-only model, provided as a vector of indices of data (following the same column name as observations). If the index column name is not found in 'data', it assumes row number within 'data'. If left as NA, it will randomly sample 'nbackground' sites, with or without overlap ('overlapBackground'). Currently, only one set of background sites can be used.

nbackground

Number of background sites to sample. Only applies if background = NA.

overlapBackground

Whether sampled background sites that overlap with observations should be included. By default, it allows overlap. If FALSE, number of background sites may be less than specified or provided.
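The interaction between nbackground and overlapBackground can be pictured with the sketch below. It assumes background sites are drawn first and overlapping ones dropped afterwards, which would explain why fewer sites than requested can remain; the package's actual sampling code may differ, and the site counts are made up.

```r
set.seed(1)
n_sites <- 100                # hypothetical number of candidate sites
presence_idx <- c(3, 17, 42)  # hypothetical rows with sightings
nbackground <- 20
overlapBackground <- FALSE

# Draw background sites by row number.
bg <- sample(seq_len(n_sites), nbackground)

# Assumption: without overlap, sampled sites that coincide with
# presences are removed, so fewer than nbackground may remain.
if (!overlapBackground) bg <- setdiff(bg, presence_idx)
length(bg)
```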

survey_var

Character name of the predictor variable indicating whether a site is sightings-only (1) or survey data (0); the default is "so". The column is created automatically within the function and is used with the formulas.

addSurvey

Whether the binary variable survey_var should be added to the formula(s). If survey data is not provided, or if survey_var is already in the formula, then survey_var will not be added to the formula(s) even if addSurvey = TRUE. If there is survey data and addSurvey = FALSE, then survey_var will not be added and a warning is thrown.
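Adding a predictor to an existing formula only when it is missing can be done in base R with update(). The helper below is a hypothetical sketch of that logic, not the package's implementation.

```r
# Hypothetical helper: append survey_var unless the formula already uses it.
add_survey_var <- function(f, survey_var = "so") {
  if (survey_var %in% all.vars(f)) {
    return(f)
  }
  update(f, as.formula(paste(". ~ . +", survey_var)))
}

add_survey_var(pres ~ temp)       # 'so' is appended
add_survey_var(pres ~ temp + so)  # returned unchanged
```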

index

Name of the column used to link the environmental data.frame with the species sightings/survey data. If left as index = NA, row number is assumed.
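The linkage between an observation table and the covariate table can be sketched as follows. The column name "site", the species names and the values are hypothetical; the example only shows how a shared index column maps presences to covariate rows.

```r
# Hypothetical covariate table and observation table sharing a 'site' column.
data_obs <- data.frame(site = c("s1", "s2", "s3"), temp = c(9.5, 11.2, 12.8))
obs <- data.frame(
  species = c("sp_a", "sp_a", "sp_b"),
  site = c("s1", "s3", "s2")
)

# Rows of data_obs where sp_a was sighted, matched through the index column.
rows <- match(obs$site[obs$species == "sp_a"], data_obs$site)
data_obs[rows, ]
```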

ncores

Number of cores used to fit the SDMs. The default is 1 core; if ncores = NA, the number is set automatically. If ncores exceeds the number of available cores minus 1, it is set to the latter.
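The capping rule can be written out as the sketch below. The helper name resolve_ncores is hypothetical, and the choice of "available cores minus 1" for ncores = NA is an assumption based on the description above.

```r
# Hypothetical helper mirroring the described rule.
resolve_ncores <- function(ncores = 1) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 2  # detectCores() can return NA
  max_cores <- max(1, available - 1)
  # Assumption: NA means "use as many cores as the cap allows".
  if (is.na(ncores)) {
    return(max_cores)
  }
  min(ncores, max_cores)
}

resolve_ncores(1)
resolve_ncores(NA)
```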

readout

Directory to save fitted SDMs and background sites. If NA, it will not save any SDMs. Provides an additional output that shows where the SDM is saved (with file name). The output in this directory can later be used in other functions such as predict.s2bak.s2.

version

Whether the SDMs should be included in the output. With "short", the fitted SDMs are not provided. Setting to "full" (default) will output the list with all SDMs. Setting to "short" in combination with readout can considerably reduce RAM usage while saving the progress so far, which is useful when dealing with many species or large datasets.
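The memory-saving pattern described for readout and version = "short" amounts to writing each fitted model to disk and keeping only its path. The sketch below shows that idea with base R; the file name, the .rds format and the stand-in model are assumptions, not the package's actual storage layout.

```r
# Hypothetical output directory inside the session's temp folder.
readout <- file.path(tempdir(), "s2bak_sketch")
dir.create(readout, showWarnings = FALSE)

# Stand-in for a fitted SDM, using a built-in dataset.
fit <- lm(dist ~ speed, data = cars)

# Save the model, keep only the path, and free the memory.
path <- file.path(readout, "sp_a.rds")  # hypothetical file name
saveRDS(fit, path)
rm(fit)

# Reload later, e.g. when predictions are needed.
restored <- readRDS(path)
```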

...

Other arguments that are passed to the SDM function (sdm.fun).

formula_site

Formula for fitting survey site bias, with locational bias as a function of spatial predictions. The response variable, bias, is generated and therefore its variable name can be anything.

formula_species

Formula for fitting species bias, with species bias as a function of species traits. The response variable is generated and therefore its name can be anything.

predictions

Sightings-only (SO) model predictions over the survey sites for all species, beyond those found in the survey data, as a matrix with columns for each species and rows for each site.

trait

Full trait data for the species predictions, as a data.frame with 'species' as a column and relevant traits for the remainder. Like with the predictions, the species in the dataset do not necessarily have to possess survey data, but will be used in the final adjustment model as final output.

bak.fun

Model function for fitting bias adjustment model (e.g., glm).

predict.bak.fun

Model function for predicting bias adjustment model (e.g., predict.glm). Needs to match bak.fun.

truncate

Numeric minimum and maximum range of predicted values. Values very close to zero or one cannot be meaningfully distinguished; however, these extreme values may have disproportionately large consequences on likelihoods due to the logit transformation.
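The effect of truncation before a logit transformation can be seen in the short sketch below, which uses the default bounds from the usage section. The probability values are made up, and clamping with pmin()/pmax() is one plausible way to apply the bounds rather than the package's confirmed method.

```r
truncate <- c(1e-04, 0.9999)
p <- c(0, 1e-8, 0.5, 1)  # hypothetical predicted probabilities

# Clamp predictions into [truncate[1], truncate[2]].
p_trunc <- pmin(pmax(p, truncate[1]), truncate[2])

qlogis(p)        # infinite at exactly 0 and 1
qlogis(p_trunc)  # finite everywhere after truncation
```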

bak.arg

Additional arguments for bak.fun.
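A matching pair of bak.fun and predict.bak.fun, together with extra arguments in bak.arg, might be set up as in the sketch below. The toy bias values, the elev covariate and the use of do.call() to pass bak.arg are assumptions made for illustration; how the package forwards bak.arg internally is not stated here.

```r
# A fitting function and its matching prediction function.
bak.fun <- stats::glm
predict.bak.fun <- stats::predict.glm
bak.arg <- list(family = stats::gaussian())

# Hypothetical site-level bias values and one covariate.
toy <- data.frame(
  bias = c(0.10, 0.40, 0.35, 0.80, 0.90),
  elev = c(100, 200, 300, 400, 500)
)

# Assumption: extra arguments are appended to the fitting call.
fit <- do.call(bak.fun, c(list(formula = bias ~ elev, data = toy), bak.arg))
pred <- predict.bak.fun(fit, newdata = data.frame(elev = 250))
```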

formula_survey

For fit.s2bak, a separate formula can be specified for fit.s2bak.s2. If left as NA, formula is used.

predict.fun

Prediction function for the SDM, which must match the model function used for s2bak.s2 and s2bak.so models.

Value

An object of class "s2bak.s2", providing fitted SDMs for each species based on the provided SDM modelling approach. The primary difference between SO and S2 models is the additional data points from the survey data, and an additional binary predictor 'so' which denotes whether the data are from presence-background (1) or presence-absence data (0).

Bias adjustment models, the kernels (location and species), as a second-order GLM.

An S2BaK class object containing S2, SO and BaK.


leung-lab/s2bak documentation built on March 1, 2023, 9:10 a.m.