calculate_bias: Evaluating Sampling Bias in Species Distribution Data

View source: R/calculate_bias.R

calculate_biasR Documentation

Evaluating Sampling Bias in Species Distribution Data

Description

The major function of the package, calculating the bias effect of sampling bias due to geographic structures, such as the vicinity to cities, airports, rivers and roads. Results are projected to space, and can be compared numerically.

Usage

calculate_bias(
  x,
  gaz = NULL,
  res = 1,
  buffer = NULL,
  restrict_sample = NULL,
  terrestrial = TRUE,
  inp_raster = NULL,
  mcmc_rescale_distances = 1000,
  mcmc_iterations = 1e+05,
  mcmc_burnin = 20000,
  mcmc_outfile = NULL,
  prior_q = c(1, 0.01),
  prior_w = c(1, 1),
  plot_raster = FALSE,
  verbose = FALSE,
  run_null_model = FALSE,
  use_hyperprior = TRUE
)

Arguments

x

an object of the class data.frame, with one species occurrence record per line, and at least three columns, named ‘species’, ‘decimalLongitude’, and ‘decimalLatitude’.

gaz

a list of geographic gazetteers as SpatVector or sf. If NULL, a set of default gazetteers, representing large scale occurrence of airports, cities, rivers, and roads is used. See Details.

res

numerical. The raster resolution for the distance calculation to the geographic features and the data visualization, in decimal degrees. The default is to one degree, but higher resolution will be desirable for most analyses. res together with the extent of the input data determine computation time and memory requirements.

buffer

numerical. The size of the geographic buffer around the extent of ras for the distance calculations in degrees, to account for geographic structures neighbouring the study area (such as a road right outside the study area). Should be a multiple of res. Default is to res * 10. See Details.

restrict_sample

a SpatVector object. If provided the area for the bias test will be restricted to raster cells within these polygons (and the extent of the sampled points in x). Make sure to use adequate values for res. Default = NULL.

terrestrial

logical. If TRUE, the empirical distribution (and the output maps) are restricted to terrestrial areas. Uses the rnaturalearth:::ne_countries to define what is terrestrial. Default = TRUE.

inp_raster

an object of class SpatRaster. A template raster for the counts and distance calculation. Can be used to provide a special resolution, or for different coordinate reference systems. See vignette.

mcmc_rescale_distances

numerical. rescaling factor for the distance calculation

mcmc_iterations

numerical. the number of iterations for the MCMC, by default 100,000

mcmc_burnin

numerical. the burn-in for the MCMC, default is to 20,000

mcmc_outfile

character string. the path on where to write the results of the MCMC, optional.

prior_q

the gamma prior for the sampling rate $q$, which represents the expected number of occurrences per cell in the absence of biases. In the format c(shape,rate).

prior_w

the gamma prior for the steepness of the Poisson rate decline, such that w approximating 0 results in a null model of uniform sampling rate q across cells. In the format c(shape,rate).

plot_raster

logical. If TRUE, a plot of the occurrence raster is shown for diagnostic purposes. Default = FALSE

verbose

logical. If TRUE, progress is reported. Default = FALSE.

run_null_model

logical. Run a null model with bias weights set to zero.

use_hyperprior

logical. If TRUE a hyperprior on the bias weights is used for regularization to avoid over-parametrization.

Details

The default gazetteers delivered with the package are simplified from http://www.naturalearthdata.com/downloads/. They include major features, and for small scale analyses custom gazetteers should be used.

For computational convenience, the gazetteers are cropped to the extent of the point occurrence data sets. To account for the fact, that, relevant structures might lay directly outside this extent, but still influencing the distribution of samples in the study area, the buffer option, gives the area, around the extent that should be included in the distance calculation.

Visit https://github.com/azizka/sampbias/wiki for more information on distance calculation and the algorithm behind sampbias.

Value

An object of the S3-class ‘sampbias’, which is a list including the following objects:

summa

A list of summary statistics for the sampbias analyses, including the total number of occurrence points in x, the total number of species in x, the extent of the output rasters as well as the settings for res, binsize, and convexhull used in the analyses.

occurrences

a SpatRaster indicating occurrence records per grid cell, with resolution res.

species

a SpatRaster with indicating the number of species per grid cell, with resolution res.

biasmaps

a list of SpatRaster, with the same length as gaz. Each element is the spatial projection of the bias effect for a sources of bias in gaz. The last raster in the list is the average over all bias sources.

biastable

a data.frame, with the estimated bias effect for each bias source in gaz, at the distances specified by biasdist.

Note

Check https://github.com/azizka/sampbias/wiki for a tutorial on sampbias.

See Also

summary.sampbias is.sampbias plot.sampbias

Examples

## Not run: 
  #simulate data
  x <- data.frame(species = rep(sample(x = LETTERS, size = 5), times = 20),
                   decimalLongitude = runif(n = 100, min = 0, max = 20),
                   decimalLatitude = runif(n = 100, min = -4, max = 4))

  out <- calculate_bias(x, terrestrial = TRUE, buffer = 0)
  summary(out)
  plot(out)
  
  
  

## End(Not run)

azizka/sampbias documentation built on Feb. 1, 2024, 7:51 p.m.