To install sampbias, you can use the devtools
package:
require(devtools) install_github(repo = "azizka/sampbias")
library(sampbias) library(terra)
sampbias calculates spatial bias in a species occurrence data set based on two input files:
The example files for this tutorial are provided with the package, descriptions and help for all functions are accessible via ?Functionname
, for instance ?calculate_bias
.
Species occurrences can be provided as a data.frame
including a minimum of three columns named "species", "decimalLongitude", and "decimalLatitude". sampbias can also directly work with data downloaded from the Global Biodiversity Information Facility data portal (www.gbif.org). We will use the records of mammals from Borneo provided with the package (GBIF, 2016, doi.org/10.15468/dl.7fg4zx.) as example in this tutorial.
The input data can be easily loaded into R from a .txt. or .csv file, for instance using the read.csv()
function.
#loading a text file to R occ <- read.csv(system.file("extdata", "mammals_borneo.csv", package = "sampbias"), sep = "\t")
Note: sampbias evaluates the bias by comparing the observed records to a random sampling. This means that the tool is designed for multi-species data sets and not single species data sets where the distribution of records might reflect ecological preferences. In general, the more species and the more records, the more reliable the results will be, but the package is also capable of dealing with sparsely sampled data sets (see main text for simulation results).
sampbias evaluates the distribution of the sampled species occurrences in relation to geographic features that might bias sampling effort. These are usually related to accessibility or means of travel. sampbias includes a set of default gazetteers for cities, airports, roads and rivers (from https://www.naturalearthdata.com/), which can give an estimate of bias for large and medium scale analyses. For higher resolutions custom gazetteers will yield better results, and should be provided via the gaz
option of calculate_bias
function as a list of SpatVectors
objects. These can be easily loaded into R from standard shape files using the terra
package:
cit <- terra::vect(system.file("extdata/Borneo_major_cities.shp", package = "sampbias")) roa <- terra::vect(system.file("extdata/Borneo_major_roads.shp", package = "sampbias")) gazetteers <- list(cities = cit, roads = roa)
See ?terra::vect
on how to create a these object from a table of coordinates.
A sampbias analysis is run in one line of code via the calculate_bias
function:
bias.out <- calculate_bias(x = occ, gaz = gazetteers)
In addition to the input from above, calculate_bias
offers options to customize analyses, of which the most important ones are shown in Table 1. See ?calculate_bias
for a detailed description of all options, and the section "Changing default settings" of this tutorial for a description and examples for the "restricted sample" and "inp_raster" options. It is of special importance to adapt the raster resolution to the extent of the study area via the res
or inp_raster
options.
|Option | Description|
|---|---|
|res|the raster resolution for the distance calculation to the geographic features and the data visualization in decimal degrees. The default is to one degree, but higher resolution will be desirable for most analyses. res
together with the extent of the study area determine computation time and memory requirements.|
|restrict_sample|a SpatialPolygons
or SpatialPolygonsDataFrame
object. If provided the area for the bias test will be restricted to raster cells within these polygons (and the extent of the sampled points in x). Make sure to use adequate values for res
. Default = NULL.|
|terrestrial|logical. If TRUE, the empirical distribution (and the output maps) are restricted to terrestrial areas. Default = TRUE.|
|inp_raster|a SpatRaster
object. This option enables to directly provide the grid for the bias calculation, instead of specifying a resolution. This can be used to work with equal area grids or other coordinate references systems.
The sampbias model uses a Markov chain Monte Carlo approach to estimate the model parameters. The default MCMC runs for 100,000 generations with a 20% burn-in, which has proven sufficient for most analyses. We suggest users to verify that the effective sample size of the posterior estimates is large enough (e.g. > 200) using the ESS
function of the LaplacesDemon
package (LaplacesDemon::ESS(bias.out$bias_estimate)
).
The output of calculate_bias
is a list of different R objects.
|Object|Description|
|---|---|
|summa|A list of summary statistics, including the total number of occurrence points in x, the total number of species in x, the extent of the output rasters as well as the settings for res, binsize, and restrict_area used in the analyses.|
|occurrences|a raster indicating occurrence records per grid cell, with resolution res
.|
|bias_estimate| A data.frame
with the posterior estimates of the model parameters, including bias weights|
|distance_raster|A SpatRaster
, with the calculated distances for all bias factors (e.g. cities, roads, etc.) as used in the model|
The package includes summary and plot methods for an easy exploration of the results. The plot method generates a boxplot of the posterior estimates of the weights for each biasing factor (Fig. 1).
summary(bias.out)
## Number of occurences: 6262 ## Raster resolution: 1 ## Convexhull: ## Geographic extent: ## SpatExtent : 108, 120, -5, 7 (xmin, xmax, ymin, ymax) ## Bias weights: ## bias_weight std_dev ## w_cities 9.164000e-03 1.777083e-04 ## w_roads 2.834735e-03 2.212042e-04 ## hp_rate 2.221860e+02 1.239156e+02
plot(bias.out)
As the last step, it is possible to project the bias effects into space and map them to identify areas with particular high bias, for instance, to design future field campaigns (Fig. 2).
proj <- project_bias(bias.out) map_bias(proj, type = "log_sampling_rate")
\newpage{}
The calculate_bias
function provides various options to customize analyses (see above). For two important options---providing a custom study area and using an equal area grid---we provide examples below.
By default sampbias uses a rectangle defined by the minimum and maximum longitude and latitude values in the input point records to define the study area. It may be desirable to change this, for instance if this rectangle comprises a set of different habitats such as rainforest and desert, which may differ in species richness and therefore the number of records expected. sampbias enables user-defined study areas via the restrict_sample
option of the calculate_bias
function, which uses objects of the class SpatVector
. If the user provides a custom study area, all occurrence records in the dataset falling outside this area will be disregarded. This could be simple custom polygons as in the area_example
dataset provided with sampbias:
data(area_example) data(borneo) plot(borneo) plot(area_example, col = "red", add = TRUE)
bias.out <- calculate_bias(x = occ, gaz = gazetteers, restrict_sample = area_example)
More complex restrictions are possible, for instance limiting the study area to the ecoregion "Borneo montane rain forests" (Olson, et al. 2001). For diagnostics the sample raster can be plotted using the plot_raster
argument.
data(ecoregion_example) plot(borneo) plot(ecoregion_example, col = "red", add = TRUE)
bias.out <- calculate_bias(x = occ, gaz = gazetteers, restrict_sample = ecoregion_example)
By default sampbias uses a latitude/longitude grid. This is a reasonable approximation for local and regional scales. However, since the area of the cells in such a grid differs with latitude, an equal area grid---for instance using a Behrmann projection---is often a better choice for large-scale studies. When using an equal area grid, it is necessary for the input occurrence records as well as the gazetteer files to share the same projection and coordinate reference system.
#projection wgs84 <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" # an example for an equal area raster data(ea_raster) ea_raster <- terra::unwrap(ea_raster) # reproject the occurrence coordinates ## select coordinates from the occ data.frame and create spatial object ea_occ <- terra::vect(occ, geom = c("decimalLongitude", "decimalLatitude"), crs="+proj=longlat +datum=WGS84") ## transform to the same CRS as the equal area grid ea_occ <- terra::project(ea_occ, terra::crs(ea_raster)) ## retransform into a data.frame ea_occ <- data.frame(species = occ[, 1], terra::crds(ea_occ)) # reproject gazetteers ## set the CRS in case it is not defined. Make sure to know the correct CRS. terra::crs(gazetteers[[1]]) <- terra::crs(gazetteers[[2]]) <- wgs84 #transform to the new CRS ea_gaz <- lapply(gazetteers, terra::project, y = terra::crds(ea_raster)) # run sampbias ea_bias <- calculate_bias(x = ea_occ, gaz = ea_gaz, inp_raster = ea_raster)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.