ppmData: Develop a quadrature scheme using quasi-random sampling for...

View source: R/ppm_data.R

ppmData    R Documentation

Develop a quadrature scheme using quasi-random sampling for spatial point processes.

Description

ppmData is a package for setting up quadrature schemes to fit spatial Poisson point process models and extensions. The approach uses quasi-random sampling (Grafstrom & Tille, 2013; Foster et al., 2018) to generate a quadrature scheme for the numerical approximation of a Poisson point process model (Berman & Turner 1992; Warton & Shepherd 2010). Quasi-random quadrature is a form of spatially balanced survey design, or point stratification, that aims to reduce how often samples are placed close to each other (relative to pseudo-random or grid designs). A quasi-random quadrature design improves the efficiency of background point sampling (and subsequent modelling) by reducing spatial autocorrelation between samples, so that each sample provides as much unique information as possible (Grafstrom & Tille, 2013; Foster et al., 2018), which in turn leads to lower errors for geostatistical prediction (Diggle & Ribeiro, 2007). Because the quasi-random design is not on a regular grid, we use a Dirichlet tessellation to generate a polygon for each point in the quadrature scheme. Areal weights are then derived from these polygons.
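To make the idea of Dirichlet areal weights concrete, the sketch below approximates each point's Dirichlet (Voronoi) cell area by covering a rectangular window with a fine grid and assigning every grid cell to its nearest point. This is a brute-force raster approximation swapped in for illustration, not the exact C++ sweep-line tessellation that ppmData uses; packages such as deldir compute exact tessellations. The function name `approx_dirichlet_weights` and its `res` argument are hypothetical, and all points are assumed to lie inside the window.

```r
# Hypothetical helper: approximate Dirichlet (Voronoi) cell areas by
# assigning each cell of a fine grid to its nearest point. This is an
# illustration only, not ppmData's exact tessellation.
approx_dirichlet_weights <- function(pts, xlim = c(0, 1), ylim = c(0, 1),
                                     res = 100) {
  # Centres of a res x res grid of small cells covering the window
  dx <- diff(xlim) / res
  dy <- diff(ylim) / res
  gx <- seq(xlim[1] + dx / 2, xlim[2] - dx / 2, length.out = res)
  gy <- seq(ylim[1] + dy / 2, ylim[2] - dy / 2, length.out = res)
  grid <- as.matrix(expand.grid(x = gx, y = gy))

  # Index of the nearest point for every grid cell (ties go to the first)
  nearest <- apply(grid, 1, function(g) {
    which.min((pts[, 1] - g[1])^2 + (pts[, 2] - g[2])^2)
  })

  # Weight = number of grid cells owned by each point times one cell's area
  counts <- tabulate(nearest, nbins = nrow(pts))
  counts * dx * dy
}

# Usage: 50 random points in the unit square
set.seed(1)
pts <- cbind(runif(50), runif(50))
w <- approx_dirichlet_weights(pts)
sum(w)  # the weights partition the window, so they sum to its area (1)
```

Because every grid cell is assigned to exactly one point, the weights always sum to the window area; refining `res` trades speed for a closer match to the exact cell areas.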

Usage

ppmData(
  presences,
  window = NULL,
  covariates = NULL,
  npoints = NULL,
  coord = c("X", "Y"),
  mark.id = "SpeciesID",
  quad.method = c("quasi.random", "pseudo.random", "grid"),
  interpolation = c("simple", "bilinear"),
  unit = c("geo", "m", "km", "ha"),
  na.rm = FALSE,
  control = list()
)

Arguments

presences

a three-column matrix or data.frame giving the coordinates of each species' presences (an nsites x 3 matrix), with the three columns being c("X","Y","SpeciesID"), where X is longitude, Y is latitude and SpeciesID is a factor that associates each occurrence with a species.

window

SpatRaster A raster object defining the region in which to generate the quadrature scheme. NA cells in the window are ignored and masked out of the returned data. If NULL, a rectangle bounding the extent of the presences is used as the default window.

covariates

SpatRaster A terra raster object containing covariates for modelling the point process. These layers should match the resolution and extent of the window provided. If NULL, the ppmData object contains only the coordinates of the presences and quadrature points.

npoints

Integer The number of quadrature points to generate. If NULL, the number of quadrature points is calculated based on linear scaling. In practice, the number of quadrature points needed to approximate the log-likelihood function depends on the data and on the likelihood function being approximated. Typically, more quadrature points give a better estimate, but there is a trade-off between computational efficiency and accuracy. See Warton & Shepherd (2010) or Renner et al. (2015) for useful discussions of the location and number of quadrature points required for a PPM likelihood approximation to converge.

coord

Character The user's column names for the site coordinates. The default is c('X','Y'). These should match the names of the coordinate columns in the presences data set.

mark.id

Character The column name of the mark ID. The default is "SpeciesID", but this should be changed to match the user's data. If this column contains multiple "species", a marked quadrature scheme is created.

quad.method

Character The quadrature generation method. The default is "quasi.random" for quasi-random sampling; "pseudo.random" gives pseudo-random (uniform random) sampling, and "grid" gives a regular grid at a set resolution (relative to the original window resolution).

interpolation

Character The interpolation method to use when extracting covariate data at a presence or quadrature location. The default is "simple" (the first of the listed options); "bilinear" can also be used. Both options follow the method argument of terra's extract().

unit

Character The area unit to return. The default, "geo", returns areas based on Euclidean distances between geographic coordinates, i.e. in the units of the raster and presence coordinate system. Alternatively, square meters ("m"), square kilometers ("km") or hectares ("ha") can be used.

na.rm

Boolean Remove rows with NA covariate values. Only works for single-species models.

control

list A list of control parameters for using ppmData. See details for uses of control parameters.
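Arguments such as npoints and quad.method determine how well the quadrature scheme approximates the point process log-likelihood. For reference, the standard Berman & Turner (1992) approximation can be written as below. The notation (intensity λ, window A, weights w_i, indicators z_i) is ours rather than the package's.

```latex
\ell(\beta) = \sum_{i=1}^{n} \log \lambda(s_i;\beta) - \int_{A} \lambda(s;\beta)\,ds
\;\approx\; \sum_{i=1}^{m} w_i \bigl\{ z_i \log \lambda(s_i;\beta) - \lambda(s_i;\beta) \bigr\},
\qquad
z_i = \begin{cases} 1/w_i & \text{if } s_i \text{ is a presence,} \\ 0 & \text{if } s_i \text{ is a quadrature point.} \end{cases}
```

Here s_1, ..., s_n are the presences, s_{n+1}, ..., s_m are the quadrature points, and the weights w_i sum to the area of A. The presence term is reproduced exactly, since the sum of w_i z_i log λ(s_i) over all m points equals the sum of log λ(s_i) over the presences; only the integral is approximated by the sum of w_i λ(s_i). This is why adding quadrature points improves accuracy at extra computational cost, and why the approximation can be maximised as a weighted Poisson regression.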

Details

The approach uses quasi-random sampling to generate a quadrature scheme (e.g. Berman & Turner 1992; Warton & Shepherd 2010; Foster et al., 2017). The weight of each quasi-random point in the quadrature scheme is calculated using a Dirichlet tessellation (Turner 2020). To improve computational efficiency, we have rewritten the Delaunay triangulation and Dirichlet tessellation in C++ using a sweep algorithm. The control list accepts the following parameters for tuning the ppmData object.

  • quasirandom.samples integer The total number of samples to consider in the balanced acceptance sampling (BAS) step, which is a form of rejection sampling. The default is NULL, in which case the function internally generates 10 times the total number of quadrature points needed. For example, if 10000 quadrature points are required for ppmData, a Halton sequence of 100000 quasi-random numbers is drawn and then thinned according to the inclusion probabilities. The larger quasirandom.samples is, the slower the quasi-random quadrature scheme is to generate.

  • buffer.NA boolean If TRUE and terra's extract returns NA at a point, attempt to use a buffer around that point to obtain a value for the NA cell.

  • buffer.size numeric If buffer.NA is TRUE, the radius of the buffer in meters.

  • mc.cores integer The number of cores to use in processing. The default is parallel::detectCores() - 1.

  • quiet boolean If TRUE, do not print messages. Default is FALSE.
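The quasirandom.samples description above (draw a long Halton sequence, then thin it by inclusion probabilities) can be illustrated with the simplified base-R sketch below. It is written in the spirit of balanced acceptance sampling and is not ppmData's internal code: the helper names `radical_inverse` and `halton_thin`, the `incl_prob` function and the `n_candidates` argument (which plays the role of quasirandom.samples, defaulting to 10 times the target) are hypothetical. It also starts the Halton sequence at index 1, whereas balanced acceptance sampling implementations typically use a random starting point.

```r
# Radical inverse (van der Corput) sequence in a given base; pairing
# coprime bases gives a Halton sequence.
radical_inverse <- function(n, base) {
  vapply(seq_len(n), function(i) {
    f <- 1
    r <- 0
    while (i > 0) {
      f <- f / base
      r <- r + f * (i %% base)
      i <- i %/% base
    }
    r
  }, numeric(1))
}

# Hypothetical sketch: draw n_candidates Halton points (bases 2 and 3) in
# a rectangular window, accept a candidate when a third Halton coordinate
# (base 5) falls below its inclusion probability, and keep the first n.
halton_thin <- function(n, incl_prob, n_candidates = 10 * n,
                        xlim = c(0, 1), ylim = c(0, 1)) {
  x <- xlim[1] + diff(xlim) * radical_inverse(n_candidates, 2)
  y <- ylim[1] + diff(ylim) * radical_inverse(n_candidates, 3)
  u <- radical_inverse(n_candidates, 5)
  keep <- which(u < incl_prob(x, y))
  if (length(keep) < n) {
    stop("too few accepted candidates; increase n_candidates")
  }
  keep <- keep[seq_len(n)]
  cbind(x = x[keep], y = y[keep])
}

# Usage: inclusion probability increasing from west to east
pts <- halton_thin(100, function(x, y) 0.2 + 0.8 * x)
```

Drawing more candidates makes it more likely that enough points pass the acceptance test, but costs more time, which mirrors the trade-off described for quasirandom.samples.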

Author(s)

Skipton Woolley <skip.woolley@csiro.au> & Scott Foster <scott.foster@data61.csiro.au>

References

Diggle, P. J. and Ribeiro, P. J. (2007). Model-based Geostatistics. Springer Series in Statistics. Springer.

Foster, S. D., Monk, J., Lawrence, E., Hayes, K. R., Hosack, G. R. and Przeslawski, R. (2018). Statistical considerations for monitoring and sampling. Field Manuals for Marine Sampling to Monitor Australian Waters, pp. 23-41.

Grafstrom, A. and Tille, Y. (2013). Doubly balanced spatial sampling with spreading and restitution of auxiliary totals. Environmetrics 24(2): 120-131.

Warton, D. I. and Shepherd, L. C. (2010). Poisson point process models solve the pseudo-absence problem for presence-only data in ecology. The Annals of Applied Statistics 4(3): 1383-1402.

Examples

## Not run: 
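library(ppmData)
library(terra)

# Covariate rasters shipped with the package
path <- system.file("extdata", package = "ppmData")
lst <- list.files(path = path, pattern = "\\.tif$", full.names = TRUE)
preds <- rast(lst)

# Use the first covariate layer to define the quadrature window
window <- preds[[1]]

# Presence records for a single snail species
presences <- subset(snails, SpeciesID %in% "Tasmaphena sinclairi")

# Build a 1000-point quadrature scheme with the covariates attached
quad <- ppmData(npoints = 1000, presences = presences, window = window,
                covariates = preds)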

## End(Not run)

skiptoniam/qrbp documentation built on May 13, 2023, 2:08 a.m.