getRegion: Get region
In fuzzySim: Fuzzy Similarity in Species Distributions

Description

This function computes a polygon around a set of point coordinates under given criteria, which may be useful for delimiting background or (pseudo)absence regions for computing species distribution models. Some of the 'type' options, especially those involving clusters or inverse distance, attempt to address survey bias by making smaller polygons around areas with fewer or more isolated points.

Usage

getRegion(
  pres.coords,
  type = "width",
  clust_dist = 100,
  dist_mult = 1,
  width_mult = 0.5,
  weight = FALSE,
  CRS = NULL,
  dist_mat = NULL,
  dist_method = "auto",
  verbosity = 2,
  plot = TRUE
)

Arguments

`pres.coords`	a SpatVector of points, or an object inheriting class 'data.frame' with 2 columns containing, respectively, the x and y, or longitude and latitude coordinates (in this order!) of the points where species presence was recorded.
`type`	character indicating which procedure to use for defining the region around 'pres.coords'. Options are: "width": a buffer whose radius is the minimum diameter of the 'pres.coords' spatial extent (computed with `terra::width()`), multiplied by 'width_mult'; "mean_dist": a buffer whose radius is the mean pairwise `terra::distance()` among 'pres.coords', multiplied by 'dist_mult'; "inv_dist": a buffer whose radius is inversely proportional to the sum of the distances from each point to all other points in 'pres.coords' (a rough measure of how isolated each point is, possibly indicating an opportunistic record in a sparsely surveyed area); "clust_mean_dist": a different buffer around each cluster of 'pres.coords' (clusters computed with `stats::hclust()`, method = "single", and then `stats::cutree()` with h = clust_dist*1000), sized according to the mean pairwise distance of each cluster's 'pres.coords'; "clust_width": a different buffer around each cluster of 'pres.coords' (clusters computed as described for "clust_mean_dist"), sized according to the `terra::width()` of each cluster's 'pres.coords'.
`clust_dist`	if 'type' involves clusters, numeric value specifying the distance threshold (in km) within which points are clustered together. Default 100.
`dist_mult`	if type = "mean_dist" or "clust_mean_dist", multiplier of the mean pairwise point distance to use for the `terra::buffer()` radius (around each cluster, in the case of "clust_mean_dist"). Default 1.
`width_mult`	if type = "width" or "clust_width", multiplier of the width to use for the `terra::buffer()` radius. Default 0.5.
`weight`	logical (used only if 'type' includes clusters) indicating whether to weigh the radius of the buffer around each cluster proportionally to the number of points that it includes. Default FALSE; if set to TRUE, clusters with fewer points (possibly indicating more sparsely surveyed areas) get proportionally smaller buffers than the mean distances among them.
`CRS`	coordinate reference system of 'pres.coords' (if it is not a SpatVector with a defined CRS already), in one of the following formats: WKT/WKT2, <authority>:<code>, or PROJ-string notation (see `terra::crs()`).
`dist_mat`	optional matrix of pairwise distances among 'pres.coords', to use (if 'type' includes a string "dist" or "clust") for efficiency instead of computing a new one. Should normally be computed with `terra::distance()`, `geodist::geodist()`, or another function that takes the Earth's curvature into account.
`dist_method`	argument to pass to `distMat()` (if 'dist_mat' is NULL) specifying the method for distance calculation. The default is "auto", or "haversine" if 'type' includes the string "clust", to avoid different clusters generating a different automatic method selection.
`verbosity`	integer indicating how many messages to display during the process. The default is 2, for all available messages.
`plot`	logical (default TRUE) indicating whether to plot the resulting region (in yellow), together with the input 'pres.coords' (black points, or points coloured according to their cluster) and a label with the number of points in each cluster (if 'type' involves clusters).
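
The clustering step described under 'type' and 'clust_dist' can be illustrated with base R alone. The following is only a minimal sketch, not the package's own code: the toy coordinates, the helper `haversine_m()` and the assumed spherical Earth radius are all hypothetical, and `getRegion()` obtains its distances through `distMat()` or a user-supplied 'dist_mat' instead. It shows how single-linkage `stats::hclust()` followed by `stats::cutree()` with h = clust_dist*1000 groups points that lie within 'clust_dist' km of a neighbour, and how a mean pairwise distance per cluster could then be derived.

```r
# Hypothetical helper: great-circle (haversine) distance in metres between
# lon/lat points given in degrees, assuming a spherical Earth of radius 6371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy data: two groups of points a few hundred km apart
pts <- data.frame(
  lon = c(-8.0, -8.1, -7.9, -4.0, -4.1),
  lat = c(38.0, 38.1, 37.9, 37.0, 37.1)
)

# Pairwise distance matrix in metres
n <- nrow(pts)
dist_mat <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_m(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j])
})

# Single-linkage clustering, cut at the distance threshold (km converted to m)
clust_dist <- 100
tree <- stats::hclust(stats::as.dist(dist_mat), method = "single")
clusters <- stats::cutree(tree, h = clust_dist * 1000)

# Mean pairwise distance within each cluster (metres)
mean_dist <- tapply(seq_len(n), clusters, function(idx) {
  d <- dist_mat[idx, idx, drop = FALSE]
  mean(d[upper.tri(d)])
})

print(table(clusters))
print(mean_dist)
```

With single linkage, a point joins a cluster as soon as it is within the threshold of any one member, so elongated chains of records end up in the same cluster; this is worth keeping in mind when choosing 'clust_dist'.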

Details

Most methods for computing species distribution models require predictor values for regions beyond those with species occurrence records, i.e. background or (pseudo)absence areas. The extent (as well as the spatial resolution) of these regions has a strong effect on model predictions. Ideally, they should include the areas that are within the reach of the species AND were reasonably surveyed (though you can further refine the latter with selectAbsences and an optional biasLayer). While sometimes we have a large enough and delimited area that we can use (e.g. when modelling a region where a national or regional distribution atlas is available), often we need to approximate the areas that appear to be both reasonably surveyed and within the species' reach.

Mind that no automated procedure can properly address all possible issues related to uneven data collection, or properly conform to all possible species distribution and survey patterns. Mind also that the output region from this function does not consider geographical barriers, or other factors that should also be taken into account when delimiting a region for modelling.

It is thus recommended to try different values for 'type' and associated parameters; judge for yourself which one provides the most plausible approximation to the surveyed region accessible to your target species; and possibly post-process (i.e. further edit) the resulting region in light of the available knowledge of that species' distribution, survey patterns and study region.

Value

A SpatVector polygon delimiting a region around 'pres.coords'.

Author(s)

A. Marcia Barbosa

Examples

## Not run: 
# you can run these examples if you have 'terra' and 'geodata' installed

# download example data:

occs <- geodata::sp_occurrence("Triturus", "pygmaeus")

occs_sv <- terra::vect(occs, geom = c("lon", "lat"), crs = "EPSG:4326")

cntry <- geodata::world(path = tempdir())


terra::plot(occs_sv)

terra::plot(cntry, lwd = 0.2, add = TRUE)


# compute regions with some different methods:

reg1 <- fuzzySim::getRegion(occs_sv)

terra::plot(cntry, lwd = 0.2, add = TRUE)


reg2 <- fuzzySim::getRegion(occs_sv, type = "inv_dist")

terra::plot(cntry, lwd = 0.2, add = TRUE)

terra::plot(reg2, lwd = 4, border = "orange", add = TRUE)


reg3 <- fuzzySim::getRegion(occs_sv, type = "clust_width", weight = TRUE,
                            width_mult = 0.3)

terra::plot(cntry, lwd = 0.2, add = TRUE)

terra::plot(reg3, lwd = 4, border = "orange", add = TRUE)


# note it is up to the user to pre-process the data (e.g. by removing erroneous
# records) and/or post-process the region (e.g. by erasing islands, countries,
# or continents that are not accessible to the target species).

## End(Not run)

fuzzySim documentation built on March 22, 2025, 3 a.m.