View source: R/subsample_circle.R
cookies | R Documentation |
Spatially subsample a dataset to produce samples of standard area and extent.
cookies(
dat,
xy,
iter,
nSite,
r,
weight = FALSE,
crs = "epsg:4326",
output = "locs"
)
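dat |
A data.frame or matrix containing the coordinate columns xy and any associated variables. |
xy |
A vector of two elements, specifying the name or numeric position of the columns in dat that contain the x (longitude) and y (latitude) coordinates. |
iter |
The number of spatial subsamples to return. |
nSite |
The quota of unique locations to include in each subsample. |
r |
Numeric value for the radius (km) defining the circular extent of each spatial subsample. |
weight |
Whether sites within the subsample radius should be drawn at random (weight = FALSE) or with probability proportional to the inverse square of their distance from the seed point (weight = TRUE). |
crs |
Coordinate reference system as a GDAL text string, EPSG code, or object of class crs. |
output |
Whether the returned data should be two columns of subsample site coordinates (output = 'locs') or all dat columns for the rows associated with the subsampled locations (output = 'full'). |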
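Because xy may give either column names or numeric positions, both forms pick out the same coordinate columns when a data.frame is indexed in base R. A minimal sketch using the points from the Examples section (plain base R indexing, not divvy's internal code):

```r
# Example occurrences, as in the Examples section
pts <- data.frame(x = seq(140, 145, length.out = 10),
                  y = seq(-20, -25, length.out = 10))

by_position <- pts[, 1:2]          # xy = 1:2
by_name     <- pts[, c("x", "y")]  # xy = c("x", "y")

identical(by_position, by_name)    # TRUE: the same two coordinate columns
```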
The function takes a single location as a starting (seed) point and circumscribes a buffer of r km around it. Buffer circles that span the antemeridian (180 degrees longitude) are wrapped as a multipolygon to prevent artificial truncation. After standardising radial extent, sites are drawn within the circular extent until a quota of nSite is met.
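The seed-and-buffer step can be sketched in base R. The helper names gc_dist_km() and draw_within_radius() are hypothetical, and the haversine formula on a sphere of radius 6371 km is an assumption standing in for whatever distance routine divvy uses internally. The sketch tests membership in the circle by distance rather than by building a polygon buffer. Great-circle distance also shows why the antemeridian needs care: two equatorial sites on either side of 180 degrees longitude, one degree apart, are roughly 111 km from each other, not 359 degrees apart.

```r
# Great-circle distance (km) via the haversine formula.
# ASSUMPTION: spherical Earth with radius 6371 km.
gc_dist_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: draw nSite unique locations within r km of a seed row.
draw_within_radius <- function(locs, seed, r, nSite) {
  d <- gc_dist_km(locs[seed, 1], locs[seed, 2], locs[, 1], locs[, 2])
  pool <- which(d <= r)                   # includes the seed itself (d = 0)
  if (length(pool) < nSite) return(NULL)  # quota cannot be met from this seed
  locs[pool[sample.int(length(pool), nSite)], , drop = FALSE]
}

gc_dist_km(179.5, 0, -179.5, 0)  # about 111.2 km across the antemeridian

pts <- data.frame(x = seq(140, 145, length.out = 10),
                  y = seq(-20, -25, length.out = 10))
set.seed(1)
draw_within_radius(pts, seed = 5, r = 200, nSite = 3)
```

sample.int() is used instead of sample() so that a pool of length one is not misread as "sample from 1:n".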
Sites are sampled without replacement, so a location is used as a seed point only if it is within r km of at least nSite - 1 other locations.
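The seed-eligibility rule can be checked directly: count, for each site, how many other sites lie within r km, and keep those with at least nSite - 1 neighbours. A minimal base R sketch; eligible_seeds() is a hypothetical helper and the haversine distance on a 6371 km sphere is an assumption, not necessarily divvy's calculation.

```r
# Great-circle distance (km), haversine formula. ASSUMPTION: spherical Earth.
gc_dist_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: indices of sites that can seed a subsample.
eligible_seeds <- function(locs, r, nSite) {
  locs <- as.matrix(locs)
  n <- nrow(locs)
  # n x n matrix of pairwise distances
  d <- outer(seq_len(n), seq_len(n), function(i, j)
    gc_dist_km(locs[i, 1], locs[i, 2], locs[j, 1], locs[j, 2]))
  n_near <- rowSums(d <= r) - 1  # neighbours within r, excluding the site itself
  which(n_near >= nSite - 1)
}

pts <- data.frame(x = seq(140, 145, length.out = 10),
                  y = seq(-20, -25, length.out = 10))
eligible_seeds(pts, r = 200, nSite = 3)  # all 10 sites
eligible_seeds(pts, r = 200, nSite = 4)  # sites 2 to 9; the end points have only 2 neighbours
```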
The method is introduced in Antell et al. (2020) and described in
detail in Methods S1 therein.
The probability of drawing each site within the standardised extent is either equal (weight = FALSE) or proportional to the inverse square of its distance from the seed point (weight = TRUE), which clusters subsample locations more tightly.
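The effect of inverse-square weighting is easiest to see in the draw probabilities: a candidate 50 km from the seed is 4 times as likely to be drawn as one at 100 km and 16 times as likely as one at 200 km. A minimal base R sketch; draw_probs() and weighted_draw() are hypothetical helpers, and keeping the seed first follows the Value description for weight = TRUE rather than divvy's source code.

```r
# Relative probability of drawing each candidate at distance d (km) from the
# seed: equal weights, or weights proportional to 1 / d^2.
draw_probs <- function(d, weight = FALSE) {
  w <- if (weight) 1 / d^2 else rep(1, length(d))
  w / sum(w)
}

draw_probs(c(50, 100, 200), weight = TRUE)   # 16/21, 4/21, 1/21
draw_probs(c(50, 100, 200), weight = FALSE)  # 1/3 each

# Hypothetical weighted draw: keep the seed first, then draw the other
# nSite - 1 sites without replacement using inverse-square weights.
# Assumes unique locations, so only the seed sits at distance 0.
weighted_draw <- function(ids, d, nSite) {
  cand <- ids[d > 0]
  p <- draw_probs(d[d > 0], weight = TRUE)
  c(ids[d == 0], cand[sample.int(length(cand), nSite - 1, prob = p)])
}

set.seed(42)
weighted_draw(ids = c("seed", "a", "b", "c"), d = c(0, 50, 100, 200), nSite = 3)
```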
For geodetic coordinates (latitude-longitude), distances are calculated along great-circle arcs. For Cartesian coordinates, distances are calculated in Euclidean space, in the units of the projected CRS (e.g. metres).
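The two distance regimes can be contrasted in a few lines of base R. site_dist() is a hypothetical helper; the haversine formula on a 6371 km sphere is an assumed stand-in for the geodetic calculation, and the Cartesian branch returns plain Euclidean distance in whatever units the coordinates use.

```r
# Hypothetical helper: distance between points p and q, each given as c(x, y).
#   geodetic = TRUE : great-circle distance in km (haversine, ASSUMED
#                     spherical Earth of radius 6371 km)
#   geodetic = FALSE: Euclidean distance in the CRS's own units (e.g. metres)
site_dist <- function(p, q, geodetic = TRUE, R = 6371) {
  if (!geodetic) return(sqrt(sum((q - p)^2)))
  rad <- pi / 180
  a <- sin((q[2] - p[2]) * rad / 2)^2 +
    cos(p[2] * rad) * cos(q[2] * rad) * sin((q[1] - p[1]) * rad / 2)^2
  2 * R * asin(min(1, sqrt(a)))
}

site_dist(c(0, 0), c(0, 1))                          # about 111.2 km: one degree of latitude
site_dist(c(0, 0), c(3000, 4000), geodetic = FALSE)  # 5000, in projected units
```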
A list of length iter. Each list element is a data.frame or matrix (matching the class of dat) with nSite observations. If output = 'locs' (default), only the coordinates of subsampling locations are returned. If output = 'full', all dat columns are returned for the rows associated with the subsampled locations.
If weight = TRUE, the first observation in each returned subsample data.frame corresponds to the seed point. If weight = FALSE, observations are listed in the random order in which they were drawn.
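To illustrate what "rows associated with the subsampled locations" means, the base R sketch below recovers those rows from a set of subsampled coordinates with merge(). The occurrence table and its taxon column are hypothetical, and this is not necessarily how divvy assembles the 'full' output. In this sketch a site holding several occurrences contributes several rows.

```r
# Hypothetical occurrence table: taxa "a" and "b" share the site (140, -20)
occ <- data.frame(x = c(140, 140, 141, 142),
                  y = c(-20, -20, -21, -22),
                  taxon = c("a", "b", "c", "d"))

# Coordinates of one subsample, in the style of output = 'locs'
locs <- data.frame(x = c(140, 141), y = c(-20, -21))

# Every occurrence row whose coordinates match a subsampled site
full <- merge(locs, occ, by = c("x", "y"))
full
```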
References: Antell2020divvy
See also: clustr()
library(divvy)
# generate occurrences: 10 lat-long points in modern Australia
n <- 10
x <- seq(from = 140, to = 145, length.out = n)
y <- seq(from = -20, to = -25, length.out = n)
pts <- data.frame(x, y)
# sample 5 sets of 3 occurrences within a 200 km radius
cookies(dat = pts, xy = 1:2, iter = 5,
        nSite = 3, r = 200)