grts | R Documentation |
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
grts(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A character vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating a large but finite set of points and
selecting a sample from those points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
An integer indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
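For a stratified design, the names of n_base must match the values of the stratification variable. The following is a minimal base R sketch of that check. The data frame `frame`, its `ELEV_CAT` values, and the sample sizes are hypothetical stand-ins for illustration; a real `sframe` would be an sf object.

```r
# Hypothetical attribute table standing in for a sampling frame.
frame <- data.frame(
  id = 1:6,
  ELEV_CAT = c("low", "low", "high", "high", "low", "high")
)

# Named vector: one base sample size per stratum (hypothetical sizes).
strata_n <- c(low = 2, high = 1)

# The same specification written as a named list.
strata_n_list <- list(low = 2, high = 1)

# Every name in n_base should appear among the values of the
# stratification column.
strata_ok <- all(names(strata_n) %in% unique(frame$ELEV_CAT))
strata_ok

# Both forms carry the same stratum names.
identical(names(strata_n), names(strata_n_list))
```

Running a check like this before calling grts() can catch misspelled stratum names early.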
n_base is the number of sites used to calculate the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the number of sites in all panels that will be sampled in the same temporal period; n_base is not the total number of sites in all panels. The sum of n_base and n_over is equal to the total number of sites to be visited for all panels plus any replacement sites that may be required.
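The relationship between n_base and n_over in a panel design can be sketched with simple arithmetic. The numbers below (five panels of 50 sites, one panel visited per period, 25 spare sites) are hypothetical and represent only one reading of the paragraph above.

```r
# Hypothetical panel design: 5 panels of 50 sites, one panel visited per period.
n_panels <- 5
sites_per_panel <- 50
n_replacement <- 25  # hypothetical pool of spare replacement sites

# Sites sampled in the same temporal period (a single panel).
n_base <- sites_per_panel

# The remaining panels plus the spare sites are requested through n_over.
n_over <- (n_panels - 1) * sites_per_panel + n_replacement

# n_base + n_over covers all panels plus the replacement sites.
n_total <- n_base + n_over
n_total == n_panels * sites_per_panel + n_replacement
```

Under these assumptions, the design weights are based on the 50 sites in n_base rather than on the 250 sites across all panels.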
The sampling design sites and additional information about the sampling design. More specifically, it is a list with five elements:
sites_legacy
An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
sites_base
An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
sites_over
An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
sites_near
An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
design
A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended.
call
The original function call.
stratum_var
The name of the stratification variable in sframe. This equals NULL if no stratification is used.
stratum
The unique strata. This equals "None" if the sampling design is unstratified.
n_base
The base sample size per stratum.
seltype
The selection type per stratum.
caty_var
The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
caty_n
The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
aux_var
The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
legacy
A logical variable indicating whether legacy sites were included in the sample.
legacy_stratum_var
The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
legacy_caty_var
The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
legacy_aux_var
The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
mindis
The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
n_over
The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is "unequal", this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
n_near
The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base, sites_over, and sites_near objects contain the original columns in sframe and include a few additional columns. These additional columns are:
siteID
A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
siteuse
Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
replsite
The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
lon_WGS84
Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
lat_WGS84
Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
X
Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
Y
Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
stratum
A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
wgt
The design weight.
ip
The site's original inclusion probability (the reciprocal of wgt).
caty
An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
aux
The auxiliary proportional probability variable. This column is only returned if seltype was "proportional" in the original sampling design.
If any columns in sframe contain these names, those columns from sframe will be automatically prefixed with sframe_ in the sites object. When output is printed, a summary of site counts by the levels in stratum_var and caty_var is shown.
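The relationship between the wgt and ip columns can be illustrated in base R. The data frame below is a hypothetical stand-in for the attribute columns of a sites object, not actual grts() output; the site identifiers and weights are placeholders.

```r
# Hypothetical attribute columns mimicking a sites object (placeholder values).
sites <- data.frame(
  siteID = c("Site-1", "Site-2", "Site-3", "Site-4"),
  siteuse = c("Base", "Base", "Base", "Base"),
  stratum = c("low", "low", "high", "high"),
  wgt = c(10, 10, 4, 4)
)

# The inclusion probability is the reciprocal of the design weight.
sites$ip <- 1 / sites$wgt
sites[, c("siteID", "stratum", "wgt", "ip")]

# Sum of the design weights within each stratum.
wgt_by_stratum <- tapply(sites$wgt, sites$stratum, sum)
wgt_by_stratum
```

The same column operations apply to the sf objects returned by grts(), since their attribute columns can be accessed with `$` like a data frame.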
Tony Olsen olsen.tony@epa.gov
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
irs to select a sample that is not spatially balanced
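## Not run:
library(spsurvey)  # provides grts() and the NE_Lakes example data

# Unstratified, equal probability sample of 100 sites
samp <- grts(NE_Lakes, n_base = 100)
print(samp)

# Stratified sample; the names of n_base match the values of ELEV_CAT
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)

# Sample with 5 reverse hierarchically ordered replacement sites
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)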
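The returned list can be inspected element by element. The sketch below assumes the spsurvey package, which provides grts() and NE_Lakes, is installed; the seed value is arbitrary, and the code is skipped when the package is unavailable.

```r
# Assumes spsurvey is installed; otherwise nothing is run.
if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)
  set.seed(1)  # arbitrary seed so the draw can be repeated

  samp <- grts(NE_Lakes, n_base = 30, n_over = 5)

  # Elements of the returned list, as described in the Value section.
  print(names(samp))

  # Base and replacement sites are stored as separate sf objects.
  print(nrow(samp$sites_base))
  print(nrow(samp$sites_over))

  # Columns added by grts(): site use and design weight.
  print(table(samp$sites_base$siteuse))
  print(summary(samp$sites_base$wgt))

  # Design specifications recorded for later verification.
  print(samp$design$n_base)
}
```

Checking samp$design against the intended specification is a quick way to confirm the sampling design ran as intended.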