spbal - Spatially Balanced Sampling

library(bookdown)
knitr::opts_chunk$set( 
  fig.dim = c(8, 6),
  collapse = TRUE,
  comment = "#>"
)
library(spbal)
library(ggplot2)
library(gridExtra)

spbal

This vignette is intended to provide details of the functions and methods provided in the spbal package. The spbal package has three spatially balanced sample designs: BAS, Halton Frames and HIP. This document illustrates how samples are drawn using spbal, including master and stratified sampling applications. The package is freely available from CRAN. This document was created with spbal version $1.0.0$.

This vignette is divided into the following sections:

Lets start by looking at Simple Random Sampling (SRS).

Simple Random Sampling (SRS)

Draw a random sample without replacement from a population.

spbal::SRS()

This function invokes base::sample() to draw a random sample using a user specified random seed. Returned is a list of random positive integers of length sample_size, in the range $1$ to total_rows, these can then be used to index the original population data.

The following parameters are supported:

All parameter values must be numeric and have values greater than zero, sample_size must be less than total_rows.

spbal::SRS() code example

Function spbal::SRS() returns a list of size sample_size.

# Create a random sample of 20 with a seed of 99 from a population of 100.
rand_samp <- spbal::SRS(seed = 99, total_rows = 100, sample_size = 20)
rand_samp

Balanced Acceptance Samples (BAS)

BAS draws spatially balanced samples from areal resources. To draw BAS samples, spbal requires a study region shapefile and the region's bounding box. An initial sample size is also needed, which can be easily increased or decreased within spbal for master sampling applications.

spbal::BAS()

The spbal::BAS() function supports the following input parameters:

The spbal::BAS() function returns a list containing three variables:

The sample points are returned in the form of a simple feature collection of POINT objects. They have the following attributes:

spbal::BAS() code examples

First, lets define our shapefile, select a region from within it and define a bounding box around it:

```{R BASex1a}

Use the North Carolina shapefile supplied in the sf R package.

shp_file <- sf::st_read(system.file("shape/nc.shp", package="sf")) shp_gates <- shp_file[shp_file$NAME == "Gates",] shp_gates

Vertically aligned master sample bounding box.

bb <- spbal::BoundingBox(shapefile = shp_gates)

### Equal probability BAS sample.

This section uses the Gates county loaded above. The first example design is an equal probability BAS sample of $n = 20$ survey sites from Gates, North Carolina, U.S.A. The call is below:

```{R BASex2a,warning=FALSE, message=FALSE}
set.seed(511)
n_samples <- 20
result <- spbal::BAS(shapefile = shp_gates,
                     n = n_samples,
                     boundingbox = bb)
BAS20 <- result$sample
BAS20

The site locations are illustrated below:

```{R BASex1c}

1. Plot using ggplot

gates <- sf::st_as_sf(shp_gates, coords = c("longitude", "latitude"))

ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = BAS20, size = 3, aes(label = spbalSeqID, vjust = -0.5, geometry = geometry, x = after_stat(x), y = after_stat(y), color = "ID"), stat = "sf_coordinates") + geom_point(data = BAS20, aes(geometry = geometry, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "black"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Spatially balanced sample using a equal probability BAS design.", subtitle = "Total of 20 survey sites from Gates, North Carolina", caption = "spbal Equal probability BAS sample.")

### Increase the BAS sample size.

To increase this BAS sample of $20$ sites to $50$ sites, we take the first $50$ points from the random-start Halton sequence that BAS used to draw its sample. **spbal** achieves this by specifying the random seed integers from the first sample using the following call.

```{R BASex4a,message=FALSE, warning=FALSE}
n_samples <- 50
result2 <- spbal::BAS(shapefile = shp_gates,
                      n = n_samples,
                      boundingbox = bb,
                      seeds = result$seed)
BAS50 <- result2$sample
BAS50

# Check, first n_samples points in both samples must be the same.
all.equal(BAS20$geometry, BAS50$geometry[1:20])

The first 20 sites in BAS50 are identical to BAS20. The sample size can be increased arbitrarily from a specified seed, making BAS well-suited for master sampling applications. The critical point is that any sub-sample of consecutive points from a BAS master sample is a bona fide BAS sample. Users can also specify their seed point using seeds = c(u1, u2) to generate a specific random-start Halton sequence, such as resurrecting a previously used BAS sample.

Plot the BAS point ordering, BAS20 and BAS50 using ggplot.

## Convert foreign object to an sf object for ggplot.
gates <- sf::st_as_sf(shp_gates, coords = c("longitude", "latitude"))
ggplot() +
  geom_sf() +
  geom_sf(data = gates, size = 4, shape = 23) +
  geom_text(data = BAS50,
            size = 2,
            aes(label = BAS50$spbalSeqID,
                vjust = -0.7,
                geometry = geometry,
                x = after_stat(x),
                y = after_stat(y), color = "BAS Order"),
            stat = "sf_coordinates") +
  geom_point(data = BAS50,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour = "BAS n = 20 + 30"),
                 stat = "sf_coordinates")+
  geom_point(data = BAS20,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour = "BAS n = 20"),
                 stat = "sf_coordinates"
  ) +
 scale_color_manual(values = c("red","blue","black"),
                    name = "Legend") +  
 theme_bw() +
 labs(x = "Latitude",
      y = "Longitude",
      title = "BAS samples from the Gates county.")

Stratified BAS Sample.

It is possible to create stratified samples from BAS master samples using spbal. To implement this design, the input shapefile must have labelled sub-regions or strata, all of which should be enclosed within a single bounding box.

An example of a stratified design from a BAS master sample is shown below. In this design, $50$ sites are selected from three counties in the state of North Carolina, U.S.A, with $20$ sites from Gates, $20$ from Northampton, and $10$ from Hertford. The code for this stratified design is provided below, and the survey sites within each stratum can be seen in the figure below.

```{R BASex6a, results='hide', message=FALSE} strata <- c("Gates", "Northampton", "Hertford") n_strata <- c("Gates" = 20, "Northampton" = 20, "Hertford" = 10) shp_subset <- shp_file[shp_file[["NAME"]] %in% strata,]

Define the bounding box for Gates, Northampton and Hertford using **spbal**.

```r
bb_strata <- spbal::BoundingBox(shapefile = shp_subset)

Draw the stratified sample from the BAS master sample and display the first three sites in the Gates region.

set.seed(511)
result3 <- spbal::BAS(shapefile = shp_subset,
                      n = n_strata,
                      boundingbox = bb_strata,
                      stratum = "NAME")
BASMaster <- result3$sample
gates_samp <- BASMaster[BASMaster[["NAME"]] %in% "Gates",]
gates_samp 

Plot the stratified sample using ggplot.

strat <- sf::st_as_sf(shp_subset, coords = c("longitude", "latitude"))

gates_samp       <- BASMaster[BASMaster[["NAME"]] %in% "Gates",]
northampton_samp <- BASMaster[BASMaster[["NAME"]] %in% "Northampton",]
hertford_samp    <- BASMaster[BASMaster[["NAME"]] %in% "Hertford",]

ggplot() +
  geom_sf() +
  geom_sf(data = strat, size = 4, shape = 23) +
  geom_point(data = gates_samp,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Gates"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = northampton_samp,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Northampton"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = hertford_samp,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Hertford"),
             stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("red", "blue", "green"),
                     name = "Legend") +  # Define color scale and legend title
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "Stratified samples from BAS master samples.",
       subtitle = "50 sites from North Carolina, U.S.A, 20 from Gates, \n20 from Northampton and 10 from Hertford.",
       caption = "spbal Stratified BAS sample.")

Increasing or decreasing strata sample sizes.

The stratum sample sizes can be easily increased or decreased using points from the master sample. For instance, one may double the number of survey sites in the Hertford stratum (from $10$ to $20$) for the stratified design above using the seeds input. The call is below.

```{R BASex11f} n_strata <- c("Gates" = 20, "Northampton" = 20, "Hertford" = 20) result4 <- spbal::BAS(shapefile = shp_subset, n = n_strata, boundingbox = bb_strata, seeds = result3$seed, stratum = "NAME") BASMaster <- result4$sample gates_samp2 <- BASMaster[BASMaster[["NAME"]] %in% "Gates",] gates_samp2

The $20$ sites in **gates_samp2** are identical to **gates_samp** because the same seed point was used.

```r
# Ensure gates_samp is equal to the first 10 sites in gates_samp2. Must return TRUE.
all.equal(gates_samp$geometry[1:20], gates_samp2$geometry[1:20])

Plot the increased stratified sample using ggplot.

gates_samp4       <- BASMaster[BASMaster[["NAME"]] %in% "Gates",]
northampton_samp4 <- BASMaster[BASMaster[["NAME"]] %in% "Northampton",]
hertford_samp4    <- BASMaster[BASMaster[["NAME"]] %in% "Hertford",]

ggplot() +
  geom_sf() +
  geom_sf(data = strat, size = 4, shape = 23) +
  geom_point(data = gates_samp4,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Gates"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = northampton_samp4,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Northampton"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = hertford_samp4,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Hertford"),
             stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("red", "blue", "green"),
                     name = "Legend") +  # Define color scale and legend title
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "Stratified samples from BAS master samples.",
       subtitle = "60 sites from North Carolina, U.S.A, 20 from Gates, \n20 from Northampton and 20 from Hertford.",
       caption = "spbal Stratified BAS samples.")

Non-Overlapping panel design.

The final BAS example design with spbal is a non-overlapping panel design for surveying over time. This design has three panels containing $20$ survey sites from the county of Gates, North Carolina, U.S.A. Each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 60$.

```{R BASex14g} set.seed(511)

n_panels <- c(20, 20, 20) result5 <- spbal::BAS(shapefile = shp_gates, panels = n_panels, boundingbox = bb) BASpanel <- result5$sample BASpanel

The sites in panel $2$ are obtained using the **getPanel()** function as follows.

```r
panel_2 <- spbal::getPanel(BASpanel, 2)
panel_2 <- panel_2$sample
panel_2

Plot the sample sites in each panel using ggplot.

# to extract the sample points associated with a specific panelid, we can use the following code:
panel_1 <- BASpanel[BASpanel$panel_id == 1,]
panel_2 <- BASpanel[BASpanel$panel_id == 2,]
panel_3 <- BASpanel[BASpanel$panel_id == 3,]

# or use the spbal::getPanel() function.
#panel_1 <- spbal::getPanel(BASpanel, 1)
#panel_2 <- spbal::getPanel(BASpanel, 2)
#panel_3 <- spbal::getPanel(BASpanel, 3)

ggplot() +
  geom_sf() +
  geom_sf(data = gates, size = 4, shape = 23) +
  geom_point(data = panel_1,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-1 (n = 20)"),
                 stat = "sf_coordinates"
  ) +
  geom_point(data = panel_2,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-2 (n = 20)"),
                 stat = "sf_coordinates"
  ) +
  geom_point(data = panel_3,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-3 (n = 20)"),
                 stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("red", "blue", "black"),
                                     name = "Panels") + 
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "Panel Design from BAS Master Sample",
       subtitle = "Total of 60 survey sites from Gates, North Carolina, U.S.A",
       caption = "spbal Non-overlapping Panel Design.")

Overlapping panel design.

Panel overlaps are also possible in spbal. The following call sets the last five elements from panel $1$ as the first five elements in panel $2$ and the last five elements from panel $2$ as the first five elements in panel $3$ ($50$ unique sites). Because the overlaps are sub-samples of consecutive BAS points, they are well-spread over the study region. Furthermore, each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 50$.

set.seed(511)
n_panels <- c(20, 20, 20)
n_panel_overlap <- c(0, 5, 5)
result6 <- spbal::BAS(shapefile = shp_gates,
                      panels = n_panels,
                      panel_overlap = n_panel_overlap,
                      boundingbox = bb)
BASpanel <- result6$sample

The sites in panel $2$ are obtained using the getPanel() function as follows. The first five sites in panel $2$ are also in panel $1$ so panel_id = $1, 2$ for these sites.

panel_2 <- spbal::getPanel(BASpanel, 2)
panel_2 <- panel_2$sample
panel_2[1:5,]

Plot our overlapped samples.

panel_1 <- spbal::getPanel(BASpanel, 1)
panel_2 <- spbal::getPanel(BASpanel, 2)
panel_3 <- spbal::getPanel(BASpanel, 3)

ggplot() +
  geom_sf() +
  geom_sf(data = gates, size = 4, shape = 23) +
  geom_jitter(width = 10, height = 10) +
  geom_point(data = sf::st_jitter(panel_1$sample, 0.02),
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-1"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = sf::st_jitter(panel_2$sample, 0.02),
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-2"),
             stat = "sf_coordinates"
  ) +
  geom_point(data = panel_3$sample,
             aes(geometry = geometry,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Panel-3"),
             stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("red", "blue", "green"),
                     name = "Overlapped Panels") +  # Define color scale and legend title
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "Spatially balanced sample using a overlapping panel \ndesign, of three panels, each containing 20 survey sites.",
       subtitle = "Total of 50 survey sites from Gates, North Carolina. Panel Overlap = (0, 5, 5)",
       caption = "spbal Overlapping Panel Design.")

Halton Frame (HF)

Halton Frames discretize an areal resource into a spatially ordered grid, where samples of consecutive frame points are spatially balanced.

spbal::HaltonFrame()

The spbal::HaltonFrame() function supports the following input parameters:

The HaltonFrame() function returns a list containing five variables:

The sample points in hf.pts.shp are returned in the form of a simple feature collection of POINT objects. It has the following features:

spbal::HaltonFrame() code example

To generate Halton Frames, spbal requires a study region shapefile and the region's bounding box. To illustrate Halton Frames, we discretize the Gates study region into a coarse grid using $B = 2^{J_1} \times 3^{J_2} = 2^3 \times 3^2$ (a $9$ by $8$ grid). The call is below.

set.seed(511)
result6 <- spbal::HaltonFrame(shapefile = shp_gates, 
                              J = c(3, 2),
                              boundingbox = bb)
Frame <- result6$hf.pts.shp
Grid <- result6$hg.pts.shp

Spatially ordered $9$ by $8$ Halton grid $(B = 2^{3} * 3^{2} = 72)$ for the Gates county in North Carolina, U.S.A.

```{R HFex1b}

Grid - Halton grid over Gates county.

ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_text(data = Grid, size = 3, aes(label = ID, vjust = -0.5, geometry = x, x = after_stat(x), y = after_stat(y), color = "ID"), stat = "sf_coordinates") + geom_point(data = Grid, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Samples"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "black"), name = "Legend") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "A Halton grid, B = 2^3 * 3^2, over Gates, North Carolina, U.S.A.", subtitle = "A total of 432 points.", caption = "spbal Halton Grid example.")

and the corresponding Halton Frame for the Gates county in North Carolina in the U.S.A.

```{R HFex1c}
# Frame - Halton frame over Gates county.
ggplot() +
  geom_sf() +
  geom_sf(data = gates, size = 4, shape = 23) +
  geom_text(data = Frame,
            size = 3,
            aes(label = ID,
                vjust = -0.5,
                geometry = x,
                x = after_stat(x),
                y = after_stat(y), color = "ID"),
            stat = "sf_coordinates") +
  geom_point(data = Frame,
             aes(geometry = x,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Samples"),
             stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("red", "black"),
                     name = "Legend") +  # Define color scale and legend title
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "A Halton frame, B = 2^3 * 3^2, over Gates, North Carolina, U.S.A.",
       subtitle = "Showing sample points within the Gates shapefile.",
       caption = "spbal Halton Frame example.")

Halton frame fine grid

The second example frame generates a fine grid using an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame over Gates and draws a spatially balanced sample of $n = 25$. The call is below.

set.seed(511)
result7 <- spbal::HaltonFrame(shapefile = shp_gates,
                              J = c(8, 5),
                              boundingbox = bb)
Frame <- result7$hf.pts.shp

Draw $n = 25$ sites from the Halton Frame using the getSample() function.

The first 25 sites from a $B = 2^{8} * 3^{5}$ Halton Frame ($62,208$ grid points covering Gates), the numbering shows the frame's ordering, where sub-samples of consecutively numbered points are spatially balanced samples.

n_samples <- 25
FrameSample <- getSample(shapefile = Frame, 
                        n = n_samples)
FrameSample <- FrameSample$sample
FrameSample[1:10, c("x", "spbalSeqID")]

Plot the sample and number the points in the frame's order (given by spbalSeqID) using ggplot.

ggplot() +
  geom_sf() +
  geom_sf(data = gates, size = 4, shape = 23) +
  geom_text(data = FrameSample, 
            size = 3,
            aes(label = spbalSeqID,
                vjust = -0.5,
                geometry = x,
                x = after_stat(x),
                y = after_stat(y), color = "Frame Order"),
                stat = "sf_coordinates") +
  geom_point(data = FrameSample, 
             aes(geometry = x,
                 x = after_stat(x),
                 y = after_stat(y), colour= "Samples"),
                 stat = "sf_coordinates"
  ) +
  scale_color_manual(values = c("black", "red"),
                     name = "Legend") +  # Define color scale and legend title
  theme_bw() +
  labs(x = "Latitude",
       y = "Longitude",
       title = "First 25 sites from the Gates Halton Frame.")

Spatially balanced sample from a random position

It is also possible to draw a spatially balanced sample from a random position in the frame using the getSample() function. The call for an $n = 20$ sample is given below. Note the use of the randomStart = TRUE parameter to generate the sample from a random position.

n_samples <- 20
FrameSample <-getSample(shapefile = Frame, 
                        n = n_samples, 
                        randomStart = TRUE)
FrameSample <- FrameSample$sample
FrameSample[1:10, c("x", "spbalSeqID")]

Halton frame overlapping panel design.

The following example generates an overlapping panel design for surveying over time with three panels, each with $20$ sites with a panel overlap of five.

```{R HFex3a} set.seed(511)

Three panels, of 20 samples each.

panels <- c(20, 20, 20)

second panel overlaps first panel by 5, and third panel

overlaps second panel by 5.

panel_overlap <- c(0, 5, 5)

generate the sample.

samp <- spbal::HaltonFrame(J = c(4, 3), boundingbox = bb, panels = panels, panel_overlap = panel_overlap, shapefile = shp_gates)

get halton frame data from our sample.

samp3 <- samp$hf.pts.shp samp3

panelid <- 1 olPanel_1 <- spbal::getPanel(samp3, panelid)

panelid <- 2 olPanel_2 <- spbal::getPanel(samp3, panelid)

panelid <- 3 olPanel_3 <- spbal::getPanel(samp3, panelid)

Plot using ggplot2

ggplot() + geom_sf() + geom_sf(data = gates, size = 4, shape = 23) + geom_jitter(width = 10, height = 10) + geom_point(data = sf::st_jitter(olPanel_1$sample, 0.02), aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-1"), stat = "sf_coordinates" ) + geom_point(data = sf::st_jitter(olPanel_2$sample, 0.02), aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-2"), stat = "sf_coordinates" ) + geom_point(data = olPanel_3$sample, aes(geometry = x, x = after_stat(x), y = after_stat(y), colour= "Panel-3"), stat = "sf_coordinates" ) + scale_color_manual(values = c("red", "blue", "green"), name = "Overlapped Panels") + # Define color scale and legend title theme_bw() + labs(x = "Latitude", y = "Longitude", title = "Halton Frame sample using a overlapping panel \ndesign, of three panels, each containing 20 survey sites.", subtitle = "Total of 50 survey sites from Gates, North Carolina, U.S.A. Panel Overlap = (0, 5, 5)", caption = "spbal Halton Frame Overlapping Panel Design.")

### Halton Frame Stratified sample.

Stratified samples can be selected from a Halton Frame with **spbal**. To implement this design, the input shapefile must have labelled sub-regions or strata, which must be enclosed within a single bounding box. We provide a stratified example design for $50$ sites are selected from three counties in the state of North Carolina, U.S.A - Gates ($n = 20$), Northampton ($n = 30$), and Hertford ($n = 10$).

Load the shape file for the Gates, Northampton and Hertford regions.

```r
strata <- c("Gates", "Northampton", "Hertford")
n_strata <- c("Gates" = 20, "Northampton" = 30, "Hertford" = 10)
shp_subset <- shp_file[shp_file[["NAME"]] %in% strata,]

Define the bounding box for Gates, Northampton and Hertford using spbal.

bb_strata <- spbal::BoundingBox(shapefile = shp_subset)

Generate an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame for the Gates, Northampton and Hertford regions.

set.seed(511)
result9 <- spbal::HaltonFrame(shapefile = shp_subset,
                              N = n_strata,
                              J = c(8, 5),
                              boundingbox = bb_strata,
                              stratum = "NAME")
Frame <- result9$hf.pts.shp

Use the getSample() function to get $10$ points from the Hertford region.

n_samples <- 10
hertford_samp <- spbal::getSample(Frame, 
                                  n = n_samples, 
                                  strata = "Hertford", 
                                  stratum = "NAME")
hertford_samp <- hertford_samp$sample
hertford_samp[1:10, c("NAME", "spbalSeqID", "x")]

Panel Design from Halton Frame

Panel designs from Halton Frames are also possible in spbal. The following call sets the last five elements from panel $1$ as the first five elements in panel $2$ and the last five elements from panel $2$ as the first five elements in panel $3$ (an over-lapping design with $50$ unique sites). Each panel is a spatially balanced sample of size $n = 20$, and the collection of sites from all panels is a spatially balanced sample of size $n = 50$.

Generate an approximately regular $B = 2^8 \times 3^5 = 256 \times 243$ Halton Frame for Gates.

set.seed(511)
panels <- c(20, 20, 20)
n_panel_overlap <- c(0, 5, 5)
result10 <- spbal::HaltonFrame(shapefile = shp_gates,
                               panels = panels,
                               panel_overlap = n_panel_overlap,
                               J = c(8, 5),
                               boundingbox = bb)
HaltonFramePanel <- result10$hf.pts.shp

Now we obtain the sites in panel $1$ using the getPanel() function.

panelid <- 1
SitesPanel_1 <- spbal::getPanel(HaltonFramePanel, panelid)
SitesPanel_1 <- SitesPanel_1$sample
SitesPanel_1[1:10, c("x", "spbalSeqID", "panel_id")]

Halton Iterative Partitioning (HIP)

HIP draws spatially balanced samples and over-samples from point resources by partitioning the resource into boxes with the same nested structure as Halton boxes. The spbal parameter iterations defines the number of boxes used in the HIP partition and should be larger than the sample size but less than the population size. The iterations parameter also defines the number of units available in the HIP over-sample, where the over-sample contains one unit from each box in the HIP partition.

Halton iterative partitioning (HIP) extends Basic acceptance sampling (BAS) to point resources. It partitions the resource into $B \geq n$ boxes that have the same nested structure as in BAS, but different sizes. These boxes are then uniquely numbered using a random-start Halton sequence of length $B$. The HIP sample is obtained by randomly drawing one point from each of the boxes numbered $1, 2, ..., n$.

HIP draws spatially balanced samples from point resources by partitioning the resource into $B = 2^{J_1} * 3^{J_2}$ nested boxes.

spbal::HIP()

The spbal::HIP() function supports the following input parameters:

The HIP() function returns a list containing five variables:

spbal::HIP() code example

The following sample code is used to demonstrate the use of the spbal::HIP() function. Here we are generating $20$ points from a population of $5,000$ (random) points with $7$ levels of partitioning ($4$ in the first dimension and $3$ in the second) to give $2^4$ * $3^3$ = $32 * 27$, resulting in $864$ boxes.

# set random seed
base::set.seed(511)

# define HIP parameters.
pop <- matrix(stats::runif(5000*2), nrow = 5000, ncol = 2)
n <- 20
its <- 7

# Convert the population matrix to an sf point object.
sf_points <- sf::st_as_sf(data.frame(pop), coords = c("X1", "X2"))
dim(sf::st_coordinates(sf_points))

# generate HIP sample.
result <- spbal::HIP(population = sf_points, 
                     n = n, 
                     iterations =  its)

# HaltonIndex
HaltonIndex <- result$HaltonIndex
# verify all spread equally, should be TRUE.
(length(unique(table(HaltonIndex))) == 1)

# Population Sample
HIPsample <- result$sample
HIPsample

The HIP over-sample has $432$ sites (one site from each of the $432$ boxes in the HIP partition).

HIPoverSample <- result$overSample
HIPoverSample[1:10, c("geometry", "spbalSeqID")]

OverSampleSize <- dim(HIPoverSample)[1]
OverSampleSize

The first $n$ points in HIPoverSample are identical to HIPsample.

# compare the HIP sample and oversample, they will be the same.
all.equal(HIPsample$geometry[1:n], HIPoverSample$geometry[1:n])

Forcing a minimum distance between sites

HIP selects spatially balanced samples, but the algorithm can choose sites closer together than desired, mainly if site density varies significantly over the study region. To enforce a minimum distance between sites, spbal has a minimum distance between sites parameter minRadius. Setting minRadius $= r > 0$ draws an ordered subset $S$ from the HIP over-sample, where the size of $S$ depends on the iterations and minRadius parameters. Taking the first $n \leq |S|$ sites from $S$ gives a well-spread sample of $n$ sites that satisfies the minimum distance between sites requirement.

HIP Panel Design

HIP Panel designs are also possible in spbal. Use the panels and n_panel_overlap parameters as demonstrated for the BAS and HaltonFrame functions.

More Details

The following section provides more details on the spbal package functions getSample and getPanel. These functions only accept objects created by spbal as input via the shapefile parameter.

getSample

The spbal getSample function can be applied to outputs produced by the spbal BAS and HaltonFrame functions.

Non-Stratified samples

Let 'shp' be the output from BAS or HaltonFrame, then

Stratified samples

In the case where the input is a stratified sample, both the $strata$ and $stratum$ parameters must be specified. If they are specified then the $n$ parameter need not be specified, if it is, it will be ignored. In this case getSample will return all points of a stratified sample. The parameter $randomStart$ is not supported when dealing with stratified samples. If specified it will be ignored.

getPanel

The spbal getPanel function can be applied to outputs produced by the spbal BAS and HaltonFrame functions.

Let 'shp' be the output from BAS or HaltonFrame, then

Use the R help function for more information on these functions.

The End



Try the spbal package in your browser

Any scripts or data that you put into this service are public.

spbal documentation built on April 4, 2025, 2:05 a.m.