knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)

The Basics

A spatially balanced sample design is a collection of points on a landscape where data will be collected which, if sampled in the correct order, will be evenly distributed spatially while still being randomly placed. This offers the advantages of random sample design while avoiding one of its weaknesses: random distributions are often clumped, which can introduce bias into data where values are dependent on location. The function spsurvey::grts() is an implementation of one approach to generating spatially balanced sample designs and the one used here.

These designs have a few components:

Input formats

Spatial components

The spatially explicit components are absolutely required. This almost always means a shapefile which is read into R, although as long as you end up with an SPDF in R, the source doesn't matter. Because this package already depends on the package rgdal, these examples will use rgdal::readOGR() to read in shapefiles but if you prefer other options like st::st_read() those will work as well.

If you have no stratification, then a polygon shapefile of the area you're sampling from with a variable containing the sample frame identity is enough, e.g. for a Bureau of Land Management monoitoring program for the Slickrock Canyon National Monument the polygons would be BLM-managed land that within the boundaries of the monument and there would be a variable/attribute field called "SAMPLEFRAME" with the value "SlickrockCanyonNM".

If you're stratifying, the easiest approach is to make the polygons both the sample frame and strata. So for the example of Slickrock Canyon, stratifying by ecosystems makes sense and so the input polygons would have the variable "STRATUM" that contains the values "Grassland", "Shrubland", "PinyonJuniper", and "Other" for the appropriate polygons and a variable called "SAMPLEFRAME" with the value "SlickrockCanyonNM" for all the polygons.

# Read in the polygons, which are stored in the current working directory right now.
frame <- rgdal::readOGR(dsn = getwd(),
                        layer = "slickrockcanyon_strata.shp",
                        # stringsAsFactors defaults to TRUE but factors aren't appropriate for this
                        stringsAsFactors = FALSE)

# The following are quick checks to confirm that the frame actually has everything expected.

# Every polygon belongs to the "SlickrockCanyonNM" sample frame, so there should be no NA value
unique(frame@data[["SAMPLEFRAME"]])

# Every polygon represents one of the strata. Again, no NA values
frame@data[["STRATUM"]]

# And frame is an SPDF. Because these are polygons this should return "Spatial Polygons Data Frame"
class(frame)

Point allocation

Determining point allocation can be done in two different ways, either by using a function to automatically determine how to allocate them proprtionally by area or by creating and reading in a table of exactly how to allocate the points to strata over panels. The former is an easy and recommended way to make a design that minimizes bias, but in some rarer cases manually assigning how effort is distributed is the better choice.

Regardless of approach, the major decisions to make are how many panels, how many base points, and how many oversample points you'll need.

Automatic point allocation

The function allocate_panels() will create a design object (a specially-structured list) that describes how many base and oversample points to draw in each stratum for each panel. This is based on the proportional area of the strata so that if in a design with 150 base points there's a stratum that makes up 50% of the sample frame, then 75 of the base points will be allocated to it while a stratum that makes up 10% will only get 15 of the base points.

In order to do this it requires a few inputs.

# Create a design object for a three-year design with 50 base points per year,
# at least 3 base points per year in each stratum,
# and either (base * 0.33) or 3 oversample points per stratum per year, whichever is larger, for each stratum
design <- allocate_panels(spdf = frame,
                          stratum_field = "STRATUM",
                          panel_names = c("Year1", "Year2", "Year3"),
                          panel_sample_size = 50,
                          points_min = 3,
                          oversample_proportion = 0.33,
                          oversample_min = 3)

Manual point allocation

If you don't want proportionally-allocated points or need finer control (like having one stratum with a higher oversample proportion), you can manually allocate the points. The easiest way to do this is to create a .CSV in Excel or some other software with one row per stratum and the following columns:

# Create a design object by reading in a three-year design in a CSV
# Note that because dataframe is just a filename, it's read in from the current working directory.
# You may specify a full filepath to the file
design <- read_panels(dataframe = "slickrockcanyon_design.csv",
                      stratum_field = "STRATUM",
                      panel_names = c("Year1", "Year2", "Year3"),
                      oversample_field = "Oversample")

Executing a design

Once all the inputs are set, you can draw points using grts_aim() which uses spsurvey::grts() and provides a standard format output. There are three other required inputs:

points <- grts_aim(design_object = design,
                   design_name = "Slickrock Canyon NM",
                   frame = frame,
                   stratum_field = "STRATUM",
                   seed_number = 420)

In practice

Documentation

As a rule, it's a good idea to document what your sample design is in your script so that it doesn't depend exclusively on an external document. A good place to put that is at the head of the script. As an example, the beginning of a script might look like:

#---- DESIGN DOCUMENTATION ----
# Project name: Slickrock Canyon 
# Purpose: Resource Management Plan (RMP) effectiveness
# Sample frame: Slickrock Canyon National Monument (Bureau of Land Management land only)
# Stratification scheme: Grouped Biophysical Setting groups
# Panels: 3
# Base points/panel: 50
# Total base points: 150
# Points allocation: Proportionally by area (minimum 3)
# Oversample proportion: 0.667
# Seed number: 420
# Date of final draw: 2019-04-04

Setup

It's also a good idea to put all the constants for a script at the top where they can be edited readily. This is a good habit to be in for any scripting, but can be particularly useful for sample designs where you may run multiple versions in rapid succession. For any value that will be used as an argument for a function, especially one that will be used more than once (e.g. the filepath to a folder containing input files), store that as an object at the beginning after attaching the packages needed. This will let you adjust values without having to find and change multiple references in the script, possibly missing some.

#---- SETUP ----
library("sample.design")
# rgdal is being attached here to use readOGR()
library("rgdal")
# dplyr and tidyr are used for creating quality assurance objects
library("dplyr")
library("tidyr")

# The project name
project_name <- "NM_SlickrockCanyon"
# The date of the draw in ISO-8601. It's a good practice to append this to the filenames of the results
# This avoids confusing filenames like "NM_SlickrockCanyon_points_final_FINAL3_actualfinal.shp"
draw_date <- "20190404"
# The filepath to the source polygons is different from the filepath to write out to (basic data hygiene)
path_spatial <- "C:/Projects/NewMexico/SlickrockCanyon/data/polygons"
path_output <- "C:/Projects/NewMexico/SlickrockCanyon/output"
# The filename of the polygons used here. The date at the end is because this has gone through multiple revisions over time
frame_filename <- "SlickrockCanyon_sf_20190404"
# The name of the field in the frame polygons containing stratification identities
stratum_field <- "STRATUM"
# Because we're going to allocate proportionally, this is the information required.
# Otherwise we could include the filepath information for a table of design information to manually allocate using.
# Note that this is also captured in a more human-readable format in above.
panel_names <- c("Year1", "Year2", "Year3")
panel_sample_size <- 50
base_min <- 3
oversample_proportion <- 0.667
oversample_min <- 3
# The seed number to make sure this is reproducible. Could be any positive integer
seed_number <- c(420)

Drawing

Once the setup is done, it's just the process of reading in the data, generating the design object, and drawing the points. If the design has manually allocated points, you would simply use read_panels() instead of allocate_panels().

#---- DRAWING ----
# Read in the frame (which is also stratification in this case)
frame  <- readOGR(dsn = path_spatial,
                  layer = frame_filename,
                  stringsAsFactors = FALSE)

# Create the design object using the values defined in the SETUP chunk above
design <- allocate_panels(spdf = frame,
                          stratum_field = stratum_field,
                          panel_names = panel_names,
                          panel_sample_size = panel_sample_size,
                          points_min = base_min,
                          oversample_proportion = oversample_proportion,
                          oversample_min = oversample_min)

# Draw the points using the frame and the design object
points_draw <- grts_aim(design_object = design,
                        design_name = project_name,
                        sp_object = frame,
                        seed_number = seed_number,
                        stratum_field = stratum_field)

Quality Assurance

At this point, creating summary information for future reference and as a quality assurance measure is a good idea. These are not required, but can be very helpful for troubleshooting and evaluating how true-to-expectations the design created is.

#---- QUALITY ASSURANCE ----
# These steps require the packages dplyr and tidyr
# Use dplyr::group_by() and dplyr::summarize() to create a data frame with number
# of base points and over sample points per stratum stratum_summary

# This is an intermediate object that's easier to read if reformatted
stratum_summary_tall <- dplyr::summarize(dplyr::group_by(points_draw@data,
                                                         STRATUM, PANEL),
                                         point_count = n())

# This format is easier and clearer to read
stratum_summary <- tidyr::spread(stratum_summary_tall,
                                 key = PANEL,
                                 value = point_count)

Writing Outputs

Once all the outputs have been created (both in the form of any QA information and the final points), the only remaining step is to write them out.

#---- WRITING OUTPUTS ----

# Note that the filenames here are generated using project_name and draw_date, just to keep things clear!

# Write out the points as a shapefile
writeOGR(obj = points_draw,
         dsn = path_output,
         layer = paste(project_name, "points", draw_date, sep = "_"),
         driver = "ESRI Shapefile",
         overwrite_layer = TRUE)

# Write out the stratum summary created for QA purposes
write.csv(stratum_summary,
          paste(path_output,
                paste0(paste(project_name, "strata_summary", draw_date, sep = "_"), ".csv"),
                sep = "/"))

# design_dataframe() converts a design object into the same format used by read_panels()
# This writes out the design information in a static format and serves as another QA step
write.csv(design_dataframe(design),
          paste(path_output,
                paste0(paste(project_name, "point_allocation", draw_date, sep = "_"), ".csv"),
                sep = "/"))

Complete example

This is the above example as a single chunk instead of broken across multiple. You can use this as a template to create your own sample design script by copying it and modifying the DESIGN DOCUMENTATION and SETUP sections appropriately.

#---- DESIGN DOCUMENTATION ----
# Project name: Slickrock Canyon 
# Purpose: Resource Management Plan (RMP) effectiveness
# Sample frame: Slickrock Canyon National Monument (Bureau of Land Management land only)
# Stratification scheme: Grouped Biophysical Setting groups
# Panels: 3
# Base points/panel: 50
# Total base points: 150
# Points allocation: Proportionally by area (minimum 3)
# Oversample proportion: 0.667
# Seed number: 420
# Date of final draw: 2019-04-04

#---- SETUP ----
library("sample.design")
# rgdal is being attached here to use readOGR()
library("rgdal")
# dplyr and tidyr are used for creating quality assurance objects
library("dplyr")
library("tidyr")

# The project name
project_name <- "NM_SlickrockCanyon"
# The date of the draw in ISO-8601. It's a good practice to append this to the filenames of the results
# This avoids confusing filenames like "NM_SlickrockCanyon_points_final_FINAL3_actualfinal.shp"
draw_date <- "20190404"
# The filepath to the source polygons is different from the filepath to write out to (basic data hygiene)
path_spatial <- "C:/Projects/NewMexico/SlickrockCanyon/data/polygons"
path_output <- "C:/Projects/NewMexico/SlickrockCanyon/output"
# The filename of the polygons used here. The date at the end is because this has gone through multiple revisions over time
frame_filename <- "SlickrockCanyon_sf_20190404"
# The name of the field in the frame polygons containing stratification identities
stratum_field <- "STRATUM"
# Because we're going to allocate proportionally, this is the information required.
# Otherwise we could include the filepath information for a table of design information to manually allocate using.
# Note that this is also captured in a more human-readable format in above.
panel_names <- c("Year1", "Year2", "Year3")
panel_sample_size <- 50
base_min <- 3
oversample_proportion <- 0.667
oversample_min <- 3
# The seed number to make sure this is reproducible. Could be any positive integer
seed_number <- c(420)

#---- DRAWING ----
# Read in the frame (which is also stratification in this case)
frame  <- readOGR(dsn = path_spatial,
                  layer = frame_filename,
                  stringsAsFactors = FALSE)

# Create the design object using the values defined in the SETUP chunk above
design <- allocate_panels(spdf = frame,
                          stratum_field = stratum_field,
                          panel_names = panel_names,
                          panel_sample_size = panel_sample_size,
                          points_min = base_min,
                          oversample_proportion = oversample_proportion,
                          oversample_min = oversample_min)

# Draw the points using the frame and the design object
points_draw <- grts_aim(design_object = design,
                        design_name = project_name,
                        sp_object = frame,
                        seed_number = seed_number,
                        stratum_field = stratum_field)

#---- QUALITY ASSURANCE ----
# These steps require the packages dplyr and tidyr
# Use dplyr::group_by() and dplyr::summarize() to create a data frame with number
# of base points and over sample points per stratum stratum_summary

# This is an intermediate object that's easier to read if reformatted
stratum_summary_tall <- dplyr::summarize(dplyr::group_by(points_draw@data,
                                                         STRATUM, PANEL),
                                         point_count = n())

# This format is easier and clearer to read
stratum_summary <- tidyr::spread(stratum_summary_tall,
                                 key = PANEL,
                                 value = point_count)

#---- WRITING OUTPUTS ----

# Note that the filenames here are generated using project_name and draw_date, just to keep things clear!

# Write out the points as a shapefile
writeOGR(obj = points_draw,
         dsn = path_output,
         layer = paste(project_name, "points", draw_date, sep = "_"),
         driver = "ESRI Shapefile",
         overwrite_layer = TRUE)

# Write out the stratum summary created for QA purposes
write.csv(stratum_summary,
          paste(path_output,
                paste0(paste(project_name, "strata_summary", draw_date, sep = "_"), ".csv"),
                sep = "/"))

# design_dataframe() converts a design object into the same format used by read_panels()
# This writes out the design information in a static format and serves as another QA step
write.csv(design_dataframe(design),
          paste(path_output,
                paste0(paste(project_name, "point_allocation", draw_date, sep = "_"), ".csv"),
                sep = "/"))


nstauffer/sample.design documentation built on May 9, 2022, 3:21 a.m.