In SilviaTerra/forestsamplr: Standard Forest Sampling Design Workups

Forest Sampling Package

The Forest Sampling is a package of functions that calculates workups for sampling designs commonly used in forestry. It is an open source opportunity for collaboration and transparency.

These sampling designs include: Simple Random Sampling Cluster Sampling Stratified Sampling Systematic Sampling Two-Stage Sampling 3P Sampling

Variations are included, described in detail below.

General Information

All functions require data input. For most functions, this data must be a data frame or tibble formatted in the generalized structure. Select functions can act on vectors (summarize_simple_random and summarize_systematic).
In most cases, additional parameters are necessary. Defaults have been provided to help minimize this.

Common parameters, usage information:

attribute: assumes target attribute is called attr. Set attribute to the column header used for the variable of interest. Not required for vector input.
___Tot: total number of the ___ unit in the sampling design. Commonly the total number of clusters or plots in the sampling frame.
desiredConfidence: Percent confidence level (two-sided) expressed as decimal. Default is set to 0.95.

Output:

The outputs of the functions vary, but all produce a data frame containing: mean variance standard error upper limit for the confidence interval * lower limit for the confidence interval

Functions:

The following are the basic functions and their associated sampling strategy.

Function | Sampling Strategy -------------------------------------|--------------------------------------------------------- summarize_all_cluster() | cluster sampling summarize_all_simple_random() | simple random sampling summarize_stratified() | stratified sampling summarize_systematic() | systematic sampling summarize_two_stage() | two-stage sampling summarize_threeP() | 3P sampling summarize_poisson() | Poisson sampling

Cluster Sample:

Sampling Strategy Definition:

Each sampling unit is represented as a cluster, a group of several plots.

Function Definition:

summarize_all_cluster(data, attribute = NA, element = TRUE, plotTot = NA, desiredConfidence = 0.95, bernoulli = F)

Options:

Cluster sample has 2 options: 1. Cluster sample with a normal distribution * Set bernoulli = FALSE or do not include the parameter in the function call * Ex. summarize_all_cluster(dataPlot, attribute, element = TRUE, bernoulli = F) 2. Cluster sample with a bernoulli distribution * Set bernoulli = TRUE * Ex. summarize_all_cluster(data, attribute, plotTot = 250, bernoulli = T)

Parameter | Include parameter in option #: | Description -------------------|--------------------------------|------------------------------------------------------------------------- data | 1, 2 | data frame containing observations of variable of interest for either cluster-level or plot-level data element | 1 | logical true if parameter data is plot-level, false if parameter data is cluster-level. Default is True attribute | 1, 2 | character name of attribute to be summarized plotTot | 2 | numeric population size, equivalent to the total number of possible plots in the population desiredConfidence | 1, 2 | numeric desired confidence level (e.g. 0.9) bernoulli | 1, 2 | logical TRUE if data fitting the Bernoulli distribution is used

Data Input:

Option 1 can take in element (plot) data or stand data.

plotLevelDataExample <- data.frame(
  clusterID = c(
    1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5
  ),
  attr = c(
    1000, 1250, 950, 900, 1005, 1000, 1250, 950,
    900, 1005, 1000, 1250, 950, 900, 1005, 1000,
    1250, 950, 900
  ),
  isUsed = c(
    T, T, T, T, T, T, T, T, T, T, T, T, T, T, F,
    F, F, F, F
  )
)

clusterLevelDataExample <- data.frame(
  clusterID = c(1, 2, 3, 4, 5),
  clusterElements = c(4, 2, 9, 4, 10),
  sumAttr = c(1000, 1250, 950, 900, 1005),
  isUsed = c(T, T, F, T, T)
)

Option 2 is the option for the Bernoulli distribution. For this distribution, the total number of plots in the sampling frame must be indicated, and data must be input as follows (where attr is the value of the attribute for each plot):

data <- data.frame(
  plots = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  propAlive = c(
    0.75, 0.80, 0.80, 0.85, 0.70,
    0.90, 0.70, 0.75, 0.80, 0.65
  )
)

Simple Random Sample

Sampling Strategy Definition:

n sampling units are selected. Each sampling unit is selected independently from the selection of other sampling units.

Function Definition:

summarize_all_srs(data, attribute = 'attr', popSize = NA, desiredConfidence = 0.95, infiniteReplacement = F, bernoulli = F)

Options:

Simple random sample (SRS) has 3 options: 1. SRS of a finite population or sampled without replacement * Set infiniteReplacement = FALSE or do not include the parameter in the function call * Data input can be as a vector or data frame * Ex. summarize_all_srs(data, attribute, popSize = NA, desiredConfidence = 0.95, infiniteReplacement = F) 2. SRS of an infinite population or sampled with replacement * Set infiniteReplacement = TRUE * Data input can be as a vector or data frame * Ex. summarize_all_srs(data, attribute, popSize = NA, desiredConfidence = 0.95, infiniteReplacement = T) 3. SRS with a Bernoulli distribution * Set bernoulli = TRUE * Data input can only be as a data frame * Ex. summarize_all_srs(data, attribute, popSize = popTot, desiredConfidence = 0.95, infiniteReplacement = F, bernoulli = T)

Parameter | Include parameter in option #: | Description --------------------|--------------------------------|------------------------------------------------------------------------- data | 1, 2 | data frame or vector containing observations of variable of interest data | 3 | data frame containing observations of variable of interest (attribute must be coded as either TRUE and FALSE or 1 and 0) attribute | 1, 2, 3 | character name of attribute to be summarized, must be defined if data is input as a data frame popSize | 1, 2, 3 | numeric population size, defaults to NA (unknown popSize) desiredConfidence | 1, 2, 3 | numeric desired confidence level (e.g. 0.9) infiniteReplacement | 1, 2 | logical true if sample was done with replacement or from an infinite population. FALSE if sampled without replacement, from a finite population (defaults to FALSE) bernoulli | 1, 2, 3 | logical TRUE if data fitting the Bernoulli distribution is used

Data Input:

Vector (Options 1, 2)

data <- c(120, 140, 160, 110, 100, 90)

Data frame (Options 1, 2)

data <- data.frame(
  bapa = c(120, 140, 160, 110, 100, 90),
  plots = c(1, 2, 3, 4, 5, 6)
)
attribute <- "bapa"

For Bernoulli Distribution - data frame input only (Option 3)

data <- data.frame(alive = c(T, T, F, T, F, F), plots = c(1, 2, 3, 4, 5, 6))
attribute <- "alive"

Stratified Sample:

Sampling Strategy Definition:

A simple random sample is performed on at least two subpopulations of known size. The potential subpopulations are formed from dividing the population.

Function Definition:

summarize_stratified(trainingData, attribute, stratumTab, desiredConfidence = 0.95, post = T)

Parameter | Description --------------------|----------------------------------------------------- trainingData | data frame containing observations of variable of interest, and stratum assignment for each plot attribute | character name of attribute to be summarized stratumTab | data frame containing acreages for each stratum desiredConfidence | numeric desired confidence level (e.g. 0.9) post | logical true if post-stratification was used

Options:

Stratified sample currently has two options: 1. Stratified * Set post = FALSE 2. Post-Stratified * Set post = TRUE

Data Input:

Note: The stratum columns MUST be named 'stratum' and the string you set to equal attribute.

trainingData <- data.frame(
  bapa = c(120, 140, 160, 110, 100, 90),
  stratum = c(1, 1, 1, 2, 2, 2)
)
stratumTab <- data.frame(stratum = c(1, 2), acres = c(200, 50))
attribute <- "bapa"
desiredConfidence <- 0.9

Systematic Sample:

Sampling Strategy Definition:

The first sampling unit is arbitrarily established in the field or randomly selected. Subsequent sampling units are placed at uniform intervals through the target area.

Function Definition:

summarize_systematic(data, attribute = 'attr', popSize = NA, desiredConfidence = 0.95)

Parameter | Description ------------------|----------------------------------------- data | data frame or vector containing observations of variable of interest attribute | character name of attribute to be summarized, must be defined if data is input as a data frame popSize | numeric population size, defaults to NA (unknown population size) desiredConfidence | numeric desired confidence level (e.g. 0.9)

Data Input:

Data input can be either as a vector or data frame.

dataframe <- data.frame(
  bapa = c(120, 140, 160, 110, 100, 90),
  plots = c(1, 2, 3, 4, 5, 6)
)
attribute <- "bapa"

vector <- c(120, 140, 160, 110, 100, 90)

Two-Stage Sample:

Sampling Strategy Definition:

A variation of cluster sampling. A simple random sample of clusters is selected, then a simple random sample of elements within each cluster is selected and sampled.

Function Definition:

summarize_two_stage(data, plot = TRUE, attribute = NA, populationClusters = 0, populationElementsPerCluster = 0, desiredConfidence = 0.95)

Options:

Two-stage sample currently has two data input options: 1. Plot-level data * Set post = FALSE 2. Cluster-level data * Set post = TRUE

Parameter | Include parameter in option #: | Description -----------------------------|--------------------------------|--------------------------------------------- data | 1, 2 | data frame containing observations of variable of interest for either cluster-level of plot-level data. plot | 1, 2 | logical TRUE if parameter data is plot-level, FALSE if parameter data is cluster-level. Default is TRUE. attribute | 1 | character name of attribute to be summarized. populationClusters | 1 | numeric total number of clusters in the population. populationElementsPerCluster | 1 | numeric total number of elements in the population. desiredConfidence | 1, 2 | numeric desired confidence level (e.g. 0.9).

Data Input:

Data can be organized by either by plot (plot = TRUE) or the total sum of the attribute for each cluster (plot = FALSE).

data <- data.frame(
  clusterID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  volume = c(
    500, 650, 610, 490, 475, 505, 940, 825, 915, 210, 185, 170, 450,
    300, 500, 960, 975, 890
  ),
  isUsed = c(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T)
)

Example: summarize_two_stage(data, plot = T, 'volume', populationClusters = 16, populationElementsPerCluster = 160)

3P Sample:

Sampling Strategy Definition:

All trees in an area are estimated. Select few trees are identified and carefully measured. The estimated values are then corrected through a ratio method. This function assumes that this is follows the Point 3P analysis. Variations of Point 3P can include a simple or clustered sample. Difference in analysis of these variations occur before the post-processing assessed in the package. A book by Iles (2003) informed the content of this function.

Function Definition:

summarize_threeP(data, cvPercent, trueNetVBAR, height = FALSE, plot = FALSE, desiredConfidence = 0.95)

Parameter | Description ------------------|----------------------------------------------- data | data frame containing plot number, VBARs, and treeHeights cvPercent | numeric average Coefficient of Variation expressed as a percent (e.g. 50) trueNetVBAR | numeric measured net VBAR (volume to basal area ratio) height | logical TRUE if data input contains height values, FALSE assumes data contains estimateVBAR plot | logical TRUE if the input data is averaged by plot, FALSE if data is tree-level desiredConfidence | numeric desired confidence level (e.g. 0.9)

Options:

3P sample currently has two key options: 1. height a. Height data is included * An estimate VBAR will be roughly calculated * Set height = TRUE b. Estimate VBARs are included * An estimate VBAR is included in the input dat * Set height = FALSE 2. plot a. Plot-level Data is Input * Data is set up as plot-level data, defined below * Set plot = TRUE b. Tree-level Data is Input * Data is set up as tree-level data, defined below * Set plot = FALSE

Data Input:

Height is included, data is plot-level.

dataHeight <- data.frame(
  plotNum = c(5, 4, 3, 5, 4),
  treeCount = c(4, 5, 6, 3, 4),
  BAF = c(10, 10, 10, 10, 10),
  avgTreeVBARSperPlot = c(9, 8, 7, 8, 2),
  treeHeight = c(1, 2, 3, 4, 5)
)

Example: summarize_threeP(dataHeight, cvPercent = 50, trueNetVBAR = 10, height = T, plot = T)

estimateVBAR is included (height data is not used, and so is not included), data is plot-level.

dataEstimateVBAR <- data.frame(
  plotNum = c(1, 2, 3, 4, 5),
  treeCount = c(4, 5, 6, 3, 4),
  BAF = c(10, 10, 10, 10, 10),
  avgTreeVBARSperPlot = c(9, 8, 7, 8, 2),
  estimateVBAR = c(1, 2, 3, 4, 5)
)

Example:

summarize_threeP(
  dataEstimateVBAR,
  cvPercent = 50,
  trueNetVBAR = 10,
  height = F,
  plot = T
)

estimateVBAR is included, data is tree-level.

dataNotPlot <- data.frame(
  plotNum = c(1, 1, 2, 2, 3),
  BAF = c(10, 10, 10, 10, 10),
  VBAR = c(1, 2, 3, 4, 3),
  estimateVBAR = c(1, 2, 3, 4, 5)
)

Example:

summarize_threeP(
  dataNotPlot,
  cvPercent = 50,
  trueNetVBAR = 10,
  height = F,
  plot = F
)

Poisson:

Sampling Strategy Definition:

An assessment of count data that follows the Poisson distribution.

Note: The term 'Poisson sampling' was also used to refer to a precursor to 3P sampling (Iles, 2003). The preliminary 3P version is discussed in Gregoire and Valentine's sampling strategies book (2008). However, in this package the clearest distinction is that Poisson sampling analyzes to count data and 3P considers field estimations.

Function Definition:

summarize_poisson(data, desiredConfidence = 0.95)

Parameter | Description ------------------|----------------------------------------- data | vector containing number of instances per desired unit (e.g. 6 trees were alive at plot 10 -> c(6)) desiredConfidence | numeric desired confidence level (e.g. 0.9)

Data Input:

data <- c(2, 3, 4, 3, 4, 5, 2, 7)

Sources Used

Avery, T. E., & Burkhart, H., E. (1975). Forest Measurements (5th ed.). New York, NY: McGraw-Hill.

Gregoire, T. G., & Valentine, H. T. (2008). Sampling Strategies for Natural Resources and the Environment. Boca Raton, FL: CRC Press.

Iles, K. (2003). A Sampler of Inventory Topics. Canada: Kim Iles & Associates Ltd.

The Pennsylvania State University. (n.d.). Poisson Sampling. Retrieved from https://onlinecourses.science.psu.edu/stat504/node/57

SilviaTerra/forestsamplr documentation built on Jan. 3, 2020, 2:33 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SilviaTerra/forestsamplr Standard Forest Sampling Design Workups

In SilviaTerra/forestsamplr: Standard Forest Sampling Design Workups

Forest Sampling Package

General Information

Common parameters, usage information:

Output:

Functions:

Cluster Sample:

Sampling Strategy Definition:

Function Definition:

Options:

Data Input:

Simple Random Sample

Sampling Strategy Definition:

Function Definition:

Options:

Data Input:

Stratified Sample:

Sampling Strategy Definition:

Function Definition:

Options:

Data Input:

Systematic Sample:

Sampling Strategy Definition:

Function Definition:

Data Input:

Two-Stage Sample:

Sampling Strategy Definition:

Function Definition:

Options:

Data Input:

3P Sample:

Sampling Strategy Definition:

Function Definition:

Options:

Data Input:

Poisson:

Sampling Strategy Definition:

Function Definition:

Data Input:

Sources Used

R Package Documentation

Browse R Packages

We want your feedback!

SilviaTerra/forestsamplr
Standard Forest Sampling Design Workups