The Forest Sampling is a package of functions that calculates workups for sampling designs commonly used in forestry. It is an open source opportunity for collaboration and transparency.
These sampling designs include: Simple Random Sampling Cluster Sampling Stratified Sampling Systematic Sampling Two-Stage Sampling 3P Sampling
Variations are included, described in detail below.
summarize_simple_random
and summarize_systematic
).attribute
: assumes target attribute is called attr
. Set attribute
to the column header used for the variable of interest. Not required for vector input.___Tot
: total number of the ___ unit in the sampling design. Commonly the total number of clusters or plots in the sampling frame.desiredConfidence
: Percent confidence level (two-sided) expressed as decimal. Default is set to 0.95
.The outputs of the functions vary, but all produce a data frame containing: mean variance standard error upper limit for the confidence interval * lower limit for the confidence interval
The following are the basic functions and their associated sampling strategy.
Function | Sampling Strategy
-------------------------------------|---------------------------------------------------------
summarize_all_cluster()
| cluster sampling
summarize_all_simple_random()
| simple random sampling
summarize_stratified()
| stratified sampling
summarize_systematic()
| systematic sampling
summarize_two_stage()
| two-stage sampling
summarize_threeP()
| 3P sampling
summarize_poisson()
| Poisson sampling
Each sampling unit is represented as a cluster, a group of several plots.
summarize_all_cluster(data, attribute = NA, element = TRUE, plotTot = NA, desiredConfidence = 0.95, bernoulli = F)
Cluster sample has 2 options:
1. Cluster sample with a normal distribution
* Set bernoulli = FALSE
or do not include the parameter in the function call
* Ex. summarize_all_cluster(dataPlot, attribute, element = TRUE, bernoulli = F)
2. Cluster sample with a bernoulli distribution
* Set bernoulli = TRUE
* Ex. summarize_all_cluster(data, attribute, plotTot = 250, bernoulli = T)
Parameter | Include parameter in option #: | Description -------------------|--------------------------------|------------------------------------------------------------------------- data | 1, 2 | data frame containing observations of variable of interest for either cluster-level or plot-level data element | 1 | logical true if parameter data is plot-level, false if parameter data is cluster-level. Default is True attribute | 1, 2 | character name of attribute to be summarized plotTot | 2 | numeric population size, equivalent to the total number of possible plots in the population desiredConfidence | 1, 2 | numeric desired confidence level (e.g. 0.9) bernoulli | 1, 2 | logical TRUE if data fitting the Bernoulli distribution is used
Option 1 can take in element (plot) data or stand data.
plotLevelDataExample <- data.frame( clusterID = c( 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5 ), attr = c( 1000, 1250, 950, 900, 1005, 1000, 1250, 950, 900, 1005, 1000, 1250, 950, 900, 1005, 1000, 1250, 950, 900 ), isUsed = c( T, T, T, T, T, T, T, T, T, T, T, T, T, T, F, F, F, F, F ) )
clusterLevelDataExample <- data.frame( clusterID = c(1, 2, 3, 4, 5), clusterElements = c(4, 2, 9, 4, 10), sumAttr = c(1000, 1250, 950, 900, 1005), isUsed = c(T, T, F, T, T) )
Option 2 is the option for the Bernoulli distribution. For this distribution, the total number of plots in the sampling frame must be indicated, and data must be input as follows (where attr is the value of the attribute for each plot):
data <- data.frame( plots = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), propAlive = c( 0.75, 0.80, 0.80, 0.85, 0.70, 0.90, 0.70, 0.75, 0.80, 0.65 ) )
n sampling units are selected. Each sampling unit is selected independently from the selection of other sampling units.
summarize_all_srs(data, attribute = 'attr', popSize = NA, desiredConfidence = 0.95, infiniteReplacement = F, bernoulli = F)
Simple random sample (SRS) has 3 options:
1. SRS of a finite population or sampled without replacement
* Set infiniteReplacement = FALSE
or do not include the parameter in the function call
* Data input can be as a vector or data frame
* Ex. summarize_all_srs(data, attribute, popSize = NA, desiredConfidence = 0.95, infiniteReplacement = F)
2. SRS of an infinite population or sampled with replacement
* Set infiniteReplacement = TRUE
* Data input can be as a vector or data frame
* Ex. summarize_all_srs(data, attribute, popSize = NA, desiredConfidence = 0.95, infiniteReplacement = T)
3. SRS with a Bernoulli distribution
* Set bernoulli = TRUE
* Data input can only be as a data frame
* Ex. summarize_all_srs(data, attribute, popSize = popTot, desiredConfidence = 0.95, infiniteReplacement = F, bernoulli = T)
Parameter | Include parameter in option #: | Description --------------------|--------------------------------|------------------------------------------------------------------------- data | 1, 2 | data frame or vector containing observations of variable of interest data | 3 | data frame containing observations of variable of interest (attribute must be coded as either TRUE and FALSE or 1 and 0) attribute | 1, 2, 3 | character name of attribute to be summarized, must be defined if data is input as a data frame popSize | 1, 2, 3 | numeric population size, defaults to NA (unknown popSize) desiredConfidence | 1, 2, 3 | numeric desired confidence level (e.g. 0.9) infiniteReplacement | 1, 2 | logical true if sample was done with replacement or from an infinite population. FALSE if sampled without replacement, from a finite population (defaults to FALSE) bernoulli | 1, 2, 3 | logical TRUE if data fitting the Bernoulli distribution is used
data <- c(120, 140, 160, 110, 100, 90)
data <- data.frame( bapa = c(120, 140, 160, 110, 100, 90), plots = c(1, 2, 3, 4, 5, 6) ) attribute <- "bapa"
data <- data.frame(alive = c(T, T, F, T, F, F), plots = c(1, 2, 3, 4, 5, 6)) attribute <- "alive"
A simple random sample is performed on at least two subpopulations of known size. The potential subpopulations are formed from dividing the population.
summarize_stratified(trainingData, attribute, stratumTab, desiredConfidence = 0.95, post = T)
Parameter | Description --------------------|----------------------------------------------------- trainingData | data frame containing observations of variable of interest, and stratum assignment for each plot attribute | character name of attribute to be summarized stratumTab | data frame containing acreages for each stratum desiredConfidence | numeric desired confidence level (e.g. 0.9) post | logical true if post-stratification was used
Stratified sample currently has two options:
1. Stratified
* Set post = FALSE
2. Post-Stratified
* Set post = TRUE
Note: The stratum columns MUST be named 'stratum' and the string you set to equal attribute
.
trainingData <- data.frame( bapa = c(120, 140, 160, 110, 100, 90), stratum = c(1, 1, 1, 2, 2, 2) ) stratumTab <- data.frame(stratum = c(1, 2), acres = c(200, 50)) attribute <- "bapa" desiredConfidence <- 0.9
The first sampling unit is arbitrarily established in the field or randomly selected. Subsequent sampling units are placed at uniform intervals through the target area.
summarize_systematic(data, attribute = 'attr', popSize = NA, desiredConfidence = 0.95)
Parameter | Description ------------------|----------------------------------------- data | data frame or vector containing observations of variable of interest attribute | character name of attribute to be summarized, must be defined if data is input as a data frame popSize | numeric population size, defaults to NA (unknown population size) desiredConfidence | numeric desired confidence level (e.g. 0.9)
Data input can be either as a vector or data frame.
dataframe <- data.frame( bapa = c(120, 140, 160, 110, 100, 90), plots = c(1, 2, 3, 4, 5, 6) ) attribute <- "bapa"
vector <- c(120, 140, 160, 110, 100, 90)
A variation of cluster sampling. A simple random sample of clusters is selected, then a simple random sample of elements within each cluster is selected and sampled.
summarize_two_stage(data, plot = TRUE, attribute = NA, populationClusters = 0, populationElementsPerCluster = 0, desiredConfidence = 0.95)
Two-stage sample currently has two data input options:
1. Plot-level data
* Set post = FALSE
2. Cluster-level data
* Set post = TRUE
Parameter | Include parameter in option #: | Description -----------------------------|--------------------------------|--------------------------------------------- data | 1, 2 | data frame containing observations of variable of interest for either cluster-level of plot-level data. plot | 1, 2 | logical TRUE if parameter data is plot-level, FALSE if parameter data is cluster-level. Default is TRUE. attribute | 1 | character name of attribute to be summarized. populationClusters | 1 | numeric total number of clusters in the population. populationElementsPerCluster | 1 | numeric total number of elements in the population. desiredConfidence | 1, 2 | numeric desired confidence level (e.g. 0.9).
Data can be organized by either by plot (plot = TRUE
) or the total sum of the attribute for each cluster (plot = FALSE
).
data <- data.frame( clusterID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6), volume = c( 500, 650, 610, 490, 475, 505, 940, 825, 915, 210, 185, 170, 450, 300, 500, 960, 975, 890 ), isUsed = c(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T) )
Example: summarize_two_stage(data, plot = T, 'volume', populationClusters = 16, populationElementsPerCluster = 160)
All trees in an area are estimated. Select few trees are identified and carefully measured. The estimated values are then corrected through a ratio method. This function assumes that this is follows the Point 3P analysis. Variations of Point 3P can include a simple or clustered sample. Difference in analysis of these variations occur before the post-processing assessed in the package. A book by Iles (2003) informed the content of this function.
summarize_threeP(data, cvPercent, trueNetVBAR, height = FALSE, plot = FALSE, desiredConfidence = 0.95)
Parameter | Description ------------------|----------------------------------------------- data | data frame containing plot number, VBARs, and treeHeights cvPercent | numeric average Coefficient of Variation expressed as a percent (e.g. 50) trueNetVBAR | numeric measured net VBAR (volume to basal area ratio) height | logical TRUE if data input contains height values, FALSE assumes data contains estimateVBAR plot | logical TRUE if the input data is averaged by plot, FALSE if data is tree-level desiredConfidence | numeric desired confidence level (e.g. 0.9)
3P sample currently has two key options:
1. height
a. Height data is included
* An estimate VBAR will be roughly calculated
* Set height = TRUE
b. Estimate VBARs are included
* An estimate VBAR is included in the input dat
* Set height = FALSE
2. plot
a. Plot-level Data is Input
* Data is set up as plot-level data, defined below
* Set plot = TRUE
b. Tree-level Data is Input
* Data is set up as tree-level data, defined below
* Set plot = FALSE
dataHeight <- data.frame( plotNum = c(5, 4, 3, 5, 4), treeCount = c(4, 5, 6, 3, 4), BAF = c(10, 10, 10, 10, 10), avgTreeVBARSperPlot = c(9, 8, 7, 8, 2), treeHeight = c(1, 2, 3, 4, 5) )
Example: summarize_threeP(dataHeight, cvPercent = 50, trueNetVBAR = 10, height = T, plot = T)
dataEstimateVBAR <- data.frame( plotNum = c(1, 2, 3, 4, 5), treeCount = c(4, 5, 6, 3, 4), BAF = c(10, 10, 10, 10, 10), avgTreeVBARSperPlot = c(9, 8, 7, 8, 2), estimateVBAR = c(1, 2, 3, 4, 5) )
Example:
summarize_threeP( dataEstimateVBAR, cvPercent = 50, trueNetVBAR = 10, height = F, plot = T )
dataNotPlot <- data.frame( plotNum = c(1, 1, 2, 2, 3), BAF = c(10, 10, 10, 10, 10), VBAR = c(1, 2, 3, 4, 3), estimateVBAR = c(1, 2, 3, 4, 5) )
Example:
summarize_threeP( dataNotPlot, cvPercent = 50, trueNetVBAR = 10, height = F, plot = F )
An assessment of count data that follows the Poisson distribution.
Note: The term 'Poisson sampling' was also used to refer to a precursor to 3P sampling (Iles, 2003). The preliminary 3P version is discussed in Gregoire and Valentine's sampling strategies book (2008). However, in this package the clearest distinction is that Poisson sampling analyzes to count data and 3P considers field estimations.
summarize_poisson(data, desiredConfidence = 0.95)
Parameter | Description ------------------|----------------------------------------- data | vector containing number of instances per desired unit (e.g. 6 trees were alive at plot 10 -> c(6)) desiredConfidence | numeric desired confidence level (e.g. 0.9)
data <- c(2, 3, 4, 3, 4, 5, 2, 7)
Avery, T. E., & Burkhart, H., E. (1975). Forest Measurements (5th ed.). New York, NY: McGraw-Hill.
Gregoire, T. G., & Valentine, H. T. (2008). Sampling Strategies for Natural Resources and the Environment. Boca Raton, FL: CRC Press.
Iles, K. (2003). A Sampler of Inventory Topics. Canada: Kim Iles & Associates Ltd.
The Pennsylvania State University. (n.d.). Poisson Sampling. Retrieved from https://onlinecourses.science.psu.edu/stat504/node/57
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.