The nwos package was designed to generate estimates for the U.S. Forest Service's National Woodland Owner Survey. The functions incorporate the NWOS sample design to generate estimates. Estimates can be generated for totals, proportions, means, medians, and associated variances. The background on the estimators is provided in Butler (In review).
This vignette shows the basic steps necessary to generate estimates using this package. The package can be downloaded from GitHub (devtools::install_git("git://github.com/bbutler01/nwos.git")). The examples below use the ``wi'' dataset supplied with the package. A description of this dataset is included in Butler (In review).
The basic steps are: Install packages and load data Generate replicates (used for variance estimates) Estimate stratum areas Calculate response rates Calculate weights Generate estimates, including variances
These steps are repeated for each combination of state, stratum (e.g., family forest owners), and domain. The basic estimators assume the data supplied are for an individual state. Wrapper functions can be developed for iterating through states, strata, domains, and variables.
library(nwos) library(tidyverse)
wi <- tbl_df(read.csv("../data/wi.csv")) wi <- wi %>% mutate(ROW_NAME = row.names(wi), AC_WOOD = ACRES_FOREST)
Replicates are used in the bootstrapping variance estimation. Replicates are generated just once to increase processing efficiency, in order to use the same replicates across all estimates, and to facilitate repeatability of the results. This vignette uses only 100 replicates in order to decrease processing time. An actual application would likely use many more replicates, e.g., $R = 2,500$.
WI_REPLICATES <- nwos_replicates(index = row.names(wi), point.count = wi$POINT_COUNT, R = 100)
Areas need to be estimated for each stratum of interest in a state. This is done using a dummy variable indicating inclusion in a stratum, e.g., family forest ownerships (FFO). To capture the full variability in the estimates, stratum areas must be estimated separately for each replicate.
For the example here, the stratum of interest is family forest ownerships (FFO) in Wisconsin. The FFO dummy variable is set to 1 if $LAND_USE == 1$ & $OWN_CD == 45$ & $AC_WOOD >= 1$, and 0 otherwise.
wi <- wi %>% mutate(FFO = if_else(LAND_USE == 1 & OWN_CD == 45 & AC_WOOD >= 1, 1, 0)) # Add stratum variable, FFO WI_FFO_AREA <- nwos_stratum_area(stratum = wi$FFO, point.count = wi$POINT_COUNT, state.area = 33898733) WI_FFO_AREA_REP <- sapply(WI_REPLICATES, nwos_stratum_area_apply, index = wi$ROW_NAME, stratum = wi$FFO, state.area = 33898733)
Response rates need to be calculated for each stratum of interest. Again, response rates need to be calculated for each replicate.
For this example, RESPONSE is defined as 1 if $RESPONSE_PROPENSITY >= 0.5$, and 0 otherwise.
wi <- wi %>% mutate(RESPONSE = if_else(RESPONSE_PROPENSITY >= 0.5, 1, 0)) %>% mutate(RESPONSE = if_else(is.na(RESPONSE_PROPENSITY), 0, RESPONSE)) WI_FFO_RR <- nwos_response_rate(stratum = wi$FFO, point.count = wi$POINT_COUNT, response = wi$RESPONSE) WI_FFO_RR_REP <- sapply(WI_REPLICATES, nwos_response_rate_apply, index = wi$ROW_NAME, stratum = wi$FFO, response = wi$RESPONSE)
Weights are key to the NWOS estimates. For the NWOS, they are a function of the number of sample points owned, acreage owned, and the area of the stratum. These weights are used for all of the estimates, and the estimates calculated above are largely done in order to generate these weights.
For some modeling purposes, extreme values in the weights may cause problems. See the section below on trimmed weights for a potential solution.
wi$WEIGHT <- nwos_weights(stratum = wi$FFO, point.count = wi$POINT_COUNT, response = wi$RESPONSE, area = wi$AC_WOOD, stratum.area = WI_FFO_AREA, response.rate = WI_FFO_RR) WI_FFO_WEIGHTS_REP <- lapply(1:length(WI_REPLICATES), nwos_weights_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, stratum = wi$FFO, response = wi$RESPONSE, area = wi$AC_WOOD, stratum.area = WI_FFO_AREA_REP, response.rate = WI_FFO_RR_REP)
Totals are estimated as a function of weights and can be calculated in terms of ownerships or acreage. Specific domains of interest (e.g., a subset within a stratum) can be specified.
In addition, totals can be made for continuous variables. This does not make sense for many NWOS variables, but it is useful for generating means (see below).
Variances, and sampling errors, are estimated using the estimates for the replicates.
WI_FFO_OWN_TOTAL <- nwos_total(weight = wi$WEIGHT) # wi <- wi %>% mutate(OWNER = 1) WI_FFO_OWN_TOTAL_REP <- sapply(1:length(WI_REPLICATES), nwos_total_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP) # WI_FFO_OWN_TOTAL # sqrt(var(WI_FFO_OWN_TOTAL_REP)) WI_FFO_AC_TOTAL <- nwos_total(weight = wi$WEIGHT, area = wi$AC_WOOD) WI_FFO_OWN_TOTAL_REP <- sapply(1:length(WI_REPLICATES), nwos_total_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, area = wi$AC_WOOD) # WI_FFO_AC_TOTAL # sqrt(var(WI_FFO_AC_TOTAL_REP))
Proportions are typically in terms of what proportion of the ownerships (or acreage) have a specific attribute.
WI_FFO_OWN_Y1_PROP <- nwos_proportion(weight = wi$WEIGHT, variable = wi$Y_1) WI_FFO_OWN_Y1_PROP_REP <- sapply(1:length(WI_REPLICATES), nwos_proportion_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, variable = wi$Y_1) # WI_FFO_OWN_Y1_PROP # sqrt(var(WI_FFO_OWN_Y1_PROP_REP)) WI_FFO_AC_Y1_PROP <- nwos_proportion(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$Y_1) WI_FFO_AC_Y1_PROP_REP <- sapply(1:length(WI_REPLICATES), nwos_proportion_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, area = wi$AC_WOOD, variable = wi$Y_1) # WI_FFO_AC_Y1_PROP # sqrt(var(WI_FFO_AC_Y1_PROP_REP))
Means are in terms of ownerships or acreage, i.e., what is the average value of a given variable in terms of ownerships or acreage.
WI_FFO_OWN_AC_MEAN <- nwos_mean(weight = wi$WEIGHT, variable = wi$AC_WOOD) WI_FFO_OWN_AC_MEAN_REP <- sapply(1:length(WI_REPLICATES), nwos_mean_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, variable = wi$AC_WOOD) # WI_FFO_OWN_AC_MEAN # sqrt(var(WI_FFO_OWN_AC_MEAN_REP)) WI_FFO_AC_AC_MEAN <- nwos_mean(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$AC_WOOD) WI_FFO_AC_AC_MEAN_REP <- sapply(1:length(WI_REPLICATES), nwos_mean_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, area = wi$AC_WOOD, variable = wi$AC_WOOD) # WI_FFO_AC_AC_MEAN # sqrt(var(WI_FFO_AC_AC_MEAN_REP))
WI_FFO_OWN_Y_3_MEAN <- nwos_mean(weight = wi$WEIGHT, variable = wi$Y_3) WI_FFO_OWN_Y_3_MEAN_REP <- sapply(1:length(WI_REPLICATES), nwos_mean_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, variable = wi$Y_3) # WI_FFO_OWN_Y_3_MEAN # sqrt(var(WI_FFO_OWN_Y_3_MEAN_REP)) WI_FFO_AC_Y_3_MEAN <- nwos_mean(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$Y_3) WI_FFO_AC_Y_3_MEAN_REP <- sapply(1:length(WI_REPLICATES), nwos_mean_apply, index.rep = WI_REPLICATES, index = wi$ROW_NAME, weight = WI_FFO_WEIGHTS_REP, area = wi$AC_WOOD, variable = wi$Y_3) # WI_FFO_AC_Y_3_MEAN # sqrt(var(WI_FFO_AC_Y_3_MEAN_REP))
Quantiles, including medians, can be calculated in terms of ownerships or acreage. The current algorithm uses an iterative approach and can be computationally slow.
WI_FFO_OWN_AC_QUANTILE <- nwos_quantile(weight = wi$WEIGHT, domain = wi$FFO, variable = wi$AC_WOOD) # WI_FFO_OWN_AC_QUANTILE WI_FFO_AC_AC_QUANTILE <- nwos_quantile(weight = wi$WEIGHT, area = wi$AC_WOOD, domain = wi$FFO, variable = wi$AC_WOOD) # WI_FFO_AC_AC_QUANTILE
For some purposes, extreme values in the base weights may cause problems (e.g., convergence in models). For these circumstances, trimmed weights may rectify the problem. The only method currently implements in the nwos package for trimming weights is the IQR method (i.e., values greater than or less than $1.5 \times IQR$ are set to $Q_4 + (1.5 \times IQR)$ and $Q_2 - (1.5 \times IQR)$, respectively).
wi$WEIGHT_TRIMMED <- nwos_weights_trimmed(weights = wi$WEIGHT) # summary(wi$WEIGHT[wi$WEIGHT > 0]) # summary(wi$WEIGHT_TRIMMED[wi$WEIGHT_TRIMMED > 0])
Butler, B.J. In review. Weighting for the US Forest Service, National Woodland Owner Survey. U.S. Department of Agriculture, Forest Service, Northern Research Station. Newtown Square, PA.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.