The nwos package was designed to generate estimates for the U.S. Forest Service's National Woodland Owner Survey. The functions incorporate the NWOS sample design to generate estimates. Estimates can be generated for totals, proportions, means, medians, and associated variances. The background on the estimators is provided in Butler (In review).

This vignette shows the basic steps necessary to generate estimates using this package. The package can be downloaded from GitHub (devtools::install_git("git://github.com/bbutler01/nwos.git")). The examples below use the ``wi'' dataset supplied with the package. A description of this dataset is included in Butler (In review).

The basic steps are: Install packages and load data Generate replicates (used for variance estimates) Estimate stratum areas Calculate response rates Calculate weights Generate estimates, including variances

These steps are repeated for each combination of state, stratum (e.g., family forest owners), and domain. The basic estimators assume the data supplied are for an individual state. Wrapper functions can be developed for iterating through states, strata, domains, and variables.

Install packages

# devtools::install_git("git://github.com/bbutler01/nwos.git")
library(nwos)
library(tidyverse)

Load data

wi <- tbl_df(read.csv("../data/wi.csv")) 
wi <- wi %>% mutate(ROW_NAME = row.names(wi),
                    AC_WOOD = ACRES_FOREST)

Generate Replicates

Replicates are used in the bootstrapping variance estimation. Replicates are generated just once to increase processing efficiency, in order to use the same replicates across all estimates, and to facilitate repeatability of the results. This vignette uses only 100 replicates in order to decrease processing time. An actual application would likely use many more replicates, e.g., $R = 2,500$.

WI_REPLICATES <- nwos_replicates(index = row.names(wi), point.count = wi$POINT_COUNT, R = 100)

Estimate stratum areas

Areas need to be estimated for each stratum of interest in a state. This is done using a dummy variable indicating inclusion in a stratum, e.g., family forest ownerships (FFO). To capture the full variability in the estimates, stratum areas must be estimated separately for each replicate.

For the example here, the stratum of interest is family forest ownerships (FFO) in Wisconsin. The FFO dummy variable is set to 1 if $LAND_USE == 1$ & $OWN_CD == 45$ & $AC_WOOD >= 1$, and 0 otherwise.

wi <- wi %>% mutate(FFO = if_else(LAND_USE == 1 & OWN_CD == 45 & AC_WOOD >= 1, 1, 0)) # Add stratum variable, FFO
WI_FFO_AREA <- nwos_stratum_area(stratum = wi$FFO, point.count = wi$POINT_COUNT, state.area = 33898733)
WI_FFO_AREA_REP <- sapply(WI_REPLICATES, nwos_stratum_area_apply, index = wi$ROW_NAME, stratum = wi$FFO, state.area = 33898733)

Calculate response rates

Response rates need to be calculated for each stratum of interest. Again, response rates need to be calculated for each replicate.

For this example, RESPONSE is defined as 1 if $RESPONSE_PROPENSITY >= 0.5$, and 0 otherwise.

wi <- wi %>% mutate(RESPONSE = if_else(RESPONSE_PROPENSITY >= 0.5, 1, 0)) %>%
  mutate(RESPONSE = if_else(is.na(RESPONSE_PROPENSITY), 0, RESPONSE))
WI_FFO_RR <- nwos_response_rate(stratum = wi$FFO, point.count = wi$POINT_COUNT, response = wi$RESPONSE)
WI_FFO_RR_REP <- sapply(WI_REPLICATES, nwos_response_rate_apply, index = wi$ROW_NAME, stratum = wi$FFO, response = wi$RESPONSE)

Generate weights

Weights are key to the NWOS estimates. For the NWOS, they are a function of the number of sample points owned, acreage owned, and the area of the stratum. These weights are used for all of the estimates, and the estimates calculated above are largely done in order to generate these weights.

For some modeling purposes, extreme values in the weights may cause problems. See the section below on trimmed weights for a potential solution.

wi$WEIGHT <- nwos_weights(stratum = wi$FFO, point.count = wi$POINT_COUNT, 
                          response = wi$RESPONSE, area = wi$AC_WOOD, 
                          stratum.area = WI_FFO_AREA, response.rate = WI_FFO_RR)
WI_FFO_WEIGHTS_REP <- lapply(1:length(WI_REPLICATES), 
                             nwos_weights_apply, 
                             index.rep = WI_REPLICATES, 
                             index = wi$ROW_NAME, 
                             stratum = wi$FFO, 
                             response = wi$RESPONSE, 
                             area = wi$AC_WOOD,
                             stratum.area = WI_FFO_AREA_REP, 
                             response.rate = WI_FFO_RR_REP)

Generate estimates - totals

Totals are estimated as a function of weights and can be calculated in terms of ownerships or acreage. Specific domains of interest (e.g., a subset within a stratum) can be specified.

In addition, totals can be made for continuous variables. This does not make sense for many NWOS variables, but it is useful for generating means (see below).

Variances, and sampling errors, are estimated using the estimates for the replicates.

WI_FFO_OWN_TOTAL <- nwos_total(weight = wi$WEIGHT)
# wi <- wi %>% mutate(OWNER = 1)
WI_FFO_OWN_TOTAL_REP <- sapply(1:length(WI_REPLICATES), 
                               nwos_total_apply, 
                               index.rep = WI_REPLICATES,
                               index = wi$ROW_NAME,
                               weight = WI_FFO_WEIGHTS_REP)
# WI_FFO_OWN_TOTAL
# sqrt(var(WI_FFO_OWN_TOTAL_REP))
WI_FFO_AC_TOTAL <- nwos_total(weight = wi$WEIGHT, area = wi$AC_WOOD)
WI_FFO_OWN_TOTAL_REP <- sapply(1:length(WI_REPLICATES), 
                               nwos_total_apply, 
                               index.rep = WI_REPLICATES,
                               index = wi$ROW_NAME,
                               weight = WI_FFO_WEIGHTS_REP,
                               area = wi$AC_WOOD)
# WI_FFO_AC_TOTAL
# sqrt(var(WI_FFO_AC_TOTAL_REP))

Proportions

Proportions are typically in terms of what proportion of the ownerships (or acreage) have a specific attribute.

WI_FFO_OWN_Y1_PROP <- nwos_proportion(weight = wi$WEIGHT, variable = wi$Y_1)
WI_FFO_OWN_Y1_PROP_REP <- sapply(1:length(WI_REPLICATES), 
                                 nwos_proportion_apply, 
                                 index.rep = WI_REPLICATES,
                                 index = wi$ROW_NAME,
                                 weight = WI_FFO_WEIGHTS_REP,
                                 variable = wi$Y_1)
# WI_FFO_OWN_Y1_PROP
# sqrt(var(WI_FFO_OWN_Y1_PROP_REP))
WI_FFO_AC_Y1_PROP <- nwos_proportion(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$Y_1)
WI_FFO_AC_Y1_PROP_REP <- sapply(1:length(WI_REPLICATES), 
                                nwos_proportion_apply, 
                                index.rep = WI_REPLICATES,
                                index = wi$ROW_NAME,
                                weight = WI_FFO_WEIGHTS_REP,
                                area = wi$AC_WOOD,
                                variable = wi$Y_1)
# WI_FFO_AC_Y1_PROP
# sqrt(var(WI_FFO_AC_Y1_PROP_REP))

Means

Means are in terms of ownerships or acreage, i.e., what is the average value of a given variable in terms of ownerships or acreage.

Average size of forest holdings

WI_FFO_OWN_AC_MEAN <- nwos_mean(weight = wi$WEIGHT, variable = wi$AC_WOOD)
WI_FFO_OWN_AC_MEAN_REP <- sapply(1:length(WI_REPLICATES), 
                                 nwos_mean_apply, 
                                 index.rep = WI_REPLICATES,
                                 index = wi$ROW_NAME,
                                 weight = WI_FFO_WEIGHTS_REP,
                                 variable = wi$AC_WOOD)
# WI_FFO_OWN_AC_MEAN
# sqrt(var(WI_FFO_OWN_AC_MEAN_REP))
WI_FFO_AC_AC_MEAN <- nwos_mean(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$AC_WOOD)
WI_FFO_AC_AC_MEAN_REP <- sapply(1:length(WI_REPLICATES), 
                                 nwos_mean_apply, 
                                 index.rep = WI_REPLICATES,
                                 index = wi$ROW_NAME,
                                 weight = WI_FFO_WEIGHTS_REP,
                                 area = wi$AC_WOOD,
                                 variable = wi$AC_WOOD)
# WI_FFO_AC_AC_MEAN
# sqrt(var(WI_FFO_AC_AC_MEAN_REP))

Average of $Y_3$

WI_FFO_OWN_Y_3_MEAN <- nwos_mean(weight = wi$WEIGHT, variable = wi$Y_3)
WI_FFO_OWN_Y_3_MEAN_REP <- sapply(1:length(WI_REPLICATES), 
                                 nwos_mean_apply, 
                                 index.rep = WI_REPLICATES,
                                 index = wi$ROW_NAME,
                                 weight = WI_FFO_WEIGHTS_REP,
                                 variable = wi$Y_3)
# WI_FFO_OWN_Y_3_MEAN
# sqrt(var(WI_FFO_OWN_Y_3_MEAN_REP))
WI_FFO_AC_Y_3_MEAN <- nwos_mean(weight = wi$WEIGHT, area = wi$AC_WOOD, variable = wi$Y_3)
WI_FFO_AC_Y_3_MEAN_REP <- sapply(1:length(WI_REPLICATES), 
                                 nwos_mean_apply, 
                                 index.rep = WI_REPLICATES,
                                 index = wi$ROW_NAME,
                                 weight = WI_FFO_WEIGHTS_REP,
                                 area = wi$AC_WOOD,
                                 variable = wi$Y_3)
# WI_FFO_AC_Y_3_MEAN
# sqrt(var(WI_FFO_AC_Y_3_MEAN_REP))

Quantiles

Quantiles, including medians, can be calculated in terms of ownerships or acreage. The current algorithm uses an iterative approach and can be computationally slow.

WI_FFO_OWN_AC_QUANTILE <- nwos_quantile(weight = wi$WEIGHT, domain = wi$FFO, variable = wi$AC_WOOD)
# WI_FFO_OWN_AC_QUANTILE
WI_FFO_AC_AC_QUANTILE <- nwos_quantile(weight = wi$WEIGHT, area = wi$AC_WOOD, domain = wi$FFO, variable = wi$AC_WOOD)
# WI_FFO_AC_AC_QUANTILE

Trimmed Weights

For some purposes, extreme values in the base weights may cause problems (e.g., convergence in models). For these circumstances, trimmed weights may rectify the problem. The only method currently implements in the nwos package for trimming weights is the IQR method (i.e., values greater than or less than $1.5 \times IQR$ are set to $Q_4 + (1.5 \times IQR)$ and $Q_2 - (1.5 \times IQR)$, respectively).

wi$WEIGHT_TRIMMED <- nwos_weights_trimmed(weights = wi$WEIGHT)
# summary(wi$WEIGHT[wi$WEIGHT > 0])
# summary(wi$WEIGHT_TRIMMED[wi$WEIGHT_TRIMMED > 0])

References

Butler, B.J. In review. Weighting for the US Forest Service, National Woodland Owner Survey. U.S. Department of Agriculture, Forest Service, Northern Research Station. Newtown Square, PA.



bbutler01/nwos documentation built on Aug. 30, 2019, 12:57 p.m.