NOTE: This manual is the latest version of the original NWOS-DB users' guide, available at https://www.fia.fs.fed.us/nwos/.
This guide is intended to serve as the official guide to the National Woodland Owner Survey database (NWOS-DB). It serves two broad purposes. The first is to provide guidance on the access, use, and interpretation of data by those with access privileges. At present, this group is limited to some USDA employees and a small number of research partners with appropriate data-protection agreements in place. The second purpose is to provide transparency and reference for those interested in the operational and statistical methodology underlying the NWOS (Butler and Caputo, in review; Butler et al., in press).
This guide has four sections. This section provides a general introduction to the NWOS and NWOS-DB. It provides a brief summary of the place of the NWOS within the larger FIA program. It also provides an overview of the design philosophy behind NWOS-DB as well as the broader system of data and tools within which it functions. Section 2 provides a brief overview of the NWOS sample and the process of survey implementation. In a fundamental sense, these define the types of data that are stored in the database. Section 3 describes the tables that constitute the database, their functions, and the conceptual logic that links them together. The final section provides guidance on accessing NWOS data using the nwos R package (Butler and Caputo 2019). It is hoped that this section provides sufficient guidance for a user to access NWOS data in a standardized and convenient format, given basic familiarity with R usage and syntax.
The names of tables and table fields (i.e., columns or attributes) are by convention capitalized in this report (e.g., PLOT_OWNER or OWNCD). In most cases, the table to which a given field belongs is clear from context (e.g., "In the SAMPLE table, the field NWOS_STUDY is used for…"). Where the association needs to be made explicit, the report uses a naming convention consisting of the table name followed by the field name, separated by a period (e.g., PLOT_OWNER.OWNCD).
The NWOS is part of the USDA Forest Service, Forest Inventory and Analysis (FIA) program. It is one of the three primary components of the program, along with the Timber Products Output (TPO) survey and the core FIA plot-based forest inventory. The purpose of TPO is to provide estimates of wood products production; the survey is administered to primary wood products manufacturers and is designed to elicit the type and quantity of wood products produced as well as other attributes (Coulston et al. 2018). The plot-based inventory (sometimes referred to generically as "the" FIA sample) at its most basic conception consists of a national network of ground plots that are used to estimate "the extent, condition, volume, growth, and depletions of timber" (Burrill et al. 2018) across the entire United States, including affiliated territories (e.g., the Pacific Islands). The NWOS was developed to complement these other two components. Its intended purpose is to better understand the self-reported motivations, activities, intentions, and demographics of private U.S. forest landowners. The primary instruments of the NWOS are landowner questionnaires.
The main module^[A survey ‘module’ is also referred to more generally as a ‘survey’, or as a ‘study’ within portions of the NWOS database and associated literature.] of the NWOS (called the 'base' or 'rural' NWOS) has historically been tied to the network of FIA ground plots (excluding U.S. territories). The NWOS population of interest is private forest landowners and the survey sample frame is derived from the subset of FIA plots that are both forested and privately owned, as determined by the plot-based inventory. Contact information associated with these FIA points is used to develop the NWOS mailing list. This approach has a number of advantages, including the adoption of a rigorous, pre-existing sample and the ability to link questionnaire responses back to measures of physical attributes of the land. When necessary to achieve adequate sample sizes, this base sample has been intensified using a complementary methodology (a more detailed description of the NWOS sampling methodology can be found in Section 2 or Westfall et al., in review).
In recent years, FIA, including the NWOS, has been expanding its purview beyond rural forests. A network of Urban FIA ground plots (similar to the standard network of ‘rural’ ground plots) has been established in multiple cities to assess urban tree cover and forest resources. Subsequently, an urban version of the NWOS (the Urban National Landowner Survey) was piloted in multiple cities before being fully implemented in six cities in 2019 and 2020. This module is aimed at understanding the values, activities, and perspectives of the private owners of urban green space.
The Urban National Landowners Survey is only the first of several new modules of the NWOS that are in various stages of development. In 2019, a pilot version of the NWOS was administered to large corporate ownerships, defined as ownerships with more than 45,000 acres (Caputo et al. 2017). This marked the first time that a separate, custom survey instrument was created for a specific subgroup of the NWOS sampling frame. Previously, a single, generalized instrument was administered to family, corporate, non-profit, and other private ownerships alike. Going forward, there is interest in creating customized survey instruments for all such groups. Surveys are currently being discussed or developed for island territories and tribal ownerships. In the future, similar efforts may be undertaken for other private forest ownerships, nonforest (i.e., "all-lands") ownerships, or even public ownerships. NWOS-DB is being developed as a common repository for all surveys falling under the NWOS banner.
In this report, the word "database" refers to a group of interlinked tables with a common theme and function within FIA's production space, an Oracle instance (Oracle 12c.2) hosted on Forest Service servers. The NWOS Database (NWOS-DB) consists of a group of tables belonging to a single schema (or user), FS_NWOS, within the FIA production space. Oracle is a relational database system, accessible through SQL (Structured Query Language).
When designing any database structure, a primary consideration is the degree of normalization that will be pursued. Normalization is a foundational database concept that refers very generally to how disaggregated (or decomposed) data are among multiple tables. In a highly normalized database, one would generally find a greater number of (relatively simpler) tables with very specific logic relating them. This allows for reduced redundancy, reduced storage requirements, better performance, and often greater flexibility in the types of data that can be accommodated. There is a tradeoff, however, between normalization and the overall accessibility of the underlying data. Highly normalized databases often require complex relations among tables, and users may find it very difficult to write terse, intuitive queries that correctly return the data they require. Views are one way to resolve this tradeoff. Views are essentially pre-established queries that run on the fly and that can be referenced like tables (without any redundancy in data storage). Views are often written by the database managers or creators and can be seen as official, 'approved' queries with the correct relations properly accounted for. Users can query views directly, often using simple, intuitive queries, with some assurance that their queries are 'correct,' while the database still benefits from all the advantages of a more normalized underlying table structure.
The NWOS database was deliberately designed to have only a moderate degree of normalization. The main rationale behind this choice was the desire to create a simpler, more accessible structure for querying. Having only a few tables with simple and intuitive relations makes it easy to write queries that correctly return the data the user requires. Second, NWOS-DB is quite small and streamlined by industry standards and will likely remain so for the foreseeable future. The performance and storage benefits of further normalization would not meaningfully outweigh the concomitant increase in complexity and loss of accessibility. For similar reasons, NWOS-DB does not contain any built-in views. This is partly because of the simple underlying table structure, but largely because the database was designed to be accessed primarily not through an SQL client but through an R package containing a few simple, standardized functions. These functions provide many of the same benefits as database views. They give users a simple, intuitive mechanism to download the data they want, with joins made correctly and automatically in the background (more on the package below).
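To illustrate how a function can play the role of a view, the minimal sketch below wraps a join in a single helper so that callers never have to remember which key links two tables. It is not the nwos package's implementation: the table contents, the STATECD and RESPONSE_MODE fields, and the helper name `responses_with_sample()` are hypothetical, and the link from RESPONSE.SAMPLE_CN to SAMPLE.CN is assumed for illustration.

```r
# Hypothetical miniature tables (contents and most field names invented).
SAMPLE <- data.frame(CN = c("SAM1", "SAM2", "SAM3"),
                     STATECD = c(33, 33, 50))
RESPONSE <- data.frame(CN = c("RES1", "RES2"),
                       SAMPLE_CN = c("SAM1", "SAM3"),
                       RESPONSE_MODE = c("mail", "web"))

# View-like helper (hypothetical): the join logic lives in one place.
# Assumes RESPONSE.SAMPLE_CN is a foreign key to SAMPLE.CN.
responses_with_sample <- function(response, sample) {
  merge(response, sample, by.x = "SAMPLE_CN", by.y = "CN")
}

joined <- responses_with_sample(RESPONSE, SAMPLE)
joined  # one row per response, with the SAMPLE fields attached
```

Because the key relationship is encoded once, every caller gets the same, correct join, which is the property that makes views (or standardized functions) attractive.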
NWOS-DB is to be seen as a work-in-progress and must remain responsive and flexible as the needs of the program evolve and as additional surveys and survey instruments are developed. As the total amount and diversity of data increase, it is likely that additional normalization and the inclusion of views will be adopted. These will improve performance and flexibility, while remaining compatible with a model in which an R package is used as the primary frontend and point of access. The tradeoff between normalization and simplicity will be continually assessed over time.
The raw NWOS data are not available publicly due to confidentiality concerns and the presence of personally identifiable information (PII). Access requires a Forest Service Oracle username with basic connection privileges. This username must also be granted an appropriate role within NWOS-DB. There are two such roles, Admin and Analyst. The Admin role has full read and write privileges to all tables. This role is primarily intended for those directly involved in the uploading, cleaning, and management of NWOS data. The Analyst role, on the other hand, has only read (i.e., select) privileges and is intended for researchers and users of NWOS data. The Analyst role is sufficient for accessing raw data using the R package (see below and Section 4 for step-by-step directions).
As mentioned above, the database can be accessed directly through an SQL client (such as PL/SQL Developer or SQL*Plus). Users with either the Admin or Analyst role will be able to run select queries on one or more tables. Users with the Admin role will also be able to update, insert, or delete records through the client. There are certain limitations to accessing the data this way. First, because the database has no views, it is possible to relate tables incorrectly, producing datasets that do not match the user's purpose. Second, only "raw" data are stored in the database itself. Weights, imputation sets, and other data sources used for making population-level estimates and analyses are stored outside the database. These data sources, along with tools for accessing, analyzing, and managing NWOS data, make up a larger NWOS "ecosystem" (see Figure 1).
Within this ecosystem, the preferred method for accessing the database (particularly for users with the Analyst role) is through the nwos R package. This package, freely available through GitHub (https://github.com/familyforestresearchcenter/nwos), allows an authenticated user to access the raw plot-level, sample-level, and questionnaire-level NWOS data. These are downloaded to a user's computer in the form of a custom R object (of type "nwos.object"), which also contains sample weights, imputation sets, and metadata. Additional functions allow users to generate custom "wide" or "long" datasets, generate metadata tables, run population-level estimates, or complete datasets (i.e., append weights and imputed values) for statistical analyses. There are also functions for exporting datasets, accessing the complete and up-to-date database catalog, and other purposes. For users with the Admin role, the package also contains functions for inserting, updating, and deleting data in the database. Section 4 provides additional information on using the nwos package to access the database, including step-by-step guidance on many of the key functions.
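The nwos package's own wide/long functions are not reproduced here. As a generic illustration of the two layouts, the sketch below uses base R's `reshape()` to turn a hypothetical long table (one row per respondent per question item) into a wide one (one row per respondent). The item names, values, and column names are invented.

```r
# Hypothetical long-format responses: one row per respondent per item.
long <- data.frame(
  RESPONSE_CN = c("RES1", "RES1", "RES2", "RES2"),
  ITEM        = c("ITEM_A", "ITEM_B", "ITEM_A", "ITEM_B"),
  VALUE       = c(2, 40, 1, 15)
)

# Wide format: one row per respondent, one column per item.
# reshape() names the new columns VALUE.ITEM_A and VALUE.ITEM_B.
wide <- reshape(long, idvar = "RESPONSE_CN", timevar = "ITEM",
                direction = "wide")
wide
```

Long layouts are convenient for storing many items uniformly; wide layouts are usually what statistical models expect.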
The nwos package accesses a number of data tables other than those stored in the database itself. These are stored on the Forest Service network^[Many of these are likely to be transferred to new tables in a future version of NWOS-DB. The goal is for all tabular data to be stored within the database itself.], accessible to employees and credentialed cooperators, along with a large library of additional scripts used in the implementation of the NWOS (e.g., scripts for drawing the sample, cleaning addresses, generating mailing lists, and logging returns). This library can also be found in a repository on the Forest Service's internal^[This being a private repository, the associated hyperlink is only accessible to those on a Forest Service network and possessing a Forest Service GitHub account.] GitHub (https://code.fs.usda.gov/forest-service/FS_NWOS). This repository also contains the scripts used to define, create, and update the database in the Oracle production space, as well as an up-to-date version of this guide.
NWOS-DB can be directly joined to tables in other FIA databases (NIMS, UNIMS, etc.) from within the Oracle production space. These are an important part of the NWOS-DB ecosystem, allowing rich analysis that includes both social (e.g., ownership attributes) and biophysical (e.g., tree inventory) data. See Section 3 for more about these linkages.
knitr::include_graphics("NWOS_DBDIAG_ECOSYSTEM.svg")
Figure 1. Relationships among elements in the NWOS-DB "ecosystem." Data are accessed directly through a database client (e.g., PL/SQL Developer, SQL*Plus) or by using the nwos R package. The database is updated in turn through the use of the same package, through the client, or through implementation scripts stored on the Forest Service network. A repository on the internal Forest Service GitHub contains a copy of these implementation scripts as well as the scripts and tools defining the database. Linkages to NIMS, UNIMS, and other FIA databases can be made directly from within the Oracle production space. Solid lines refer to transfer of data. Dashed lines refer to transfer of scripts/code.
The primary sampling unit of the base NWOS is a zero-dimensional sample point (Westfall et al., in review). These points are derived from two different sources: FIA field plots and augmented/intensified sampling points. The FIA field plots, often called Phase 2 or P2 plots from the three-phase sample design adopted by FIA (Bechtold and Patterson 2005), are distributed nationwide using a hybrid of random and systematic sampling approaches. First, a hexagonal sample frame is established across the U.S. Within each cell of this grid, a single sample point is located randomly. At each point, a measurement plot is established. Each measurement plot contains one or more condition classes, defined by a discrete combination of land use, forest type (if forested), ownership group, and other condition class variables (see Bechtold and Patterson 2005 for a complete description of the FIA sampling design). To determine the land use of a given condition class, FIA has established the following definition of 'forest': "Land that has at least 10 percent canopy cover by live tally trees of any size or has had at least 10 percent canopy cover of live tally species in the past, based on the presence of stumps, snags, or other evidence. To qualify, the area must be at least 1.0 acre in size and 120.0 feet wide" (USDA Forest Service 2016).
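The quoted forest definition combines a canopy-cover criterion (current or past) with minimum area and width. The sketch below encodes only those criteria as stated in the quotation; the function name and argument names are hypothetical, and the full FIA definition and field protocols contain further details not modeled here.

```r
# Simplified check of the quoted FIA 'forest' criteria (hypothetical helper).
# canopy_now:  percent canopy cover by live tally trees today
# canopy_past: percent canopy cover of live tally species in the past,
#              as evidenced by stumps, snags, etc.
# area_acres / width_ft: size of the area in acres and its width in feet
meets_forest_definition <- function(canopy_now, canopy_past,
                                    area_acres, width_ft) {
  has_cover  <- canopy_now >= 10 | canopy_past >= 10
  big_enough <- area_acres >= 1 & width_ft >= 120
  has_cover & big_enough
}

meets_forest_definition(canopy_now  = c(25, 5, 0, 40),
                        canopy_past = c(25, 5, 30, 40),
                        area_acres  = c(3, 3, 2, 0.5),
                        width_ft    = c(200, 200, 150, 200))
# returns TRUE FALSE TRUE FALSE
```

The third case qualifies through past cover alone; the fourth fails only on minimum area.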
The nucleus of the NWOS sample is derived from the measurement (i.e., P2) plots. These are stratified by the land use and ownership group of the first condition, which is conventionally located at the center of the measurement plot. The main stratum of interest consists of those plots which are determined to be forested and privately owned. Since only the attributes of the plot center are relevant to the NWOS, each sample plot is in effect no more than a sample point. For the sake of consistency, then, the term sample point will be used in the remainder of this report. Similar sampling designs have been adopted for Urban FIA (and consequently, the Urban National Landowners Survey), albeit with different sampling intensities (i.e., different grid sizes) and condition class definitions.
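Selecting the main stratum amounts to keeping the first (plot-center) condition of each plot and filtering it on land use and ownership group. The minimal sketch below assumes FIADB-style codes (COND_STATUS_CD = 1 for accessible forest land, OWNGRPCD = 40 for private ownership); the plot extract itself is hypothetical.

```r
# Hypothetical plot/condition extract using FIADB-style field names.
plots <- data.frame(
  PLOT_ID        = c("P1", "P1", "P2", "P3", "P4"),
  CONDID         = c(1, 2, 1, 1, 1),
  COND_STATUS_CD = c(1, 1, 1, 2, 1),   # 1 = forest, 2 = nonforest (assumed)
  OWNGRPCD       = c(40, 10, 30, 40, 40) # 40 = private (assumed)
)

# Only the plot-center condition (CONDID == 1) matters, so each plot
# reduces to a single sample point.
centers <- plots[plots$CONDID == 1, ]

# Main stratum: forested and privately owned at plot center.
stratum <- centers[centers$COND_STATUS_CD == 1 & centers$OWNGRPCD == 40, ]
as.character(stratum$PLOT_ID)
# returns "P1" "P4"
```

Plot P1's second condition (public ownership) is ignored because it is not at the plot center.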
In 1999, FIA switched its sampling methodology from a periodic one to an annual one, in which a subset of plots (i.e., a panel) is measured each year such that any individual year or group of years within a cycle constitutes an independent sample. Starting in 2019, an annual sampling methodology was established for the NWOS, using P2 data from two years prior as the core of the sample. For periodic surveys (2018 and earlier), the NWOS sample has been organized around a survey "cycle", a period of five or fewer years that constitutes an independent sample. At least initially, annual NWOS surveys are also being organized around a 5-year survey cycle for operational reasons, even though each year or group of years also constitutes an independent sample.
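The timing rules above reduce to simple year arithmetic: the core sample for a survey year uses P2 data from two years earlier, and each year sits at some position within a 5-year cycle. The sketch below assumes, purely for illustration, that the first annual cycle begins in 2019; the function name `nwos_timing()` is hypothetical.

```r
# Hypothetical helper; the 2019 cycle start is an assumption.
nwos_timing <- function(survey_year, cycle_start = 2019, cycle_length = 5) {
  offset <- survey_year - cycle_start
  data.frame(
    survey_year   = survey_year,
    p2_year       = survey_year - 2,             # P2 data from two years prior
    cycle         = offset %/% cycle_length + 1, # which 5-year cycle
    year_in_cycle = offset %% cycle_length + 1   # position within that cycle
  )
}

nwos_timing(2019:2024)
```

Under these assumptions, 2023 is the fifth year of the first cycle and 2024 starts the second, while each year remains an independent sample.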
In addition to those points derived from FIA field plots, the NWOS sample also contains augmented and/or intensified points. These refer to additional sets of sample points, which are generated exclusively for the NWOS (i.e., they are not associated with any ground plots) to increase the sample size. Augmented points are generated simply to increase the base sample to an adequate size; intensified sample points are generated for additional purposes, including implementation of custom versions of the base instrument. The terms intensified points and intensification will be used hereafter to refer to both. The process of intensifying the sample frame is similar to the process for locating the original FIA sample points. A hexagonal grid with smaller cell size is overlaid on the initial FIA grid, and one point is located randomly in each empty (i.e., not containing a P2 point) cell. These additional points are iteratively added to the sample until the desired intensity is achieved.
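The grid-based logic of the base sample and its intensification can be sketched as follows. This is a simplification: square cells stand in for the hexagonal cells FIA actually uses, the study area is an arbitrary square, and empty fine cells receive a point in random order until a target sample size (a hypothetical parameter) is reached. All function names are illustrative.

```r
set.seed(42)

# Index of the square cell (side `cell`) containing each point.
cell_id <- function(x, y, cell) paste(floor(x / cell), floor(y / cell))

# Base sample: one uniformly random point in every cell of
# the square study area [0, extent) x [0, extent).
one_point_per_cell <- function(extent, cell) {
  n <- extent / cell
  cells <- expand.grid(i = 0:(n - 1), j = 0:(n - 1))
  data.frame(x = (cells$i + runif(nrow(cells))) * cell,
             y = (cells$j + runif(nrow(cells))) * cell)
}

# Intensification: overlay a finer grid, find cells with no existing point,
# and add one random point per empty cell (in random order) until the
# sample reaches `target` points or empty cells run out.
intensify <- function(points, extent, cell, target) {
  n <- extent / cell
  cells <- expand.grid(i = 0:(n - 1), j = 0:(n - 1))
  occupied <- cell_id(points$x, points$y, cell)
  empty <- cells[!(paste(cells$i, cells$j) %in% occupied), ]
  empty <- empty[sample.int(nrow(empty)), ]
  n_add <- min(max(target - nrow(points), 0), nrow(empty))
  add <- empty[seq_len(n_add), ]
  rbind(points,
        data.frame(x = (add$i + runif(n_add)) * cell,
                   y = (add$j + runif(n_add)) * cell))
}

base <- one_point_per_cell(extent = 100, cell = 10)  # 100 base points
full <- intensify(base, extent = 100, cell = 5, target = 250)
nrow(full)
# returns 250
```

Because added points only go into empty fine cells, no fine cell ever holds more than one point, which preserves the spatial balance of the original design.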
The secondary sampling unit for the NWOS is the ownership, a group of one or more owners that jointly own land. Ownerships may be associated with more than one sample point, but will receive only one survey for a given survey spatial unit (i.e., city for the Urban National Landowners Survey or State for the base NWOS). Therefore, the sample contains fewer unique ownerships than sample points. Points derived from P2 plots are already associated with owner names and contact information; in the case of intensified points, these are obtained from commercial data vendors. Ownerships may be part of the sample for multiple cities/States. Each ownership sampled^[The term sample has more than one meaning in this report. It refers in a strict sense to the entire set of sample points and ownerships (regardless of land use and ownership). It also refers in a more general sense to the substrata of interest, those landowners who actually received questionnaires (e.g., for base NWOS, private owners of forested land). The name of the SAMPLE table (see Section 3) is predicated on this second meaning.] is mailed two copies of the questionnaire according to the Tailored Design Method (Dillman et al. 2014), with an option to respond to a web-based version. A given ownership may receive only a single copy of the questionnaire if it responds quickly enough (and is consequently removed from the second mailing), or more than two if it requests an additional copy or opts to submit the electronic version later in the process. A subset of non-respondents is also contacted for a phone interview, primarily for post-hoc non-response assessment. More information on the implementation of the NWOS, including information on weighting and estimation procedures, can be found in Butler and Caputo (in review) and Butler et al. (in press).
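The rule that an ownership receives one survey per survey spatial unit is a de-duplication on the (ownership, unit) pair. The sketch below uses hypothetical ownership identifiers and States; it is only meant to show why the mailing list is shorter than the list of sample points.

```r
# Hypothetical sample points linked to ownerships (IDs invented).
sample_points <- data.frame(
  POINT_ID     = 1:6,
  OWNERSHIP_ID = c("A", "A", "B", "C", "C", "C"),
  STATE        = c("VT", "VT", "VT", "NH", "NH", "VT")
)

# One questionnaire per ownership per survey spatial unit (State here).
mailing <- unique(sample_points[, c("OWNERSHIP_ID", "STATE")])
nrow(sample_points)  # 6 sample points
nrow(mailing)        # returns 4: A-VT, B-VT, C-NH, C-VT
```

Ownership C appears in two States, illustrating how a single ownership can be part of the sample for more than one spatial unit.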
This section provides information about the structure of NWOS-DB. The database catalog includes detailed descriptions of all fields in each table and is accessible in an up-to-date format through the nwos R package (see Section 4). The catalog lists each field (also known as a column or attribute) along with a description. For coded fields, it also lists the codes and their meanings.
Each table includes one primary key titled 'CN' (for sequence number). Foreign keys reference the primary key of the table to which they join (e.g., PLOT_OWNER_CN, SAMPLE_CN). Keys are character fields and consist of a 3-character table prefix concatenated with a unique number. For example, a value of CN = 'RES1' is interpreted as the first record in the RESPONSE table. These table-specific prefixes are listed in Table 1. Keys for each table are also described in the catalog.
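Because every key carries its table prefix, a CN can be decoded without any join. The minimal sketch below splits a CN into its prefix and number and looks up the table name. The helper name `parse_cn()` is hypothetical, and the prefix-to-table pairs assume that QST, QMD, and TXT belong to QUEST, QUEST_METADATA, and QUEST_TEXT, respectively.

```r
# Prefix-to-table lookup (pairing of QST/QMD/TXT is an assumption).
cn_prefix <- c(COD = "CODES", CON = "CONTACTS", FLD = "FIELDS",
               MOD = "MODIFICATIONS", NOT = "NOTES", OWN = "OWNER",
               PLT = "PLOT_OWNER", QST = "QUEST", QMD = "QUEST_METADATA",
               TXT = "QUEST_TEXT", RES = "RESPONSE", SAM = "SAMPLE",
               TAB = "TABLES")

# Split CNs into the 3-character prefix and the trailing number.
# Unknown prefixes yield NA for TABLE.
parse_cn <- function(cn) {
  prefix <- substr(cn, 1, 3)
  data.frame(CN     = cn,
             TABLE  = unname(cn_prefix[prefix]),
             NUMBER = as.integer(substring(cn, 4)))
}

parse_cn(c("RES1", "SAM27"))
```

Decoding keys this way is handy when a field (such as a note's record reference) can point at rows in more than one table.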
Table 1. Table-specific primary key prefixes in NWOS-DB. Each value of a table’s primary key consists of the appropriate prefix followed by a unique number.
library(knitr)
kable(data.frame(Table = c("CODES", "CONTACTS", "FIELDS", "MODIFICATIONS", "NOTES", "OWNER", "PLOT_OWNER", "QUEST", "QUEST_METADATA", "QUEST_TEXT", "RESPONSE", "SAMPLE", "TABLES"), Prefix = c("COD", "CON", "FLD", "MOD", "NOT", "OWN", "PLT", "QST", "QMD", "TXT", "RES", "SAM", "TAB")), row.names = FALSE)
There are 13 data tables in the database in two table groups, administrative tables (five tables) and survey tables (eight tables). The latter contain data pertaining to the sampling frame, survey organization, and questionnaires. Administrative tables contain database-level metadata as well as notes and modification records for the entire database. All tables belong to the FS_NWOS schema in the Forest Service production space. Figures 2 and 3 illustrate the relationships among these tables.
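The eight survey tables are CONTACTS, OWNER, PLOT_OWNER, QUEST, QUEST_METADATA, QUEST_TEXT, RESPONSE, and SAMPLE.
The five administrative tables are CODES, FIELDS, MODIFICATIONS, NOTES, and TABLES.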
In addition to these 13 data tables, the database also includes 13 update tables. There is one update table corresponding to each of the data tables; each is named using the convention of the name of the corresponding data table followed by the word '_UPDATE' (e.g., SAMPLE_UPDATE, NOTES_UPDATE). The update tables are identical in structure to the data tables, but do not normally contain any data. They are used to load, drop, and alter records as part of routine data management.
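The actual procedures that move records from an update table into its data table are not described in this section. As a generic illustration of the staging pattern, the sketch below applies an update table as an upsert keyed on CN: staged rows replace data-table rows with the same CN, and staged rows with new CNs are appended. The helper `apply_update()` and the table contents are hypothetical, and deletions are not modeled.

```r
# Generic upsert keyed on CN (hypothetical helper, not the NWOS-DB procedure).
# Assumes update_tbl has exactly the same columns as data_tbl.
apply_update <- function(data_tbl, update_tbl, key = "CN") {
  kept <- data_tbl[!(data_tbl[[key]] %in% update_tbl[[key]]), , drop = FALSE]
  rbind(kept, update_tbl)
}

NOTES <- data.frame(CN = c("NOT1", "NOT2"), NOTE = c("a", "b"))
NOTES_UPDATE <- data.frame(CN = c("NOT2", "NOT3"),
                           NOTE = c("b (revised)", "c"))

apply_update(NOTES, NOTES_UPDATE)  # NOT1 kept, NOT2 replaced, NOT3 added
```

Staging changes in a structurally identical table lets them be checked before they touch the live data.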
knitr::include_graphics("NWOS_DBDIAG_SUR.svg")
Figure 2. Relationships among NWOS-DB survey tables. The figure shows primary keys (underlined), foreign keys (identified with a tilde) and a subset of additional fields.
knitr::include_graphics("NWOS_DBDIAG_ADM.svg")
Figure 3. Relationships among NWOS-DB administrative tables. The figure shows primary keys (underlined), foreign keys (identified with a tilde) and a subset of additional fields. The MODIFICATIONS and NOTES tables can be joined to any other tables (including survey tables) through the RECORD and TABLE_CN fields.
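A join of this kind, where one note table can point at rows in any other table, can be sketched as follows. The sketch assumes (not confirmed by the catalog excerpt here) that NOTES.TABLE_CN references TABLES.CN and that NOTES.RECORD holds the CN of the annotated record; the TABLE_NAME and STATECD fields and all values are hypothetical.

```r
# Hypothetical miniature tables.
TABLES <- data.frame(CN = c("TAB1", "TAB2"),
                     TABLE_NAME = c("SAMPLE", "RESPONSE"))
NOTES <- data.frame(CN       = c("NOT1", "NOT2", "NOT3"),
                    TABLE_CN = c("TAB1", "TAB2", "TAB1"),
                    RECORD   = c("SAM2", "RES1", "SAM1"),
                    NOTE     = c("address corrected", "late return",
                                 "duplicate owner"))
SAMPLE <- data.frame(CN = c("SAM1", "SAM2"), STATECD = c(33, 50))

# 1. Find which TABLES row represents SAMPLE.
sample_tab <- TABLES$CN[TABLES$TABLE_NAME == "SAMPLE"]
# 2. Keep only notes attached to SAMPLE records.
sample_notes <- NOTES[NOTES$TABLE_CN == sample_tab, c("RECORD", "NOTE")]
# 3. Join on RECORD = SAMPLE.CN.
annotated <- merge(SAMPLE, sample_notes, by.x = "CN", by.y = "RECORD")
annotated
```

Filtering on TABLE_CN first matters: without it, a RECORD value could in principle be matched against the wrong table.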
NWOS-DB can be queried using a database client and SQL (Structured Query Language) scripts. For administrative users, this is often a useful and necessary means of access, particularly for special-purpose or infrequent tasks. For the average user, however, a more reliable method is to use the nwos R package (Butler and Caputo 2019). This R package contains a number of standardized functions for downloading, manipulating, and analyzing NWOS data in a clear and consistent manner.
To use the full capacity of the package, the user must have access to a USDA computer that has been mapped to the internal Forest Service network (i.e., the T drive), with read permissions for the NWOS folder space. The computer must be configured with the correct DSN (data source name) for the Forest Service production space (i.e., the Oracle instance). In addition to the base R installation, the user will need the RODBC (Ripley and Lapsley 2019) and devtools (Wickham et al. 2019) packages. Finally, the user must have connection privileges to the Forest Service production space and must have been assigned a role within NWOS-DB. With these in place, the user should be able to connect to the production space from a desktop installation of R (R Core Team 2019). The remainder of this section is written from the perspective of a user with the Analyst role. Note: NWOS data are confidential and must be handled using appropriate protocols, including storing any downloaded data on USDA computers or encrypted external media (e.g., flash drives, CDs). Accessing NWOS data without proper credentials is strictly forbidden.
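For the occasional direct query mentioned above, the RODBC package can run SQL against the production space through the configured DSN. The sketch below is a hedged example, not part of the nwos package: the helper names are hypothetical, `"YOUR_FS_DSN"` is a placeholder for whatever DSN is configured on your machine, and credentials must be supplied by the user. Only read (select) queries are shown, consistent with the Analyst role.

```r
# Build a SELECT statement for a table in the FS_NWOS schema (hypothetical helper).
nwos_select_sql <- function(table, fields = "*") {
  sprintf("SELECT %s FROM FS_NWOS.%s", paste(fields, collapse = ", "), table)
}

# Run a query over ODBC (hypothetical helper). The DSN is a placeholder.
# The connection is always closed, even if the query fails.
query_nwos <- function(sql, dsn = "YOUR_FS_DSN", uid, pwd) {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql)
}

nwos_select_sql("SAMPLE", c("CN", "NWOS_STUDY"))
# returns "SELECT CN, NWOS_STUDY FROM FS_NWOS.SAMPLE"

# Usage (requires credentials and network access):
# samples <- query_nwos(nwos_select_sql("SAMPLE"), uid = "...", pwd = "...")
```

Note that any joins in such queries must be written by hand, which is exactly the risk the nwos package is designed to remove.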
The first step is to install the ‘nwos’ package. This is available as a public repository on GitHub. Start R and run:
devtools::install_github('familyforestresearchcenter/nwos', build_vignettes = TRUE)
This should install the package, including a number of vignettes (a long-form guide to using an R package). These can be viewed using the command:
browseVignettes(package='nwos')
One of these, ‘get-NWOS-data’, demonstrates some of the essential functions for accessing and manipulating NWOS data.
Bechtold, W. A.; Patterson, P. L. 2005. The enhanced Forest Inventory and Analysis program--national sampling design and estimation procedures. Gen. Tech. Rep. SRS-80. Asheville, NC: U.S. Department of Agriculture, Forest Service, Southern Research Station. 85 p. https://doi.org/10.2737/SRS-GTR-80.
Burrill, E. A.; Wilson, A. M.; Turner, J. A.; Pugh, S. A.; Menlove, J.; Christiansen, G.; Conkling, B. L.; David, W. 2018. The Forest Inventory and Analysis Database: database description and user guide version 8.0 for Phase 2. Washington, DC: U.S. Department of Agriculture, Forest Service. 946 p. http://www.fia.fs.fed.us/library/database-documentation/
Butler, B. J.; Caputo, J. 2019. nwos: An R Package for Working with USDA Forest Service, National Woodland Owner Survey Data (version 1.0). https://github.com/familyforestresearchcenter/nwos
Butler, B. J.; Butler, S. M.; Caputo, J.; Dias, J.; Robillard, A.; Sass, E. In press. Family Forest Ownerships of the United States, 2018: Results from the USDA Forest Service, National Woodland Owner Survey. Madison, WI: USDA Forest Service, Northern Research Station. https://doi.org/10.2737/NRS-GTR-199.
Butler, B. J.; Caputo, J. In review. Weighting for the U.S. Forest Service, National Woodland Owner Survey. USDA Forest Service, Northern Research Station.
Butler, B. J.; Hewes, J. H.; Dickinson, B. J.; Andrejczyk, K.; Butler, S. M.; Markowski-Lindsay, M. 2016. USDA Forest Service National Woodland Owner Survey: National, regional, and state statistics for family forest and woodland ownerships with 10+ acres, 2011-2013. Res. Bull. NRS-99. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station. 39 p. https://doi.org/10.2737/NRS-RB-99.
Caputo, J.; Butler, B. J.; Hartsell, A. J. 2017. How Large Is Large? Identifying Large Corporate Ownerships in FIA Datasets. Newtown Square, PA: U.S. Department of Agriculture, Forest Service, Northern Research Station.
Coulston, J. W.; Westfall, J. A.; Wear, D. N.; Edgar, C. B.; Prisley, S. P.; Treiman, T. B.; Abt, R. C.; Smith, W. B. 2018. Annual monitoring of US timber production: rationale and design. Forest Science. 64(5): 533-543.
Dillman, D. A.; Smyth, J. D.; Christian, L. M. 2014. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. 4th ed. Hoboken, NJ: Wiley & Sons.
R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ripley, B.; Lapsley, M. 2019. RODBC: ODBC Database Access. R package version 1.3-16. Available at web address: https://CRAN.R-project.org/package=RODBC
USDA Forest Service. 2016. Forest Inventory and Analysis glossary. Washington, DC: U.S. Department of Agriculture, Forest Service. www.nrs.fs.fed.us/fia/data-tools/State-reports/glossary/default.asp
Wickham, H.; Hester, J.; Chang, W. 2019. devtools: Tools to Make Developing R Packages Easier. R package version 2.2.1. Available at web address: https://CRAN.R-project.org/package=devtools