knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4, tibble.print_max = 4)
library(forestexplorR)
library(dplyr)
This vignette describes how mapped forest stand datasets should be formatted for full compatibility with forestexplorR and introduces data checking functions that assist with the formatting process.
The forestexplorR package expects raw mapped forest stand data to consist of at least two distinct files:
Mapping data - coordinate locations of individual trees within stands
Tree census data - measurements of individual trees taken during repeated censuses of the stands
The mapping dataset should be a data frame where each row represents a unique tree and contains its identifying and location information. The built-in dataset mapping is a cleaned and correctly formatted example:
head(mapping)
To ensure full compatibility with forestexplorR functions, the mapping dataset should contain the columns tree_id, stand_id, tag, species, x_coord, and y_coord. Additional columns can be included without disrupting function performance, and column order is not important. For details on what each required column represents, use ?mapping. If only a few forestexplorR functions will be applied, some of these columns may not be required (e.g. the column tag is only required for stand_map()), so please read the individual function documentation before spending time generating any missing columns.
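The column requirements above can be checked before calling any package function. The sketch below uses base R only; missing_cols() is a hypothetical helper (not part of forestexplorR) and the toy data are made up, including the species codes.

```r
# Hypothetical helper, not part of forestexplorR: returns the names of any
# required columns that are absent from a data frame.
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

required_mapping_cols <- c("tree_id", "stand_id", "tag", "species",
                           "x_coord", "y_coord")

# Made-up mapping data: one row per tree, plus an extra "notes" column,
# which the vignette says is allowed.
toy_mapping <- data.frame(
  tree_id  = c("T1", "T2", "T3"),
  stand_id = c("S1", "S1", "S2"),
  tag      = c(101, 102, 201),
  species  = c("ABAM", "TSHE", "ABAM"),
  x_coord  = c(12.5, 40.1, 88.0),
  y_coord  = c(3.2, 77.9, 15.4),
  notes    = c("", "leaning", "")
)

missing_cols(toy_mapping, required_mapping_cols)
#> character(0)

# Each tree_id should appear only once (0 means no duplicates).
anyDuplicated(toy_mapping$tree_id)
#> [1] 0
```

Dropping a column, e.g. missing_cols(toy_mapping[, -5], required_mapping_cols), would report "x_coord".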
The tree census dataset should be a data frame where each row represents the measurement data for a single tree during a specific census of the stand. This means each tree will appear on x rows where x is the number of censuses in which that tree was measured. The required columns are tree_id, stand_id, species, year, and dbh. For details on what each required column represents, use ?tree. The built-in dataset tree is a cleaned and correctly formatted example:
head(tree)
Additional columns can also be included in the tree census dataset without disrupting function performance (e.g. tag in tree). A mort column containing the mortality status of each tree during each census is required to use mortality_model(). Column order is not important.
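The one-row-per-tree-per-census layout can be illustrated with a small made-up dataset in base R. The mort values below are placeholders chosen for illustration; the coding that mortality_model() actually expects should be taken from its documentation.

```r
# Made-up census data in the expected long format: tree T1 was measured
# in three censuses and tree T2 in two, so they occupy 3 and 2 rows.
toy_tree <- data.frame(
  tree_id  = c("T1", "T1", "T1", "T2", "T2"),
  stand_id = "S1",
  species  = c("ABAM", "ABAM", "ABAM", "TSHE", "TSHE"),
  year     = c(2000, 2005, 2010, 2000, 2005),
  dbh      = c(10.2, 11.0, 11.9, 25.3, 25.8),
  # Placeholder mortality coding; check ?mortality_model for the values
  # the package actually expects.
  mort     = c("alive", "alive", "alive", "alive", "dead")
)

# Number of rows (censuses) per tree.
table(toy_tree$tree_id)

# A tree should not be measured twice in the same census (0 = no repeats).
anyDuplicated(toy_tree[, c("tree_id", "year")])
```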
To obtain accurate and useful neighborhood descriptions, it is important that the mapping and tree census datasets are cleaned of any unusual or missing data. The forestexplorR package contains functions to assist with the cleaning process, but these functions only highlight cases of missing data rather than automatically removing the associated observations. This is because not all cases of missing data prevent all types of analysis, and some missing data can be inferred from data collection records (e.g. a missing year of measurement in tree census data).
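The "flag, don't delete" approach can be sketched in base R as follows. This is an illustration of the idea on made-up records, not the package's own code; the issue column name and the filled-in year are assumptions.

```r
# Made-up census records: one is missing its year, another its dbh.
census_rows <- data.frame(
  tree_id = c("T1", "T1", "T2"),
  year    = c(2000, NA, 2000),
  dbh     = c(10.2, 11.0, NA)
)

# Describe each problem in a new column instead of dropping the row.
census_rows$issue <- ifelse(is.na(census_rows$year), "missing year",
                     ifelse(is.na(census_rows$dbh), "missing dbh",
                            NA_character_))
census_rows

# Suppose field records show the second row was measured in 2005:
# the missing year can then be filled in place, keeping the record.
census_rows$year[2] <- 2005
```

All three rows survive, so the record with a missing dbh can still be used in analyses that only need tree locations or years.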
mapping_check(): Check mapping datasets

The built-in dataset messy_mapping contains examples of common data errors in mapping datasets. For instance, there are 10 tree ids that are connected to more than one mapping record:
messy_mapping %>%
  group_by(tree_id) %>%
  summarize(count = n()) %>%
  filter(count > 1)
The mapping_check() function checks a mapping dataset for a variety of common errors and returns a list containing two elements. The first element ($problem_trees) is a data frame containing the rows of the input mapping dataset that have issues, with an additional final column describing the issue. The arguments max_x and max_y must be provided so the function can check for x and y coordinates beyond the stand boundary.
map_issues <- mapping_check(messy_mapping, max_x = 100, max_y = 100)
head(map_issues$problem_trees)
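To show what a boundary test based on max_x and max_y might involve, here is a base R sketch. It assumes stand coordinates run from 0 to max_x and 0 to max_y; out_of_bounds() is hypothetical and is not mapping_check()'s actual implementation.

```r
# Hypothetical boundary check, assuming stand coordinates run from 0 to
# max_x / max_y; this is not mapping_check()'s actual source code.
out_of_bounds <- function(map_data, max_x, max_y) {
  with(map_data,
       x_coord < 0 | x_coord > max_x | y_coord < 0 | y_coord > max_y)
}

toy_map <- data.frame(
  tree_id = c("T1", "T2", "T3", "T4"),
  x_coord = c(10, 105, 50, NA),
  y_coord = c(20, 30, -2, 60)
)

# which() drops NA results, so trees with missing coordinates are not
# reported here; they need a separate is.na() check.
toy_map[which(out_of_bounds(toy_map, max_x = 100, max_y = 100)), ]
```

Here T2 (x beyond 100) and T3 (negative y) are returned, while T4 yields NA because its x coordinate is missing.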
The second element ($issue_summary) is a data frame summarizing the number and percentage of trees in the mapping dataset that have one or more problems and each specific type of problem.
head(map_issues$issue_summary)
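The counts and percentages in such a summary can be derived from a long table of flags. The base R sketch below uses made-up flags and issue labels; the exact columns and wording of forestexplorR's $issue_summary may differ.

```r
# Made-up flags in long form: one row per (tree, issue) pair.
flags <- data.frame(
  tree_id = c("T2", "T3", "T3"),
  issue   = c("x out of bounds", "y out of bounds", "duplicate tree_id")
)
all_tree_ids <- c("T1", "T2", "T3", "T4")

# Distinct trees per issue type, plus trees with at least one issue.
per_issue <- tapply(flags$tree_id, flags$issue,
                    function(ids) length(unique(ids)))
summary_tbl <- data.frame(
  issue = c("any issue", names(per_issue)),
  n     = c(length(unique(flags$tree_id)), as.vector(per_issue))
)

# Percentages are relative to all trees in the dataset.
summary_tbl$percent <- 100 * summary_tbl$n / length(all_tree_ids)
summary_tbl
```

Counting distinct tree ids (rather than rows) keeps a tree with two problems, like T3, from being counted twice in the "any issue" row.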
tree_check(): Check tree census datasets

The built-in dataset messy_tree contains examples of common data errors in tree census datasets. For instance, there are 10 tree measurement records that have no dbh information:
messy_tree %>% filter(is.na(dbh))
The tree_check() function checks a tree census dataset for a variety of common errors. A mapping dataset must be supplied to the function so that trees in the tree census data with no associated mapping record can be identified. tree_check() returns a list containing two elements. The first element ($problem_trees) is a data frame listing the tree ids flagged as having a data issue, along with a description of each issue.
tree_issues <- tree_check(tree_data = messy_tree, map_data = mapping)
head(tree_issues$problem_trees)
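The reason a mapping dataset is needed can be seen in a base R sketch of the unmapped-tree check. The tree ids are made up, and this only illustrates the set comparison involved, not tree_check()'s internal logic.

```r
# Made-up tree ids from a census dataset and a mapping dataset.
census_ids  <- c("T1", "T1", "T2", "T5", "T5")
mapping_ids <- c("T1", "T2", "T3")

# Row-level flag: TRUE for census records whose tree has no mapping record.
no_mapping <- !census_ids %in% mapping_ids

# Distinct unmapped trees (setdiff() also removes duplicates).
setdiff(census_ids, mapping_ids)
#> [1] "T5"
```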
The second element ($issue_summary) is a data frame summarizing the number and percentage of trees in the tree census dataset that have one or more problems and each specific type of problem. Note that a tree id will be flagged if just one of its measurement records contains an issue, so many flagged trees are likely to be usable for most analyses.
head(tree_issues$issue_summary)
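Why a flagged tree may still be largely usable can be shown by rolling record-level flags up to the tree level. The base R sketch below uses made-up records, and the has_issue column is a hypothetical record-level flag.

```r
# Made-up census records for two trees; one of T1's records lacks a dbh.
census_records <- data.frame(
  tree_id = c("T1", "T1", "T1", "T2", "T2"),
  year    = c(2000, 2005, 2010, 2000, 2005),
  dbh     = c(10.2, NA, 11.9, 25.3, 25.8)
)
census_records$has_issue <- is.na(census_records$dbh)

# A tree is flagged if any one of its records has an issue...
tree_flagged <- tapply(census_records$has_issue, census_records$tree_id, any)

# ...but the share of usable records per tree may still be high.
usable_share <- tapply(!census_records$has_issue, census_records$tree_id, mean)
```

Here T1 is flagged, yet two of its three records (about 67%) remain complete.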