knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=8, 
  fig.height=6)

What is ForestData?

ForestData is a package that gathers methods to works with dendrometric censuses datasets.

In particular, ForestData has been designed especially to implement and make available Camille Piponiot's corrections algorithms, but it also includes other functions to compute classical rates or metrics (basal area, stem growth, mortality and recruitment) and display them efficiently.

This vignette details how to use these functions, step by step and using the built-in example dataset. ForestData has three main "families" of functions depending on what is intended to do: correct_,compute_ and display_. Each "family" will be detailed, so if you only are interested by one of these items, you can jump directly to the corresponding section.

Install and load the package

ForestData is not yet published on the CRAN. It can easily be downloaded from EcoFoG's Github repository (on this page) and compiled locally, and even more easily be installed directly into R with remotes install_github() function. You can use the following code to install ForestData.

if(!"ForestData" %in% installed.packages()){

  if(!"remotes" %in% installed.packages()){
    install.packages("remotes")
  }

  remotes::install_github("EcoFoG/ForestData")
} 

library(ForestData)

Use ForestData to correct dendrometric censuses data

The main goal of ForestData is to provide generic and simple corrections adapted for forest censuses. Three functions are implemented in order to correct different error sources in such datasets:

After presenting the format and contents of the dataset and a preparation function that simplifies the correction functions calls, we'll go through the presentation of each function in the recommended order (correct_alive, correct_size, correct_recruits). The details of the corrections are found after the examples to simplify the introduction to the package.

Prepare your dataset

A Forest census: data structure

The type of dataset that Forestdata's function are designed to treat is a long-format time series: each line corresponds to a single tree measurements at one census time. Furthermore, the dataset must respect those conditions:

To illustrate the following explanations, a built-in example dataset will be used. It corresponds to the squares 1 and 3 of the 6th Plot of the Paracou Experiment database (more information here)

The dataset can be loaded using the function utils::data.

Let's take a look to its structure:

data(example_census)
str(example_census)

The information used by ForestData's functions is contained in the following columns of this dataset:

All the other columns are only additional information: Family, Genus and Species are taxonomical information and Forest is a geographical indication (where is located the plot to which the tree belongs). This information is invariant for a given tree among its different measurements.

Use prepare_forestdata to simplify your workflow

ForestData's functions have a lot of arguments, including because there is no constraints on the user side for the dataset's column names, and to allow the user setting the functions entirely to adapt it to every specific dataset...

However, most of these arguments are redundant and it can be exhausting to write long calls, only to specify column names for every step.

Fortunately, there is a function that does the job for you: prepare_forestdata. If you specify everything only once using this function, your argument settings will be kept in memory and called automatically by the functions.

For example, let's specify the column names that ForestData needs to correct our dataset:

# specify the example dataset's column names
prepare_forestdata(example_census,
                   plot_col="Plot",
                   id_col="idTree",
                   time_col="CensusYear", 
                   status_col = "CodeAlive",
                   size_col="Circ",
                   measure_type = "C",
                   POM_col = "POM")

As you can see, the arguments of prepare_forestdata are the dataset you are going to run corrections on, the "type of measurement" (either "C" for circumference or "D" for diameter), and the names of the columns containing the informations needed for corrections to run, i.e.:

The column names are simply called with getOption in the next functions. Let's check that these values have been correctly set:

# checking that the options have been set
getOption("plot_col"); getOption("time_col")

Note that if the function is run twice with similar specifications for one or several names, a message indicates that those ones kept unchanged

# 
prepare_forestdata(example_census,plot_col="Plot",id_col="idTree",time_col="CensusYear", status_col = "CodeAlive",size_col="Circ",measure_type = "C",POM_col = "POM")

Moreover, prepare_forestdata assures that your specifications are coherent: if one column name is erroneous, the function stops with an explicit error message.

Use correct_alive to correct tree life status errors in your dataset

Forest censuses are all about tree life statuses: mortality and recruitment depend directly on these variables, and other metrics such as basal area of growth rates should be only computed for live trees. Thus, avoiding errors in tree life statuses is the prime step to ensure that these metrics can be correctly calculated for your forest plots.

correct_alive takes your data with uncorrected tree life status as an input, and detects unseen trees (with no corresponding line or with NA status for a given census year). The function does it by plot if there are several plots with different censusing resolutions. It also spots "reviving" events (when a tree is declared dead but seen alive later on). The function returns the original dataset with additional lines created for unseen trees (in those cases, the trees are set with NA uncorrectedstatus, circumference, and POM), and a new field named "status_corr".

Here is the full call for correct_alive, with our example dataset:

 #Correct it (full call)
example_status_corr <- correct_alive(example_census,
id_col = "idTree",
time_col = "CensusYear",
status_col = "CodeAlive",
plot_col = "Plot",
byplot = TRUE,
dead_confirmation_censuses = 2,
use_size = FALSE,
invariant_columns = c("Genus",
"Species",
"Family",
"Forest",
"binomial_name"))

As you can see, warning messages are sent every time a tree has only one or several records with dead status. This is generally due to the succession of growth over minimum censusing size and death, between two consecutive census. Still, confirming it manually is always useful to ensure that there is no other anomaly.

The function takes 9 arguments as an input:

This function, as well as the other correction functions, include a progress bar to give you a raw estimate of computation advancement in case you deal with particularly big datasets.

If you used prepare_forestdata to preset some of the arguments, the call can be written more shortly:

#Correct it (short version with column names set with prepare_forestdata)

example_status_corr <- correct_alive(example_census,
invariant_columns = c("Genus",
"Species",
"Family",
"Forest",
"binomial_name"))

Let's now take a look to the corrected dataset's structure:

nrow(example_census) #25533 
str(example_status_corr)

The function added 101 lines corresponding to unseen trees. In this example dataset, it corresponds to only 0.4% of the data, thanks to the field verification efforts of the Paracou team. This proportion might be higher, especially for uneasily accesible plots, and life status errors can cause a severe prejudice to mortality rates computation.

Use correct_size to correct tree size measurements

Note: Tree height correction is over the scope of this package. In this vignette "tree size" thus refers to either diameter or circumference.

Here is the full call for correct_size. Here, we apply this correction to the dataset already corrected for life status. We recommend to follow this order.

example_size_corr <- correct_size(example_status_corr, #full
  size_col = "Circ",
  time_col = "CensusYear",
  status_col = "status_corr", # Because we corrected life statuses beforehand
  species_col = "binomial_name", 
  id_col = "idTree",
  POM_col = "POM",
  measure_type = "C",
  positive_growth_threshold = 5,
  negative_growth_threshold = -2,
  default_POM = 1.3,
  pioneers = c("Cecropia", "Pourouma"), #Works with species or genera
  pioneers_treshold = 7.5,
  ignore_POM = FALSE)

Warnings are displayed for trees having only one (or no) non-NA size measurement.

The function has 14 arguments:

If one used prepare_forestdata and corrected for life status beforehand, one has to specify the column containig species names, the positive, negative and pioneers' thresholds, and the name of the fast growing taxa (genera/species) if not letting it to default

# Specifying different thresholds
example_size_corr <- correct_size(example_status_corr,
  species_col = "binomial_name", 
  positive_growth_threshold = 4, #in cm(diameter)/year
  negative_growth_threshold = -1.5, #in cm(diameter), absolute
  pioneers = c("Albizia"), 
  pioneers_treshold = 7)#in cm(diameter)/year

  # Letting all to default with out example dataset
example_size_corr <- correct_size(example_status_corr, species_col="binomial_name")

Use correct_recruits to spot and correct overgrown recruits

correct_recruits spots overgrown recruits according to the specified minimum diameter limit, and a tolerated growth threshold. It adds lines to the dataset corresponding to overgrown recruits' hypothetical sizes before first record. These lines are tagged with the column "corrected_recruits" (FALSE for regular data, TRUE for lines generated with this function). For those, only the corrected size (size_corr) is specified along with tree ID and invariant columns. The columns originally containing size, POM and life status (raw and corrected) are let to NA.

Corrected size values for overgrown recruits should not be used to compute growth rates, and more generally, we advise to use this correction only for the specific cases that absolutely require it. Before correcting for overgrown recruits, we advise to correct life statuses and sizes beforehand.

Here is the full call of the function:

example_recruits <- correct_recruits(example_size_corr,
  dbh_min = 10,
  positive_growth_threshold = 5,
  time_col = "CensusYear",
  id_col = "idTree",
  plot_col = "Plot",
  size_corr_col = "size_corr", #because we already corrected
  status_corr_col = "status_corr", # both size and status
  measure_type = "C",
  invariant_columns = c("Forest", "Family", "Genus", "Species", "binomial_name"),
  byplot = TRUE,
  correct_status = FALSE
)

The function has 12 arguments:

Short version with prepare_forestdata:

example_recruits <- correct_recruits(example_size_corr,
  dbh_min = 10,
  positive_growth_threshold = 5,
  invariant_columns = c("Forest", "Family", "Genus", "Species", "binomial_name")
)

Use display_corrected_trees to generate the graphs and check the corrections.

Oncoming section

Additional details on the corrections

Size corrections

Tree sizes measurements are essential information to study the dynamics of a forest plot over time, through tree stem growth rates. This implies a need to ensure that size measurements are correct. There is a consensus regarding the need to standardize these metrics using the same Point of Measurement (POM) for each tree whenever possible (breast height, 1.30m). However, some trees develop buttresses or defaults that prevent measuring stem size accurately at breast height, and imply to uprise the POM. Most of the error regarding tree size comes from either uncorrect to no report of POM shifts, or imprecisions using a measuring tape. We categorize size measurement errors according to the direction (negative, positive) and temporality (transient, permanent) of the anomalies.

Thresholds and pioneer species

Except for explicit POM changes, anomalies are detected by correct_size using thresholds defining abnormal increase/decrease in tree size. These thresholds depend on the protocol applied for a given census (e.g., basal measurement error rate and field verification protocols, if any) as well as empirical considerations regarding tree growth.

For the Paracou Disturbance Experiment, an abnormal decrease is considered to be under -2cm (absolute, between-census diameter growth) and an abnormal increase is over 5cm/year (annual diameter growth). These apparently permissive thresholds are due to the 35 years of experience gained from the begining of the experiment, as well as intensive and strict field verifications (apurement protocol by A. Dourdain) carried out in parallel of the censusing campaigns, allowing to verificate directly most of the doubtful measurements.

In reality, some heliophilous species car grow far more rapidly than others, and can cause "false positive" anomaly detections. To avoid uprising the growth threshold and the chance of non-detection for real anomalies, it seems better to discriminate, at least grossly, this group of species: In correct_size, we added two arguments to specify a list of generally fast growing (pioneer) species or genera, and a greater positive growth threshold to apply to those species.

For Paracou, practical experience suggests that 7.5cm/year is a reasonable threshold for these species.

The choice of these thresholds is let to the user (although they default to values used for the Paracou database), and these must be set according to the dataset one wants to correct.

Punctual errors

Punctual errors are transient, abnormal increases or decreases in tree size measurement, that eventually returns to "normal" values. This implies to define a threshold to discriminate "abnormal" increases and decreases, as well as the concept of "return-to-normal".

In ForestData's correct_size function, we considered that an error is punctual if one of the two next census' values present a similar change in size, with opposite direction, that counterbalances the anomaly. These can be caused by an erroneous measurement or estimation of tree size at a given census, more rarely for two consecutive censuses.

Punctual errors are corrected by replacing the abnormal values with more reasonable estimates. That is simply done by linearily interpolating these value between the no-NA measurements surrounding the anomaly: right before the increa/decrease, and at the "return-to-normal" point.

Permanent errors, or shifts

Permanent errors or shifts are definitive uprises or downrises of size measurement values, with no detectable "return-to-normal" point.

In most of the case, these are permanent decreases due to POM change: when POM is uprisen, due to the semi-conic shape of the trunk, the measured size is inferior than if it was measured accurately at breast height. Shifts can also (more rarely) be caused by size estimation rather than measurement (or even to the transition from estimation to actual measurement), for stems showing a highly non-cylindric conformation, and may be positive or negative in such cases.

In correct_size, shifts are corrected using two methods:

Overgrown recruits

Most forest inventories focus on trees with DBH over 10cm, or sometimes 1cm. Every tree (or stem) is, supposedly, tagged with a label at the first inventory and followed accross censuses. Recruited trees are newcomers that pass the minimum censusing diameter limit between two campaigns. Ideally, one expects these tree to measure just a few more than this limit, especially if the temporal resolution of the inventory is low - trees generally grow slowly. Having defined a growth threshold T (in cm/year) and with I, the interval between two census (in years), the size of a tree at recruitment time should not exceed 10+T*I.

Overgrown recruits are trees that violate this assumption, generally because those weren't noticed before, for some reason. These individuals can be problematic when it comes to estimate basal area or biomass, especially if they are adult trees. correct_recruits re-creates the previous growth trajectory of those trees using linear extrapolation.

Being highly hypothetical, corrected size values for overgrown recruits should not be used to compute growth rates, and more generally, we advise to use this correction only for the specific cases that absolutely require it.

Use ForestData to compute and display classical metrics in Forest Ecology

In addition to the corrections the package provides functions to compute and display the classical metrics used in Forest Science: annual or absolute growth rates, annual recruitment and mortality rates, and basal area. In this section, we will go through examples of these features to show the possibilities of these tools.

Compute and display mortality and growth rates, or both

The functions compute_mortality and compute_recruitment compute annual mortality or recruitment rates by plot or globally.

Let's see an example with our built-in dataset:

data(example_status_corr)

## Full calls for the corrected dataset
mortality <- compute_mortality(example_status_corr, 
 status_col="status_corr", #When corrected is set to FALSE, correct_alive is triggered, thus remember to specify the column containing uncorrected tree statuses.
 time_col="CensusYear",
 id_col="idTree",
 dead_confirmation_censuses=2,
 byplot = TRUE,
 plot_col = "Plot",
 corrected = TRUE)

recruitment <- compute_recruitment(example_status_corr,
 status_col="status_corr", #When corrected is set to FALSE, correct_alive is triggered, thus remember to specify the column containing uncorrected tree statuses.
 time_col="CensusYear",
 id_col="idTree",
 dead_confirmation_censuses=2,
 byplot = TRUE,
 plot_col = "Plot",
 corrected = TRUE)

summary(mortality); summary(recruitment)

Both functions return comparable outputs and have the same arguments:

The functions arguments can be fully specified, or the call can also be simplified if using prepare_forestdata. This also works:

mortality <- compute_mortality(data,
                  status_col="status_corr",
                  dead_confirmation_censuses=2)

recruitment <- compute_recruitment(data,
                                   status_col="status_corr",
                                   dead_confirmation_censuses=2)

When the argument corrected = TRUE, just remember to specify the name of the column containing uncorrected tree life status, and do everything else normally, for example:

data(example_census)
mortality <- compute_mortality(data,
                  status_col="CodeAlive",#Raw tree status
                  dead_confirmation_censuses=2,
                  corrected = F)

recruitment <- compute_recruitment(data,
                                   status_col="CodeAlive",#Raw tree status
                                   dead_confirmation_censuses=2,
                                   corrected = F)

The package also includes compute_rates, a wrapper for the previous functions that allows to compute both rates jointly:

data(example_status_corr)
# Full Call
rates <- compute_rates(example_status_corr,
                       status_col="status_corr",
                       time_col="CensusYear",
                       id_col="idTree",
                       dead_confirmation_censuses=2,
                       byplot = TRUE,
                       plot_col = "Plot",
                       corrected = TRUE)
# Simplified Call
rates <- compute_rates(example_status_corr,
                       status_col="status_corr",
                       dead_confirmation_censuses=2)

str(rates)

The use of corrected = FALSE is similar to what we saw above.

data(example_census)
compute_rates(example_census,
              status_col = "CodeAlive",#Raw tree statuses
              dead_confirmation_censuses=2,
              corrected=F)

Display mortality and growth rates

display_mortality(mortality)
display_recruitment(recruitment)
data(example_status_corr)
rates <- compute_rates(example_status_corr)
 display_rates(rates = rates)

Compute and display basal_area with compute_ba and display_ba

Compute and display annual and absolute growth with compute_growth and display_growth

data(example_size_corr)
growth <- compute_growth(example_size_corr,
                         size_col = "size_corr",
                         measure_type = "cir",
                         status_col = "CodeAlive",
                         id_col= "idTree",
                         time_col = "CensusYear",
                         what_output = "annual",
                         aggregate = T,
                         by = c("Plot"),
                         stat = "mean",
                         percentiles = c(5,95))

display_growth(growth)

# growth <- compute_growth(example_census)
# growth %>% filter(annual_growth<0)


EcoFoG/ForestData documentation built on Jan. 20, 2021, 10:04 a.m.