knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

RDendrom

An R package for analysis, diagnostics, and presentation of intra-annual tree growth time series.

RDendrom is a suite of functions designed to handle time series data from 'manual' dendrometer band measurements conducted intensively through the growing season. The data enter in a set 'long' form with such as in the following header file.

devtools::install_github("seanmcm/RDendrom")
library(RDendrom)
data(INPUT_data)
head(INPUT.data)

Here we see the basic structure necessary to use the Rdendrom package. The basic variables deal with sites and tags and bands, etc. What must be included is the 'TREE_ID', 'SITE', 'BAND_NUM', 'YEAR', 'ORG_DBH', 'GAP_WIDTH', and the four 'STATUS' columns, which will be detailed later. The status columns should be set by default at all zeros (they should only accept integer ones or zeros), and can be ignored. However, I developed them to be able to handle some of the common adjustments that need to be made to dendrometer band data, some of which have useful applications within 'Rdendrom' functions. 'UNIQUE_ID' is important for some subfunctions that split the data according to the 'SITE', 'TREE_ID', and 'BAND_NUM', or any other combination that uniquely identifies a specific time-series.

Also note that 'BAND_NUM' is a vector of integers from 1 to the number of bands that have been recorded on that tree (not unique identifiers but just the series of bands). Also, 'GAP_WIDTH' must have no na's (i.e., only measured vaules and no missing values). The following code will ensure these two requirements are met.

band.index <- as.numeric(table(INPUT.data$BAND_NUM))
INPUT.data$BAND_NUM <- unlist(mapply(rep, seq(length(band.index)), length.out = band.index))

INPUT.data <- subset(INPUT.data, complete.cases(INPUT.data$GAP_WIDTH))

Once imported and prepped, the first script that needs to be run is the get.optimized.dendro() function.

get.optimized.dendro(INPUT.data,
  no.neg.growth = TRUE, cutoff = 9,
  par.names = c("L", "K", "doyip", "r", "theta", "a", "b", "r.squared", "ts.sd"),
  OUTPUT.folder = ".",
  param.table.name = sprintf("Param_table.csv"),
  Dendro.data.name = sprintf("Dendro_data.Rdata"),
  Dendro.split.name = sprintf("Dendro_data_split.Rdata"))

This takes in the INPUT.data loaded above first estimates diameter at breast height (DBH) based on the baseline DBH (measured from diameter tape)('DBH_ORG'). 'DBH_ORG' must have a value for the first observation of a 'TREE_ID', but can be empty (or repeat that value) after. It does not matter. The function then builds the 'DBH' column from the 'GAP_WIDTH' measurements.

Note that errors in measurements enter from the 'GAP_WIDTH' column, and not the 'DBH' column. 'DBH' is a transformation of gap width into diameter, so any vetting of the raw data due to problem measurements need to address the 'GAP_WIDTH' value.

The function then corrects the 'DBH' value by estimating the true DBH (appropriately labeled 'DBH_TRUE') from the gap width and dbh values using an optimization algorithm written by KC Kushman at Brown University. This corrects for the fact that gaps are not arcs in the circumference but chords, yet we treat them as arcs. The correction is miniscule in most cases, but for small stems with large 'GAP_WIDTH' values, it may lead to biases over time. Regardless of its scientific value, the new values in the column 'DBH_TRUE' are very important because every single analytical, diagnostic, and plotting funcitonin 'Rdendrom' uses this column, and not 'DBH'. If there is public outcry, we could change 'DBH_TRUE' to 'DBH', and change 'DBH' to 'DBH_chord' or something nerdy like that. Cheers!

Having populated the 'DBH_TRUE' column, the function shifts into curve fitting. This involves fitting the LG5 General Addative Model detailed below. If users want to ignore the curve fitting, whether to do it themselves or because they hate curves, let me know, but note: I've spent a ton of time trying different approaches, including Bayesian methods in stan, and a million other ways--this works! But it would be trivial to separate the 'DBH_TRUE' workflow from the fitting.

For fitting, the data must first be filtered. The curve fitting algorithm may not work well with small sample sizes in a year, poor data, extremely small growth, dead trees, shrinking trees, etc. Because this package is designed to be applied to large datasets with many fits, and do this objectively(-ish) and repeatedly, precisely, efficiently, and other adverbs, we filter out certain cases. The cases follow in pseudo-code from the get.optimized.dendro() function.

    IF value of '1' for the 'REMOVE' variable
        remove observation
    IF value of '1' for the 'IGNORE' column
        **'skip'** time series
    IF number of measurements in a year < *cutoff* (default = 9)
        **'skip'** time series
    RUN *lm()* with an intercept, slope, and polynomial term on the time series and get the parameter values
        IF lm slope parameter is negative
            set "IGNORE" to **1**
            **skip**

We then fit the time series data using get.params(). See help files (or code) for details.

OUTPUT from the fitting function and extra metrics

It is important to NOTE that the output of the fitting functions are files and not objects. The default folder is the present working directory, but I recommend making an "OUTPUT" directory to hold the sequence of these files. The output files from the first fitting function are input to the second. The output from the second ammend the first but do not overwrite them. I'd be glad to adjust this as folks see fit.

OBJECTS are generally referred to with dot '.' separators, whereas file names use underscores "_" for the same separation. So for example, the Dendro.split object is stored in the Dendro_split.Rdata file (by default).

There are four files from the output. Three are list objects that contain the same information, but are in different structures, and the fourth is a csv file, param.table, of parameters and extra metrics. The list files are all of the time series data either - combined, the object Dendro.complete is in the 'Dendro_data...' Rdata file (name default designated in the argument) - split by tree id, the object Dendro.tree - split by tree id, year, and band (i.e., the minimal designation of a time series of diameters), is the object Dendro.split.

So to get advanced metrics, we reload the output files from the fitting and run the get.extra.metrics() function.

param.table.name = sprintf("Param_table.csv")
Dendro.data.name = sprintf("Dendro_data.Rdata")
Dendro.split.name = sprintf("Dendro_data_split.Rdata")
OUTPUT.folder <- c(".")
param.table <- read.csv(file = paste(OUTPUT.folder, param.table.name, sep = "/"))
load(file = paste(OUTPUT.folder, Dendro.data.name, sep = "/")) # loads Dendro.complete
load(file = paste(OUTPUT.folder, Dendro.split.name, sep = "/")) #loads Dendro.split
load(file = paste(OUTPUT.folder, "Dendro_Tree.Rdata", sep = "/")) #loads Dendro.tree

get.extra.metrics(
  param.table,
  Dendro.split,
  OUTPUT.folder      = ".",
  param.table.name = sprintf("Param_table_complete.csv"),
  Dendro.data.name = sprintf("Dendro_data_complete.Rdata"),
  Dendro.split.name = sprintf("Dendro_data_split_complete.Rdata"),
  Quantile.hull.name = "Quantile.hull.Rdata")
param.table.name = sprintf("Param_table_complete.csv")
OUTPUT.folder <- c(".")
param.table.extended <- read.csv(file = paste(OUTPUT.folder, param.table.name, sep = "/"))
head(param.table.extended)

Output from this function is similar to the ouptput of the first fitting function,except for an extra list object file, with the name Quantile.hull.name. This is a list object of length n.time.series where each element is the the fitting information from QH.list.

Function fitting details

The primary approach to 'processing' data is through fitting a three-part general additive model (GAM) to the data. This generally follows McMahon and Parker 2014, Ecology and Evolution. The core of the procedure is the fitting of a 5-parameter logistic function to the growth time series. The two other parts of the GAM are the starting diameter and ending diameter which essentially estimates the leaf-off dormant diameter at breast height (DBH) from the winter before and after the growing season. The LG5 function relates dbh, or the diameter measurements at a given day of the year. for a designated series of days.

$$ dbh = L + \frac{(K - L)} { 1 + 1/ \theta * exp(-r (doy - doy_{ip})/ \theta) ^{\theta}} $$

The LG5 comes from a family of functions, known collectively as Richards models, that can produce good candidate parameterizations for intra-annual tree growth data [Oswald:2012]. We found the effort to fit so many parameters worthwhile because the parameter number is important for precise fits, necessary to discover small fluctuations in growth during the year. Four of the parameters have straightforward interpretation: two parameters define the lower and upper asymptotes (L and K respectively), doyip marks the day of the year when the inflection of the curve is predicted to occur, and the rate parameter r describes the slope of the curve at the inflection point. The final parameter, θ allows asymmetrical fits by changing the approach of to the upper asymptote. This is a critical parameter as when θ = 1, the curve is symmetrical. But growth forms are rarely symmetrical, so its inclusion merits the increased efforts in fitting a 5-parameter model.

This fit is estimated using two passes through the optim() function. The resulting curve (which is fit without constraining L or K to starting and stopping diamters) is then combined with estimates of these diameters of growth initiation and cessation (a and b respectively). The horizontal lines at a and b are then combined with the LG5 curve to produce the GAM that best fits the annual time series data. These three pieces of the GAM are combined in order to allow optimal fits to the growth curve while capturing the sometimes abrupt beginning of bole growth and smoother asymptote of ending of growth.

Analyzing

After the initial fits are conducted, additional metrics can be estimated. The primary function for this is the get.extra.metrics() function. See the vignette for examples and the help file for details.

Vetting

The two vetting functions, id.outliers() and find.outliers(), allow either manual or automatic designation of problem measurements respectively. id.outliers() allows interactive (point and click) identification of outliers that are flagged, re-incorporated into the data, and then ignored during re-runs of parameter estimation. find.outliers() uses residuals from the fit curves to flag values outside of a designated numer of standard deviations from zero (default = 3).

It is worth noting that within the get.optimized.dendro() function there are filters that also serve to vet data. Insufficient sample size (default = 9 values) results in skipping the fitting function. That function also fits a linear model to the data and if the slope is negative (representing a dead or shrinking stem), then fitting is skipped. The fits also produce R2 values for every fit that is implemented, so that further vetting and subsetting of data for specific analyses can take into account how well the curves fit the data.

Presenting

There are several plotting functions that present the data directly, with fits, with hulls, over years, and aggregated. Help files explain these different plotting possibilities.

make.dendro.plot.ts(ts.data = Dendro.split[[3]], params = param.table[3, ],
  day = seq(365))
make.dendro.plot.tree(Dendro.ind = Dendro.tree[[1]], param.tab = subset(param.table, TREE_ID == Dendro.tree[[1]]$TREE_ID[1]))


seanmcm/RDendrom documentation built on March 2, 2024, 3:53 p.m.