knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE, warning = FALSE
)
library(allometree)
#devtools::load_all() # to knit manually

1. Model selection {#section1}

In this document, we are using relationship between two size parameters as an example: Predicting tree height from trunk diameter. See the vignette Get started with allometree for a full description of the dataset and allometric equations.

For one species

For a particular species of interest, data will be fit to all allometric equations. The best-fit model is selected based on the lowest bias-corrected Aikaike’s information criterion (AICc) value. Here's an example to select the best-fit model for the species Albizia saman, using the function ss_modelselect().

data(urbantrees, package = "allometree")

Alb_sam <- urbantrees[urbantrees$species == "Albizia saman", ]  # subset data for one species

results <- ss_modelselect(Alb_sam, 
                          response = "height", predictor = "diameter")  # specify colnames of variables

results is a list of 3 elements. More information on the output can be found at ?ss_modelselect.

  1. Table showing models ranked by AICc value, named all_models_rank.
  2. Best-fit model object, named best_model.
  3. Table showing information on the best-fit model, named best_model_info.
head(results$all_models_rank)
results$best_model
results$best_model_info

For multiple species

We can also run ss_modelselect() across multiple species in the full dataset urbantrees. For this, we use the wrapper function ss_modelselect_multi().

results_all <- ss_modelselect_multi(urbantrees, species = "species", # specify colname of species
                                    response = "height", predictor = "diameter")

results_all is a list of 3 elements:

  1. List of tables showing each species' candidate models ranked by AICc value, named ss_models_rank.
  2. List of each species' best-fit model object, named ss_models.
  3. Table showing each species' best-fit model information, named ss_models_info.

 

An overview of the best-fit models across the r nrow(results_all$ss_models_info) species:

results_all$ss_models_info

2. Outlier removal

Let's say we want to check for outlier/influential data points, remove them, and re-fit the filtered dataset to the pre-selected models as defined in results_all$ss_models_info. First, lets visualise these outliers, say, for the species Xanthostemon chrysanthus.

par(mfrow=c(2,2)); plot(results_all$ss_models$`Xanthostemon chrysanthus`)

 

There are numerous ways to deal with outliers. One way is to remove them based on Cook's Distance (bottom-right plot). In general use, outliers may be defined as points with a Cook's distance more than four times the mean Cook's distance (red line in plot below). Note that this threshold is not fixed, and may be adjusted according to biological knowledge of each tree species.

cooks_dist <- cooks.distance(results_all$ss_models$`Xanthostemon chrysanthus`)
plot(cooks_dist, pch = ".", cex = 2, main = "Outliers by Cook's Distance")
abline(h = 4 * mean(cooks_dist, na.rm = T), col = "red")  # threshold line 
text(x = 1:length(cooks_dist) + 1, y = cooks_dist, 
     labels = ifelse(cooks_dist > 4 * mean(cooks_dist, na.rm=T), names(cooks_dist), ""), col = "red")  # outlier labels

 

Lets remove outliers across all species in urbantrees. We can create a function uncooker() that does so for each species using a for() loop, based on their respective best-fit model in results_all$ss_models:

uncooker <- function(data, modellist){

  result <- data[0,]

  for(i in 1:length(modellist)){ # loop over all species 

    cooks_dist <- cooks.distance(modellist[[i]]) 
    outliers <- as.numeric(which(cooks_dist > 4 * mean(cooks_dist, na.rm=T)))  # threshold
    subset <- data[(data$species == names(modellist[i])),][-outliers,]
    result <- rbind.data.frame(result, subset)
  }
  return(result)
}

# run the function uncooker()
urbantrees_clean <- uncooker(urbantrees, results_all$ss_models)

nrow(urbantrees_clean)

r nrow(urbantrees) - nrow(urbantrees_clean) data points were removed to obtain the filtered dataset urbantrees_clean (n = r nrow(urbantrees_clean)).

rm(cooks_dist, uncooker)

3. Re-fit filtered data

For one species

We can now fit urbantrees_clean to the best-fit equations we defined in Section 1. Here's an example for the species Albizia saman again, using the function ss_modelfit(). This function has an additional argument modelcode. In the case of Albizia saman, it is lin_w1 (see results_all$ss_models_info). Note that you can also pick any one of the options found in eqns_info$modelcode.

 

We'll overwrite our previously-defined variables Alb_sam and results, this time with the filtered dataset:

Alb_sam <- urbantrees_clean[urbantrees_clean$species == "Albizia saman", ]

results <- ss_modelfit(Alb_sam, modelcode = "lin_w1", # specify modelcode
                       response = "height", predictor = "diameter") 

results is now a list of 2 elements:

  1. Resulting model object, named fitted_model.
  2. Table showing information on the resulting model, named fitted_model_info.

For multiple species

Similar to the the wrapper function ss_modelselect_multi(), we can also run ss_modelfit_multi() to fit pre-selected models across multiple species. In this function, we need to input a reference table ref_table (i.e. results_all$ss_models_info) that provides information on the species and their corresponding modelcode.

results_all <- ss_modelfit_multi(urbantrees_clean, 
                                 ref_table = results_all$ss_models_info,
                                 species = "species", # colname in both data & ref_table
                                 modelcode = "modelcode", # colname in ref_table
                                 response = "height", predictor = "diameter")

results_all is now a list of 2 elements:

  1. List of each species' resulting model object, named ss_models.
  2. Table showing each species' resulting model information, named ss_models_info.

 

To make our subsequent code less verbose, let's assign the elements ss_models and ss_models_info to variables with the same name:

ss_models <- results_all$ss_models
ss_models_info <- results_all$ss_models_info
rm(results_all, results, Alb_sam)

4. Make predictions

We can simulate some data for a species of interest, and use it's model to make predictions. We'll use the species Albizia saman again as an example. Let's first generate the data based on the range of values for the predictor variable (i.e. diameter size) that was used to fit the model, using the function generate_x():

Alb_sam <- urbantrees[urbantrees$species == 'Albizia saman', ]  

# generate data for subsequent predictions
newdata <- generate_x(Alb_sam, response = "height", predictor = "diameter")

 

ss_predict() can be used to make predictions on the simulated data predict_range_full. This appends columns for predicted height (fit), as well as the lower (lwr) and upper (upr) bounds of the prediction interval:

predictions <- ss_predict(newdata, 
                          models = ss_models, 
                          ref_table = ss_models_info, 
                          predictor = "predictor")
head(predictions)

 

Note that you can simulate data automatically using the ss_simulate() function, which is also a wrapper to ss_predict(). This can be done for one or multiple species, using the argument select_sp. The simulated data can be extrapolated beyond the range used to fit the model, using the argument extrapolate. Here's an example using the models that we previously derived from the filtered dataset urbantrees_clean. Let's say we're interested in Albizia saman, Terminalia mantaly, and Xanthostemon chrysanthus, and would like to predict their height across a diameter size of 0 – 1 m. We'll overwrite our previously-defined variable predictions:

predictions <- ss_simulate(ref_table = ss_models_info, models = ss_models, 
                           select_sp = c("Albizia saman", "Terminalia mantaly", "Xanthostemon chrysanthus"),
                           extrapolate = c(0,1)) 
head(predictions)

5. Plot predictions

We can plot out our predictions using ggplot():

library(ggplot2)

ggplot() +
  facet_wrap(~ species)+ 
  geom_point(data = dplyr::filter(urbantrees, species %in% predictions$species), aes(x = diameter, y = height),
             size=0.35, alpha = 0.9, color = "grey50") +

  # prediction interval
  geom_ribbon(data = predictions, aes(x = predictor, ymin = lwr, ymax = upr), 
              alpha = 0.10) +

  # regression line
  geom_line(data = predictions, aes(x = predictor, y = fit, lty = extrapolated),
            size = 0.8, alpha = 0.6) +

  scale_linetype_manual(values=c("No"= 1 , "Low" = 3, "High" = 3), name = "Extrapolated") +

  xlab("Diameter (m)") + ylab("Height (m)") +

  coord_cartesian(ylim=c(0, 25), xlim= c(0,1))

 

Interpretations of the allometric relationships will vary depending on what type of parameter is analysed (e.g. tree height, trunk diameter, age, crown height, crown diameter and leaf area). In our example, we'll have to be cautious in interpretating extrapolated predictions. As you can see in the figure, extrapolations down to a diameter size of zero for Albizia saman and Terminalia mantaly would not make sense (i.e. the tree cannot have a positive height at that diameter size). Also, while an exponential relationship shows rapid scaling in height for Terminalia mantaly, it would have a natural height limit (i.e. would not continue growing taller at such a steep slope). Hence, it is important that allometric relationships are interpreted in conjunction with information on the biology and growth (e.g. environmental and management) conditions associated with the trees.


References

McPherson E. G., van Doorn N. S. & Peper P. J. (2016) Urban Tree Database and Allometric Equations. General Technical Report PSW-GTR-253, USDA Forest Service, 86.

Song, X. P., Lai, H. R., Wijedasa, L. S., Tan, P. Y., Edwards, P. J., & Richards, D. R. (2020), Height–diameter allometry for the management of city trees in the tropics. Environmental Research Letters, 15, 114017. https://doi.org/10.1088/1748-9326/abbbad



xp-song/allometree documentation built on March 28, 2022, 4:36 a.m.