knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE )
library(allometree) #devtools::load_all() # to knit manually
In this document, we are using relationship between two size parameters as an example: Predicting tree height from trunk diameter. See the vignette Get started with allometree for a full description of the dataset and allometric equations.
For a particular species of interest, data will be fit to all allometric equations. The best-fit model is selected based on the lowest bias-corrected Aikaike’s information criterion (AICc) value. Here's an example to select the best-fit model for the species Albizia saman, using the function ss_modelselect()
.
data(urbantrees, package = "allometree") Alb_sam <- urbantrees[urbantrees$species == "Albizia saman", ] # subset data for one species results <- ss_modelselect(Alb_sam, response = "height", predictor = "diameter") # specify colnames of variables
results
is a list of 3 elements. More information on the output can be found at ?ss_modelselect
.
all_models_rank
.best_model
.best_model_info
. head(results$all_models_rank)
results$best_model
results$best_model_info
We can also run ss_modelselect()
across multiple species in the full dataset urbantrees
. For this, we use the wrapper function ss_modelselect_multi()
.
results_all <- ss_modelselect_multi(urbantrees, species = "species", # specify colname of species response = "height", predictor = "diameter")
results_all
is a list of 3 elements:
ss_models_rank
.ss_models
.ss_models_info
.
An overview of the best-fit models across the r nrow(results_all$ss_models_info)
species:
results_all$ss_models_info
Let's say we want to check for outlier/influential data points, remove them, and re-fit the filtered dataset to the pre-selected models as defined in results_all$ss_models_info
. First, lets visualise these outliers, say, for the species Xanthostemon chrysanthus.
par(mfrow=c(2,2)); plot(results_all$ss_models$`Xanthostemon chrysanthus`)
There are numerous ways to deal with outliers. One way is to remove them based on Cook's Distance (bottom-right plot). In general use, outliers may be defined as points with a Cook's distance more than four times the mean Cook's distance (red line in plot below). Note that this threshold is not fixed, and may be adjusted according to biological knowledge of each tree species.
cooks_dist <- cooks.distance(results_all$ss_models$`Xanthostemon chrysanthus`) plot(cooks_dist, pch = ".", cex = 2, main = "Outliers by Cook's Distance") abline(h = 4 * mean(cooks_dist, na.rm = T), col = "red") # threshold line text(x = 1:length(cooks_dist) + 1, y = cooks_dist, labels = ifelse(cooks_dist > 4 * mean(cooks_dist, na.rm=T), names(cooks_dist), ""), col = "red") # outlier labels
Lets remove outliers across all species in urbantrees
. We can create a function uncooker()
that does so for each species using a for()
loop, based on their respective best-fit model in results_all$ss_models
:
uncooker <- function(data, modellist){ result <- data[0,] for(i in 1:length(modellist)){ # loop over all species cooks_dist <- cooks.distance(modellist[[i]]) outliers <- as.numeric(which(cooks_dist > 4 * mean(cooks_dist, na.rm=T))) # threshold subset <- data[(data$species == names(modellist[i])),][-outliers,] result <- rbind.data.frame(result, subset) } return(result) } # run the function uncooker() urbantrees_clean <- uncooker(urbantrees, results_all$ss_models) nrow(urbantrees_clean)
r nrow(urbantrees) - nrow(urbantrees_clean)
data points were removed to obtain the filtered dataset urbantrees_clean
(n = r nrow(urbantrees_clean)
).
rm(cooks_dist, uncooker)
We can now fit urbantrees_clean
to the best-fit equations we defined in Section 1. Here's an example for the species Albizia saman again, using the function ss_modelfit()
. This function has an additional argument modelcode
. In the case of Albizia saman, it is lin_w1
(see results_all$ss_models_info
). Note that you can also pick any one of the options found in eqns_info$modelcode
.
We'll overwrite our previously-defined variables Alb_sam
and results
, this time with the filtered dataset:
Alb_sam <- urbantrees_clean[urbantrees_clean$species == "Albizia saman", ] results <- ss_modelfit(Alb_sam, modelcode = "lin_w1", # specify modelcode response = "height", predictor = "diameter")
results
is now a list of 2 elements:
fitted_model
.fitted_model_info
.Similar to the the wrapper function ss_modelselect_multi()
, we can also run ss_modelfit_multi()
to fit pre-selected models across multiple species. In this function, we need to input a reference table ref_table
(i.e. results_all$ss_models_info
) that provides information on the species
and their corresponding modelcode
.
results_all <- ss_modelfit_multi(urbantrees_clean, ref_table = results_all$ss_models_info, species = "species", # colname in both data & ref_table modelcode = "modelcode", # colname in ref_table response = "height", predictor = "diameter")
results_all
is now a list of 2 elements:
ss_models
.ss_models_info
.
To make our subsequent code less verbose, let's assign the elements ss_models
and ss_models_info
to variables with the same name:
ss_models <- results_all$ss_models ss_models_info <- results_all$ss_models_info
rm(results_all, results, Alb_sam)
We can simulate some data for a species of interest, and use it's model to make predictions. We'll use the species Albizia saman again as an example. Let's first generate the data based on the range of values for the predictor variable (i.e. diameter size) that was used to fit the model, using the function generate_x()
:
Alb_sam <- urbantrees[urbantrees$species == 'Albizia saman', ] # generate data for subsequent predictions newdata <- generate_x(Alb_sam, response = "height", predictor = "diameter")
ss_predict()
can be used to make predictions on the simulated data predict_range_full
. This appends columns for predicted height (fit
), as well as the lower (lwr
) and upper (upr
) bounds of the prediction interval:
predictions <- ss_predict(newdata, models = ss_models, ref_table = ss_models_info, predictor = "predictor") head(predictions)
Note that you can simulate data automatically using the ss_simulate()
function, which is also a wrapper to ss_predict()
. This can be done for one or multiple species, using the argument select_sp
. The simulated data can be extrapolated beyond the range used to fit the model, using the argument extrapolate
. Here's an example using the models that we previously derived from the filtered dataset urbantrees_clean
. Let's say we're interested in Albizia saman, Terminalia mantaly, and Xanthostemon chrysanthus, and would like to predict their height across a diameter size of 0 – 1 m. We'll overwrite our previously-defined variable predictions
:
predictions <- ss_simulate(ref_table = ss_models_info, models = ss_models, select_sp = c("Albizia saman", "Terminalia mantaly", "Xanthostemon chrysanthus"), extrapolate = c(0,1)) head(predictions)
We can plot out our predictions using ggplot()
:
library(ggplot2) ggplot() + facet_wrap(~ species)+ geom_point(data = dplyr::filter(urbantrees, species %in% predictions$species), aes(x = diameter, y = height), size=0.35, alpha = 0.9, color = "grey50") + # prediction interval geom_ribbon(data = predictions, aes(x = predictor, ymin = lwr, ymax = upr), alpha = 0.10) + # regression line geom_line(data = predictions, aes(x = predictor, y = fit, lty = extrapolated), size = 0.8, alpha = 0.6) + scale_linetype_manual(values=c("No"= 1 , "Low" = 3, "High" = 3), name = "Extrapolated") + xlab("Diameter (m)") + ylab("Height (m)") + coord_cartesian(ylim=c(0, 25), xlim= c(0,1))
Interpretations of the allometric relationships will vary depending on what type of parameter is analysed (e.g. tree height, trunk diameter, age, crown height, crown diameter and leaf area). In our example, we'll have to be cautious in interpretating extrapolated predictions. As you can see in the figure, extrapolations down to a diameter size of zero for Albizia saman and Terminalia mantaly would not make sense (i.e. the tree cannot have a positive height at that diameter size). Also, while an exponential relationship shows rapid scaling in height for Terminalia mantaly, it would have a natural height limit (i.e. would not continue growing taller at such a steep slope). Hence, it is important that allometric relationships are interpreted in conjunction with information on the biology and growth (e.g. environmental and management) conditions associated with the trees.
McPherson E. G., van Doorn N. S. & Peper P. J. (2016) Urban Tree Database and Allometric Equations. General Technical Report PSW-GTR-253, USDA Forest Service, 86.
Song, X. P., Lai, H. R., Wijedasa, L. S., Tan, P. Y., Edwards, P. J., & Richards, D. R. (2020), Height–diameter allometry for the management of city trees in the tropics. Environmental Research Letters, 15, 114017. https://doi.org/10.1088/1748-9326/abbbad
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.