EcoCountHelper: An Ecological Count Data Assistant

EcoCountHelper R Documentation

Description

The EcoCountHelper package was initially conceived as a tool to facilitate the analysis and interpretation of ecological count data (e.g., avian point counts, bat acoustic monitoring projects), though it could conceivably be used for any type of count- and GLMM-based analysis. Using a handful of functions, this package uses raw data and glmmTMB objects to help the user:

  1. choose the best error distribution to use for multiple group-level (e.g., species, order) models by corroborating AIC values and mean-variance plots,

  2. assess goodness-of-fit for GLMMs,

  3. visualize model results, and

  4. assess the unscaled "real world" effect of model parameters.

The functions outlined in the pipeline below, aside from RealEffectText, are designed to process models for multiple response groups in a single function call. For details regarding the implementation of EcoCountHelper for individual models/groups, please refer to the "Single Group Pipeline Notes" section below.

Model Preparation

Before using this package, models must be generated for each group of interest using glmmTMB. All models for a group should be fit to the same data, but can use different error distribution families (e.g., negative-binomial 1 & 2, Poisson) and different zero-inflation formulas if appropriate. Model names should describe both the group each model is associated with and the error structure. EcoCountHelper uses regular expressions to determine group membership for each model, so group names should be placed consistently within model names for reliable identification. The safest way to accomplish this is to begin each model name with the group name followed by an underscore. If this naming scheme is followed, EcoCountHelper functions will be able to identify the group a model is associated with using their default arguments.
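The naming scheme described above can be illustrated with a short base R sketch. The model names and the regular expression here are hypothetical and chosen only for illustration; they are not taken from EcoCountHelper's internals or its default arguments.

```r
# Hypothetical model names following a "<group>_<error structure>" scheme.
model_names <- c("Mylu_Pois", "Mylu_Nb1", "Mylu_Nb2", "Epfu_Pois", "Epfu_Nb2")

# Capture everything before the first underscore as the group name.
# This pattern is an illustrative assumption, not the package's own code.
groups <- sub("^([^_]+)_.*$", "\\1", model_names)

# Collect the model names that belong to each group.
models_by_group <- split(model_names, groups)
print(models_by_group)
```

Because the group name always sits in the same position, a single pattern recovers it for every model, which is why consistent placement matters.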

Assuming that all observations for all groups of interest are stored in the same table, group-level data could be partitioned using either a long-form or wide-form data structure. In either case, users often prefer to loop through data using for-loops or custom functions to maintain readable, compact code. While models generated by loops or the apply family of functions may be identical whether the data is in long- or wide-form with respect to groups of interest, the underlying data calls within the model object differ. To account for this, two function groups in the EcoCountHelper pipeline have both a long- and wide-form function associated with them (denoted by a "Long" or "Wide" function suffix). For those two function groups, please be sure to use the function appropriate for your data structure and model construction process.
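As a rough illustration of the two data structures, the base R sketch below builds the same toy counts in wide and long form. The column names (Site, Species, Count) and the group abbreviations are hypothetical.

```r
# Hypothetical wide-form table: one row per survey, one count column per group.
wide <- data.frame(
  Site = c("A", "B", "C"),
  Mylu = c(4L, 0L, 7L),
  Epfu = c(1L, 3L, 0L)
)

# Equivalent long-form table: one row per survey-by-group combination.
long <- data.frame(
  Site    = rep(wide$Site, times = 2),
  Species = rep(c("Mylu", "Epfu"), each = nrow(wide)),
  Count   = c(wide$Mylu, wide$Epfu)
)

# Wide form selects a column per group; long form filters rows per group.
mylu_from_wide <- wide$Mylu
mylu_from_long <- long$Count[long$Species == "Mylu"]
```

The counts reaching a model are the same either way, but the call that retrieves them (a column name versus a row filter) differs, which is the distinction the "Long" and "Wide" function suffixes address.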

Best-Fit Model Determination

The first step in the EcoCountHelper pipeline is to determine which model best approximates the error structure for each group. This can be done using the ModelCompare and DistFit functions. ModelCompare obtains AIC values for each model in a group and returns a table of AIC values and model names for each group, along with a table containing the name of the top model for each group as determined by AIC. The DistFit functions (DistFitLong and DistFitWide) generate mean-variance plots with lines for error distribution families commonly used for count data, allowing users to visually assess the best error structure family for each group. The output from these two steps can be corroborated to determine the best model for each group.
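The AIC comparison step can be sketched by hand as follows. This is not ModelCompare itself: to stay self-contained it uses simulated data and stand-in models from glm and MASS::glm.nb rather than glmmTMB objects, and all object and column names are assumptions.

```r
library(MASS)  # provides glm.nb(); used here as a stand-in for glmmTMB fits

set.seed(1)
n <- 200
x <- rnorm(n)

# Simulated overdispersed counts for two hypothetical groups.
dat <- data.frame(
  x    = x,
  Mylu = rnbinom(n, mu = exp(1 + 0.5 * x), size = 1),
  Epfu = rnbinom(n, mu = exp(0.5 - 0.3 * x), size = 1)
)

# One Poisson and one negative binomial model per group,
# named "<group>_<error structure>".
models <- list(
  Mylu_Pois = glm(Mylu ~ x, family = poisson, data = dat),
  Mylu_Nb2  = glm.nb(Mylu ~ x, data = dat),
  Epfu_Pois = glm(Epfu ~ x, family = poisson, data = dat),
  Epfu_Nb2  = glm.nb(Epfu ~ x, data = dat)
)

# Table of AIC values with the group parsed from each model name.
aic_tab <- data.frame(
  Model = names(models),
  Group = sub("_.*$", "", names(models)),
  AIC   = vapply(models, AIC, numeric(1)),
  row.names = NULL
)

# Keep the lowest-AIC model within each group.
top <- do.call(rbind, lapply(split(aic_tab, aic_tab$Group), function(g) {
  g[which.min(g$AIC), ]
}))
print(aic_tab)
print(top)
```

A mean-variance plot would then be checked against this table: if the lowest-AIC family disagrees with the shape of the observed mean-variance relationship, that is a cue to look more closely before settling on a model.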

Goodness-of-Fit Diagnostics

To ensure adequate model fit has been achieved and no major assumption violations have occurred, the ResidPlot functions (ResidPlotLong and ResidPlotWide) allow users to assess residual uniformity, dispersion, and outliers via diagnostic plots and statistics. These functions are essentially wrappers for createDHARMa and testResiduals from the DHARMa package. For more information regarding diagnostic plot interpretation, please see the DHARMa package documentation. Note that large sample sizes often lead to significant values for diagnostic statistics, so examining the diagnostic plots is often more informative than simply checking the p-values of the diagnostic tests returned by these functions. We also advise users to check for predictor collinearity using check_collinearity (from the performance package) to ensure that no predictors of interest are excessively collinear.
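The idea behind simulation-based residual diagnostics can be sketched in base R. The block below is a simplified illustration of scaled (quantile) residuals for a Poisson model on simulated data; it is not DHARMa's implementation and not the ResidPlot functions, and every object name is an assumption.

```r
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.4 * x))  # simulated counts
fit <- glm(y ~ x, family = poisson)

# Simulate new responses from the fitted model: one row per observation,
# one column per simulation.
n_sim <- 250
mu_hat <- fitted(fit)
sims <- matrix(rpois(n * n_sim, lambda = rep(mu_hat, times = n_sim)), nrow = n)

# Scaled residual: where each observed count falls within its own simulated
# distribution, with random jitter to break ties among integer values.
below <- rowMeans(sims < y)
equal <- rowMeans(sims == y)
scaled_resid <- below + runif(n) * equal

# For a well-specified model these residuals should look roughly uniform
# on [0, 1]; a Kolmogorov-Smirnov test gives one formal check.
ks <- ks.test(scaled_resid, "punif")
print(ks$p.value)
```

A histogram or uniform QQ plot of scaled_resid is usually the more informative check, for the sample-size reason noted above.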

Examining Relative Effect Sizes

If model fit appears adequate based on diagnostic plots, users can visualize the relative effect sizes of predictors with EffectsPlotter. This function plots conditional model estimates for each predictor in a model. If models are appropriately scaled, these estimates can be directly compared across predictors to assess the relative importance of predictors within their respective ranges of observed values.
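To make the comparison of scaled estimates concrete, the sketch below fits a Poisson glm to simulated data with two standardized predictors and tabulates the estimates with Wald 95% intervals. It is an illustration of the general idea rather than EffectsPlotter; the predictors, their units, and the use of base scale() are assumptions.

```r
set.seed(7)
n <- 250
dat <- data.frame(
  elev = rnorm(n, mean = 1500, sd = 300),  # hypothetical predictor (m)
  temp = rnorm(n, mean = 15, sd = 4)       # hypothetical predictor (deg C)
)
dat$count <- rpois(n, exp(1 + 0.001 * (dat$elev - 1500) + 0.05 * (dat$temp - 15)))

# Center and scale predictors so each coefficient is per standard deviation.
dat$elev_z <- as.numeric(scale(dat$elev))
dat$temp_z <- as.numeric(scale(dat$temp))

fit <- glm(count ~ elev_z + temp_z, family = poisson, data = dat)

# Estimates and Wald 95% intervals for the predictors (intercept dropped).
est <- coef(summary(fit))[-1, c("Estimate", "Std. Error")]
effects <- data.frame(
  Predictor = rownames(est),
  Estimate  = est[, "Estimate"],
  Lower     = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
  Upper     = est[, "Estimate"] + 1.96 * est[, "Std. Error"],
  row.names = NULL
)
print(effects)
```

Because both predictors are on a standard-deviation scale, the two estimates can be read side by side even though the raw units (metres and degrees) are not comparable.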

Interpreting Scaled Estimates

While comparison of scaled estimates is useful for determining the relative importance of predictors within the observed range of values, one cannot ascertain the ecological significance of predictors this way. Users can assess the "real world effect" of a given unscaled change in a predictor with RealEffectText, which returns a sentence describing a group's predicted response to a specified change in a predictor, or with RealEffectTabLong/RealEffectTabWide, which return a tabulated version of RealEffectText's output.
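The arithmetic behind such a "real world" statement can be sketched for a log-link count model. All numbers below are hypothetical, and the sketch assumes the predictor was centered and divided by one standard deviation before fitting (as base scale() does); the package's own scaling helper and RealEffectText may differ in their details.

```r
# Hypothetical inputs for a log-link count model.
beta_scaled <- 0.30  # estimate for the scaled predictor
se_scaled   <- 0.04  # its standard error
sd_raw      <- 300   # standard deviation of the unscaled predictor (m)
delta_raw   <- 100   # unscaled change of interest (m)

# Express the unscaled change in units of the scaled predictor.
delta_scaled <- delta_raw / sd_raw

# With a log link, a change in the linear predictor multiplies the expected
# count by exp(change); convert that multiplier to a percent change.
pct_change <- (exp(beta_scaled * delta_scaled) - 1) * 100
pct_lower  <- (exp((beta_scaled - 1.96 * se_scaled) * delta_scaled) - 1) * 100
pct_upper  <- (exp((beta_scaled + 1.96 * se_scaled) * delta_scaled) - 1) * 100

msg <- sprintf(
  "A %.0f m increase in elevation corresponds to a %.1f%% change in predicted counts (95%% CI: %.1f%% to %.1f%%).",
  delta_raw, pct_change, pct_lower, pct_upper
)
print(msg)
```

The key step is dividing the unscaled change by the scaling divisor before exponentiating; skipping it would report the effect of a one-standard-deviation change instead of the change the user asked about.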

Single Group Pipeline Notes

While the pipeline outlined above can be used for individual response groups rather than multiple response groups, some naming conventions must be followed when examining an individual response group.

  • If the original data underlying a group's models only contains data for a single group, it is likely that there is no vector describing group membership for observations. In this case, two approaches can be used to ensure successful function use.

    1. A vector specifying the group name can be added to the data (e.g., a vector named "Species" populated with the species name) and long-form functions can be used, or

    2. the vector containing count information can be named with the group's name (e.g., the vector containing count data in a table containing only Myotis lucifugus observations could be named "Mylu") and wide-form functions can be used.

  • Additionally, model names must denote the group they are associated with, and the group name/abbreviation must be the same for model names and data indicating group membership as outlined above (e.g., if a count-data vector is named "Mylu" or a group membership vector is populated with "Mylu", all associated model names must contain "Mylu").
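The two single-group approaches above can be sketched in base R. The table, the column names, and the example model name are hypothetical.

```r
# Hypothetical single-species table with a generic count column.
mylu_dat <- data.frame(Site = c("A", "B", "C"), Count = c(4L, 0L, 7L))

# Approach 1 (for long-form functions): add a group membership vector.
long_dat <- mylu_dat
long_dat$Species <- "Mylu"

# Approach 2 (for wide-form functions): name the count column after the group.
wide_dat <- mylu_dat
names(wide_dat)[names(wide_dat) == "Count"] <- "Mylu"

# Model names should carry the same group label, for example "Mylu_Nb2".
example_model_name <- "Mylu_Nb2"
```

In both cases the label "Mylu" appears in the data and in the model name, which is what lets a model be matched to its group.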

Other Functions

  • scale2

  • DumbGrid

  • theme_nocturnal

Suggested Reading

Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D. N., Goodwin, C. E. D., Robinson, B. S., Hodgson, D. J., & Inger, R. (2018). A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ, 6. https://doi.org/10.7717/peerj.4794


huntercole25/EcoCountHelper documentation built on Jan. 14, 2023, 4:13 a.m.