EcoCountHelper | R Documentation |
The EcoCountHelper package was initially conceived as a tool to facilitate the analysis
and interpretation of ecological count data (e.g., avian point counts,
bat acoustic monitoring projects), though it could conceivably be used for any type of
count- and GLMM-based analysis. With a handful of functions, this package takes raw data and
glmmTMB objects and helps the user:
choose the best error distribution to use for multiple group-level (e.g., species, order) models by corroborating AIC values with mean-variance plots,
assess goodness-of-fit for GLMMs,
visualize model results, and
assess the unscaled "real world" effect of model parameters.
The functions outlined in the pipeline below, aside from RealEffectText, are designed
to process models for multiple response groups in a single function call. For details
regarding the implementation of EcoCountHelper for individual models/groups, please refer
to the "Single Group Pipeline Notes" section below.
Before using this package, models must be generated for each group of interest using
glmmTMB. All models for a group should be fit to the same data,
but can include different error distribution families (e.g., negative-binomial 1 & 2,
Poisson) and different zero-inflation formulas if appropriate. It is also important
to note that model names should describe the group each model is associated with as well as
the error structure. EcoCountHelper uses regular expressions to determine group membership
for each model, so group names should be placed consistently within model
names for reliable group membership identification. The safest way to accomplish this is
to begin each model name with the group name followed by an underscore. If this model name
scheme is followed, EcoCountHelper functions will be able to identify the group a model is associated
with using its default arguments.
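The naming scheme described above can be sketched as follows. This is a minimal illustration rather than a required workflow: the data are simulated, and the column names (Site, Moon, Mylu, Epfu) and the family suffixes (Pois, Nb1, Nb2) are assumptions made for the example. Only glmmTMB and base R calls are used.

```r
library(glmmTMB)

# Simulated wide-form data: one count column per (hypothetical) group.
set.seed(1)
n_sites <- 20
n_visits <- 5
bats <- data.frame(
  Site = factor(rep(seq_len(n_sites), each = n_visits)),
  Moon = rnorm(n_sites * n_visits)
)
site_eff <- rnorm(n_sites, sd = 0.4)[as.integer(bats$Site)]
bats$Mylu <- rnbinom(nrow(bats), mu = exp(1.0 + 0.3 * bats$Moon + site_eff), size = 2)
bats$Epfu <- rnbinom(nrow(bats), mu = exp(0.5 - 0.2 * bats$Moon + site_eff), size = 1)

groups <- c("Mylu", "Epfu")
families <- list(Pois = poisson(), Nb1 = nbinom1(), Nb2 = nbinom2())

# Fit one model per group and family, naming each "<group>_<family>".
# A zero-inflation formula (e.g., ziformula = ~1) could be added where appropriate.
for (grp in groups) {
  for (fam in names(families)) {
    form <- as.formula(paste(grp, "~ Moon + (1 | Site)"))
    mod <- glmmTMB(form, data = bats, family = families[[fam]])
    assign(paste(grp, fam, sep = "_"), mod)
  }
}

# With the group name first, a simple regular expression recovers the group.
model_names <- ls(pattern = "^(Mylu|Epfu)_")
model_groups <- sub("_.*$", "", model_names)
```

Because every model name begins with the group name followed by an underscore, the group can be recovered from the name alone, which is the property the text above recommends.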
Assuming that all observations for all groups of interest are stored in the same table, group-level data could be partitioned using either a long-form or wide-form data structure. In either case, users often prefer to loop through data using for-loops or custom functions to maintain readable, compact code. While models generated by loops or the apply family of functions may be identical whether the data is in long- or wide-form with respect to groups of interest, the underlying data calls within the model object differ. To account for this, two function groups in the EcoCountHelper pipeline have both a long- and wide-form function associated with them (denoted by a "Long" or "Wide" function suffix). For those two function groups, please be sure to use the function appropriate for your data structure and model construction process.
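To make the long-form versus wide-form distinction concrete, here is a small base R sketch. The table and its column names (Site, Mylu, Epfu, Species, Count) are hypothetical and chosen only for illustration.

```r
# Wide form: one row per survey, one count column per group.
wide <- data.frame(
  Site = c("A", "B", "C"),
  Mylu = c(4, 0, 7),
  Epfu = c(1, 3, 0)
)

# Long form: one row per survey-by-group combination, with a column
# describing group membership and a single count column.
long <- reshape(
  wide,
  direction = "long",
  varying = c("Mylu", "Epfu"),
  v.names = "Count",
  timevar = "Species",
  times = c("Mylu", "Epfu"),
  idvar = "Site"
)
rownames(long) <- NULL

# A wide-form model would use the group column as the response
# (e.g., Mylu ~ ...), whereas a long-form model would use Count as the
# response on the subset of rows where Species == "Mylu".
```

The counts are identical in both layouts; only the way a model refers to them differs, which is why the "Long" and "Wide" function variants exist.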
The first step in the EcoCountHelper pipeline is to determine which model best approximates
the error structure for each group. This can be done using the ModelCompare and DistFit
functions. ModelCompare obtains AIC values for each model
for a group and returns a table of AIC values and model names for each group along with
a table containing the names of the top model for each group as determined by AIC values.
The DistFit functions (DistFitLong & DistFitWide)
generate mean-variance plots with lines for error distribution families commonly used for
count data that allow users to visually assess the best error structure
family for each group as is done here.
The output from these first two steps can be corroborated to determine which model is the
best model for each group.
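The idea behind a mean-variance plot can be sketched in base R. This is not the DistFit implementation; it is a rough illustration on simulated data, and the crude least-squares dispersion estimates are an assumption made for the example. The variance functions follow the parameterizations documented for glmmTMB's nbinom1 (variance = mu * (1 + phi)) and nbinom2 (variance = mu + mu^2 / phi) families.

```r
# Simulated overdispersed counts for 30 hypothetical sites.
set.seed(42)
n_sites <- 30
site <- rep(seq_len(n_sites), each = 10)
mu_site <- exp(rnorm(n_sites, mean = 1.5, sd = 0.7))
counts <- rnbinom(length(site), mu = mu_site[site], size = 1.5)

# Per-site sample mean and variance.
grp_mean <- as.numeric(tapply(counts, site, mean))
grp_var <- as.numeric(tapply(counts, site, var))

# Theoretical mean-variance relationships for common count families.
pois_var <- function(mu) mu
nb1_var <- function(mu, phi) mu * (1 + phi)
nb2_var <- function(mu, phi) mu + mu^2 / phi

# Crude dispersion estimates via least squares through the origin.
phi_nb1 <- unname(coef(lm(grp_var ~ 0 + grp_mean))) - 1
phi_nb2 <- 1 / unname(coef(lm(I(grp_var - grp_mean) ~ 0 + I(grp_mean^2))))

# Overlay the three candidate relationships on the observed points.
plot(grp_mean, grp_var, xlab = "Site mean", ylab = "Site variance")
curve(pois_var(x), add = TRUE, lty = 1)
curve(nb1_var(x, phi_nb1), add = TRUE, lty = 2)
curve(nb2_var(x, phi_nb2), add = TRUE, lty = 3)
legend("topleft", legend = c("Poisson", "NB1", "NB2"), lty = 1:3)
```

The family whose line tracks the observed points most closely is the visual counterpart to the AIC ranking, and agreement between the two is the corroboration described above.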
To ensure adequate model fit has been achieved and no major assumption violations have
occurred, the ResidPlot functions (ResidPlotLong & ResidPlotWide)
allow users to assess residual uniformity, dispersion, and outliers via diagnostic plots
and statistics. These functions are essentially wrappers for
createDHARMa and testResiduals. For more
information regarding diagnostic plot interpretation, please see
this document.
Note that large sample sizes often lead to significant values for diagnostic statistics,
and examination of diagnostic plots is often more informative than simply checking p-values
for diagnostic tests returned by these functions. Additionally, we advise that users also check
for predictor collinearity using check_collinearity to ensure that
no predictors of interest are excessively collinear.
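For readers who want to see the underlying diagnostics for a single model, the following sketch calls createDHARMa, testResiduals, and check_collinearity directly. It is not the ResidPlot implementation; the simulated data, the predictor names (Noise, Light), and the model name Mylu_Nb2 are assumptions made for illustration. It requires the glmmTMB, DHARMa, and performance packages.

```r
library(glmmTMB)
library(DHARMa)
library(performance)

# Simulated data for one hypothetical group with two predictors.
set.seed(11)
n_sites <- 25
dat <- data.frame(
  Site = factor(rep(seq_len(n_sites), each = 6)),
  Noise = rnorm(n_sites * 6),
  Light = rnorm(n_sites * 6)
)
site_eff <- rnorm(n_sites, sd = 0.4)[as.integer(dat$Site)]
dat$Mylu <- rnbinom(
  nrow(dat),
  mu = exp(1 + 0.4 * dat$Noise - 0.2 * dat$Light + site_eff),
  size = 2
)

Mylu_Nb2 <- glmmTMB(Mylu ~ Noise + Light + (1 | Site),
                    data = dat, family = nbinom2())

# Simulate responses from the fitted model and build scaled residuals.
sims <- as.matrix(simulate(Mylu_Nb2, nsim = 250))
res <- createDHARMa(
  simulatedResponse = sims,
  observedResponse = dat$Mylu,
  fittedPredictedResponse = predict(Mylu_Nb2, type = "response"),
  integerResponse = TRUE
)

# Diagnostic plots first, then the uniformity, dispersion, and outlier tests.
plot(res)
tests <- testResiduals(res, plot = FALSE)

# Variance inflation factors for the fixed-effect predictors.
vif <- check_collinearity(Mylu_Nb2)
```

As noted above, the plots usually deserve more weight than the test p-values, particularly with large sample sizes.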
If model fit appears adequate based on diagnostic plots, users can visualize the relative
effect sizes of predictors with EffectsPlotter. This function plots conditional
model estimates for each predictor in a model. If models are appropriately scaled, these
estimates can be directly compared across predictors to assess the relative importance of
predictors within their respective ranges of observed values.
While comparison of scaled estimates is useful for determining the relative importance of
predictors within the observed range of values, one cannot ascertain the ecological
significance of predictors this way. Users can assess the "real world effect" of a given
unscaled change in a predictor with RealEffectText, which returns a sentence
describing a group's predicted response to a specified change in a predictor, or
RealEffectTabLong/RealEffectTabWide, which return a tabulated version of
RealEffectText's output.
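The arithmetic behind converting a scaled estimate into a "real world" effect can be sketched as follows. This is not the RealEffectText implementation. It assumes a log link and a predictor that was centered and divided by one standard deviation, and it uses a plain Poisson glm from base R instead of glmmTMB to stay dependency-free. The predictor name noise_db and the 5-unit change are hypothetical.

```r
# Simulated counts that decline with a hypothetical noise predictor (dB).
set.seed(7)
n <- 200
noise_db <- runif(n, min = 30, max = 70)
counts <- rpois(n, lambda = exp(2 - 0.03 * (noise_db - 50)))
dat <- data.frame(
  counts = counts,
  noise_db = noise_db,
  noise_sc = as.numeric(scale(noise_db))  # centered, divided by 1 SD
)

fit <- glm(counts ~ noise_sc, family = poisson, data = dat)

# The scaled estimate is the change in log(count) per 1 SD of the predictor.
beta_sc <- coef(fit)[["noise_sc"]]
sd_noise <- sd(dat$noise_db)

# Convert an unscaled change (here 5 dB) to a multiplicative effect.
delta <- 5
ratio <- exp(beta_sc * delta / sd_noise)
pct_change <- (ratio - 1) * 100

effect_text <- sprintf(
  "A %g dB increase in noise is associated with a %.1f%% change in expected counts.",
  delta, pct_change
)
```

Dividing the scaled estimate by the predictor's standard deviation recovers the per-unit effect on the link scale, and exponentiating moves it back to the count scale. If a different scaling was applied to the predictor, the divisor must be changed to match.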
While the pipeline outlined above can be used for individual response groups rather than multiple response groups, some naming conventions must be followed when examining an individual response group.
If the original data underlying a group's models only contains data for a single group, it is likely that there is no vector describing group membership for observations. In this case, two approaches can be used to ensure successful function use.
A vector specifying the group name can be added to the data (e.g., a vector named "Species" populated with the species name) and long-form functions can be used, or
the vector containing count information can be named with the group's name (e.g. the vector containing count data in a table regarding solely Myotis lucifugus could be named "Mylu") and wide-form functions can be used.
Additionally, model names must denote the group they are associated with, and the group name/abbreviation must be the same for model names and data indicating group membership as outlined above (e.g. if a count-data vector is named "Mylu" or a group membership vector is populated with "Mylu", all associated model names must contain "Mylu").
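The two approaches for single-group data can be sketched in base R. The table, the column names (Site, Count, Species), and the model name Mylu_Nb2 are hypothetical and used only for illustration.

```r
# Data containing counts for a single group, with no group-membership column.
mylu <- data.frame(
  Site = c("A", "B", "C"),
  Count = c(4, 0, 7)
)

# Approach 1: add a group-membership vector and use the long-form functions.
mylu_long <- transform(mylu, Species = "Mylu")

# Approach 2: name the count vector after the group and use the wide-form functions.
mylu_wide <- mylu
names(mylu_wide)[names(mylu_wide) == "Count"] <- "Mylu"

# In both cases, model names must contain the same group label.
model_name <- "Mylu_Nb2"
name_matches_group <- grepl("Mylu", model_name, fixed = TRUE)
```

Either approach works as long as the label used in the data and the label embedded in the model names are identical.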
scale2
DumbGrid
theme_nocturnal
Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D. N., Goodwin, C. E. D., Robinson, B. S., Hodgson, D. J., & Inger, R. (2018). A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ, 6. https://doi.org/10.7717/peerj.4794