R/EcoCountHelper.R

#' EcoCountHelper: An Ecological Count Data Assistant
#' 
#' The EcoCountHelper package was initially conceived as a tool to facilitate the analysis 
#' and interpretation of ecological count data (e.g., avian point counts, 
#' bat acoustic monitoring projects), though it could conceivably be used for any type of 
#' count- and GLMM-based analysis. With a handful of functions, the package takes raw data and 
#' \code{\link[glmmTMB]{glmmTMB}} objects and helps the user: 
#' \enumerate{
#' \item choose the best error distribution 
#' to use for multiple group-level (e.g., species, order) models by cross-checking AIC values 
#' against mean-variance plots, 
#' \item assess goodness-of-fit for GLMMs, 
#' \item visualize model results, and 
#' \item assess the unscaled "real world" effect of model parameters.  
#' }
#' The functions outlined in the pipeline below, aside from \code{RealEffectText}, are designed 
#' to process models for multiple response groups in a single function call. For details 
#' on using EcoCountHelper with individual models or groups, please refer 
#' to the "Single Group Pipeline Notes" section below. 
#' 
#' @section Model Preparation:
#' Before using this package, models must be generated for each group of interest using 
#' \code{\link[glmmTMB]{glmmTMB}}. All models for a group should be fit to the same data, 
#' but can use different error distribution families (e.g., Poisson, negative binomial 1 & 2) 
#' and different zero-inflation formulas if appropriate. Model names should describe both the 
#' group each model is associated with and the error structure. EcoCountHelper uses regular 
#' expressions to determine group membership for each model, so group names must be placed 
#' consistently within model names for reliable identification. The safest way to accomplish 
#' this is to begin each model name with the group name followed by an underscore. If this 
#' naming scheme is followed, EcoCountHelper functions can identify the group a model is 
#' associated with using their default arguments. \cr
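#' 
#' As a minimal sketch of this naming scheme, the example below fits three models for a 
#' single hypothetical group ("Mylu") to simulated data. The data, the predictor and 
#' random-effect names, and the "Pois"/"Nb1"/"Nb2" suffixes are assumptions made for 
#' illustration only. The final line shows one way a group name can be recovered from 
#' such names with a regular expression; it is not necessarily the expression 
#' EcoCountHelper uses internally. 
#' \preformatted{
#' library(glmmTMB)
#' 
#' set.seed(1)
#' # Hypothetical data: counts for one group, one predictor, a site grouping factor
#' BatData <- data.frame(
#'   Site  = factor(rep(1:10, each = 8)),
#'   Noise = rnorm(80)
#' )
#' BatData$Mylu <- rnbinom(80, mu = exp(1 + 0.3 * BatData$Noise), size = 2)
#' 
#' # Same data and same formula; only the error family differs.
#' # Each name starts with the group name followed by an underscore.
#' Mylu_Pois <- glmmTMB(Mylu ~ Noise + (1 | Site), data = BatData, family = poisson)
#' Mylu_Nb1  <- glmmTMB(Mylu ~ Noise + (1 | Site), data = BatData, family = nbinom1)
#' Mylu_Nb2  <- glmmTMB(Mylu ~ Noise + (1 | Site), data = BatData, family = nbinom2)
#' 
#' # Illustration only: strip everything from the first underscore onward
#' sub("_.*$", "", c("Mylu_Pois", "Mylu_Nb1", "Mylu_Nb2"))
#' }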
#' 
#' Assuming that all observations 
#' for all groups of interest are stored in the same table, group-level data could be partitioned 
#' using either a 
#' \href{https://www.theanalysisfactor.com/wide-and-long-data/}{long-form or wide-form} 
#' data structure. In either case, users often prefer to loop through data using for-loops 
#' or custom functions to maintain readable, compact code. While models generated by loops 
#' or the apply family of functions may be identical whether the data is in long- or wide-form 
#' with respect to groups of interest, the underlying data calls within the model object differ. 
#' To account for this, two function groups in the EcoCountHelper pipeline have both a long- 
#' and wide-form function associated with them (denoted by a "Long" or "Wide" function 
#' suffix). For those two function groups, please be sure to use the function appropriate 
#' for your data structure and model construction process.
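#' 
#' The difference between the two data structures can be sketched with a small 
#' hypothetical table (the column names "Site", "Species", "Count", "Mylu", and "Epfu" 
#' are assumptions for illustration). The commented model calls at the end show how the 
#' data argument would typically differ between the two forms. 
#' \preformatted{
#' # Wide form: one count column per group
#' BatWide <- data.frame(
#'   Site = c("A", "B"),
#'   Mylu = c(3, 0),
#'   Epfu = c(1, 5)
#' )
#' 
#' # Long form: one row per site-by-group observation
#' BatLong <- data.frame(
#'   Site    = rep(BatWide$Site, times = 2),
#'   Species = rep(c("Mylu", "Epfu"), each = nrow(BatWide)),
#'   Count   = c(BatWide$Mylu, BatWide$Epfu)
#' )
#' 
#' # Wide-form model call: the response is the group's own column
#' #   glmmTMB(Mylu ~ 1 + (1 | Site), data = BatWide, family = nbinom2)
#' # Long-form model call: the response is shared and the data are subset by group
#' #   glmmTMB(Count ~ 1 + (1 | Site),
#' #           data = BatLong[BatLong$Species == "Mylu", ], family = nbinom2)
#' }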
#' 
#' @section Best-Fit Model Determination:
#' The first step in the EcoCountHelper pipeline is to determine which model best approximates 
#' the error structure for each group. This can be done using the \code{\link{ModelCompare}} 
#' and the \code{DistFit} functions. \code{\link{ModelCompare}} obtains \link[stats]{AIC} values for each model 
#' for a group and returns a table of AIC values and model names for each group along with 
#' a table containing the names of the top model for each group as determined by AIC values. 
#' The \code{DistFit} functions (\code{\link{DistFitLong}} & \code{\link{DistFitWide}}) 
#' generate mean-variance plots with lines for error distribution families commonly used for 
#' count data that allow users to visually assess the best error structure 
#' family for each group as is done \href{https://groups.nceas.ucsb.edu/non-linear-modeling/projects/owls/WRITEUP/owls.pdf/@@@download}{here}. 
#' The outputs of these first two steps can be cross-checked against each other to determine 
#' which model is the best model for each group. 
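#' 
#' As background for reading mean-variance plots, the sketch below computes the variance 
#' implied by a given mean under three common count families, using the parameterization 
#' documented for glmmTMB's \code{nbinom1} and \code{nbinom2} families. The helper name 
#' \code{MeanVar} and the dispersion value are hypothetical, and the \code{DistFit} 
#' functions may draw their reference lines differently. 
#' \preformatted{
#' # Variance implied by mean mu under each family (phi = dispersion parameter)
#' MeanVar <- function(mu, family, phi = 1) {
#'   switch(family,
#'     poisson = mu,                   # variance equals the mean
#'     nbinom1 = mu * (1 + phi),       # variance linear in the mean
#'     nbinom2 = mu * (1 + mu / phi)   # variance quadratic in the mean
#'   )
#' }
#' 
#' mu <- c(1, 5, 10)
#' MeanVar(mu, "poisson")           # 1 5 10
#' MeanVar(mu, "nbinom1", phi = 2)  # 3 15 30
#' MeanVar(mu, "nbinom2", phi = 2)  # 1.5 17.5 60.0
#' }
#' Observed group-level variances that grow roughly linearly with the mean point toward a 
#' Poisson or negative binomial 1 structure, while curvature points toward negative binomial 2. 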
#' 
#' @section Goodness-of-Fit Diagnostics:
#' To ensure adequate model fit has been achieved and no major assumption violations have 
#' occurred, the \code{ResidPlot} functions (\code{\link{ResidPlotLong}} & \code{\link{ResidPlotWide}}) 
#' allow users to assess residual uniformity, dispersion, and outliers via diagnostic plots 
#' and statistics. These functions are essentially wrappers for 
#' \code{\link[DHARMa]{createDHARMa}} and \code{\link[DHARMa]{testResiduals}}. For more 
#' information regarding diagnostic plot interpretation, please see 
#' \href{https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html}{this document}. 
#' Note that large sample sizes often lead to significant values for diagnostic statistics, 
#' and examination of diagnostic plots is often more informative than simply checking p-values 
#' for diagnostic tests returned by these functions. Additionally, we advise that users also check 
#' for predictor collinearity using \code{\link[performance]{check_collinearity}} to ensure that 
#' no predictors of interest are excessively collinear. 
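#' 
#' For readers who want to see the underlying checks outside of the \code{ResidPlot} 
#' wrappers, the following sketch runs DHARMa and performance directly on a single model. 
#' It uses \code{simulateResiduals}, DHARMa's usual entry point, rather than 
#' \code{createDHARMa}, and the simulated data and model name are assumptions for 
#' illustration. 
#' \preformatted{
#' library(glmmTMB)
#' library(DHARMa)
#' library(performance)
#' 
#' set.seed(1)
#' # Hypothetical data with two predictors so collinearity can be assessed
#' BatData <- data.frame(
#'   Site  = factor(rep(1:10, each = 8)),
#'   Noise = rnorm(80),
#'   Light = rnorm(80)
#' )
#' BatData$Mylu <- rnbinom(80, mu = exp(1 + 0.3 * BatData$Noise), size = 2)
#' 
#' Mylu_Nb2 <- glmmTMB(Mylu ~ Noise + Light + (1 | Site),
#'                     data = BatData, family = nbinom2)
#' 
#' Sim <- simulateResiduals(Mylu_Nb2, n = 500)  # simulation-based scaled residuals
#' plot(Sim)                                    # QQ and residual-vs-predicted plots
#' testResiduals(Sim)                           # uniformity, dispersion, outlier tests
#' check_collinearity(Mylu_Nb2)                 # variance inflation factors
#' }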
#' 
#' @section Examining Relative Effect Sizes:
#' If model fit appears adequate based on diagnostic plots, users can visualize the relative 
#' effect sizes of predictors with \code{\link{EffectsPlotter}}. This function plots conditional 
#' model estimates for each predictor in a model. If models are appropriately scaled, these 
#' estimates can be directly compared across predictors to assess the relative importance of 
#' predictors within their respective ranges of observed values.
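#' 
#' The sketch below illustrates why scaling makes estimates comparable. It uses a plain 
#' Poisson \code{glm} on simulated data instead of a GLMM so that it runs with base R 
#' alone, and it z-scores predictors with \code{scale}; all variable names and values are 
#' assumptions for illustration. 
#' \preformatted{
#' set.seed(42)
#' n <- 200
#' Dat <- data.frame(
#'   NoiseDb  = rnorm(n, mean = 50, sd = 10),     # decibels
#'   DistRoad = rnorm(n, mean = 2000, sd = 500)   # meters
#' )
#' Dat$Count <- rpois(n, exp(1 + 0.02 * Dat$NoiseDb - 0.0005 * Dat$DistRoad))
#' 
#' # Unscaled fit: slopes are per decibel and per meter, so not directly comparable
#' Raw <- glm(Count ~ NoiseDb + DistRoad, family = poisson, data = Dat)
#' 
#' # Scaled fit: slopes are per standard deviation of each predictor
#' Dat$NoiseZ <- as.numeric(scale(Dat$NoiseDb))
#' Dat$DistZ  <- as.numeric(scale(Dat$DistRoad))
#' Scaled <- glm(Count ~ NoiseZ + DistZ, family = poisson, data = Dat)
#' 
#' coef(Raw)[-1]
#' coef(Scaled)[-1]
#' 
#' # Each scaled slope is the raw slope multiplied by that predictor's SD
#' coef(Raw)[-1] * c(sd(Dat$NoiseDb), sd(Dat$DistRoad))
#' }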
#' 
#' @section Interpreting Scaled Estimates:
#' While comparison of scaled estimates is useful for determining the relative importance of 
#' predictors within the observed range of values, one cannot ascertain the ecological 
#' significance of predictors this way. Users can assess the "real world effect" of a given 
#' unscaled change in a predictor with \code{\link{RealEffectText}}, which returns a sentence 
#' describing a group's predicted response to a specified change in a predictor, or with 
#' \code{\link{RealEffectTabLong}}/\code{\link{RealEffectTabWide}}, which return a tabulated version of
#' \code{\link{RealEffectText}}'s output.
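#' 
#' The arithmetic behind such a back-transformation can be sketched as follows, assuming 
#' a log-link model and a predictor that was z-scored (centered and divided by one standard 
#' deviation). All values are hypothetical, and this is not necessarily the exact 
#' calculation the \code{RealEffect} functions perform; a different scaling (for example, 
#' one applied by \code{\link{scale2}}) would change the divisor. 
#' \preformatted{
#' BetaScaled <- 0.25  # hypothetical estimate for a z-scored predictor (log link)
#' PredSd     <- 10    # hypothetical SD of the unscaled predictor, e.g., decibels
#' Delta      <- 5     # unscaled change of interest, e.g., an increase of 5 dB
#' 
#' # On the log scale the effect of Delta is BetaScaled * Delta / PredSd,
#' # so the expected count is multiplied by the exponential of that quantity.
#' Multiplier <- exp(BetaScaled * Delta / PredSd)
#' PctChange  <- (Multiplier - 1) * 100
#' round(PctChange, 1)  # 13.3
#' }
#' Under these assumptions, a 5 dB increase corresponds to an expected increase in counts 
#' of about 13.3 percent. 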
#' 
#' @section Single Group Pipeline Notes:
#' While the pipeline outlined above can be used for individual response groups rather 
#' than multiple response groups, some naming conventions must be followed when examining 
#' an individual response group.
#' \itemize{
#' \item If the original data underlying a group's models only contains data for a single 
#' group, it is likely that there is no vector describing group membership for observations. 
#' In this case, two approaches can be used to ensure successful function use.
#' \enumerate{
#' \item A vector specifying the group name can be added to the data (e.g., a vector named 
#' "Species" populated with the species name) and long-form functions 
#' can be used, or
#' \item the vector containing count information can be named with the group's name (e.g. 
#' the vector containing count data in a table regarding solely \emph{Myotis lucifugus} 
#' could be named "Mylu") and wide-form functions can be used.
#' }
#' \item Additionally, model names must denote the group they are associated with, and the group 
#' name/abbreviation must be the same for model names and data indicating group membership as 
#' outlined above (e.g., if a count-data vector is named "Mylu" or a group membership vector is 
#' populated with "Mylu", all associated model names must contain "Mylu").
#' }
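#' 
#' The two approaches above can be sketched in base R as follows. The table and column 
#' names ("MyluData", "Site", "Count", "Species") are hypothetical. 
#' \preformatted{
#' # Hypothetical single-group table with no group membership vector
#' MyluData <- data.frame(Site = c("A", "B", "C"), Count = c(3, 0, 7))
#' 
#' # Approach 1: add a group membership vector, then use the long-form functions
#' MyluLong <- MyluData
#' MyluLong$Species <- "Mylu"
#' 
#' # Approach 2: rename the count vector to the group name, then use the
#' # wide-form functions
#' MyluWide <- MyluData
#' names(MyluWide)[names(MyluWide) == "Count"] <- "Mylu"
#' 
#' names(MyluLong)  # "Site" "Count" "Species"
#' names(MyluWide)  # "Site" "Mylu"
#' }
#' In either case, model names would still need to contain "Mylu" (e.g., a model named 
#' "Mylu_Nb2"). 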
#' 
#' @section Other Functions:
#' \itemize{
#' \item\code{\link{scale2}} \cr
#' \item\code{\link{DumbGrid}} \cr
#' \item\code{\link{theme_nocturnal}}
#' }
#' 
#' @section Suggested Reading:
#' Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D. N., 
#' Goodwin, C. E. D., Robinson, B. S., Hodgson, D. J., & Inger, R. (2018). A brief 
#' introduction to mixed effects modelling and multi-model inference in ecology. PeerJ, 
#' 6, e4794. \url{https://doi.org/10.7717/peerj.4794}
#' 
#' @docType package
#' @name EcoCountHelper
NULL
huntercole25/EcoCountHelper documentation built on Jan. 14, 2023, 4:13 a.m.