R/ggpredict.R

Defines functions ggpredict_helper ggpredict

Documented in ggpredict

#' @title Marginal effects, adjusted predictions and estimated marginal means from regression models
#' @name ggpredict
#'
#' @description
#'   The \pkg{ggeffects} package computes estimated marginal means (predicted values) for the
#'   response, at the margin of specific values or levels from certain model terms,
#'   i.e. it generates predictions by a model by holding the non-focal variables
#'   constant and varying the focal variable(s). \cr \cr
#'   \code{ggpredict()} uses \code{predict()} for generating predictions,
#'   while \code{ggeffect()} computes marginal effects by internally calling
#'   \code{effects::Effect()} and \code{ggemmeans()} uses \code{emmeans::emmeans()}.
#'   The result is returned as consistent data frame.
#'
#' @param model A fitted model object, or a list of model objects. Any model
#'   that supports common methods like \code{predict()}, \code{family()}
#'   or \code{model.frame()} should work. For \code{ggeffect()}, any model
#'   that is supported by \CRANpkg{effects} should work, and for
#'   \code{ggemmeans()}, all models supported by \CRANpkg{emmeans} should work.
#' @param terms Character vector (or a formula) with the names of those terms
#'   from \code{model}, for which predictions should be displayed. At least
#'   one term is required to calculate effects for certain terms, maximum length is
#'   four terms, where the second to fourth term indicate the groups, i.e.
#'   predictions of first term are grouped at the values or levels of the remaining
#'   terms. If \code{terms} is missing or \code{NULL}, adjusted predictions for each
#'   model term are calculated. It is also possible to define specific values for
#'   terms, at which adjusted predictions should be calculated (see 'Details').
#'   All remaining covariates that are not specified in \code{terms} are held
#'   constant (see 'Details'). See also arguments \code{condition} and \code{typical}.
#' @param ci.lvl Numeric, the level of the confidence intervals. For \code{ggpredict()},
#'   use \code{ci.lvl = NA}, if confidence intervals should not be calculated
#'   (for instance, due to computation time). Typically, confidence intervals
#'   based on the standard errors as returned by the \code{predict()} function
#'   are returned, assuming normal distribution (i.e. \code{+/- 1.96 * SE}).
#'   See introduction of \href{https://strengejacke.github.io/ggeffects/articles/ggeffects.html}{this vignette}
#'   for more details.
#' @param type Character, only applies for survival models, mixed effects models
#'   and/or models with zero-inflation. \strong{Note:} For \code{brmsfit}-models
#'   with zero-inflation component, there is no \code{type = "zero_inflated"} nor
#'   \code{type = "zi_random"}; predicted values for \code{MixMod}-models from
#'   \pkg{GLMMadaptive} with zero-inflation component \emph{always} condition on
#'   the zero-inflation part of the model (see 'Details').
#'   \describe{
#'     \item{\code{"fixed"} (or \code{"fe"} or \code{"count"})}{
#'     Predicted values are conditioned on the fixed effects or conditional
#'     model only (for mixed models: predicted values are on the population-level
#'     and \emph{confidence intervals} are returned). For instance, for models
#'     fitted with \code{zeroinfl} from \pkg{pscl}, this would return the
#'     predicted mean from the count component (without zero-inflation).
#'     For models with zero-inflation component, this type calls
#'     \code{predict(..., type = "link")} (however, predicted values are
#'     back-transformed to the response scale).
#'     }
#'     \item{\code{"random"} (or \code{"re"})}{
#'     This only applies to mixed models, and \code{type = "random"} does not
#'     condition on the zero-inflation component of the model. \code{type = "random"}
#'     still returns population-level predictions, however, unlike \code{type = "fixed"},
#'     intervals also consider the uncertainty in the variance parameters (the
#'     mean random effect variance, see \cite{Johnson et al. 2014} for details)
#'     and hence can be considered as \emph{prediction intervals}. For models
#'     with zero-inflation component, this type calls
#'     \code{predict(..., type = "link")} (however, predicted values are
#'     back-transformed to the response scale).
#'     \cr \cr
#'     To get predicted values for each level of the random effects groups, add the
#'     name of the related random effect term to the \code{terms}-argument
#'     (for more details, see \href{https://strengejacke.github.io/ggeffects/articles/introduction_effectsatvalues.html}{this vignette}).
#'     }
#'     \item{\code{"zero_inflated"} (or \code{"fe.zi"} or \code{"zi"})}{
#'     Predicted values are conditioned on the fixed effects and the zero-inflation
#'     component. For instance, for models fitted with \code{zeroinfl}
#'     from \pkg{pscl}, this would return the predicted response (\code{mu*(1-p)})
#'     and for \pkg{glmmTMB}, this would return the expected value \code{mu*(1-p)}
#'     \emph{without} conditioning on random effects (i.e. random effect variances
#'     are not taken into account for the confidence intervals). For models with
#'     zero-inflation component, this type calls \code{predict(..., type = "response")}.
#'     See 'Details'.
#'     }
#'     \item{\code{"zi_random"} (or \code{"re.zi"} or \code{"zero_inflated_random"})}{
#'     Predicted values are conditioned on the zero-inflation component and
#'     take the random effects uncertainty into account. For models fitted with
#'     \code{glmmTMB()}, \code{hurdle()} or \code{zeroinfl()}, this would return the
#'     expected value \code{mu*(1-p)}. For \pkg{glmmTMB}, prediction intervals
#'     also consider the uncertainty in the random effects variances. This
#'     type calls \code{predict(..., type = "response")}. See 'Details'.
#'     }
#'     \item{\code{"zi_prob"} (or \code{"zi.prob"})}{
#'     Predicted zero-inflation probability. For \pkg{glmmTMB} models with
#'     zero-inflation component, this type calls \code{predict(..., type = "zlink")};
#'     models from \pkg{pscl} call \code{predict(..., type = "zero")} and for
#'     \pkg{GLMMadaptive}, \code{predict(..., type = "zero_part")} is called.
#'     }
#'     \item{\code{"simulate"} (or \code{"sim"})}{
#'     Predicted values and confidence resp. prediction intervals are
#'     based on simulations, i.e. calls to \code{simulate()}. This type
#'     of prediction takes all model uncertainty into account, including
#'     random effects variances. Currently supported models are objects of
#'     class \code{lm}, \code{glm}, \code{glmmTMB}, \code{wbm}, \code{MixMod}
#'     and \code{merMod}. See \code{...} for details on number of simulations.
#'     }
#'     \item{\code{"survival"} and \code{"cumulative_hazard"} (or \code{"surv"} and \code{"cumhaz"})}{
#'     Applies only to \code{coxph}-objects from the \pkg{survial}-package and
#'     calculates the survival probability or the cumulative hazard of an event.
#'     }
#'   }
#' @param typical Character vector, naming the function to be applied to the
#'   covariates over which the effect is "averaged". The default is "mean".
#'   See \code{?sjmisc::typical_value} for options.
#' @param back.transform Logical, if \code{TRUE} (the default), predicted values
#'   for log- or log-log transformed responses will be back-transformed to
#'   original response-scale.
#' @param ppd Logical, if \code{TRUE}, predictions for Stan-models are
#'   based on the posterior predictive distribution
#'   (\code{rstantools::posterior_predict()}). If \code{FALSE} (the
#'   default), predictions are based on posterior draws of the linear
#'   predictor (\code{rstantools::posterior_linpred()}).
#' @param condition Named character vector, which indicates covariates that
#'   should be held constant at specific values. Unlike \code{typical}, which
#'   applies a function to the covariates to determine the value that is used
#'   to hold these covariates constant, \code{condition} can be used to define
#'   exact values, for instance \code{condition = c(covariate1 = 20, covariate2 = 5)}.
#'   See 'Examples'.
#' @param interval Type of interval calculation, can either be \code{"confidence"}
#'   (default) or \code{"prediction"}. May be abbreviated. Unlike
#'   \emph{confidence intervals}, \emph{prediction intervals} include the
#'   residual variance (sigma^2). This argument is ignored for mixed models,
#'   as \code{interval = "prediction"} is equivalent to \code{type = "random"}
#'   (and \code{interval = "confidence"} is equivalent to \code{type = "fixed"}).
#'   Note that prediction intervals are not available for all models, but only
#'   for models that work with \code{insight::get_sigma()}.
#' @param vcov.fun String, indicating the name of the \code{vcov*()}-function
#'    from the \pkg{sandwich} or \pkg{clubSandwich}-package, e.g.
#'    \code{vcov.fun = "vcovCL"}, which is used to compute (cluster) robust
#'    standard errors for predictions. If \code{NULL}, standard errors (and
#'    confidence intervals) for predictions are based on the standard errors as
#'    returned by the \code{predict()}-function. \strong{Note} that probably not
#'    all model objects that work with \code{ggpredict()} are also supported
#'    by the \pkg{sandwich} or \pkg{clubSandwich}-package.
#' @param vcov.type Character vector, specifying the estimation type for the
#'    robust covariance matrix estimation (see \code{?sandwich::vcovHC}
#'    or \code{?clubSandwich::vcovCR} for details).
#' @param vcov.args List of named vectors, used as additional arguments that
#'    are passed down to \code{vcov.fun}.
#' @param ... For \code{ggpredict()}, further arguments passed down to
#'    \code{predict()}; for \code{ggeffect()}, further arguments passed
#'    down to \code{effects::Effect()}; and for \code{ggemmeans()},
#'    further arguments passed down to \code{emmeans::emmeans()}.
#'    If \code{type = "sim"}, \code{...} may also be used to set the number of
#'    simulation, e.g. \code{nsim = 500}.
#'
#' @details
#'   \subsection{Supported Models}{
#'   A list of supported models can be found at \url{https://github.com/strengejacke/ggeffects}.
#'   Support for models varies by function, i.e. although \code{ggpredict()},
#'   \code{ggemmeans()} and \code{ggeffect()} support most models, some models
#'   are only supported exclusively by one of the three functions.
#'   }
#'   \subsection{Difference between \code{ggpredict()} and \code{ggeffect()} or \code{ggemmeans()}}{
#'   \code{ggpredict()} calls \code{predict()}, while \code{ggeffect()}
#'   calls \code{effects::Effect()} and \code{ggemmeans()} calls
#'   \code{emmeans::emmeans()} to compute predicted values. Thus, effects returned
#'   by \code{ggpredict()} can be described as \emph{conditional effects} (i.e.
#'   these are conditioned on certain (reference) levels of factors), while
#'   \code{ggemmeans()} and \code{ggeffect()} return \emph{marginal means}, since
#'   the effects are "marginalized" (or "averaged") over the levels of factors.
#'   Therefore, \code{ggpredict()} and \code{ggeffect()} resp. \code{ggemmeans()}
#'   differ in how factors are held constant: \code{ggpredict()} uses the
#'   reference level, while \code{ggeffect()} and \code{ggemmeans()} compute a
#'   kind of "average" value, which represents the proportions of each factor's
#'   category. Use \code{condition} to set a specific level for factors in
#'   \code{ggemmeans()}, so factors are not averaged over their categories,
#'   but held constant at a given level.
#'   }
#'   \subsection{Marginal Effects and Adjusted Predictions at Specific Values}{
#'   Specific values of model terms can be specified via the \code{terms}-argument.
#'   Indicating levels in square brackets allows for selecting only
#'   specific groups or values resp. value ranges. Term name and the start of
#'   the levels in brackets must be separated by a whitespace character, e.g.
#'   \code{terms = c("age", "education [1,3]")}. Numeric ranges, separated
#'   with colon, are also allowed: \code{terms = c("education", "age [30:60]")}.
#'   The stepsize for range can be adjusted using `by`, e.g.
#'   \code{terms = "age [30:60 by=5]"}.
#'   \cr \cr
#'   The \code{terms}-argument also supports the same shortcuts as the
#'   \code{values}-argument in \code{values_at()}. So
#'   \code{terms = "age [meansd]"} would return predictions for the values
#'   one standard deviation below the mean age, the mean age and
#'   one SD above the mean age. \code{terms = "age [quart2]"} would calculate
#'   predictions at the value of the lower, median and upper quartile of age.
#'   \cr \cr
#'   Furthermore, it is possible to specify a function name. Values for
#'   predictions will then be transformed, e.g. \code{terms = "income [exp]"}.
#'   This is useful when model predictors were transformed for fitting the
#'   model and should be back-transformed to the original scale for predictions.
#'   It is also possible to define own functions (see
#'   \href{https://strengejacke.github.io/ggeffects/articles/introduction_effectsatvalues.html}{this vignette}).
#'   \cr \cr
#'   Instead of a function, it is also possible to define the name of a variable
#'   with specific values, e.g. to define a vector \code{v = c(1000, 2000, 3000)} and
#'   then use \code{terms = "income [v]"}.
#'   \cr \cr
#'   You can take a random sample of any size with \code{sample=n}, e.g
#'   \code{terms = "income [sample=8]"}, which will sample eight values from
#'   all possible values of the variable \code{income}. This option is especially
#'   useful for plotting predictions at certain levels of random effects
#'   group levels, where the group factor has many levels that can be completely
#'   plotted. For more details, see \href{https://strengejacke.github.io/ggeffects/articles/introduction_effectsatvalues.html}{this vignette}.
#'   \cr \cr
#'   Finally, numeric vectors for which no specific values are given, a
#'   "pretty range" is calculated (see \code{\link{pretty_range}}), to avoid
#'   memory allocation problems for vectors with many unique values. If a numeric
#'   vector is specified as second or third term (i.e. if this vector represents
#'   a grouping structure), representative values (see \code{\link{values_at}})
#'   are chosen (unless other values are specified). If all values for a numeric
#'   vector should be used to compute predictions, you may use e.g.
#'   \code{terms = "age [all]"}. See also package vignettes.
#'   \cr \cr
#'   To create a pretty range that should be smaller or larger than the default
#'   range (i.e. if no specific values would be given), use the \code{n}-tag,
#'   e.g. \code{terms="age [n=5]"} or \code{terms="age [n=12]"}. Larger
#'   values for \code{n} return a larger range of predicted values.
#'   }
#'   \subsection{Holding covariates at constant values}{
#'   For \code{ggpredict()}, \code{expand.grid()} is called on all unique
#'   combinations of \code{model.frame(model)[, terms]} and used as
#'   \code{newdata}-argument for \code{predict()}. In this case,
#'   all remaining covariates that are not specified in \code{terms} are
#'   held constant: Numeric values are set to the mean (unless changed with
#'   the \code{condition} or \code{typical}-argument), factors are set to their
#'   reference level (may also be changed with \code{condition}) and character
#'   vectors to their mode (most common element).
#'   \cr \cr
#'   \code{ggeffect()} and \code{ggemmeans()}, by default, set remaining numeric
#'   covariates to their mean value, while for factors, a kind of "average" value,
#'   which represents the proportions of each factor's category, is used. For
#'   \code{ggemmeans()}, use \code{condition} to set a specific level for
#'   factors so that these are not averaged over their categories, but held
#'   constant at the given level.
#'   }
#'   \subsection{Bayesian Regression Models}{
#'   \code{ggpredict()} also works with \strong{Stan}-models from
#'   the \CRANpkg{rstanarm} or \CRANpkg{brms}-package. The predicted
#'   values are the median value of all drawn posterior samples. The
#'   confidence intervals for Stan-models are Bayesian predictive intervals.
#'   By default (i.e. \code{ppd = FALSE}), the predictions are based on
#'   \code{rstantools::posterior_linpred()} and hence have some
#'   limitations: the uncertainty of the error term is not taken into
#'   account. The recommendation is to use the posterior predictive
#'   distribution (\code{rstantools::posterior_predict()}).
#'   }
#'   \subsection{Zero-Inflated and Zero-Inflated Mixed Models with brms}{
#'   Models of class \code{brmsfit} always condition on the zero-inflation
#'   component, if the model has such a component. Hence, there is no
#'   \code{type = "zero_inflated"} nor \code{type = "zi_random"} for \code{brmsfit}-models,
#'   because predictions are based on draws of the posterior distribution,
#'   which already account for the zero-inflation part of the model.
#'   }
#'   \subsection{Zero-Inflated and Zero-Inflated Mixed Models with glmmTMB}{
#'   If \code{model} is of class \code{glmmTMB}, \code{hurdle}, \code{zeroinfl}
#'   or \code{zerotrunc}, simulations from a multivariate normal distribution
#'   (see \code{?MASS::mvrnorm}) are drawn to calculate \code{mu*(1-p)}.
#'   Confidence intervals are then based on quantiles of these results. For
#'   \code{type = "zi_random"}, prediction intervals also take the uncertainty in
#'   the random-effect paramters into account (see also Brooks et al. 2017,
#'   pp.391-392 for details).
#'   \cr \cr
#'   An alternative for models fitted with \pkg{glmmTMB} that take all model
#'   uncertainties into account are simulations based on \code{simulate()}, which
#'   is used when \code{type = "sim"} (see Brooks et al. 2017, pp.392-393 for
#'   details).
#'   }
#'   \subsection{MixMod-models from GLMMadaptive}{
#'   Predicted values for the fixed effects component (\code{type = "fixed"} or
#'   \code{type = "zero_inflated"}) are based on \code{predict(..., type = "mean_subject")},
#'   while predicted values for random effects components (\code{type = "random"} or
#'   \code{type = "zi_random"}) are calculated with \code{predict(..., type = "subject_specific")}
#'   (see \code{?GLMMadaptive::predict.MixMod} for details). The latter option
#'   requires the response variable to be defined in the \code{newdata}-argument
#'   of \code{predict()}, which will be set to its typical value (see
#'   \code{?sjmisc::typical_value}).
#'   }
#'
#' @references \itemize{
#'    \item Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378-400.
#'    \item Johnson PC, O'Hara RB. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. (\doi{10.1111/2041-210X.12225})
#'  }
#'
#' @note
#'   \subsection{Multinomial Models}{
#'   \code{polr}-, \code{clm}-models, or more generally speaking, models with
#'   ordinal or multinominal outcomes, have an additional column
#'   \code{response.level}, which indicates with which level of the response
#'   variable the predicted values are associated.
#'   }
#'   \subsection{Printing Results}{
#'   The \code{print()}-method gives a clean output (especially for predictions
#'   by groups), and indicates at which values covariates were held constant.
#'   Furthermore, the \code{print()}-method has the arguments \code{digits} and
#'   \code{n} to control number of decimals and lines to be printed, and an
#'   argument \code{x.lab} to print factor-levels instead of numeric values
#'   if \code{x} is a factor.
#'   }
#'   \subsection{Limitations}{
#'   The support for some models, for example from package \pkg{MCMCglmm}, is
#'   rather experimental and may fail for certain models. If you encounter
#'   any errors, please file an issue at \url{https://github.com/strengejacke/ggeffects/issues}.
#'   }
#'
#' @return A data frame (with \code{ggeffects} class attribute) with consistent
#'   data columns:
#'         \describe{
#'           \item{\code{x}}{the values of the first term in \code{terms}, used as x-position in plots.}
#'           \item{\code{predicted}}{the predicted values of the response, used as y-position in plots.}
#'           \item{\code{std.error}}{the standard error of the predictions. \emph{Note that the standard errors are always on the link-scale, and not back-transformed for non-Gaussian models!}}
#'           \item{\code{conf.low}}{the lower bound of the confidence interval for the predicted values.}
#'           \item{\code{conf.high}}{the upper bound of the confidence interval for the predicted values.}
#'           \item{\code{group}}{the grouping level from the second term in \code{terms}, used as grouping-aesthetics in plots.}
#'           \item{\code{facet}}{the grouping level from the third term in \code{terms}, used to indicate facets in plots.}
#'         }
#'         The estimated marginal means (predicted values) are always on the
#'         response scale! \cr \cr
#'         For proportional odds logistic regression (see \code{?MASS::polr})
#'         resp. cumulative link models (e.g., see \code{?ordinal::clm}),
#'         an additional column \code{response.level} is returned, which indicates
#'         the grouping of predictions based on the level of the model's response.
#'         \cr \cr Note that for convenience reasons, the columns for the intervals
#'         are always named \code{conf.low} and \code{conf.high}, even though
#'         for Bayesian models credible or highest posterior density intervals
#'         are returned.
#'
#' @examples
#' library(sjlabelled)
#' data(efc)
#' fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
#'
#' ggpredict(fit, terms = "c12hour")
#' ggpredict(fit, terms = c("c12hour", "c172code"))
#' ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))
#'
#' # specified as formula
#' ggpredict(fit, terms = ~ c12hour + c172code + c161sex)
#'
#' # only range of 40 to 60 for variable 'c12hour'
#' ggpredict(fit, terms = "c12hour [40:60]")
#'
#' # using "summary()" shows that covariate "neg_c_7" is held
#' # constant at a value of 11.84 (its mean value). To use a
#' # different value, use "condition"
#' ggpredict(fit, terms = "c12hour [40:60]", condition = c(neg_c_7 = 20))
#'
#' # to plot ggeffects-objects, you can use the 'plot()'-function.
#' # the following examples show how to build your ggplot by hand.
#'
#' \dontrun{
#' # plot predicted values, remaining covariates held constant
#' library(ggplot2)
#' mydf <- ggpredict(fit, terms = "c12hour")
#' ggplot(mydf, aes(x, predicted)) +
#'   geom_line() +
#'   geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)
#'
#' # three variables, so we can use facets and groups
#' mydf <- ggpredict(fit, terms = c("c12hour", "c161sex", "c172code"))
#' ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
#'   stat_smooth(method = "lm", se = FALSE) +
#'   facet_wrap(~facet, ncol = 2)
#'
#' # select specific levels for grouping terms
#' mydf <- ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))
#' ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
#'   stat_smooth(method = "lm", se = FALSE) +
#'   facet_wrap(~facet) +
#'   labs(
#'     y = get_y_title(mydf),
#'     x = get_x_title(mydf),
#'     colour = get_legend_title(mydf)
#'   )
#'
#' # level indication also works for factors with non-numeric levels
#' # and in combination with numeric levels for other variables
#' data(efc)
#' efc$c172code <- sjlabelled::as_label(efc$c172code)
#' fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
#' ggpredict(fit, terms = c("c12hour",
#'   "c172code [low level of education, high level of education]",
#'   "c161sex [1]"))
#'
#' # use categorical value on x-axis, use axis-labels, add error bars
#' dat <- ggpredict(fit, terms = c("c172code", "c161sex"))
#' ggplot(dat, aes(x, predicted, colour = group)) +
#'   geom_point(position = position_dodge(.1)) +
#'   geom_errorbar(
#'     aes(ymin = conf.low, ymax = conf.high),
#'     position = position_dodge(.1)
#'   ) +
#'   scale_x_discrete(breaks = 1:3, labels = get_x_labels(dat))
#'
#' # 3-way-interaction with 2 continuous variables
#' data(efc)
#' # make categorical
#' efc$c161sex <- as_factor(efc$c161sex)
#' fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc)
#' # select only levels 30, 50 and 70 from continuous variable Barthel-Index
#' dat <- ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))
#' ggplot(dat, aes(x = x, y = predicted, colour = group)) +
#'   stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
#'   facet_wrap(~facet) +
#'   labs(
#'     colour = get_legend_title(dat),
#'     x = get_x_title(dat),
#'     y = get_y_title(dat),
#'     title = get_title(dat)
#'   )
#'
#' # or with ggeffects' plot-method
#' plot(dat, ci = FALSE)}
#'
#' # predictions for polynomial terms
#' data(efc)
#' fit <- glm(
#'   tot_sc_e ~ c12hour + e42dep + e17age + I(e17age^2) + I(e17age^3),
#'   data = efc,
#'   family = poisson()
#' )
#' ggeffect(fit, terms = "e17age")
#' @export
ggpredict <- function(model,
                      terms,
                      ci.lvl = .95,
                      type = "fe",
                      typical = "mean",
                      condition = NULL,
                      back.transform = TRUE,
                      ppd = FALSE,
                      vcov.fun = NULL,
                      vcov.type = NULL,
                      vcov.args = NULL,
                      interval = "confidence",
                      ...) {
  # check arguments
  type <- match.arg(type, choices = c("fe", "fixed", "count", "re", "random",
                                      "fe.zi", "zero_inflated", "re.zi", "zi_random",
                                      "zero_inflated_random", "zi.prob", "zi_prob",
                                      "sim", "simulate", "surv", "survival", "cumhaz",
                                      "cumulative_hazard", "sim_re", "simulate_random",
                                      "debug"))
  interval <- match.arg(interval, choices = c("confidence", "prediction"))
  model.name <- deparse(substitute(model))

  type <- switch(
    type,
    "fixed" = ,
    "count" = "fe",
    "random" = "re",
    "zi" = ,
    "zero_inflated" = "fe.zi",
    "zi_random" = ,
    "zero_inflated_random" = "re.zi",
    "zi_prob" = "zi.prob",
    "survival" = "surv",
    "cumulative_hazard" = "cumhaz",
    "simulate" = "sim",
    "simulate_random" = "sim_re",
    type
  )

  # check if terms are a formula
  if (!missing(terms) && !is.null(terms) && inherits(terms, "formula")) {
    terms <- all.vars(terms)
  }

  # tidymodels?
  if (inherits(model, "model_fit")) {
    model <- model$fit
  }

  # for gamm/gamm4 objects, we have a list with two items, mer and gam
  # extract just the mer-part then
  if (is.gamm(model) || is.gamm4(model)) model <- model$gam

  if (inherits(model, "list") && !inherits(model, c("bamlss", "maxLik"))) {
    res <- lapply(model, function(.x) {
      ggpredict_helper(
        model = .x,
        terms = terms,
        ci.lvl = ci.lvl,
        type = type,
        typical = typical,
        ppd = ppd,
        condition = condition,
        back.transform = back.transform,
        vcov.fun = vcov.fun,
        vcov.type = vcov.type,
        vcov.args = vcov.args,
        interval = interval,
        ...
      )
    })
    class(res) <- c("ggalleffects", class(res))
  } else {
    if (missing(terms) || is.null(terms)) {
      predictors <- insight::find_predictors(model, effects = "fixed", component = "conditional", flatten = TRUE)
      res <- lapply(
        predictors,
        function(.x) {
          tmp <- ggpredict_helper(
            model = model,
            terms = .x,
            ci.lvl = ci.lvl,
            type = type,
            typical = typical,
            ppd = ppd,
            condition = condition,
            back.transform = back.transform,
            vcov.fun = vcov.fun,
            vcov.type = vcov.type,
            vcov.args = vcov.args,
            interval = interval,
            ...
          )

          tmp$group <- .x
          tmp
        }
      )
      names(res) <- predictors
      class(res) <- c("ggalleffects", class(res))
    } else {
      res <- ggpredict_helper(
        model = model,
        terms = terms,
        ci.lvl = ci.lvl,
        type = type,
        typical = typical,
        ppd = ppd,
        condition = condition,
        back.transform = back.transform,
        vcov.fun = vcov.fun,
        vcov.type = vcov.type,
        vcov.args = vcov.args,
        interval = interval,
        ...
      )
    }
  }

  if (!is.null(res)) attr(res, "model.name") <- model.name
  res
}


# workhorse that computes the predictions
# and creates the tidy data frames
ggpredict_helper <- function(model,
                             terms,
                             ci.lvl,
                             type,
                             typical,
                             ppd,
                             condition,
                             back.transform,
                             vcov.fun,
                             vcov.type,
                             vcov.args,
                             interval,
                             ...) {

  # check class of fitted model, to make sure we have just one class-attribute
  # (while "inherits()" may return multiple attributes)
  model_class <- get_predict_function(model)

  # check terms argument, to make sure that terms were not misspelled
  # and are indeed existing in the data
  terms <- .check_vars(terms, model)
  cleaned_terms <- .clean_terms(terms)

  # check model family
  model_info <- .get_model_info(model)

  if (model_class == "coxph" && type == "surv") model_info$is_binomial <- TRUE

  # get model frame
  model_frame <- insight::get_data(model)

  # expand model frame to data grid of unique combinations
  data_grid <- .data_grid(
    model = model, model_frame = model_frame, terms = terms, value_adjustment = typical,
    condition = condition
  )

  # save original frame, for labels, and original terms
  original_model_frame <- model_frame
  original_terms <- terms

  # clear argument from brackets
  terms <- cleaned_terms

  # compute predictions here -----
  prediction_data <- select_prediction_method(
    model_class = model_class,
    model = model,
    data_grid = data_grid,
    ci.lvl = ci.lvl,
    type = type,
    model_info = model_info,
    ppd = ppd,
    terms = original_terms,
    value_adjustment = typical,
    vcov.fun = vcov.fun,
    vcov.type = vcov.type,
    vcov.args = vcov.args,
    condition = condition,
    interval = interval,
    ...
  )

  # return if no predicted values have been computed
  if (is.null(prediction_data)) return(NULL)

  # remember if grouping variable was numeric, possibly needed for plotting
  attr(prediction_data, "continuous.group") <- attr(data_grid, "continuous.group")

  # for survival probabilities or cumulative hazards, we need
  # the "time" variable
  if (model_class == "coxph" && type %in% c("surv", "cumhaz")) {
    terms <- c("time", terms)
    cleaned_terms <- c("time", cleaned_terms)
  }

  result <- .post_processing_predictions(
    model = model,
    prediction_data = prediction_data,
    original_model_frame = original_model_frame,
    cleaned_terms = cleaned_terms
  )

  # check if outcome is log-transformed, and if so,
  # back-transform predicted values to response scale
  result <- .back_transform_response(model, result, back.transform)

  # add raw data as well
  attr(result, "rawdata") <- .get_raw_data(model, original_model_frame, terms)

  .post_processing_labels(
    model = model,
    result = result,
    original_model_frame = original_model_frame,
    data_grid = data_grid,
    cleaned_terms = cleaned_terms,
    original_terms = original_terms,
    model_info = model_info,
    type = type,
    prediction.interval = attr(prediction_data, "prediction.interval", exact = TRUE),
    at_list = .data_grid(
      model = model, model_frame = original_model_frame, terms = original_terms, value_adjustment = typical,
      condition = condition, show_pretty_message = FALSE, emmeans.only = TRUE
    ),
    condition = condition,
    ci.lvl = ci.lvl
  )
}
javifar/ggeffects documentation built on Jan. 21, 2022, 12:04 a.m.