ggeffect | R Documentation |
The ggeffects package computes estimated marginal means (predicted values) for the response, at the margin of specific values or levels from certain model terms, i.e. it generates predictions from a model by holding the non-focal variables constant and varying the focal variable(s).
ggpredict() uses predict() for generating predictions, while ggeffect() computes marginal effects by internally calling effects::Effect() and ggemmeans() uses emmeans::emmeans(). The result is returned as a consistent data frame.
ggeffect(model, terms, ci.lvl = 0.95, verbose = TRUE, ...)

ggemmeans(
  model,
  terms,
  ci.lvl = 0.95,
  type = "fe",
  typical = "mean",
  condition = NULL,
  back.transform = TRUE,
  interval = "confidence",
  verbose = TRUE,
  ...
)

ggpredict(
  model,
  terms,
  ci.lvl = 0.95,
  type = "fe",
  typical = "mean",
  condition = NULL,
  back.transform = TRUE,
  ppd = FALSE,
  vcov.fun = NULL,
  vcov.type = NULL,
  vcov.args = NULL,
  interval = "confidence",
  verbose = TRUE,
  ...
)

## S3 method for class 'ggeffects'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  ...,
  stringsAsFactors = FALSE,
  terms_to_colnames = FALSE
)
model |
A fitted model object, or a list of model objects. Any model
that supports common methods like |
terms |
Character vector, (or a named list or a formula) with the names
of those terms from |
ci.lvl |
Numeric, the level of the confidence intervals. For |
verbose |
Toggle messages or warnings. |
... |
For |
type |
Character, only applies for survival models, mixed effects models
and/or models with zero-inflation. Note: For
|
typical |
Character vector, naming the function to be applied to the
covariates over which the effect is "averaged". The default is "mean".
See |
condition |
Named character vector, which indicates covariates that
should be held constant at specific values. Unlike |
back.transform |
Logical, if |
interval |
Type of interval calculation, can either be |
ppd |
Logical, if |
vcov.fun |
String, indicating the name of the |
vcov.type |
Character vector, specifying the estimation type for the
robust covariance matrix estimation (see |
vcov.args |
List of named vectors, used as additional arguments that
are passed down to |
x |
An object of class |
row.names |
|
optional |
logical. If |
stringsAsFactors |
logical: should the character vector be converted to a factor? |
terms_to_colnames |
Logical, if |
Supported Models
A list of supported models can be found at https://github.com/strengejacke/ggeffects.
Support for models varies by function, i.e. although ggpredict(), ggemmeans() and ggeffect() support most models, some models are supported by only one of the three functions.
Difference between ggpredict() and ggeffect() or ggemmeans()
ggpredict() calls predict(), while ggeffect() calls effects::Effect() and ggemmeans() calls emmeans::emmeans() to compute predicted values. Thus, effects returned by ggpredict() can be described as conditional effects (i.e. these are conditioned on certain (reference) levels of factors), while ggemmeans() and ggeffect() return marginal means, since the effects are "marginalized" (or "averaged") over the levels of factors (or values of character vectors). Therefore, ggpredict() differs from ggeffect() and ggemmeans() in how factors and character vectors are held constant: ggpredict() uses the reference level (or "lowest" value in case of character vectors), while ggeffect() and ggemmeans() compute a kind of "average" value, which represents the proportions of each factor's category. Use condition to set a specific level for factors in ggemmeans(), so factors are not averaged over their categories, but held constant at a given level.
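The distinction between conditional and marginalized predictions can be illustrated with base R alone. The following is a minimal sketch, not the internal code of any of the three functions: it assumes the built-in mtcars data, treats cyl as a factor purely for illustration, and weights factor levels by their observed proportions. The effects and emmeans packages may weight levels differently depending on their own settings.

```r
# Base R only; 'mtcars' ships with R. 'cyl' is converted to a factor
# purely for illustration.
dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = dat)

focal <- c(2, 3, 4)  # values of the focal term 'wt'

# Conditional prediction: the factor is held at its reference level
grid_ref <- data.frame(
  wt  = focal,
  cyl = factor(levels(dat$cyl)[1], levels = levels(dat$cyl))
)
pred_conditional <- predict(fit, newdata = grid_ref)

# Marginalized prediction: predict at every factor level, then average
# the predictions, weighted by the observed proportion of each level
props <- prop.table(table(dat$cyl))
pred_by_level <- sapply(levels(dat$cyl), function(lv) {
  nd <- data.frame(wt = focal, cyl = factor(lv, levels = levels(dat$cyl)))
  predict(fit, newdata = nd)
})
pred_marginal <- as.vector(pred_by_level %*% as.numeric(props))

data.frame(wt = focal, conditional = pred_conditional, marginal = pred_marginal)
```

Because this model has no interaction terms, the two columns differ by a constant shift; with interactions, the gap would vary across values of the focal term.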
Marginal Effects and Adjusted Predictions at Specific Values
Specific values of model terms can be specified via the terms-argument. Indicating levels in square brackets allows for selecting only specific groups or values, or value ranges. The term name and the start of the levels in brackets must be separated by a whitespace character, e.g. terms = c("age", "education [1,3]"). Numeric ranges, separated with a colon, are also allowed: terms = c("education", "age [30:60]"). The step size for a range can be adjusted using by, e.g. terms = "age [30:60 by=5]".
The terms-argument also supports the same shortcuts as the values-argument in values_at(). So terms = "age [meansd]" would return predictions for the values one standard deviation below the mean age, the mean age and one SD above the mean age. terms = "age [quart2]" would calculate predictions at the value of the lower, median and upper quartile of age.
Furthermore, it is possible to specify a function name. Values for predictions will then be transformed, e.g. terms = "income [exp]". This is useful when model predictors were transformed for fitting the model and should be back-transformed to the original scale for predictions. It is also possible to define your own functions (see this vignette).
Instead of a function, it is also possible to define the name of a variable with specific values, e.g. to define a vector v = c(1000, 2000, 3000) and then use terms = "income [v]".
You can take a random sample of any size with sample=n, e.g. terms = "income [sample=8]", which will sample eight values from all possible values of the variable income. This option is especially useful for plotting predictions at certain levels of random effects group levels, where the group factor has too many levels to be plotted completely. For more details, see this vignette.
Finally, for numeric vectors for which no specific values are given, a "pretty range" is calculated (see pretty_range()), to avoid memory allocation problems for vectors with many unique values. If a numeric vector is specified as second or third term (i.e. if this vector represents a grouping structure), representative values (see values_at()) are chosen (unless other values are specified). If all values for a numeric vector should be used to compute predictions, you may use e.g. terms = "age [all]". See also package vignettes.
To create a pretty range that should be smaller or larger than the default range (i.e. if no specific values would be given), use the n-tag, e.g. terms="age [n=5]" or terms="age [n=12]". Larger values for n return a larger range of predicted values.
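The bracket notations described in this section can be tried side by side. The sketch below assumes the ggeffects package is installed and uses the built-in mtcars data with an arbitrary model chosen only for illustration; the object names are hypothetical.

```r
# Assumes the ggeffects package is installed; 'mtcars' ships with R.
library(ggeffects)

fit <- lm(mpg ~ hp + wt + cyl, data = mtcars)

# specific values in square brackets
pr_values <- ggpredict(fit, terms = "hp [100,150,200]")

# numeric range with a step size
pr_range <- ggpredict(fit, terms = "hp [100:200 by=50]")

# mean and one standard deviation below/above the mean
pr_meansd <- ggpredict(fit, terms = "hp [meansd]")

# lower quartile, median and upper quartile
pr_quart <- ggpredict(fit, terms = "hp [quart2]")

# pretty range with a custom 'n'
pr_pretty <- ggpredict(fit, terms = "hp [n=5]")

# focal term plus a grouping term at selected values
pr_group <- ggpredict(fit, terms = c("hp [100:200 by=50]", "wt [2,3,4]"))

pr_values
pr_group
```

Printing each object shows which values of the focal term were used and at which values the remaining covariates were held constant.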
Holding covariates at constant values
For ggpredict(), expand.grid() is called on all unique combinations of model.frame(model)[, terms] and used as newdata-argument for predict(). In this case, all remaining covariates that are not specified in terms are held constant: Numeric values are set to the mean (unless changed with the condition or typical-argument), integer values are set to their median, factors are set to their reference level (may also be changed with condition) and character vectors to their mode (most common element).
ggeffect() and ggemmeans(), by default, set remaining numeric covariates to their mean value, while for factors, a kind of "average" value, which represents the proportions of each factor's category, is used. The same applies to character vectors: ggemmeans() averages over the distribution of unique values in a character vector, similar to how factors are treated.
For ggemmeans(), use condition to set a specific level for factors so that these are not averaged over their categories, but held constant at the given level.
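The expand.grid() plus predict() approach described for ggpredict() can be sketched in base R. This is an illustration under stated assumptions, not the package's actual code: the model, the focal values and the t-based interval are choices made for this example, and am is converted to a factor only to show the reference-level rule.

```r
# Base R sketch; 'mtcars' ships with R.
dat <- transform(mtcars, am = factor(am))  # 'am' as factor for illustration
fit <- lm(mpg ~ hp + wt + am, data = dat)
mf  <- model.frame(fit)

# focal term: data grid of the requested values
grid <- expand.grid(hp = c(100, 150, 200))

# non-focal terms: numeric -> mean, factor -> reference level
grid$wt <- mean(mf$wt)
grid$am <- factor(levels(mf$am)[1], levels = levels(mf$am))

# predictions with standard errors, passed as 'newdata' to predict()
pred <- predict(fit, newdata = grid, se.fit = TRUE)
crit <- qt(0.975, df = pred$df)  # 95% interval, one common choice for lm

out <- data.frame(
  x         = grid$hp,
  predicted = pred$fit,
  std.error = pred$se.fit,
  conf.low  = pred$fit - crit * pred$se.fit,
  conf.high = pred$fit + crit * pred$se.fit
)
out

# holding 'wt' at a user-defined value instead of its mean,
# comparable in spirit to the 'condition' argument
grid_cond <- transform(grid, wt = 3)
predict(fit, newdata = grid_cond)
```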
Bayesian Regression Models
ggpredict() also works with Stan-models from the rstanarm or brms-packages. The predicted values are the median value of all drawn posterior samples. The confidence intervals for Stan-models are Bayesian predictive intervals. By default (i.e. ppd = FALSE), the predictions are based on rstantools::posterior_linpred() and hence have some limitations: the uncertainty of the error term is not taken into account. The recommendation is to use the posterior predictive distribution (rstantools::posterior_predict()).
Zero-Inflated and Zero-Inflated Mixed Models with brms
Models of class brmsfit always condition on the zero-inflation component, if the model has such a component. Hence, there is neither type = "zero_inflated" nor type = "zi_random" for brmsfit-models, because predictions are based on draws of the posterior distribution, which already account for the zero-inflation part of the model.
Zero-Inflated and Zero-Inflated Mixed Models with glmmTMB
If model is of class glmmTMB, hurdle, zeroinfl or zerotrunc, simulations from a multivariate normal distribution (see ?MASS::mvrnorm) are drawn to calculate mu*(1-p). Confidence intervals are then based on quantiles of these results. For type = "zi_random", prediction intervals also take the uncertainty in the random-effect parameters into account (see also Brooks et al. 2017, pp. 391-392 for details).
An alternative for models fitted with glmmTMB that takes all model uncertainties into account is simulation based on simulate(), which is used when type = "sim" (see Brooks et al. 2017, pp. 392-393 for details).
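The simulation idea behind mu*(1-p) can be sketched as follows. All coefficients and covariance matrices below are hypothetical values made up for illustration, not estimates from a fitted model, and the two model components are drawn independently, which is a simplifying assumption of this sketch rather than a statement about the package internals.

```r
library(MASS)  # provides mvrnorm(); ships with standard R installations

set.seed(123)

# Hypothetical estimates, NOT taken from a fitted model:
beta_cond <- c(1.2, 0.3)    # count part (log link): intercept, slope
beta_zi   <- c(-1.0, 0.5)   # zero-inflation part (logit link)
vcov_cond <- matrix(c(0.010, 0.002, 0.002, 0.004), nrow = 2)
vcov_zi   <- matrix(c(0.040, 0.005, 0.005, 0.020), nrow = 2)

x <- c(0, 1, 2)   # values of the focal predictor
X <- cbind(1, x)  # design matrix with intercept

# draw coefficient vectors from multivariate normal distributions
n_sim    <- 1000
sim_cond <- mvrnorm(n_sim, mu = beta_cond, Sigma = vcov_cond)
sim_zi   <- mvrnorm(n_sim, mu = beta_zi, Sigma = vcov_zi)

# for each draw: conditional mean 'mu', zero-inflation probability 'p'
mu   <- exp(X %*% t(sim_cond))    # rows: values of x, columns: draws
p    <- plogis(X %*% t(sim_zi))
sims <- mu * (1 - p)

# point prediction from the estimates, interval from simulation quantiles
result <- data.frame(
  x         = x,
  predicted = as.vector(exp(X %*% beta_cond) * (1 - plogis(X %*% beta_zi))),
  conf.low  = apply(sims, 1, quantile, probs = 0.025),
  conf.high = apply(sims, 1, quantile, probs = 0.975)
)
result
```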
MixMod-models from GLMMadaptive
Predicted values for the fixed effects component (type = "fixed" or type = "zero_inflated") are based on predict(..., type = "mean_subject"), while predicted values for random effects components (type = "random" or type = "zi_random") are calculated with predict(..., type = "subject_specific") (see ?GLMMadaptive::predict.MixMod for details). The latter option requires the response variable to be defined in the newdata-argument of predict(), which will be set to its typical value (see ?sjmisc::typical_value).
A data frame (with ggeffects class attribute) with consistent data columns:
"x": the values of the first term in terms, used as x-position in plots.
"predicted": the predicted values of the response, used as y-position in plots.
"std.error": the standard error of the predictions. Note that the standard errors are always on the link-scale, and not back-transformed for non-Gaussian models!
"conf.low": the lower bound of the confidence interval for the predicted values.
"conf.high": the upper bound of the confidence interval for the predicted values.
"group": the grouping level from the second term in terms, used as grouping-aesthetics in plots.
"facet": the grouping level from the third term in terms, used to indicate facets in plots.
The estimated marginal means (or predicted values) are always on the response scale!
For proportional odds logistic regression (see ?MASS::polr) or cumulative link models (e.g., see ?ordinal::clm), an additional column "response.level" is returned, which indicates the grouping of predictions based on the level of the model's response.
Note that for convenience reasons, the columns for the intervals are always named "conf.low" and "conf.high", even though for Bayesian models credible or highest posterior density intervals are returned.
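The note that standard errors stay on the link scale while predictions are on the response scale can be made concrete with a small base R sketch. It assumes a Poisson model on the built-in mtcars data and a normal-approximation interval that is computed on the link scale and then back-transformed; this is one common approach and is not claimed to be the package's exact implementation.

```r
# Base R sketch; 'mtcars' ships with R.
fit <- glm(carb ~ wt, data = mtcars, family = poisson())
nd  <- data.frame(wt = c(2, 3, 4))

# predictions and standard errors on the link (log) scale
pr      <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)
linkinv <- family(fit)$linkinv  # inverse link, here exp()
crit    <- qnorm(0.975)         # 95% interval

out <- data.frame(
  x         = nd$wt,
  predicted = linkinv(pr$fit),   # response scale
  std.error = pr$se.fit,         # stays on the link scale
  conf.low  = linkinv(pr$fit - crit * pr$se.fit),
  conf.high = linkinv(pr$fit + crit * pr$se.fit)
)
out
```

Because the interval bounds are back-transformed, they are not symmetric around the predicted value, so adding and subtracting a multiple of std.error from predicted does not reproduce conf.low and conf.high.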
Multinomial Models
polr-, clm-models, or more generally speaking, models with ordinal or multinomial outcomes, have an additional column response.level, which indicates with which level of the response variable the predicted values are associated.
Printing Results
The print()-method gives a clean output (especially for predictions by groups), and indicates at which values covariates were held constant. Furthermore, the print()-method has the arguments digits and n to control number of decimals and lines to be printed, and an argument x.lab to print factor-levels instead of numeric values if x is a factor.
Limitations
The support for some models, for example from package MCMCglmm, is rather experimental and may fail for certain models. If you encounter any errors, please file an issue at https://github.com/strengejacke/ggeffects/issues.
Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. 2017. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9: 378-400.
Johnson PC, O'Hara RB. 2014. Extension of Nakagawa & Schielzeth's R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946.
library(ggeffects)
library(sjlabelled)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

ggpredict(fit, terms = "c12hour")
ggpredict(fit, terms = c("c12hour", "c172code"))
ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))

# specified as formula
ggpredict(fit, terms = ~ c12hour + c172code + c161sex)

# only range of 40 to 60 for variable 'c12hour'
ggpredict(fit, terms = "c12hour [40:60]")

# terms as named list
ggpredict(fit, terms = list(c12hour = 40:60))

# covariate "neg_c_7" is held constant at a value of 11.84 (its mean value).
# To use a different value, use "condition"
ggpredict(fit, terms = "c12hour [40:60]", condition = c(neg_c_7 = 20))

# to plot ggeffects-objects, you can use the 'plot()'-function.
# the following examples show how to build your ggplot by hand.

## Not run: 
# plot predicted values, remaining covariates held constant
library(ggplot2)
mydf <- ggpredict(fit, terms = "c12hour")
ggplot(mydf, aes(x, predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)

# three variables, so we can use facets and groups
mydf <- ggpredict(fit, terms = c("c12hour", "c161sex", "c172code"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet, ncol = 2)

# select specific levels for grouping terms
mydf <- ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet) +
  labs(
    y = get_y_title(mydf),
    x = get_x_title(mydf),
    colour = get_legend_title(mydf)
  )

# level indication also works for factors with non-numeric levels
# and in combination with numeric levels for other variables
data(efc)
efc$c172code <- sjlabelled::as_label(efc$c172code)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
ggpredict(fit, terms = c("c12hour",
  "c172code [low level of education, high level of education]",
  "c161sex [1]"))

# when "terms" is a named list
ggpredict(fit, terms = list(
  c12hour = seq(0, 170, 30),
  c172code = c("low level of education", "high level of education"),
  c161sex = 1)
)

# use categorical value on x-axis, use axis-labels, add error bars
dat <- ggpredict(fit, terms = c("c172code", "c161sex"))
ggplot(dat, aes(x, predicted, colour = group)) +
  geom_point(position = position_dodge(.1)) +
  geom_errorbar(
    aes(ymin = conf.low, ymax = conf.high),
    position = position_dodge(.1)
  ) +
  scale_x_discrete(breaks = 1:3, labels = get_x_labels(dat))

# 3-way-interaction with 2 continuous variables
data(efc)
# make categorical
efc$c161sex <- as_factor(efc$c161sex)
fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc)
# select only levels 30, 50 and 70 from continuous variable Barthel-Index
dat <- ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))
ggplot(dat, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~facet) +
  labs(
    colour = get_legend_title(dat),
    x = get_x_title(dat),
    y = get_y_title(dat),
    title = get_title(dat)
  )

# or with ggeffects' plot-method
plot(dat, ci = FALSE)

## End(Not run)

# predictions for polynomial terms
data(efc)
fit <- glm(
  tot_sc_e ~ c12hour + e42dep + e17age + I(e17age^2) + I(e17age^3),
  data = efc,
  family = poisson()
)
ggeffect(fit, terms = "e17age")