knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 6,
  fig.asp = .4,
  warning = FALSE,
  message = FALSE,
  comment = "#>"
)
library(marginaleffects)
library(patchwork)
library(ggplot2)
theme_set(theme_minimal())
In the context of this package, an "Adjusted Prediction" is defined as:
The response predicted by a model for some combination of the regressors' values, such as their means or factor levels (a.k.a. "reference grid").
An adjusted prediction is thus the regression-adjusted response variable (or link, or other fitted value), for a given combination (or grid) of predictors. This grid may or may not correspond to the actual observations in a dataset.
predictions calculates the regression-adjusted predicted values for a single hypothetical unit of observation with all regressors set at their means or modes:
library(marginaleffects)
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)
predictions(mod)
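To make "means or modes" concrete, here is a minimal base R sketch of the same idea. It is an illustration under stated assumptions, not the package's internal implementation: the helper stat_mode is hypothetical, ties for the mode are broken by taking the first most frequent value, and cyl is treated as categorical because the model wraps it in factor().

```r
# Hypothetical helper (not part of marginaleffects): most frequent value of x
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# One hypothetical unit: numeric regressor at its mean, categorical at its mode
unit <- data.frame(hp = mean(mtcars$hp), cyl = stat_mode(mtcars$cyl))

# Regression-adjusted prediction for that single unit
pred_unit <- predict(mod, newdata = unit)
```

The result is a single fitted value, which is why a one-row reference grid is often too limiting in practice.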
In many cases, this is too limiting, and researchers will want to specify a grid of "typical" values over which to compute adjusted predictions.
There are two main ways to select the reference grid over which we want to compute adjusted predictions. The first is using the variables argument. The second is with the newdata argument and the typical() function that we already introduced in the marginal effects vignette.
variables: Levels and Tukey's 5 numbers
The variables argument is a handy shortcut to create grids of predictors. Every level of each factor, logical, or character variable listed in the variables argument will be displayed. For numeric variables, predictions will compute adjusted predictions at Tukey's 5 summary numbers (minimum, lower hinge, median, upper hinge, maximum). All other variables will be set at their means or modes.
predictions(mod, variables = c("cyl", "hp"))
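The grid that this shortcut describes can be approximated by hand in base R. The sketch below assumes that "Tukey's 5 summary numbers" corresponds to what fivenum() returns, and it builds the grid with expand.grid(); it illustrates the idea and is not the package's own code.

```r
# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
hp_five <- fivenum(mtcars$hp)

# Cross every level of cyl with the five hp values: 3 x 5 = 15 rows
grid <- expand.grid(cyl = sort(unique(mtcars$cyl)), hp = hp_five)

mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# One adjusted prediction per row of the hand-built grid
grid$predicted <- predict(mod, newdata = grid)
```

Because this model has no regressors besides hp and cyl, nothing is left to hold at a mean or mode; in a larger model those remaining columns would have to be added to the grid as constants.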
The data.frame produced by predictions is "tidy", which makes it easy to manipulate with other R packages and functions:
library(kableExtra)
library(tidyverse)
predictions(mod, variables = c("cyl", "hp")) %>%
  select(hp, cyl, predicted) %>%
  pivot_wider(values_from = predicted, names_from = cyl) %>%
  kbl(caption = "A table of Adjusted Predictions") %>%
  kable_styling() %>%
  add_header_above(header = c(" " = 1, "cyl" = 3))
A second strategy to construct grids of predictors for adjusted predictions is to combine the newdata argument and the typical function. Recall that this function creates a "typical" dataset with all variables at their means or modes, except those we explicitly define:
typical(cyl = c(4, 6, 8), model = mod)
We can also use this typical function in a predictions call (omitting the model argument):
predictions(mod, newdata = typical(cyl = c(4, 6, 8)))
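For readers who want to see what such a grid amounts to, the following base R sketch builds a comparable data frame by hand and passes it to predict(). It assumes that the only other regressor, hp, should be held at its sample mean; it is an illustration, not a description of how typical is implemented.

```r
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)

# Hand-built "typical" grid: cyl varies as requested, hp is held at its mean
grid <- data.frame(cyl = c(4, 6, 8), hp = mean(mtcars$hp))

# Adjusted prediction for each of the three rows
grid$predicted <- predict(mod, newdata = grid)
```

Since hp is constant across rows, the differences between the three predictions equal the estimated cyl coefficients.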
First, we download the ggplot2movies dataset from the Rdatasets archive. Then, we create a variable called certified_fresh for movies with a rating of at least 8. Finally, we discard some outliers and fit a logistic regression model:
library(tidyverse)
dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2movies/movies.csv") %>%
  mutate(style = case_when(Action == 1 ~ "Action",
                           Comedy == 1 ~ "Comedy",
                           Drama == 1 ~ "Drama",
                           TRUE ~ "Other"),
         style = factor(style),
         certified_fresh = rating >= 8) %>%
  filter(length < 240)
mod <- glm(certified_fresh ~ length * style, data = dat, family = binomial)
We can plot adjusted predictions, conditional on the length variable, using the plot_cap function:
mod <- glm(certified_fresh ~ length, data = dat, family = binomial)
plot_cap(mod, condition = "length")
We can also introduce a second condition, which displays a categorical variable like style in different colors. This can be useful in models with interactions:
mod <- glm(certified_fresh ~ length * style, data = dat, family = binomial)
plot_cap(mod, condition = c("length", "style"))
Of course, you can also design your own plots or tables by working with the predictions output directly:
predictions(mod,
            type = c("response", "link"),
            newdata = typical(length = 90:120, style = c("Action", "Comedy"))) %>%
  ggplot(aes(length, predicted, color = style)) +
  geom_line() +
  facet_wrap(~type, scales = "free_y")
The predictions function computes model-adjusted means on the scale of the output of the predict(model) function. By default, predictions works on the "response" scale, so the adjusted predictions should be interpreted on that scale. However, users can pass a string or a vector of strings to the type argument, and predictions will compute adjusted predictions on each requested scale.
Typical values include "link", but users should consult the documentation of the predict method in the package they used to fit the model to learn which values are allowed.
mod <- glm(am ~ mpg, family = binomial, data = mtcars)
predictions(mod, type = c("response", "link"))
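The relationship between the two scales can be checked with base R alone. The sketch below uses predict() on the fitted glm and assumes the default logit link of the binomial family, so that the inverse link is the logistic function plogis(); the object names eta, p, and p_from_eta are chosen here for illustration.

```r
mod <- glm(am ~ mpg, family = binomial, data = mtcars)

# Fitted values on the link (log-odds) scale and the response (probability) scale
eta <- predict(mod, type = "link")
p <- predict(mod, type = "response")

# For a logit link, applying the logistic function to the link-scale
# predictions recovers the response-scale predictions
p_from_eta <- plogis(eta)
same_scale <- isTRUE(all.equal(p, p_from_eta))
```

Link-scale values are unbounded, whereas response-scale values are probabilities, which is why the two deserve separate plots or panels with free axes.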
Users who need more control over the type of adjusted predictions to compute, including a host of options for back-transformation, may want to consider the
We can also plot predictions on different outcome scales:
plot_cap(mod, condition = "mpg", type = "response")
plot_cap(mod, condition = "mpg", type = "link")