knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png",
  fig.width = 7,
  fig.height = 3.5,
  message = FALSE, warning = FALSE)
options(width = 800)

if (!requireNamespace("ggplot2", quietly = TRUE) ||
    !requireNamespace("parameters", quietly = TRUE) ||
    !requireNamespace("margins", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}

Marginal effects and predictions

There is no broadly shared language across fields regarding a unique meaning of "marginal effects", so the wording throughout this package may vary. The most generic description of what ggeffects does is: ggeffects allows us to interpret a statistical model by making predictions generated by the model when one holds the non-focal variables constant and varies the focal variable(s).

In the following, some examples are shown to clarify what is actually calculated and returned by the package's functions ggpredict(), ggemmeans(), ggaverage() and ggeffect() (or their wrapper, predict_response()), and how this differs from other functions or software packages that calculate marginal effects.

An example with a simple linear model

First, we fit a simple linear model and look at the coefficient of the only predictor, Sepal.Width.

data(iris)
model1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
coef(model1)["Sepal.Width"]

In this basic example, the coefficient we see here is the slope of the regression line:

library(ggplot2)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_abline(intercept = coef(model1)["(Intercept)"],
              slope = coef(model1)["Sepal.Width"])

For this simple linear model, the slope for the regression line is always the same for each value of our predictor, Sepal.Width. We can check this by generating predictions of our model.

library(ggeffects)
pr <- predict_response(model1, "Sepal.Width")
pr

"Predictions" returned by ggeffects are essentially interpretations of regression coefficients in terms of comparison. We can compare how much our outcome (Sepal.Length) changes on average, when the focal term (in this case: Sepal.Width) varies: For example, what is the average value of Sepal.Length for observations with a value of, say, 2 for Sepal.Width, compared to observations with a 3 for Sepal.Width?

ggeffects returns predictions for representative values of the focal term(s) by default, hence you see many predicted values (including confidence intervals) in the output for the different values of the focal term(s) (if you want to marginalize over other values than representative values, see the documentation for the margin argument).

If we now look at the differences between any two predicted values, we see that these are identical:

# Difference between predicted values for Sepal.Width = 2 and 3
pr <- predict_response(model1, "Sepal.Width [2,3]")
round(diff(pr$predicted), 4)

# Difference between predicted values for Sepal.Width = 4 and 5
pr <- predict_response(model1, "Sepal.Width [4,5]")
round(diff(pr$predicted), 4)

Furthermore, the difference between predicted values that differ by 1 in the focal term (Sepal.Width) equals the regression coefficient. This is because the interpretation of a regression coefficient can be seen as the average difference in the outcome, "comparing two individuals that differ in one predictor [a difference of 1 for Sepal.Width in this case], while being at the same levels of all other predictors." (Gelman, Hill, Vehtari 2020, page 494). We don't have any other predictors in this example, so we don't go into further detail here.
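As a quick check (a minimal sketch using base R's predict(); the values follow directly from the model), the difference between fitted values at Sepal.Width = 2 and 3 reproduces the coefficient:

# predicted values at Sepal.Width = 2 and 3
p <- predict(model1, newdata = data.frame(Sepal.Width = c(2, 3)))
diff(p)                      # difference for a one-unit change ...
coef(model1)["Sepal.Width"]  # ... equals the regression coefficient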

Thus, the association - or effect - between Sepal.Length and Sepal.Width is the same for every value of Sepal.Width. This means that for simple linear models, the regression coefficient is also the marginal effect:

library(margins)
margins(model1)

The marginal effect here is at the same time the average marginal effect, because on average, the effect of Sepal.Width on Sepal.Length is -0.2234: when Sepal.Width changes by 1, the value of Sepal.Length changes by -0.2234 on average.
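As an illustrative sketch (not necessarily how margins computes it internally), the average marginal effect can be approximated by the average numerical slope of the predictions over the observed data:

# average slope of predictions for a tiny change in Sepal.Width
eps <- 1e-5
p0 <- predict(model1, newdata = iris)
p1 <- predict(model1, newdata = transform(iris, Sepal.Width = Sepal.Width + eps))
mean((p1 - p0) / eps)  # for a linear model, this equals the regression coefficient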

An example with a simple logistic regression model

For the next example, we simulate some data for a logistic regression model.

set.seed(123)
y <- rbinom(300, 1, c(0.3, 0.7))
x <- rnorm(300, 2)
y_1 <- y == 1
x[y_1] <- x[y_1] + rnorm(sum(y_1), 3)

d <- data.frame(x, y)
model2 <- glm(y ~ x, family = binomial(), data = d)

coef(model2)["x"]

The regression coefficient for x (on the logit-scale) is 2.641. However, for a logistic regression, this "slope" is not constant across all values of x on the probability scale, because of the non-linear link function. This becomes clearer by looking at the predicted probabilities:

plot(predict_response(model2, "x [all]"), show_ci = FALSE, show_data = TRUE)

As we can see, we have some differences in the case of logistic regression models compared to the linear regression model:

  1. We no longer have the predicted average difference or mean in our outcome, but rather the predicted probability that our outcome is 1 for a given value of x.

  2. Due to the non-linear transformation, the slope differs at different values for x, thus, the "marginal effect" or "association" (in terms of probabilities) is not constant across values of x.

  3. While the regression coefficient in linear models is already on the response scale, and hence the (average) marginal effect equals the regression coefficient, we have different scales in logistic regression models: the coefficients shown in summary() are on the logit-scale (the scale of the linear predictor); exponentiating a coefficient (i.e. exp(coef(model2))) returns an odds ratio; predictions are easy to interpret in terms of probabilities, as mentioned in point 1 (see the sketch after this list).
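To make these three scales concrete, here is a small sketch using base R only, showing the coefficient on the logit scale, the corresponding odds ratio, and a predicted probability for an arbitrary value of x:

# coefficient on the logit scale (scale of the linear predictor)
coef(model2)["x"]
# exponentiated coefficient, i.e. the odds ratio
exp(coef(model2)["x"])
# predicted probability that y = 1 for, say, x = 2
predict(model2, newdata = data.frame(x = 2), type = "response")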

First, let's look at the average marginal effect of x in this model:

margins(model2)

The result indicates "the contribution of each variable on the outcome scale", i.e. the "change in the predicted probability that the outcome equals 1" (see vignettes from the margins package). On average, a unit-change in x changes the predicted probability that the outcome equals 1 by 15.4%.

More generally speaking: The marginal effect represents the difference of (two) predictions for an (infinitesimal) change in x (the focal term). The average marginal effect represents the average slope of that predictor. In other words: the average marginal effect is one value per parameter (term), and can thus be considered an "adjusted regression coefficient", while predicted values describe the average outcome at different values of x - in the latter case, you usually don't have just one number that represents the overall effect of x.
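A minimal numerical sketch of this idea (a rough approximation, not the exact algorithm used by margins): average the slopes of the predicted probabilities over the observed data for a tiny change in x.

# average slope of predicted probabilities for a tiny change in x
eps <- 1e-5
p0 <- predict(model2, newdata = d, type = "response")
p1 <- predict(model2, newdata = transform(d, x = x + eps), type = "response")
mean((p1 - p0) / eps)  # close to the average marginal effect reported by margins()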

I personally find it less intuitive to interpret the overall average marginal effect (which is not grouped by values or levels of the focal term(s)), in particular for non-Gaussian models, because it is hard to understand an average effect when the effect actually varies across the range of the focal term. Instead, I prefer to look at predictions (or average marginal effects) at different values of the focal term(s), which is what ggeffects returns by default:

predict_response(model2, "x")

For x = -2, the predicted probability that y = 1, as estimated by our model, is practically zero. For x = 10, the probability is practically 100%. In essence, what predict_response() returns are not average marginal effects, but rather the predicted values at different values of x (possibly adjusted for covariates, also called non-focal terms, which are held at their mean or a fixed level, i.e. marginalized over). predict_response(margin = "empirical"), however, would return average marginal effects at different values or levels of the focal term(s). Thus, the main difference lies in the method used to marginalize over the non-focal terms.
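For comparison, such a call would look like this (a sketch, using the margin argument documented for predict_response(); output omitted here):

# predictions averaged ("counterfactually") over the observed sample
predict_response(model2, "x", margin = "empirical")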

Marginal effects, average marginal effects or predictions?

The language used throughout this package considers "marginal effects" as adjusted predictions, i.e. predicted values. Depending on the response scale, these are either predicted (mean) values, predicted probabilities, predicted (mean) counts (for count models), etc. Apart from the margin = "empirical" option mentioned above, ggeffects does not calculate average marginal effects. See the last section below for a summary of the different meanings and definitions.

Estimated marginal means

Sometimes, the term estimated marginal means is used as well, because this term is commonly used in software packages like SPSS; there is also a prominent R package of that name, emmeans.

But what is the difference, for instance, between simple means and "estimated marginal" means? And why "marginal"? The idea behind marginal effects, and estimated marginal means, is that the estimated (or predicted) average outcome value is adjusted for the remaining covariates. We demonstrate this with another linear model.

We first simulate some fake data, where we want to see how income affects well-being. The dataset also includes a variable on health, which we will use later.

set.seed(123)
wellbeing <- runif(300, 0, 100)
income <- rep(NA, 300)
health <- runif(300, 30, 80)
health[wellbeing < 50] <- health[wellbeing < 50] - rnorm(sum(wellbeing < 50), 30, sd = 10)

income[wellbeing < 25] <- sample(1:3, sum(wellbeing < 25), replace = TRUE, prob = c(0.7, 0.2, 0.1))
income[wellbeing >= 25 & wellbeing < 50] <- sample(1:3, sum(wellbeing >= 25 & wellbeing < 50), replace = TRUE, prob = c(0.5, 0.3, 0.2))
income[wellbeing >= 50 & wellbeing < 75] <- sample(1:3, sum(wellbeing >= 50 & wellbeing < 75), replace = TRUE, prob = c(0.35, 0.35, 0.3))
income[wellbeing >= 75] <- sample(1:3, sum(wellbeing >= 75), replace = TRUE, prob = c(0.1, 0.2, 0.7))

income <- factor(income)
levels(income) <- c("low", "middle", "high")
d <- data.frame(income, wellbeing, health)

We now fit a linear model, to look at the regression coefficients:

library(parameters)
model3 <- lm(wellbeing ~ income, data = d)

model_parameters(model3)

We can see that the average well-being is 15.5 points higher for people from the middle income group compared to those from the low income group. People with high income even have, on average, a 33.11 points higher well-being than people with low income.

We can fairly easily calculate the predicted average well-being by summing up the intercept and the respective coefficient for middle and high income. This is what predict_response() also does:

predict_response(model3, "income")
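As a hand-calculated check (a minimal sketch; the coefficient names follow from the factor levels of income), the same values can be obtained directly from the coefficients:

# predicted average well-being per income group, from the coefficients
coef(model3)["(Intercept)"]                                 # low income
coef(model3)["(Intercept)"] + coef(model3)["incomemiddle"]  # middle income
coef(model3)["(Intercept)"] + coef(model3)["incomehigh"]    # high income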

In this example, the "marginal effects" (or estimated marginal means) equal the simple average values of well-being by the different income groups:

aggregate(d$wellbeing, list(d$income), mean)

However, well-being may not only depend on income, but also on other factors, such as health status. In that case, health would be a confounder that affects the association between income and wellbeing.

model4 <- lm(wellbeing ~ income + health, data = d)
compare_parameters(model3, model4)

Now we see that the effect of income on well-being is less pronounced when we take the health status into account. Such confounders can be adjusted for when calculating marginal effects (or estimated marginal means). Due to this adjustment, the predicted average values for wellbeing are no longer the same as the simple group means:

predict_response(model4, "income")

This is the difference between simple "means" and "estimated marginal means". The latter are "adjusted" means, based on a model that adjusts for covariates or confounders. Thus, these predicted means are "marginalized" (i.e. averaged) over the levels of all covariates. This is what is meant when we say that ggeffects is "making predictions generated by the model when one holds the non-focal variables constant". However, there are different ways to hold non-focal terms constant, which is why results from predict_response() differ depending on the value of the margin argument (described in detail in this vignette).
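For instance, the following calls sketch two of those options (the option names are those documented for the margin argument of predict_response(); output omitted here):

# non-focal terms held at their mean / reference level (the default)
predict_response(model4, "income", margin = "mean_reference")
# predictions averaged over the observed values of the non-focal terms
predict_response(model4, "income", margin = "empirical")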

An example with logistic regression

We can demonstrate the same aspect of "adjusted predictions" we have seen above for linear models also for logistic regression models. To do so, we generate some fake data again.

smoking <- data.frame(
  sex = factor(c("male", "female", "female", "male", "female", "female",
                 "male", "female", "female", "male", "male", "female",
                 "female"),
               levels = c("male", "female")),
  smoking = factor(c("no", "yes", "yes", "yes", "yes", "no", "no", "yes",
                     "yes", "no", "no", "no", "yes"),
                   levels = c("no", "yes")),
  age = c(10, 45, 50, 40, 45, 12, 14, 55, 60, 10, 14, 50, 40)
)

Looking at the proportions in the table, we see that a much larger share of female persons smoke compared to male persons:

100 * prop.table(table(smoking$sex, smoking$smoking), margin = 1)

In this case, we have no "estimated" or "predicted" means or averages, but predicted probabilities. According to the table, the probability of smoking is 75% for female persons, while it is only 20% for male persons. We get the same values for the predicted probabilities if we run a logistic regression model:

model5 <- glm(smoking ~ sex, family = binomial(), data = smoking)

# Looking at the odds ratio for "sex"
model_parameters(model5, exponentiate = TRUE)

# Looking at the predicted probabilities for "sex"
predict_response(model5, "sex")

The reference category for sex is male, so we can estimate the average marginal effect for female persons using margins():

margins(model5)

The interpretation is as stated above: the change in the predicted probability that the outcome equals 1 is 0.55 for female compared to male persons, i.e. 55%. This is exactly the difference between the predicted probabilities for male and female persons.
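We can verify this directly from the predictions (a short sketch):

# difference between the predicted probabilities for male and female persons
pr <- predict_response(model5, "sex")
diff(pr$predicted)  # matches the average marginal effect from margins()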

Looking at the age distribution in the sample, we might conclude that our model produces biased estimates, and therefore biased predictions. Remember the high odds ratio of our model, as shown above. Now we include age as a possible confounder in our model.

model6 <- glm(smoking ~ sex + age, family = binomial(), data = smoking)

# Looking at the odds ratio for "sex"
compare_parameters(model5, model6, exponentiate = TRUE)

# Looking at the predicted probabilities for "sex"
predict_response(model6, "sex")

As we can see, the female persons in the sample were much older than the male persons. Smoking is also associated with age, and it is less likely that people smoke when they are children or young teenagers. Adjusting for age reveals that the probability of smoking is actually higher for male persons, not for female persons.

Conclusion: Marginal effects, conditional effects, predictions...

There are obviously many different terms, and they are not always used in a consistent way. But: whenever "marginal effects" are mentioned here, it is about model-based predictions at different values or levels of the focal variable(s), holding the non-focal variables constant at their mean or reference level, averaging over factor levels, or marginalizing over the entire range of the non-focal variable(s).

Further readings

There is a great R package, marginaleffects, published in late 2021. The related website, The marginaleffects package for R, provides definitions of the four quantities the package can compute, along with many further great articles, and is definitely worth reading.

References

Gelman A, Hill J, Vehtari A (2020). Regression and Other Stories. Cambridge University Press.
