library(knitr) knitr::opts_chunk$set( package.startup.message = FALSE, dpi = 300, collapse = TRUE, comment = "#>", warning = FALSE, message = TRUE ) options(knitr.kable.NA = "", digits = 2, width = 800, modelbased_join_dots = FALSE) pkgs <- c("ggplot2", "marginaleffects", "collapse", "Formula") successfully_loaded <- suppressPackageStartupMessages(vapply(pkgs, requireNamespace, FUN.VALUE = logical(1L), quietly = TRUE)) can_evaluate <- all(successfully_loaded) if (can_evaluate) { knitr::opts_chunk$set(eval = TRUE) junk <- suppressMessages(suppressPackageStartupMessages(vapply(pkgs, require, FUN.VALUE = logical(1L), quietly = TRUE, character.only = TRUE))) } else { knitr::opts_chunk$set(eval = FALSE) }
This vignette demonstrates a typical workflow using easystats packages, with a logistic regression model as an example. We will explore how the modelbased package can help to better understand our model and how to interpret results.
The very first step is usually importing and preparing some data (recoding, re-shaping data and so on - the usual data wrangling tasks), which is easily done using the datawizard package. In this example, we use datawizard only for some minor recodings. The coffee_data
data set is included in the modelbased package. The data set contains information on the effect of coffee consumption on alertness over time. The outcome variable is binary (alertness), and the predictor variables are coffee consumption (treatment) and time.
library(datawizard) # for data management, e.g. recodings data(coffee_data, package = "modelbased") # dichotomize outcome variable coffee_data$alertness <- categorize(coffee_data$alertness, lowest = 0) # rename variable coffee_data <- data_rename(coffee_data, select = c(treatment = "coffee")) # model model <- glm(alertness ~ treatment * time, data = coffee_data, family = binomial())
Let's start by examining the model coefficients. The package that manages everything related to model coefficients is the parameters package. We can use the model_parameters()
function to extract the coefficients from the model. By setting exponentiate = TRUE
, we can obtain the odds ratios for the coefficients.
library(parameters) # coefficients model_parameters(model, exponentiate = TRUE)
The model coefficients are difficult to interpret directly, in particular since we have an interaction effect. Instead, we should use the modelbased package to calculate adjusted predictions for the model.
As we mentioned above, interpreting model results can be hard, and sometimes even misleading, if you only look at the regression coefficients. Instead, it is useful to estimate model-based means or probabilities for the outcome. Ab absolutely easy way to make interpretation easier is to use the modelbased package. You just need to provide your predictors of interest, so called focal terms.
Since we are interested in the interaction effect of coffee consumption (treatment) on alertness depending on different times of the day, we simply specify these two variables as focal terms in the estimate_means()
function. This function calculates predictions on the response scale of the regression model. For logistic regression models, predicted probabilities are calculated. These refer to the adjusted probabilities of the outcome (higher alertness) depending on the predictor variables (treatment and time).
library(modelbased) # predicted probabilities predictions <- estimate_means(model, c("time", "treatment")) predictions
We now see that high alertness
was most likely for the coffee
group in the afternoon
time (about 75% probability of high alertness for the afternoon-coffee group).
We can also visualize these results, using the plot()
method. In short, this will give us a visual interpretation of the model.
# plot predicted probabilities plot(predictions)
We can also see that the predicted probabilities of alertness are higher for participants who consumed coffee compared to those who did not, but only in the morning and in the afternoon. Furthermore, we see differences between the coffee and the control group at each time point - but are these differences statistically significant?
To check this, we finally use the estimate_contrasts()
function to perform pairwise comparisons of the predicted probabilities. This function needs to know the variables that should be compared, or contrasted. In a first step, we want to compare all levels of the variables involved in our interaction term (our focal terms from above).
# pairwise comparisons - quite long table estimate_contrasts(model, c("time", "treatment"))
In the above output, we see all possible pairwise comparisons of the predicted probabilities. The table is quite long, but we can also group the comparisons, e.g. by the variable time.
# group comparisons by "time" estimate_contrasts(model, "treatment", by = "time")
The output shows that the differences between the coffee and the control group are statistically significant only in the noon time.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.