library(knitr)
options(knitr.kable.NA = "")
knitr::opts_chunk$set(
  comment = ">",
  message = FALSE,
  warning = FALSE,
  out.width = "100%",
  collapse = TRUE,
  strip.white = FALSE,
  dpi = 450
)
options(digits = 2)

pkgs <- c("effectsize", "lme4", "rstanarm")
successfully_loaded <- sapply(pkgs, requireNamespace, quietly = TRUE)
if (all(successfully_loaded)) {
  library(performance)
  library(effectsize)
  library(lme4)
  library(rstanarm)
}

set.seed(333)
The coefficient of determination, denoted $R^2$ and pronounced "R squared", typically corresponds to the proportion of the variance in the dependent variable (the response) that is explained (i.e., predicted) by the independent variables (the predictors).
It is an "absolute" index of goodness-of-fit, ranging from 0 to 1 (often expressed as a percentage), and can be used for model performance assessment or model comparison.
As models become more complex, the computation of an $R^2$ becomes increasingly less straightforward.
Currently, depending on the context of the regression model object, one can choose from the following measures supported in {performance}:
# DON'T INCLUDE FOR NOW AS IT'S NOT COMPLETE
d <- data.frame(
  "Model_class" = c("lm", "glm"),
  "r2_simple" = c("X", NA),
  "r2_Tjur" = c(NA, "X")
)
knitr::kable(d)
TO BE COMPLETED.
Before we begin, let's first load the package.
library(performance)
lm
m_lm <- lm(wt ~ am * cyl, data = mtcars)
r2(m_lm)
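For an ordinary linear model, the reported value can be reproduced from the definition given in the introduction. The sketch below uses only base R and is meant as an illustration, not as a description of how {performance} computes it internally; the names `ss_res`, `ss_tot`, `r2_manual`, and `r2_adj_manual` are introduced here and are not part of the package.

```r
# Refit the model from the example above (base R only).
m_lm <- lm(wt ~ am * cyl, data = mtcars)

# Residual and total sums of squares of the response.
ss_res <- sum(residuals(m_lm)^2)
ss_tot <- sum((mtcars$wt - mean(mtcars$wt))^2)

# R2 is the share of the total variance that the model accounts for.
r2_manual <- 1 - ss_res / ss_tot

# Adjusted R2 penalises the number of estimated slopes (p) given n observations.
n <- nobs(m_lm)
p <- length(coef(m_lm)) - 1
r2_adj_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Both should agree with the values stored by summary().
all.equal(r2_manual, summary(m_lm)$r.squared)
all.equal(r2_adj_manual, summary(m_lm)$adj.r.squared)
```

Comparing the two `all.equal()` results with the output of `r2(m_lm)` is a quick sanity check that the package and the textbook formula describe the same quantity for this model class.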
glm
In the context of a generalized linear model (e.g., a logistic model whose outcome is binary), $R^2$ doesn't measure the percentage of "explained variance", as this concept doesn't apply. However, the $R^2$s that have been adapted for GLMs have retained the name "R2", mostly because of their similar properties (the range, the sensitivity, and the interpretation as the amount of explanatory power).
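As one illustration of such an adapted index, Tjur's $R^2$ (also called the coefficient of discrimination) is the difference between the mean predicted probability among the observed 1s and among the observed 0s. The sketch below computes it by hand in base R. The model formula is an arbitrary example chosen for this illustration, and which index `r2()` selects for a given GLM should be checked in the package documentation rather than inferred from this sketch.

```r
# A logistic regression on a binary outcome (arbitrary illustrative model).
m_glm <- glm(vs ~ wt + mpg, data = mtcars, family = binomial)

# Predicted probabilities of vs == 1 for every observation.
p_hat <- fitted(m_glm)

# Tjur's R2: mean predicted probability among the 1s minus that among the 0s.
r2_tjur_manual <- mean(p_hat[mtcars$vs == 1]) - mean(p_hat[mtcars$vs == 0])
r2_tjur_manual

# Optional cross-check, run only if {performance} is installed.
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::r2_tjur(m_glm))
}
```

Like the classical $R^2$, this index lies between 0 and 1, but it summarises how well the model separates the two outcome groups rather than a share of explained variance.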
For mixed models, performance will return two different $R^2$s, a marginal and a conditional one.
The marginal $R^2$ considers only the variance of the fixed effects (without the random effects), while the conditional $R^2$ takes both the fixed and random effects into account (i.e., the total model).
library(lme4)

# defining a linear mixed-effects model
model <- lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris)
r2(model)
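For a Gaussian model with a single random intercept, such as the one above, one common formulation (following Nakagawa and Schielzeth) writes the two quantities as shown here. The symbols are introduced for this illustration only: $\sigma_{f}^{2}$ is the variance of the fixed-effects predictions, $\sigma_{\alpha}^{2}$ the random-intercept variance, and $\sigma_{\varepsilon}^{2}$ the residual variance. The exact estimator applied by r2 to a given model is described in the package documentation.

$$ R_{marginal}^{2}={\sigma_{f}^{2} \over \sigma_{f}^{2}+\sigma_{\alpha}^{2}+\sigma_{\varepsilon}^{2}} \qquad R_{conditional}^{2}={\sigma_{f}^{2}+\sigma_{\alpha}^{2} \over \sigma_{f}^{2}+\sigma_{\alpha}^{2}+\sigma_{\varepsilon}^{2}} $$

Because the numerator of the conditional version adds the random-intercept variance, the conditional $R^2$ can never be smaller than the marginal one under this formulation.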
Note that the r2 functions only return the $R^2$ values. We would encourage users to always use the model_performance function instead, to get a more comprehensive set of indices of model fit.
model_performance(model)
In this vignette, however, we focus exclusively on the r2 family of functions and discuss only this measure.
library(rstanarm)

model <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
r2(model)
As discussed above, for mixed-effects models, there will be two components associated with $R^2$.
# defining a Bayesian mixed-effects model
model <- stan_lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris, refresh = 0)
r2(model)
Cohen's $f$ (of ANOVA fame) can be used as a measure of effect size in the context of sequential multiple regression (i.e., nested models). That is, when comparing two models, we can examine the ratio between the increase in $R^2$ and the unexplained variance:
$$ f^{2}={R_{AB}^{2}-R_{A}^{2} \over 1-R_{AB}^{2}} $$
library(effectsize)

data(hardlyworking)
m1 <- lm(salary ~ xtra_hours, data = hardlyworking)
m2 <- lm(salary ~ xtra_hours + n_comps + seniority, data = hardlyworking)
cohens_f_squared(m1, model2 = m2)
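The formula for $f^2$ can also be applied directly to the $R^2$ values of two nested models. The following is a minimal base R sketch, using `mtcars` instead of `hardlyworking` so that it runs without {effectsize}; the model formulas and the names `m_a`, `m_ab`, and `f2_manual` are illustrative choices, not taken from the package.

```r
# Two nested linear models: model A, then model AB with extra predictors.
m_a  <- lm(mpg ~ wt, data = mtcars)
m_ab <- lm(mpg ~ wt + hp + qsec, data = mtcars)

r2_a  <- summary(m_a)$r.squared
r2_ab <- summary(m_ab)$r.squared

# Increase in R2 relative to the variance left unexplained by the larger model.
f2_manual <- (r2_ab - r2_a) / (1 - r2_ab)
f2_manual
```

Since adding predictors to a least-squares model cannot lower $R^2$, the numerator is non-negative, and so is $f^2$.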
If you want to know more about these indices, you can check out details and references in the functions that compute them here.
If you want to know about how to interpret these $R^2$ values, see these interpretation guidelines.