Effect Sizes for Bayesian Models"

options(knitr.kable.NA = "")
options(digits = 2)
knitr::opts_chunk$set(comment = ">")

pkgs <- c("effectsize", "parameters", "rstanarm", "bayestestR", "car")
if (!all(sapply(pkgs, require, quietly = TRUE, character.only = TRUE))) {
  knitr::opts_chunk$set(eval = FALSE)

Standardized Parameters


Like in OLS / ML or other frequentists methods of model parameter estimation, standardizing the parameters of Bayesian (generalized) linear regression models can allow for the comparison of so-called "effects" within and between models, variables and studies.

As with frequentists methods, standardizing parameters should not be the only method of examining the role different predictors play in a particular Bayesian model, and this vignette generally assumes that the issues of model convergence, goodness of fit and model selection have already been taken care of. (Learn more about how to become a Bayesian master with the bayestestR package.)

Set up

We will examine the predictive role of overtime (xtra_hours), number of compliments given to the boss (n_comps) and seniority in predicting workers salaries. Let's fit the model:


data("hardlyworking", package = "effectsize")


mod <- stan_glm(salary ~ xtra_hours + n_comps + seniority,
  data = hardlyworking,
  prior = normal(0, scale = c(1, 0.5, 0.5), autoscale = TRUE), # set some priors
  refresh = 0

parameters::model_parameters(mod, test = NULL)

Looking at the un-standardized ("raw") parameters, it looks like all predictors positively predict workers' salaries, but which has the highest predictive power? Unfortunately, the predictors are not on the same scale (hours, compliments, years), so comparing them is hard when looking at the raw data. This is where standardization comes in.

Like with frequentists models we can choose from the same standardization methods. Let's use the (slow) "refit" method.


standardize_parameters(mod, method = "refit", ci = 0.89)

Note that the central tendency of the posterior distribution is still the median - the median of the standardized posterior distribution. We can easily change this, of the type of credible interval used:


  method = "basic", ci = 0.89,
  centrality = "MAP", ci_method = "eti"

As we can see, working harder (or at least for longer hours) has stronger predictive power than complementing or seniority. (Do note, however, that this does not mean that if you wish to have a higher salary you should work overtime - the raw parameters seem to suggest that complementing your boss is the way to go, with one compliment worth almost 3.5 times more than a full hours' work!)



In classical frequentists models, the computation of $\eta^2$ or $\eta^2_p$ is straightforward: based on the right combinations of sums-of-squares (SSs), we get the correct proportion of variance accounted for by some predictor term. However such a computation is not as straightforward for Bayesian models, for various reasons (e.g., the model-SS and the residual-SS don't necessarily sum to the total-SS). Although some have proposed Bayesian methods of estimating explained variance in ANOVA designs [@marsman2019bayesian], these are not yet easy to implement with stan-based models.

An alternative route to obtaining effect sizes of explained variance, is via the use of the posterior predictive distribution (PPD). The PPD is the Bayesian expected distribution of possible unobserved values. Thus, after observing some data, we can estimate not just the expected mean values (the conditional marginal means), but also the full distribution of data around these values [@gelman2014bayesian, chapter 7].

By sampling from the PPD, we can decompose the sample to the various SSs needed for the computation of explained variance measures. By repeatedly sampling from the PPD, we can generate a posterior distribution of explained variance estimates. But note that these estimates are conditioned not only on the location-parameters of the model, but also on the scale-parameters of the model! So it is vital to validate the PPD before using it to estimate explained variance measures.


Let's factorize out data from above:

hardlyworking$age_f <- cut(hardlyworking$age,
  breaks = c(25, 35, 45), right = FALSE,
  labels = c("Young", "Less_young")
hardlyworking$comps_f <- cut(hardlyworking$n_comps,
  breaks = c(0, 1, 2, 3),
  include.lowest = TRUE,
  right = FALSE

table(hardlyworking$age_f, hardlyworking$comps_f)

And fit our model:

# use (special) effects coding
contrasts(hardlyworking$age_f) <- bayestestR::contr.bayes
contrasts(hardlyworking$comps_f) <- bayestestR::contr.bayes

modAOV <- stan_glm(salary ~ age_f * comps_f,
  data = hardlyworking, family = gaussian(),
  refresh = 0

We can use eta_squared_posterior() to get the posterior distribution of $eta^2$ or $eta^2_p$ for each effect. Like an ANOVA table, we must make sure to use the right effects-coding and SS-type:

pes_posterior <- eta_squared_posterior(modAOV,
  draws = 500, # how many samples from the PPD?
  partial = TRUE, # partial eta squared
  # type 3 SS
  ss_function = car::Anova, type = 3


bayestestR::describe_posterior(pes_posterior, rope_range = c(0, 0.1), test = "rope")

Compare to:

modAOV_f <- lm(salary ~ age_f * comps_f,
  data = hardlyworking

eta_squared(car::Anova(modAOV_f, type = 3))


Try the effectsize package in your browser

Any scripts or data that you put into this service are public.

effectsize documentation built on Oct. 4, 2021, 9:06 a.m.