library(knitr) options(knitr.kable.NA = "") options(digits = 2) knitr::opts_chunk$set(comment = ">") set.seed(1) pkgs <- c("effectsize", "parameters", "rstanarm", "bayestestR", "car") if (!all(sapply(pkgs, require, quietly = TRUE, character.only = TRUE))) { knitr::opts_chunk$set(eval = FALSE) }

Like in OLS / ML or other frequentists methods of model parameter estimation, standardizing the parameters of Bayesian (generalized) linear regression models can allow for the comparison of so-called "effects" within and between models, variables and studies.

As with frequentists methods, standardizing parameters should not be the only
method of examining the role different predictors play in a particular Bayesian
model, and this vignette generally assumes that the issues of model convergence,
goodness of fit and model selection have already been taken care of. (Learn more
about how to become a Bayesian master with the `bayestestR`

package.)

We will examine the predictive role of overtime (`xtra_hours`

), number of
compliments given to the boss (`n_comps`

) and seniority in predicting workers
salaries. Let's fit the model:

library(rstanarm) data("hardlyworking", package = "effectsize") head(hardlyworking) mod <- stan_glm(salary ~ xtra_hours + n_comps + seniority, data = hardlyworking, prior = normal(0, scale = c(1, 0.5, 0.5), autoscale = TRUE), # set some priors refresh = 0 ) parameters::model_parameters(mod, test = NULL)

Looking at the un-standardized ("raw") parameters, it looks like all predictors positively predict workers' salaries, but which has the highest predictive power? Unfortunately, the predictors are not on the same scale (hours, compliments, years), so comparing them is hard when looking at the raw data. This is where standardization comes in.

Like with frequentists
models
we can choose from the same standardization methods. Let's use the (slow)
`"refit"`

method.

library(effectsize) standardize_parameters(mod, method = "refit", ci = 0.89)

Note that the central tendency of the posterior distribution is still the
*median* - the median of the standardized posterior distribution. We can easily
change this, of the type of credible interval used:

library(effectsize) standardize_parameters(mod, method = "basic", ci = 0.89, centrality = "MAP", ci_method = "eti" )

As we can see, working harder (or at least for longer hours) has stronger
predictive power than complementing or seniority. (Do note, however, that this
does not mean that if you wish to have a higher salary you should work overtime
- the raw parameters seem to suggest that complementing your boss is the way to
go, with one compliment worth almost 3.5 times **more** than a full hours'
work!)

In classical frequentists models, the computation of $\eta^2$ or $\eta^2_p$ is
straightforward: based on the right combinations of sums-of-squares (*SS*s), we
get the correct proportion of variance accounted for by some predictor term.
However such a computation is not as straightforward for Bayesian models, for
various reasons (e.g., the model-*SS* and the residual-*SS* don't necessarily
sum to the total-*SS*). Although some have proposed Bayesian methods of
estimating explained variance in ANOVA designs [@marsman2019bayesian], these are
not yet easy to implement with `stan`

-based models.

An alternative route to obtaining effect sizes of explained variance, is via the
use of the ** posterior predictive distribution** (

By sampling from the PPD, we can decompose the sample to the various *SS*s
needed for the computation of explained variance measures. By repeatedly
sampling from the PPD, we can generate a posterior distribution of explained
variance estimates. But note that **these estimates are conditioned not only on
the location-parameters of the model, but also on the scale-parameters of the
model!** So it is vital to validate the
PPD
before using it to estimate explained variance measures.

Let's factorize out data from above:

hardlyworking$age_f <- cut(hardlyworking$age, breaks = c(25, 35, 45), right = FALSE, labels = c("Young", "Less_young") ) hardlyworking$comps_f <- cut(hardlyworking$n_comps, breaks = c(0, 1, 2, 3), include.lowest = TRUE, right = FALSE ) table(hardlyworking$age_f, hardlyworking$comps_f)

And fit our model:

# use (special) effects coding contrasts(hardlyworking$age_f) <- bayestestR::contr.bayes contrasts(hardlyworking$comps_f) <- bayestestR::contr.bayes modAOV <- stan_glm(salary ~ age_f * comps_f, data = hardlyworking, family = gaussian(), refresh = 0 )

We can use `eta_squared_posterior()`

to get the posterior distribution of
$eta^2$ or $eta^2_p$ for each effect. Like an ANOVA table, we must make sure to
use the right effects-coding and *SS*-type:

pes_posterior <- eta_squared_posterior(modAOV, draws = 500, # how many samples from the PPD? partial = TRUE, # partial eta squared # type 3 SS ss_function = car::Anova, type = 3 ) head(pes_posterior) bayestestR::describe_posterior(pes_posterior, rope_range = c(0, 0.1), test = "rope")

Compare to:

modAOV_f <- lm(salary ~ age_f * comps_f, data = hardlyworking ) eta_squared(car::Anova(modAOV_f, type = 3))

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.