library(knitr) options(knitr.kable.NA = "") options(digits = 2) knitr::opts_chunk$set( echo = TRUE, collapse = TRUE, warning = FALSE, message = FALSE, comment = "#>", out.width = "100%" ) if (!requireNamespace("poorman", quietly = TRUE) || !requireNamespace("lme4", quietly = TRUE) || !requireNamespace("effectsize", quietly = TRUE) || !requireNamespace("lm.beta", quietly = TRUE)) { knitr::opts_chunk$set(eval = FALSE) } else { library(parameters) library(poorman) library(effectsize) library(lm.beta) } set.seed(333)
The
model_parameters()
function (also accessible via the shortcut parameters()
) can also be used to
calculate standardized model parameters via the standardize
-argument. Recall
that standardizing data/variable (z-scoring), i.e. centering and scaling,
involves expressing data in terms of standard deviation (i.e., mean = 0, SD =
1). That is, it the process of subtracting the mean and dividing the quantity by
standard deviation. Standardization can help avoid multicollinearity issues when more complex (polynomial, for instance) terms are included in the model.
There are different methods of standardizing model parameters (see also ?effectsize::standardize_parameters
):
"refit"
,"posthoc"
"smart"
"basic"
If you are interested in more statistical and technical details, and how standardization methods relate to different (standardized) effect size measures, read the following vignette from effectsize package, from whence this functionality comes: https://easystats.github.io/effectsize/articles/standardize_parameters.html
standardize = "refit"
is based on a complete model re-fit with a standardized
version of data. Hence, this method is equal to standardizing the variables
before fitting the model. It is the most accurate (Neter et al., 1989), but it
is also the most computationally costly and long (especially for heavy models
such as, for instance, Bayesian models). This method is particularly recommended
for complex models that include interactions or transformations (e.g.,
polynomial or spline terms).
When standardize = "refit"
, model_parameters()
internally calls
effectsize::standardize()
to standardize the data that was used to fit the model and updates the model
with the standardized data. Note that effectsize::standardize()
tries to
detect which variables should be standardized and which not. For instance,
having a log(x)
in the model formula would exclude x
from being
standardized, because x
might get negative values, and thus log(x)
would no
longer be defined. Factors or dates will also not be standardized. Response
variables will be standardized, if appropriate.
library(lme4) data(iris) set.seed(1234) iris$grp <- as.factor(sample(1:3, nrow(iris), replace = TRUE)) # fit example model model <- lme4::lmer( Sepal.Length ~ Species * Sepal.Width + Petal.Length + (1 | grp), data = iris ) # classic model parameters model_parameters(model) # standardized model parameters model_parameters(model, standardize = "refit")
The second output is identical to following:
# standardize continuous variables manually model2 <- lme4::lmer( scale(Sepal.Length) ~ Species * scale(Sepal.Width) + scale(Petal.Length) + (1 | grp), data = iris ) model_parameters(model2)
standardize = "posthoc"
aims at emulating the results obtained by "refit"
without refitting the model. The coefficients are divided by the standard
deviation of the outcome (which becomes their expression unit). Then, the
coefficients related to numeric variables are additionally multiplied by the
standard deviation of the related terms, so that they correspond to changes of 1
SD of the predictor (e.g., "a change in 1 SD of x
is related to a change of
0.24 of the SD of y
"). This does not apply to binary variables or factors, so
the coefficients are still related to changes in levels.
This method is not accurate and tends to give aberrant results when interactions are specified. However, this method of standardization is the "classic" result obtained by many statistical packages when standardized coefficients are requested.
When standardize = "posthoc"
, model_parameters()
internally calls
effectsize::standardize_parameters(method = "posthoc")
.
Test statistic and p-values are not affected, i.e. they are the same as if no
standardization would be applied.
model_parameters(model, standardize = "posthoc")
standardize = "basic"
also applies post-hoc standardization, however, factors
are converted to numeric, which means that it also scales the coefficient by the
standard deviation of model's matrix' parameter of factor levels (transformed to
integers) or binary predictors.
model_parameters(model, standardize = "basic")
Compare the two outputs above and notice how coefficient estimates, standard
errors, confidence intervals, and p-values change for main effect and
interaction effect terms containing Species
variable, the only factor variable
in our model.
This method is the one implemented by default in other software packages, such as lm.beta::lm.beta()
:
library(lm.beta) data(iris) model3 <- lm(Sepal.Length ~ Species * Sepal.Width + Petal.Length, data = iris) mp <- model_parameters(model3, standardize = "basic") out <- lm.beta(model3) data.frame(model_parameters = mp$Std_Coefficient, lm.beta = coef(out))
standardize = "smart"
is similar to standardize = "posthoc"
in that it does
not involve model re-fitting. The difference is that the SD of the response is
computed on the relevant section of the data. For instance, if a factor with 3
levels A (the intercept), B and C is entered as a predictor, the effect
corresponding to B versus A will be scaled by the variance of the response at
the intercept only. As a results, the coefficients for effects of factors are
similar to a Glass' delta.
model_parameters(model, standardize = "smart")
We will use the "refit" method as the baseline. We will then compute the differences between these standardized parameters and the ones provided by the other functions. The bigger the (absolute) number, the worse it is.
SPOILER ALERT: the standardization implemented in
effectsize
is the most accurate and the most flexible.
library(effectsize) library(lm.beta) library(MuMIn) comparison <- function(model, robust = FALSE) { out <- standardize_parameters(model, method = "refit", robust=robust)[1:2] out$posthoc <- tryCatch( out[, 2] - standardize_parameters(model, method="posthoc", robust=robust)[, 2], error = function(error_condition) "Error" ) out$basic <- tryCatch( out[, 2] - standardize_parameters(model, method="basic", robust=robust)[, 2] , error = function(error_condition) "Error" ) out$lm.beta <- tryCatch( out[, 2] - lm.beta::lm.beta(model)$standardized.coefficients, error = function(error_condition) "Error", warning = function(warning_condition) "Error" ) out$MuMIn <- tryCatch( out[, 2] - MuMIn::std.coef(model, partial.sd = FALSE)[, 1], error = function(error_condition) "Error" )
data <- iris data$Group_Sepal.Width <- as.factor(ifelse(data$Sepal.Width > 3, "High", "Low")) data$Binary_Sepal.Width <- as.factor(ifelse(data$Sepal.Width > 3, 1, 0)) summary(data)
model <- lm(Sepal.Length ~ Petal.Width + Sepal.Width, data=data) comparison(model)
library(lme4) model <- lme4::lmer(Sepal.Length ~ Petal.Width + Sepal.Width + (1|Species), data=data) comparison(model)
library(rstanarm) model <- stan_glm(Sepal.Length ~ Petal.Width + Sepal.Width, data=data) comparison(model)
For these simple models, all methods return results equal to the "refit" method (although the other packages fail).
model <- lm(Sepal.Length ~ poly(Petal.Width, 2) + poly(Sepal.Width, 2), data=data) comparison(model)
When transformation are involved (e.g., polynomial transformations), the basic method becomes very unreliable.
model <- lm(Sepal.Length ~ Petal.Width + Group_Sepal.Width, data=data) comparison(model)
model <- glm(Binary_Sepal.Width ~ Petal.Width + Species, data=data, family="binomial") comparison(model)
library(lme4) model <- lme4::lmer(Sepal.Length ~ Petal.Length + Group_Sepal.Width + (1|Species), data=data) comparison(model)
library(rstanarm) model <- stan_lmer(Sepal.Length ~ Petal.Width + Group_Sepal.Width + (1|Species), data=data) comparison(model)
When factors are involved, the basic method (that standardizes the numeric transformation of factors) give again different results.
Long story short, coeffcient obtained via posthoc standardization (without refitting the model) go berserk when interactions are involved. However, this is "normal": a regression model estimates coefficient between two variables when the other predictors are at 0 (are fixed at 0, that people interpret as "adjusted for"). When a standardized data is passed (in the refit method), the effects and interactions are estimated at the means of the other predictors (because 0 is the mean for a standardized variable). Whereas in posthoc standardization, this coefficient correspond to something different (because the 0 corresponds to something different in standardzed and non-standardized data). In other words, when it comes to interaction, passing standardized data results in a different model, which coefficient have an intrinsically different meaning from unstandardized data. And as for now, we are unable to retrieve one from another.
model <- lm(Sepal.Length ~ Petal.Width * Sepal.Width, data=data) comparison(model) ``` --> #### Between factors ```r model <- lm(Sepal.Length ~ Species * Group_Sepal.Width, data=data) comparison(model) ``` --> #### Between factors and continuous ```r model <- lm(Sepal.Length ~ Petal.Width * Group_Sepal.Width, data=data) comparison(model)
model <- lm(Sepal.Length ~ Group_Sepal.Width * Petal.Width, data=data) comparison(model)
Use refit
if possible, but if no interactions, can use posthoc
or
smart
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.