In Hemken/stdBeta_R: Standardized Coefficients for Linear Models

Comparing functions that standardize coefficients

There are several R functions out there that will return \'standardized coefficients\' of one sort or another. The capabilities and assumptions of these functions differ on

scaling variables versus scaling terms,
scaling by one standard deviation or two, and
scaling factor variables or not.

They also differ in what kind of data they return: just coefficients, full lm objects, or something else.

Reviewed here are the following functions:

QuantPsyc::lm.beta
lm.beta::lm.beta
lsr::standardCoefs
arm::standardize
reghelper::beta
effectsize::standardize
stdBeta::stdBeta

Three of these functions work fine for additive effects models - models with no interaction or polynomial terms. However, they fail to produce the expected results where higher order terms are present. Two of these produce valid models, in the sense that they produce the correct predicted values and residuals. However, they are difficult to interpret when the model includes higher order terms, because similar-looking variables are not on the same scales.

Package::functions that return standardized coefficients:

QuantPsyc::lm.beta - standardizes the model matrix terms, including factors. Returns a vector of coefficients. Breaks on interaction terms.
lm.beta::lm.beta - returns an object of class "lm.beta", an augmented "lm" object. Standardizes everything in the model matrix. Same approach as QuantPsyc::lm.beta, but with a different return object.
lsr::standardCoefs - returns a matrix of unstandardized and standardized coefficients. Standardizes the model matrix terms, including factors. Again, same approach as QuantPsyc::lm.beta.
arm::standardize - standardizes model frame, scales by 2 sds rather than 1, y optional, various factor options, options to exclude some vars, returns an "lm" object
reghelper::beta
effectsize::standardize
stdBeta::stdBeta

What are \'Standardized Coefficients\'?

Standardized coefficients are usually described as the coefficients estimated for a model when all of the variables have been standardized. There are some variations on this basic idea, but they all begin by first - in principle - standardizing the variables.

I stipulate \"in principle\" because in some cases you do not actually have to transform your data before estimating a model. Where your model consists of only first-order, continuous variables, all you need are the standard deviations of each of your variables to calculate standardized coefficients from the unstandardized coefficients (coefficients in your data\'s original units).

The shortcut formula for transforming coefficients, usually taught in introductory regression courses, is $$\beta = \sigma_x/\sigma_y \times b$$, where $b$ is an unstandardized coefficient, $\sigma_x$ is the standard deviation of the independent variable in question, and $\sigma_y$ is the standard deviation of the dependent variable.

In quite a bit of statistical software - SAS, Stata, SPSS, and the first three R functions listed above - this formula is blindly applied to models with higher order terms as well, that is, with interaction and polynomial terms. While these coefficients are useful for many purposes, they are not consistent with the basic idea of standardized coefficients, because they are not the coefficients one would estimate were the data transformed first. Just what they do represent is convoluted.

A Demonstration

Let\'s start with R\'s common mtcars data, and estimate a model where these functions fail.

# First, calculate by "hand"
zmpg <- scale(mtcars$mpg)
zdisp <- scale(mtcars$disp)
zwt <- scale(mtcars$wt)
coefficients(lm(zmpg ~ zdisp*zwt))

# Now, fit an unstandardized model
fit <- lm(mpg ~ disp*wt, mtcars)

# And apply the available functions
QuantPsyc::lm.beta(fit) # warning given, wrong results
lm.beta::lm.beta(fit)
lsr::standardCoefs(fit) # agrees with lm.beta::lm.beta, but no intercept
arm::standardize(fit, standardize.y=TRUE) # uses 2 sd
reghelper::beta(fit) # Correct!
effectsize::standardize(fit) # also correct!

The QuantPsyc function uses the classic formula, but is implemented in a way that produces a warning with higher order terms. The lm.beta and lsr functions also apply the classic shortcut formula, which is at odds with the concept. The arm function insists on scaling by 2 standard deviations, an alternative concept.

So none of these really returns the coefficients we want.

stdBeta::stdBeta(fit)

This gives us the coefficients we would gotten \'by hand\'.

Polynomial Regression

R\'s formula specification does not allow polynomial terms to be formed as tensor products like other statistical software, which leaves three commonly used methods for specifying polynomial regressions. The first is to \"inhibit\" interpretation of the exponentiation operator.

coefficients(lm(zmpg ~ zwt + I(zwt^2))) # by "hand"
fitsq <- lm(mpg ~ wt + I(wt^2), mtcars)
stdBeta::stdBeta(fitsq)
reghelper::beta(fitsq) # Correct!
effectsize::standardize(fitsq) # also correct!

The second method would be to use the poly function.

coefficients(lm(zmpg ~ zwt + I(zwt^2))) # by "hand"
fitpoly <- lm(mpg ~ poly(wt,2,raw=TRUE), data=mtcars)
stdBeta::stdBeta(fitpoly)
reghelper::beta(fitpoly) # Error
effectsize::standardize(fitpoly) # also correct!

Factors

mtcars$am <- factor(mtcars$am)
coefficients(lm(zmpg ~ zwt*am, data=mtcars)) # by "hand"
fitfactor <- lm(mpg ~ wt*am, data=mtcars)
stdBeta::stdBeta(fitfactor)
reghelper::beta(fitfactor) # scales factors by default
reghelper::beta(fitfactor, skip="am") # Correct
effectsize::standardize(fitfactor) # also correct!