knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, fig.align = "center" )
This vignette illustrates how to use std_selected(), the main function from the stdmod package. More about this package can be found in vignette("stdmod", package = "stdmod") or at https://sfcheung.github.io/stdmod/.
std_selected() can be used to:
get the correct standardized regression coefficients of a moderated regression model, and
form valid confidence intervals for the standardized regression coefficients using nonparametric bootstrapping that takes into account the sampling variation due to standardization.
library(stdmod)
dat <- sleep_emo_con
head(dat, 3)
This dataset has 500 cases, with sleep duration (measured in average hours), conscientiousness, emotional stability, age, and gender ("female" and "male").
The names of some variables are shortened for readability:
colnames(dat)[2:4] <- c("sleep", "cons", "emot")
head(dat, 3)
Suppose this is the moderated regression model:
Dependent variable (Outcome Variable): sleep duration (sleep)
Independent variable (Predictor / Focal Variable): emotional stability (emot)
Moderator: conscientiousness (cons)
Control variables: age and gender
lm() can be used to fit this model:
lm_out <- lm(sleep ~ age + gender + emot * cons, data = dat)
summary(lm_out)
The unstandardized moderation effect is significant, B =
r formatC(coef(lm_out)["emot:cons"], 4, format = "f")
.
For each one unit increase of conscientiousness score, the effect of emotional
stability decreases by r formatC(-1 * coef(lm_out)["emot:cons"], 4, format = "f")
.
Suppose we want to find the correct standardized solution for the moderated regression, that is, all variables except for categorical variables are standardized. In a moderated regression model, the product term should be formed after standardization.
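To see why the order of operations matters, here is a minimal base R sketch on simulated data. The variables x, w, and y and the object names are hypothetical and do not come from stdmod or its dataset. It compares forming the product term after standardization with standardizing the raw product term itself.

```r
# Simulated data; all names here are illustrative, not from the stdmod package.
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)   # focal variable
w <- rnorm(n, mean = 3, sd = 1)   # moderator
y <- 0.3 * x + 0.2 * w - 0.1 * x * w + rnorm(n)

# Standardize each variable first ...
zx <- as.numeric(scale(x))
zw <- as.numeric(scale(w))
zy <- as.numeric(scale(y))

# ... then let lm() form the product term from the standardized variables.
fit_after <- lm(zy ~ zx * zw)

# For comparison: standardizing the raw product term itself.
zxw <- as.numeric(scale(x * w))
fit_product <- lm(zy ~ zx + zw + zxw)

b_after <- unname(coef(fit_after)["zx:zw"])
b_product <- unname(coef(fit_product)["zxw"])
c(product_after_standardization = b_after,
  standardized_product = b_product)
```

The two coefficients differ because the standard deviation of the raw product x * w is generally not the product of the standard deviations of x and w. The first coefficient equals the unstandardized product-term coefficient multiplied by sd(x) * sd(w) / sd(y).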
Instead of doing the standardization ourselves before calling lm(), we can pass the lm() output to std_selected(), and use ~ . for the arguments to_scale and to_center.
lm_stdall <- std_selected(lm_out, to_scale = ~ ., to_center = ~ .)
Since version 0.2.6.3, to_standardize can be used as a shortcut:
lm_stdall <- std_selected(lm_out, to_standardize = ~ .)
summary(lm_stdall)
In this example, the coefficient of the product term, which can naturally be called the standardized moderation effect, is significant, B =
r formatC(coef(lm_stdall)["emot:cons"], 4, format = "f")
.
For each one standard deviation increase of conscientiousness score, the
standardized effect of emotional stability decreases by
r formatC(-1 * coef(lm_stdall)["emot:cons"], 4, format = "f")
.
Standardization is equivalent to centering by the mean and then scaling by (dividing by) the standard deviation. The argument to_center specifies the variables to be centered by their means, and the argument to_scale specifies the variables to be scaled by their standard deviations. The formula interface of lm() is used in these two arguments, with the variables on the right-hand side being the variables to be centered and/or scaled.
The "." on the right-hand side represents all variables in the model, including the outcome variable (sleep duration in this example).
std_selected() automatically skips categorical variables because standardizing them would make their coefficients difficult to interpret.
Since version 0.2.6.3, to_standardize is available as a shortcut. Listing a variable in to_standardize is equivalent to listing this variable in both to_center and to_scale.
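The equivalence between standardizing and centering followed by scaling can be checked directly in base R. This is only an illustrative sketch with a made-up vector x; it does not show how std_selected() is implemented.

```r
# A small made-up vector (for example, hours of sleep); purely illustrative.
x <- c(6.5, 7, 8, 5.5, 7.5, 6)

# Step 1: center by the mean.
x_centered <- x - mean(x)

# Step 2: scale (divide) by the standard deviation.
x_std <- x_centered / sd(x)

# Same result as base R's scale(), which centers and scales by default.
all.equal(x_std, as.numeric(scale(x)))

# A standardized variable has mean 0 and standard deviation 1.
c(mean = mean(x_std), sd = sd(x_std))
```

Centering alone changes only the mean, and scaling alone changes only the spread, which is why the two steps can be requested separately.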
Using std_selected() minimizes the impact on the workflow: do the regression as usual, and get the correct standardized coefficients only when we need to interpret them.
There is one problem with standardized coefficients. The confidence intervals based on ordinary least squares (OLS) fitted to the standardized variables do not take into account the sampling variation of the sample means and standard deviations (Yuan & Chan, 2011). Cheung, Cheung, Lau, Hui, and Vong (2022) suggest using nonparametric bootstrapping, with standardization conducted in each bootstrap sample.
This can be done by std_selected_boot(), a wrapper of std_selected():
if (file.exists("stdmod_lm_stdall_boot.rds")) {
    # Reuse the cached bootstrap results if they exist
    lm_stdall_boot <- readRDS("stdmod_lm_stdall_boot.rds")
  } else {
    set.seed(870432)
    lm_stdall_boot <- std_selected_boot(lm_out,
                                        to_scale = ~ .,
                                        to_center = ~ .,
                                        nboot = 5000)
    saveRDS(lm_stdall_boot, "stdmod_lm_stdall_boot.rds", compress = "xz")
  }
set.seed(870432)
lm_stdall_boot <- std_selected_boot(lm_out,
                                    to_scale = ~ .,
                                    to_center = ~ .,
                                    nboot = 5000)
Since version 0.2.6.3, to_standardize can be used as a shortcut:
lm_stdall_boot <- std_selected_boot(lm_out,
                                    to_standardize = ~ .,
                                    nboot = 5000)
The only additional argument required is nboot, the number of bootstrap samples.
summary(lm_stdall_boot)
The output is similar to that of std_selected(), with additional information on the bootstrapping process.
tmp <- summary(lm_stdall_boot)$coefficients
The 95% bootstrap percentile confidence interval of the standardized
moderation effect is r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")
to
r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")
.
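To make concrete what "standardization conducted in each bootstrap sample" means, the following base R sketch does the resampling by hand on simulated data. It is not the implementation of std_selected_boot(); the data frame dat_sim, the helper std_mod_effect(), and the small nboot value are all assumptions made for illustration.

```r
# Simulated data; names are hypothetical and unrelated to sleep_emo_con.
set.seed(5678)
n <- 200
dat_sim <- data.frame(x = rnorm(n, mean = 5, sd = 2),
                      w = rnorm(n, mean = 3, sd = 1))
dat_sim$y <- 0.3 * dat_sim$x + 0.2 * dat_sim$w -
             0.1 * dat_sim$x * dat_sim$w + rnorm(n)

# Standardize all columns of d, fit the moderated regression,
# and return the coefficient of the product term.
std_mod_effect <- function(d) {
  z <- as.data.frame(scale(d))
  unname(coef(lm(y ~ x * w, data = z))["x:w"])
}

# Point estimate from the full sample.
est <- std_mod_effect(dat_sim)

# Nonparametric bootstrap: resample cases with replacement and
# redo the standardization within each bootstrap sample.
nboot <- 500  # kept small here; the vignette uses 5000
boot_est <- replicate(nboot, {
  i <- sample.int(n, replace = TRUE)
  std_mod_effect(dat_sim[i, ])
})

# 95% percentile confidence interval.
ci <- quantile(boot_est, c(0.025, 0.975))
est
ci
```

Because the means and standard deviations are recomputed in every resample, their sampling variation is reflected in the spread of boot_est, which the OLS confidence interval from a single standardized fit ignores.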
std_selected() and std_selected_boot() can also be used to standardize only selected variables. There are cases in which we do not want to standardize some continuous variables because they are measured in interpretable units, such as hours.
Suppose we want to standardize only emotional stability and conscientiousness, and leave sleep duration unstandardized. We just list emot and cons in to_center and to_scale:
lm_std1 <- std_selected(lm_out, to_scale = ~ emot + cons, to_center = ~ emot + cons)
Since version 0.2.6.3, to_standardize can be used as a shortcut:
lm_std1 <- std_selected(lm_out, to_standardize = ~ emot + cons)
summary(lm_std1)
The partially standardized moderation effect is
r formatC(coef(lm_std1)["emot:cons"], 4, format = "f")
.
For each one standard deviation increase of conscientiousness score, the
partially standardized effect of emotional stability decreases by
r formatC(-1 * coef(lm_std1)["emot:cons"], 4, format = "f")
.
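The interpretation of a partially standardized coefficient can be illustrated with a base R sketch on simulated data. The variables x, w, and y are hypothetical, and the code only mimics the idea by hand rather than calling std_selected(). Only the predictor and the moderator are standardized; the outcome stays in its original units.

```r
# Simulated data; names are illustrative only.
set.seed(2468)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)   # focal variable
w <- rnorm(n, mean = 3, sd = 1)   # moderator
y <- 7 + 0.3 * x + 0.2 * w - 0.1 * x * w + rnorm(n)  # outcome in raw units

# Standardize only the focal variable and the moderator.
zx <- as.numeric(scale(x))
zw <- as.numeric(scale(w))

# The outcome is left unstandardized.
fit_partial <- lm(y ~ zx * zw)
b_partial <- unname(coef(fit_partial)["zx:zw"])

# Unstandardized product-term coefficient for comparison.
b_raw <- unname(coef(lm(y ~ x * w))["x:w"])

c(partially_standardized = b_partial,
  rescaled_unstandardized = b_raw * sd(x) * sd(w))
```

The partially standardized moderation effect is expressed in units of the outcome per one standard deviation of each of the two standardized variables, so no division by the standard deviation of the outcome is involved.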
The function std_selected_boot() can also be used to form the nonparametric bootstrap confidence interval when only some of the variables are standardized:
if (file.exists("stdmod_lm_std1_boot.rds")) {
    # Reuse the cached bootstrap results if they exist
    lm_std1_boot <- readRDS("stdmod_lm_std1_boot.rds")
  } else {
    set.seed(870432)
    lm_std1_boot <- std_selected_boot(lm_out,
                                      to_scale = ~ emot + cons,
                                      to_center = ~ emot + cons,
                                      nboot = 5000)
    saveRDS(lm_std1_boot, "stdmod_lm_std1_boot.rds", compress = "xz")
  }
set.seed(870432)
lm_std1_boot <- std_selected_boot(lm_out,
                                  to_scale = ~ emot + cons,
                                  to_center = ~ emot + cons,
                                  nboot = 5000)
Since version 0.2.6.3, to_standardize can be used as a shortcut:
lm_std1_boot <- std_selected_boot(lm_out, to_standardize = ~ emot + cons, nboot = 5000)
Again, the only additional argument is nboot.
summary(lm_std1_boot)
tmp <- summary(lm_std1_boot)$coefficients
The 95% bootstrap percentile confidence interval of the partially standardized
moderation effect is r formatC(tmp["emot:cons", "CI Lower"], 4, format = "f")
to
r formatC(tmp["emot:cons", "CI Upper"], 4, format = "f")
.
A more detailed illustration can be found in vignette("moderation", package = "stdmod").
vignette("std_selected", package = "stdmod") illustrates how std_selected() can be used to form nonparametric bootstrap percentile confidence intervals for standardized regression coefficients ("betas") in regression models without a product term.
Further information on the functions can be found in their help pages (std_selected() and std_selected_boot()). For example, parallel computation can be used when doing bootstrapping if the number of bootstrap samples requested is large.
Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502-505. https://doi.org/10.1037/hea0001188
Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized regression coefficients. Psychometrika, 76(4), 670-690. https://doi.org/10.1007/s11336-011-9224-6