View source: R/lm_betaselect.R
lm_betaselect | R Documentation |
Fit linear regression models with selected variables standardized, handling product terms correctly and skipping categorical predictors during standardization.
lm_betaselect(
...,
to_standardize = NULL,
not_to_standardize = NULL,
skip_response = FALSE,
do_boot = TRUE,
bootstrap = 100L,
iseed = NULL,
parallel = FALSE,
ncpus = parallel::detectCores(logical = FALSE) - 1,
progress = TRUE,
load_balancing = FALSE,
model_call = c("lm", "glm")
)
glm_betaselect(
...,
to_standardize = NULL,
not_to_standardize = NULL,
skip_response = FALSE,
do_boot = TRUE,
bootstrap = 100L,
iseed = NULL,
parallel = FALSE,
ncpus = parallel::detectCores(logical = FALSE) - 1,
progress = TRUE,
load_balancing = FALSE
)
## S3 method for class 'lm_betaselect'
print(
x,
digits = max(3L, getOption("digits") - 3L),
type = c("beta", "standardized", "raw", "unstandardized"),
...
)
## S3 method for class 'glm_betaselect'
print(
x,
digits = max(3L, getOption("digits") - 3L),
type = c("beta", "standardized", "raw", "unstandardized"),
...
)
raw_output(x)
... |
For |
to_standardize |
A string vector,
which should be the names of the
variables to be standardized.
Default is |
not_to_standardize |
A string
vector, which should be the names
of the variables that should not be
standardized. This argument is useful
when most variables, except for a few,
are to be standardized. This argument
cannot be used with |
skip_response |
Logical. If
|
do_boot |
Whether bootstrapping
will be conducted. Default is |
bootstrap |
If |
iseed |
If |
parallel |
If |
ncpus |
If |
progress |
Logical. If |
load_balancing |
Logical. If
|
model_call |
The model function
to be called.
If |
x |
An |
digits |
The number of significant digits to be printed for the coefficients. |
type |
The coefficients to be
printed. For |
The functions lm_betaselect()
and glm_betaselect()
let users
select which variables are to be
standardized when computing the
standardized solution. They have the
following features:
They automatically skip categorical predictors (i.e., factor or string variables).
They do not standardize a product term directly, because doing so is incorrect. Instead, if requested, they standardize the component variables first and then compute the product term from them.
They standardize the selected variables before fitting a model. Therefore, if a model has the term log(x) and x is one of the selected variables, the model uses the logarithm of the standardized x, instead of the standardized log(x), which is difficult to interpret.
They can be used to generate nonparametric bootstrap confidence intervals for the standardized solution. A bootstrap confidence interval is preferable to the default confidence interval that ignores the standardization, because it takes into account the sampling variance of the standard deviations. There is preliminary support for using the bootstrap to form confidence intervals for coefficients involving standardized variables in linear regression (Jones & Waller, 2013).
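The bootstrap idea in the last item can be sketched in base R. This is only an illustration under stated assumptions, not necessarily how the package implements it: the data are simulated, the variable names (x, w, y) and the helper fit_std() are hypothetical, and a simple percentile interval is used. The key point is that the standardization is redone within each bootstrap sample, so the sampling variability of the standard deviations is reflected in the interval.

```r
# Minimal sketch of a nonparametric (case-resampling) bootstrap for
# standardized coefficients. Simulated data; all names are hypothetical.
set.seed(1234)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 10, sd = 3),
                  w = rnorm(n, mean = 5, sd = 2))
dat$y <- 0.5 * dat$x + 0.3 * dat$w + rnorm(n, sd = 4)

# Hypothetical helper: standardize all variables in a data frame,
# then fit the model and return the coefficients
fit_std <- function(d) {
  d_z <- as.data.frame(scale(d))
  coef(lm(y ~ x + w, data = d_z))
}

# Point estimates from the original sample
est <- fit_std(dat)

# Resample rows with replacement and redo the standardization
# in each bootstrap sample
R <- 500
boot_est <- t(replicate(R, {
  i <- sample.int(n, size = n, replace = TRUE)
  fit_std(dat[i, ])
}))

# Percentile bootstrap confidence intervals (one row per coefficient)
boot_ci <- t(apply(boot_est, 2, quantile, probs = c(.025, .975)))
est
boot_ci
```

In practice, the number of bootstrap samples would usually be much larger than in this sketch.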
In some regression programs, users have limited control over which variables to standardize when requesting the so-called "betas". The solution may be uninterpretable or misleading in these conditions:
Dummy variables are standardized and their coefficients cannot be interpreted as the difference between two groups on the outcome variable.
Product terms (interaction terms) are standardized and they cannot be interpreted as the changes in the effects of focal variables when the moderators change (Cheung, Cheung, Lau, Hui, & Vong, 2022).
Variables with meaningful units can be more difficult to interpret when they are standardized (e.g., age).
lm_betaselect() and glm_betaselect() standardize the original variables before they are used in the model. Therefore, strictly speaking, they do not standardize the predictors in the model, but standardize the input variables (Gelman et al., 2021).
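The difference between standardizing the input variables and standardizing the terms in the model can be seen in a small base R sketch. The data are simulated and all variable names are hypothetical; the code only mimics the two approaches with stats::lm() and does not call the package.

```r
# Simulated data with a moderation (product) effect; names are hypothetical
set.seed(5678)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 10, sd = 3),
                  w = rnorm(n, mean = 5, sd = 2))
dat$y <- 0.4 * dat$x + 0.2 * dat$w + 0.1 * dat$x * dat$w + rnorm(n, sd = 3)

# (a) Standardize the input variables, then let the formula form the product
dat$x_z <- scale(dat$x)[, 1]
dat$w_z <- scale(dat$w)[, 1]
fit_input <- lm(y ~ x_z * w_z, data = dat)

# (b) Standardize each term in the model, including the product term itself
dat$xw <- dat$x * dat$w
dat$xw_z <- scale(dat$xw)[, 1]
fit_term <- lm(y ~ x_z + w_z + xw_z, data = dat)

# Compare the coefficients from the two approaches
coef(fit_input)
coef(fit_term)

# Check whether the product of the standardized components equals
# the standardized product
isTRUE(all.equal(dat$x_z * dat$w_z, dat$xw_z))
```

Approach (a) corresponds to what is described above: the product term is computed from the standardized components. Approach (b) is the practice that the conditions listed above warn against.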
The requested model is then fitted to
the dataset with selected variables
standardized. For the ease of
follow-up analysis, both the results
with selected variables standardized
and the results without
standardization are stored. If
required, the results without
standardization can be retrieved
by raw_output()
.
The output of lm_betaselect()
is
an lm_betaselect
-class object,
and the output of glm_betaselect()
is a glm_betaselect
-class object.
They have the following methods:
A coef
-method for extracting
the coefficients of the model.
(See coef.lm_betaselect()
and coef.glm_betaselect()
for details.)
A vcov
-method for extracting the
variance-covariance matrix of the
estimates of the coefficients.
If bootstrapping is requested, it
can return the matrix based on the
bootstrapping estimates.
(See vcov.lm_betaselect()
and vcov.glm_betaselect()
for details.)
A confint
-method for forming the
confidence intervals of the
estimates of the coefficients.
If bootstrapping is requested, it
can return the bootstrap confidence
intervals.
(See confint.lm_betaselect()
and
confint.glm_betaselect()
for details.)
A summary
-method for printing the
summary of the results, with additional
information such as the number of
bootstrap samples and which variables
have been standardized.
(See summary.lm_betaselect()
and
summary.glm_betaselect()
for details.)
An anova
-method for printing the
ANOVA table. It can also be used to
compare two or more outputs of
lm_betaselect()
or
glm_betaselect()
.
(See anova.lm_betaselect()
and anova.glm_betaselect()
for details.)
A predict
-method for computing
predicted values. It can be used to
compute the predicted values given
a set of new unstandardized data.
The data will be standardized before
computing the predicted values in
the models with standardization.
(See predict.lm_betaselect()
and
predict.glm_betaselect()
for details.)
The default update
-method for updating
a call also works for an
lm_betaselect
object or
a glm_betaselect
object. It can
update the model in the same
way it updates a model fitted by
stats::lm()
or stats::glm()
,
and also update
the arguments of lm_betaselect()
or glm_betaselect()
such as the variables to be
standardized.
(See stats::update()
for details.)
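The behavior described for the predict-method, standardizing new unstandardized data before computing predicted values, can be illustrated with a base R sketch. It assumes that the new data are rescaled with the mean and standard deviation of the original sample; this is an assumption made for illustration, and the package's own implementation may differ. The data are simulated and the variable names are hypothetical.

```r
# Simulated data; names are hypothetical
set.seed(2468)
n <- 100
dat <- data.frame(x = rnorm(n, mean = 50, sd = 10))
dat$y <- 2 + 0.3 * dat$x + rnorm(n, sd = 2)

# Fit with x standardized, keeping the mean and SD that were used
x_mean <- mean(dat$x)
x_sd <- sd(dat$x)
dat$x_z <- (dat$x - x_mean) / x_sd
fit_z <- lm(y ~ x_z, data = dat)
fit_raw <- lm(y ~ x, data = dat)

# New data in the original (unstandardized) metric
new_dat <- data.frame(x = c(40, 50, 60))

# Assumption: rescale with the mean and SD of the original sample,
# not those of the new data
new_dat_z <- data.frame(x_z = (new_dat$x - x_mean) / x_sd)

pred_z <- predict(fit_z, newdata = new_dat_z)
pred_raw <- predict(fit_raw, newdata = new_dat)

# Because only the predictor is standardized here, the two sets of
# predicted values should agree
all.equal(pred_z, pred_raw)
```

If the response were also standardized, the predicted values would be in the standardized metric of the response instead.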
Most other methods for the output
of stats::lm()
and stats::glm()
should also work
on an lm_betaselect
-class object
or a glm_betaselect
-class object,
respectively.
Some of them will give the same
results regardless of the variables
standardized. Examples are
rstandard()
and cooks.distance()
.
Some others should be used
with caution if they make use of
the variance-covariance matrix
of the estimates.
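The distinction can be checked with a base R sketch that mimics standardizing one predictor by hand. The data are simulated and the names are hypothetical. Linearly rescaling a predictor does not change the fitted values, so residual-based diagnostics are unaffected, while the variance-covariance matrix of the estimates does change.

```r
# Simulated data; names are hypothetical
set.seed(1357)
n <- 150
dat <- data.frame(x = rnorm(n, mean = 10, sd = 3),
                  w = rnorm(n, mean = 5, sd = 2))
dat$y <- 0.4 * dat$x + 0.2 * dat$w + 0.1 * dat$x * dat$w + rnorm(n, sd = 3)
dat$x_z <- scale(dat$x)[, 1]

# Same model, with and without x standardized
fit_raw <- lm(y ~ x * w, data = dat)
fit_xz <- lm(y ~ x_z * w, data = dat)

# Diagnostics based on residuals and leverage
same_rstandard <- all.equal(rstandard(fit_raw), rstandard(fit_xz))
same_cooks <- all.equal(cooks.distance(fit_raw), cooks.distance(fit_xz))

# The variance-covariance matrix of the estimates
same_vcov <- isTRUE(all.equal(unname(vcov(fit_raw)), unname(vcov(fit_xz))))

same_rstandard
same_cooks
same_vcov
```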
To use the methods for lm
objects
or glm
objects
on the results without standardization,
simply use raw_output()
. For example,
to get the fitted values without
standardization, call
fitted(raw_output(x))
, where x
is the output of lm_betaselect()
or glm_betaselect()
.
The function raw_output()
simply extracts
the regression output by stats::lm()
or stats::glm()
on the variables without standardization.
The function lm_betaselect()
returns an object of the class lm_betaselect
.
The function glm_betaselect()
returns an object of the class
glm_betaselect
. They are similar
in structure to the output of
stats::lm()
and stats::glm()
,
with additional information stored.
The function raw_output()
returns
an object of the class lm
or
glm
, which are
the results of fitting the model
to the data by stats::lm()
or stats::glm()
without
standardization.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502–505. doi:10.1037/hea0001188
Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. doi:10.1214/aoms/1177732541
Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press. doi:10.1017/9781139161879
Jones, J. A., & Waller, N. G. (2013). Computing confidence intervals for standardized regression coefficients. Psychological Methods, 18(4), 435–453. doi:10.1037/a0033269
print.lm_betaselect()
and
print.glm_betaselect()
for the
print
-methods.
data(data_test_mod_cat)
# Standardize only iv
lm_beta_x <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
data = data_test_mod_cat,
to_standardize = "iv")
lm_beta_x
summary(lm_beta_x)
# Manually standardize iv and call lm()
data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]
lm_beta_x_manual <- lm(dv ~ iv_z*mod + cov1 + cat1,
data = data_test_mod_cat)
coef(lm_beta_x)
coef(lm_beta_x_manual)
# Standardize all numeric variables
lm_beta_all <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
data = data_test_mod_cat)
# Note that cat1 is not standardized
summary(lm_beta_all)
data(data_test_mod_cat)
data_test_mod_cat$p <- scale(data_test_mod_cat$dv)[, 1]
data_test_mod_cat$p <- ifelse(data_test_mod_cat$p > 0,
yes = 1,
no = 0)
# Standardize only iv
logistic_beta_x <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat,
to_standardize = "iv")
summary(logistic_beta_x)
logistic_beta_x
summary(logistic_beta_x)
# Manually standardize iv and call glm()
data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]
logistic_beta_x_manual <- glm(p ~ iv_z*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat)
coef(logistic_beta_x)
coef(logistic_beta_x_manual)
# Standardize all numeric predictors
logistic_beta_allx <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
family = binomial,
data = data_test_mod_cat,
to_standardize = c("iv", "mod", "cov1"))
# Note that cat1 is not standardized
summary(logistic_beta_allx)
summary(raw_output(lm_beta_x))