lm_betaselect: Betas-Select in a Regression Model
In betaselectr: Betas-Select in Structural Equation Models and Linear Models

lm_betaselect

R Documentation

Betas-Select in a Regression Model

Description

Fit a linear regression model or a generalized linear model with selected variables standardized, handling product terms correctly and skipping categorical predictors in standardization.

Usage

lm_betaselect(
  ...,
  to_standardize = NULL,
  not_to_standardize = NULL,
  skip_response = FALSE,
  do_boot = TRUE,
  bootstrap = 100L,
  iseed = NULL,
  parallel = FALSE,
  ncpus = parallel::detectCores(logical = FALSE) - 1,
  progress = TRUE,
  load_balancing = FALSE,
  model_call = c("lm", "glm")
)

glm_betaselect(
  ...,
  to_standardize = NULL,
  not_to_standardize = NULL,
  skip_response = FALSE,
  do_boot = TRUE,
  bootstrap = 100L,
  iseed = NULL,
  parallel = FALSE,
  ncpus = parallel::detectCores(logical = FALSE) - 1,
  progress = TRUE,
  load_balancing = FALSE
)

## S3 method for class 'lm_betaselect'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  type = c("beta", "standardized", "raw", "unstandardized"),
  ...
)

## S3 method for class 'glm_betaselect'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  type = c("beta", "standardized", "raw", "unstandardized"),
  ...
)

raw_output(x)

Arguments

`...`	For `lm_betaselect()`, these arguments are passed directly to `lm()`. For `glm_betaselect()`, they are passed to `glm()`. For the `print`-method of `lm_betaselect` or `glm_betaselect` objects, they are passed to other methods.
`to_standardize`	A string vector, which should be the names of the variables to be standardized. Default is `NULL`, indicating all variables are to be standardized.
`not_to_standardize`	A string vector, which should be the names of the variables that should not be standardized. This argument is useful when most variables, except for a few, are to be standardized. This argument cannot be used together with `to_standardize`. Default is `NULL`, and only `to_standardize` is used.
`skip_response`	Logical. If `TRUE`, will not standardize the response (outcome) variable even if it appears in `to_standardize` or `to_standardize` is not specified. Used for models such as logistic regression models in which there are some restrictions on the response variables (e.g., only 0 or 1 for logistic regression).
`do_boot`	Whether bootstrapping will be conducted. Default is `TRUE`.
`bootstrap`	If `do_boot` is `TRUE`, this argument is the number of bootstrap samples to draw. Default is 100. It should be set to 5000 or even 10000 for stable results.
`iseed`	If `do_boot` is `TRUE` and this argument is not `NULL`, it will be used by `set.seed()` to set the seed for the random number generator. Default is `NULL`.
`parallel`	If `do_boot` is `TRUE` and this argument is `TRUE`, parallel processing will be used to do bootstrapping. Default is `FALSE` because bootstrapping for models fitted by `stats::lm()` or `stats::glm()` is rarely slow. In fact, if both `parallel` and `progress` are set to `TRUE`, bootstrapping may even be slower than serial processing.
`ncpus`	If `do_boot` is `TRUE` and `parallel` is also `TRUE`, this argument is the number of processes to be used in parallel processing. Default is `parallel::detectCores(logical = FALSE) - 1`.
`progress`	Logical. If `TRUE`, progress bars will be displayed for long processes. Default is `TRUE`.
`load_balancing`	Logical. If `parallel` is `TRUE`, this determines whether load balancing will be used. Default is `FALSE` because the gain in speed is usually minor.
`model_call`	The model function to be called. If `"lm"`, the default, the model will be fitted by `stats::lm()`. If `"glm"`, the model will be fitted by `stats::glm()`. Users should call the corresponding function directly rather than setting this argument manually.
`x`	An `lm_betaselect` or `glm_betaselect` object.
`digits`	The number of significant digits to be printed for the coefficients.
`type`	The coefficients to be printed. For `"beta"` or `"standardized"`, the coefficients after the selected variables are standardized will be printed. For `"raw"` or `"unstandardized"`, the coefficients before standardization will be printed.

Details

The functions lm_betaselect() and glm_betaselect() let users select which variables are to be standardized when computing the standardized solution. They have the following features:

They automatically skip categorical predictors (i.e., factor or string variables).
They do not standardize a product term itself, which would be incorrect. Instead, they compute the product term from its component variables after standardizing them, if requested.
They standardize the selected variables before fitting a model. Therefore, if a model has the term log(x) and x is one of the selected variables, the model uses the logarithm of the standardized x, instead of the standardized log(x), which is difficult to interpret.
They can be used to generate nonparametric bootstrap confidence intervals for the standardized solution. Bootstrap confidence intervals are better than the default confidence intervals, which ignore the standardization, because they take into account the sampling variance of the standard deviations. Preliminary support has been found for bootstrap confidence intervals for coefficients involving standardized variables in linear regression (Jones & Waller, 2013).
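The bootstrap idea in the last point can be sketched in base R. This is a minimal illustration under stated assumptions (simulated data, one predictor, 1000 resamples, a percentile interval), not the internal code of lm_betaselect(). The key step is that x and y are re-standardized inside every resample, so the interval reflects the sampling variability of their standard deviations.

```r
# Minimal sketch (not betaselectr's internal code): percentile bootstrap CI
# for a standardized slope. Data are simulated for illustration only.
set.seed(1234)
n <- 200
dat <- data.frame(x = rnorm(n, mean = 50, sd = 10))
dat$y <- 2 + 0.3 * dat$x + rnorm(n, sd = 5)

# Standardize x and y within the supplied data, then fit the model.
# Re-running this on each resample lets sd(x) and sd(y) vary across resamples.
std_slope <- function(d) {
  d$x_z <- as.numeric(scale(d$x))
  d$y_z <- as.numeric(scale(d$y))
  unname(coef(lm(y_z ~ x_z, data = d))["x_z"])
}

est <- std_slope(dat)  # with a single predictor, this equals cor(x, y)
boot_est <- replicate(1000, std_slope(dat[sample.int(n, replace = TRUE), ]))
ci <- quantile(boot_est, probs = c(0.025, 0.975))
print(c(estimate = est, ci))
```

The Arguments section recommends 5000 or more bootstrap samples for stable results; 1000 is used here only to keep the sketch fast.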

Problems With Common Approaches

In some regression programs, users have limited control over which variables to standardize when requesting the so-called "betas". The solution may be uninterpretable or misleading under these conditions:

Dummy variables are standardized, so their coefficients cannot be interpreted as the difference between two groups on the outcome variable.
Product terms (interaction terms) are standardized and they cannot be interpreted as the changes in the effects of focal variables when the moderators change (Cheung, Cheung, Lau, Hui, & Vong, 2022).
Variables with meaningful units can be more difficult to interpret when they are standardized (e.g., age).
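The first two problems can be reproduced with simulated data in base R. In this minimal sketch (variable names and effect sizes are arbitrary assumptions), the raw dummy coefficient equals the difference between the two group means, while the coefficient of the standardized dummy equals that difference multiplied by sd(d), which has no group-difference interpretation. For the product term, the model that standardizes x and w before forming their product and the model that standardizes the product x * w itself give identical fitted values; only the first keeps the interaction coefficient readable as the change in the effect of x_z per one-SD change in w.

```r
# Minimal sketch with simulated data (names and effect sizes are arbitrary).
set.seed(42)
n <- 300
g <- factor(sample(c("a", "b"), n, replace = TRUE))
d <- as.numeric(g == "b")                         # dummy variable for group "b"
x <- rnorm(n, mean = 10, sd = 2)
w <- rnorm(n, mean = 5, sd = 1)
y <- 1 + 0.5 * d + 0.4 * x + 0.3 * w + 0.2 * x * w + rnorm(n)

# Problem 1: standardizing a dummy variable.
diff_means <- mean(y[d == 1]) - mean(y[d == 0])
b_dummy <- unname(coef(lm(y ~ d))["d"])           # equals diff_means
d_z <- as.numeric(scale(d))
b_dummy_z <- unname(coef(lm(y ~ d_z))["d_z"])     # equals diff_means * sd(d)

# Problem 2: standardizing a product term.
x_z <- as.numeric(scale(x))
w_z <- as.numeric(scale(w))
fit_ok <- lm(y ~ x_z * w_z)                       # product of standardized x and w
xw_z <- as.numeric(scale(x * w))
fit_bad <- lm(y ~ x_z + w_z + xw_z)               # the product itself standardized

# Same fitted values, different interaction coefficients:
# in fit_ok the effect of x_z is coef["x_z"] + coef["x_z:w_z"] * w_z,
# whereas coef["xw_z"] in fit_bad has no such simple reading.
c(ok = unname(coef(fit_ok)["x_z:w_z"]), bad = unname(coef(fit_bad)["xw_z"]))
```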

How The Functions Work

They standardize the original variables before these variables are used in the model. Therefore, strictly speaking, they do not standardize the predictors in the model, but the input variables (Gelman et al., 2021).
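Standardizing the input variables, rather than the columns of the model matrix, can be sketched with a small hypothetical helper, standardize_inputs(), which is not part of betaselectr. It rescales only the selected numeric variables and leaves factors and character variables untouched; lm() then builds the product term x:w and the dummy variables for grp from the transformed data, so neither is standardized itself.

```r
# Hypothetical helper (not part of betaselectr): standardize the selected
# input variables, skipping non-numeric (factor or character) columns.
standardize_inputs <- function(data, to_standardize = names(data)) {
  for (v in to_standardize) {
    if (is.numeric(data[[v]])) {
      data[[v]] <- as.numeric(scale(data[[v]]))
    }
  }
  data
}

# Simulated data for illustration only
set.seed(1)
n <- 100
dat <- data.frame(
  x = rnorm(n, mean = 50, sd = 10),
  w = rnorm(n, mean = 3, sd = 1),
  grp = factor(sample(c("g1", "g2", "g3"), n, replace = TRUE))
)
dat$y <- 20 + 0.2 * dat$x * dat$w + rnorm(n, sd = 4)

# Requesting grp is harmless: it is skipped because it is a factor
dat_z <- standardize_inputs(dat, c("y", "x", "w", "grp"))

# The formula is unchanged; x:w is formed from the standardized x and w
fit_z <- lm(y ~ x * w + grp, data = dat_z)
coef(fit_z)
```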

The requested model is then fitted to the dataset with the selected variables standardized. For ease of follow-up analysis, both the results with selected variables standardized and the results without standardization are stored. If required, the results without standardization can be retrieved by raw_output().
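Keeping both fits makes it easy to relate the two sets of coefficients. A minimal sketch, assuming simulated data and a model without product terms (where the relation is simplest): when both the predictors and the response are standardized, each standardized slope equals the raw slope multiplied by sd(x)/sd(y). The list named fits only mimics the idea of storing both results; it is not the structure of an lm_betaselect object.

```r
# Simulated data for illustration only
set.seed(2)
n <- 150
dat <- data.frame(x1 = rnorm(n, mean = 100, sd = 15),
                  x2 = runif(n, min = 0, max = 10))
dat$y <- 5 + 0.1 * dat$x1 - 0.8 * dat$x2 + rnorm(n, sd = 3)

dat_z <- as.data.frame(scale(dat))   # all variables standardized
fits <- list(
  standardized = lm(y ~ x1 + x2, data = dat_z),
  raw = lm(y ~ x1 + x2, data = dat)
)

b_raw <- coef(fits$raw)[c("x1", "x2")]
b_std <- coef(fits$standardized)[c("x1", "x2")]
# Without product terms: standardized slope = raw slope * sd(x) / sd(y)
b_rescaled <- b_raw * sapply(dat[c("x1", "x2")], sd) / sd(dat$y)
cbind(raw = b_raw, standardized = b_std, rescaled = b_rescaled)
```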

Methods

The output of lm_betaselect() is an lm_betaselect-class object, and the output of glm_betaselect() is a glm_betaselect-class object. They have the following methods:

A coef-method for extracting the coefficients of the model. (See coef.lm_betaselect() and coef.glm_betaselect() for details.)
A vcov-method for extracting the variance-covariance matrix of the estimates of the coefficients. If bootstrapping is requested, it can return the matrix based on the bootstrapping estimates. (See vcov.lm_betaselect() and vcov.glm_betaselect() for details.)
A confint-method for forming the confidence intervals of the estimates of the coefficients. If bootstrapping is requested, it can return the bootstrap confidence intervals. (See confint.lm_betaselect() and confint.glm_betaselect() for details.)
A summary-method for printing the summary of the results, with additional information such as the number of bootstrap samples and which variables have been standardized. (See summary.lm_betaselect() and summary.glm_betaselect() for details.)
An anova-method for printing the ANOVA table. It can also be used to compare two or more outputs of lm_betaselect() or glm_betaselect(). (See anova.lm_betaselect() and anova.glm_betaselect() for details.)
A predict-method for computing predicted values. It can be used to compute the predicted values given a set of new unstandardized data. The data will be standardized before computing the predicted values in the models with standardization. (See predict.lm_betaselect() and predict.glm_betaselect() for details.)
The default update-method for updating a call also works for an lm_betaselect object or a glm_betaselect object. It can update the model in the same way it updates a model fitted by stats::lm() or stats::glm(), and it can also update the arguments of lm_betaselect() or glm_betaselect(), such as the variables to be standardized. (See stats::update() for details.)
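The handling of new, unstandardized data by the predict-method can be sketched in base R: new data must be standardized with the means and standard deviations of the data used to fit the model, not with their own. In this minimal sketch (simulated data; the helper standardize_with() is hypothetical, and only the predictors are standardized), the predictions from the standardized model then match those from the raw model.

```r
# Hypothetical helper: standardize columns using supplied means and SDs
standardize_with <- function(d, centers, scales) {
  for (v in names(centers)) {
    d[[v]] <- (d[[v]] - centers[[v]]) / scales[[v]]
  }
  d
}

# Simulated training data for illustration only
set.seed(3)
n <- 120
train <- data.frame(x = rnorm(n, mean = 40, sd = 8), w = rnorm(n, mean = 0, sd = 2))
train$y <- 10 + 0.5 * train$x + 1.5 * train$w + 0.1 * train$x * train$w + rnorm(n)

# Means and SDs of the predictors in the training data
centers <- sapply(train[c("x", "w")], mean)
scales <- sapply(train[c("x", "w")], sd)

fit_z <- lm(y ~ x * w, data = standardize_with(train, centers, scales))
fit_raw <- lm(y ~ x * w, data = train)

# New, unstandardized data, rescaled with the training means and SDs
newdata <- data.frame(x = c(30, 40, 50), w = c(-1, 0, 1))
pred_z <- predict(fit_z, newdata = standardize_with(newdata, centers, scales))
pred_raw <- predict(fit_raw, newdata = newdata)
```

If the response were also standardized, pred_z would be on the z-score scale of y and would need to be back-transformed.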

Most other methods for the output of stats::lm() and stats::glm() should also work on an lm_betaselect-class object or a glm_betaselect-class object, respectively. Some of them give the same results regardless of the variables standardized, such as rstandard() and cooks.distance(). Others should be used with caution if they make use of the variance-covariance matrix of the estimates.
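Why some diagnostics do not depend on which variables are standardized can be checked directly. In this minimal sketch with simulated data, standardizing all input variables (and forming x:w from the standardized x and w) changes the coefficients but leaves the column space of the model matrix unchanged, and rescaling y only rescales the residuals, which rstandard() and cooks.distance() divide out.

```r
# Simulated data for illustration only
set.seed(4)
n <- 100
dat <- data.frame(x = rnorm(n, mean = 20, sd = 5), w = rnorm(n, mean = 8, sd = 2))
dat$y <- 3 + 0.4 * dat$x - 0.6 * dat$w + 0.05 * dat$x * dat$w + rnorm(n, sd = 2)

dat_z <- as.data.frame(scale(dat))   # x, w, and y standardized
fit_raw <- lm(y ~ x * w, data = dat)
fit_z <- lm(y ~ x * w, data = dat_z) # x:w formed from standardized x and w

# The coefficients differ ...
rbind(raw = coef(fit_raw), standardized = coef(fit_z))
# ... but these case diagnostics do not
all.equal(rstandard(fit_raw), rstandard(fit_z))
all.equal(cooks.distance(fit_raw), cooks.distance(fit_z))
```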

To use the methods for lm objects or glm objects on the results without standardization, simply use raw_output(). For example, to get the fitted values without standardization, call fitted(raw_output(x)), where x is the output of lm_betaselect() or glm_betaselect().

The function raw_output() simply extracts the regression output by stats::lm() or stats::glm() on the variables without standardization.
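When the response is standardized, the fitted values of the standardized model are on the z-score scale of the response, which is why fitted(raw_output(x)) is the simpler route to fitted values in the original units. A minimal base-R sketch with simulated data, in which two lm() fits stand in for the two stored results: back-transforming the standardized fitted values with mean(y) and sd(y) recovers the raw fitted values.

```r
# Simulated data for illustration only
set.seed(5)
n <- 80
dat <- data.frame(x = rnorm(n, mean = 60, sd = 12))
dat$y <- 15 + 0.7 * dat$x + rnorm(n, sd = 6)

fit_z <- lm(y ~ x, data = as.data.frame(scale(dat)))  # stands in for the standardized results
fit_raw <- lm(y ~ x, data = dat)                      # stands in for the raw results

# Fitted values of the standardized model are z-scores of y;
# undo the standardization with the sample mean and SD of y
back <- fitted(fit_z) * sd(dat$y) + mean(dat$y)
head(cbind(back_transformed = back, raw = fitted(fit_raw)))
```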

Value

The function lm_betaselect() returns an object of the class lm_betaselect. The function glm_betaselect() returns an object of the class glm_betaselect. They are similar in structure to the output of stats::lm() and stats::glm(), with additional information stored.

The function raw_output() returns an object of the class lm or glm, which are the results of fitting the model to the data by stats::lm() or stats::glm() without standardization.

Author(s)

Shu Fai Cheung https://orcid.org/0000-0002-9871-9448

References

Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502–505. doi:10.1037/hea0001188

Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. doi:10.1214/aoms/1177732541

Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press. doi:10.1017/9781139161879

Jones, J. A., & Waller, N. G. (2013). Computing confidence intervals for standardized regression coefficients. Psychological Methods, 18(4), 435–453. doi:10.1037/a0033269

Examples


library(betaselectr)
data(data_test_mod_cat)

# Standardize only iv

lm_beta_x <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
                           data = data_test_mod_cat,
                           to_standardize = "iv")
lm_beta_x
summary(lm_beta_x)

# Manually standardize iv and call lm()

data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]

lm_beta_x_manual <- lm(dv ~ iv_z*mod + cov1 + cat1,
                       data = data_test_mod_cat)

coef(lm_beta_x)
coef(lm_beta_x_manual)

# Standardize all numeric variables

lm_beta_all <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
                             data = data_test_mod_cat)
# Note that cat1 is not standardized
summary(lm_beta_all)


data(data_test_mod_cat)

data_test_mod_cat$p <- scale(data_test_mod_cat$dv)[, 1]
data_test_mod_cat$p <- ifelse(data_test_mod_cat$p > 0,
                              yes = 1,
                              no = 0)
# Standardize only iv
logistic_beta_x <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
                                  family = binomial,
                                  data = data_test_mod_cat,
                                  to_standardize = "iv")

logistic_beta_x
summary(logistic_beta_x)

# Manually standardize iv and call glm()

data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]

logistic_beta_x_manual <- glm(p ~ iv_z*mod + cov1 + cat1,
                              family = binomial,
                              data = data_test_mod_cat)

coef(logistic_beta_x)
coef(logistic_beta_x_manual)

# Standardize all numeric predictors

logistic_beta_allx <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
                                     family = binomial,
                                     data = data_test_mod_cat,
                                     to_standardize = c("iv", "mod", "cov1"))
# Note that cat1 is not standardized
summary(logistic_beta_allx)


# Results without standardization
summary(raw_output(lm_beta_x))

betaselectr documentation built on April 3, 2025, 8:51 p.m.
