This article demonstrates how to use lav_betaselect() from the package betaselectr to standardize selected variables in a model fitted by lavaan and to form confidence intervals for the parameters.
The sample dataset data_test_medmod from the package betaselectr will be used in this demonstration:
    library(betaselectr)
    head(data_test_medmod)
    #>          dv       iv      mod      med     cov1     cov2
    #> 1  7.487873 11.42573 16.65805 42.28988 54.14051 15.56069
    #> 2  8.474931 16.64790 22.66332 42.08692 39.21125 17.61286
    #> 3 11.206539 14.81278 22.80955 32.76869 31.97963 20.77333
    #> 4 10.148827 15.79632 22.94451 43.96807 42.72187 15.66971
    #> 5  7.421606 14.29621 24.51562 37.10942 42.74174 21.97132
    #> 6  6.846435 12.00819 25.22163 35.46051 30.85914 22.35444
This is the path model, fitted by lavaan::sem():
    library(lavaan)
    #> This is lavaan 0.6-19
    #> lavaan is FREE software! Please report any bugs.
    mod <-
    "
    med ~ iv + mod + iv:mod + cov1 + cov2
    dv ~ med + iv + cov1 + cov2
    "
    fit <- sem(mod,
               data_test_medmod)
The model has a moderator, mod, posited to moderate the effect from iv to med. The product term is iv:mod.
These are the results:
    summary(fit)
    #> lavaan 0.6-19 ended normally after 2 iterations
    #>
    #>   Estimator                                         ML
    #>   Optimization method                           NLMINB
    #>   Number of model parameters                        11
    #>
    #>   Number of observations                           200
    #>
    #> Model Test User Model:
    #>
    #>   Test statistic                                 2.303
    #>   Degrees of freedom                                 2
    #>   P-value (Chi-square)                           0.316
    #>
    #> Parameter Estimates:
    #>
    #>   Standard errors                             Standard
    #>   Information                                 Expected
    #>   Information saturated (h1) model          Structured
    #>
    #> Regressions:
    #>                    Estimate  Std.Err  z-value  P(>|z|)
    #>   med ~
    #>     iv               -6.373    0.985   -6.473    0.000
    #>     mod              -3.899    0.614   -6.346    0.000
    #>     iv:mod            0.286    0.039    7.340    0.000
    #>     cov1             -0.093    0.070   -1.327    0.185
    #>     cov2              0.242    0.133    1.823    0.068
    #>   dv ~
    #>     med               0.092    0.011    8.098    0.000
    #>     iv                0.227    0.038    5.896    0.000
    #>     cov1             -0.006    0.013   -0.454    0.650
    #>     cov2              0.030    0.025    1.230    0.219
    #>
    #> Variances:
    #>                    Estimate  Std.Err  z-value  P(>|z|)
    #>    .med              60.292    6.029   10.000    0.000
    #>    .dv                2.087    0.209   10.000    0.000
We can request the standardized solution using lavaan::standardizedSolution():
    standardizedSolution(fit, output = "text")
    #>
    #> Regressions:
    #>                     est.std  Std.Err  z-value  P(>|z|) ci.lower ci.upper
    #>   med ~
    #>     iv               -1.855    0.259   -7.158    0.000   -2.363   -1.347
    #>     mod              -1.956    0.280   -6.988    0.000   -2.504   -1.407
    #>     iv:mod            3.588    0.428    8.390    0.000    2.750    4.426
    #>     cov1             -0.077    0.058   -1.332    0.183   -0.189    0.036
    #>     cov2              0.105    0.057    1.836    0.066   -0.007    0.218
    #>   dv ~
    #>     med               0.459    0.052    8.845    0.000    0.357    0.560
    #>     iv                0.331    0.053    6.279    0.000    0.228    0.434
    #>     cov1             -0.024    0.054   -0.454    0.650   -0.130    0.081
    #>     cov2              0.066    0.054    1.233    0.218   -0.039    0.171
    #>
    #> Variances:
    #>                     est.std  Std.Err  z-value  P(>|z|) ci.lower ci.upper
    #>    .med               0.656    0.050   13.243    0.000    0.559    0.753
    #>    .dv                0.569    0.050   11.353    0.000    0.471    0.667
However, for this model, there are several problems:
The product term, iv:mod, is also standardized. This is inappropriate. One simple but underused solution is to standardize the variables before forming the product term [@friedrich_defense_1982].
The confidence intervals are formed using the delta method, which has been found to be inferior to methods such as the nonparametric percentile bootstrap confidence interval for the standardized solution [@falk_are_2018]. Although there are situations in which the delta-method confidence intervals and the nonparametric percentile bootstrap confidence intervals can be similar (e.g., the sample size is large and the sample estimates are not extreme), it is still safer to try both methods and compare the results.
There are cases in which some variables are measured in meaningful units and do not need to be standardized. For example, if cov1 is age measured in years, then age is more meaningful than "standardized age".
In path analysis, categorical variables are usually represented by dummy variables, each of them having only two possible values (0 or 1). It is not meaningful to standardize the dummy variables.
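The first problem can be illustrated outside lavaan. The following is a minimal base R sketch, using simulated data and ordinary least squares rather than the package's own code, that contrasts the coefficient of a standardized product term with the coefficient of a product of standardized variables. The variable names (x, w, y) and the data-generating values are assumptions made only for illustration.

```r
# Simulated data for illustration only (not data_test_medmod)
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 15, sd = 2)
w <- rnorm(n, mean = 20, sd = 3)
y <- 0.5 * x + 0.3 * w + 0.2 * x * w + rnorm(n, sd = 5)

# Unstandardized moderated regression
fit_raw <- lm(y ~ x + w + x:w)
b_xw <- unname(coef(fit_raw)["x:w"])

# Standardized versions of the observed variables
zy <- as.numeric(scale(y))
zx <- as.numeric(scale(x))
zw <- as.numeric(scale(w))

# (a) Standardize everything, including the product term itself
zxw <- as.numeric(scale(x * w))
fit_all <- lm(zy ~ zx + zw + zxw)
beta_all <- unname(coef(fit_all)["zxw"])

# (b) Standardize x, w, and y first, then form the product term
fit_sel <- lm(zy ~ zx + zw + zx:zw)
beta_sel <- unname(coef(fit_sel)["zx:zw"])

# The two coefficients are different rescalings of the same raw coefficient:
# (a) uses the SD of the product, (b) uses the product of the SDs
print(c(beta_all = beta_all, rescaled = b_xw * sd(x * w) / sd(y)))
print(c(beta_sel = beta_sel, rescaled = b_xw * sd(x) * sd(w) / sd(y)))
```

Because the standard deviation of x * w depends on the means of x and w, version (a) changes when a constant is added to either variable, whereas version (b) does not.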
lav_betaselect()
The function lav_betaselect() can be used to solve these problems by:
standardizing variables before product terms are formed,
standardizing only variables for which standardization can facilitate interpretation, and
forming confidence intervals that take into account selected standardization.
We call the coefficients computed by this kind of standardization betas-select ($\beta{s}_{Select}$, $\beta_{Select}$ in singular form), to differentiate them from coefficients computed by standardizing all variables, including product terms.
Suppose we only need to solve the first problem, with the product term computed after iv and mod are standardized:
    fit_beta <- lav_betaselect(fit)
    fit_beta
This is the output if printed using the default options:
    #>
    #> Selected Standardization:
    #>
    #>   Standard Error:  Nil
    #>
    #> Parameter Estimates Settings:
    #>
    #>   Standard errors:                  Standard
    #>   Information:                      Expected
    #>   Information saturated (h1) model: Structured
    #>
    #> Regressions:
    #>              BetaSelect
    #>   med ~
    #>     iv           -1.855
    #>     mod          -1.956
    #>     iv:mod        0.400
    #>     cov1         -0.077
    #>     cov2          0.105
    #>   dv ~
    #>     med           0.459
    #>     iv            0.331
    #>     cov1         -0.024
    #>     cov2          0.066
    #>
    #> Footnote:
    #> - Variable(s) standardized: cov1, cov2, dv, iv, med, mod
    #> - Call 'print()' and set 'standardized_only' to 'FALSE' to print both
    #>   original estimates and betas-select.
    #> - Product terms (iv:mod) have variables standardized before computing
    #>   them. The product term(s) is/are not standardized.
Compared to the solution with the product term standardized, the coefficient of iv:mod changed substantially, from 3.588 to 0.400. As shown by @cheung_improving_2022, the coefficient of a standardized product term (iv:mod) can be substantially different from the coefficient of the properly standardized product term (the product of standardized iv and standardized mod).
The footnote also indicates which variables are standardized, and notes that product terms are formed after standardization.
Suppose we want to address both the first and the second problems, with the product term computed after iv and mod are standardized, and with bootstrap confidence intervals that take into account the sampling variation of the standardizers (the standard deviations).
We can call lav_betaselect() again, with additional arguments set:
    fit_beta <- lav_betaselect(fit,
                               std_se = "bootstrap",
                               bootstrap = 5000,
                               iseed = 2345,
                               parallel = "snow",
                               ncpus = 20)
    #> Finding product terms in the model ...
    #> Finished finding product terms.
    #>
    #> Compute bootstrapping standardized solution:
These are the additional arguments:
std_se: The method to compute the standard errors as well as the confidence intervals. Set to "bootstrap" for nonparametric bootstrapping.
bootstrap: The number of bootstrap samples (5000 in this example).
iseed: The seed for the random number generator used for bootstrapping. Set this to an integer to make the results reproducible.
parallel: The method to be used for parallel processing. It will be passed to lavaan::bootstrapLavaan(). Supported values are "none", "snow", and "multicore".
ncpus: The number of CPU cores to use if parallel is not "none". Default is parallel::detectCores(logical = FALSE) - 1, or the number of physical cores minus one.
This is the output if printed with default options:
    fit_beta
    #>
    #> Selected Standardization:
    #>
    #>   Standard Error:      Nonparametric bootstrap
    #>   Bootstrap samples:   5000
    #>   Confidence Interval: Percentile
    #>   Level of Confidence: 95.0%
    #>
    #> Parameter Estimates Settings:
    #>
    #>   Standard errors:                  Standard
    #>   Information:                      Expected
    #>   Information saturated (h1) model: Structured
    #>
    #> Regressions:
    #>              BetaSelect    SE      Z p-value Sig  CI.Lo  CI.Hi CI.Sig
    #>   med ~
    #>     iv           -1.855 0.248 -7.490   0.000 *** -2.307 -1.332 Sig.
    #>     mod          -1.956 0.281 -6.950   0.000 *** -2.453 -1.348 Sig.
    #>     iv:mod        0.400 0.047  8.565   0.000 ***  0.298  0.481 Sig.
    #>     cov1         -0.077 0.057 -1.353   0.185     -0.186  0.038 n.s.
    #>     cov2          0.105 0.061  1.725   0.094 .   -0.019  0.219 n.s.
    #>   dv ~
    #>     med           0.459 0.052  8.828   0.000 ***  0.348  0.553 Sig.
    #>     iv            0.331 0.051  6.480   0.000 ***  0.229  0.431 Sig.
    #>     cov1         -0.024 0.058 -0.418   0.686     -0.137  0.090 n.s.
    #>     cov2          0.066 0.058  1.139   0.259     -0.050  0.178 n.s.
    #>
    #> Footnote:
    #> - Variable(s) standardized: cov1, cov2, dv, iv, med, mod
    #> - Sig codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> - Standard errors, p-values, and confidence intervals are not computed
    #>   for betas-select which are fixed in the standardized solution.
    #> - P-values for betas-select are asymmetric bootstrap p-value computed
    #>   by the method of Asparouhov and Muthén (2021).
    #> - Call 'print()' and set 'standardized_only' to 'FALSE' to print both
    #>   original estimates and betas-select.
    #> - Product terms (iv:mod) have variables standardized before computing
    #>   them. The product term(s) is/are not standardized.
In this dataset, with 200 cases, the delta-method confidence intervals are close to the bootstrap confidence intervals, except obviously for the product term because the coefficient of the product term has substantially different values in the two solutions.
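The idea of letting the bootstrap account for the sampling variation of the standardizers can be sketched in base R. This is only an illustration with simulated data and a simple regression, not the procedure implemented in lav_betaselect(). The helper std_slope() and the number of bootstrap samples are assumptions made for this example. The key point is that the standard deviations are recomputed inside every bootstrap sample instead of being treated as fixed.

```r
# Simulated data for illustration only
set.seed(2345)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
y <- 0.4 * x + rnorm(n, sd = 3)
dat <- data.frame(x = x, y = y)

# Hypothetical helper: standardized slope, with the standardizers (SDs)
# computed from whatever data are supplied
std_slope <- function(d) {
  b <- coef(lm(y ~ x, data = d))[["x"]]
  b * sd(d$x) / sd(d$y)
}

# Nonparametric bootstrap: resample cases, then redo BOTH the estimation
# and the standardization in each bootstrap sample
R <- 2000
boot_est <- replicate(R, {
  i <- sample.int(nrow(dat), replace = TRUE)
  std_slope(dat[i, , drop = FALSE])
})

# Point estimate and 95% percentile confidence interval
est <- std_slope(dat)
ci <- quantile(boot_est, probs = c(0.025, 0.975))
print(est)
print(ci)
```

A percentile interval is read directly from the bootstrap distribution, so it can be asymmetric around the point estimate, unlike a delta-method interval.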
Suppose we also want to address the third issue, and standardize only some of the variables. This can be done using either to_standardize or not_to_standardize.
Use to_standardize when the number of variables to standardize is much smaller than the number of variables not to standardize.
Use not_to_standardize when the number of variables to standardize is much larger than the number of variables not to standardize.
For example, suppose we only need to standardize dv, iv, cov1, and cov2. This is the call to do this, with to_standardize set to c("iv", "dv", "cov1", "cov2"):
    fit_beta_select_1 <- lav_betaselect(fit,
                                        std_se = "bootstrap",
                                        to_standardize = c("iv", "dv", "cov1", "cov2"),
                                        bootstrap = 5000,
                                        iseed = 2345,
                                        parallel = "snow",
                                        ncpus = 20)
If we want to standardize all variables except for dv and mod, we can use this call, with not_to_standardize set to c("mod", "dv"):
    fit_beta_select_2 <- lav_betaselect(fit,
                                        std_se = "bootstrap",
                                        not_to_standardize = c("mod", "dv"),
                                        bootstrap = 5000,
                                        iseed = 2345,
                                        parallel = "snow",
                                        ncpus = 20)
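As written, the two calls select different variables: the first standardizes iv, dv, cov1, and cov2, whereas the second standardizes iv, med, cov1, and cov2. Only the results of the second call are printed: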
    fit_beta_select_2
    #> Selected Standardization:
    #>
    #>   Standard Error:      Nonparametric bootstrap
    #>   Bootstrap samples:   5000
    #>   Confidence Interval: Percentile
    #>   Level of Confidence: 95.0%
    #>
    #> Parameter Estimates Settings:
    #>
    #>   Standard errors:                  Standard
    #>   Information:                      Expected
    #>   Information saturated (h1) model: Structured
    #>
    #> Regressions:
    #>              BetaSelect    SE      Z p-value Sig  CI.Lo  CI.Hi CI.Sig
    #>   med ~
    #>     iv           -1.855 0.248 -7.490   0.000 *** -2.307 -1.332 Sig.
    #>     mod          -0.407 0.059 -6.950   0.000 *** -0.510 -0.280 Sig.
    #>     iv:mod        0.083 0.010  8.565   0.000 ***  0.062  0.100 Sig.
    #>     cov1         -0.077 0.057 -1.353   0.185     -0.186  0.038 n.s.
    #>     cov2          0.105 0.061  1.725   0.094 .   -0.019  0.219 n.s.
    #>   dv ~
    #>     med           0.878 0.116  7.567   0.000 ***  0.635  1.092 Sig.
    #>     iv            0.634 0.100  6.337   0.000 ***  0.430  0.826 Sig.
    #>     cov1         -0.047 0.112 -0.418   0.686     -0.265  0.168 n.s.
    #>     cov2          0.126 0.111  1.137   0.259     -0.093  0.345 n.s.
    #>
    #> Footnote:
    #> - Variable(s) standardized: cov1, cov2, iv, med
    #> - Sig codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> - Standard errors, p-values, and confidence intervals are not computed
    #>   for betas-select which are fixed in the standardized solution.
    #> - P-values for betas-select are asymmetric bootstrap p-value computed
    #>   by the method of Asparouhov and Muthén (2021).
    #> - Call 'print()' and set 'standardized_only' to 'FALSE' to print both
    #>   original estimates and betas-select.
    #> - Product terms (iv:mod) have variables standardized before computing
    #>   them. The product term(s) is/are not standardized.
The footnotes show that, by specifying that dv and mod are not standardized, all the other four variables are standardized: iv, med, cov1, and cov2. Therefore, in this case, it is more convenient to use not_to_standardize.
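What selective standardization does to a coefficient can be sketched in base R. The example below uses simulated data and lm(), not lav_betaselect(), and the variable names (age, x, y) are assumptions for illustration. It standardizes only the outcome and one predictor, leaves age in years, and checks the result against a direct rescaling of the unstandardized coefficients.

```r
# Simulated data for illustration only
set.seed(42)
n <- 200
age <- round(runif(n, min = 20, max = 60))  # meaningful unit (years): kept as is
x <- rnorm(n, mean = 50, sd = 10)
y <- 0.3 * x + 0.1 * age + rnorm(n, sd = 5)

# Unstandardized coefficients
b <- coef(lm(y ~ x + age))

# Standardize only y and x; age stays in years
zy <- as.numeric(scale(y))
zx <- as.numeric(scale(x))
b_sel <- coef(lm(zy ~ zx + age))

# Each coefficient is rescaled only by the SDs of the variables selected:
# x and y both standardized -> b * SD(x) / SD(y)
# only y standardized       -> b / SD(y)
print(c(fitted = b_sel[["zx"]], rescaled = b[["x"]] * sd(x) / sd(y)))
print(c(fitted = b_sel[["age"]], rescaled = b[["age"]] / sd(y)))
```

The coefficient of age is then read as the change in y, in standard deviation units, for each additional year of age.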
When reporting betas-select, researchers need to state which variables are standardized and which are not. This can be done in table notes, or in a column of the parameter estimate tables. The output of lav_betaselect() can be printed with show_Bs.by set to TRUE to demonstrate the second approach:
    print(fit_beta_select_2, show_Bs.by = TRUE)
    #> Regressions:
    #>              BetaSelect    SE      Z p-value Sig  CI.Lo  CI.Hi CI.Sig Selected
    #>   med ~
    #>     iv           -1.855 0.248 -7.490   0.000 *** -2.307 -1.332 Sig.   iv,med
    #>     mod          -0.407 0.059 -6.950   0.000 *** -0.510 -0.280 Sig.   med
    #>     iv:mod        0.083 0.010  8.565   0.000 ***  0.062  0.100 Sig.   iv,med
    #>     cov1         -0.077 0.057 -1.353   0.185     -0.186  0.038 n.s.   cov1,med
    #>     cov2          0.105 0.061  1.725   0.094 .   -0.019  0.219 n.s.   cov2,med
    #>   dv ~
    #>     med           0.878 0.116  7.567   0.000 ***  0.635  1.092 Sig.   med
    #>     iv            0.634 0.100  6.337   0.000 ***  0.430  0.826 Sig.   iv
    #>     cov1         -0.047 0.112 -0.418   0.686     -0.265  0.168 n.s.   cov1
    #>     cov2          0.126 0.111  1.137   0.259     -0.093  0.345 n.s.   cov2
    #>
    #> Footnote:
    #> - Variable(s) standardized: cov1, cov2, iv, med
    #> - Sig codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> - Standard errors, p-values, and confidence intervals are not computed
    #>   for betas-select which are fixed in the standardized solution.
    #> - P-values for betas-select are asymmetric bootstrap p-value computed
    #>   by the method of Asparouhov and Muthén (2021).
    #> - Call 'print()' and set 'standardized_only' to 'FALSE' to print both
    #>   original estimates and betas-select.
    #> - The column 'Selected' lists variable(s) standardized when computing
    #>   the standardized coefficient of a parameter. ('NA' for user-defined
    #>   parameters because they are computed from other standardized
    #>   parameters.)
    #> - Product terms (iv:mod) have variables standardized before computing
    #>   them. The product term(s) is/are not standardized.
When calling lav_betaselect(), variables with only two values in the dataset are assumed to be categorical and will not be standardized by default. This can be overridden by setting skip_categorical_x to FALSE, though this is not recommended.
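The rule described above, treating a variable with only two distinct values as categorical and leaving it unstandardized, can be sketched in base R. The helper has_two_values() is hypothetical and is not part of betaselectr. How the package itself handles details such as missing values is not covered here, and dropping NA before counting is an assumption of this sketch.

```r
# Hypothetical helper (not from betaselectr): TRUE if a variable has
# exactly two distinct non-missing values
has_two_values <- function(v) {
  length(unique(v[!is.na(v)])) == 2
}

# Simulated data for illustration only
set.seed(99)
dat <- data.frame(
  group = rbinom(50, size = 1, prob = 0.5),  # dummy variable (0 or 1)
  score = rnorm(50, mean = 100, sd = 15),
  age = sample(20:60, size = 50, replace = TRUE)
)

# Flag two-valued variables and standardize only the others
two_valued <- vapply(dat, has_two_values, logical(1))
to_scale <- names(dat)[!two_valued]

dat_std <- dat
dat_std[to_scale] <- lapply(dat[to_scale], function(v) as.numeric(scale(v)))

print(two_valued)
print(head(dat_std))
```

Leaving the dummy variable as 0 and 1 keeps its coefficient interpretable as a difference between the two groups.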
In structural equation modeling, there are situations in which standardizing all variables is not appropriate, or in which standardization needs to be done before forming product terms. We are not aware of tools that can do appropriate standardization and form confidence intervals that take into account the selective standardization. By promoting the use of betas-select using lav_betaselect(), we hope to make it easier for researchers to do appropriate standardization when reporting SEM results.