power4test | R Documentation |
An all-in-one function that receives a model specification, generates datasets, fits a model, does the target test, and returns the test results.
power4test(
object = NULL,
nrep = NULL,
ptable = NULL,
model = NULL,
pop_es = NULL,
standardized = TRUE,
n = NULL,
number_of_indicators = NULL,
reliability = NULL,
x_fun = list(),
e_fun = list(),
process_data = NULL,
fit_model_args = list(),
R = NULL,
ci_type = "mc",
gen_mc_args = list(),
gen_boot_args = list(),
test_fun = NULL,
test_args = list(),
map_names = c(fit = "fit"),
results_fun = NULL,
results_args = list(),
test_name = NULL,
test_note = NULL,
do_the_test = TRUE,
sim_all = NULL,
iseed = NULL,
parallel = FALSE,
progress = TRUE,
ncores = max(1, parallel::detectCores(logical = FALSE) - 1),
es1 = c(n = 0, nil = 0, s = 0.1, m = 0.3, l = 0.5, si = 0.141, mi = 0.361, li = 0.51),
es2 = c(n = 0, nil = 0, s = 0.05, m = 0.1, l = 0.15),
es_ind = c("si", "mi", "li"),
n_std = 1e+05,
std_force_monte_carlo = FALSE
)
## S3 method for class 'power4test'
print(
x,
what = c("data", "test"),
digits = 3,
digits_descriptive = 2,
data_long = FALSE,
test_long = FALSE,
fit_to_all_args = list(),
...
)
object |
Optional. If set to a
|
nrep |
The number of replications
to generate the simulated datasets.
Default is |
ptable |
The output of
|
model |
The |
pop_es |
The character vector or
multiline string to
specify population effect sizes
(population values of parameters). See
the help page on how to specify this
argument.
Ignored if |
standardized |
Logical. If
|
n |
The sample size for each dataset. Default is 100. |
number_of_indicators |
A named
vector to specify the number of
indicators for each factors. See
the help page on how to set this
argument. Default is |
reliability |
A named vector
(for a single-group model) or a
named list of named vectors
(for a multigroup model)
to set the reliability coefficient
of each set of indicators. Default
is |
x_fun |
The function(s) used to
generate the exogenous variables or
error terms. If
not supplied, or set to |
e_fun |
The function(s) used to
generate the error terms of indicators,
if any. If
not supplied, or set to |
process_data |
If not |
fit_model_args |
A list of the
arguments to be passed to |
R |
The number of replications
to generate the Monte Carlo or
bootstrapping estimates
for each fit output. No Monte Carlo
nor bootstrapping
estimates will be generated if |
ci_type |
The type of
simulation-based confidence
intervals to use. Can be either
|
gen_mc_args |
A list of
arguments to be passed to
|
gen_boot_args |
A list of
arguments to be passed to
|
test_fun |
A function to do the
test. See 'Details' for the requirement
of this function. There are some
built-in test functions in |
test_args |
A list of arguments
to be passed to the |
map_names |
A named character
vector specifying how the content of
the element |
results_fun |
The function to be
used to extract the test results.
See |
results_args |
A list of
arguments to be passed to the
|
test_name |
String. The name
of the test. Default is |
test_note |
String. An optional
note for the test, stored in the
attribute |
do_the_test |
If |
sim_all |
If set to either a
|
iseed |
The seed for the random
number generator. Default is |
parallel |
If |
progress |
If |
ncores |
The number of CPU cores to use if parallel processing is used. |
es1 |
Set the
values for each label of the effect
size (population value) for correlations and regression
paths.
Used only if |
es2 |
Set the
values for each label of the effect
size (population value) for product term.
Used only if |
es_ind |
The names of labels denoting the effect size of an indirect effect. They will be used to determine the population values of the component paths along an indirect path. |
n_std |
The sample size used to
determine the error variances by
simulation when |
std_force_monte_carlo |
Logical.
If |
x |
The object to be printed. |
what |
A string vector of
what to print, |
digits |
The numbers of digits displayed after the decimal. |
digits_descriptive |
The number of digits displayed after the decimal for the descriptive statistics table. |
data_long |
If |
test_long |
If |
fit_to_all_args |
A named list
of arguments to be passed to
|
... |
Optional arguments to be passed to other print methods |
The function power4test()
is an
all-in-one function for
estimating the power of a test for
a model, given the sample size
and effect sizes (population values
of model parameters).
An object of the class power4test
,
which is a list with two elements:
sim_all
: The output of sim_out()
.
test_all
: A named list of the
output of do_test()
. The names
are the values of test_name
.
This list can have more than one
test because a call to
power4test()
can add new tests
to a power4test
object.
The print
method of power4test
returns x
invisibly. Called for
its side effect.
This is the workflow:
If object
is an output
of the output
of a previous call to
power4test()
with do_the_test
set to FALSE
and so has only the
model and the simulated data,
the following steps
will be skipped and go directly to
doing the test.
Call sim_data()
to determine
the population model and
generate the datasets, using
arguments such as model
and
pop_es
.
Call fit_model()
to fit a
model to each of the datasets,
which is the population model
by default.
If R
is not NULL
and
ci_type = "mc"
, call
gen_mc()
to generate Monte
Carlo estimates using
manymome::do_mc()
. The estimates
can be used by supported functions
such as test_indirect_effect()
.
If R
is not NULL
and
ci_type = "boot"
, call
gen_boot()
to generate
bootstrap estimates using
manymome::do_boot()
. The estimates
can be used by supported functions
such as test_indirect_effect()
.
Merge the results into a
sim_out
object by calling
sim_out()
.
If do_the_test
is FALSE
,
skip the remaining steps and
return a power4test
object,
which contains only the data
generated and optionally the
Monte Carlo or bootstrap
estimates.
If do_the_test
is TRUE
, do
the test.
do_test()
will be called to do
the test in the fit output of
each dataset.
Return a power4test
object which
include the output of sim_out
and, if do_the_test
is TRUE
,
the output of do_test()
.
This function is to be used when users are interested only in the power of one or several tests on a particular aspect of the model, such as a parameter, given a specific effect sizes and sample sizes.
Detailed description on major arguments can be found in sections below.
NOTE: The technical internal workflow of
of power4test()
can be found in
this page: https://sfcheung.github.io/power4mome/articles/power4test_workflow.html.
The function power4test()
can also be used to
update a condition when only some
selected aspects is to be changed.
For example,
instead of calling this function with
all the arguments set just to change
the sample size, it can be called
by supplying an existing
power4test
object and set only
n
to a new sample size. The data
and the tests will be updated
automatically. See the examples for
an illustration.
The function power4test()
can also be used to
add a test to the output from a
previous call to power4test()
.
For example, after simulating the
datasets and doing one test,
the output can be set to object
of power4test()
, and set only
test_fun
and, optionally,
test_fun_args
to do one more test
on the generated datasets. The output
will be the original
object with the results of the new
test added. See the examples for
an illustration.
For power analysis, usually, the
population model (model
) is to be
fitted, and there is no need to
set fit_model_args
.
If power analysis is to be conducted
for fitting a model that is not the
population model, of if non-default
settings are desired when fitting
a model, then the argument fit_model_args
needed to be set to customize the
call to fit_model()
.
For example,
users may want to examine the power
of a test when a misspecified model
is fitted, or the power of a test
when MLR is used as the estimator
when calling lavaan::sem()
.
See the help page of fit_model()
for some examples.
For a single-group model, model
should be a lavaan
model syntax
string of the form of the model.
The population values of the model
parameters are to be determined by
pop_es
.
If the model has latent factors,
the syntax in model
should specify
only the structural model for the
latent factors. There is no
need to specify the measurement
part. Other functions will generate
the measurement part on top of this
model.
For example, this is a simple mediation model:
"m ~ x y ~ m + x"
Whether m
, x
, and y
denote
observed variables or latent factors
are determined by other functions,
such as power4test()
.
Because the model is the population
model, equality constraints are
irrelevant and the model syntax
specifies only the form of the
model. Therefore, model
is
specified as in the case of single
group models.
The argument pop_es
is for specifying
the population values of model
parameters. This section describes
how to do this using named vectors.
If pop_es
is specified by a named
vector, it must follow the convention
below.
The names of the vectors are
lavaan
names for the selected
parameters. For example, m ~ x
denotes the path from x
to m
.
Alternatively, the names can be
either ".beta."
or ".cov."
.
Use ".beta."
to set the default
values for all regression coefficients.
Use ".cov."
to set the default
values for all correlations of
exogenous variables (e.g., predictors).
The names can also be of this form:
".ind.(<path>)"
, whether <path>
denote path in the model. For
example, ".ind.(x->m->y)"
denotes
the path from x
through m
to
y
. Alternatively, the lavaan
symbol ~
can also be used:
".ind.(y~m~x)"
. This form is used
to set the indirect effect (standardized,
by default) along this path. The
value for this name will override
other settings.
If using lavaan
names, can
specify more than one parameter
using +
. For example, y ~ m + x
denotes the two paths from m
and
x
to y
.
The value of each element can be
the label for the effect size: n
for nil, s
for small, m
for
medium, and l
for large. The
value for each label is determined
by es1
and es2
. See the section
on specifying these two arguments.
The value of pop_es
can also be
set to a value, but it must be
quoted as a string, such as "y ~ x" = ".31"
.
This is an example:
c(".beta." = "s", "m1 ~ x" = "-m", "m2 ~ m1" = "l", "y ~ x:w" = "s")
In this example,
All regression coefficients are
set to small (s
) by default, unless
specified otherwise.
The path from x
to m1
is
set to medium and negative (-m
).
The path from m1
to m2
is set
to large (l
).
The coefficient of the product
term x:w
when predicting y
is
set to small (s
).
When setting an indirect effect to
a symbol (default: "si"
, "mi"
,
"li"
, with "i"
added to differentiate
them from the labels for a direct path),
the corresponding value is used to
determine the population values of
all component paths along the pathway.
All the values are assumed to be equal.
Therefore, ".ind.(x->m->y)" = ".20"
is equivalent to setting m ~ x
and y ~ m
to the square root of
.20, such that the corresponding
indirect effect is equal to the
designated value.
This behavior, though restricted, is for quick manipulation of the indirect effect. If different values along a pathway, set the value for each path directly.
Only nonnegative value is supported.
Therefore, ".ind.(x->m->y)" = "-si"
and ".ind.(x->m->y)" = "-.20"
will
throw an error.
The argument pop_es
also supports multigroup
models.
For pop_es
, instead of
named vectors, named list of
named vectors should be used.
The names are the parameters, or
keywords such as .beta.
and
.cov.
, like specifying the
population values for a single
group model.
The elements are character vectors. If it has only one element (e.g., a single string), then it is the the population value for all groups. If it has more than one element (e.g., a vector of three strings), then they are the population values of the groups. For a model of k groups, each vector must have either k elements or one element.
This is an example:
list("m ~ x" = "m", "y ~ m" = c("s", "m", "l"))
In this model, the population value
of the path m ~ x
is medium (m
) for
all groups, while the population
values for the path y ~ m
are
small (s
), medium (m
), and large (l
),
respectively.
When setting the argument pop_es
,
instead of using a named vector
or named list for
pop_es
, the population values of
model parameters can also be
specified using a multiline string,
as illustrated below, to be parsed
by pop_es_yaml()
.
This is an example of the multiline string for a single-group model:
y ~ m: l m ~ x: m y ~ x: nil
The string must follow this format:
Each line starts with tag:
.
tag
can be the name of a
parameter, in lavaan
model
syntax format.
For example, m ~ x
denotes the path from x
to m
.
A tag in lavaan
model syntax can
specify more than one parameter
using +
.
For example, y ~ m + x
denotes the two paths from m
and
x
to y
.
Alternatively, the tag
can be
either .beta.
or .cov.
.
Use .beta.
to set the default
values for all regression coefficients.
Use .cov.
to set the default
values for all correlations of
exogenous variables (e.g., predictors).
After each tag is the value of the population value:
-nil
for nil (zero),
s
for small,
m
for medium, and
l
for large.
si
, mi
, and li
for
small, medium, and large a
standardized indirect effect,
respectively.
Note: n
cannot be used in this mode.
The
value for each label is determined
by es1
and es2
as described
in ptable_pop()
.
The value can also be
set to a numeric value, such as
.30
or -.30
.
This is another example:
.beta.: s y ~ m: l
In this example, all regression
coefficients are small
, while
the path from m
to y
is large.
This is an example of the string for a multigroup model:
y ~ m: l m ~ x: - nil - s y ~ x: nil
The format is similar to that for
a single-group model. If a parameter
has the same value for all groups,
then the line can be specified
as in the case of a single-group
model: tag: value
.
If a parameter has different values across groups, then it must be in this format:
A line starts with the tag, followed
by two or more lines. Each line
starts with a hyphen -
and the
value for a group.
For example:
m ~ x: - nil - s
This denotes that the model has
two groups. The values of the path
from x
to m
for the two
groups are 0 (nil
) and
small (s
), respectively.
Another equivalent way to specify
the values are using []
, on
the same line of a tag.
For example:
m ~ x: [nil, s]
The number of groups is inferred from the number of values for a parameter. Therefore, if a tag has more than one value, each tag must has the same number of value, or only one value.
The tag .beta.
and .cov.
can
also be used for multigroup models.
Note that using named vectors or named lists is more reliable. However, using a multiline string is more user-friendly. If this method failed, please use named vectors or named list instead.
The multiline string is parsed by yaml::read_yaml()
.
Therefore, the format requirement
is actually that of YAML. Users
knowledgeable of YAML can use other
equivalent way to specify the string.
The vector es1
is for correlations,
regression coefficients, and
indirect effect, and the
vector es2
is for for standardized
moderation effect, the coefficients
of a product term. These labels
are to be used in interpreting
the specification in pop_es
.
The arguments number_of_indicators
and reliability
are used to
specify the number of indicators
(e.g., items) for each factor,
and the population reliability
coefficient of each factor,
if the variables in the model
syntax are latent variables.
If a variable in the model is to be
replaced by indicators in the generated
data, set
number_of_indicators
to a named
numeric vector. The names are the
variables of variables with
indicators, as appeared in the
model
syntax. The value of each
name is the number of indicators.
The
argument reliability
should then be
set a named numeric vector (or list,
see the section on multigroup models)
to specify the population reliability
coefficient ("omega") of each set of
indicators. The population standardized factor
loadings are then computed to ensure
that the population reliability
coefficient is of the target value.
These are examples for a single group model:
number of indicator = c(m = 3, x = 4, y = 5)
The numbers of indicators for m
,
x
, and y
are 3, 4, and 5,
respectively.
reliability = c(m = .90, x = .80, y = .70)
The population reliability
coefficients of m
, x
, and y
are
.90, .80, and .70, respectively.
Multigroup models are supported.
The number of groups is inferred
from pop_es
(see the help page
of ptable_pop()
), or directly
from ptable
.
For a multigroup model, the number of indicators for each variable must be the same across groups.
However, the population reliability
coefficients can be different
across groups. For a multigroup model
of k groups,
with one or more population reliability
coefficients differ across groups,
the argument reliability
should be
set to a named list. The names are
the variables to which the population
reliability coefficients are to be
set. The element for each name is
either a single value for the common
reliability coefficient, or a
numeric vector of the reliability
coefficient of each group.
This is an example of reliability
for a model with 2 groups:
reliability = list(x = .80, m = c(.70, .80))
The reliability coefficients of x
are
.80 in all groups, while the
reliability coefficients of m
are
.70 in one group and .80 in another.
If all variables in the model has
the same number of indicators,
number_of_indicators
can be set
to one single value.
Similarly, if all sets of indicators
have the same population reliability
in all groups, reliability
can also
be set to one single value.
By default, variables and error
terms are generated
from a multivariate normal distribution.
If desired, users can supply the
function used to generate an exogenous
variable and error term by setting x_fun
to
a named list.
The names of the list are the variables for which a user function will be used to generate the data.
Each element of the list must also be a list. The first element of this list, can be unnamed, is the function to be used. If other arguments need to be supplied, they should be included as named elements of this list.
For example:
x_fun = list(x = list(power4mome::rexp_rs), w = list(power4mome::rbinary_rs, p1 = .70)))
The variables x
and w
will be
generated by user-supplied functions.
For x
, the function is
power4mome::rexp_rs
. No additional
argument when calling this function.
For w
, the function is
power4mome::rbinary_rx
. The argument
p1 = .70
will be passed to this
function when generating the values
of w
.
If a variable is an endogenous
variable (e.g., being predicted by
another variable in a model), then
x_fun
is used to generate its
error term. Its implied population
distribution may still be different
from that generate by x_fun
because
the distribution also depends on the
distribution of other variables
predicting it.
These are requirements for the user-functions:
They must return a numeric vector.
They mush has an argument n
for
the number of values.
The population mean and standard deviation of the generated values must be 0 and 1, respectively.
The package power4mome
has
helper functions for generating
values from some common nonnormal
distributions and then scaling them
to have population mean and standard
deviation equal to 0 and 1 (by default), respectively.
These are some of them:
rbinary_rs()
.
rexp_rs()
.
rbeta_rs()
.
rlnorm_rs()
.
rpgnorm_rs()
.
To use x_fun
, the variables must
have zero covariances with other
variables in the population. It is
possible to generate nonnormal
multivariate data but we believe this
is rarely needed when estimating
power before having the data.
The function set to test_fun
,
the test function, usually
should work
on the output of lavaan::sem()
,
lmhelprs::many_lm()
, or
stats::lm()
, but can also be a
function that works on the output
of the function set to fit_function
when calling fit_model()
or
power4test()
(see fit_model_args
).
The function has two default requirements.
First, it has an argument fit
, to
be set to the output of
lavaan::sem()
or another output
stored in the element extra$fit
of
a replication in the sim_all
object. (This requirement can be
relaxed; see the section on map_names
.)
That is, the function definition
should be of this from: FUN(fit, ...)
. This is the form of all
test_*
functions in power4mome
.
If other arguments are to be passed
to the test function, they can be set
to test_args
as a named list.
Second, the test function must returns an output that (a) can be processed by the results function (see below), or (b) is of the required format for the output of a results function (see the next section). If it already returns an output of the required format, then there is no need to set the results function.
The test results will be extracted
from the output of test_fun
by the
function set to results_fun
,
the results function. If
the test_fun
already returns an
output of the expected format
(see below), then set results_fun
to NULL
, the default. The output
of test_fun
will be used for
estimating power.
The function set to results_fun
must accept the output of test_fun
,
as the first argument, and return a
named list (which can be a data frame)
or a named vector with some
of the following
elements:
est
: Optional. The estimate of a
parameter, if applicable.
se
: Optional. The standard error
of the estimate, if applicable.
cilo
: Optional. The lower limit of the
confidence interval, if applicable.
cihi
: Optional. The upper limit of the
confidence interval, if applicable.
sig
: Required. If 1
, the test is
significant. If 0
, the test is not
significant. If the test cannot be
done for any reason, it should be
NA
.
The results can then be used to estimate the power or Type I error of the test.
For example, if
the null hypothesis is false, then
the proportion of significant, that
is, the mean of the values of sig
across replications, is the power.
The package power4mome
has some ready-to-use
test functions:
test_indirect_effect()
test_cond_indirect()
test_cond_indirect_effects()
test_moderation()
test_index_of_mome()
test_parameters()
Please refer to their help pages for examples.
This argument is for developers using
a test function that has a different
name for the argument of the fit
object ("fit"
, the default).
If test_fun
is set to a function
that works on an output of, say,
lavaan::sem()
but the argument name
for the output is not fit
, the
mapping can be changed by
map_names
.
For example,
lavaan::parameterEstimates()
receives an output of lavaan::sem()
and reports the test results of model
parameters. However, the argument
name for the lavaan
output is
object.
To instruct do_test()
to
do the test correctly when setting
test_fun
to
lavaan::parameterEstimates
, add
map_names = c(object = "fit")
. The
element fit
in a replication will
then set to the argument object
of
lavaan::parameterEstimates()
.
# Specify the model
model_simple_med <-
"
m ~ x
y ~ m + x
"
# Specify the population values
model_simple_med_es <-
"
m ~ x: m
y ~ m: l
y ~ x: n
"
# Set nrep to a large number in real analysis, such as 400
# Set `parallel` to TRUE for faster, usually much faster, analysis
# Set `progress` to TRUE to display the progress of the analysis
out <- power4test(nrep = 10,
model = model_simple_med,
pop_es = model_simple_med_es,
n = 100,
test_fun = test_parameters,
test_args = list(pars = "m~x"),
iseed = 1234,
parallel = FALSE,
progress = TRUE)
print(out,
test_long = TRUE)
# Change the sample size
out1 <- power4test(out,
n = 200,
iseed = 1234,
parallel = FALSE,
progress = TRUE)
print(out1,
test_long = TRUE)
# Add one more test
out2 <- power4test(out,
test_fun = test_parameters,
test_args = list(pars = "y~x"),
parallel = FALSE,
progress = TRUE)
print(out2,
test_long = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.