View source: R/test_performance.R
test_bf | R Documentation |
Testing whether models are "different" in terms of accuracy or explanatory power is a delicate and often complex procedure, with many limitations and prerequisites. Moreover, many tests exist, each with its own interpretation, strengths, and weaknesses.
The test_performance()
function runs the most relevant and appropriate
tests based on the type of input (for instance, whether the models are
nested or not). However, it still requires the user to understand what the
tests are and what they do in order to prevent their misinterpretation. See
the Details section for more information regarding the different tests
and their interpretation.
test_bf(...)
## Default S3 method:
test_bf(..., reference = 1, text_length = NULL)
test_likelihoodratio(..., estimator = "ML", verbose = TRUE)
test_lrt(..., estimator = "ML", verbose = TRUE)
test_performance(..., reference = 1, verbose = TRUE)
test_vuong(..., verbose = TRUE)
test_wald(..., verbose = TRUE)
... |
Multiple model objects. |
reference |
This only applies when models are non-nested, and determines which model should be taken as a reference, against which all the other models are tested. |
text_length |
Numeric, length (number of characters) of output lines. |
estimator |
Applied when comparing regression models using test_likelihoodratio(). |
verbose |
Toggle warning and messages. |
Model "nesting" is an important concept in model comparison. Indeed, many
tests only make sense when the models are "nested", i.e., when their
predictors are nested. This means that all the fixed effects predictors of
a model are contained within the fixed effects predictors of a larger model
(sometimes referred to as the encompassing model). For instance,
model1 (y ~ x1 + x2) is "nested" within model2 (y ~ x1 + x2 + x3). Usually,
people have a list of nested models, for instance m1 (y ~ 1), m2 (y ~ x1),
m3 (y ~ x1 + x2), m4 (y ~ x1 + x2 + x3), and it is conventional
that they are "ordered" from the smallest to the largest, but it is up to the
user to reverse the order from largest to smallest. The test then shows
whether using a more parsimonious model, or adding a predictor, results in
a significant difference in the model's performance. In this case, models are
usually compared sequentially: m2 is tested against m1, m3 against m2,
m4 against m3, etc.
Two models are considered "non-nested" if their predictors are
different, for instance model1 (y ~ x1 + x2) and model2 (y ~ x3 + x4).
In the case of non-nested models, all models are usually compared
against the same reference model (by default, the first of the list).
Nesting is detected via the insight::is_nested_models() function.
Note that, apart from the nesting, in order for the tests to be valid,
other requirements often have to be fulfilled. For instance, outcome
variables (the response) must be the same. You cannot meaningfully test
whether apples are significantly different from oranges!
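As a rough illustration of the nesting idea described above, the following sketch uses base R only. The helper terms_nested() is hypothetical (it is not part of performance or insight) and only compares term labels, which is a simplification of whatever insight::is_nested_models() actually checks. The call to anova() then performs the sequential comparison (m2 against m1, m3 against m2, and so on).

```r
# Hypothetical helper (not from any package): TRUE if every term of
# `small` also appears among the terms of `large`.
terms_nested <- function(small, large) {
  all(attr(terms(small), "term.labels") %in% attr(terms(large), "term.labels"))
}

# An ordered chain of nested models, from smallest to largest
m1 <- lm(Sepal.Length ~ 1, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m3 <- lm(Sepal.Length ~ Petal.Width + Petal.Length, data = iris)
m4 <- lm(Sepal.Length ~ Petal.Width + Petal.Length + Sepal.Width, data = iris)
models <- list(m1, m2, m3, m4)

# Check that each model is nested within the next one
nested_chain <- vapply(
  seq_len(length(models) - 1),
  function(i) terms_nested(models[[i]], models[[i + 1]]),
  logical(1)
)
nested_chain

# Two models with different predictors are not nested in this sense
non_nested <- terms_nested(
  lm(Sepal.Length ~ Petal.Width, data = iris),
  lm(Sepal.Length ~ Species, data = iris)
)
non_nested

# Base R compares the nested chain sequentially, one row per model
seq_tests <- anova(m1, m2, m3, m4)
seq_tests
```

The response (Sepal.Length) is the same in every model, in line with the requirement that outcome variables must be identical.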
The estimator is relevant when comparing regression models using
test_likelihoodratio(). If estimator = "OLS", then it uses the same
method as anova(..., test = "LRT") implemented in base R, i.e., scaling
by n-k (the unbiased OLS estimator) and using this estimator under the
alternative hypothesis. If estimator = "ML", which is for instance used
by lrtest(...) in package lmtest, the scaling is done by n (the
biased ML estimator) and the estimator under the null hypothesis. In
moderately large samples, the differences should be negligible, but it
is possible that OLS would perform slightly better in small samples with
Gaussian errors. For estimator = "REML", the LRT is based on the REML-fit
log-likelihoods of the models. Note that not all types of estimators are
available for all model classes.
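To make the difference between the two scalings more concrete, here is a minimal sketch in base R for two nested linear models. It reproduces the chi-squared statistic of anova(..., test = "LRT") by hand (the "OLS" flavour) and computes the classical likelihood ratio statistic from logLik() (the "ML" flavour). The variable names are illustrative, and the sketch makes no claim about how test_likelihoodratio() is implemented internally.

```r
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)

n    <- nobs(m2)
rss0 <- deviance(m1)  # residual sum of squares of the smaller model
rss1 <- deviance(m2)  # residual sum of squares of the larger model
df   <- df.residual(m1) - df.residual(m2)  # number of added parameters

# "OLS" flavour: scale the drop in RSS by the unbiased variance estimate
# (RSS / (n - k)) of the larger model
sigma2_ols <- rss1 / df.residual(m2)
chi2_ols   <- (rss0 - rss1) / sigma2_ols
p_ols      <- pchisq(chi2_ols, df = df, lower.tail = FALSE)

# "ML" flavour: twice the difference of the ML log-likelihoods,
# which for Gaussian linear models equals n * log(rss0 / rss1)
chi2_ml <- as.numeric(2 * (logLik(m2) - logLik(m1)))
p_ml    <- pchisq(chi2_ml, df = df, lower.tail = FALSE)

# Reference result from base R, for comparison with the "OLS" flavour
base_lrt <- anova(m1, m2, test = "LRT")

c(chi2_ols = chi2_ols, chi2_ml = chi2_ml, p_ols = p_ols, p_ml = p_ml)
```

With 150 observations the two statistics are expected to be close, which matches the remark that the differences are negligible in moderately large samples.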
When estimator = "ML", which is the default for linear mixed models (unless
they share the same fixed effects), values from information criteria (AIC,
AICc) are based on the ML-estimator, while the default behaviour of AIC()
may be different (in particular for linear mixed models from lme4, which
sets REML = TRUE). This default in test_likelihoodratio() is intentional,
because comparing information criteria based on REML fits requires the same
fixed effects for all models, which is often not the case. Thus, while
anova.merMod() automatically refits all models to REML when performing an
LRT, test_likelihoodratio() checks if a comparison based on REML fits is
indeed valid, and if so, uses REML as default (else, ML is the default).
Set the estimator argument explicitly to override the default behaviour.
Bayes factor for Model Comparison - test_bf(): If all
models were fit from the same data, the returned BF shows the Bayes
Factor (see bayestestR::bayesfactor_models()) for each model against
the reference model (which depends on whether the models are nested or
not). Check out this vignette for more details.
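For intuition about what a Bayes factor against a reference model looks like, the sketch below uses the common BIC approximation, BF ≈ exp((BIC_reference - BIC_model) / 2), computed with base R. That test_bf() arrives at comparable values for frequentist models is an assumption here and should be checked against the bayestestR documentation; the names bic, reference and bf_approx are illustrative.

```r
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)
m3 <- lm(Sepal.Length ~ Petal.Width * Species, data = iris)

# BIC of each model (all fitted to the same data)
bic <- c(m1 = BIC(m1), m2 = BIC(m2), m3 = BIC(m3))

# Index of the reference model, against which all others are compared
reference <- 1

# BIC approximation to the Bayes factor of each model vs. the reference;
# values above 1 favour the model, values below 1 favour the reference
bf_approx <- exp((bic[[reference]] - bic) / 2)
bf_approx
```

By construction, the reference model has an approximate Bayes factor of 1 against itself.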
Wald's F-Test - test_wald(): The Wald test is a rough
approximation of the Likelihood Ratio Test. However, it is more applicable
than the LRT: you can often run a Wald test in situations where no other
test can be run. Importantly, this test only makes statistical sense if the
models are nested.
Note: this test is also available in base R
through the anova() function. It returns an F-value column
as a statistic and its associated p-value.
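The F statistic reported by anova() for two nested linear models can be reproduced by hand, which may help when reading the output. This is a minimal sketch using base R only; the variable names are illustrative.

```r
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)

rss0 <- deviance(m1)  # residual sum of squares of the smaller model
rss1 <- deviance(m2)  # residual sum of squares of the larger model
q    <- df.residual(m1) - df.residual(m2)  # number of added parameters

# F = (drop in RSS per added parameter) / (residual variance of larger model)
f_stat <- ((rss0 - rss1) / q) / (rss1 / df.residual(m2))
p_f    <- pf(f_stat, df1 = q, df2 = df.residual(m2), lower.tail = FALSE)

# Reference result from base R
base_f <- anova(m1, m2)

c(F = f_stat, p = p_f)
```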
Likelihood Ratio Test (LRT) - test_likelihoodratio():
The LRT tests which model is a better (more likely) explanation of the
data. The LRT usually gives results close (if not equivalent) to those of
the Wald test and, similarly, only makes sense for
nested models. However, maximum likelihood tests make stronger assumptions
than method of moments tests like the F-test, and in turn are more
efficient. Agresti (1990) suggests that you should use the LRT instead of
the Wald test for small sample sizes (under or about 30) or if the
parameters are large.
Note: for regression models, this is similar to
anova(..., test="LRT") (on models) or lmtest::lrtest(...), depending
on the estimator argument. For lavaan models (SEM, CFA), the function
calls lavaan::lavTestLRT().
For models with transformed response variables (like log(x) or sqrt(x)),
logLik() returns a wrong log-likelihood. However, test_likelihoodratio()
calls insight::get_loglikelihood() with check_response=TRUE, which
returns a corrected log-likelihood value for models with transformed
response variables. Furthermore, since the LRT only accepts nested
models (i.e. models that differ in their fixed effects), the computed
log-likelihood is always based on the ML estimator, not on the REML fits.
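To show why the log-likelihood of a model with a transformed response needs a correction, the sketch below applies the standard change-of-variables (Jacobian) adjustment for a log-transformed response in base R. Whether insight::get_loglikelihood() uses exactly this adjustment internally is an assumption; the sketch only demonstrates the principle.

```r
y <- mtcars$mpg
m_raw <- lm(mpg ~ wt, data = mtcars)
m_log <- lm(log(mpg) ~ wt, data = mtcars)

# logLik() evaluates the density of log(y), not of y itself
ll_naive <- as.numeric(logLik(m_log))

# Jacobian adjustment for z = log(y): f_Y(y) = f_Z(log(y)) / y,
# hence log-likelihood on the y scale = naive value - sum(log(y))
ll_corrected <- ll_naive - sum(log(y))

# Cross-check: the same value from the log-normal density of y,
# using the ML estimate of the residual standard deviation
sigma_ml  <- sqrt(mean(residuals(m_log)^2))
ll_direct <- sum(dlnorm(y, meanlog = fitted(m_log), sdlog = sigma_ml, log = TRUE))

c(raw = as.numeric(logLik(m_raw)),
  log_naive = ll_naive,
  log_corrected = ll_corrected)
```

Only the corrected value is on the same scale as the log-likelihood of the untransformed model, so only those two are comparable.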
Vuong's Test - test_vuong(): Vuong's (1989) test can
be used both for nested and non-nested models, and actually consists of two
tests.
The Test of Distinguishability (the Omega2 column and
its associated p-value) indicates whether or not the models can possibly be
distinguished on the basis of the observed data. If its p-value is
significant, it means the models are distinguishable.
The Robust Likelihood Test (the LR column and its
associated p-value) indicates whether each model fits better than the
reference model. If the models are nested, then the test works as a robust
LRT. The code for this function is adapted from the nonnest2
package, and all credit goes to its authors.
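The idea behind the likelihood part of Vuong's test for non-nested models can be sketched with base R: the statistic is built from the pointwise (per-observation) log-likelihood differences between two models. The helper pointwise_ll() is hypothetical and only valid for Gaussian linear models; the p-value of the distinguishability test relies on a weighted sum of chi-squared variables and is not reproduced here. The numbers are not claimed to match the output of test_vuong().

```r
# Hypothetical helper: per-observation Gaussian log-likelihood of an lm,
# evaluated at the ML estimate of the residual standard deviation
pointwise_ll <- function(model) {
  sigma_ml <- sqrt(mean(residuals(model)^2))
  y <- model.response(model.frame(model))
  dnorm(y, mean = fitted(model), sd = sigma_ml, log = TRUE)
}

# Two non-nested models with the same response
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)   # reference model
m2 <- lm(Sepal.Length ~ Petal.Length, data = iris)

ll_diff <- pointwise_ll(m2) - pointwise_ll(m1)
n       <- length(ll_diff)

lr     <- sum(ll_diff)                         # log-likelihood ratio
omega2 <- mean((ll_diff - mean(ll_diff))^2)    # variance of the differences
z      <- lr / sqrt(n * omega2)                # Vuong's z statistic

# One-sided p-value for "m2 fits better than m1"
p_m2_better <- pnorm(z, lower.tail = FALSE)

c(LR = lr, omega2 = omega2, z = z, p = p_m2_better)
```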
A data frame containing the relevant indices.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.
Merkle, E. C., You, D., & Preacher, K. (2016). Testing non-nested structural equation models. Psychological Methods, 21, 151-163.
compare_performance()
to compare the performance indices of
many different models.
# Nested Models
# -------------
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)
m3 <- lm(Sepal.Length ~ Petal.Width * Species, data = iris)
test_performance(m1, m2, m3)
test_bf(m1, m2, m3)
test_wald(m1, m2, m3) # Equivalent to anova(m1, m2, m3)
# Equivalent to lmtest::lrtest(m1, m2, m3)
test_likelihoodratio(m1, m2, m3, estimator = "ML")
# Equivalent to anova(m1, m2, m3, test='LRT')
test_likelihoodratio(m1, m2, m3, estimator = "OLS")
if (require("CompQuadForm")) {
  test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2, nested=TRUE)
  # Non-nested Models
  # -----------------
  m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
  m2 <- lm(Sepal.Length ~ Petal.Length, data = iris)
  m3 <- lm(Sepal.Length ~ Species, data = iris)
  test_performance(m1, m2, m3)
  test_bf(m1, m2, m3)
  test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2)
}
# Tweak the output
# ----------------
test_performance(m1, m2, m3, include_formula = TRUE)
# SEM / CFA (lavaan objects)
# --------------------------
# Lavaan Models
if (require("lavaan")) {
  structure <- " visual =~ x1 + x2 + x3
                 textual =~ x4 + x5 + x6
                 speed =~ x7 + x8 + x9
                 visual ~~ textual + speed "
  m1 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
  structure <- " visual =~ x1 + x2 + x3
                 textual =~ x4 + x5 + x6
                 speed =~ x7 + x8 + x9
                 visual ~~ 0 * textual + speed "
  m2 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
  structure <- " visual =~ x1 + x2 + x3
                 textual =~ x4 + x5 + x6
                 speed =~ x7 + x8 + x9
                 visual ~~ 0 * textual + 0 * speed "
  m3 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
  test_likelihoodratio(m1, m2, m3)
  # Different Model Types
  # ---------------------
  if (require("lme4") && require("mgcv")) {
    m1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
    m2 <- lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
    m3 <- gam(Sepal.Length ~ s(Petal.Length, by = Species) + Species, data = iris)
    test_performance(m1, m2, m3)
  }
}