```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This vignette demonstrates how to perform equivalence testing for F-tests in ANOVA models using the TOSTER package. While traditional null hypothesis significance testing (NHST) helps determine if effects are different from zero, equivalence testing allows you to determine if effects are small enough to be considered practically equivalent to zero or meaningfully similar.
For an open access tutorial paper explaining how to set equivalence bounds, and how to perform and report equivalence testing for ANOVA models, see @Campbell_2021. These functions are designed for omnibus tests, and additional testing may be necessary for specific comparisons between groups or conditions^[Russ Lenth's emmeans R package has some capacity for equivalence testing on the marginal means (i.e., a form of pairwise testing). See the emmeans package vignettes for details].
Statistical equivalence testing (or "omnibus non-inferiority testing" as described by Campbell & Lakens, 2021) for F-tests is based on the cumulative distribution function of the non-central F distribution.
These tests answer the question: "Can we reject the hypothesis that the total proportion of variance in outcome Y attributable to X is greater than or equal to the equivalence bound $\Delta$?"
The null and alternative hypotheses for equivalence testing with F-tests are:
$$
\begin{aligned}
H_0 &: \eta^2_p \geq \Delta \quad \text{(Effect is meaningfully large)} \\
H_1 &: \eta^2_p < \Delta \quad \text{(Effect is practically equivalent to zero)}
\end{aligned}
$$
Where $\eta^2_p$ is the partial eta-squared value (proportion of variance explained) and $\Delta$ is the equivalence bound.
Campbell & Lakens (2021) calculate the p-value for a one-way ANOVA as:
$$ p = p_F(F; J-1, N-J, \frac{N \cdot \Delta}{1-\Delta}) $$
In TOSTER, we use a more generalized approach that can be applied to a variety of designs, including factorial ANOVA. The non-centrality parameter (ncp = $\lambda$) is calculated with the equivalence bound and the degrees of freedom:
$$ \lambda_{eq} = \frac{\Delta}{1 - \Delta} \cdot (df_1 + df_2 + 1) $$
The p-value for the equivalence test ($p_{eq}$) can then be calculated from traditional ANOVA results and the distribution function:
$$ p_{eq} = p_F(F; df_1, df_2, \lambda_{eq}) $$
Where:

- $F$ is the observed F-statistic,
- $df_1$ and $df_2$ are the numerator and denominator degrees of freedom,
- $p_F$ is the cumulative distribution function of the non-central F distribution.
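To make the calculation concrete, here is a minimal sketch that reproduces the equivalence p-value by hand with `stats::pf()`. The numbers match the `InsectSprays` example used later in this vignette (F = 34.70228, df1 = 5, df2 = 66, bound of 0.35); this is an illustration of the formula above, not TOSTER's internal code.

```r
# Observed test statistic, degrees of freedom, and equivalence bound
Fstat <- 34.70228
df1   <- 5
df2   <- 66
Delta <- 0.35

# Non-centrality parameter at the equivalence bound
lambda_eq <- Delta / (1 - Delta) * (df1 + df2 + 1)

# Equivalence p-value: lower tail of the non-central F distribution
p_eq <- pf(Fstat, df1, df2, ncp = lambda_eq)
p_eq
```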
First, let's load the TOSTER package and examine our example dataset:
```r
library(TOSTER)

# Get data
data("InsectSprays")

# Look at the data structure
head(InsectSprays)
```
The `InsectSprays` dataset contains counts of insects in agricultural experimental units treated with different insecticides. Let's first run a traditional ANOVA to examine the effect of spray type on insect counts.
```r
# Build ANOVA
aovtest = aov(count ~ spray, data = InsectSprays)

# Display overall results
knitr::kable(broom::tidy(aovtest), caption = "Traditional ANOVA Test")
```
From the initial analysis, we can see a clear statistically significant effect of the factor `spray` (p < 0.001). The F-statistic is 34.7 with 5 and 66 degrees of freedom.
Now, let's perform an equivalence test using the `equ_ftest()` function. This function requires the F-statistic, the numerator and denominator degrees of freedom, and the equivalence bound.
For this example, we'll set the equivalence bound to a partial eta-squared of 0.35. This means we're testing the null hypothesis that $\eta^2_p \geq 0.35$ against the alternative that $\eta^2_p < 0.35$.
```r
equ_ftest(Fstat = 34.70228,
          df1 = 5,
          df2 = 66,
          eqbound = 0.35)
```
Looking at the results, the estimated partial eta-squared (approximately 0.72) is well above the equivalence bound of 0.35, and the equivalence test is clearly non-significant. In essence, we reject the traditional null hypothesis of "no effect" but fail to reject the null hypothesis of the equivalence test. This can be taken as an indication of a meaningfully large effect.
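For reference, the partial eta-squared reported by `equ_ftest()` can be recovered directly from the F-statistic and its degrees of freedom using the standard conversion; a quick check:

```r
# Convert an observed F-statistic to partial eta-squared:
# pes = (df1 * F) / (df1 * F + df2)
pes <- (5 * 34.70228) / (5 * 34.70228 + 66)
pes  # approximately 0.72, well above the equivalence bound of 0.35
```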
If you're doing all your analyses in R, you can use the `equ_anova()` function, which accepts objects produced by `stats::aov()`, `car::Anova()`, and `afex::aov_car()` (or any ANOVA from `afex`).
```r
equ_anova(aovtest, eqbound = 0.35)
```
The `equ_anova()` function conveniently provides a data frame with results for all effects in the model, including the traditional p-value (`p.null`), the estimated partial eta-squared (`pes`), and the equivalence test p-value (`p.equ`).
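Because the output is a plain data frame, it is easy to post-process. As a hypothetical example (using the column names `pes` and `p.equ` described above), you could flag which effects pass the equivalence test at alpha = 0.05:

```r
# Store the equ_anova() results data frame
res <- equ_anova(aovtest, eqbound = 0.35)

# Flag effects whose equivalence test is significant at alpha = 0.05
res$equivalent <- res$p.equ < 0.05
res
```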
You can also perform minimal effect testing (MET) instead of equivalence testing by setting `MET = TRUE`. This reverses the hypotheses:

$$
\begin{aligned}
H_0 &: \eta^2_p \leq \Delta \quad \text{(Effect is not meaningfully large)} \\
H_1 &: \eta^2_p > \Delta \quad \text{(Effect is meaningfully large)}
\end{aligned}
$$
Let's see how to use this option:
```r
equ_anova(aovtest, eqbound = 0.35, MET = TRUE)
```
In this case, the minimal effect test is significant (p < 0.001), indicating that the effect is larger than our bound of 0.35 and can be considered meaningfully large.
TOSTER provides a function to visualize the uncertainty around partial eta-squared estimates through consonance plots. The `plot_pes()` function requires the F-statistic, numerator degrees of freedom, and denominator degrees of freedom.
```r
plot_pes(Fstat = 34.70228, df1 = 5, df2 = 66)
```
The plots show:

- **Top plot (confidence curve):** the relationship between p-values and parameter values. The y-axis shows p-values, while the x-axis shows possible values of partial eta-squared. The horizontal lines represent different confidence levels.
- **Bottom plot (consonance density):** the distribution of plausible values for partial eta-squared. The peak of this distribution represents the value most compatible with the observed data.
These visualizations help you understand the precision of your effect size estimate and the range of plausible values.
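Conceptually, the confidence curve traces, for each candidate value of partial eta-squared, how compatible the observed F-statistic is with the corresponding non-central F distribution. The sketch below computes a 90% interval this way by inverting the $\lambda$ formula from earlier; the helper function and its name are our own illustration, not TOSTER code.

```r
# Sketch: a 90% CI for partial eta-squared via the non-central F
# distribution, reusing lambda = pes/(1 - pes) * (df1 + df2 + 1).
pes_ci <- function(Fstat, df1, df2, level = 0.90) {
  alpha <- 1 - level
  # Find the ncp at which the observed F sits at a given tail probability;
  # pf() decreases in ncp, so a root exists on a wide enough interval
  ncp_at <- function(p) {
    f <- function(ncp) pf(Fstat, df1, df2, ncp = ncp) - p
    if (f(0) < 0) return(0)  # boundary case: limit at zero
    uniroot(f, c(0, 1000))$root
  }
  lambda <- c(lower = ncp_at(1 - alpha / 2), upper = ncp_at(alpha / 2))
  # Invert lambda = pes/(1 - pes) * (df1 + df2 + 1) to get pes
  lambda / (lambda + df1 + df2 + 1)
}

pes_ci(Fstat = 34.70228, df1 = 5, df2 = 66)
```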
Let's create a second example with a smaller effect to demonstrate the other possible outcome:
```r
# Simulate data with a small effect
set.seed(123)
groups <- factor(rep(1:3, each = 30))
y <- rnorm(90) + rep(c(0, 0.3, 0.3), each = 30)
small_aov <- aov(y ~ groups)

# Traditional ANOVA
knitr::kable(broom::tidy(small_aov), caption = "Traditional ANOVA Test (Small Effect)")

# Equivalence test
equ_anova(small_aov, eqbound = 0.15)

# Visualize
plot_pes(Fstat = 2.36, df1 = 2, df2 = 87)
```
In this example:

1. The traditional ANOVA shows a marginally significant effect (p = 0.07).
2. The partial eta-squared (0.051) is smaller than our equivalence bound.
3. The equivalence test is significant (p < 0.05), indicating the effect is practically equivalent to zero (using our bound of 0.15).
This demonstrates how equivalence testing can help establish that effects are too small to be practically meaningful.
TOSTER includes a function to calculate power for equivalence F-tests. Given any two of the following, the `power_eq_f()` function can solve for the third:

- the sample size (via the error degrees of freedom, `df2`),
- the equivalence bound (`eqbound`),
- the statistical power.
To calculate the sample size needed for 80% power with a specific effect size and equivalence bound:
```r
power_eq_f(df1 = 2,        # Numerator df (groups - 1)
           df2 = NULL,     # Set to NULL to solve for sample size
           eqbound = 0.15, # Equivalence bound
           power = 0.8)    # Desired power
```
This tells us we need approximately 60 total participants (yielding df2 = 57) to achieve 80% power for detecting equivalence with a bound of 0.15.
To calculate the power for a given sample size and equivalence bound:
```r
power_eq_f(df1 = 2,        # Numerator df (groups - 1)
           df2 = 60,       # Error df (N - groups)
           eqbound = 0.15) # Equivalence bound
```
With 60 error degrees of freedom (about 63 total participants for 3 groups), we would have approximately 80% power to detect equivalence.
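As a cross-check, roughly the same power value can be approximated directly from the non-central F distribution. This sketch assumes the equivalence test rejects when $p_{eq} \leq \alpha$ (with $\alpha = 0.05$) and evaluates power at a true partial eta-squared of zero; it is our illustration, not TOSTER's internal code.

```r
# Design values for the three-group example above
df1 <- 2; df2 <- 60; Delta <- 0.15; alpha <- 0.05

# Non-centrality parameter at the equivalence bound
lambda_eq <- Delta / (1 - Delta) * (df1 + df2 + 1)

# Reject the equivalence null when the observed F falls below this cutoff
f_crit <- qf(alpha, df1, df2, ncp = lambda_eq)

# Power under a true effect of zero: probability a central F is below the cutoff
pf(f_crit, df1, df2)  # roughly 0.8
```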
To find the smallest equivalence bound detectable with 80% power given a sample size:
```r
power_eq_f(df1 = 2,      # Numerator df (groups - 1)
           df2 = 60,     # Error df (N - groups)
           power = 0.8)  # Desired power
```
With 60 error degrees of freedom, we could detect equivalence at a bound of approximately 0.145 with 80% power.
Selecting an appropriate equivalence bound is crucial and should be based on considerations such as:

- the smallest effect size of interest (SESOI) for your research question,
- effect sizes reported in previous research or meta-analyses,
- the practical or theoretical consequences of an effect of that magnitude.
Whatever your choice, it should be adjusted based on your specific research context and questions.
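If you are more used to thinking in terms of Cohen's f, the standard relationship $f^2 = \eta^2_p / (1 - \eta^2_p)$ offers one way to translate a familiar benchmark onto the partial eta-squared scale; a small sketch:

```r
# Convert a Cohen's f benchmark to the partial eta-squared scale:
# pes = f^2 / (1 + f^2)
f_to_pes <- function(f) f^2 / (1 + f^2)

# Cohen's conventional small/medium/large values for f
f_to_pes(c(small = 0.10, medium = 0.25, large = 0.40))
#> approximately 0.010, 0.059, and 0.138
```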
Equivalence testing for F-tests provides a valuable complement to traditional NHST by allowing researchers to establish evidence for the absence of meaningful effects. The TOSTER package offers user-friendly functions for:

- equivalence and minimal effect testing of omnibus F-tests (`equ_anova()` and `equ_ftest()`),
- visualizing uncertainty in partial eta-squared estimates (`plot_pes()`),
- power analysis for equivalence F-tests (`power_eq_f()`).
)By incorporating these tools into your analysis workflow, you can make more nuanced inferences about effect sizes and avoid the common pitfall of interpreting non-significant p-values as evidence for the absence of an effect.