```r
knitr::opts_chunk$set(echo = TRUE, include = TRUE, message = FALSE, warning = FALSE)

# Loading required packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, ggpubr, rstatix, palmerpenguins)
pacman::p_load_gh("A-Farina/pl462")
```
Today we will discuss the Analysis of Variance (ANOVA) in its simplest form.
What is a One-Way ANOVA?
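A one-way ANOVA compares how far the group means sit from each other (between-group variability) with how far individual scores sit from their own group mean (within-group variability), through the ratio F = MS_between / MS_within. As a minimal sketch, the code below computes that ratio by hand and cross-checks it against `anova(lm(...))`. It assumes base R's built-in `PlantGrowth` dataset (plant weight in three treatment groups) purely as a stand-in, because it needs no extra packages.

```r
# Sketch: one-way ANOVA F ratio computed by hand (base R only).
# PlantGrowth is a stand-in dataset shipped with R: weight ~ group (3 groups).
y <- PlantGrowth$weight
g <- PlantGrowth$group

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # mean of each group
group_sizes <- tapply(y, g, length)  # n of each group

# Between-groups SS: how far each group mean sits from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-groups SS: how far each observation sits from its own group mean
ss_within  <- sum((y - group_means[as.character(g)])^2)

df_between <- nlevels(g) - 1
df_within  <- length(y) - nlevels(g)

f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table
builtin <- anova(lm(weight ~ group, data = PlantGrowth))
c(manual_F = f_stat, builtin_F = builtin[["F value"]][1], p = p_val)
```

A large F means the group means differ by more than the within-group noise alone would predict.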
What are the statistical assumptions of an ANOVA?
What is a One-Way Repeated Measures ANOVA?
1. Load the Data
2. Understand the data (Distribution, Variability, Outliers)
    * Summary Statistics
    * Data Visualization
    * Outliers
3. Check Assumptions
    * Normality of Residuals or DV by group
    * Homogeneity (Equality) of Variance or Sphericity
    * Independent Samples
4. Conduct the Statistical Test
    * Run ANOVA
5. Conduct Post-Hoc Analyses as needed
    * Pairwise *t*-tests
    * One-Way ANOVAs
```r
# Loading the Data
penguins <- palmerpenguins::penguins # Loads the dataset called 'penguins' from the palmerpenguins package and assigns it the name "penguins"
?penguins         # Opens the help page with descriptive information about the dataset
glimpse(penguins) # Gives a quick view of the structure of the dataset
```
Research Question: We are interested in determining if different species of penguins have different flipper lengths.
What is the DV?
What is/are the IV(s)? How many levels do they have?
```r
penguins %>%
  group_by(species) %>%
  get_summary_stats(flipper_length_mm)
```
What are your observations given the descriptive statistics?
```r
# Boxplot
penguins %>%
  ggplot() +
  aes(x = species, y = flipper_length_mm) +
  geom_boxplot() +
  geom_jitter(alpha = 0.2, width = .1)

# Continuous Density Plot
penguins %>%
  ggplot() +
  aes(x = flipper_length_mm, fill = species) +
  geom_density() +
  facet_grid(species ~ .)
```
What are your observations (distribution, variability) given these data visualizations?
```r
penguins %>%
  group_by(species) %>%
  rstatix::identify_outliers(flipper_length_mm) # Produces a table of outliers and extreme cases
```
Are there any outliers in these data?
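`identify_outliers()` flags points with the boxplot (IQR) rule: values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are outliers, and values beyond 3×IQR are extreme. The sketch below reimplements that rule in base R, assuming it matches the rstatix definition. `flag_outliers()` is a hypothetical helper written for illustration, and `PlantGrowth` is a built-in stand-in dataset.

```r
# Sketch of the boxplot (IQR) rule for flagging outliers, base R only.
# Assumes the conventional fences: 1.5 x IQR for outliers, 3 x IQR for extremes.
flag_outliers <- function(x) {  # hypothetical helper
  q1  <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  q3  <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  iqr <- q3 - q1
  data.frame(
    value      = x,
    is_outlier = x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr,
    is_extreme = x < q1 - 3 * iqr   | x > q3 + 3 * iqr
  )
}

# Apply the rule within each group, as group_by() does in the pipeline
by_group <- split(PlantGrowth$weight, PlantGrowth$group)
flags <- lapply(by_group, flag_outliers)
lapply(flags, function(d) d[d$is_outlier, ])  # rows flagged in each group
```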
a. Analyzing the ANOVA model residuals. This approach is generally easier to do and helpful if there are many groups. This is the approach we will take.
```r
# Assumption testing - Normality of Residuals - QQ Plot
model <- lm(flipper_length_mm ~ species, data = penguins)
ggqqplot(residuals(model)) # Plots residual quantiles against normal quantiles; points near the line suggest normality

# Shapiro-Wilk Test
shapiro_test(residuals(model)) # Tests the null hypothesis that the residuals are normally distributed
```
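In a one-way model the fitted value for each observation is simply its group mean, so the residuals being tested are each score minus its group mean. A base-R sketch makes that explicit and runs the Shapiro-Wilk test with `stats::shapiro.test()`; `PlantGrowth` is again only a stand-in dataset.

```r
# Residuals of a one-way ANOVA model = observation minus its group mean
fit <- lm(weight ~ group, data = PlantGrowth)

group_mean_per_obs <- ave(PlantGrowth$weight, PlantGrowth$group)  # ave() uses mean by default
manual_resid <- PlantGrowth$weight - group_mean_per_obs

# Same numbers as residuals(fit), up to floating-point error
max(abs(unname(residuals(fit)) - manual_resid))

# Base-R Shapiro-Wilk test on the residuals
shapiro.test(manual_resid)
```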
b. Checking Normality for each group separately. This might be helpful if you have only a few groups.
```r
# Assumption testing - Normality of DV by group - QQ Plots
ggqqplot(penguins, "flipper_length_mm", facet.by = "species") # One QQ plot per species; points near the line suggest normality

# Shapiro-Wilk Test
penguins %>%
  group_by(species) %>%
  shapiro_test(flipper_length_mm) # Tests normality separately within each species
```
Note: the Shapiro-Wilk test can be overly sensitive when a sample has more than about 50 observations, flagging even trivial departures from normality. In that case, the QQ plot is preferred.
Are these data normally distributed?
If the data are not normally distributed, a Kruskal-Wallis test is a non-parametric alternative to the One-Way ANOVA.
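The Kruskal-Wallis test is essentially a one-way ANOVA on ranks rather than raw scores. As a minimal base-R sketch (with `PlantGrowth` as a stand-in dataset), the code computes the H statistic with the usual tie correction and compares it with `stats::kruskal.test()`.

```r
# Kruskal-Wallis H computed from ranks, with tie correction (base R only)
y <- PlantGrowth$weight
g <- PlantGrowth$group
N <- length(y)

r <- rank(y)                       # tied values receive mid-ranks
rank_sums <- tapply(r, g, sum)
n_j       <- tapply(r, g, length)

H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_j) - 3 * (N + 1)

# Divide by the tie correction factor 1 - sum(t^3 - t) / (N^3 - N)
ties <- table(y)
H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

builtin <- kruskal.test(weight ~ group, data = PlantGrowth)
c(manual_H = H, builtin_H = unname(builtin$statistic), p = builtin$p.value)
```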
```r
# Residuals vs Fitted Plot
plot(model, 1) # Plots residuals against fitted values (the group means); similar vertical spread in each group suggests equal variances

# Assumption testing - Homogeneity of Variance
penguins %>% levene_test(flipper_length_mm ~ species) # Levene's test: a small p-value suggests the group variances differ
```
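Levene's test is itself a one-way ANOVA, run on each observation's absolute deviation from its group centre. The sketch below assumes the median-centred (Brown-Forsythe) variant, which is the default of `car::leveneTest()` and, we believe, of `rstatix::levene_test()`. It uses only base R with the `PlantGrowth` stand-in data.

```r
# Levene's test (median-centred, i.e. Brown-Forsythe) as an ANOVA on absolute deviations
y <- PlantGrowth$weight
g <- PlantGrowth$group

group_medians <- ave(y, g, FUN = median)  # each observation's group median
abs_dev <- abs(y - group_medians)         # spread of each point around its group centre

levene_table <- anova(lm(abs_dev ~ g))
levene_table  # the F and p-value in the "g" row are the test result
```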
If the data do not have homogeneity of variance, a Welch one-way ANOVA can be performed instead (`welch_anova_test()`).
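Base R offers the same choice through `stats::oneway.test()`: `var.equal = FALSE` gives Welch's ANOVA, while `var.equal = TRUE` reproduces the classic equal-variance F test. A small sketch on the `PlantGrowth` stand-in data:

```r
# Welch's one-way ANOVA vs. the classic equal-variance F test (base R only)
welch   <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

welch    # denominator df is adjusted for unequal group variances
classic  # same F as anova(lm(weight ~ group, data = PlantGrowth))
```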
Do these data meet the assumption of homogeneity of variance?
Are these independent samples?
```r
# One-Way ANOVA - Test of between-subjects effects, assuming equal variances
penguins %>% anova_test(flipper_length_mm ~ species)
```
Interpret the results
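When interpreting the `anova_test()` output, the `ges` column is an effect size (generalized eta squared). For a one-way, between-subjects design like this one it reduces to ordinary eta squared, SS_between / SS_total, which also equals the model's R². A base-R sketch on the `PlantGrowth` stand-in data:

```r
# Eta squared for a one-way between-subjects ANOVA (base R only)
fit       <- lm(weight ~ group, data = PlantGrowth)
aov_table <- anova(fit)
ss        <- aov_table[["Sum Sq"]]  # c(SS_between, SS_within)

eta_sq <- ss[1] / sum(ss)           # proportion of total variance explained by group
eta_sq
```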
```r
# Post-hoc: pairwise t-tests between species
penguins %>% pairwise_t_test(flipper_length_mm ~ species)
```
Interpret the results
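Three groups give three comparisons, so the p-values need a multiple-comparison adjustment. As a sketch, base R's `stats::pairwise.t.test()` (pooled SD) can be run with and without the Holm correction, which we believe is the rstatix default, to show what the adjustment does. `PlantGrowth` is a stand-in dataset.

```r
# Pairwise t-tests with pooled SD: raw vs Holm-adjusted p-values (base R only)
raw  <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "none")
holm <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "holm")

raw$p.value   # unadjusted
holm$p.value  # Holm step-down adjustment

# Holm applied by hand to the lower-triangle (non-NA) p-values
lt <- lower.tri(raw$p.value, diag = TRUE)
p.adjust(raw$p.value[lt], method = "holm")
```

Adjusted p-values are never smaller than the raw ones, which is the price of controlling the family-wise error rate.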
```r
# Loading the Data
grit <- pl462::grit # Loads the dataset called 'grit' from the pl462 package and assigns it the name "grit"
?grit         # Opens the help page with descriptive information about the dataset
glimpse(grit) # Gives a quick view of the structure of the dataset
```
Research Question: We are interested in determining whether grit changes depending on when it is measured (matriculation, freshman year, sophomore year).
What is the DV?
What is/are the IV(s)? How many levels do they have?
```r
grit %>%
  group_by(class) %>%
  get_summary_stats(grit)
```
What are your observations given the descriptive statistics?
```r
# Boxplot
grit %>%
  ggplot() +
  aes(x = class, y = grit) +
  geom_boxplot() +
  geom_jitter(alpha = 0.2, width = .1)

# Continuous Density Plot
grit %>%
  ggplot() +
  aes(x = grit, fill = class) +
  geom_density() +
  facet_grid(class ~ .)
```
What are your observations (distribution, variability) given these data visualizations?
```r
grit %>%
  group_by(class) %>%
  rstatix::identify_outliers(grit) # Produces a table of outliers and extreme cases
```
Are there any outliers in these data?
a. Analyzing the ANOVA model residuals. This approach is generally easier to do and helpful if there are many groups. This is the approach we will take.
```r
# Assumption testing - Normality of Residuals - QQ Plot
model <- lm(grit ~ class, data = grit)
ggqqplot(residuals(model)) # Plots residual quantiles against normal quantiles; points near the line suggest normality

# Shapiro-Wilk Test
shapiro_test(residuals(model)) # Tests the null hypothesis that the residuals are normally distributed
```
b. Checking Normality for each group separately. This might be helpful if you have only a few groups.
```r
# Assumption testing - Normality of DV by group - QQ Plots
ggqqplot(grit, "grit", facet.by = "class") # One QQ plot per time point; points near the line suggest normality

# Shapiro-Wilk Test
grit %>%
  group_by(class) %>%
  shapiro_test(grit) # Tests normality separately at each time point
```
Note: the Shapiro-Wilk test can be overly sensitive when a sample has more than about 50 observations, flagging even trivial departures from normality. In that case, the QQ plot is preferred.
Are these data normally distributed?
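If the data are not normally distributed, the Friedman test (`friedman_test()`) is the non-parametric alternative to the one-way repeated measures ANOVA. The Kruskal-Wallis test used earlier assumes independent groups, so it does not fit this within-subjects design.

Sphericity is checked automatically by `anova_test()`, which reports Mauchly's test; `get_anova_table()` then applies the Greenhouse-Geisser correction when the assumption is violated.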
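For a within-subjects design, the Friedman test ranks the conditions within each participant rather than across everyone. Because the grit data live in the pl462 package, the sketch below instead assumes base R's built-in `Orange` dataset (circumference of five trees, each measured at the same ages) as a stand-in, keeps three ages to mirror three measurement points, and calls `stats::friedman.test()`.

```r
# Friedman test as the rank-based alternative for repeated measures (base R only).
# Orange is a stand-in: each Tree (subject) is measured at the same ages (conditions).
keep <- Orange$age %in% c(118, 664, 1582)
d <- data.frame(
  tree = factor(as.character(Orange$Tree[keep])),  # subject identifier (block)
  age  = factor(Orange$age[keep]),                 # within-subject condition
  circ = Orange$circumference[keep]                # dependent variable
)

# Formula: outcome ~ condition | subject
fr <- friedman.test(circ ~ age | tree, data = d)
fr
```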
```r
# One-Way Repeated Measures ANOVA - Test of within-subjects effects
grit %>% anova_test(wid = id, dv = grit, within = class)
```
Interpret the results
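For comparison, base R can fit the same kind of model with `stats::aov()` and an `Error()` term that gives each subject its own error stratum, so stable differences between subjects are removed from the error used to test the condition. The sketch reuses the `Orange` stand-in (not the grit data). Unlike `anova_test()`, it reports only the uncorrected F, with no sphericity test or Greenhouse-Geisser correction. With complete, balanced data this F equals the condition F from an additive model that includes subject as a factor, which the last lines check.

```r
# One-way repeated measures ANOVA with aov() + Error() (base R only).
# Orange is a stand-in: 5 trees (subjects) each measured at 3 ages (conditions).
keep <- Orange$age %in% c(118, 664, 1582)
d <- data.frame(
  tree = factor(as.character(Orange$Tree[keep])),
  age  = factor(Orange$age[keep]),
  circ = Orange$circumference[keep]
)

rm_fit <- aov(circ ~ age + Error(tree / age), data = d)
rm_summary <- summary(rm_fit)
rm_summary  # the "Error: tree:age" stratum holds the within-subject test

f_within <- rm_summary[["Error: tree:age"]][[1]][["F value"]][1]

# Same F from a two-way additive model with subject as a factor
f_additive <- anova(lm(circ ~ tree + age, data = d))["age", "F value"]
c(aov_error_F = f_within, additive_F = f_additive)
```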
```r
# Post-hoc: paired pairwise t-tests between time points
grit %>% pairwise_t_test(grit ~ class, paired = TRUE)
```
Interpret the results
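Paired pairwise comparisons can likewise be sketched with `stats::pairwise.t.test(paired = TRUE)` on the `Orange` stand-in. Base R pairs observations by their position within each condition, so the data are sorted by subject first. The Holm adjustment is chosen here to match what we believe is the rstatix default.

```r
# Paired pairwise t-tests across within-subject conditions (base R only).
keep <- Orange$age %in% c(118, 664, 1582)
d <- data.frame(
  tree = factor(as.character(Orange$Tree[keep])),
  age  = factor(Orange$age[keep]),
  circ = Orange$circumference[keep]
)
d <- d[order(d$tree, d$age), ]  # same subject order within every condition

pw <- pairwise.t.test(d$circ, d$age, paired = TRUE, p.adjust.method = "holm")
pw$p.value

# One comparison by hand: age 664 vs age 118, unadjusted
t.test(d$circ[d$age == "664"], d$circ[d$age == "118"], paired = TRUE)$p.value
```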