library(learnr) library(umdpsyc) library(gradethis) library(ggplot2) knitr::opts_chunk$set(exercise.checker = gradethis::grade_learnr) knitr::opts_chunk$set(echo = FALSE) learnr::tutorial_options( exercise.timelimit = 60)
As with t-tests, R is very useful for running ANOVA with large datasets. Computing values such as sum of squares and finding the F-value can be quite cumbersome, but the aov
function in R makes all of that easy to do. It takes a bit more work to interpret and get other values such as eta squared, but all of that can be done in R still.
In this tutorial, we will look at the relationship between exam scores and two categorical variables: method of note-taking (paper or computer) and self-reported student anxiety (low, medium, high). For more information on studies conducted on this topic, see this paper on anxiety and exam scores and this paper on note-taking and exam scores.
As a reminder, here is what the structure of the exam_scores
data looks like.
data(exam_scores) head(exam_scores)
Suppose we are interested in seeing if there is a relationship between a student's reported anxiety score (low, medium, high) and their scores on an exam. Since the anxiety score has more than two levels, it wouldn't be appropriate to use a t-test in this case. Using ANOVA would allow us to see if there were any differences across the different levels of anxiety.
Just to get an idea of what the relationships look like before we run the ANOVA, let's take a look at some side-by-side boxplots.
qplot(x = factor(anxiety), y = exam1, data = exam_scores, geom = 'boxplot')
As before, we won't go through all of the exact steps in writing out the hypotheses here, so remember to keep in mind that running the code is only a portion of the statistical analysis.
aov
You can use the aov
function to run ANOVA in R. The first argument uses the formula notation, with the "outcome" variable going on the lefthand side of a tilde (~
) and the "predictor" going on the righthand side. Recall that we had to use the factor
function in order to make sure R knows to treat a variable as categorical. The second argument is used to specify the data frame that we are pulling the data from. Here, we are going to be using the exam_scores
data frame, and using the exam1
and anxiety
columns.
The aov
function outputs an aov
object, which represents the ANOVA being run with the data. In order to see the result of the analysis, we need to use summary
on this aov
object.
exam_anova <- aov(exam1 ~ factor(anxiety), data=exam_scores) summary(exam_anova)
Note that this is the same sort of syntax used for running a linear model using lm
, which is discussed in the regression
tutorial.
The output from the summary
function gives a lot of information about the ANOVA that was run. It essentially gives you everything you need to run the test, such as degrees of freedom and sum of squares. It also gives you the final F statistic under F value
and the p-value under Pr(>F)
. The asterisks next to the p-value correspond to different alpha levels, which you can interpret using the last line of output. That is, if there are three stars next to the p-value, then it means the p-value is between 0 and 0.001. This helps you see quickly whether the result is significant or not.
The output doesn't explicitly give you eta squared values, but you can actually compute it quite easily based on the provided output. Recall that the eta squared is found using
$$\eta^2 = \frac{SS_{effect}}{SS_{total}}$$ So, we can find this using some simple arithmetic.
2805/(2805+4958)
To do factorial ANOVA, we use the aov()
function with a formula to define the variables as we did before, except this time, we simply add additional variables. Here, we will include both note_mode
and anxiety
. We use a +
to include the variables here, which assumes that the variables are independent. Just like with using lm
to run linear models, you can use a *
to include an interaction term as well.
exam_anova2 <- aov(exam1 ~ factor(note_mode) + factor(anxiety), data=exam_scores) summary(exam_anova2)
The output should look similar to a one-way ANOVA, except with additional variables.
learnr::question("Fill in the blanks: The p-values for note taking and anxiety are ____ and ____, respectively.", answer("1.1e-07, 0.000855", correct = TRUE, message = "Good job!"), answer("1932.1, 441.6", correct = FALSE), answer("33.587, 7.678", correct = FALSE), answer("1, 2", correct = FALSE), allow_retry = TRUE, random_answer_order = TRUE )
How would you run an ANOVA to determine whether there is a relationship between method of note taking and the scores on exam 1? Run the model and output the summary.
grade_result( pass_if(~ identical(.result, summary(aov(exam1 ~ factor(note_mode), data=exam_scores))), "Good Job!"), fail_if(~ TRUE) )
What is the eta squared value for note_mode
in the ANOVA you ran in Exercise 1?
grade_result( pass_if(~ identical(.result, 1932/(1932+5830)), "Good Job!"), fail_if(~ TRUE) )
Note that note_mode
has two categories. What if you had decided to simply run a two-sided independent samples t-test for the difference in mean exam score between the paper note-taking group and the computer note-taking group? Complete the code and run the t-test below, which includes an argument var.equal = TRUE
to include an equal variances assumptions (recall that ANOVA has an equal variances assumption).
# Uncomment the below code by removing the "#" and complete it. # t.test( ~ , data = exam_scores, var.equal = TRUE)
learnr::question("Looking at the p-values, how do they compare?", answer("The t-test p-value is lower.", correct=FALSE), answer("They are the same up to a rounding error.", correct = TRUE, message = "Good job!"), answer("The t-test p-value is higher.", correct = FALSE), allow_retry = TRUE, random_answer_order = TRUE )
submission_code <- function(UID){ httr::sha1_hash('anova31',as.character(UID)) }
Generate your submission code by putting in your UID in the function below. For example, if your UID is 2
, then your code should look like submission_code(UID = 2)
# Replace the number below with your UID submission_code(2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.