In kimbrianj/umdpsyc: UMD PSYC R Learning Package

library(learnr)
library(umdpsyc)
library(gradethis)
library(ggplot2)
knitr::opts_chunk$set(exercise.checker = gradethis::grade_learnr)
knitr::opts_chunk$set(echo = FALSE)
learnr::tutorial_options(
  exercise.timelimit = 60)

ANOVA

As with t-tests, R is very useful for running ANOVA with large datasets. Computing values such as sum of squares and finding the F-value can be quite cumbersome, but the aov function in R makes all of that easy to do. It takes a bit more work to interpret and get other values such as eta squared, but all of that can be done in R still.

In this tutorial, we will look at the relationship between exam scores and two categorical variables: method of note-taking (paper or computer) and self-reported student anxiety (low, medium, high). For more information on studies conducted on this topic, see this paper on anxiety and exam scores and this paper on note-taking and exam scores.

As a reminder, here is what the structure of the exam_scores data looks like.

data(exam_scores)
head(exam_scores)

ANOVA in R

Suppose we are interested in seeing if there is a relationship between a student's reported anxiety score (low, medium, high) and their scores on an exam. Since the anxiety score has more than two levels, it wouldn't be appropriate to use a t-test in this case. Using ANOVA would allow us to see if there were any differences across the different levels of anxiety.

Just to get an idea of what the relationships look like before we run the ANOVA, let's take a look at some side-by-side boxplots.

qplot(x = factor(anxiety), y = exam1, data = exam_scores, geom = 'boxplot')

As before, we won't go through all of the exact steps in writing out the hypotheses here, so remember to keep in mind that running the code is only a portion of the statistical analysis.

Using `aov`

You can use the aov function to run ANOVA in R. The first argument uses the formula notation, with the "outcome" variable going on the lefthand side of a tilde (~) and the "predictor" going on the righthand side. Recall that we had to use the factor function in order to make sure R knows to treat a variable as categorical. The second argument is used to specify the data frame that we are pulling the data from. Here, we are going to be using the exam_scores data frame, and using the exam1 and anxiety columns.

The aov function outputs an aov object, which represents the ANOVA being run with the data. In order to see the result of the analysis, we need to use summary on this aov object.

exam_anova <- aov(exam1 ~ factor(anxiety), data=exam_scores)
summary(exam_anova)

Note that this is the same sort of syntax used for running a linear model using lm, which is discussed in the regression tutorial.

ANOVA output

The output from the summary function gives a lot of information about the ANOVA that was run. It essentially gives you everything you need to run the test, such as degrees of freedom and sum of squares. It also gives you the final F statistic under F value and the p-value under Pr(>F). The asterisks next to the p-value correspond to different alpha levels, which you can interpret using the last line of output. That is, if there are three stars next to the p-value, then it means the p-value is between 0 and 0.001. This helps you see quickly whether the result is significant or not.

Finding eta squared

The output doesn't explicitly give you eta squared values, but you can actually compute it quite easily based on the provided output. Recall that the eta squared is found using

$$\eta^2 = \frac{SS_{effect}}{SS_{total}}$$ So, we can find this using some simple arithmetic.

2805/(2805+4958)

Factorial ANOVA

To do factorial ANOVA, we use the aov() function with a formula to define the variables as we did before, except this time, we simply add additional variables. Here, we will include both note_mode and anxiety. We use a + to include the variables here, which assumes that the variables are independent. Just like with using lm to run linear models, you can use a * to include an interaction term as well.

exam_anova2 <- aov(exam1 ~ factor(note_mode) + factor(anxiety), data=exam_scores)
summary(exam_anova2)

The output should look similar to a one-way ANOVA, except with additional variables.

learnr::question("Fill in the blanks: The p-values for note taking and anxiety are ____ and ____, respectively.",
         answer("1.1e-07, 0.000855", correct = TRUE,  message = "Good job!"),
         answer("1932.1, 441.6",  correct = FALSE),
         answer("33.587, 7.678",  correct = FALSE),
         answer("1, 2",  correct = FALSE),
         allow_retry = TRUE,
         random_answer_order = TRUE
)

Exercises

Exercise 1

How would you run an ANOVA to determine whether there is a relationship between method of note taking and the scores on exam 1? Run the model and output the summary.

grade_result(
  pass_if(~ identical(.result, summary(aov(exam1 ~ factor(note_mode), data=exam_scores))), "Good Job!"),
  fail_if(~ TRUE)
)

Exercise 2

What is the eta squared value for note_mode in the ANOVA you ran in Exercise 1?

grade_result(
  pass_if(~ identical(.result, 1932/(1932+5830)), "Good Job!"),
  fail_if(~ TRUE)
)

Exercise 3

Note that note_mode has two categories. What if you had decided to simply run a two-sided independent samples t-test for the difference in mean exam score between the paper note-taking group and the computer note-taking group? Complete the code and run the t-test below, which includes an argument var.equal = TRUE to include an equal variances assumptions (recall that ANOVA has an equal variances assumption).

# Uncomment the below code by removing the "#" and complete it.
# t.test( ~ , data = exam_scores, var.equal = TRUE)

learnr::question("Looking at the p-values, how do they compare?",
         answer("The t-test p-value is lower.", correct=FALSE),
         answer("They are the same up to a rounding error.", correct = TRUE,  message = "Good job!"),
         answer("The t-test p-value is higher.",  correct = FALSE),
         allow_retry = TRUE,
         random_answer_order = TRUE
)

Submitting work

submission_code <- function(UID){
  httr::sha1_hash('anova31',as.character(UID))
}

Generate your submission code by putting in your UID in the function below. For example, if your UID is 2, then your code should look like submission_code(UID = 2)

# Replace the number below with your UID
submission_code(2)

kimbrianj/umdpsyc documentation built on Jan. 27, 2021, 7:36 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

kimbrianj/umdpsyc
UMD PSYC R Learning Package

In kimbrianj/umdpsyc: UMD PSYC R Learning Package

ANOVA

ANOVA in R

Using `aov`

ANOVA output

Finding eta squared

Factorial ANOVA

Exercises

Exercise 1

Exercise 2

Exercise 3

Submitting work

R Package Documentation

Browse R Packages

We want your feedback!

kimbrianj/umdpsyc UMD PSYC R Learning Package

In kimbrianj/umdpsyc: UMD PSYC R Learning Package

ANOVA

ANOVA in R

Using aov

ANOVA output

Finding eta squared

Factorial ANOVA

Exercises

Exercise 1

Exercise 2

Exercise 3

Submitting work

R Package Documentation

Browse R Packages

We want your feedback!

kimbrianj/umdpsyc
UMD PSYC R Learning Package

Using `aov`