In npaterno/stat231: Course Materials for Stat 231

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(Stat231)
library(tidyverse)
library(openintro)

ANOVA

For this lab, we will be peaking at section 7.5 (without worry about all the formulas). The analysis of variance - anova - test is equivalent to the chi-squared test but for numeric data instead of categorical data; It will compare the means of three or more samples.

We will be going back to the diamonds data set and running an anova test to see if there is a difference in the average price of diamonds based on clarity. We will nest the test inside a summary() function to get the test and print out of results in one line.

summary(aov(price ~ clarity, data = diamonds))

The important parts of this print out are the $F$ value and $Pr(>F)$. The $F$ value is similar to a $z$ or $t$ score. The $Pr(>F)$ is a p-value.

Using the p-value method, $<2e-16$ means the p-value is less than $0.0000000000000002$. Using any of the typical values for alpha, we would reject our null hypothesis and can say there is a difference in the average price of diamonds based on their clarity rating.

Using the classic method, we need to calculate a critical F value.

qf(0.95, #1-alpha
   df1 = 7,
   df2 = 53932)

This means our critical region starts at approximately $F = 2$, and our test statistic is $F = 215$. We are well into the critical region and again conclude that there is a difference in the average price of diamonds based on their clarity.

npaterno/stat231 documentation built on May 1, 2023, 6:07 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com