Case Study 12.1: Exam vs Test & Attendance
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

## Do not delete this!
## It loads the s20x library for you. If you delete it 
## your document may not compile it.
require(s20x)

knitr::opts_chunk$set(
  dev = "png",
  fig.ext = "png",
  dpi = 96
)

Problem

We wish to determine if regular attendance in class and passing the test is associated with exam mark. We have seen individually that passing the test and regular attendance in class increases the chances for a good exam mark --- on average.

The variables of interest are:

Exam: Exam mark out of 100.
Attend: A two-level factor with the levels Yes and No.
Pass.test: A two-level factor with the levels Yes and No.

Question of Interest

We are interested in the relationship between final exam mark and passing the test, and whether this relationship differs for students who did and did not attend lectures regularly.

Read in and Inspect the Data

load(system.file("extdata", "Stats20x.df.rda", package = "s20x"))

Stats20x.df = read.table("STATS20x.txt", header = T)

head(Stats20x.df)
# So, we have to create the variable `Pass.test' using the `Test` variable
Stats20x.df$Pass.test = factor(ifelse(Stats20x.df$Test >= 10, "pass", "nopass"))
# Check if our new variable `Pass.test` was generated correctly
min(Stats20x.df$Test[Stats20x.df$Pass.test == "pass"])
max(Stats20x.df$Test[Stats20x.df$Pass.test == "nopass"])
interactionPlots(Exam ~ Pass.test + Attend, data = Stats20x.df)
# Also look at the interaction plot the other way around:
interactionPlots(Exam ~ Attend + Pass.test, data = Stats20x.df)

Exam marks for students who passed the test are centred higher than for students who did not. However, this difference appears to be greater for students who attended lectures than for non-attenders, which can be seen from the non-parallel coloured lines on the interaction plots. In other words, the impact of passing the test appears to differ according to attendance group, suggesting there may be an interaction between the two predictor variables.

Model Building and Check Assumptions

Exam.fit = lm(Exam ~ Attend * Pass.test, data = Stats20x.df)
# The model formula could of also been `Exam ~ Attend + Pass.test + Attend:Past.test'
plot(Exam.fit,1)
modelcheck(Exam.fit,2:3)
anova(Exam.fit)
summary(Exam.fit)

Paiwise comparisons

library(emmeans)
Exam.pairs = pairs(emmeans(Exam.fit, ~Attend*Pass.test), infer=T)
Exam.pairs
# Get a simpler display using displayPairs:
displayPairs(Exam.pairs, c("Yes", "No"), c("pass", "nopass"))

conf1 = as.data.frame(Exam.pairs)
resultStr1 = paste0(sprintf("%.0f", abs(conf1$upper.CL)), " and ", sprintf("%.0f", abs(conf1$lower.CL)))

Methods and Assumption Checks

We have two explanatory factors, Pass.test and Attend, and one numeric response Exam, so we fitted a two-way ANOVA model with interaction term. The interaction was significant (P-value = 0.04) so it was retained in the model.

The model assumptions seem satisfied.

Our final model is $$\text{Exam}_i = \beta_0 + \beta_1 \times Test.pass_i + \beta_2 \times Attend.Yes_i + \beta_3 \times Test.pass_i \times Attend.Yes_i + \epsilon_i,$$ where $Test.pass_i$ and $Attend.Yes_i$ are dummy variables that take the value 1 if student $i$ passed the test and if student $i$ regularly attended lectures respectively, otherwise they are 0; and $\epsilon_i \sim iid ~ N(0,\sigma^2)$.

Alternatively, our final model could be written as $$\text{Exam}{ijk} = \mu + \alpha_i +\beta_j + \gamma{ij} + \epsilon_{ijk},$$ where $\text{Exam}{ijk}$ is the mark of student $k$ in test-pass group $i$ and attendance group $j$; and where $\mu$ is the overall mean exam mark, $\alpha_i$ is the effect of whether the student passed the test, $\beta_j$ is the effect of whether the student regularly attended lectures, $\gamma{ij}$ is the interaction effect for the combination of a student passing the test and attendance, and $\epsilon_{ijk} \sim iid~N(0,\sigma^2)$. Here, $i \in {\text{pass}, \text{ nopass}}$ and $j \in {\text{Yes}, \text{ No}}$.

Our model explained 39% of variability in students' exam marks.

Executive Summary

We are interested in the relationship between final exam mark and passing the test, and whether this relationship differs for those students that attended lectures regularly and those that didn't.

We have evidence that the relationship between exam mark and test-pass status does depend upon attendance practice.

We estimate that:

Among students who attended regularly, the average exam mark for those who passed the test was between r resultStr1[5] marks higher than for those who didn't pass the test.
Among students who did not attend regularly, the average exam mark for those who passed the test was between r resultStr1[2] marks higher than for those who didn't pass the test.
Among students who passed the test, the average exam mark for those who attended regularly was between r resultStr1[6] marks higher than for those who didn't attend regularly.