Case Study 2.1: Exam vs Test

# Do not delete this!
# It loads the s20x library for you. If you delete it,
# your document may not compile.
require(s20x)

knitr::opts_chunk$set(
  dev = "png",
  fig.ext = "png",
  dpi = 96
)

Problem

We wish to quantify the relationship between test mark and exam mark, chiefly so that we can predict a student's exam mark from their test mark (to aid decisions about aegrotat passes for students who do not sit the exam). In particular, we want to predict a student's exam mark when their test mark is 0, 10, or 20.

The variables of interest are:

Exam: the student's exam mark (out of 100)
Test: the student's test mark (out of 20)

Question of Interest

We want to build a model to predict exam marks from test marks. In particular, we want to predict a student's exam mark when their test mark is 0, 10, or 20.

load(system.file("extdata", "Stats20x.df.rda", package = "s20x"))

Read in and Inspect the Data

Stats20x.df = read.table("STATS20x.txt", header = TRUE)
plot(Exam ~ Test, data = Stats20x.df)

The plot reveals a positive linear relationship between exam marks and test marks.

Model Building and Check Assumptions

examTest.fit = lm(Exam ~ Test, data = Stats20x.df)
plot(examTest.fit, which = 1)
normcheck(examTest.fit)
cooks20x(examTest.fit)
summary(examTest.fit)
confint(examTest.fit)
cf = as.data.frame(confint(examTest.fit))
resultConf = paste0(sprintf("%.1f", cf[2,1]), " to ", sprintf("%.1f", cf[2,2]))
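
The interval returned by `confint` for the slope can be reproduced by hand as estimate plus or minus a t multiplier times the standard error. The following is a minimal sketch on synthetic data; `toy.df`, `toy.fit` and the simulated marks are assumptions made for illustration and are not the Stats20x data.

```r
# Synthetic marks for illustration only (NOT the Stats20x data).
set.seed(1)
toy.df = data.frame(Test = runif(50, 0, 20))
toy.df$Exam = 10 + 3.5 * toy.df$Test + rnorm(50, sd = 12)

toy.fit = lm(Exam ~ Test, data = toy.df)

# Estimate and standard error of the slope from the coefficient table.
coefTable = coef(summary(toy.fit))
est = coefTable["Test", "Estimate"]
se = coefTable["Test", "Std. Error"]

# 95% interval: estimate +/- t multiplier * standard error,
# with the residual degrees of freedom (n - 2 for one predictor).
tmult = qt(0.975, df = df.residual(toy.fit))
byHand = c(est - tmult * se, est + tmult * se)

byHand
confint(toy.fit)["Test", ]
```

The two printed intervals should agree, which is a quick way to check what `confint` is reporting.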

Prediction Output

predTest.df = data.frame(Test = c(0, 10, 20))
predict(examTest.fit, predTest.df, interval = "prediction")
p = as.data.frame(predict(examTest.fit, predTest.df, interval = "prediction"))
resultStr = paste0(sprintf("%.1f", p$lwr), " to ", sprintf("%.1f", p$upr))
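
To see why prediction intervals for individual students are so much wider than the interval for the slope, it can help to rebuild them by hand. The sketch below uses the standard simple-regression formula, fitted value plus or minus t times s times sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), on synthetic data; `toy.df`, `toy.fit` and `new.df` are hypothetical names, and the simulated marks are not the Stats20x data.

```r
# Synthetic marks for illustration only (NOT the Stats20x data).
set.seed(1)
toy.df = data.frame(Test = runif(50, 0, 20))
toy.df$Exam = 10 + 3.5 * toy.df$Test + rnorm(50, sd = 12)
toy.fit = lm(Exam ~ Test, data = toy.df)

new.df = data.frame(Test = c(0, 10, 20))

n = nrow(toy.df)
s = sigma(toy.fit)                      # residual standard error
xbar = mean(toy.df$Test)
sxx = sum((toy.df$Test - xbar)^2)

# Fitted values at the new test marks.
fit0 = coef(toy.fit)[1] + coef(toy.fit)[2] * new.df$Test

# The leading "1 +" is the variability of an individual student around
# the line; it is what makes prediction intervals wide.
sePred = s * sqrt(1 + 1 / n + (new.df$Test - xbar)^2 / sxx)
tmult = qt(0.975, df = n - 2)

byHand = cbind(fit = fit0,
               lwr = fit0 - tmult * sePred,
               upr = fit0 + tmult * sePred)
viaPredict = predict(toy.fit, new.df, interval = "prediction")

byHand
viaPredict
```

Because of the "1 +" term, the half-width of each interval is always more than `tmult * s`, no matter how many students are in the data set.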

Method and Assumption Checks

A scatter plot of exam marks vs test marks showed a linear association with approximately constant scatter, so a simple linear regression model was fitted.

All model assumptions appear to be satisfied; a slight trend in the residual plot was observed, but it does not seem to be of major concern.

Our final model is $$\text{Exam}_i = \beta_0 + \beta_1 \times \text{Test}_i + \epsilon_i,$$ where $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.

Our model explained a modest 59% of the variability in the students' final exam marks.
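
The percentage of variability explained is the Multiple R-squared reported by `summary`. As a minimal sketch, again on synthetic data (`toy.df` and `toy.fit` are hypothetical and are not the Stats20x data), it can be extracted directly and checked against its definition:

```r
# Synthetic marks for illustration only (NOT the Stats20x data).
set.seed(1)
toy.df = data.frame(Test = runif(50, 0, 20))
toy.df$Exam = 10 + 3.5 * toy.df$Test + rnorm(50, sd = 12)
toy.fit = lm(Exam ~ Test, data = toy.df)

# R-squared as reported by summary().
r2 = summary(toy.fit)$r.squared

# The same quantity from sums of squares: 1 - RSS / TSS.
rss = sum(resid(toy.fit)^2)
tss = sum((toy.df$Exam - mean(toy.df$Exam))^2)
r2ByHand = 1 - rss / tss

# With a single predictor it also equals the squared correlation.
r2Cor = cor(toy.df$Exam, toy.df$Test)^2

c(r2, r2ByHand, r2Cor)
```

All three values should match. The remaining share, 1 minus R-squared, is the variability in exam mark that test mark does not account for.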

Executive Summary

We were interested in building a model to predict exam mark from test mark.

There was a significant linear relationship between test mark and exam mark (P-value $\approx$ 0). We estimate that each additional test mark (out of 20) obtained by a student is associated with an increase in their exam mark of `r resultConf[1]` marks (out of 100), on average.

For test marks of 0, 10 and 20, we predict exam marks (for individual students) of `r resultStr[1]`, `r resultStr[2]`, and `r resultStr[3]`, respectively. These intervals are very wide\footnote{Due to considerable variability remaining even after taking into account the test mark.} and some of them have bounds outside the feasible range of exam marks (0-100). The model is therefore not reliable for predicting an individual student's exam mark.



s20x documentation built on Jan. 14, 2026, 9:07 a.m.