Case Study 9.1: Language score vs teaching method and student IQ

## Do not delete this!
## It loads the s20x library for you. If you delete it,
## your document may not compile.
require(s20x)

knitr::opts_chunk$set(
  dev = "png",
  fig.ext = "png",
  dpi = 96
)

Problem

Educational experts were interested in which of three different teaching methods was most effective at increasing students' language test scores for children of a range of abilities, as measured by IQ. Moreover, they wanted to know whether the relative effectiveness of the methods differed according to IQ.

An experiment was conducted in which 30 students were randomly allocated to three groups, and each group was taught using a different teaching method. This randomisation was done to ensure that a range of student abilities was represented in each group. As the students were measured under test conditions, we can assume that their test scores are independent of each other.

The variables of interest were:

- lang: the student's language test score,
- IQ: the student's IQ, and
- method: the teaching method used (1, 2, or 3).

Question of Interest

We wish to see whether the language score achieved depended on the teaching method used. We also want to adjust for any confounding effect of the students' IQs.

Read in and Inspect the Data

data(teach.df)
head(teach.df)
str(teach.df)
# method is coded numerically, so convert it into a factor variable
teach.df$method = factor(teach.df$method)
plot(lang ~ IQ, main = "Language Score versus IQ (by method)",
     pch = as.character(teach.df$method), data = teach.df)

Looking at the coded scatter plot, the points for each method appear to follow three roughly parallel lines. Language score increases with IQ, and at any given IQ, method 2 scores highest while method 3 scores lowest. The variability around each method's own line is much lower than the overall variability in language score.
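As an informal check of the parallel-lines impression (an illustrative sketch, not part of the original handout), we can overlay a separate least-squares line for each method; if the three lines look near-parallel, an IQ-by-method interaction is unlikely to be needed, which the formal test below confirms.

# Illustrative only: overlay a separate within-method least-squares line
plot(lang ~ IQ, main = "Separate least-squares lines by method",
     pch = as.character(teach.df$method), data = teach.df)
for (m in levels(teach.df$method)) {
  abline(lm(lang ~ IQ, data = subset(teach.df, method == m)),
         lty = as.numeric(m))
}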

Model Building and Check Assumptions

# Full model: language score explained by IQ, method, and their interaction
teach.fit = lm(lang ~ IQ * method, data = teach.df)
plot(teach.fit, which = 1)
normcheck(teach.fit)
cooks20x(teach.fit)
anova(teach.fit)

# The interaction is not significant, so refit without it
teach.fit2 = lm(lang ~ IQ + method, data = teach.df)
plot(teach.fit2, which = 1)
normcheck(teach.fit2)
cooks20x(teach.fit2)
anova(teach.fit2)
summary(teach.fit2)
confint(teach.fit2)
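As a cross-check (an illustrative extra, not in the original analysis), dropping the interaction can also be tested by comparing the two nested models directly; the resulting F-test reproduces the IQ:method line of anova(teach.fit).

# Illustrative: F-test for the terms dropped between the nested models
anova(teach.fit2, teach.fit)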

Visualise the Final Model

plot(lang ~ IQ, main = "Language Score versus IQ (by method)",
     pch = as.character(teach.df$method), data = teach.df)
# Three parallel fitted lines: common slope, different intercepts
abline(teach.fit2$coef[1], teach.fit2$coef[2], lty = 1)                       # method 1 (baseline)
abline(teach.fit2$coef[1] + teach.fit2$coef[3], teach.fit2$coef[2], lty = 2)  # method 2
abline(teach.fit2$coef[1] + teach.fit2$coef[4], teach.fit2$coef[2], lty = 4)  # method 3
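A legend makes the line types easier to match to the methods (a small optional addition, not in the original code; the "topleft" position is a guess and may need adjusting for this plot):

# Optional: label the three fitted lines
legend("topleft", legend = paste("Method", 1:3),
       lty = c(1, 2, 4), pch = as.character(1:3))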

Generate Model Output when the Baseline of method is "2"

# Refit with method 2 as the baseline so the other two methods are
# compared directly against it
teach.df$method = relevel(teach.df$method, ref = "2")
teach.fit3 = lm(lang ~ IQ + method, data = teach.df)
confint(teach.fit3)

# Methods 1 and 3 relative to method 2: the estimates are negative, so
# take absolute values and swap the interval endpoints
conf1 = as.data.frame(abs(confint(teach.fit3)[c(3, 4), ]))
resultStr1 = paste0(sprintf("%.1f", conf1$`97.5 %`), " and ", sprintf("%.1f", conf1$`2.5 %`))

# Method 3 relative to method 1, from the original fit
conf2 = as.data.frame(t(abs(confint(teach.fit2)[4, ])))
resultStr2 = paste0(sprintf("%.1f", conf2$`97.5 %`), " and ", sprintf("%.1f", conf2$`2.5 %`))

# Effect on mean language score of a 10-point increase in IQ
conf3 = as.data.frame(t(abs(confint(teach.fit2)[2, ] * 10)))
resultStr3 = paste0(sprintf("%.1f", conf3$`2.5 %`), " and ", sprintf("%.1f", conf3$`97.5 %`))
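Printing these strings (an illustrative check, not in the original) shows the interval text that gets inlined into the summary below via inline R chunks such as `r resultStr1[1]`:

# Illustrative: inspect the formatted interval strings
resultStr1
resultStr2
resultStr3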

Method and Assumption Checks

To explain language score, we first fitted a model with explanatory variables teaching method, IQ, and their interaction. However, the interaction term was not significant (P-value = 0.37), so the model was refitted with the interaction term removed.

All model assumptions were satisfied. [Optional: The students should act independently of each other, as they were randomly allocated to the teaching methods and their scores were measured under test conditions.]

Our final model is $$lang_i = \beta_0 + \beta_1 \times IQ_i + \beta_2 \times method.method2_i + \beta_3 \times method.method3_i + \epsilon_i,$$ where:

- $lang_i$ is the language score of the $i$th student,
- $IQ_i$ is the IQ of the $i$th student,
- $method.method2_i$ is one if student $i$ was taught with method 2 and zero otherwise,
- $method.method3_i$ is one if student $i$ was taught with method 3 and zero otherwise, and
- $\epsilon_i \sim iid\ N(0, \sigma^2)$.

Here method 1 is our baseline.

The final model was also refitted with method 2 as the baseline. Note that when we change the baseline to level 2, the set of dummy variables changes: $method.method2_i$ is replaced by $method.method1_i$, which is set to one if student $i$ received method 1 and to zero otherwise.
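To see the dummy coding concretely (an illustrative check, not in the original handout), the design matrices of the two fits show which indicator columns each choice of baseline produces:

# Illustrative: baseline method 1 yields columns method2 and method3,
# while baseline method 2 yields columns method1 and method3
head(model.matrix(teach.fit2))
head(model.matrix(teach.fit3))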

Our model explains almost `r round(100*summary(teach.fit2)$r.squared)`% of the variation in language score.

Executive Summary

We were interested in comparing the effectiveness of three teaching methods on the language scores achieved by students. We also wanted to see how this was affected by the students' IQs.

We found that the effects of the teaching methods were the same regardless of IQ, and that the effect of IQ was the same regardless of teaching method.

In particular, teaching method 2 was significantly better than the other two methods, and method 1 was significantly better than method 3.

Not surprisingly, students with higher IQ tended to score higher.

With 95% confidence:

- we estimate that the mean language score for students taught with method 2 is between `r resultStr1[1]` points higher than for those taught with method 1, and between `r resultStr1[2]` points higher than for those taught with method 3,
- we estimate that the mean language score for students taught with method 1 is between `r resultStr2` points higher than for those taught with method 3, and
- we estimate that for each 10-point increase in IQ, the mean language score increases by between `r resultStr3` points.

\pagebreak{}

What happens if we don't adjust for IQ?

We expect the confidence intervals for the method comparisons to get wider, because the residual error now has to "absorb" the variation that IQ previously explained.

# Refit using teaching method only, under each baseline
teach.df$method = relevel(teach.df$method, ref = "1")
teach.fit5 = lm(lang ~ method, data = teach.df)
teach.df$method = relevel(teach.df$method, ref = "2")
teach.fit6 = lm(lang ~ method, data = teach.df)
ci = confint(teach.fit5)
ci2 = confint(teach.fit6)
r2 = round(100 * summary(teach.fit5)$r.squared)

With 95% confidence:

Our model explains almost `r r2`% of the variation in language score. You should be able to see that every one of these intervals is wider than before.
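A quick numeric check (illustrative only, not part of the original handout) compares the widths of the method 3 versus method 1 intervals with and without the IQ adjustment:

# Illustrative: interval widths for method 3 vs method 1
diff(confint(teach.fit2)["method3", ])  # adjusted for IQ
diff(confint(teach.fit5)["method3", ])  # not adjusted for IQ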

Multiple comparisons adjustment (Chapter 11)

This will only make sense after Chapter 11:

options(digits = 4)
require(emmeans)
# Tukey-adjusted pairwise comparisons of the three methods
emmeans(teach.fit2, specs = pairwise ~ method, infer = TRUE)$contrasts

# Compare to the unadjusted output from the lm fits
cbind(coef(summary(teach.fit2)), confint(teach.fit2))
cbind(coef(summary(teach.fit3)), confint(teach.fit3))
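To see exactly what the multiplicity adjustment changes (an illustrative extra), the same contrasts can be requested without adjustment; these reproduce the unadjusted lm intervals above:

# Illustrative: unadjusted pairwise contrasts match the lm output
emmeans(teach.fit2, specs = pairwise ~ method, infer = TRUE, adjust = "none")$contrasts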

