Case Study 9.3: Water Hardness and Mortality
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

require(s20x)
knitr::opts_chunk$set(echo = TRUE)

The data in this case study were collected in an investigation of environmental causes of disease. They show the annual mortality rate per 100,000 for males, averaged over the years 1958--1964, and the calcium concentration (in parts per million) in the drinking water supply for 61 large towns in England and Wales. (The higher the calcium concentration, the harder the water.) Towns at least as far north as Derby are identified in the data set with the code N. In this study we will use R to investigate how are mortality and water hardness related, and if there a geographical factor in the relationship. The data is in the file water.csv on Canvas

load(system.file("extdata", "water.df.rda", package = "s20x"))

water.df = read.csv("WATER.csv")
head(water.df)

head(water.df)

It's always useful to check for missing values.

sum(is.na(water.df))

No missing values. All good. How about a plot? The ideal plot would use Location as a plotting symbol, however we can see that the towns to the South are coded as blanks. We should change that. How? How about making all the values that are not N be S?

water.df$Location[water.df$Location != 'N'] = 'S'

Hmm. That did not work, as planned. That is because Location is a factor. We have a number of choices. One is to make a new variable. The other is to make Location a character vector, make the change and, then make it a factor again. Although this second option sounds like a lot of work it is really only one line of code. It is preferable because we will not clutter our workspace with redundant information.

water.df$Location = with(water.df, {
  Location = as.character(Location);
  Location[is.na(Location)] = 'S';
  Location = factor(Location)
})
water.df$Location

Much better. Now how about that plot?

plot(Mortality~Ca, pch = as.character(water.df$Location), data = water.df)

water.fit = lm(Mortality ~ Ca * Location, data = water.df)
plot(water.fit, which = 1)

normcheck(water.fit)

cooks20x(water.fit)

All looks pretty good.

summary(water.fit)

It does not look like there is any evidence of different slopes. Do we get any additional info from the ANOVA table?

anova(water.fit)

Nope. We only have two levels so we do not need the ANOVA table for this analysis.

So we do not have interaction, but it looks like there is a relationship with hardness and it looks like there is a North/South effect (Location). We should refit the model.

water.fit2 = lm(Mortality ~ Ca + Location, data = water.df)
summary(water.fit2)

Minimal change in $R^2$. How do the fitted lines look on the plot?

b = coef(water.fit2)
plot(Mortality~Ca, pch = as.character(water.df$Location), data = water.df)
abline(b[1:2], col = "blue", lty = 2, lwd = 2)
abline(b[1] + b[3], b[2], col = "red", lty = 2, lwd = 2)
legend("topright", lty = 2, lwd = 2, col = c("blue", "red"), 
    legend = c("North", "South"), bty = "n")
# This code puts some confidence bounds around the lines.
# It's pretty complicated if you don't know R and it isn't examinable
pred.df = data.frame(Ca = rep(seq(0, 160, by = 20), 2), 
    Location = rep(c('N', 'S'), c(9, 9)))
water.pred = predict(water.fit, newdata = pred.df, interval = "confidence")
for(i in 1:2){
  idx = 1:9 + (i - 1) * 9
  for(j in c("lwr", "upr")){
      lines(pred.df$Ca[idx], water.pred[idx, j], lty = 3, lwd = 2, col = "lightgrey");
  }
}

So it looks like mortality decreases as the calcium concentration in the water increases, and that mortality is lower in the South than the North.

Any scripts or data that you put into this service are public.

s20x documentation built on Jan. 14, 2026, 9:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

s20x
Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 9.3: Water Hardness and Mortality
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

Try the s20x package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

s20x Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 9.3: Water Hardness and Mortality In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

Try the s20x package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

s20x
Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 9.3: Water Hardness and Mortality
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis