Case Study 15.2: Whio chick survival
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

## Do not delete this!
## It loads the s20x library for you. If you delete it 
## your document may not compile it.
require(s20x)

knitr::opts_chunk$set(
  dev = "png",
  fig.ext = "png",
  dpi = 96
)

## For this case study we also need the emmeans library:
require(emmeans)

Problem

The whio, or blue duck, is an endangered bird that lives in fast-flowing rivers throughout NZ. Although it is a type of duck, it doesn't quack but instead emits a soft whistle that sounds like 'whio' (pronounced 'fee-oh'). It's an iconic NZ species and is pictured on the \$10 bank note.

Rangers from the Department of Conservation monitored whio nests in various regions of NZ over several years. For each nest found, they recorded the number of chicks in the nest, and how many of them survived to the point of fledging (leaving the nest). They also recorded whether the region had suffered floods in the corresponding year, because flooded rivers are a potential cause of chick mortality.

The dataframe Whio.df has records for several regions and years, with the following columns:

Chicks: total number of chicks in monitored nests in that region and year.
Survived: number of the chicks that survived.
Died: number of the chicks that died.
Region: a categorical variable for location, with levels Fiordland, Oparara, and TeUrewera.
Flood: a categorical variable specifying whether the region was affected by floods in the corresponding year, with levels Yes and No.

Questions of interest

Does whio chick survival differ by region and/or flood status? Quantify all findings, including absolute survival probabilities and any differences found in survival among different predictor categories.

Read in and inspect the data

load(system.file("extdata", "whio.df.rda", package = "s20x"))

whio.df <- read.csv("WhioChicks.csv", stringsAsFactors=T)
head(whio.df)
par(mar=c(8, 4, 3, 2), mgp=c(7, 1, 0))
with(whio.df, boxplot(Survived/Chicks ~ Flood*Region, data = whio.df, las=2))
title("Proportion of chicks surviving by flood status & region", cex.main=1)
# Numbers of observations by predictor levels:
table(whio.df$Flood, whio.df$Region)

head(whio.df)
par(mar=c(8, 4, 3, 2), mgp=c(7, 1, 0))
with(whio.df, boxplot(Survived/Chicks ~ Flood*Region, data = whio.df, las=2))
title("Proportion of chicks surviving by flood status & region", cex.main=1)
# Numbers of observations by predictor levels:
table(whio.df$Flood, whio.df$Region)

Comment on the plot and summary information

Chick survival appears to be higher in Fiordland than in the other two regions. There is no consistent pattern of average survival with flood status. In Fiordland, average survival appears a little higher in flood years, but in Oparara there appears to be little difference, and in TeUrewera average survival is noticeably lower in flood years. However, we can see from the summary table that there are only a few observations for flood years so it might be difficult to use this data to draw conclusions about the impact of floods.

Model building and check assumptions

whio.fit1 <- glm(cbind(Survived, Died) ~ Flood * Region, family=binomial, data=whio.df)
summary(whio.fit1)
1-pchisq(257.10, 26)
whio.fit2 <- glm(cbind(Survived, Died) ~ Flood * Region, family=quasibinomial, data=whio.df)
anova(whio.fit2, test="F")
whio.fit3 <- glm(cbind(Survived, Died) ~ Flood + Region, family=quasibinomial, data=whio.df)
anova(whio.fit3, test="F")
whio.fit4 <- glm(cbind(Survived, Died) ~ Region, family=quasibinomial, data=whio.df)
anova(whio.fit4, test="F")
plot(whio.fit4, which=1)

Inference output from the final model

predictGLM(whio.fit4, type="response", newdata=data.frame(Region=c("Fiordland", "Oparara", "TeUrewera")))
# Use emmeans for pairwise comparisons:
whio.pairs <- pairs( emmeans(whio.fit4, "Region"), infer=T)
# Convert to data-frame and remove unwanted columns for display:
whio.pairs <- data.frame(whio.pairs)[,-c(3,4,7)]
whio.pairs
exp(data.frame(whio.pairs)[,c(2, 3, 4)])
100*( exp(data.frame(whio.pairs)[,c(2, 3, 4)]) - 1 )

conf1 = data.frame(predictGLM(whio.fit4, type="response", newdata=data.frame(Region=c("Fiordland", "Oparara", "TeUrewera"))) * 100)
resultStr1 = paste0(sprintf("%.0f%%", abs(conf1$lwr)), " and ", sprintf("%.0f%%", abs(conf1$upr)))

conf2 = exp(data.frame(whio.pairs)[,c(2, 3, 4)])
resultStr2 = paste0(sprintf("%.2f", abs(conf2$asymp.LCL)), " and ", sprintf("%.2f", abs(conf2$asymp.UCL)))

conf3 = 100*( exp(data.frame(whio.pairs)[,c(2, 3, 4)]) - 1 )
resultStr3 = paste0(sprintf("%.0f%%", abs(conf3$asymp.LCL)), " and ", sprintf("%.0f%%", abs(conf3$asymp.UCL)))

Methods and Assumption Checks

We fitted a logistic regression using a Binomial GLM to investigate the relationship between whio chick survival and two categorical predicators: Region, specifying which of three regions each record corresponded to; and Flood, specifying whether or not the region was affected by floods for each record.

We initially fitted the model with an interaction between the two categorical predictors. The test for overdispersion was significant so we refitted the model using the quasibinomial family. Having corrected for overdispersion, the interaction term was non-significant so it was dropped from the model. Refitting the model with main effects only, the Flood variable was non-significant so it was also dropped. The final model involved just the single predictor, Region.

The residual plot revealed no concerns with the final model. However, there might be concerns with non-independence between outcomes for different chicks, because many nests will have contained multiple chicks, which might all die together if the whole nest was destroyed. This might explain the overdispersion in the data.

The final model was:

$$ \log(\text{odds}{ij}) = \mu + \alpha_i $$ where $\text{odds}{ij}$ denotes the odds that chick $j$ in region $i$ survives to leave the nest, and where $\mu$ is the overall mean and $\alpha_i$ is the region effect for region $i$.

Executive Summary

We were interested in whether whio chick survival differs according to region and/or flood status.

There was evidence that survival differed by region; in particular, that the Fiordland region had a higher average chick survival rate than the Te Urewera region. There was no evidence that survival differed by flood status, although we note that there were only a few observations from flood years, so more data might be needed to investigate this question more thoroughly.

We estimate the probability of a chick surviving to leave the nest was:

between r resultStr1[1] in the Fiordland region;
between r resultStr1[2] in the Oparara region;
between r resultStr1[3] in the Te Urewera region.

We estimate that the odds of chick survival in the Fiordland region were between r resultStr2[2] times those in the Te Urewera region.

[Alternatively: We estimate that the odds of chick survival in the Fiordland region were between r resultStr3[2] higher than those in the Te Urewera region.]