Case Study 15.4: STATS 201/8 Extra Case Study - Binomial GLM - Binary Data.

knitr::opts_chunk$set(fig.height=3)
## Do not delete this!
## It loads the s20x library for you. If you delete it 
## your document may not compile
library(s20x)

Question 1

Snapper (Pagrus auratus) are a fish that is heavily targeted by recreational fishers in the Hauraki Gulf. The legal minimum landed size is 30cm, and all snapper less than this size must be discarded (i.e., returned to the sea).

An experiment was undertaken to determine the survival of discarded snapper. Snapper were caught by hook and line, measured for length, the hook was removed, and all snapper were placed in holding tanks for 5 days. The handling time (the duration between the fish being brought to the surface and being placed in a tank) was recorded as whether or not it was 2 minutes or less.

The resulting binary data is stored in Snapper.csv which contains the following variables:

| Variable | Description |
|----------|-------------|
| Survive  | Whether or not the snapper survived, coded as 1 = alive after 5 days, 0 = dead. |
| Length   | The length of the snapper (cm). |
| Handled  | The amount of time the snapper was handled (classified as either Less if 2 minutes or less, or More if more than 2 minutes). |


Questions of Interest

It is of interest to learn about the survival probability of discarded snapper, in particular, how fish length and handling time affect survival.

Read in and inspect the data:

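# Option 1: load the copy of the data shipped with the s20x package.
load(system.file("extdata", "Snapper1.df.rda", package = "s20x"))
Snapper.df <- Snapper1.df
# Option 2: read a local copy instead (needs Snapper.csv in the working directory).
# Snapper.df <- read.csv("Snapper.csv", header = TRUE, stringsAsFactors = TRUE)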
plot(Survive~Length,data=Snapper.df,ylab='Snapper Survival (1 = survived, 0 = died)',
     xlab='Snapper Length (cm)',
     main = "Snapper Survival by Length",
     pch  = ifelse(Handled == 'Less', 1, 4),
     col = ifelse(Handled == 'Less', 'blue', 'red'))
legend('center',c('Less','More'),col=c("blue","red"),pch=c(1,4))

# Second plot: the points are jittered (a small amount of random noise is added)
# so that replicated observations can be told apart.
plot(jitter(Survive,0.25)~jitter(Length,1),data=Snapper.df,
     ylab='Snapper Survival (1 = survived, 0 = died)',
     xlab='Snapper Length (cm)',
     main = "Snapper Survival by Length (points jittered to show repetition)",
     pch  = ifelse(Handled == 'Less', 1, 4),
     col = ifelse(Handled == 'Less', 'blue', 'red'))
legend('center',c('Less','More'),col=c("blue","red"),pch=c(1,4))

Comment on plot.

Most of the 0s (fish that did not survive) are snapper that were handled for more than 2 minutes. Among the smaller number of fish handled for 2 minutes or less that did not survive, most tended to be shorter.

Fit an appropriate GLM:

Surv.glm1=glm(Survive~Length*Handled,family=binomial,data=Snapper.df)
summary(Surv.glm1)

Surv.glm2=glm(Survive~Length+Handled,family=binomial,data=Snapper.df)
plot(Surv.glm2,which=1)

summary(Surv.glm2)
confint(Surv.glm2)
exp(confint(Surv.glm2))
100*(exp(confint(Surv.glm2))-1)

Calculate survival probabilities for 25cm snapper with handling times of both "Less" and "More".

Pred.df=data.frame(Length=c(25,25),Handled=c("Less","More"))
predict(Surv.glm2,newdata=Pred.df,type="response")
conf1 = data.frame(100*(exp(confint(Surv.glm2))-1))
names(conf1) = c("lower", "upper")
resultStr1 = paste0(sprintf("%.1f", abs(conf1[2,]$lower)), " and ", sprintf("%.1f", abs(conf1[2,]$upper)))
resultStr2 = paste0(sprintf("%.0f", abs(conf1[3,]$upper)), " and ", sprintf("%.0f", abs(conf1[3,]$lower)))
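
As a check on what `predict(..., type="response")` returns, the sketch below rebuilds the same probabilities by hand: it forms the linear predictor from the fitted coefficients and applies the inverse logit with `plogis()`. It uses simulated stand-in data rather than the snapper data; the data frame `sim.df`, the coefficients used to simulate it, and the object names `fit`, `new.df`, `p.pred` and `p.hand` are all assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical; NOT the Snapper data set).
set.seed(2)
n <- 200
sim.df <- data.frame(
  Length  = runif(n, 20, 35),
  Handled = factor(sample(c("Less", "More"), n, replace = TRUE))
)
# Coefficients below are arbitrary values chosen only to generate data.
eta <- -4 + 0.25 * sim.df$Length - 2 * (sim.df$Handled == "More")
sim.df$Survive <- rbinom(n, 1, plogis(eta))

fit <- glm(Survive ~ Length + Handled, family = binomial, data = sim.df)

# Predicted survival probabilities for 25 cm fish under each handling class.
new.df <- data.frame(Length  = c(25, 25),
                     Handled = factor(c("Less", "More"),
                                      levels = levels(sim.df$Handled)))
p.pred <- predict(fit, newdata = new.df, type = "response")

# The same probabilities by hand: linear predictor, then inverse logit.
b <- coef(fit)
eta.hand <- b["(Intercept)"] + b["Length"] * 25 + b["HandledMore"] * c(0, 1)
p.hand <- plogis(eta.hand)

all.equal(unname(p.pred), unname(p.hand))
```

The two vectors should agree up to numerical tolerance, because `type="response"` applies the inverse of the logit link to the linear predictor.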

Methods and Assumption Checks

Our response variable is binary: whether or not the snapper survived. We have therefore fitted a generalised linear model with a binomial response distribution (i.e. a logistic regression model). We have two explanatory variables: Length (numeric) and Handled (categorical). We wanted to know whether any length effect depends on Handled, so we first fitted a model with an interaction term. The interaction term was not significant (P-value = 0.7882), so we dropped it and refitted the model with main effects only. Both explanatory variables were then found to be significant, so the model was not simplified any further. (Since we do not have grouped data, the deviance statistic was not checked.) The residuals do not reveal any major issues.
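
The decision to drop the interaction can also be framed as a comparison of two nested models. The sketch below does this with a likelihood-ratio (chi-squared) test via `anova()`, which is an alternative to reading the Wald P-value off `summary()` and will generally give a slightly different P-value. It runs on simulated stand-in data, not the snapper data; `sim.df`, the simulating coefficients, and the names `full.glm`, `add.glm` and `lrt` are hypothetical.

```r
# Simulated stand-in data (hypothetical; NOT the Snapper data set).
set.seed(1)
n <- 200
sim.df <- data.frame(
  Length  = runif(n, 20, 35),
  Handled = factor(sample(c("Less", "More"), n, replace = TRUE))
)
# Arbitrary coefficients with no interaction, used only to generate data.
eta <- -4 + 0.25 * sim.df$Length - 2 * (sim.df$Handled == "More")
sim.df$Survive <- rbinom(n, 1, plogis(eta))

# Model with the interaction, and the simpler additive model.
full.glm <- glm(Survive ~ Length * Handled, family = binomial, data = sim.df)
add.glm  <- glm(Survive ~ Length + Handled, family = binomial, data = sim.df)

# Likelihood-ratio test: does the interaction term improve the fit?
lrt <- anova(add.glm, full.glm, test = "Chisq")
lrt
```

A large P-value in the second row of the table would support keeping the simpler additive model, which is the same conclusion the analysis above reaches for the snapper data.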

Our model is $\log(\text{odds}_i) = \beta_0 + \beta_1 \times \text{Length}_i + \beta_2 \times \text{HandledMore}_i$,

where $\text{odds}_i$ is the odds of survival for the i'th snapper, which has length $\text{Length}_i$ and handling indicator $\text{HandledMore}_i$. $\text{HandledMore}_i$ is 1 if the snapper was handled for more than 2 minutes and 0 otherwise.
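
For reference, the model on the log-odds scale can be converted to a survival probability. Writing $p_i$ for the probability that the i'th snapper survives (a symbol introduced here, not used elsewhere in this case study), the odds are $p_i/(1-p_i)$, so inverting the log-odds gives:

$$
p_i = \frac{\text{odds}_i}{1+\text{odds}_i}
    = \frac{\exp(\beta_0 + \beta_1 \times \text{Length}_i + \beta_2 \times \text{HandledMore}_i)}
           {1+\exp(\beta_0 + \beta_1 \times \text{Length}_i + \beta_2 \times \text{HandledMore}_i)}
$$

This is the transformation applied when predictions are requested with `type="response"`.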

Executive Summary

It is of interest to learn how fish length and handling time affect survival of snapper that are discarded after being caught and found to be below the legal size limit. An experiment was conducted to gather data on this.

Both snapper length and handling time are associated with the survival probability of discarded snapper. However, there is no evidence that the effect of length on survival differs depending on the handling time.

For the same handling time, we estimate that every 1 cm increase in the length of a snapper increases the odds of survival by between `r resultStr1[1]` percent.

For snapper of the same length, we estimate that a handling time of more than 2 minutes decreases the odds of survival by between `r resultStr2[1]` percent compared with a handling time of 2 minutes or less.
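
The percentage statements above come from back-transforming the confidence intervals for the coefficients, as in the `100*(exp(confint(Surv.glm2))-1)` calculation. As a sketch, for a coefficient $\beta_j$ with confidence limits $(l_j, u_j)$ on the log-odds scale (the symbols $l_j$ and $u_j$ are introduced here for illustration):

$$
\%\ \text{change in odds} = 100 \times \left(e^{\beta_j} - 1\right),
\qquad
\text{with confidence limits}\quad
100 \times \left(e^{l_j} - 1\right)
\ \text{and}\
100 \times \left(e^{u_j} - 1\right).
$$

Because the exponential function is increasing, the limits keep their order. A negative coefficient gives a negative percent change, which is reported as a decrease; this is why the handling-time limits are passed through `abs()` and quoted in reverse order.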

We estimate that the survival probability of 25 cm snapper is 94% if handling time is 2 minutes or less. This drops to 46% if the handling time is more than 2 minutes.




s20x documentation built on Jan. 14, 2026, 9:07 a.m.