208 Data Analysis

knitr::opts_chunk$set(fig.height=3)

## Do not delete this!
## It loads the s20x library for you. If you delete it 
## your document may not compile
library(s20x)

Question 1

A study was carried out into the mortality of people who have sepsis. Sepsis is a life-threatening condition that arises when the body's response to infection causes injury to its own tissues and organs. We wish to see if their mortality increases depending on how healthy they are when they are admitted to an intensive care unit (ICU). In order to measure their health we will use a diagnostic measurement called APACHE II.

APACHE II ("Acute Physiology And Chronic Health Evaluation II") is a severity-of-disease classification system (Knaus et al., 1985), one of several ICU scoring systems. It is applied within 24 hours of admission of a patient to an ICU. An integer score from 0 to 71 is computed based on several measurements; higher scores correspond to more severe disease and a higher risk of death.

A sample of septic patients admitted to ICU’s was analysed.

The resulting grouped data is stored in Apache.csv which contains the following variables:

Variable | Description -------------------|-------------------------------------------------------------------------- apache_score | The APACHE II score for patients in the group. number_of_patients | The number of patients in the group. deaths | The number of patients in the group who died within 30 days of being admitted to ICU.

It was of interest to see how increasing APACHE II scores related to the mortality rate. In particular, to assess the effect of a 1 point or 5 point difference in APACHE II scores and to estimate mortality rates for APACHE II scores of 10, 20, 30 and 40.

Instructions:

Comment on the plot of the data.(Some of the data is listed to help clarify the plot.)
Fit an appropriate GLM model to the data. Generate appropriate confidence interval output for this model.
Get intervals to predict the probability of a patient admitted to ICU dying with 30 days with APACHE II scores of 10, 20, 30 or 40.
Write appropriate Methods and Assumption Checks.
Write sentences, as if for an Executive Summary, interpreting your four prediction intervals.
Write sentences, as if for an Executive Summary, quantifying the effect of a 1 point and 5 point increase in APPACHE II scores.
Superimpose the model onto the original plot of the data, including confidence interval limits.

Questions of Interest

Read in and inspect the data:

load(system.file("extdata", "Apache.df.rda", package = "s20x"))

Apache.df=read.csv("Apache.csv",header=TRUE)
plot(deaths/number_of_patients~apache_score, 
     ylab="prob. of death", 
     xlab="APACHE II score",data=Apache.df)
head(Apache.df)
tail(Apache.df,10)

plot(deaths/number_of_patients~apache_score, 
     ylab="prob. of death", 
     xlab="APACHE II score",data=Apache.df)
head(Apache.df)
tail(Apache.df,10)

Comment on plot.

The scatter plot clearly shows that the probability of death increase as the APACHE II score increases. The patient with the highest APACHE II score (41) survived, while all the patients with scores between 31 and 40 died.

Fit an appropriate GLM:

fit=glm(I(deaths/number_of_patients)~apache_score,weights=number_of_patients,family=binomial,data=Apache.df)

plot(fit,which=1)
p.value = 1 - pchisq(summary(fit)$deviance, summary(fit)$df.residual) 
p.value

summary(fit)
confint(fit)
exp(confint(fit))

exp(confint(fit)[2,])
100*(exp(confint(fit)[2,])-1)

exp(confint(fit)[2,]*5)
100*(exp(confint(fit)[2,]*5)-1)

Get intervals to predict the probability of a patient admitted to ICU dying with 30 days with APACHE II scores of 10, 20, 30 or 40.

pred.df=data.frame(apache_score = c(10,20,30,40))
predictGLM(fit, newdata=pred.df, type="response")

conf1 = data.frame(predictGLM(fit, newdata=pred.df, type="response"))
resultStr1 = paste0(sprintf("%.2f", abs(conf1$lwr)), " and ", sprintf("%.2f", abs(conf1$upr)))

conf2 = t(exp(confint(fit)[2,]))
conf2 = data.frame(conf2)
names(conf2) = c("lower", "upper")
resultStr2 = paste0(sprintf("%.2f", abs(conf2$lower)), " and ", sprintf("%.2f", abs(conf2$upper)))

conf3 = t(100*(exp(confint(fit)[2,])-1))
conf3 = data.frame(conf3)
names(conf3) = c("lower", "upper")
resultStr3 = paste0(sprintf("%.0f%%", abs(conf3$lower)), " and ", sprintf("%.0f%%", abs(conf3$upper)))

conf4 = t(exp(confint(fit)[2,]*5))
conf4 = data.frame(conf4)
names(conf4) = c("lower", "upper")
resultStr4 = paste0(sprintf("%.2f", abs(conf4$lower)), " and ", sprintf("%.2f", abs(conf4$upper)))

conf5 = t((100*(exp(confint(fit)[2,]*5)-1)))
conf5 = data.frame(conf5)
names(conf5) = c("lower", "upper")
resultStr5 = paste0(sprintf("%.0f%%", abs(conf5$lower)), " and ", sprintf("%.0f%%", abs(conf5$upper)))

Method and Assumption Checks

We fitted a logistic regression to the APACHE II data as we had a binary response variable (whether or not the patient died). We had one explanatory variable, the APACHE II score, which was significant.

There was no evidence of overdispersion and the residual plot didn't reveal any major problems. (Observation 38 is unusual, but not too extreme.)

The fitted model is approximately $\log(odds_i) = \beta_0 + \beta_1 \times apache_score_i$

where the odds are the odds of a person in the i'th group with $apache_score_i$ dying in the 30 days after being admitted to ICU.

Write sentences, as if for an Executive Summary, interpret your four predictions intervals.

We estimate that the probability that a patient who was admitted to ICU with sepsis dying within 30 days is:

between r resultStr1[1] if their APACHE II score was 10.
between r resultStr1[2] if their APACHE II score was 20.
between r resultStr1[3] if their APACHE II score was 30.
between r resultStr1[4] if their APACHE II score was 40.

Write sentences, as if for an Executive Summary, quantifying the effect of a 1 point and 5 point increase in APPACHE II scores.

We estimate an increase in a typical patients APACHE II score by 1 point is associated with increasing the odds of the patient dying in the next 30 days by between r resultStr3[1].

We estimate an increase in a typical patients APACHE II score by 5 points is associated with increasing the odds of the patient dying in the next 30 days by between r resultStr5[1].

Alternatively:

We estimate an increase in a typical patients APACHE II score by 1 point is associated with multiplying the odds of the patient dying in the next 30 days by between r resultStr2[1].

We estimate an increase in a typical patients APACHE II score by 5 points is associated with multiplying the odds of the patient dying in the next 30 days by between r resultStr4[1].

Superimpose the model onto the original plot of the data, including confidence interval limits.

# Create predictions for before over range 0-42.
pred2.df = data.frame(apache_score = seq(0,42))
Preds=predictGLM(fit, newdata=pred2.df, type="response")

# Redo plot from above and add curve to the data.
plot(deaths/number_of_patients~apache_score, ylab="prob. of death",  xlab="APACHE II score",
     main="Plot with model superimposed",data=Apache.df)
lines(seq(0,42),Preds[,1])
lines(seq(0,42),Preds[,2],col="Red",lty=2 )
lines(seq(0,42),Preds[,3],col="Red",lty=2 )

Any scripts or data that you put into this service are public.

s20x documentation built on Jan. 14, 2026, 9:07 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

s20x
Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 15.3: STATS 201/8 Extra Case Study - Binomial GLM - Grouped Data.
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

Question 1

Questions of Interest

Read in and inspect the data:

Comment on plot.

Fit an appropriate GLM:

Get intervals to predict the probability of a patient admitted to ICU dying with 30 days with APACHE II scores of 10, 20, 30 or 40.

Method and Assumption Checks

Write sentences, as if for an Executive Summary, interpret your four predictions intervals.

Write sentences, as if for an Executive Summary, quantifying the effect of a 1 point and 5 point increase in APPACHE II scores.

Superimpose the model onto the original plot of the data, including confidence interval limits.

Try the s20x package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

s20x Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 15.3: STATS 201/8 Extra Case Study - Binomial GLM - Grouped Data. In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

Question 1

Questions of Interest

Read in and inspect the data:

Comment on plot.

Fit an appropriate GLM:

Get intervals to predict the probability of a patient admitted to ICU dying with 30 days with APACHE II scores of 10, 20, 30 or 40.

Method and Assumption Checks

Write sentences, as if for an Executive Summary, interpret your four predictions intervals.

Write sentences, as if for an Executive Summary, quantifying the effect of a 1 point and 5 point increase in APPACHE II scores.

Superimpose the model onto the original plot of the data, including confidence interval limits.

Try the s20x package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

s20x
Functions for University of Auckland Course STATS 201/208 Data Analysis

Case Study 15.3: STATS 201/8 Extra Case Study - Binomial GLM - Grouped Data.
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis