Biostatistics III in R
In biostat3: Utility Functions, Datasets and Extended Examples for Survival Analysis

Exercise 6. Diet data: tabulating incidence rates and modelling with Poisson regression

library('knitr')
read_chunk('../q6.R')
opts_chunk$set(cache=FALSE)

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

Load the diet data using time-on-study as the timescale. We look at the first six rows of the data using head and look at a summary for the variables using summary:

(a)

We see that individuals with a high energy intake have a lower CHD incidence rate. The estimated crude incidence rate ratio is 0.52 (95% CI: 0.27, 0.97).

(b)

The code is:

The point estimate for the IRR calculated by the Poisson regression is the same as the IRR calculated in 6(a). The regression model can be defined by:

\begin{align} E(\text{chd}) &= \frac{y}{1000}\exp\left(\beta_0 + \beta_1I(\text{hieng}="high") \right) \ &= \exp\left(\beta_0 + \beta_1I(\text{hieng}="high") + \log(y/1000) \right) \end{align}

where $E(\text{chd})$ is the expected count for CHD, $\beta_0$ is the intercept parameter for the log rate and $\beta_1$ is the parameter for the log rate ratio for high energy diets. We have also used $I(\text{condition})$ as an indicator function, which will take a value of 1 when the condition is true and 0 otherwise.

A theoretical observation: if we consider the data as being cross-classified solely by hieng then the Poisson regression model with one parameter is a saturated model (that is, the number of observations is equal to the number of parameters) so the IRR estimated from the model will be identical to the ‘observed’ IRR. That is, the model is a perfect fit.

(c)

The histogram and density curve gives us an idea of the distribution of energy intake. The normal curve suggests that a normal approximation may be acceptable. We can also tabulate moments and percentiles of the distribution.

(d)

This could also be done using dplyr::mutate.

(e)

We see that the CHD incidence rate decreases as the level of total energy intake increases. These calculations are usually better done using a regression model as they are less fiddly and can be adjusted for other covariates.

(f)

(g)

(h)

Level 1 of the categorized total energy is the reference category. The estimated rate ratio comparing level 2 to level 1 is 0.6452 and the estimated rate ratio comparing level 3 to level 1 is 0.2886.

The regression equation can be represented by:

\begin{align} E(\text{chd}) &= \frac{y}{1000}\exp\left(\beta_0 + \beta_2 X_2 + \beta_3 X_3 \right) \ &= \exp\left(\beta_0 + \beta_2 X_2 + \beta_3 X_3 + \log(y/1000) \right) \end{align} where $\beta_2$ and $\beta_3$ are the log rate ratios for $X_2=\text{X2}$ and $X_3=\text{X3}$, respectively.

(i)

Now use level 2 as the reference (by omitting X2 but including X1 and X3). The estimated rate ratio comparing level 1 to level 2 is 1.5498 and the estimated rate ratio comparing level 3 to level 2 is 0.4473.

(j)

The estimates are identical (as we would hope) when we have R create indicator variables for us.

(k)

Somehow (there are many different alternatives) you’ll need to calculate the total number of events and the total person-time at risk and then calculate the incidence rate as events/person-time. As examples,

The estimated incidence rate is 0.00999 events per person-year.

Any scripts or data that you put into this service are public.

biostat3 documentation built on Oct. 29, 2024, 5:07 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

biostat3
Utility Functions, Datasets and Extended Examples for Survival Analysis

Biostatistics III in R
In biostat3: Utility Functions, Datasets and Extended Examples for Survival Analysis

Exercise 6. Diet data: tabulating incidence rates and modelling with Poisson regression

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

(i)

(j)

(k)

Try the biostat3 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

biostat3 Utility Functions, Datasets and Extended Examples for Survival Analysis

Biostatistics III in R In biostat3: Utility Functions, Datasets and Extended Examples for Survival Analysis

Exercise 6. Diet data: tabulating incidence rates and modelling with Poisson regression

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

(i)

(j)

(k)

Try the biostat3 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

biostat3
Utility Functions, Datasets and Extended Examples for Survival Analysis

Biostatistics III in R
In biostat3: Utility Functions, Datasets and Extended Examples for Survival Analysis