library('knitr') read_chunk('../q6.R') opts_chunk$set(cache=FALSE)
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
Load the diet data using time-on-study as the timescale. We look at the first six rows of the data using head
and look at a summary for the variables using summary
:
We see that individuals with a high energy intake have a lower CHD incidence rate. The estimated crude incidence rate ratio is 0.52 (95% CI: 0.27, 0.97).
The code is:
The point estimate for the IRR calculated by the Poisson regression is the same as the IRR calculated in 6(a). The regression model can be defined by:
\begin{align} E(\text{chd}) &= \frac{y}{1000}\exp\left(\beta_0 + \beta_1I(\text{hieng}="high") \right) \ &= \exp\left(\beta_0 + \beta_1I(\text{hieng}="high") + \log(y/1000) \right) \end{align}
where $E(\text{chd})$ is the expected count for CHD, $\beta_0$ is the intercept parameter for the log rate and $\beta_1$ is the parameter for the log rate ratio for high energy diets. We have also used $I(\text{condition})$ as an indicator function, which will take a value of 1 when the condition is true and 0 otherwise.
A theoretical observation: if we consider the data as being
cross-classified solely by hieng
then the Poisson regression model
with one parameter is a saturated model (that is, the number of
observations is equal to the number of parameters) so the IRR
estimated from the model will be identical to the ‘observed’ IRR. That
is, the model is a perfect fit.
The histogram and density curve gives us an idea of the distribution of energy intake. The normal curve suggests that a normal approximation may be acceptable. We can also tabulate moments and percentiles of the distribution.
This could also be done using dplyr::mutate
.
We see that the CHD incidence rate decreases as the level of total energy intake increases. These calculations are usually better done using a regression model as they are less fiddly and can be adjusted for other covariates.
Level 1 of the categorized total energy is the reference category. The estimated rate ratio comparing level 2 to level 1 is 0.6452 and the estimated rate ratio comparing level 3 to level 1 is 0.2886.
The regression equation can be represented by:
\begin{align} E(\text{chd}) &= \frac{y}{1000}\exp\left(\beta_0 + \beta_2 X_2 + \beta_3 X_3 \right) \ &= \exp\left(\beta_0 + \beta_2 X_2 + \beta_3 X_3 + \log(y/1000) \right) \end{align} where $\beta_2$ and $\beta_3$ are the log rate ratios for $X_2=\text{X2}$ and $X_3=\text{X3}$, respectively.
Now use level 2 as the reference (by omitting X2 but including X1 and X3). The estimated rate ratio comparing level 1 to level 2 is 1.5498 and the estimated rate ratio comparing level 3 to level 2 is 0.4473.
The estimates are identical (as we would hope) when we have R create indicator variables for us.
Somehow (there are many different alternatives) you’ll need to calculate the total number of events and the total person-time at risk and then calculate the incidence rate as events/person-time. As examples,
The estimated incidence rate is 0.00999 events per person-year.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.