# Set knitr options to not evaluate anything
knitr::opts_chunk$set(eval=FALSE, warning=FALSE, message=FALSE, error=FALSE, comment=FALSE)

Introduction

Scenario: On 3 June 2009, a local Public Health office in Greece informed the national Centre for Diseases Control and Prevention (HCDCP) in Athens about an unusual increase in C. jejuni cases among children residing in rural areas around the town. The local hospital informed HCDCP that 31 stool samples among children <15 years of age had been tested positive for C. jejuni since 29 May 2009.

Question 1: What do you know about Campylobacter? Is it known to be causing outbreaks?

Question 2: What information would you need to be sure whether this is an outbreak?

You check the notified Campylobacter cases in the statutory notification system and see that you have never in the last years seen such a high number of Campylobacter notifications; in fact, there have never been more than 10 notifications per month. No major changes in laboratory practices or the personnel working at the lab have been reported, nor has the mandatory notification system undergone any changes recently.

Question 3: How do you proceed to investigate the outbreak? What may have caused it?

You are also reminded that an outbreak of Salmonella Typhimurium had been identified in the same area five years earlier. You look at the old outbreak report and find out that tap water had been suspected as the vehicle of the outbreak back then. Given the fact that there is already a sharp decline in cases, you fear that Campylobacter may already be absent from the source(s) of the. Since environmental investigations are likely to show no proof of Campylobacter presence, you need epidemiological evidence for the source of the outbreak.

Descriptive epidemiology

The outbreak looks to have stopped by the time you go to the outbreak area. You decide to first describe the cases in time, place and person. For this reason, you ask the hospital to inform you about all people with a laboratory diagnosis of Campylobacter since 1 May 2009. A few hours later, you receive information on all these cases in Excel format (campy.xls). Based on the information in this Excel file, describe all cases by time, place and person.

Task 1: Fill in Tables 1 and 2, and produce an epidemic curve.

Table 1 Figure 1 Table 2

Help Task 1

Solution Table 1 Solution Figure 1

Question 4: Based on these descriptive results by time, place and person, is there any more information you would like to have? Do you have any hypotheses already?

Two different water supply systems exist in the area; one for the town municipality and one for the seven adjacent rural municipalities. The 2004 Salmonella outbreak had affected the exact same rural municipalities, leaving the capital municipality unaffected.

You conduct a case-control study in order to identify the mode and vehicle of transmission: you define cases as laboratory-confirmed Campylobacter cases diagnosed at the local hospital in May-June 2009 and choose controls from the population registry of the affected municipalities, frequency matching for sex and age by one-year intervals (6-month intervals for cases aged below 1 year). In other words, you want to achieve the same age distribution among controls like among cases.

For the rest of this exercise, you will be presented and working with a set of the variables used in the outbreak investigation. All variables presented here have to do with water consumption. The file campy.dta contains information on all cases (like with the Excel file we were using previously), as well as the selected controls.

Analytical epidemiology

The rest of the case study will be dealing with the analytical epidemiology as part of the outbreak investigation. This consists of univariate and stratified analysis of the case-control study. The dataset campy.xls contains twenty variables. The first few ones contain demographic information of participants (cases and controls) and the next ones provide information on different exposures directly or indirectly linked to tap water consumption. For your convenience, here’s a list of the variables in the dataset:

Task 2: Load the dataset

Task 3: How many observations does your dataset have? How many cases and how many controls does it contain?

Task 4: Can you think of any variables you could generate based on the ones you already have?

Task 5: Perform a descriptive analysis:

  • What is the age distribution among cases and controls?
  • Do cases and controls differ in terms of age and gender distribution?
    • Would you use a parametric or a non-parametric test to answer this question?
  • Draw an epidemic curve.

Help Task 2

# Set the correct directory
#setwd("~/Documents/Projekte/epiet")

# Initalize the epiet package
library(epiet)

# Load data
load(campy)

Help, Task 3

You can browse the dataset in order to see how it looks like and how many variables it includes. An indirect way of looking how many observations your dataset contains is the str command. You can use it for any kind of variable

str(campy)

Help, Task 4

Both powder milk and concentrated milk are types of milk that need to be diluted with water. For this reason, it may be of interest to combine the two to a single variable. For example:

campy$diluted <- ifelse(campy$concentrated==1 | campy$powder==1, 1, 0)

# stata commands 
# generate diluted = 1 if concentrated==1 | powder==1
# replace diluted = 0 if concentrated==0 & powder==0

The new variable, diluted, contains the information whether a child has been consuming milk that needs to be diluted with water.

Help, Task 5

To see if the age distribution between cases and controls differs (which you might not expect, since you have frequency-matched for age), you can use either the t-test or the Wilcoxon’s ranksum test (also called the Mann-Whitney test). The first one can be used only when the distribution of age in both groups is normal. The latter one can be used otherwise.

tabulate case
bysort case: summarize age, detail
To test whether the age distribution is normal among both cases and controls, you can run:
bysort case: swilk age //Shapiro-Wilk test1 for both cases and controls swilk age if case==1 //Shapiro-Wilk test for cases
swilk age if case==0 //Shapiro-Wilk test for controls
You can also visualise the distribution of age among cases and controls:
histogram age if case==1, frequency
histogram age if case==0, frequency
Age does not appear to be very normally distributed.
Now that you are absolutely sure that the hypothesis of normality in the variable age is not really the case, you choose to run Wilcoxon’s ranksum test2:
ranksum age, by(case)
If you had gone for the t-test, the command would have been:
ttest age, by(case)

We would also like to construct an epidemic curve. There are a couple of ways to do that in R.

plotEpicurve()

Packages are commands that have been written by R users, like you. This means that they are not included in Stata when you first install the program. Every time you need to run them for the first time on a new computer, you have to install them. If you know the name of the command you’re interested in, the easiest way is to type findit and the name of the command in the command line while you are online. Then, follow the instructions on your screen. If you are not allowed to do that, run sysdir in Stata, make a note of the “personal” folder and copy the .ado and .hlp files there (you may need to create the folder yourself). User- written commands you may be familiar with are: cctable, cstable, ccinter, csinter, epicurve, etc.

Univariate analysis

Now that you have already explored the data and performed some descriptive analysis, you should be wondering what really lies in the dataset.

Task 6: Conduct the univariate analysis; fill in Tables 3a-3e for the first four individual exposures with your results. Then, fill in the summary table 4 on the next page.

Help, Task 6

The appropriate measure of impact for a case-control study is the odds ratio (OR). This can be calculated with two different commands; cc and cctable3. The first one calculates the odds ratio, 95% confidence interval and the attributable fraction among the exposed and in the population. However, the 2x2 table is arranged in a different way than the one epidemiologists are used to working with (exposure in the rows, case/control in the columns).

# cc case breastfeeding
# cc case dishwasher
# cc case supply
# cctable case supply
# cctable case tap-diluted
# cctable case tap-diluted, or
#  the latter command runs cctable for all variables between tap and diluted in the dataset.  the option or in the cctable command sorts the results by the odds ratio.
# Another way to see whether there is an association between two dichotomous variables is, of course, the χ2 test. This does not give you the odds ratio, though.
# tab case bottled, chi2
#  Adding the options col or row, which would provide you the percentages by column or row
# respectively, can help you identify the direction of the association.
# The results (odds ratios and 95% confidence intervals) for the univariate analysis are shown in Table 5 on the next page.

Copyright and License

This case study was first designed and written by Ioannis Karagiannis for the training needs of EPIET, PAE, NorFETP and Austrian FETP fellows of Cohort 16 (2010).

Revisions:

You are free:



jakobschumacher/epietr documentation built on May 18, 2019, 10:12 a.m.