incidence2

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.width = 7,
  fig.height = 5
)

What does it do?

{incidence2} is an R package that implements functions to compute, handle and visualise incidence data. It aims to be intuitive to use for both interactive data exploration and as part of more robust outbreak analytic pipelines.

The package is based around objects of the namesake class, <incidence2>. These objects are a data frame subclass with some additional invariants. That is, an <incidence2> object must:

To create and work with <incidence2> objects we provide a number of functions:

Usage

The following sections give an overview of the package using two different datasets. The first comes from the {outbreaks} package and is a synthetic linelist generated from a simulated Ebola Virus Disease (EVD) outbreak. The second is available within {incidence2} and is a pre-aggregated time series of Covid cases, tests, hospitalisations, and deaths for UK regions, obtained using the {covidregionaldata} package (extracted on 2021-06-03).

library(outbreaks)  # for the underlying data
library(ggplot2)    # For custom plotting later
library(incidence2) 

ebola <- ebola_sim_clean$linelist
str(ebola)
covid <- covidregionaldataUK
str(covid)

Computing incidence from a linelist

To compute daily incidence we pass a linelist of observation data to incidence(). This input should be a data frame, and we must also pass the name of a variable in the data that we can use to index the input. Note that whilst we refer to this index as the date_index, there is no restriction on its type, save that it needs to represent the relative time of an observation (i.e. it has an ordering).

daily <- incidence(ebola, date_index = "date_of_onset")
daily
plot(daily)
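
As a rough illustration of what is being computed, daily incidence is simply the number of linelist rows that share each index value. The following base R sketch uses a small made-up linelist (the `toy` data and the `count` column name are assumptions for illustration) and is not how {incidence2} is implemented internally:

```r
# Toy linelist: one row per case (hypothetical data, not from {outbreaks})
toy <- data.frame(
    id = 1:6,
    date_of_onset = as.Date(c(
        "2020-01-01", "2020-01-01", "2020-01-02",
        "2020-01-04", "2020-01-04", "2020-01-04"
    ))
)

# Daily incidence is the number of rows sharing each onset date
toy_counts <- aggregate(id ~ date_of_onset, data = toy, FUN = length)
names(toy_counts)[names(toy_counts) == "id"] <- "count"
toy_counts
```

Dates with no cases (here 2020-01-03) do not appear in the result, because only observed index values are counted.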

The daily data is quite noisy, so we may want to pre-group dates prior to calculating the incidence. One way to do this is to use functions from the {grates} package. Here we use the as_isoweek() function to convert the 'date of onset' to an isoweek (a week starting on a Monday) before calculating the incidence:

# isoweek incidence
weekly_ebola <- transform(ebola, date_of_onset = as_isoweek(date_of_onset))
inci <- incidence(weekly_ebola, date_index = "date_of_onset")
inci
plot(inci, border_colour = "white")
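
To see what grouping into ISO weeks means for individual dates, the sketch below labels a few dates with base R's format() rather than {grates}. The dates are made up for illustration, and the resulting character labels are only a stand-in for the dedicated week class that as_isoweek() returns:

```r
# Hypothetical dates around a year boundary; 2021-01-04 is a Monday
dates <- as.Date(c("2021-01-03", "2021-01-04", "2021-01-10", "2021-01-11"))

# %G is the ISO 8601 week-based year and %V the ISO week number
iso_label <- format(dates, "%G-W%V")
iso_label

# Counting rows per label gives a weekly incidence
table(iso_label)
```

Note that the Sunday 2021-01-03 falls in the last ISO week of 2020, which is why grouping on the calendar year and a week number separately can mislead.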

Grouping dates prior to calling incidence() makes it clear to future readers of your code (including yourself) which transformations are being applied to your input data. This grouping, however, is such a common and useful operation that we have chosen to integrate much of the functionality of {grates} directly into {incidence2}. This integration is done via an interval parameter in the incidence() call. This can take values:

As an example, the following is equivalent to the inci output above:

# isoweek incidence using the interval parameter
inci2 <- incidence(ebola, date_index = "date_of_onset", interval = "isoweek")
inci2

# check equivalent
identical(inci, inci2)

If we wish to aggregate by specified groups we can use the groups argument. For instance, computing incidence by gender:

inci_by_gender <- incidence(
    ebola,
    date_index = "date_of_onset",
    groups = "gender",
    interval = "isoweek"
)
inci_by_gender

For grouped data, the plot method will create a faceted plot across groups unless a fill variable is specified:

plot(inci_by_gender, border_colour = "white", angle = 45)
plot(inci_by_gender, border_colour = "white", angle = 45, fill = "gender")

incidence() also supports multiple date inputs:

grouped_inci <- incidence(
    ebola,
    date_index = c(
        onset = "date_of_onset",
        infection = "date_of_infection"
    ), 
    interval = "isoweek",
    groups = "gender"
)
grouped_inci

When multiple date indices are given, they are used for rows of the resultant plot, unless the resultant variable is used to fill:

plot(grouped_inci, angle = 45, border_colour = "white")
plot(grouped_inci, angle = 45, border_colour = "white", fill = "count_variable")

Computing incidence from pre-aggregated data

The Covid data set is in a wide format with multiple count values given for each day. To convert this to long form incidence we specify similar variables to before but also include the count variables we are interested in:

monthly_covid <- 
    covid |> 
    subset(!region %in% c("England", "Scotland", "Northern Ireland", "Wales")) |> 
    incidence(
        date_index = "date",
        groups = "region",
        counts = c("cases_new"),
        interval = "yearmonth"
    )
monthly_covid
plot(monthly_covid, nrow = 3, angle = 45, border_colour = "white")
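
The wide-to-long conversion described above can be pictured with a small base R sketch. The data values are made up, and the output column names `count_variable` and `count` are assumptions chosen to mirror the naming used elsewhere in this vignette; this is not the package's own code:

```r
# Toy wide data (hypothetical values): one row per day, one column per count
wide <- data.frame(
    date = as.Date(c("2021-01-01", "2021-01-02")),
    cases_new = c(10L, 12L),
    deaths_new = c(1L, 0L)
)

count_cols <- c("cases_new", "deaths_new")

# Stack the count columns: one row per (date, count variable) pair
long <- data.frame(
    date = rep(wide$date, times = length(count_cols)),
    count_variable = rep(count_cols, each = nrow(wide)),
    count = unlist(wide[count_cols], use.names = FALSE)
)
long
```

Each original row contributes one output row per count column, so two days and two count columns give four rows.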

Plotting in style of European Programme for Intervention Epidemiology Training (EPIET)

For small datasets it is the convention of EPIET to display individual cases as rectangles. We can do this by setting show_cases = TRUE in the call to plot(), which will display each case as an individual square with a white border.

dat <- ebola[160:180, ]
i_epiet <- incidence(dat, date_index = "date_of_onset", date_names_to = "date")
plot(i_epiet, color = "white", show_cases = TRUE, angle = 45, n_breaks = 10)
i_epiet2 <- incidence(
    dat, date_index = "date_of_onset",
    groups = "gender", date_names_to = "date"
)
plot(
    i_epiet2, show_cases = TRUE,
    color = "white", angle = 45, n_breaks = 10, fill = "gender"
)

Modifying incidence objects

regroup()

Sometimes you may find you've created a grouped incidence but now want to change the internal grouping. Assuming you are after a subset of the grouping already generated, you can use regroup() to get the desired aggregation:

# generate an incidence object with 3 groups
x <- incidence(
    ebola,
    date_index = "date_of_onset",
    interval = "isoweek",
    groups = c("gender", "hospital", "outcome")
)

# regroup to just two of the groups
xx <- regroup(x, c("gender", "outcome"))
xx

# drop all groups
regroup(x)
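
Conceptually, regrouping sums the counts over whichever grouping variables are dropped. The base R sketch below shows that idea on made-up counts (the `grouped` data frame and its column names are hypothetical); it is an illustration of the aggregation, not the implementation of regroup():

```r
# Toy grouped counts (hypothetical values)
grouped <- data.frame(
    week = c("W01", "W01", "W01", "W02"),
    gender = c("f", "f", "m", "m"),
    hospital = c("A", "B", "A", "A"),
    count = c(2L, 3L, 1L, 4L)
)

# Keep only `gender` as a group: sum counts over the dropped `hospital` column
by_gender <- aggregate(count ~ week + gender, data = grouped, FUN = sum)
by_gender

# Drop all groups: one total per week
by_week <- aggregate(count ~ week, data = grouped, FUN = sum)
by_week
```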

cumulate()

We also provide a helper function, cumulate(), to easily generate cumulative incidences:

y <- regroup(x, "hospital")
y <- cumulate(y)
y
plot(y, angle = 45, nrow = 3)
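
For grouped data, a cumulative incidence is a running total computed separately within each group. A minimal base R sketch of that idea, using made-up counts and a hypothetical `cumulative_count` column name, and assuming the rows are already in ascending time order within each group:

```r
# Toy daily counts for two hospitals (hypothetical values)
daily_counts <- data.frame(
    day = rep(1:3, times = 2),
    hospital = rep(c("A", "B"), each = 3),
    count = c(1L, 0L, 2L, 3L, 1L, 1L)
)

# Running total within each hospital; rows must be time-ordered per group
daily_counts$cumulative_count <- ave(
    daily_counts$count,
    daily_counts$hospital,
    FUN = cumsum
)
daily_counts
```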

Subsetting and other manipulations

keep_first(), keep_last() and keep_peaks()

Once your data is grouped by date, you may want to select the first or last few entries based on a particular date grouping using keep_first() and keep_last():

inci <- incidence(
    ebola,
    date_index = "date_of_onset",
    interval = "isoweek",
    groups = c("hospital", "gender")
)

keep_first(inci, 3)
keep_last(inci, 3)

Similarly you may want to quickly view the incidence peaks:

keep_peaks(inci)
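
Assuming a "peak" means the row with the largest count within each group, the selection can be sketched in base R as below. The `weekly` data are made up, and the handling of ties here (all tied rows are kept) is an assumption that may differ from keep_peaks():

```r
# Toy weekly counts for two groups (hypothetical values)
weekly <- data.frame(
    week = rep(1:4, times = 2),
    gender = rep(c("f", "m"), each = 4),
    count = c(2L, 5L, 3L, 1L, 1L, 2L, 6L, 4L)
)

# For each gender, keep the row(s) whose count equals that group's maximum
is_peak <- weekly$count == ave(weekly$count, weekly$gender, FUN = max)
peaks <- weekly[is_peak, ]
peaks
```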

complete_dates()

Sometimes your incidence data does not span consecutive units of time, or different groupings may cover different periods. To this end we provide a complete_dates() function, which ensures that every group is given counts over the same complete range of dates (by default filling missing entries with 0).

dat <- data.frame(
    dates = as.Date(c("2020-01-01", "2020-01-04")),
    gender = c("male", "female")
)
i <- incidence(dat, date_index = "dates", groups = "gender")
i
complete_dates(i)
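
The effect of completing dates can be pictured as crossing every date in the observed range with every group and filling the missing combinations with 0. The base R sketch below reproduces that idea on the same two observations; the `count` column and the intermediate `grid` object are assumptions for illustration, not the package's internals:

```r
# The two observed (date, group) combinations, each with one case
observed <- data.frame(
    dates = as.Date(c("2020-01-01", "2020-01-04")),
    gender = c("male", "female"),
    count = c(1L, 1L)
)

# Every date in the observed range crossed with every group
all_dates <- seq(min(observed$dates), max(observed$dates), by = "day")
grid <- expand.grid(
    dates = all_dates,
    gender = unique(observed$gender),
    stringsAsFactors = FALSE
)

# Left join the observed counts, then fill the gaps with 0
completed <- merge(grid, observed, by = c("dates", "gender"), all.x = TRUE)
completed$count[is.na(completed$count)] <- 0L
completed
```

Four days and two groups give eight rows, of which only the two observed combinations have a non-zero count.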

Preservation of class

<incidence2> objects have been carefully constructed to preserve their structure under a range of different operations that can be applied to data frames. By this we mean that if an operation is applied to an <incidence2> object then, as long as the invariants of the object are preserved (i.e. groups, interval and uniqueness of rows), the object retains its incidence class. If the invariants are not preserved then a <data.frame> will be returned instead.

# filtering preserves class
subset(inci, gender == "f" & hospital == "Rokupa Hospital")
inci[c(1L, 3L, 5L), ]

# adding columns preserves class
inci$future <- inci$date_index + 999L
inci

# renaming preserves class
names(inci)[names(inci) == "date_index"] <- "isoweek"
inci

# select returns a data frame unless all date, count and group variables are
# preserved in the output
str(inci[,-1L])
inci[, -6L]

Accessing variable information

We provide multiple accessors to easily access information about an <incidence2> object's structure:

# the name of the date_index variable of x
get_date_index_name(inci)

# alias for `get_date_index_name()`
get_dates_name(inci)

# the name of the count variable of x
get_count_variable_name(inci)

# the name of the count value of x
get_count_value_name(inci)

# the name(s) of the group variable(s) of x
get_group_names(inci)

# list containing date_index variable of x
str(get_date_index(inci))

# alias for get_date_index
str(get_dates(inci))

# list containing the count variable of x
str(get_count_variable(inci))

# list containing count value of x
str(get_count_value(inci))

# list of the group variable(s) of x
str(get_groups(inci)) 


incidence2 documentation built on July 9, 2023, 5:35 p.m.