knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
plot_epid <- diyar:::plot_epid
date <- function(x) as.Date(x, "%d/%m/%Y")
dttm <- function(x) as.POSIXct(x, "UTC", format = "%d/%m/%Y %H:%M:%S")
In epidemiological analyses, an exact distinction between temporal events is not always possible. Thus, static but reasonable cut-offs are used to distinguish between events. This is an important aspect of most case definitions. For example, you may want to distinguish between repeat positive infection results. Another example is to distinguish between single-pathogen infections and co-infections.
episodes()
provides a convenient but flexible way of doing this. It links events into a temporal sequence, which you can then deduplicate or use in a sub-analysis.
episode_group()
, fixed_episodes()
and rolling_episodes()
are wrapper functions of episodes()
. They exist to support code from earlier versions of diyar (< v0.2.0)
. Please consider using episodes()
moving forward.
These functions can process a variety of case definitions. See examples below.
An episode as produced by episodes()
is a set of events (dated records) within a specified period. Each episode begins with an index event and may contain duplicate or recurrent events. There are two types - "fixed"
episodes that do not include periods of recurrence, and "rolling"
episodes that do.
Before we begin, let's go over some terms which will pop up now and again.
event - a dated record i.e. a single time point.
event period - events with a known duration i.e. start and endpoints in time.
reference event/period - index event.
"lengths" - temporal cut-offs for distinguishing between events.
  case_length - cut-off period to distinguish one event from another.
  recurrence_length - cut-off period to distinguish an event from its recurrence.
window - all events within the same case_length or recurrence_length.
  case window - all events within a case_length.
  recurrence window - all events within a recurrence_length.
episode - all events within the same cut-off period i.e. every overlapping window.
case event - index event for the entire episode.
recurrent event - first event within a recurrence_length or period of recurrence.
duplicate event - repeat event of the case or recurrent event.
episode tracking - the process of tracking events to episodes.

In "fixed" episodes, each window forms a separate episode. Conversely, in "rolling" episodes, overlapping windows merge into a larger one. This signifies that the episode is still ongoing or has recurred. As a result, every event in both windows is considered part of the same "rolling" episode.
Firstly, let's go through what episodes
look like in diyar
. They are stored as epid
objects (S4
object class). An epid
object is an episode
identifier with slots
for additional information about each episode
.
library(diyar)
ds <- c("01/04/2019", "06/04/2019", "10/04/2019", "11/04/2019")
ds <- as.Date(ds, "%d/%m/%Y")
ep <- episodes(date = ds, case_length = 6, group_stats = TRUE,
               data_source = c("DS1", "DS1", "DS3", "DS1"),
               episode_type = "rolling")
df <- data.frame(events = ds, data_source = c("DS1", "DS1", "DS3", "DS1"))
df$epid <- ep
For an epid object, the following information is displayed (see format.epid):
r substr(format(ep[1]),1,3) - episode identifier.
r substr(format(ep[1]),5,28) - episode duration. Only shown if group_stats is TRUE.
r paste0(unique(substr(format(ep),30,33)), collapse="/") - type of event i.e. case, duplicate and recurrent events respectively.
See ?episodes for further details.
to_df() transforms epid
objects to data.frames
.
cat("to_df(`epid`)")
# Use the full column name rather than relying on partial matching of `df$ep`
to_df(df$epid)
Your first consideration should be a cut-off point to distinguish between episodes. Here, this is the case_length. A case_length of 5 means that repeat events within 6 days (a time difference of 5 days) of the reference event are considered duplicates of the reference event. See below.
library(diyar)
# Events
ds <- c("01/04/2019", "03/04/2019", "13/04/2019", "16/04/2019", "18/04/2019")
ds <- data.frame(date = as.Date(ds, "%d/%m/%Y"))
ds$date
Figure 1a: fixed episodes
# 6-day (5-day difference) episodes - fixed episodes
ds$f1 <- episodes(date = ds$date, case_length = 5, display = "none")
ds$f1
plot_epid(epid = ds$f1, date= ds$date, case_length = 5)
schema(ds$f1, show_labels = c("length_arrow", "length_label", "date"))
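The cut-off logic for "fixed" episodes can be sketched in base R. This is a simplified illustration for time points only and is not diyar's implementation. The helper name fixed_episode_ids() is hypothetical, and it numbers episodes sequentially instead of returning an epid object.

```r
# Hypothetical helper (not part of diyar): assign sequential episode numbers.
# An event starts a new episode when it is more than `case_length` days
# after the index (earliest) event of the current episode.
fixed_episode_ids <- function(dates, case_length) {
  stopifnot(inherits(dates, "Date"), !anyNA(dates))
  ids <- integer(length(dates))
  current <- 0L
  index_date <- as.Date(NA)
  for (i in order(dates)) {
    if (is.na(index_date) || as.numeric(dates[i] - index_date) > case_length) {
      current <- current + 1L
      index_date <- dates[i]
    }
    ids[i] <- current
  }
  ids
}

events <- as.Date(c("01/04/2019", "03/04/2019", "13/04/2019",
                    "16/04/2019", "18/04/2019"), "%d/%m/%Y")
fixed_ids <- fixed_episode_ids(events, case_length = 5)
fixed_ids
# 1 1 2 2 2
```

With a 5-day difference, the first two events fall in one episode and the last three in another, because 18/04/2019 is exactly 5 days after 13/04/2019.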
Alternatively, you can specify a cut-off range. For example, a case_length of r format(number_line(12, 16)) means that repeat events between 13 and 17 days after the reference event are considered duplicates of the reference event. Such ranges must be created with number_line(). See the number line vignette.
Figure 1b: fixed episodes with a cut-off range
rng <- number_line(12, 16)
rng
ds$f2 <- episodes(date = ds$date, case_length = rng, display = "none")
ds$f2
plot_epid(epid = ds$f2, date= ds$date, case_length = rng)
schema(ds$f2, show_labels = c("length_arrow", "length_label", "date"))
Events before the 13th day can be skipped by changing skip_if_b4_lengths to TRUE.
Figure 1c: fixed episodes where events before the cut-off range are skipped
ds$f3 <- episodes(date = ds$date, case_length = rng, skip_if_b4_lengths = TRUE, display = "none")
ds$f3
plot_epid(epid = ds$f3, date= ds$date, case_length = rng)
schema(ds$f3, show_labels = c("length_arrow", "length_label", "date"))
Finally, you can use multiple case_lengths by supplying these as a list. For example, a case_length of list(5, number_line(15, 17)) means that repeat events within 6 days, and those between 16 and 18 days, after the index event are considered duplicates.
Figure 1d: fixed episodes with multiple case_lengths
rngs <- list(5, number_line(15, 17))
ds$f4 <- episodes(date = ds$date, case_length = rngs, display = "none")
ds$f4
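To make the idea of cut-off ranges concrete, the base-R sketch below checks which events fall inside one or more windows of a single index event. It is an illustration only: in_any_window() is a hypothetical helper, each window is written as c(lower, upper) in days, and it does not reproduce the full episode tracking that episodes() performs.

```r
# Hypothetical helper (not part of diyar): TRUE when the day difference
# between an event and the index event falls inside at least one window.
# Each window is a numeric vector c(lower, upper), in days.
in_any_window <- function(dates, index_date, windows) {
  diffs <- as.numeric(dates - index_date)
  hits <- lapply(windows, function(w) diffs >= w[1] & diffs <= w[2])
  Reduce(`|`, hits)
}

events <- as.Date(c("01/04/2019", "03/04/2019", "13/04/2019",
                    "16/04/2019", "18/04/2019"), "%d/%m/%Y")

# One range: day differences of 12 to 16 from the first event
single_range <- in_any_window(events, events[1], list(c(12, 16)))
single_range
# FALSE FALSE  TRUE  TRUE FALSE

# Two windows: day differences of 0 to 5, or 15 to 17
multiple_ranges <- in_any_window(events, events[1], list(c(0, 5), c(15, 17)))
multiple_ranges
# TRUE  TRUE FALSE  TRUE  TRUE
```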
Note: case_lengths
and recurrence_lengths
are always calculated from the endpoint of an event. This has no practical implication when working with time points because the start and end points are the same. However, this is not the case for event periods. See interval grouping for more information about this.
Your next consideration should be the index (reference
) event
. It's common practice for this to be the earliest event
. However, there are times when the most recent event
will be more appropriate. For example, the most recent positive infection result before a patient dies. This can then be the index event
. You achieve this by changing from_last
to TRUE
. This starts episode tracking
at the most recent event
and proceeds backwards in time.
Figure 2: fixed episodes where episode tracking was backwards in time
ds$f4 <- episodes(date = ds$date, case_length = 5, from_last = TRUE, display = "none")
ds$f4
plot_epid(epid = ds$f4, date= ds$date, case_length = 5, from_last = T)
schema(ds$f4, show_labels = c("length_arrow", "length_label", "date"))
Sometimes the type of event
is more relevant to determining if it's an index event
. For example, determining the index infection among co-infections. Below is an example with a co-infection involving human immunodeficiency virus (HIV), hepatitis B virus (HBV) and Mycobacterium tuberculosis (MyCt).
ds2 <- data.frame(date = as.Date(c("13/03/2020", "01/04/2020", "15/08/2020",
                                   "11/12/2020", "31/12/2020"), "%d/%m/%Y"),
                  diag = c("HBV", "HIV", "MyCt", "HBV", "MyCt"),
                  stringsAsFactors = FALSE)
# First, make the HIV diagnosis the index event
ds2$user_ord <- ifelse(ds2$diag == "HIV", 1, 2)
# Then track concurrent infections up to 6 months AFTER the HIV diagnosis
ds2$ep1 <- episodes(date = ds2$date, custom_sort = ds2$user_ord, data_source = ds2$diag,
                    case_length = 6, episode_unit = "months", display = "none")
# Using `data_source` populates the `epid_dataset` slot of the `epid` object.
# This is useful when you're working with different datasets
ds2$ep1_ds <- ds2$ep1@epid_dataset
# Track preceding infections up to 6 months BEFORE the HIV diagnosis
ds2$ep2 <- episodes(date = ds2$date, custom_sort = ds2$user_ord, data_source = ds2$diag,
                    case_length = -6, episode_unit = "months", display = "none")
ds2$ep2_ds <- ds2$ep2@epid_dataset
ds2
Notice that in ep1
, the HIV diagnosis and the HBV diagnosis before it are not linked. This also happens with the HIV and MyCt diagnoses in ep2
. This is because episode tracking
occurs in one direction of time (from_last
). If required, you can track events
in both directions of time i.e. before and after an index event
. You can do this by supplying an applicable cut-off range e.g. number_line(-6, 6)
or list(number_line(-6, 0), number_line(0, 6))
. To some extent, you can also do this with the old bi_direction
argument.
See examples of each approach below.
# Track preceding and concurrent infections up to 6 months BEFORE OR AFTER the HIV diagnosis
# Method 1a
# Track diagnoses 6 months AFTER the HIV diagnosis.
# `bi_direction` then makes the function also check 6 months BEFORE the HIV diagnosis
ds2$ep3a <- fixed_episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = 6,
                           bi_direction = TRUE, episode_unit = "months", display = "none")
# Method 1b
# Track diagnoses 6 months BEFORE the HIV diagnosis.
# `bi_direction` then makes the function also check 6 months AFTER the HIV diagnosis
ds2$ep3b <- fixed_episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = -6,
                           bi_direction = TRUE, episode_unit = "months",
                           data_source = ds2$diag, display = "none")
# Method 2a
# Track diagnoses 6 months BEFORE or AFTER the HIV diagnosis.
# There's no need for `bi_direction`.
rng <- number_line(-6, 6)
ds2$ep4 <- episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = rng,
                    episode_unit = "months", data_source = ds2$diag, display = "none")
# Method 2b
# Track diagnoses within 1-6 months AFTER the HIV diagnosis.
# `bi_direction` then makes the function also check within 1-6 months BEFORE the HIV diagnosis.
# Diagnoses before 2 months on either side are skipped as mentioned in the "cut-offs" section
rng2 <- number_line(1, 6)
ds2$ep5 <- fixed_episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = rng2,
                          bi_direction = TRUE, episode_unit = "months", display = "none")
# Method 3a
# Track diagnoses within 1-4 months AFTER the HIV diagnosis, AND between 4-6 months.
# `bi_direction` then makes the function also check 1-4 and 4-6 months BEFORE the HIV diagnosis
rngs <- list(number_line(1, 4), number_line(4, 6))
ds2$ep6 <- fixed_episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = rngs,
                          bi_direction = TRUE, episode_unit = "months", display = "none")
# Method 3b
# Track diagnoses within 1-4 and 4-6 months BEFORE OR AFTER the HIV diagnosis.
# There's no need for `bi_direction`.
rngs2 <- c(rngs, lapply(rngs, invert_number_line))
rngs2
ds2$ep7 <- episodes(date = ds2$date, custom_sort = ds2$user_ord, case_length = rngs2,
                    episode_unit = "months", display = "none")
ds2
Ties in preference for index events
(custom_sort
values) will be broken with the chronological order of those events. This will either be the earliest or most recent event
depending on from_last
. Alternatively, you can break such ties with extra levels of custom_sort
. See ?custom_sort()
.
See the example below for tracking episodes
using 2 and 3 levels of custom_sort
.
dbf <- infections[c("date", "infection")]
# 3 levels: "UTI" > "BSI" > "RTI"
dbf$c_sort1 <- ifelse(dbf$infection == "UTI", 1, ifelse(dbf$infection == "BSI", 2, 3))
# 2 levels: "UTI" > ("BSI" OR "RTI")
dbf$c_sort2 <- ifelse(dbf$infection == "UTI", 1, 2)
# 2 levels: "BSI" > ("UTI" OR "RTI")
dbf$c_sort3 <- ifelse(dbf$infection == "BSI", 1, 2)
dbf$ep_1 <- episodes(date = dbf$date, case_length = 15, data_source = dbf$infection,
                     custom_sort = dbf$c_sort1, display = "none")
dbf$ep_2 <- episodes(date = dbf$date, case_length = 15, data_source = dbf$infection,
                     custom_sort = dbf$c_sort2, display = "none")
dbf$ep_3 <- episodes(date = dbf$date, case_length = 15, data_source = dbf$infection,
                     custom_sort = dbf$c_sort3, display = "none")
dbf
custom_sort() is a convenience function for nested sorting. It helps you merge custom_sort orders into one atomic vector. See the examples below.
dbf$uti <- ifelse(dbf$infection == "UTI", 1, 2)
dbf$bsi <- ifelse(dbf$infection == "BSI", 1, 2)
dbf$rti <- ifelse(dbf$infection == "RTI", 1, 2)
# 3 levels: "UTI" > "BSI" > "RTI"
dbf$c_sort1b <- custom_sort(dbf$uti, dbf$bsi, dbf$rti)
dbf$ep_1b <- episodes(date = dbf$date, case_length = 15, data_source = dbf$infection,
                      custom_sort = dbf$c_sort1b, display = "none")
# Same results
dbf[c("ep_1", "ep_1b")]
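The idea of merging several levels of preference into one atomic vector can be sketched in base R. The helper below, nested_rank(), is hypothetical and is not how custom_sort() is implemented; it only shows one way to turn nested sort keys into a single rank. The infection values are made up for the illustration.

```r
# Hypothetical helper (not part of diyar): combine nested sort keys into one
# integer rank. Records are ordered by the first key, then the second, etc.,
# and records with the same combination of keys share a rank.
nested_rank <- function(...) {
  keys <- list(...)
  ord <- do.call(order, keys)
  combo <- do.call(paste, c(keys, sep = "|"))
  match(combo, unique(combo[ord]))
}

# Made-up example data
infection <- c("BSI", "UTI", "RTI", "UTI", "BSI")
uti <- ifelse(infection == "UTI", 1, 2)
bsi <- ifelse(infection == "BSI", 1, 2)
rti <- ifelse(infection == "RTI", 1, 2)

# 3 levels: "UTI" > "BSI" > "RTI"
ranks <- nested_rank(uti, bsi, rti)
ranks
# 2 1 3 1 2
```

The result matches the nested ifelse() approach: "UTI" records get rank 1, "BSI" rank 2 and "RTI" rank 3.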
Some case definitions may specify that events
recur ("rolling" episodes
). For example, relapses of the same infection after successful treatment. It may be important to treat this as the same episode
of the initial infection. This differs from a "fixed" episode
, where the relapse is handled as a separate episode
.
The recurrence_length
argument provides the cut-off period for recurrent events
. For example, a recurrence_length
of 10
means that; repeat events
within 11
days of the last duplicate event
(of the initial index event
) are considered a recurrence of the index event
.
If recurrence_length
is not supplied, it's assumed to be the same as the case_length
.
Figure 3a: rolling episodes
# 6-day (5-day difference) episodes with 11-day recurrence periods - rolling episodes
ds$r1 <- episodes(date = ds$date, case_length = 5, recurrence_length = 10,
                  episode_type = "rolling", display = "none")
ds$r1
plot_epid(epid = ds$r1, date= ds$date, case_length = 5, recurrence_length = 10)
schema(ds$r1, show_labels = c("length_arrow", "length_label", "date"))
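The way a recurrence period extends an episode can be sketched in base R. This is a simplified illustration for time points, assuming the default behaviour described in this vignette where the last event of a window is the reference for the next recurrence window. rolling_episode_ids() is a hypothetical helper, not diyar's implementation, and it ignores options such as rolls_max and case_for_recurrence.

```r
# Hypothetical helper (not part of diyar): sequential "rolling" episode numbers.
# An event joins the current episode if it is within `case_length` days of the
# index event, or within `recurrence_length` days of the previous event.
rolling_episode_ids <- function(dates, case_length,
                                recurrence_length = case_length) {
  stopifnot(inherits(dates, "Date"), !anyNA(dates))
  ids <- integer(length(dates))
  current <- 0L
  index_date <- as.Date(NA)
  previous_date <- as.Date(NA)
  for (i in order(dates)) {
    in_case_window <- !is.na(index_date) &&
      as.numeric(dates[i] - index_date) <= case_length
    in_recurrence_window <- !is.na(previous_date) &&
      as.numeric(dates[i] - previous_date) <= recurrence_length
    if (!in_case_window && !in_recurrence_window) {
      current <- current + 1L
      index_date <- dates[i]
    }
    previous_date <- dates[i]
    ids[i] <- current
  }
  ids
}

events <- as.Date(c("01/04/2019", "03/04/2019", "13/04/2019",
                    "16/04/2019", "18/04/2019"), "%d/%m/%Y")
rolling_10 <- rolling_episode_ids(events, case_length = 5, recurrence_length = 10)
rolling_10
# 1 1 1 1 1
rolling_5 <- rolling_episode_ids(events, case_length = 5, recurrence_length = 5)
rolling_5
# 1 1 2 2 2
```

With a 10-day recurrence length, 13/04/2019 is exactly 10 days after 03/04/2019, so the episode keeps rolling and all five events end up together.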
Similarly, for recurrence_length
, you can use a cut-off range. For example, a recurrence_length
of r format(number_line(13, 15))
means that; repeat events
between 14
and 16
days after the last duplicate event
are considered a recurrence of the index event
.
Figure 3b: rolling episodes with a cut-off range
rng <- number_line(13, 15)
rng
ds$r2 <- episodes(date = ds$date, case_length = 5, recurrence_length = rng,
                  episode_type = "rolling", display = "none")
ds$r2
plot_epid(epid = ds$r2, date= ds$date, case_length = 5, recurrence_length = rng)
schema(ds$r2, show_labels = c("length_arrow", "length_label", "date"))
You can skip events
before the 14th
by changing skip_if_b4_lengths
to TRUE
.
Figure 3c: rolling episodes where events before the cut-off range are not skipped
rng <- number_line(13, 15)
rng
ds$r3 <- episodes(date = ds$date, case_length = 5, recurrence_length = rng,
                  skip_if_b4_lengths = FALSE, episode_type = "rolling", display = "none")
ds$r3
plot_epid(epid = ds$r3, date= ds$date, case_length = 5, recurrence_length = rng)
schema(ds$r3, show_labels = c("length_arrow", "length_label", "date"))
As with case_length
, you can supply multiple recurrence_lengths
as a list.
Figure 3d: rolling episodes with multiple recurrence_lengths
rngs <- list(number_line(13, 14), number_line(15, 18))
rngs
ds$r4 <- episodes(date = ds$date, case_length = 5, recurrence_length = rngs,
                  skip_if_b4_lengths = FALSE, episode_type = "rolling", display = "none")
ds$r4
Each "rolling" episode has at least two index events: one for the whole episode and one for each period of recurrence. By default, the last event of the previous window is the index event for the next period of recurrence. Changing reference_event to FALSE specifies that the first event in the previous window is used instead. See a demonstration of this below.
If the case_length is the same as the recurrence_length and reference_event is FALSE, you'll effectively create a "fixed" episode.
df <- c("01/04/2019", "11/04/2019", "16/04/2019", "21/04/2019", "07/05/2019")
df <- data.frame(date = as.Date(df, "%d/%m/%Y"))
# 15-day fixed episodes are the same as 15-day rolling episodes where `reference_event` is FALSE
df$f1 <- episodes(date = df$date, case_length = 14, display = "none")
df$r1 <- episodes(date = df$date, case_length = 14, reference_event = FALSE,
                  episode_type = "rolling", display = "none")
df
Figure 4a: Last event in a window taken as the reference event for the next recurrence window (default)
df$r2 <- episodes(date = df$date, case_length = 14, recurrence_length = 16,
                  episode_type = "rolling", display = "none")
df$r2
plot_epid(epid = df$r2, date = df$date, case_length =14, recurrence_length = 16)
schema(df$r2, show_labels = c("length_arrow", "length_label", "date"))
Figure 4b: First event in a window taken as the reference event for the next recurrence window
df$r3 <- episodes(date = df$date, case_length = 14, recurrence_length = 16,
                  reference_event = FALSE, episode_type = "rolling", display = "none")
df$r3
plot_epid(epid = df$r3, date = df$date, case_length =14, recurrence_length = 16)
schema(df$r3, show_labels = c("length_arrow", "length_label", "date"))
You can use the case_for_recurrence
argument to specify that both case
and recurrence events
should have a case window
. The default behaviour is that only case events
have a case window
. See a demonstration of this below.
The wind_id
slot shows the window
of every event
, while wind_nm
shows what type of window
it is.
Example 1
df <- c("01/04/2019", "11/04/2019", "16/04/2019", "21/04/2019", "25/04/2019", "07/05/2019")
df <- data.frame(date = as.Date(df, "%d/%m/%Y"))
Figure 5a: Case windows for case events only (default)
df$r4 <- episodes(date = df$date, case_length = 10, episode_type = "rolling", display = "none")
df$r4
plot_epid(epid = df$r4, date = df$date, case_length =10, recurrence_length = 10)
schema(df$r4, show_labels = c("length_arrow", "length_label", "date"))
Figure 5b: Case windows for case and recurrence events
df$r5 <- episodes(date = df$date, case_length = 10, episode_type = "rolling",
                  case_for_recurrence = TRUE, display = "none")
df$r5
plot_epid(epid = df$r5, date = df$date, case_length =10, recurrence_length = 10)
schema(df$r5, show_labels = c("length_arrow", "length_label", "date"))
Example 2
date <- seq.Date(as.Date("01/04/2019", "%d/%m/%Y"), as.Date("21/04/2019", "%d/%m/%Y"), by = "3 day")
# Example 2
df2 <- data.frame(date = date, sn = seq_along(date))
# dates
df2$date
Figure 5c: Case windows for case events only (default)
df2$r1 <- episodes(date = df2$date, case_length = 6, recurrence_length = 4, sn = df2$sn,
                   episode_type = "rolling", display = "none")
df2$r1
plot_epid(epid = df2$r1, date = df2$date, case_length =6, recurrence_length = 4)
schema(df2$r1, show_labels = c("length_arrow", "length_label", "date"))
Figure 5d: Case windows for case and recurrence events
df2$r2 <- episodes(date = df2$date, case_length = 6, recurrence_length = 4,
                   case_for_recurrence = TRUE, sn = df2$sn,
                   episode_type = "rolling", display = "none")
df2$r2
plot_epid(epid = df2$r2, date = df2$date, case_length =6, recurrence_length = 4)
schema(df2$r2, show_labels = c("length_arrow", "length_label", "date"))
Notice that in the 4 examples, all events
have been assigned to the same episode
but the number of recurrent events
differs.
Some case definitions may only need you to track the first n
-number of episodes
or specify that they only recur n
-number of times. You can incorporate these requirements into your analyses using episodes_max
and rolls_max
respectively. When each strata
(see stratified episode tracking) of the dataset has the maximum number of episodes
as specified by episodes_max
, subsequent events
are skipped. The default option for episodes_max
is Inf
i.e. continue episode tracking
until every event
is linked to an episode
. See the examples below.
data("infections_3")
dbs <- infections_3[c("pid", "date")]
# dates
dbs$date
Figure 6a: Maximum of one fixed episode per strata
dbs$eps_1 <- episodes(strata = dbs$pid, date = dbs$date, case_length = 3,
                      episodes_max = 1, display = "none")
dbs$eps_1
plot_epid(epid = dbs$eps_1, date = dbs$date, case_length =3)
schema(dbs$eps_1, show_labels = c("length_arrow", "length_label", "date"))
Figure 6b: Maximum of two fixed episodes per strata
dbs$eps_2 <- episodes(strata = dbs$pid, date = dbs$date, case_length = 3,
                      episodes_max = 2, display = "none")
dbs$eps_2
plot_epid(epid = dbs$eps_2, date = dbs$date, case_length = 3)
schema(dbs$eps_2, show_labels = c("length_arrow", "length_label", "date"))
rolls_max
is the maximum number of times an event
can recur. This is when a "rolling" episode
ends. A new one will only begin if the number of episodes
in that strata
is less than episodes_max
. The default option for rolls_max
is Inf
i.e. continue tracking recurrent events
until there are no more events
within the recurrence period
of the reference event
. See the examples below.
Figure 6c: Infinite recurrence periods per episode per strata (default)
dbs$eps_4 <- episodes(strata = dbs$pid, date = dbs$date, case_length = 2,
                      episode_type = "rolling", display = "none")
dbs$eps_4
plot_epid(epid = dbs$eps_4, date = dbs$date, case_length =2, recurrence_length = 2)
schema(dbs$eps_4, show_labels = c("length_arrow", "length_label", "date"))
Figure 6d: Maximum of one recurrence period per episode per strata
dbs$eps_6 <- episodes(strata = dbs$pid, date = dbs$date, case_length = 2,
                      episode_type = "rolling", rolls_max = 1, display = "none")
dbs$eps_6
plot_epid(epid = dbs$eps_6, date = dbs$date, case_length =2, recurrence_length = 2)
schema(dbs$eps_6, show_labels = c("length_arrow", "length_label", "date"))
Note that after tracking the last recurrent event
, the function will also check for duplicates
within the case window
of the last recurrent event
. See below.
Figure 6e: Maximum of one recurrence period per episode per strata when case_for_recurrence is TRUE
dbs$eps_7 <- episodes(strata = dbs$pid, date = dbs$date, case_length = 2,
                      episode_type = "rolling", rolls_max = 1,
                      case_for_recurrence = TRUE, display = "none")
dbs$eps_7
plot_epid(epid = dbs$eps_7, date = dbs$date, case_length =2, recurrence_length = 2)
schema(dbs$eps_7, show_labels = c("length_arrow", "length_label", "date"))
Some case definitions can involve events
that occur over a defined period e.g. hospital admissions. episodes()
can track such event periods
into episodes
. This type of episode tracking
is recommended when the endpoint of an event
is known. For instance, it's nearly impossible to know the exact point in time at which an individual diagnosed with the common cold recovers. However, it's a reasonable assumption that another diagnosis 3 months after the initial one is not the same infection. For situations like this, the recommended approach is to use the methods we've covered so far. In contrast, hospital admissions typically have recorded admission and discharge times or dates, and so interval grouping is recommended for tracking such events
. This being said, in several situations both approaches can be used interchangeably. See below.
# Dates (note: "%m" is month; "%M" would be parsed as minutes)
dates <- seq(as.Date("01/04/2019", "%d/%m/%Y"), as.Date("20/04/2019", "%d/%m/%Y"), "4 days")
# Periods
periods <- number_line(dates, dates + 4)
dates
periods
# Track fixed episodes from events with a 5 day cut-off
mth1 <- episodes(date = dates, case_length = 4, display = "none")
# Track fixed episodes from periods that are 5 days long
mth2 <- episodes(date = periods, case_length = 0, display = "none")
# Same results
mth1; mth2
# Track rolling episodes from events using a 5 day cut-off
mth3 <- episodes(date = dates, case_length = 4, episode_type = "rolling", display = "none")
# Track rolling episodes from periods that are 5 days long
mth4 <- episodes(date = periods, case_length = 0, episode_type = "rolling", display = "none")
# Same results
mth3; mth4
As mentioned earlier, case_lengths and recurrence_lengths are calculated from the endpoints of event periods. Therefore, a case_length of 5 is actually the period within 6 days (day difference of 5) of the endpoint. See examples of this below.
event_period <- number_line(as.Date("01/04/2019", "%d/%m/%Y"), as.Date("10/04/2019", "%d/%m/%Y"))
event_period
# Case_length of 5 translates to;
epid_windows(event_period, 5)
# Case_length of 0 translates to;
epid_windows(event_period, 0)
# Case_length of -1 translates to;
epid_windows(event_period, -1)
# Case_length of -5 translates to;
epid_windows(event_period, -5)
# Case_length of `index_window(x)` translates to;
epid_windows(event_period, index_window(event_period))
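The arithmetic behind measuring a "length" from the endpoint can be sketched in base R. This follows the reading given in this vignette (positive lengths extend after the endpoint, negative lengths reach back before it) and is only an assumption-based illustration. window_from_endpoint() is a hypothetical helper; epid_windows() remains the authoritative way to see what diyar will use.

```r
# Hypothetical helper (not part of diyar): the window implied by a "length"
# measured from the endpoint of an event period.
# Positive lengths extend AFTER the endpoint, negative lengths BEFORE it.
window_from_endpoint <- function(end_date, len) {
  stopifnot(inherits(end_date, "Date"), length(end_date) == 1)
  sort(c(end_date, end_date + len))
}

end_pt <- as.Date("10/04/2019", "%d/%m/%Y")
after_window <- window_from_endpoint(end_pt, 5)
after_window
# "2019-04-10" "2019-04-15"
before_window <- window_from_endpoint(end_pt, -5)
before_window
# "2019-04-05" "2019-04-10"
```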
In interval grouping, event periods that overlap with the index event period are linked together as an episode.
hos_admin <- diyar::hospital_admissions
hos_admin$admin_period <- number_line(hos_admin$admin_dt, hos_admin$discharge_dt)
# Hospital stay
hos_admin$admin_period
Figure 7a: Tracking periods of hospital stay (admission -> discharge)
hs_epids_a <- episodes(date = hos_admin$admin_period, case_length = list(0, index_window(hos_admin$admin_period)), group_stats = T, display = "none")
plot_epid(epid = hs_epids_a, date = hos_admin$admin_period, case_length = 0)
schema(hs_epids_a, show_labels = c("length_arrow", "length_label", "date"))
Figure 7b: Tracking periods of hospital stay up to 20 days after the first discharge
hs_epids_b <- episodes(date = hos_admin$admin_period, case_length = list(20, index_window(hos_admin$admin_period)), group_stats = TRUE, display = "none")
plot_epid(epid = hs_epids_b, date = hos_admin$admin_period, case_length = 20)
schema(hs_epids_b, show_labels = c("length_arrow", "length_label", "date"))
Figure 7c: Tracking periods of continuous hospital stay i.e. re-admission within 15 days
hs_epids_c <- episodes(date=hos_admin$admin_period, case_length = list(0, index_window(hos_admin$admin_period)), recurrence_length = list(15, index_window(hos_admin$admin_period)), episode_type = "rolling", group_stats = T, display = "none")
plot_epid(epid = hs_epids_c, date = hos_admin$admin_period, case_length = 0, recurrence_length = 15)
schema(hs_epids_c, show_labels = c("length_arrow", "length_label", "date"))
There are 7 ways event periods
can overlap (case_overlap_methods
/recurrence_overlap_methods
). These are "exact"
, "reverse"
, "across"
, "inbetween"
, "chain"
, "aligns_start"
and "aligns_end"
. These are covered in more detail in an accompanying vignette. You can specify that only event periods
which overlap in a certain way should be linked as episodes
. See below for a demonstration of this.
The 7 overlap_methods
are mutually exclusive.
There are 2 convenience methods - "none"
and "overlap"
. These are mutually inclusive with the others, and so are prioritised.
epi_grp_func <- function(x) {
  epids <- episodes(date = hos_admin$admin_period,
                    sn = hos_admin$rd_id,
                    case_length = index_window(hos_admin$admin_period),
                    case_overlap_methods = x,
                    group_stats = TRUE,
                    display = "none")
  # for this demonstration, limit to overlapped periods
  epids[epids@epid_total > 1]
}
# Methods
methods <- list(
  # Identical intervals
  exact = "exact",
  # Overlapping intervals
  across = "across",
  # Intervals with aligned start points
  aligns_start = "aligns_start",
  # Intervals with aligned endpoints
  aligns_end = "aligns_end",
  # Chained intervals
  chain = "chain",
  # Intervals occurring completely within others
  inbetween = "inbetween",
  # Chained intervals and those occurring completely within others
  cb1 = "chain|inbetween",
  # Chained intervals, identical intervals and those occurring completely within others
  cb2 = "exact|chain|inbetween",
  # Chained intervals, overlapping intervals and those with aligned endpoints
  cb3 = "across|chain|aligns_end"
)
epids <- lapply(methods, epi_grp_func)
names(epids) <- names(methods)
# Results
epids["chain"]
epids["cb2"]
Only showing events
that have been linked as episodes
Figure 8a: Episodes when considering only "exact" overlap method
eps <- epids[["exact"]]
plot_epid(epid = eps, date = hos_admin$admin_period[eps@sn], case_length = number_line(-hos_admin$admin_period@.Data[eps@sn], 0))
schema(epids[["exact"]], show_labels = c("length_arrow", "length_label", "date"))
Figure 8b: Episodes when considering only "chain" overlap method
eps <- epids[["chain"]]
plot_epid(epid = eps, date = hos_admin$admin_period[eps@sn], case_length = number_line(-hos_admin$admin_period@.Data[eps@sn], 0))
schema(epids[["chain"]], show_labels = c("length_arrow", "length_label", "date"))
Figure 8c: Episodes when considering "chain" or "inbetween" overlap method
eps <- epids[["cb1"]]
plot_epid(epid = eps, date = hos_admin$admin_period[eps@sn], case_length = number_line(-hos_admin$admin_period@.Data[eps@sn], 0))
schema(epids[["cb1"]], show_labels = c("length_arrow", "length_label", "date"))
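To give a feel for how the overlap methods differ, here is a rough base-R classifier for two intervals. It is an assumption-based sketch built from the short descriptions in the comments above, not diyar's definitions: overlap_method() is hypothetical, it assumes both intervals run from a lower start to a higher end (so "reverse" is not covered), and edge cases may be handled differently by the package.

```r
# Hypothetical helper (not part of diyar): label how two increasing intervals
# [s1, e1] and [s2, e2] overlap. Checks run in order, so each pair gets
# exactly one label.
overlap_method <- function(s1, e1, s2, e2) {
  stopifnot(s1 <= e1, s2 <= e2)
  if (e1 < s2 || e2 < s1) return("none")            # no shared point
  if (s1 == s2 && e1 == e2) return("exact")         # identical intervals
  if (s1 == s2) return("aligns_start")              # same start, different end
  if (e1 == e2) return("aligns_end")                # same end, different start
  if (e1 == s2 || e2 == s1) return("chain")         # one ends where the other starts
  if ((s1 < s2 && e2 < e1) || (s2 < s1 && e1 < e2)) {
    return("inbetween")                             # one lies completely within the other
  }
  "across"                                          # partial overlap
}

labels <- c(
  overlap_method(1, 5, 1, 5),
  overlap_method(1, 5, 2, 3),
  overlap_method(1, 5, 5, 9),
  overlap_method(1, 5, 3, 9),
  overlap_method(1, 5, 1, 3),
  overlap_method(1, 5, 3, 5),
  overlap_method(1, 5, 7, 9)
)
labels
# "exact" "inbetween" "chain" "across" "aligns_start" "aligns_end" "none"
```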
By default, overlap_methods
is a record-level
argument. This means that the option that'll be used is that of each event
(record). See below.
x <- c(number_line(1, 5), number_line(1, 5), number_line(2, 3), 1)
m <- c("aligns_start", "exact", "inbetween", "aligns_start")
dfe <- data.frame(x = x, m = m, stringsAsFactors = FALSE)
dfe$eps_1 <- episodes(date = dfe$x, case_length = index_window(dfe$x), case_overlap_methods = dfe$m)
dfe
However, you can also use it as an episode-level argument. This means that the option that'll be used is that of the index event. Finally, you can use it as both a record-level and episode-level argument. You do this by supplying overlap_methods as a named list - r for record-level, e for episode-level or b for both. See below.
dfe$eps_2 <- episodes(date = dfe$x, case_length = index_window(dfe$x), case_overlap_methods = list(e = dfe$m))
dfe$eps_3 <- episodes(date = dfe$x, case_length = index_window(dfe$x), case_overlap_methods = list(b = dfe$m))
dfe
pa <- c("28/03/2019", "01/04/2019", "03/04/2019", "07/04/2019", "12/04/2019")
pz <- c("31/03/2019", "10/04/2019", "05/04/2019", "09/04/2019", "13/04/2019")
pa <- as.Date(pa, "%d/%m/%Y")
pz <- as.Date(pz, "%d/%m/%Y")
periods <- number_line(pa, pz)
periods
As seen earlier, a positive "length" means that; event periods
within n
-days AFTER the end_point()
of the index event period
will be linked to the same episode
. See another example below.
Figure 9a: Fixed episodes with a case_length of 6
eps_a <- episodes(date = periods, case_length = 6, group_stats = TRUE, display = "none")
eps_a
plot_epid(epid = eps_a, date = periods, case_length = 6)
schema(eps_a, show_labels = c("length_arrow", "length_label", "date"))
In contrast, a negative "length" means that; only event periods
within n
-days BEFORE the end_point()
of the index event period
will be linked to the same episode
.
Figure 9b: Fixed episodes with a case_length of -2
eps_b <- episodes(date = periods, case_length = -2, group_stats = TRUE, display = "none")
eps_b
plot_epid(epid = eps_b, date = periods, case_length = -2)
schema(eps_b, show_labels = c("length_arrow", "length_label", "date"))
To include those that overlap with the index event period itself, supply the corresponding case_length or recurrence_length. index_window() is handy for doing this. You can also achieve this with the old include_index_period argument. See below.
Figure 9c: Fixed episodes with a case_length of -2 and the index window
eps_c <- episodes(date = periods, case_length = list(-2, index_window(periods)), group_stats = TRUE, display = "none")
eps_c
plot_epid(epid = eps_c, date = periods, case_length = -2)
schema(eps_c, show_labels = c("length_arrow", "length_label", "date"))
Figure 9d: Fixed episodes with a case_length of -6
eps_d <- episodes(date = periods, case_length = -6, group_stats = TRUE, display = "none")
eps_d
plot_epid(epid = eps_d, date = periods, case_length = -6)
schema(eps_d, show_labels = c("length_arrow", "length_label", "date"))
You'll usually want to track episodes
separately within different groups (strata
). For example, patients, pathogens, sources of infection or any combination of these. This results in episodes
from one strata
being different from others.
episodes
from different strata
can have different case_lengths
and/or recurrence_lengths
.
links()
is useful for creating group identifiers which can then be used as a strata
. See the links vignette for further details.
As an example, using the infections dataset, a case definition may specify the following:
- UTI and BSI diagnoses are different episodes regardless of when they occur
- UTI episodes have a cut-off of 8 days and a recurrence period of 31 days
- BSI episodes have a cut-off of 15 days and no recurrence period
By adding the infection column to the strata argument, "UTI" and "BSI" diagnoses will not be linked to the same episode. Furthermore, the case_length and recurrence_length for each will be different as specified in the epi and recur columns respectively.
dbs <- diyar::infections[c("date", "infection")]
dbs <- dbs[dbs$infection %in% c("UTI", "BSI"), ]
dbs$epi <- ifelse(dbs$infection == "UTI", 7, 14)
dbs$recur <- ifelse(dbs$infection == "UTI", 30, 0)
dbs$epids <- episodes(date = dbs$date, case_length = dbs$epi,
                      episode_type = "rolling", strata = dbs$infection,
                      recurrence_length = dbs$recur, group_stats = TRUE,
                      display = "none")
dbs
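To make the effect of strata-specific cut-offs more concrete, here is a minimal base R sketch of fixed episode tracking within strata. `track_fixed()` is a hypothetical helper written for this illustration, not a diyar function. It assumes that an event joins an episode when it occurs no more than `case_length` days after the index event, and it ignores recurrence periods and every other feature of `episodes()`.

```r
# Hypothetical helper (not part of diyar): fixed episodes within strata.
# Assumption: an event joins an episode if it occurs no more than
# `case_length` days after that episode's index event.
track_fixed <- function(date, case_length, strata) {
  epid <- rep(NA_integer_, length(date))
  next_id <- 1L
  for (s in unique(strata)) {
    # Positions of this stratum's events, in chronological order
    idx <- which(strata == s)
    idx <- idx[order(date[idx])]
    while (length(idx) > 0) {
      # The earliest unassigned event becomes the index event
      index_event <- idx[1]
      gap <- as.numeric(date[idx] - date[index_event])
      linked <- idx[gap <= case_length[index_event]]
      epid[linked] <- next_id
      next_id <- next_id + 1L
      idx <- setdiff(idx, linked)
    }
  }
  epid
}

events <- data.frame(
  date = as.Date(c("2018-04-01", "2018-04-05", "2018-04-10",
                   "2018-04-01", "2018-04-10", "2018-04-20")),
  infection = c("UTI", "UTI", "UTI", "BSI", "BSI", "BSI")
)
# Stratum-specific cut-offs, mirroring the `epi` column above
events$epi <- ifelse(events$infection == "UTI", 7, 14)
events$epid <- track_fixed(events$date, events$epi, events$infection)
events
```

In this toy dataset the "UTI" record on 10 April starts a new episode because it is 9 days after its index event, while the "BSI" record on the same date stays in its episode because the "BSI" cut-off is longer.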
Sub-strata
is when events
within the same strata
have different cut-offs. The case definition below demonstrates how this can be beneficial.
episodes
regardless of when they occur treated
. In both situations, there is no recurrence period.
In this example, whether the infection is treated
should be viewed as the sub-strata
i.e. in a patient (strata
), an episode
is expected to last longer if it's not treated
.
dbs <- diyar::infections_4
dbs$epids <- episodes(strata = paste(dbs$pid, dbs$organism, dbs$source),
                      date = dbs$date, case_length = dbs$epi,
                      episode_type = "rolling", recurrence_length = dbs$recur,
                      display = "none")
dbs
There are a few things to note with stratified episode tracking;
case_length
and recurrence_length
should be consistent within each strata
otherwise, you'll inadvertently create a sub-strata.
Episode tracking with and without a sub-strata are different and could lead to different results.
Using a sub-strata
is not the same as adding that sub-strata
to the strata
argument. Adding treated
to the strata
argument will track treated
infections separately from untreated infections. While this could be the desired outcome in certain situations, the case definition above only requires that treated
infections last longer.
Some case definitions are so specific that they only cover certain kinds of episodes
. In such situations, you can save time by skipping events
from episodes
that you won't need. There are 3 ways to skip such events
.
Skipped events
are assigned unique episode IDs and are flagged as "Skipped"
.
You can skip an entire strata
of the dataset by changing it to NA
. Assuming you are not interested in tracking "GP1"
and "GP2"
records, you can skip these to save time.
dbs <- head(hourly_data[c("datetime", "category")], 10)
# Set the strata of records you are not interested in to NA
dbs$subset <- ifelse(dbs$category != "GP3", NA, "group")
# Skip "GP1" and "GP2" records
dbs$epids_a <- episodes(strata = dbs$subset, date = dbs$datetime,
                        case_length = 5, episode_unit = "hours",
                        recurrence_length = 9, episode_type = "rolling",
                        display = "none")
# Do not skip any records
dbs$epids_b <- episodes(strata = dbs$category, date = dbs$datetime,
                        case_length = 5, episode_unit = "hours",
                        recurrence_length = 9, episode_type = "rolling",
                        display = "none")
dbs
You can skip episodes
when there are different levels of custom_sort
. A skip_order
of 1
means that, after every event
with the first level of your custom_sort
has been linked to episodes
, the tracking process will end.
This is handy when you're tracking episodes
from rare and common events
, the rare events
are the index events
and it's a hassle to subset the dataset.
dbf <- dbf[c("date", "infection")]
dbf$c_sort <- ifelse(dbf$infection == "UTI", 1,
                     ifelse(dbf$infection == "BSI", 2, 3))
# Skip episodes that would begin with BSI/RTI records
dbf$ep_a <- episodes(date = dbf$date, case_length = 15,
                     data_source = dbf$infection, custom_sort = dbf$c_sort,
                     skip_order = 1, display = "none")
dbf$ep_a_d <- dbf$ep_a@epid_dataset
# Don't skip any episodes
dbf$ep_b <- episodes(date = dbf$date, case_length = 15,
                     data_source = dbf$infection, custom_sort = dbf$c_sort,
                     display = "none")
dbf$ep_b_d <- dbf$ep_b@epid_dataset
dbf
Notice that in the skipped example (ep_a
), no episode
begins with either a "BSI"
or "RTI"
record.
The third mechanism is partly a misnomer. episodes
that do not contain records from certain data_sources
are unlinked, not skipped. A strata
without the required combination of data_sources
will be skipped.
dttm <- function(x) as.POSIXct(x, "UTC", format = "%d/%m/%Y %H:%M:%S")
dbg <- data.frame(date = seq.POSIXt(dttm("01/04/2018 00:00:00"),
                                    dttm("31/05/2018 00:00:00"),
                                    by = "3 days"))
dbg <- head(dbg, 11)
dbg$recurrence <- 2
dbg$case_len <- 6
dbg$dataset <- paste("DS", c(1:3, rep(c(1:2), 2), rep(3, 4)), sep = "")
# Don't skip any
dbg$ep_a <- episodes(date = dbg$date, case_length = dbg$case_len,
                     episode_type = "rolling",
                     recurrence_length = dbg$recurrence,
                     data_source = dbg$dataset, display = "none")
# Break up episodes without events from the `DS3` `data_source`
# The `DS3` event MUST BE linked to events from other `data_sources`
dbg$ep_b <- episodes(date = dbg$date, case_length = dbg$case_len,
                     episode_type = "rolling",
                     recurrence_length = dbg$recurrence,
                     data_source = dbg$dataset, data_links = "DS3",
                     display = "none")
# Break up episodes without events from the `DS3` `data_source`
# The `DS3` event DOESN'T HAVE to be linked to events from other `data_sources`
dbg$ep_c <- episodes(date = dbg$date, case_length = dbg$case_len,
                     episode_type = "rolling",
                     recurrence_length = dbg$recurrence,
                     data_source = dbg$dataset, data_links = list(g = "DS3"),
                     display = "none")
# Break up episodes without events from the `DS3` `data_source`
# The `DS3` event MUST BE linked to events from the `DS1` `data_source`
dbg$ep_d <- episodes(date = dbg$date, case_length = dbg$case_len,
                     episode_type = "rolling",
                     recurrence_length = dbg$recurrence,
                     data_source = dbg$dataset,
                     data_links = list(l = c("DS3", "DS1")),
                     display = "none")
dbg[c("date", "dataset", "ep_a", "ep_b", "ep_c", "ep_d")]
episodes
in other units of time
episodes
can be tracked in other units of time (episode_unit
). Acceptable options are "seconds"
, "minutes"
, "hours"
, "days"
, "weeks"
, "months"
or "years"
. Below is an example of tracking episodes
by the hour.
dbs <- diyar::hourly_data
# Each unit is relative to a predefined number of seconds.
diyar::episode_unit
# 1-day fixed episodes
episodes(date = dbs$datetime, case_length = 1, episode_unit = "days",
         group_stats = TRUE, display = "none")
# 5-hr fixed episodes
episodes(date = dbs$datetime, case_length = 5, episode_unit = "hours",
         group_stats = TRUE, display = "none")
# 5-hr rolling episodes
episodes(date = dbs$datetime, case_length = 5, episode_unit = "hours",
         group_stats = TRUE, episode_type = "rolling", display = "none")
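The comment above notes that each unit corresponds to a number of seconds. As a rough illustration of that idea, the sketch below converts a cut-off into seconds using a hand-written lookup table. `unit_seconds` and `to_seconds()` are hypothetical names made up for this example. The values diyar itself uses are the ones in `diyar::episode_unit`, and months and years are left out here because their lengths vary.

```r
# Hypothetical lookup table; see diyar::episode_unit for the package's own values
unit_seconds <- c(seconds = 1, minutes = 60, hours = 3600,
                  days = 86400, weeks = 604800)

# Convert a cut-off expressed in `unit` into seconds
to_seconds <- function(length, unit) {
  stopifnot(unit %in% names(unit_seconds))
  length * unit_seconds[[unit]]
}

five_hours <- to_seconds(5, "hours")

# Two datetimes 4 hours apart fall within a 5-hour cut-off
t1 <- as.POSIXct("2018-04-01 00:00:00", tz = "UTC")
t2 <- as.POSIXct("2018-04-01 04:00:00", tz = "UTC")
gap <- as.numeric(difftime(t2, t1, units = "secs"))
gap <= five_hours
```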
strata
from links()
You can use pid
objects from links()
as a strata
.
dbs <- diyar::infections[c("date", "infection")]
dbs
# familiar unique record ids for easy reference - optional
dbs$rd_id <- c(640, 17, 58, 21, 130, 79, 45, 300, 40, 13, 31)
# `strata` based on matching sources of infection
dbs$pids <- links(sn = dbs$rd_id, criteria = dbs$infection, display = "none")
# stratified episode tracking
dbs$epids <- episodes(sn = dbs$rd_id, date = dbs$date, strata = dbs$pids,
                      case_length = 10, display = "none")
dbs
You can use numeric
values instead of date
or datetime
values for the date
argument. For instance, if you use 0-2
, it does not matter what they represent i.e. the number of days from a particular date or number of seconds from a particular time. It only matters that 0
is before 1
, which in turn is before 2
. When using the function this way, episode_unit
is ignored.
vals <- c(8.1, 6, 12, 8.5, 12, 3, 8, 15, 5, 7)
vals
episodes(date = vals, case_length = .5, group_stats = TRUE, display = "none")
episodes(date = vals, case_length = 5, group_stats = TRUE, display = "none")
episodes(date = vals, case_length = 100, group_stats = TRUE, display = "none")
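Since only the order and spacing of the values matter, dates can be converted to numeric offsets beforehand. The base R sketch below shows one way of doing this. The variable names are illustrative only.

```r
dates <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-10"))

# Number of days since the earliest date
day_offsets <- as.numeric(dates - min(dates))
day_offsets

# The same time points expressed in seconds keep the same order,
# and the gaps between them are rescaled by a constant factor
sec_offsets <- day_offsets * 86400
identical(order(day_offsets), order(sec_offsets))
diff(sec_offsets) / diff(day_offsets)
```

If you rescale the values in this way, the cut-off has to be expressed on the same scale for the comparison to stay meaningful.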
case_lengths
or recurrence_lengths
You can incorporate missing and infinite cut-offs into your analyses. An event with a missing cut-off will not be used as an index event; however, it can still be tagged as a duplicate of another index event. An index event with an infinite cut-off will be linked to every other event in its strata
.
vals <- 1:10
episodes(date = vals, case_length = Inf, display = "none")
episodes(date = vals, case_length = NA_real_, display = "none")
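One way to see why an infinite cut-off links every event is to look at how R compares values against `Inf` and `NA`. The sketch below uses base R only and is not how `episodes()` is implemented. It assumes that events are linked when their distance from the index event is within the cut-off.

```r
vals <- 1:10
index_event <- vals[1]
distance <- abs(vals - index_event)

# Every finite distance is within an infinite cut-off
all(distance <= Inf)

# A comparison against a missing cut-off is itself missing,
# so it cannot establish a link from that index event
distance <= NA_real_
```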
There are a variety of ways to use episodes()
. In general, the following steps can serve as a quick guide on what to do;
1. Identify the strata in your dataset.
2. Choose between "fixed" or "rolling" episodes.
3. Choose the case_length, and/or recurrence_length. The values should be unique to each strata unless you require a sub-strata.
4. Set from_last to TRUE if you want the index event to be the most recent event instead of the earliest.
5. If you want the index event to be the earliest or most recent event of a particular type of event, use custom_sort in combination with from_last. See user-defined index events.
6. Decide whether to track episodes by the minute, hour or day etc., and set episode_unit accordingly. Be sure to supply a datetime to the date argument if episode_unit is less than "days".

These functions should prove useful for epidemiological analyses. However, they can also be useful for analyses in any field where the aim is to group events over time.