time_episodes | R Documentation |
This function assigns episodes to events based on a pre-defined threshold of a chosen time unit.
time_episodes(
  data,
  time,
  time_by = NULL,
  window = 1,
  roll_episode = TRUE,
  switch_on_boundary = TRUE,
  fill = 0,
  .add = FALSE,
  event = NULL,
  .by = NULL
)
data |
A data frame. |
time |
Date or datetime variable to use for the episode calculation.
Supply the variable using |
time_by |
Time units used to calculate episode flags.
If
|
window |
Single number defining the episode threshold.
When |
roll_episode |
Logical.
Should episodes be calculated using a rolling or fixed window?
If |
switch_on_boundary |
When an exact amount of time
(specified in |
fill |
Value to fill first time elapsed value. Only applicable when
|
.add |
Should episodic variables be added to the data? |
event |
(Optional) List that encodes which rows are events,
and which aren't.
By default |
.by |
(Optional). A selection of columns to group by for this operation.
Columns are specified using |
time_episodes() calculates the time elapsed (rolling or fixed) between successive events, and flags these events as episodes or not based on how much time has passed.
One example of episodic analysis is disease infections over time.
Here, a positive test result represents an event and a new infection represents a new episode.
It is assumed that once a pre-determined amount of time has passed since the previous positive result, a positive result represents a new episode of infection.
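The infection scenario can be sketched in base R. This is only an illustration of the rolling idea, not the package's implementation; the dates, the 14-day threshold and all variable names are made up for the example, and the comparison uses >= to mirror the default switch_on_boundary = TRUE.

```r
# Hypothetical positive test dates for one person (illustrative data only)
test_dates <- as.Date(c("2023-01-01", "2023-01-05", "2023-01-20",
                        "2023-03-01", "2023-03-03"))
window_days <- 14  # assumed threshold for counting a new infection

# Days since the previous positive test (0 for the first test)
elapsed <- c(0, as.numeric(diff(sort(test_dates))))

# A test starts a new episode if it is the first test, or if at least
# `window_days` have passed since the previous test
new_episode <- seq_along(elapsed) == 1 | elapsed >= window_days

# The running count of episode starts is the episode each test belongs to
ep_id <- cumsum(new_episode)
ep_id
```

With these dates the first two tests fall in one episode, the third starts a second episode, and the last two form a third.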
To perform simple time-since-event analysis, where episodes are not of interest, simply use time_elapsed() instead.
To find implicit missing gaps in time, set window to 1 and switch_on_boundary to FALSE. Any event classified as an episode in this scenario is an event following a gap in time.
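The gap-finding idea can be illustrated in base R, assuming daily data. This is a sketch of the logic rather than a call to time_episodes(); the dates and variable names are hypothetical. With a window of 1 and a strict comparison (the analogue of switch_on_boundary = FALSE), an observation is flagged only when more than one day separates it from the previous one.

```r
# Hypothetical daily observations with 4 and 5 January missing
obs_dates <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-03",
                       "2023-01-06", "2023-01-07"))

# Days since the previous observation (0 for the first one)
elapsed <- c(0, as.numeric(diff(obs_dates)))

# Strictly more than 1 day elapsed means at least one day was skipped
follows_gap <- elapsed > 1

# Observations that come directly after a gap
obs_dates[follows_gap]
```

Here only 2023-01-06 is flagged. This sketch does not flag the first observation, since nothing precedes it.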
The data are always sorted before calculation and then sorted back to the input order.
Four key variables will be calculated:

ep_id - An integer variable signifying which episode each event belongs to. Non-events are assigned NA. ep_id is an increasing integer starting at 1. In the infections scenario, 1 marks positives within the first episode of infection, 2 marks positives within the second episode of infection, and so on.

ep_id_new - An integer variable signifying the first instance of each new episode. This is an increasing integer where 0 signifies within-episode observations and >= 1 signifies the first instance of the respective episode.

t_elapsed - The time elapsed since the last event. When roll_episode = FALSE, this becomes the time elapsed since the first event of the current episode. Time units are specified in the time_by argument.

ep_start - Start date/datetime of the episode.
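The difference between rolling and fixed windows can be sketched in base R. This is an illustration under stated assumptions, not the package's code: event times are plain day counts, the window is 14 days, boundaries are inclusive (>=), and every name below is hypothetical.

```r
event_days <- c(0, 5, 10, 15, 40)  # sorted event times in days (made up)
window <- 14                       # assumed episode threshold in days
n <- length(event_days)

# Rolling: compare each event with the event immediately before it
roll_elapsed <- c(0, diff(event_days))
roll_new <- seq_len(n) == 1 | roll_elapsed >= window
roll_ep_id <- cumsum(roll_new)
# 0 within an episode, the episode number at its first event
roll_ep_id_new <- ifelse(roll_new, roll_ep_id, 0L)

# Fixed: compare each event with the first event of its current episode
fixed_ep_id <- integer(n)
fixed_ep_start <- numeric(n)
ep_start <- event_days[1]
current_ep <- 1L
for (i in seq_len(n)) {
  if (event_days[i] - ep_start >= window) {
    # Threshold reached relative to the episode start: open a new episode
    current_ep <- current_ep + 1L
    ep_start <- event_days[i]
  }
  fixed_ep_id[i] <- current_ep
  fixed_ep_start[i] <- ep_start
}

roll_ep_id
fixed_ep_id
```

In the rolling version the events at days 5, 10 and 15 each follow the previous event by only 5 days, so they stay in episode 1. In the fixed version day 15 is 15 days after the episode start at day 0, so it opens episode 2.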
data.table and collapse are used for speed and efficiency.
A data.frame in the same order as it was given.
time_elapsed time_seq_id
library(timeplyr)
library(dplyr)
library(nycflights13)
library(lubridate)
library(ggplot2)
# Say we want to flag origin-destination pairs
# that haven't seen departures or arrivals for a week
events <- flights %>%
  mutate(date = as_date(time_hour)) %>%
  group_by(origin, dest) %>%
  time_episodes(date, "week", window = 1)
events
episodes <- events %>%
  filter(ep_id_new > 1)
nrow(fastplyr::f_distinct(episodes, origin, dest)) # 55 origin-destinations
# As expected, the summer months saw the fewest
# dry periods
episodes %>%
  ungroup() %>%
  time_by(ep_start, "week", .name = "ep_start") %>%
  count(ep_start = interval_start(ep_start)) %>%
  ggplot(aes(x = ep_start, y = n)) +
  geom_bar(stat = "identity")