assign_episode | R Documentation |
assign_episode
Creates an episode grouping (a vector) for rows in a cohort that are within a threshold difference (usually in days).
assign_episode(data, grp_id, date, threshold = 1, preserve_id = FALSE)
data |
A data object (tibble or data.frame). |
grp_id |
Unique ID for each member of the cohort (unquoted). |
date |
Date column (e.g. formatted as YYYY-mm-dd) giving the entry point for each record (unquoted). |
threshold |
Integer value for acceptable difference in days between successive records (defaults to 1). |
preserve_id |
Logical value, if set to TRUE the original id is returned alongside the episode id as a list (defaults to FALSE). |
Data organized as a cohort will typically be in long format, with multiple entries for an individual monitored over time. Often, successive entries are very close in time and should be assigned to the same episode group.
The logic compares the time differences between adjacent entries within each grouping. Based upon the threshold provided and the initial date entry, an individual's records are rolled up into episodes that fall within the threshold time interval.
In order to compare records within the cohort, the data provided is sorted by id and date. Consequently, the output will also be in that order; if joining back to the original data set, ensure that data is sorted by the same columns.
Since the logic requires looping by individuals, the function is written using data.table.
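The adjacent-difference logic described above can be sketched in base R. This is only an illustration, not the package's data.table implementation: the helper name sketch_episode is hypothetical, and it assumes a new episode starts when the gap to the previous entry for the same id is strictly greater than the threshold.

```r
# Hypothetical helper for illustration only; not the package's code.
sketch_episode <- function(data, id_col, date_col, threshold = 1) {
  # Sort by id and date, mirroring the ordering described above
  ord   <- order(data[[id_col]], data[[date_col]])
  ids   <- data[[id_col]][ord]
  dates <- as.Date(data[[date_col]][ord])

  # Gap in days to the previous row (the first row has no previous entry)
  gap <- c(NA, as.numeric(diff(dates)))
  # TRUE where a row is the first entry for its id
  first_of_id <- c(TRUE, ids[-1] != ids[-length(ids)])
  # Assumption: a gap strictly greater than the threshold opens a new episode
  new_episode <- first_of_id | gap > threshold

  # Running count of episode starts within each id
  as.integer(ave(as.integer(new_episode), ids, FUN = cumsum))
}

toy <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = as.Date(c("2020-01-01", "2020-01-03", "2020-02-15",
                   "2020-01-01", "2020-09-10"))
)
toy_episodes <- sketch_episode(toy, "id", "date", threshold = 10)
```

The returned vector follows the sorted (id, date) order, so it only lines up with the input rows when the input is already sorted that way.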
This function is similar to collapse_timesteps; however, instead of comparing data formatted in time steps (i.e. with entry and exit dates), assign_episode operates on data with a single date column, which it uses to determine how to assign individuals to episode groupings.
Where the former may be used to collapse similar time steps, the output from this function will likely be used to analyze differences between and within episode groupings for an individual.
A threshold that changes through time is not directly supported, but it can be handled by subsetting the data by the date ranges over which each threshold applies (see example).
An integer vector (ordered by grp_id and dates) or a list containing the original id and collapse id.
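Because the returned vector is ordered by grp_id and date, one way to attach it safely is to sort the data by the same columns first. The following is a minimal base R sketch with made-up data; the assign_episode call is left commented out because it needs the package to be loaded.

```r
# Made-up, deliberately unsorted data for illustration
unsorted <- data.frame(
  grp_id = c(2, 1, 1),
  date   = as.Date(c("2020-09-10", "2020-01-03", "2020-01-01"))
)

# Sort by id and then date so the row order matches the output order
sorted <- unsorted[order(unsorted$grp_id, unsorted$date), ]

# With the package providing assign_episode loaded, the result then
# lines up row by row with the sorted data:
# sorted$episode_group <- assign_episode(sorted, grp_id, date, threshold = 10)
```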
# Load libraries
library(dplyr); library(data.table); library(lubridate); library(magrittr); library(tibble);
# Create fake data for scenarios
test_data <- tribble(~grp_id, ~date,
                     1,       '2020-01-01',
                     1,       '2020-01-01',
                     1,       '2020-01-03',
                     1,       '2020-01-04',
                     2,       '2020-01-01',
                     2,       '2020-09-10',
                     2,       '2020-09-21',
                     3,       '2020-01-01',
                     3,       '2020-01-02',
                     3,       '2020-01-21',
                     3,       '2020-01-22',
                     3,       '2020-04-22',
                     3,       '2021-06-09') %>%
  dplyr::mutate_at(vars(contains('date')), ymd)
# Create vector of outputs (ensure original dataset is sorted)
test_data$episode_group <- assign_episode(data = test_data,
                                          grp_id = grp_id,
                                          date = date,
                                          threshold = 10)
# Assign the max/min of episodes
test_data %>%
  group_by(grp_id, episode_group) %>%
  mutate(min = min(date),
         max = max(date))
# With changing thresholds, assign episodes
test_data %>%
  mutate(epi_thresh_chg = case_when(date < ymd('2020-01-21') ~ assign_episode(., grp_id, date, threshold = 10),
                                    TRUE ~ assign_episode(., grp_id, date, threshold = 100)))