assign_episode: Assign episode periods for cohorts

View source: R/calculate.R

Description

assign_episode creates an episode grouping (an integer vector) for the rows of a cohort, grouping together records whose dates fall within a threshold difference (usually in days).

Usage

assign_episode(data, grp_id, date, threshold = 1, preserve_id = FALSE)

Arguments

data

A data object (tibble or data.frame).

grp_id

Unique ID for each member of the cohort (unquoted).

date

A date column (e.g. in YYYY-mm-dd format) giving the entry point for each record (unquoted).

threshold

Integer value for the acceptable difference in days between successive records (defaults to 1).

preserve_id

Logical value; if TRUE, the output is a list that also contains the original IDs, to ensure the result merges back correctly.

Details

Data organized as a cohort typically has a long format, with multiple entries for each individual monitored over time. Successive entries that are very close in time often belong together and should be assigned to the same episode group. The logic compares the time differences between adjacent entries within each individual's records. Based on the threshold provided and the initial date entry, records are rolled up into episodes that fall within the threshold time interval. To make these comparisons, the data are sorted by id and date; consequently, the output is also in that order, so if you join the result back to the original dataset, ensure the data are sorted by the same columns. Because the logic requires looping over individuals, the function is written using data.table.
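The adjacent-entry comparison described above can be sketched in base R. This is a minimal sketch, not farrago's data.table implementation: it assumes that a new episode starts whenever the gap to an individual's previous record is larger than threshold, and that episodes are numbered from 1 within each individual. The helper episode_sketch() is hypothetical.

```r
# Hypothetical helper, NOT farrago's implementation. Assumes a new episode
# starts when the gap (in days) to the previous record of the same individual
# exceeds `threshold`, and that episode numbers restart at 1 per individual.
episode_sketch <- function(id, date, threshold = 1) {
  # Sort by id, then date, as the Details describe; output keeps this order
  ord <- order(id, date)
  id <- id[ord]
  date <- as.Date(date[ord])
  n <- length(id)
  # Gap in days to the previous row (NA for the very first row)
  gap <- c(NA, as.numeric(diff(date)))
  # TRUE where a row is the first record of a new individual
  first_of_id <- c(TRUE, id[-1] != id[-n])
  # New episode at each individual's first record or after a large gap
  starts <- first_of_id | gap > threshold
  # Running count of episode starts within each individual
  ave(as.integer(starts), id, FUN = cumsum)
}

# Same records as the test_data example
ids <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
dates <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-03", "2020-01-04",
                   "2020-01-01", "2020-09-10", "2020-09-21",
                   "2020-01-01", "2020-01-02", "2020-01-21", "2020-01-22",
                   "2020-04-22", "2021-06-09"))
episode_sketch(ids, dates, threshold = 10)
#> [1] 1 1 1 1 1 2 3 1 1 2 2 3 4
```

In this sketch a gap exactly equal to threshold stays in the same episode; check farrago's source (R/calculate.R) for the boundary rule it actually applies.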

This function is similar to collapse_timesteps; however, instead of comparing data formatted in time steps (i.e. with entry and exit dates), assign_episode uses a single date column to decide how to assign each individual's records to episode groupings. Whereas collapse_timesteps may be used to collapse similar time steps, the output from this function is more likely to be used to analyze differences between and within an individual's episode groupings. A threshold that changes through time is not directly supported, but it can be handled by subsetting the data into the date ranges over which each threshold applies (see Examples).
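Subsetting by date range, as suggested above, can be sketched as follows. The helper episode_sketch() is hypothetical (a gap-based rule, not farrago's code), and the cutoff of 2020-01-21 with thresholds of 10 and 100 days is assumed to mirror the Examples. With this approach an episode can never span the cutoff, so an episode is identified by the combination of grp_id, period, and episode.

```r
# Hypothetical gap-based helper, NOT farrago's implementation: a new episode
# starts when the gap to the previous record of the same id exceeds `threshold`.
episode_sketch <- function(id, date, threshold) {
  ord <- order(id, date)
  id <- id[ord]
  date <- date[ord]
  gap <- c(Inf, as.numeric(diff(date)))
  starts <- c(TRUE, id[-1] != id[-length(id)]) | gap > threshold
  ave(as.integer(starts), id, FUN = cumsum)
}

records <- data.frame(
  grp_id = 3,
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-21",
                   "2020-01-22", "2020-04-22", "2021-06-09"))
)

# Assumed schedule: 10-day threshold before the cutoff, 100 days from it on
cutoff <- as.Date("2020-01-21")
records$period <- ifelse(records$date < cutoff, "early", "late")
thresholds <- c(early = 10, late = 100)

# Assign episodes separately within each period using that period's threshold
pieces <- lapply(split(records, records$period), function(part) {
  part <- part[order(part$grp_id, part$date), ]
  part$episode <- episode_sketch(part$grp_id, part$date,
                                 thresholds[[part$period[1]]])
  part
})
result <- do.call(rbind, pieces)
result[, c("grp_id", "date", "period", "episode")]
```

Under a single 10-day threshold the 2020-04-22 record would start a new episode; with the 100-day threshold used from the cutoff onward it stays in the same episode as 2020-01-21 and 2020-01-22.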

Value

An integer vector (ordered by grp_id and date) or, if preserve_id = TRUE, a list containing the original ID and the collapse (episode) ID.
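Because the returned vector follows grp_id/date order rather than the original row order, re-attaching it to unsorted data needs a key; preserve_id = TRUE addresses this by returning the original IDs alongside the episode IDs. The element names of that list are not documented here, so the sketch below shows the same idea with a hypothetical row_id column and a hypothetical gap rule (new episode after a gap of more than 10 days), not farrago's own code.

```r
# Unsorted records (the row order an analyst might start with)
raw <- data.frame(
  grp_id = c(2, 1, 2, 1),
  date = as.Date(c("2020-09-21", "2020-01-04", "2020-09-10", "2020-01-01"))
)
# Hypothetical key: remember each record's original position
raw$row_id <- seq_len(nrow(raw))

# Sort by grp_id and date, matching the order of assign_episode()'s output
sorted <- raw[order(raw$grp_id, raw$date), ]

# Hypothetical gap rule (not farrago's code): new episode after a gap > 10 days
gap <- c(Inf, as.numeric(diff(sorted$date)))
first_of_id <- c(TRUE, sorted$grp_id[-1] != sorted$grp_id[-nrow(sorted)])
sorted$episode <- ave(as.integer(first_of_id | gap > 10), sorted$grp_id,
                      FUN = cumsum)

# Map episodes back to the original row order via the preserved key
raw$episode <- sorted$episode[match(raw$row_id, sorted$row_id)]
raw
```

Assigning the sorted vector directly (raw$episode <- sorted$episode) would silently misalign the rows here, which is the error the sort-before-join advice in Details guards against.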

Examples

# Load libraries
library(farrago)
library(dplyr)
library(data.table)
library(lubridate)
library(magrittr)
library(tibble)
# Create fake data for scenarios
test_data <- tribble(~grp_id, ~date,
                     1, '2020-01-01',
                     1, '2020-01-01',
                     1, '2020-01-03',
                     1, '2020-01-04',
                     2, '2020-01-01',
                     2, '2020-09-10',
                     2, '2020-09-21',
                     3, '2020-01-01',
                     3, '2020-01-02',
                     3, '2020-01-21',
                     3, '2020-01-22',
                     3, '2020-04-22',
                     3, '2021-06-09') %>%
  dplyr::mutate(dplyr::across(contains('date'), ymd)) %>%
  dplyr::arrange(grp_id, date)  # output of assign_episode() is in grp_id/date order

# Create vector of outputs (ensure original dataset is sorted)
test_data$episode_group <- assign_episode(data = test_data,
                                          grp_id = grp_id,
                                          date = date,
                                          threshold = 10)
# Find the first (min) and last (max) date of each episode
test_data %>%
  group_by(grp_id, episode_group) %>%
  mutate(min = min(date),
         max = max(date))

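# With changing thresholds: 10 days before 2020-01-21 and 100 days from then on.
# Both thresholds are applied to the full data; case_when() keeps one result per row.
test_data %>%
  mutate(epi_thresh_chg = case_when(date < ymd('2020-01-21') ~ assign_episode(., grp_id, date, threshold = 10),
                                    TRUE ~ assign_episode(., grp_id, date, threshold = 100)))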


al-obrien/farrago documentation built on April 14, 2023, 6:20 p.m.