collapse_timesteps: Collapse timesteps for cohorts
In al-obrien/farrago: A random collection of helpful code baubles

collapse_timesteps

R Documentation

Collapse timesteps for cohorts

Description

collapse_timesteps creates a vector that flags rows in a cohort whose successive records fall within a threshold difference of one another (usually measured in days).

Usage

collapse_timesteps(
  data,
  grp_id,
  date_start,
  date_end,
  threshold = 1,
  preserve_id = FALSE
)

Arguments

`data`	A data object (tibble or data.frame).
`grp_id`	Unique ID for each member of the cohort (unquoted).
`date_start`	Column of dates (e.g. YYYY-mm-dd) marking the entry point of each record (unquoted).
`date_end`	Column of dates (e.g. YYYY-mm-dd) marking the exit point of each record (unquoted).
`threshold`	Integer value for the acceptable difference in days between successive records (defaults to `1`).
`preserve_id`	Logical value; if `TRUE`, the output is a list that also contains the original ID, so that the column merges back correctly.

Details

Data organized as a cohort are typically in long format, with repeated records for an individual, each covering a particular date span. Often, successive records are very close in time and can be collapsed into a single record to simplify the cohort. The logic compares the date_end of the previous record with the date_start of the subsequent record. If the difference between those two dates is more than the specified threshold, the records are flagged as different groups in the progression of cohort steps; otherwise the two records are flagged as the same group, to be collapsed.

To make this comparison, the data provided are sorted by id and dates. Consequently, the output is also in that order; if joining back to the original data set, ensure it is sorted by the same columns. Since the logic requires looping by individual, the function is written using data.table; however, other versions using dplyr and Rcpp were trialed as well.
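
The comparison described above can be sketched in base R. This is only an illustration of the rule, not the package's data.table implementation: the helper name flag_timesteps is hypothetical, and restarting the group counter at 1 for each individual is an assumption, since the documentation does not say how the flags are numbered.

```r
# Hypothetical helper, not part of farrago: a base R sketch of the flagging rule.
flag_timesteps <- function(grp_id, date_start, date_end, threshold = 1) {
  # Sort by individual, then by dates, as the Details section describes
  ord <- order(grp_id, date_start, date_end)
  grp_id <- grp_id[ord]
  date_start <- date_start[ord]
  date_end <- date_end[ord]

  n <- length(grp_id)
  # Gap in days between each record's start and the previous record's end
  gap <- c(NA, as.numeric(difftime(date_start[-1], date_end[-n], units = "days")))
  # A record opens a new group when it is the first record for its individual
  # or when the gap from the previous record exceeds the threshold
  first_record <- c(TRUE, grp_id[-1] != grp_id[-n])
  new_group <- first_record | gap > threshold
  # Running count of group openings, restarted per individual (assumption)
  ave(as.integer(new_group), grp_id, FUN = cumsum)
}

# Individuals 1 and 2 from the Examples section below
ids    <- c(1, 1, 1, 2, 2, 2)
starts <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-04",
                    "2020-01-01", "2020-09-10", "2020-09-21"))
ends   <- as.Date(c("2020-01-02", "2020-01-04", "2020-09-02",
                    "2020-09-02", "2020-09-20", "2020-09-22"))

flags <- flag_timesteps(ids, starts, ends, threshold = 1)
print(flags)
```

Under these assumptions, all three records for individual 1 share one group (the gaps are 1 and 0 days), while individual 2 splits after the first record because the 8-day gap exceeds the threshold. The flags come back in sorted order, so the same caveat about joining back to unsorted data applies.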

Value

An integer vector (ordered by grp_id and dates) or, when preserve_id is TRUE, a list containing the original id and the collapse id.

Examples

# Load libraries
library(dplyr); library(data.table); library(lubridate); library(magrittr); library(tibble)
# Create fake data for scenarios
test_data <- tribble(~grp_id, ~date_start, ~date_end,
                     1, '2020-01-01', '2020-01-02',
                     1, '2020-01-03', '2020-01-04',
                     1, '2020-01-04', '2020-09-02',
                     2, '2020-01-01', '2020-09-02',
                     2, '2020-09-10', '2020-09-20',
                     2, '2020-09-21', '2020-09-22',
                     3, '2020-01-01', '2020-01-02',
                     3, '2020-01-02', '2020-01-20',
                     3, '2020-01-21', '2020-01-22',
                     3, '2020-01-22', '2020-04-02',
                     3, '2020-04-22', '2021-04-22',
                     3, '2021-06-09', '2021-06-22') %>%
  dplyr::mutate_at(vars(contains('date')), ymd)

# Create vector of outputs (ensure original dataset is sorted)
test_data$timestep_group <- collapse_timesteps(data = test_data,
                                             grp_id = grp_id,
                                             date_start = date_start,
                                             date_end = date_end,
                                             threshold = 1)

# Add the overall start and end date of each collapsed group to every row
test_data %>%
  group_by(grp_id, timestep_group) %>%
  mutate(min = min(date_start),
         max = max(date_end))

al-obrien/farrago documentation built on April 14, 2023, 6:20 p.m.