collapse_episode: Group records no more than n days apart as episodes

View source: R/collapse_episode.R

collapse_episode    R Documentation

Description

This function collapses records, e.g., medication dispensations or hospitalizations, into episodes when the dates of consecutive records are no more than n days apart. The allowed gap can be relaxed for records that are related through another grouping variable.

Usage

collapse_episode(
  data,
  clnt_id,
  start_dt,
  end_dt = NULL,
  gap,
  overwrite = NULL,
  gap_overwrite = 99999,
  .dt_trans = data.table::as.IDate,
  ...
)

Arguments

data

A data.frame or remote table that contains the id and date variables.

clnt_id

Column name of subject/person ID.

start_dt

Column name of the starting date of records.

end_dt

Column name of the end date of records. The default is NULL, which assumes each record lasts one day, so only the start date is used to calculate the gaps between records.

gap

The maximum number of days between records in the same episode. For example, gap = 7 collapses records that are no more than 7 days apart. The number of days apart is calculated as the numeric difference between two dates, so 2020-01-07 and 2020-01-01 are considered 6 days apart.
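
The day-difference rule can be checked with base R date arithmetic. The snippet below only illustrates the calculation described above and is not the package's internal code; `days_apart` is a name chosen for this sketch.

```r
# Difference between two dates as a plain number of days
days_apart <- as.numeric(as.Date("2020-01-07") - as.Date("2020-01-01"))
days_apart  # 6

# With gap = 7, records on these two dates would be collapsed together
days_apart <= 7  # TRUE
```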

overwrite

Column name of a grouping variable that determines whether consecutive records are related and should therefore use a different gap value. For example, dispensing records may share the same original prescription number. A different gap value can be assigned in that situation, so that records more than gap days apart are still collapsed when they belong to the same prescription.

gap_overwrite

The gap value used for related records. The default is 99999, which in practice means that all records with the same value of the overwrite variable will be collapsed.
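
One way to picture how overwrite and gap_overwrite interact is the base R sketch below. It is an assumption about the logic described above, not the package's implementation; the toy data and the names `rx`, `rx_id`, `same_rx`, `effective_gap` and `continues` are hypothetical.

```r
# Toy dispensing records for one client, sorted by date (hypothetical data)
rx <- data.frame(
  dates = as.Date(c("2020-01-01", "2020-01-05", "2020-03-01", "2020-06-01")),
  rx_id = c("A", "A", "A", "B")  # hypothetical prescription number
)

gap <- 7
gap_overwrite <- 99999

# Days since the previous record (NA for the first record)
days_since_prev <- c(NA, as.numeric(diff(rx$dates)))

# Does the record share the grouping value of the previous record?
same_rx <- c(NA, rx$rx_id[-1] == rx$rx_id[-nrow(rx)])

# Related records are compared against gap_overwrite, others against gap
effective_gap <- ifelse(same_rx, gap_overwrite, gap)

# TRUE where a record continues the previous episode
continues <- days_since_prev <= effective_gap
continues
```

In this sketch the third record is 56 days after the second, which exceeds gap, but it continues the episode because both records share the same rx_id. The fourth record has a different rx_id and is more than 7 days later, so it would start a new episode.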

.dt_trans

Function to transform start_dt/end_dt. For data.frame input only. Default is data.table::as.IDate().

...

Additional arguments passed to the .dt_trans function. For data.frame input only.
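
The sketch below illustrates what a transformation function plus an extra argument might do to a date column stored as text. It uses base R's as.Date() in place of the default; the input values and the names `raw_dates`, `dt_trans` and `parsed` are hypothetical, and how the package applies .dt_trans internally is not shown here.

```r
# Dates stored as day/month/year text (hypothetical input)
raw_dates <- c("01/02/2020", "15/02/2020")

# A transformation function and an extra argument, of the kind that could
# be supplied through .dt_trans and ...
dt_trans <- as.Date
parsed <- dt_trans(raw_dates, format = "%d/%m/%Y")
parsed
```

Under these assumptions, the corresponding call would look something like collapse_episode(df, clnt_id, start_dt = dates, gap = 90, .dt_trans = as.Date, format = "%d/%m/%Y").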

Value

The original data.frame or remote table with new columns indicating episode grouping. The new variables include:

  • epi_id: unique identifier of episodes across the whole data set

  • epi_no: identifier of episodes within a client/group

  • epi_seq: identifier of records within an episode

  • epi_start_dt, epi_stop_dt: start and end dates corresponding to epi_id
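
To make the meaning of these columns concrete, the base R sketch below derives similar values for a small toy data set with a gap of 7 days and no end date. It is an illustration under stated assumptions, not the package's implementation: the data, the name `toy`, and the use of a simple running counter for epi_id are all hypothetical.

```r
# Toy records for two clients, sorted by client and date (hypothetical data)
toy <- data.frame(
  clnt_id = c(1, 1, 1, 2, 2),
  dates = as.Date(c("2020-01-01", "2020-01-03", "2020-03-01",
                    "2020-01-10", "2020-01-12"))
)
gap <- 7

# A record starts a new episode if it is the client's first record
# or falls more than `gap` days after the previous record
first_rec <- !duplicated(toy$clnt_id)
days_since_prev <- c(NA, as.numeric(diff(toy$dates)))
new_epi <- first_rec | days_since_prev > gap

# epi_id: episode counter across the whole data set
toy$epi_id <- cumsum(new_epi)
# epi_no: episode counter restarted within each client
toy$epi_no <- ave(as.integer(new_epi), toy$clnt_id, FUN = cumsum)
# epi_seq: record counter within each episode
toy$epi_seq <- ave(toy$epi_id, toy$epi_id, FUN = seq_along)

# Data are sorted, so the first/last record of an episode gives its
# start/stop date
toy$epi_start_dt <- toy$dates[match(toy$epi_id, toy$epi_id)]
toy$epi_stop_dt <- rev(toy$dates)[match(toy$epi_id, rev(toy$epi_id))]

toy
```

Client 1 ends up with two episodes (the record on 2020-03-01 is 58 days after the previous one), while both records of client 2 fall into a single episode.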

Examples

# make toy data
library(healthdb)
df <- make_test_dat() %>%
  dplyr::select(clnt_id, dates)

head(df)

# collapse records no more than 90 days apart
# end_dt may be omitted, in which case it is assumed to be the same as start_dt
collapse_episode(df, clnt_id, start_dt = dates, gap = 90)

healthdb documentation built on April 11, 2025, 5:43 p.m.