collapse_episode: Group records no more than n days apart as episodes
In healthdb: Working with Healthcare Databases

collapse_episode

R Documentation

Description

This function collapses records, e.g., medication dispensations or hospitalizations, into episodes when the dates of consecutive records are no more than n days apart. The allowed gap can be relaxed for records that share the same value of another grouping variable.

Usage

collapse_episode(
  data,
  clnt_id,
  start_dt,
  end_dt = NULL,
  gap,
  overwrite = NULL,
  gap_overwrite = 99999,
  .dt_trans = data.table::as.IDate,
  ...
)

Arguments

`data`	A data.frame or remote table that contains the id and date variables.
`clnt_id`	Column name of subject/person ID.
`start_dt`	Column name of the starting date of records.
`end_dt`	Column name of the end date of records. The default is NULL, which assumes each record lasts one day, so only the start date is used to calculate the gaps between records.
`gap`	A number of days used to separate episodes. For example, gap = 7 means collapsing records that are no more than 7 days apart. Note that the number of days apart is calculated as the numeric difference between two dates, so 2020-01-07 and 2020-01-01 are considered 6 days apart.
`overwrite`	Column name of a grouping variable that determines whether consecutive records are related and should therefore use a different gap value. For example, dispensing records may share the same original prescription number, and a different gap value can be assigned in that situation: the number of days between two records is > gap, but the records still belong to the same prescription.
`gap_overwrite`	A different gap value used for related records. The default is 99999, which in practice means that all records with the same value of the overwrite variable will be collapsed.
`.dt_trans`	Function to transform start_dt/end_dt. For data.frame input only. Default is `data.table::as.IDate()`.
`...`	Additional arguments passed to the .dt_trans function. For data.frame input only.
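
The day-counting rule described for `gap` can be checked with base R date arithmetic. This is a small illustration only; the variable names are made up for the example and it does not call the package.

```r
# Two record dates taken from the example in the `gap` description
d1 <- as.Date("2020-01-01")
d2 <- as.Date("2020-01-07")

# Numeric difference between the two dates, in days
days_apart <- as.numeric(difftime(d2, d1, units = "days"))
days_apart        # 6

# With gap = 7 the two records would fall within the gap;
# with gap = 5 they would not
days_apart <= 7   # TRUE
days_apart <= 5   # FALSE
```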

Value

The original data.frame or remote table with new columns indicating episode grouping. The new variables include:

epi_id: unique identifier of episodes across the whole data set
epi_no: identifier of episodes within a client/group
epi_seq: identifier of records within an episode
epi_start/stop_dt: start and end dates corresponding to epi_id
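
To make the meaning of the three identifiers concrete, the following base R sketch derives them for a toy data set. It assumes one-day records (start dates only) and is not the package's implementation; the data and helper variables (`rec`, `days_since_prev`, `new_epi`) are hypothetical.

```r
# Toy records: two clients, start dates only (each record assumed to last one day)
rec <- data.frame(
  clnt_id  = c(1, 1, 1, 2, 2),
  start_dt = as.Date(c("2020-01-01", "2020-01-07", "2020-03-01",
                       "2020-01-01", "2020-01-05"))
)
gap <- 7

# Sort so that neighbouring rows are consecutive records of the same client
rec <- rec[order(rec$clnt_id, rec$start_dt), ]

# Days since the previous record of the same client (NA for a client's first record)
days_since_prev <- ave(as.numeric(rec$start_dt), rec$clnt_id,
                       FUN = function(x) c(NA, diff(x)))

# A record opens a new episode if it is the client's first record
# or if it is more than `gap` days after the previous one
new_epi <- is.na(days_since_prev) | days_since_prev > gap

rec$epi_id  <- cumsum(new_epi)                                      # unique across the data
rec$epi_no  <- ave(as.integer(new_epi), rec$clnt_id, FUN = cumsum)  # counter within client
rec$epi_seq <- ave(rec$epi_id, rec$epi_id, FUN = seq_along)         # counter within episode
rec
```

In this toy data, client 1's first two records (6 days apart) share an episode, while the third record starts a new one. Records with end dates or a relaxed gap would need additional logic that this sketch leaves out.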

Examples

library(healthdb)
library(dplyr) # provides the %>% pipe

# make toy data
df <- make_test_dat() %>%
  dplyr::select(clnt_id, dates)

head(df)

# collapse records no more than 90 days apart
# end_dt can be omitted, in which case it is assumed to be the same as start_dt
collapse_episode(df, clnt_id, start_dt = dates, gap = 90)
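
The `overwrite` and `gap_overwrite` arguments are not exercised above. The sketch below shows how a call might look, using only the arguments listed under Usage. The data frame `rx` and its prescription-number column `rx_no` are hypothetical and made up for illustration.

```r
library(healthdb)

# Hypothetical dispensing records; rx_no stands in for a prescription number
rx <- data.frame(
  clnt_id = c(1, 1, 1, 1),
  dates   = as.Date(c("2020-01-01", "2020-01-20", "2020-06-01", "2020-06-10")),
  rx_no   = c("A", "A", "B", "C")
)

# Use a 7-day gap in general, but allow up to 30 days between
# consecutive records that share the same rx_no
res <- collapse_episode(
  rx,
  clnt_id,
  start_dt = dates,
  gap = 7,
  overwrite = rx_no,
  gap_overwrite = 30
)
res
```

Going by the argument descriptions, the first two records are 19 days apart, which exceeds `gap` but not `gap_overwrite`, so they would be expected to share an episode because they have the same `rx_no`. The last two records have different `rx_no` values and are 9 days apart, so the ordinary 7-day gap would apply to them.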

healthdb documentation built on April 11, 2025, 5:43 p.m.