change_id: Switch between id types

View source: R/data-utils.R

change_id    R Documentation

Switch between id types


ICU datasets such as MIMIC-III or eICU typically represent patients by multiple ID systems such as patient IDs, hospital stay IDs and ICU admission IDs. Even if the raw data is available in only one such ID system, given a mapping of IDs alongside start and end times, it is possible to convert data from one ID system to another. The function change_id() provides such a conversion utility, internally calling either upgrade_id() when moving to an ID system of higher cardinality or downgrade_id() when the target ID system is of lower cardinality.


change_id(x, target_id, src, ..., keep_old_id = TRUE, id_type = FALSE)

upgrade_id(x, target_id, src, cols = time_vars(x), ...)

downgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'ts_tbl'
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'id_tbl'
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'ts_tbl'
downgrade_id(x, target_id, src, cols = time_vars(x), ...)

## S3 method for class 'id_tbl'
downgrade_id(x, target_id, src, cols = time_vars(x), ...)



x             icu_tbl object for which to make the id change

target_id     The destination id name

src           Passed to as_id_cfg() and as_src_env()

...           Passed to upgrade_id()/downgrade_id()

keep_old_id   Logical flag indicating whether to keep the previous ID column

id_type       Logical flag indicating whether target_id is specified as ID name (e.g. icustay_id on MIMIC) or ID type (e.g. icustay)

cols          Column names that require time-adjustment


In order to provide ID system conversion for a data source, the (internal) function id_map() must be able to construct an ID mapping for that data source. Constructing such a mapping can be expensive and the same mapping is often re-used, which is why id_map() provides caching infrastructure. The mapping itself is constructed by the (internal) function id_map_helper(), which is expected to provide source and destination ID columns as well as start and end columns corresponding to the destination ID, relative to the source ID system. In the following example, we request the mapping for mimic_demo, with ICU stay IDs as source and hospital admission IDs as destination.

id_map_helper(mimic_demo, "icustay_id", "hadm_id")
## # An `id_tbl`: 136 x 4
## # Id var:      `icustay_id`
##     icustay_id hadm_id hadm_id_start hadm_id_end
##          <int>   <int> <drtn>        <drtn>
##   1     201006  198503  -3290 mins    9114 mins
##   2     201204  114648     -2 mins    6949 mins
##   3     203766  126949  -1336 mins    8818 mins
##   4     204132  157609     -1 mins   10103 mins
##   5     204201  177678   -368 mins    9445 mins
## ...
## 132     295043  170883 -10413 mins   31258 mins
## 133     295741  176805     -1 mins    3153 mins
## 134     296804  110244  -1294 mins    4599 mins
## 135     297782  167612     -1 mins     207 mins
## 136     298685  151323     -1 mins   19082 mins
## # ... with 126 more rows

Both start and end columns encode the hospital admission windows relative to each corresponding ICU stay start time. It therefore comes as no surprise that most start times are negative (hospital admission typically occurs before ICU stay start time), while end times are often days in the future (as hospital discharge typically occurs several days after ICU admission).
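To make the role of these offsets concrete, the following base R sketch moves a few measurements from ICU stay time to hospital admission time by hand. It is only an illustration of the arithmetic and not the code ricu runs internally: the first two mapping rows are copied from the output above, while the obs table, its column names and its values are hypothetical.

```r
# Two rows of the mapping shown above; offsets are relative to ICU stay start
id_map <- data.frame(
  icustay_id    = c(201006L, 201204L),
  hadm_id       = c(198503L, 114648L),
  hadm_id_start = as.difftime(c(-3290, -2), units = "mins"),
  hadm_id_end   = as.difftime(c(9114, 6949), units = "mins")
)

# Hypothetical measurements, timed relative to ICU stay start
obs <- data.frame(
  icustay_id = c(201006L, 201006L, 201204L),
  charttime  = as.difftime(c(0, 60, 30), units = "mins"),
  value      = c(1.2, 1.4, 0.9)
)

# Attach the mapping to each measurement via the source ID
res <- merge(obs, id_map, by = "icustay_id")

# Hospital admission lies hadm_id_start before ICU stay start, so subtracting
# this (mostly negative) offset re-expresses times relative to admission
res$charttime <- res$charttime - res$hadm_id_start

# Keep the destination ID and drop the helper columns
res <- res[, c("hadm_id", "charttime", "value")]
res
```

A measurement taken at ICU stay start (0 mins) for stay 201006 thus ends up 3290 mins after the corresponding hospital admission.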

In order to use the ID conversion infrastructure offered by ricu for a new dataset, it typically suffices to provide an id_cfg entry in the source configuration (see load_src_cfg()), outlining the available ID systems alongside an ordering. A class-specific implementation of id_map_helper() for the given source class may be needed as well, specifying the corresponding time windows in 1-minute resolution (for every possible pair of IDs).

While both up- and downgrades for id_tbl objects, as well as downgrades for ts_tbl objects, are simple merge operations based on the ID mapping provided by id_map(), ID upgrades for ts_tbl objects are slightly more involved. As an example, consider the following setting: we have data associated with hadm_id IDs and times relative to hospital admission:

               1      2       3        4       5       6        7      8
data        ---*------*-------*--------*-------*-------*--------*------*---
               3h    10h     18h      27h     35h     43h      52h    59h

            0h     7h                26h        37h             53h      62h
hadm_id     |-------------------------------------------------------------|
icustay_id         |------------------|          |---------------|
                   0h                19h         0h             16h
                           ICU_1                       ICU_2

The mapping of data points from hadm_id to icustay_id is created as follows: ICU stay end times mark boundaries and all data that is recorded after the last ICU stay ended is assigned to the last ICU stay. Therefore data points 1-3 are assigned to ICU_1, while 4-8 are assigned to ICU_2. Times have to be shifted as well, as timestamps are expected to be relative to the current ID system. Data points 1-3 are therefore assigned the timestamps -4h, 3h and 11h, while data points 4-8 are assigned -10h, -2h, 6h, 15h and 22h. Implementation-wise, the mapping is computed using an efficient data.table rolling join.
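The assignment described above can be reproduced with a small data.table rolling join. This is a minimal sketch of the technique under simplifying assumptions, not the code used inside ricu: times are plain numeric hours instead of difftime values, and the tables and the column names point and join_time are hypothetical.

```r
library(data.table)

# The eight data points from the diagram, in hours since hospital admission
dat <- data.table(
  hadm_id = 1L,
  point   = 1:8,
  time    = c(3, 10, 18, 27, 35, 43, 52, 59)
)

# The two ICU stays, also in hours since hospital admission
icu <- data.table(
  hadm_id    = 1L,
  icustay_id = c("ICU_1", "ICU_2"),
  start      = c(7, 37),
  end        = c(26, 53)
)

# Join on copies of the time columns so that the originals survive the join
icu[, join_time := end]
dat[, join_time := time]

# roll = -Inf matches each data point to the next ICU stay end time;
# rollends = TRUE also assigns points after the last end to the last stay
res <- icu[dat, on = c("hadm_id", "join_time"), roll = -Inf, rollends = TRUE]

# Shift times so that they are relative to the assigned ICU stay start
res[, time := time - start]
res <- res[, .(icustay_id, point, time)]
res
```

With the numbers from the diagram, points 1-3 land in ICU_1 at -4, 3 and 11 hours and points 4-8 in ICU_2 at -10, -2, 6, 15 and 22 hours, matching the values given above.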


change_id(), upgrade_id() and downgrade_id() each return an object of the same type as x with modified IDs.


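library(ricu)

if (require(mimic.demo)) {
  # Load lab measurements for itemid 50809
  tbl <- mimic_demo$labevents
  dat <- load_difftime(tbl, itemid == 50809, c("charttime", "valuenum"))

  # Switch to ICU stay IDs, dropping the previous ID column
  change_id(dat, "icustay_id", tbl, keep_old_id = FALSE)
}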

ricu documentation built on July 12, 2022, 5:06 p.m.