load_src: Low level functions for loading data

View source: R/data-load.R

Description

Data loading involves a cascade of S3 generic functions, each of which can be adapted to the specifics of an individual data source. At the lowest level, load_src() is called, followed by load_difftime(). Functions further up the chain are described in load_id().

Usage

load_src(x, ...)

## S3 method for class 'src_tbl'
load_src(x, rows, cols = colnames(x), ...)

## S3 method for class 'character'
load_src(x, src, ...)

load_difftime(x, ...)

## S3 method for class 'mimic_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'eicu_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'hirid_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'aumc_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'miiv_tbl'
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)

## S3 method for class 'character'
load_difftime(x, src, ...)

Arguments

x

Object for which to load data

...

Generic consistency

rows

Expression used for row subsetting (NSE)

cols

Character vector of column names

src

Passed to as_src_tbl() in order to determine the data source

id_hint

String-valued ID column selection (not necessarily honored)

time_vars

Character vector enumerating the columns to be treated as timestamps and thus returned as base::difftime() vectors

Details

A function extending the S3 generic load_src() is expected to load a subset of rows/columns from a tabular data source. While the column specification is provided as a character vector of column names, the row subsetting involves non-standard evaluation (NSE). Datasets that are included with ricu are represented by prt objects, which use rlang::eval_tidy() to evaluate NSE expressions. Furthermore, prt objects potentially represent tabular data split into partitions, and row-subsetting expressions are evaluated per partition (see the part_safe flag in prt::subset.prt()). The return value of load_src() is expected to be of type data.table.
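The NSE contract described above can be illustrated with a minimal base R sketch. The generic toy_load() and the class toy_tbl are hypothetical names invented for this illustration and are not part of ricu. As simplifying assumptions, the sketch uses base::substitute() and base::eval() rather than rlang::eval_tidy(), ignores partitioning, and returns a plain data.frame instead of the data.table a real load_src() method is expected to return.

```r
# Hypothetical generic mirroring the shape of load_src(); not part of ricu.
toy_load <- function(x, ...) UseMethod("toy_load")

# Method for a hypothetical "toy_tbl" class that wraps a data.frame.
toy_load.toy_tbl <- function(x, rows, cols = colnames(x$data), ...) {
  dat <- x$data
  if (missing(rows)) {
    # No row expression supplied: keep every row
    keep <- rep(TRUE, nrow(dat))
  } else {
    # Capture the unevaluated expression and evaluate it with the table's
    # columns in scope, falling back to the caller's environment
    keep <- eval(substitute(rows), dat, parent.frame())
  }
  # Treat NA results of the row expression as "do not keep"
  dat[keep & !is.na(keep), cols, drop = FALSE]
}

tbl <- structure(
  list(data = data.frame(itemid = c(1L, 2L, 2L), value = c(0.5, 1.5, 2.5))),
  class = "toy_tbl"
)

# Rows are given as an expression, columns as a character vector
res <- toy_load(tbl, itemid == 2L, cols = "value")
```

The column argument is ordinary data, so it can be built programmatically, whereas the row expression is captured before evaluation and therefore has to be handled with care when a method is called from inside another function.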

Timestamps are represented differently among the included data sources: while MIMIC-III and HiRID use absolute date/times, eICU provides temporal information as minutes relative to ICU admission. Other data sources, such as the ICU dataset provided by Amsterdam UMC, also use relative times, but expressed in milliseconds rather than in minutes since admission. In order to smooth out such discrepancies, the next function in the data loading hierarchy is load_difftime(). This function is expected to call load_src() in order to load a subset of rows/columns from a table stored on disk and convert all columns that represent timestamps (as specified by the argument time_vars) into base::difftime() vectors using mins as time unit.
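The three timestamp conventions mentioned above can all be brought to difftime vectors in minutes using base R alone. The following is only a sketch: the variable names and values are made up, and the choice of reference time for absolute timestamps (here a single admission time) is an assumption for illustration, not a description of how the ricu methods are implemented.

```r
# eICU-style: offsets already given in minutes relative to admission
eicu_offset <- c(0, 90, 1440)
eicu_time <- as.difftime(eicu_offset, units = "mins")

# Amsterdam UMC-style: relative times in milliseconds, rescaled to minutes
aumc_ms <- c(0, 5400000, 86400000)
aumc_time <- as.difftime(aumc_ms / 60000, units = "mins")

# MIMIC-III/HiRID-style: absolute date/times, made relative to an
# (assumed) reference time point
admit <- as.POSIXct("2100-01-01 08:00:00", tz = "UTC")
chart <- as.POSIXct(
  c("2100-01-01 08:00:00", "2100-01-01 09:30:00", "2100-01-02 08:00:00"),
  tz = "UTC"
)
abs_time <- difftime(chart, admit, units = "mins")
```

After conversion, all three vectors carry the same unit and can be compared or combined directly.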

The returned object should be of type id_tbl, with the ID vars identifying the ID system the times are relative to. If, for example, all times are relative to ICU admission, the ICU stay ID should be returned as ID column. The argument id_hint may suggest an ID type, but if this ID is not available in the raw data, load_difftime() may return data using a different ID system. In MIMIC-III, for example, data in the labevents table is available for subject_id (patient ID) or hadm_id (hospital admission ID). If data is requested for icustay_id (ICU stay ID), this request cannot be fulfilled and data is returned using the ID system with the highest cardinality (among the available ones). Utilities such as change_id() can then later be used to resolve data to icustay_id.
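The fallback rule for id_hint can be sketched as a small base R helper. The function resolve_id() and the id_order vector are hypothetical and exist only for this illustration; the sketch assumes that ID systems can be ordered from lowest to highest cardinality and does not reflect how ricu resolves IDs internally.

```r
# Hypothetical helper: honor the hint if possible, otherwise fall back to
# the available ID with the highest cardinality.
resolve_id <- function(hint, available, id_order) {
  if (hint %in% available) {
    return(hint)
  }
  # intersect() keeps the order of its first argument, so the last
  # element is the highest-cardinality ID among the available ones
  candidates <- intersect(id_order, available)
  candidates[length(candidates)]
}

# Assumed ordering from lowest to highest cardinality
id_order <- c("subject_id", "hadm_id", "icustay_id")

# IDs available in the labevents table according to the text above
labevents_ids <- c("subject_id", "hadm_id")

fallback <- resolve_id("icustay_id", labevents_ids, id_order)
honored <- resolve_id("subject_id", labevents_ids, id_order)
```

Here the request for icustay_id falls back to hadm_id, while the request for subject_id is honored as given.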

Value

A data.table object. As described in Details, load_difftime() is expected to return an object of type id_tbl.

Examples

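if (require(mimic.demo)) {
  tbl <- mimic_demo$labevents
  col <- c("charttime", "value")

  # Load the rows matching an NSE expression, keeping all columns
  load_src(tbl, itemid == 50809)

  # Identify the table by name and data source; restrict to selected columns
  colnames(
    load_src("labevents", "mimic_demo", itemid == 50809, cols = col)
  )

  # Same rows, with time variables converted to difftime vectors
  load_difftime(tbl, itemid == 50809)

  colnames(
    load_difftime(tbl, itemid == 50809, col)
  )

  # icustay_id is not available in labevents, so the hint cannot be honored
  id_vars(
    load_difftime(tbl, itemid == 50809, id_hint = "icustay_id")
  )

  # subject_id is available in labevents
  id_vars(
    load_difftime(tbl, itemid == 50809, id_hint = "subject_id")
  )
}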

ricu documentation built on Oct. 7, 2021, 9:06 a.m.