id_tbl: Tabular ICU data classes

View source: R/tbl-class.R

id_tblR Documentation

Tabular ICU data classes

Description

In order to simplify handling or tabular ICU data, ricu provides S3 classes, id_tbl, ts_tbl, and win_tbl. These classes essentially consist of a data.table object, alongside some meta data and S3 dispatch is used to enable more natural behavior for some data manipulation tasks. For example, when merging two tables, a default for the by argument can be chosen more sensibly if columns representing patient ID and timestamp information can be identified.

Usage

id_tbl(..., id_vars = 1L)

is_id_tbl(x)

as_id_tbl(x, id_vars = NULL, by_ref = FALSE)

ts_tbl(..., id_vars = 1L, index_var = NULL, interval = NULL)

is_ts_tbl(x)

as_ts_tbl(x, id_vars = NULL, index_var = NULL, interval = NULL, by_ref = FALSE)

win_tbl(..., id_vars = NULL, index_var = NULL, interval = NULL, dur_var = NULL)

is_win_tbl(x)

as_win_tbl(
  x,
  id_vars = NULL,
  index_var = NULL,
  interval = NULL,
  dur_var = NULL,
  by_ref = FALSE
)

## S3 method for class 'id_tbl'
as.data.table(x, keep.rownames = FALSE, by_ref = FALSE, ...)

## S3 method for class 'id_tbl'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

validate_tbl(x)

Arguments

...

forwarded to data.table::data.table() or generic consistency

id_vars

Column name(s) to be used as id column(s)

x

Object to query/operate on

by_ref

Logical flag indicating whether to perform the operation by reference

index_var

Column name of the index column

interval

Time series interval length specified as scalar-valued difftime object

dur_var

Column name of the duration column

keep.rownames

Default is FALSE. If TRUE, adds the input object's names as a separate column named "rn". keep.rownames = "id" names the column "id" instead.

row.names

NULL or a character vector giving the row names for the data frame. Missing values are not allowed.

optional

logical. If TRUE, setting row names and converting column names (to syntactic names: see make.names) is optional. Note that all of R's base package as.data.frame() methods use optional only for column names treatment, basically with the meaning of data.frame(*, check.names = !optional). See also the make.names argument of the matrix method.

Details

The introduced classes are designed for several often encountered data scenarios:

  • id_tbl objects can be used to represent static (with respect to relevant time scales) patient data such as patient age and such an object is simply a data.table combined with a non-zero length character vector valued attribute marking the columns tracking patient ID information (id_vars). All further columns are considered as data_vars.

  • ts_tbl objects are used for grouped time series data. A data.table object again is augmented by attributes, including a non-zero length character vector identifying patient ID columns (id_vars), a string, tracking the column holding time-stamps (index_var) and a scalar difftime object determining the time-series step size interval. Again, all further columns are treated as data_vars.

  • win_tbl: In addition to representing grouped time-series data as does a ts_tbl, win_tbl objects also encode a validity interval for each time-stamped measurement (as dur_var). This can for example be useful when a drug is administered at a certain infusion rate for a given time period.

Owing to the nested structure of required meta data, ts_tbl inherits from id_tbl and win_tbl from ts_tbl. Furthermore, both classes inherit from data.table. As such, data.table reference semantics are available for some operations, indicated by presence of a by_ref argument. At default, value, by_ref is set to FALSE as this is in line with base R behavior at the cost of potentially incurring unnecessary data copies. Some care has to be taken when passing by_ref = TRUE and enabling by reference operations as this can have side effects (see examples).

For instantiating ts_tbl objects, both index_var and interval can be automatically determined if not specified. For the index column, the only requirement is that a single difftime column is present, while for the time step, the minimal difference between two consecutive observations is chosen (and all differences are therefore required to be multiples of the minimum difference). Similarly, for a win_tbl, exactly two difftime columns are required where the first is assumed to be corresponding to the index_var and the second to the dur_var.

Upon instantiation, the data might be rearranged: columns are reordered such that ID columns are moved to the front, followed by the index column and a data.table::key() is set on meta columns, causing rows to be sorted accordingly. Moving meta columns to the front is done for reasons of convenience for printing, while setting a key on meta columns is done to improve efficiency of subsequent transformations such as merging or grouped operations. Furthermore, NA values in either ID or index columns are not allowed and therefore corresponding rows are silently removed.

Coercion between id_tbl and ts_tbl (and win_tbl) by default keeps intersecting attributes fixed and new attributes are by default inferred as for class instantiation. Each class comes with a class-specific implementation of the S3 generic function validate_tbl() which returns TRUE if the object is considered valid or a string outlining the type of validation failure that was encountered. Validity requires

  1. inheriting from data.table and unique column names

  2. for id_tbl that all columns specified by the non-zero length character vector holding onto the id_vars specification are available

  3. for ts_tbl that the string-valued index_var column is available and does not intersect with id_vars and that the index column obeys the specified interval.

  4. for win_tbl that the string-valued dur_var corresponds to a difftime vector and is not among the columns marked as index or ID variables

Finally, inheritance can be checked by calling is_id_tbl() and is_ts_tbl(). Note that due to ts_tbl inheriting from id_tbl, is_id_tbl() returns TRUE for both id_tbl and ts_tbl objects (and similarly for win_tbl), while is_ts_tbl() only returns TRUE for ts_tbl objects.

Value

Constructors id_tbl()/ts_tbl()/win_tbl(), as well as coercion functions as_id_tbl()/as_ts_tbl()/as_win_tbl() return id_tbl/ts_tbl/win_tbl objects respectively, while inheritance testers is_id_tbl()/is_ts_tbl()/is_win_tbl() return logical flags and validate_tbl() returns either TRUE or a string describing the validation failure.

Relationship to data.table

Both id_tbl and ts_tbl inherit from data.table and as such, functions intended for use with data.table objects can be applied to id_tbl and ts_tbl as well. But there are some caveats: Many functions introduced by data.table are not S3 generic and therefore they would have to be masked in order to retain control over how they operate on objects inheriting form data.table. Take for example the function data.table::setnames(), which changes column names by reference. Using this function, the name of an index column of an id_tbl object can me changed without updating the attribute marking the column as such and thusly leaving the object in an inconsistent state. Instead of masking the function setnames(), an alternative is provided as rename_cols(). In places where it is possible to seamlessly insert the appropriate function (such as base::names<-() or base::colnames<-()) and the responsibility for not using data.table::setnames() in a way that breaks the id_tbl object is left to the user.

Owing to data.table heritage, one of the functions that is often called on id_tbl and ts_tbl objects is base S3 generic [base::[()]. As this function is capable of modifying the object in a way that makes it incompatible with attached meta data, an attempt is made at preserving as much as possible and if all fails, a data.table object is returned instead of an object inheriting form id_tbl. If for example the index column is removed (or modified in a way that makes it incompatible with the interval specification) from a ts_tbl, an id_tbl is returned. If however the ID column is removed the only sensible thing to return is a data.table (see examples).

Examples

tbl <- id_tbl(a = 1:10, b = rnorm(10))
is_id_tbl(tbl)
is_ts_tbl(tbl)

dat <- data.frame(a = 1:10, b = hours(1:10), c = rnorm(10))
tbl <- as_ts_tbl(dat, "a")
is_id_tbl(tbl)
is_ts_tbl(tbl)

tmp <- as_id_tbl(tbl)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

tmp <- as_id_tbl(tbl, by_ref = TRUE)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

tbl <- id_tbl(a = 1:10, b = rnorm(10))
names(tbl) <- c("c", "b")
tbl

tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(data.table::setnames(tbl, c("c", "b")))

tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(rename_cols(tbl, c("c", "b")))

tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
tbl[, c("a", "c"), with = FALSE]
tbl[, c("b", "c"), with = FALSE]
tbl[, list(a, b = as.double(b), c)]


ricu documentation built on Sept. 8, 2023, 5:45 p.m.