tbl_utils: ICU class data utilities

Description

Several utility functions for working with id_tbl and ts_tbl objects are available, including functions for changing column names, removing columns, and aggregating or removing rows. Note that because id_tbl (and consequently ts_tbl) inherits from data.table, several functions provided by the data.table package can modify an id_tbl in a way that leaves the object in an inconsistent state. An example of this is data.table::setnames(): if an ID column or the index column is renamed without updating the attribute that marks the column as such, the result is an invalid object. As data.table::setnames() is not an S3 generic function, the only way to control its behavior for id_tbl objects is to mask the function. Since that approach has its own downsides, a separate function, rename_cols(), is provided, which handles column renaming correctly.
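
As a minimal sketch of the renaming behavior described above (the table and its column names are made up for illustration, and the ricu package is assumed to be installed), rename_cols() can be used to rename an ID column while the ID attribute follows along:

```r
library(ricu)

# Toy table with a single ID column "a" (hypothetical data)
tbl <- id_tbl(a = 1:3, b = c(0.1, 0.2, 0.3), id_vars = "a")

# Rename the ID column; with by_ref = FALSE (the default) a modified
# copy is returned and `tbl` keeps its original column names
res <- rename_cols(tbl, "id", "a")

id_vars(res)    # the ID attribute refers to the new column name
colnames(res)
id_vars(tbl)    # the original object still uses "a"
```

Passing by_ref = TRUE would instead modify the object in place, which avoids a copy for large tables.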

Usage

rename_cols(
  x,
  new,
  old = colnames(x),
  skip_absent = FALSE,
  by_ref = FALSE,
  ...
)

rm_cols(x, cols, skip_absent = FALSE, by_ref = FALSE)

change_interval(x, new_interval, cols = time_vars(x), by_ref = FALSE)

change_dur_unit(x, new_unit, by_ref = FALSE)

rm_na(x, cols = data_vars(x), mode = c("all", "any"))

## S3 method for class 'id_tbl'
sort(
  x,
  decreasing = FALSE,
  by = meta_vars(x),
  reorder_cols = TRUE,
  by_ref = FALSE,
  ...
)

is_sorted(x)

## S3 method for class 'id_tbl'
duplicated(x, incomparables = FALSE, by = meta_vars(x), ...)

## S3 method for class 'id_tbl'
anyDuplicated(x, incomparables = FALSE, by = meta_vars(x), ...)

## S3 method for class 'id_tbl'
unique(x, incomparables = FALSE, by = meta_vars(x), ...)

is_unique(x, ...)

## S3 method for class 'id_tbl'
aggregate(
  x,
  expr = NULL,
  by = meta_vars(x),
  vars = data_vars(x),
  env = NULL,
  ...
)

dt_gforce(
  x,
  fun = c("mean", "median", "min", "max", "sum", "prod", "var", "sd", "first", "last",
    "any", "all"),
  by = meta_vars(x),
  vars = data_vars(x),
  na_rm = !fun %in% c("first", "last")
)

replace_na(x, val, type = "const", ...)

Arguments

x

Object to query

new, old

Replacement names and existing column names for renaming columns

skip_absent

Logical flag for ignoring non-existent column names

by_ref

Logical flag indicating whether to perform the operation by reference

...

Ignored

cols

Column names of columns to consider

new_interval

Replacement interval length specified as scalar-valued difftime object

new_unit

New difftime unit for the dur_var column

mode

Switch between "all", where all entries of a row have to be missing (for the selected columns), and "any", where a single missing entry suffices

decreasing

Logical flag indicating the sort order

by

Character vector indicating which combinations of columns from x to use for uniqueness checks

reorder_cols

Logical flag indicating whether to move the by columns to the front.

incomparables

Not used. Here for S3 method consistency

expr

Expression to apply over groups

vars

Column names to apply the function to

env

Environment to look up names in expr

fun

Function name (as string) to apply over groups

na_rm

Logical flag indicating how to treat NA values

val

Replacement value (if type is "const")

type

character, one of "const", "locf" or "nocb". Defaults to "const".
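
To illustrate the mode argument of rm_na() described above, the following sketch uses a small made-up table (column names and values are hypothetical) and compares the two settings:

```r
library(ricu)

# Row 2 has one missing data value, row 3 has all data values missing
tbl <- id_tbl(a = 1:3, b = c(1, NA, NA), c = c(1, 2, NA), id_vars = "a")

# mode = "all": a row is dropped only if every selected column is NA
all_rm <- rm_na(tbl, mode = "all")

# mode = "any": a row is dropped as soon as one selected column is NA
any_rm <- rm_na(tbl, mode = "any")

nrow(all_rm)
nrow(any_rm)
```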

Details

Apart from a function for renaming columns while respecting the attributes that mark columns as index or ID columns, several other utility functions are provided to make handling of id_tbl and ts_tbl objects more convenient.

Sorting

An id_tbl or ts_tbl object is considered sorted when rows are in ascending order according to the columns specified by meta_vars(). This means that for an id_tbl object rows have to be ordered by id_vars() and for a ts_tbl object rows have to be ordered first by id_vars(), followed by the index_var(). Calling the S3 generic function base::sort() on an object that inherits from id_tbl using default arguments yields an object that is considered sorted. For convenience (mostly in printing), the columns by which the table was sorted are moved to the front (this can be disabled by passing FALSE as the reorder_cols argument). Internally, sorting is handled either by setting a data.table::key() in case decreasing = FALSE or by calling data.table::setorder() in case decreasing = TRUE.
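
The sorting behavior described above can be sketched as follows, using a small hypothetical table in which the ID column is deliberately not the first column:

```r
library(ricu)

# Unordered IDs; the ID column "a" is the second column (made-up data)
tbl <- id_tbl(b = c("x", "y", "z"), a = c(3L, 1L, 2L), id_vars = "a")

# Default arguments: ascending order by meta_vars(), ID column moved
# to the front
srt <- sort(tbl)
is_sorted(srt)
colnames(srt)

# Descending order
dsc <- sort(tbl, decreasing = TRUE)
dsc[["a"]]
```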

Uniqueness

An object inheriting from id_tbl is considered unique if it is unique in terms of the columns specified by meta_vars(). This means that for an id_tbl object, either zero or a single row is allowed per combination of values in columns id_vars() and consequently for ts_tbl objects a maximum of one row is allowed per combination of time step and ID. To create a unique id_tbl object from a non-unique one, aggregate() can be used to combine observations that represent repeated measurements within a group.
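
A short sketch of the uniqueness checks, assuming a made-up table with one repeated ID value:

```r
library(ricu)

# ID value 1 occurs twice, so the table is not unique (hypothetical data)
tbl <- id_tbl(a = c(1L, 1L, 2L), b = c(0.5, 1.5, 2.5), id_vars = "a")

is_unique(tbl)

# One logical entry per row, flagging repeated meta_vars() combinations
dup <- duplicated(tbl)

# Dropping the flagged rows yields a table that passes the check
unq <- unique(tbl)
is_unique(unq)
```

Note that unique() discards the repeated rows rather than combining them; aggregate() is the tool for merging repeated measurements.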

Aggregating

To turn a non-unique id_tbl or ts_tbl object into one that is considered unique, a method for the S3 generic function stats::aggregate() is available. It applies the expression (or function specification) passed as expr to each combination of grouping variables. The columns to be aggregated can be controlled using the vars argument and the grouping variables can be changed using the by argument. The argument expr is fairly flexible: it can take an expression that will be evaluated in the context of the data.table in a clean environment inheriting from env, it can be a function, or it can be a string, in which case dt_gforce() is called. The default value NULL chooses a string depending on data types, where numeric resolves to median, logical to sum and character to first.
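
The different forms of expr can be sketched on a small made-up table with a single numeric data column. The default call relies on the rule stated above that numeric data resolves to the median:

```r
library(ricu)

# Three measurements for ID 1, one for ID 2 (hypothetical data)
tbl <- id_tbl(a = c(1L, 1L, 1L, 2L), b = c(1, 2, 6, 4), id_vars = "a")

# expr = NULL: the aggregation function is chosen from the data type
agg_default <- aggregate(tbl)

# expr as string: forwarded to dt_gforce()
agg_max <- aggregate(tbl, "max")

# expr as expression evaluated per group
agg_expr <- aggregate(tbl, list(b = max(b)))

is_unique(agg_default)
```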

As aggregation is used in concept loading (see load_concepts()), performance is important. For this reason, dt_gforce() allows for any of the available functions to be applied using the GForce optimization of data.table (see data.table::datatable.optimize).
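
As a hedged sketch of calling dt_gforce() directly (the data are made up, and the handling of missing values is inferred from the na_rm default shown under Usage), a grouped sum with and without NA removal might look like this:

```r
library(ricu)

# One missing value in the first group (hypothetical data)
tbl <- id_tbl(a = rep(1:2, each = 3), b = c(1, NA, 3, 4, 5, 6),
              id_vars = "a")

# For "sum", na_rm defaults to TRUE, so the NA is ignored
sum_rm <- dt_gforce(tbl, "sum")

# Keeping NA values propagates the missing value into the group sum
sum_na <- dt_gforce(tbl, "sum", na_rm = FALSE)

sum_rm[["b"]]
sum_na[["b"]]
```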

Value

Most of the utility functions return an object inheriting from id_tbl, potentially modified by reference, depending on the type of the object passed as x. The functions is_sorted(), anyDuplicated() and is_unique() return logical flags, while duplicated() returns a logical vector of length nrow(x).

Examples

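# Each (a, b) combination occurs twice, so the table is not unique
tbl <- id_tbl(a = rep(1:5, 4), b = rep(1:2, each = 10), c = rnorm(20),
              id_vars = c("a", "b"))
is_unique(tbl)
is_sorted(tbl)

# Reorder rows by the data column c and check again
is_sorted(tbl[order(c)])

# An expression and the equivalent function name passed as string
identical(aggregate(tbl, list(c = sum(c))), aggregate(tbl, "sum"))

# After aggregation, at most one row remains per (a, b) combination
tbl <- aggregate(tbl, "sum")
is_unique(tbl)
is_sorted(tbl)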

ricu documentation built on Oct. 7, 2021, 9:06 a.m.