extract_timevarying: Extract & Reshape Timevarying Dataitems

View source: R/extract_timevarying.R

Description

This is the workhorse function that transcribes 2d data from CC-HIC to a table with 1 column per dataitem (and any metadata if relevant) and 1 row per time point per patient.

Usage

extract_timevarying(
  connection,
  episode_ids = NULL,
  code_names,
  rename = as.character(NA),
  coalesce_rows = dplyr::first,
  chunk_size = 5000,
  cadance = 1,
  time_boundaries = c(-Inf, Inf)
)

Arguments

connection

a CC-HIC database connection

episode_ids

an integer vector of episode_ids or NULL. If NULL (the default) then all episodes are extracted. If working with the public dataset where episode ids are given as a character string of hashed values please use NULL.

code_names

a vector of CC-HIC code names to be extracted

rename

a character vector of names you want to relabel CC-HIC codes as, or NA (the default) if you do not want to relabel. Given in the same order as code_names.

coalesce_rows

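a summary function, or a vector of summary functions, used to collapse data that is recorded more frequently than your set cadance into a single value per row. Given in the same order as code_names. The default, dplyr::first, keeps the first value in each time window.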

chunk_size

a chunking parameter to help speed up the function and manage memory constraints. The default works well for most desktop computers.

cadance

a numerical scalar >= 0 or "timestamp". If a numerical scalar is used, it describes the base time unit used to build each row, as a multiple of an hour. For example: 1 = hourly, 0.5 = every 30 minutes, 2 = every 2 hours. If multiple events occur within the specified time, then duplicate rows are created. If cadance = 0, then the precise datetime will be used to generate the time column. This is likely to generate a large table, so use it cautiously.

time_boundaries

a numeric vector of length 2 containing the start and end times (in hours) relative to the ICU admission time, for which you want data extraction to occur. For example, c(0, 24) will return the first 24 hours of data after admission. The default c(-Inf, Inf) will return all data.
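
As a rough illustration of what a time window means, the base R sketch below filters a small, made-up table of events by their time relative to admission. The data frame, its column names, and the use of inclusive bounds are assumptions for illustration only; the package's own filtering may differ in detail.

```r
# Hypothetical long-format events; `time` is hours relative to ICU admission
events <- data.frame(
  episode_id = c(1, 1, 1, 1),
  time       = c(-12, 0, 6, 30),
  value      = c(138, 140, 141, 139)
)

# Window of interest: the first 24 hours after admission
time_boundaries <- c(0, 24)

# Keep rows inside the window (inclusive bounds are an assumption here)
in_window <- events$time >= time_boundaries[1] &
  events$time <= time_boundaries[2]
kept <- events[in_window, ]
print(kept)
```

With c(-Inf, Inf) the same comparison keeps every row, which matches the documented default of returning all data.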

Details

The time unit is user definable, and set by the "cadance" argument. The default behaviour is to produce a table with 1 row per hour per patient. If there are duplicates/conflicts (e.g. more than 1 event for a given hour), then only the first result for that hour is returned. One can override this behaviour by supplying a vector of summary functions directly to the 'coalesce_rows' argument.

Many events inside CC-HIC are recorded more often than hourly. Depending upon the chosen analysis, you may wish to use a finer cadance. A cadance of 0.5, for example, will produce a table with 1 row per 30 minutes per patient.
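
To make the effect of the cadance value concrete, the sketch below assigns a few made-up event times (in hours since admission) to rows at two different cadances. The helper bin_time() is hypothetical, and flooring to the start of each window is an assumption; the package may round differently.

```r
# Hypothetical event times, in hours since ICU admission
hours_since_admission <- c(0.10, 0.40, 0.70, 1.20, 2.90)

# Illustrative helper (not part of inspectEHR): snap each time down to
# the start of its window. Flooring is an assumption.
bin_time <- function(x, cadance) floor(x / cadance) * cadance

hourly      <- bin_time(hours_since_admission, cadance = 1)
half_hourly <- bin_time(hours_since_admission, cadance = 0.5)

print(hourly)       # three of the five events share the first hourly row
print(half_hourly)  # the finer cadance separates more of them
```

At the hourly cadance the first three events collapse into one row, which is where the coalesce_rows argument comes into play.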

Where you are extracting at a resolution lower than is recorded in the database, you can specify a summary function with the coalesce_rows argument. This argument takes a summary function, for example mean, and applies it to the specified data items.
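
The difference between keeping the first value and applying a summary function can be sketched in base R on a toy table. The data, the column names, and the first_value() stand-in for dplyr::first are all assumptions for illustration; this is not the package's internal code.

```r
# Hypothetical heart rates: three readings fall in hour 0, one in hour 1
hr <- data.frame(
  episode_id = c(1, 1, 1, 1),
  time       = c(0, 0, 0, 1),
  value      = c(80, 90, 100, 70)
)

# Base R stand-in for dplyr::first (the documented default)
first_value <- function(x) x[1]

# One row per episode per time, keeping the first reading
by_first <- aggregate(value ~ episode_id + time, data = hr, FUN = first_value)

# One row per episode per time, averaging the readings instead
by_mean <- aggregate(value ~ episode_id + time, data = hr, FUN = mean)

print(by_first)
print(by_mean)
```

Which summary is appropriate depends on the data item: a mean may suit heart rate, whereas another item might call for a maximum or the first observation.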

Choose the variables you want to pull out wisely. This function is quite efficient considering what it needs to do, but it can take a very long time if extracting lots of data. It is strongly recommended that you optimise the database with indexes prior to using this function. You may want to test your extraction with 100 or so patients to make sure it is doing what you want.

It is perfectly possible for this function to produce rows with negative times. If, for example, a patient had a measure taken in the hours before they were admitted, then this would be added to the table with a negative time value. As a concrete example, if a patient had a sodium measured at 08:00, and they were admitted to the ICU at 20:00 the same day, then the sodium would be displayed at time = -12. This is normal behaviour; it is left to the end user to determine how best to account for it.
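
The sodium example can be checked with a couple of lines of base R. The calendar date and variable names are arbitrary choices for illustration; only the clock times come from the example above.

```r
# Arbitrary date; the clock times follow the sodium example
admitted     <- as.POSIXct("2020-01-01 20:00:00", tz = "UTC")
sodium_taken <- as.POSIXct("2020-01-01 08:00:00", tz = "UTC")

# Time of the measurement relative to ICU admission, in hours
rel_time <- as.numeric(difftime(sodium_taken, admitted, units = "hours"))
print(rel_time)  # -12
```

If pre-admission values are not wanted, one option is to set time_boundaries to start at 0, or to filter out rows with a negative time after extraction.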

Value

A sparse tibble with one row per time unit per patient (hourly by default, as set by cadance) and unique HIC events as columns. Data items that contain metadata are reallocated to their own columns.

Examples

# DB Connection
db_pth <- system.file("testdata/synthetic_db.sqlite3", package = "inspectEHR")
ctn <- connect(sqlite_file = db_pth)

# Extract Heart Rates for 5 episodes with default settings
hr_default <- extract_timevarying(
  ctn, episode_ids = 13639:13643, code_names = "NIHR_HIC_ICU_0108"
)
head(hr_default)

# Extract Heart Rates for 5 episodes with custom settings:
# 2-hourly rows, averaging values that fall in the same row
hr_custom <- extract_timevarying(
  ctn, episode_ids = 13639:13643, code_names = "NIHR_HIC_ICU_0108",
  cadance = 2, coalesce_rows = mean
)
head(hr_custom)

# Close the database connection
DBI::dbDisconnect(ctn)

CC-HIC/inspectEHR documentation built on Jan. 16, 2020, 11:24 p.m.