tidy_data: Transform to a tidy data set

Description

tidy_data transforms raw EDW data into a tidy format.

Usage

tidy_data(x, ...)

## Default S3 method:
tidy_data(x, ...)

## S3 method for class 'diagnosis'
tidy_data(x, ...)

## S3 method for class 'labs'
tidy_data(x, censor = TRUE, ...)

## S3 method for class 'locations'
tidy_data(x, ...)

## S3 method for class 'meds_cont'
tidy_data(x, sched, ref = NULL, ...)

## S3 method for class 'meds_inpt'
tidy_data(x, ref = NULL, ...)

## S3 method for class 'meds_sched'
tidy_data(x, ref = NULL, ...)

## S3 method for class 'services'
tidy_data(x, ...)

## S3 method for class 'vent_times'
tidy_data(x, dc, ...)

Arguments

x

A data frame with an edw class type

...

Additional arguments passed on to individual methods

censor

A logical; if TRUE (the default), adds a column indicating whether the data was censored

sched

A data frame with intermittent medications

ref

A data frame with three columns: name, type, and group. See details below.

dc

A data frame with discharge date/times

Details

This is an S3 generic function for tidying EDW data read in using read_data. The function invokes the appropriate method based on the type of data being transformed (e.g., lab results or medication data).
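As a rough illustration of how S3 dispatch selects a method from the class of x, here is a minimal sketch. The generic tidy_sketch, its methods, and the column name lab.result are hypothetical stand-ins, not the edwr implementation, and the behavior of the default method is an assumption.

```r
# Hypothetical toy generic illustrating S3 dispatch; not the edwr code.
tidy_sketch <- function(x, ...) UseMethod("tidy_sketch")

# Assumed fallback: return the input unchanged when no class-specific method exists
tidy_sketch.default <- function(x, ...) {
  x
}

# Method chosen when x carries the "labs" class
tidy_sketch.labs <- function(x, ...) {
  # Non-numeric results (e.g. "> 20") become NA
  x$lab.result <- suppressWarnings(as.numeric(x$lab.result))
  x
}

labs_raw <- data.frame(
  lab = c("hgb", "hgb"),
  lab.result = c("12.1", "> 20"),
  stringsAsFactors = FALSE
)
# Prepend the class so that dispatch reaches tidy_sketch.labs
class(labs_raw) <- c("labs", class(labs_raw))

labs_tidy <- tidy_sketch(labs_raw)
```

Calling tidy_sketch on an object without a matching class falls through to the default method.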

The data frame passed to ref should contain three character columns: name, type, and group. The name column should contain either generic medication names or medication classes. The type column should specify whether the value in name is a "class" or "med". The group column should specify whether the medication is a continuous ("cont") or scheduled ("sched") medication.

For diagnosis, this function checks whether the code is a valid ICD-9-CM or ICD-10-CM code. For codes that are valid in both (i.e., "E" and "V" codes), it checks whether the code matches a defined ICD-9-CM or ICD-10-CM code. For codes that are defined in both, the designated code type from the source is used.

For locations, this function accounts for incorrect departure times in raw EDW data by calculating the departure time using the arrival time of the next unit (unless it was the patient's last unit during the hospitalization, in which case the recorded departure time is used). It also combines multiple rows of data when the patient did not actually leave that unit.
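The two location steps (merging consecutive rows for the same unit, then taking the next unit's arrival as the departure time) can be sketched in base R as follows. The column names id, unit, arrive, and depart are hypothetical, the rows are assumed to be sorted by patient and arrival time, and this is an illustration of the idea rather than the package's actual code.

```r
# Hypothetical raw location data, sorted by patient and arrival time
loc <- data.frame(
  id = c(1, 1, 1, 1),
  unit = c("ED", "ICU", "ICU", "Floor"),
  arrive = as.POSIXct(c("2019-01-01 08:00", "2019-01-01 12:00",
                        "2019-01-02 09:00", "2019-01-03 10:00"), tz = "UTC"),
  depart = as.POSIXct(c("2019-01-01 11:30", "2019-01-02 09:00",
                        "2019-01-03 09:45", "2019-01-04 15:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

n <- nrow(loc)
# A new stay starts whenever the patient or the unit differs from the previous row
new_stay <- c(TRUE, loc$id[-1] != loc$id[-n] | loc$unit[-1] != loc$unit[-n])
stay <- cumsum(new_stay)

# Collapse each stay to one row: first arrival, last recorded departure
first <- !duplicated(stay)
last <- !duplicated(stay, fromLast = TRUE)
tidy_loc <- data.frame(
  id = loc$id[first],
  unit = loc$unit[first],
  arrive = loc$arrive[first],
  depart = loc$depart[last],
  stringsAsFactors = FALSE
)

m <- nrow(tidy_loc)
# TRUE when the following row belongs to the same patient
has_next <- c(tidy_loc$id[-1] == tidy_loc$id[-m], FALSE)
# Replace the departure with the next unit's arrival; the last unit keeps its recorded time
tidy_loc$depart[has_next] <- tidy_loc$arrive[which(has_next) + 1]
```

In this toy data the two ICU rows collapse into one stay, and the ED departure becomes the ICU arrival time instead of the recorded 11:30.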

For services, this function accounts for incorrect end times from raw EDW data by calculating the end time using the start time of the next service (unless it was the patient's last service during the hospitalization). It also combines multiple rows of data when the patient did not actually leave that service.

For vent_times, this function accounts for incorrect start and stop times in raw EDW data. If no stop time is recorded, the discharge time is used as the stop time.
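The fallback to the discharge time might look like the sketch below. The data frames and the column names id, start, stop, and discharge are hypothetical; the actual method may match patients and handle start times differently.

```r
# Hypothetical ventilation data; patient 2 has no recorded stop time
vent <- data.frame(
  id = c(1, 2),
  start = as.POSIXct(c("2019-01-01 10:00", "2019-01-02 06:00"), tz = "UTC"),
  stop = as.POSIXct(c("2019-01-03 14:00", NA), tz = "UTC")
)

# Hypothetical discharge data, one row per patient (plays the role of dc)
dc <- data.frame(
  id = c(1, 2),
  discharge = as.POSIXct(c("2019-01-05 12:00", "2019-01-06 16:00"), tz = "UTC")
)

# Look up each patient's discharge row, then fill only the missing stop times
idx <- match(vent$id, dc$id)
missing_stop <- is.na(vent$stop)
vent$stop[missing_stop] <- dc$discharge[idx[missing_stop]]
```

Recorded stop times are left untouched; only the missing value for patient 2 is replaced.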

Examples

# tidy lab data; non-numeric results will be converted to NA
suppressWarnings(print(head(
  tidy_data(labs)
)))

# tidy labs without marking censored data (censored results will be converted to NA)
suppressWarnings(print(head(
  tidy_data(labs, censor = FALSE)
)))

# make a reference data frame for tidying meds
ref <- tibble::tibble(
  name = c("heparin", "warfarin", "antiplatelet agents"),
  type = c("med", "med", "class"),
  group = c("cont", "sched", "sched")
)

# tidy continuous medications; will keep only heparin drips
print(head(
  tidy_data(meds_cont, meds_sched, ref)
))

# tidy intermittent medications; will keep warfarin and antiplatelet agents
print(head(
  tidy_data(meds_sched, ref)
))

bgulbis/edwr documentation built on May 12, 2019, 8:22 p.m.