EHR Vignette for Structured Data

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(R.options = list(width = 100))

library(EHR)
library(lubridate)
library(pkdata)
options(stringsAsFactors = FALSE)
findbreaks <- function(x, char = '[ /\\]', charlen = 75) {
  if(length(x) > 1) {
    out <- vapply(x, findbreaks, character(1), char, charlen, USE.NAMES = FALSE)
    ix <- !grepl('\n[[:space:]]*$', out)
    ix[length(ix)] <- FALSE
    out[ix] <- paste0(out[ix], '\n')
    return(paste(out, collapse = ''))
  }
  cur <- x
  nbuf <- ceiling(nchar(x) / charlen)
  if(nbuf == 1) {
    return(cur)
  }
  strings <- character(nbuf)
  i <- 1
  while(nchar(cur) > charlen) {
    loc <- c(gregexpr(char, cur)[[1]])
    b <- loc[max(which(loc < charlen))]
    strings[i] <- substr(cur, 1, b)
    cur <- substring(cur, b + 1)
    i <- i + 1
  }
  strings[i] <- cur
  paste(c(strings[1], paste0('     ', strings[-1])), collapse = '\n')
}

co <- function(expr) {
  txt <- capture.output(expr)
  cat(findbreaks(txt))
  cat("\n")
}

Introduction

The EHR package provides several modules for medication-related studies using data from electronic health record (EHR) databases. In particular, the package includes modules to perform pharmacokinetic/pharmacodynamic (PK/PD) analyses using EHRs, as outlined in Choi et al.$^{1}$, and additional modules will be added in the future. This vignette describes four modules for processing data (Pro-Demographic, Pro-Med-Str, Pro-Drug Level, Pro-Laboratory) and one module for building PK data for intravenously administered medications (Build-PK-IV), for the case where data are obtained from a structured database. The Pro-Med-Str module consists of two parts for processing structured medication data: one for intravenous (IV) infusion or bolus doses given to inpatients, and one for electronic (e)-prescription medication data.

The process starts with structured data extracted from EHRs by Structured Query Language (SQL) or provided by a user, then moves through two phases: data processing, which standardizes and combines the input data (Pro-Med-Str, Pro-Drug Level, etc.), and data building, which creates the final PK data (Build-PK-IV).

The vignette has two examples. The first example demonstrates how to build PK data without using the data processing modules when cleaned concentration, drug dose, demographic and laboratory datasets are available in an appropriate data form. The second example shows how to use several data processing modules to standardize and combine more complex datasets, and build PK data using Build-PK-IV.

To begin we load the EHR package, the pkdata package, and the lubridate package.

# load EHR package and dependencies
library(EHR)
library(pkdata)
library(lubridate)

Example 1: Quick Data Building with Processed Datasets

The data for example 1 includes a demographic file, a concentration file, an IV dosing file, and a laboratory file, which are all cleaned and formatted appropriately. We also define a directory for the raw data and a directory for interactive checking output files.

# define directories
td <- tempdir()
checkDir <- file.path(td, 'check1')
rawDataDir <- system.file("examples", "str_ex1", package="EHR")
dir.create(checkDir)

# pre-processed demographic data 
demo <- read.csv(file.path(rawDataDir,"Demographics_DATA_simple.csv"))
head(demo)

conc.data <- read.csv(file.path(rawDataDir,"Concentration_DATA_simple.csv"))
head(conc.data)

ivdose.data <- read.csv(file.path(rawDataDir,"IVDose_DATA_simple.csv"))
head(ivdose.data)

creat.data <- read.csv(file.path(rawDataDir,"Creatinine_DATA_simple.csv"))
head(creat.data)

The EHR package modules use a standardized naming convention for patient identification (ID) variables. We rename the unique patient-level ID from patient_id to mod_id and the visit-level ID from patient_visit_id to mod_id_visit. The visit-level ID can be used to distinguish different visits (i.e., occasions) when the same patient has multiple hospitalizations. If there is only a single visit per subject the unique patient-level ID and visit-level ID can be the same.

names(conc.data)[1:2] <- names(demo)[1:2] <- c("mod_id", "mod_id_visit")
names(creat.data)[1] <- names(ivdose.data)[1] <- "mod_id"

Using the four datasets, we can build a final PK dataset with the function run_Build_PK_IV(). Additional details for this function are provided below in the Build-PK-IV subsection of Example 2: Complete Data Processing and Building from Raw Extracted Data to PK Data.

simple_pk_dat <- run_Build_PK_IV(
    conc=conc.data,
    conc.columns = list(id = 'mod_id', datetime = 'date.time', druglevel = 'conc.level', 
                        idvisit = 'mod_id_visit'),
    dose=ivdose.data,
    dose.columns = list(id = 'mod_id', date = 'date.dose', infuseDatetime = 'infuse.time', 
                        infuseDose = 'infuse.dose', infuseTimeExact= 'infuse.time.real', 
                        bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose', 
                        gap = 'maxint', weight = 'weight'),
    demo.list = demo,
    demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'),
    lab.list = list(creat.data),
    lab.columns = list(id = 'mod_id', datetime = 'date.time'),
    drugname='fent',
    check.path=checkDir)
head(simple_pk_dat,15)

Example 2: Complete Data Processing and Building from Raw Extracted Data to PK Data

To begin example 2 we define directories for the raw data, the processed data, and files used for interactive checking. If a file path for interactive checking is not provided, the interactive checking will not be performed.

dataDir <- file.path(td, 'data2')
checkDir <- file.path(td, 'check2')
rawDataDir <- system.file("examples", "str_ex2", package="EHR")
dir.create(dataDir)
dir.create(checkDir)

Pre-Processing for Raw Extracted Data

The raw data for example 2 includes a demographic file for use with the Pro-Demographic module; two files for the Pro-Drug Level module; two dosing files for the Pro-Med-Str module; and two lab files for use with the Pro-Laboratory module.

The structured datasets extracted by SQL must go through a pre-processing stage which creates new ID variables and datasets that can be used by the data processing modules. The following annotated example demonstrates the three main steps of pre-processing: (1) read and clean raw data; (2) merge raw data to create new ID variables; (3) make new data for use with modules.

Each raw dataset should contain a subject unique ID, a subject visit ID, or both IDs. In this example the subject unique ID is called subject_uid and the subject visit ID is called subject_id. The subject visit ID is a combination of subject and visit/course -- e.g., subject_id 14.0 is the first course for subject 14, subject_id 14.1 is the second course for subject 14, and so on. subject_uid is a unique ID that is the same across all of a subject's records. The integer part of subject_id has a 1-to-1 correspondence with subject_uid -- for this example, subject_uid 62734832 is associated with both subject_id 14.0 and subject_id 14.1. If there is only a single visit/course per subject, only the subject unique ID is needed.
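The ID convention described above can be checked with a few lines of base R. This is only a minimal sketch on a made-up table: the data frame name `ids` and the rows for subject 15 (including its subject_uid value) are hypothetical and are not taken from the example data.

```r
# Toy ID table; the row for subject 15 is invented for illustration
ids <- data.frame(
  subject_id  = c(14.0, 14.1, 15.0),
  subject_uid = c(62734832, 62734832, 71234567)
)

# The integer part of subject_id identifies the subject
ids$subject_int <- floor(ids$subject_id)

# Each integer part should map to exactly one subject_uid
uid_per_subject <- tapply(ids$subject_uid, ids$subject_int,
                          function(u) length(unique(u)))
all(uid_per_subject == 1)

# Number of visits/courses recorded for each subject_uid
visits <- table(ids$subject_uid)
visits
```

A check like this is cheap to run on each raw dataset that carries both IDs before the datasets are merged.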

(1) Read and clean raw data

The demographics data file contains ID variables subject_id and subject_uid, in addition to demographic variables such as gender, date of birth, height, weight, etc. The Demographics_DATA.csv file is read in using the readTransform() function.

# demographics data
demo.in <- readTransform(file.path(rawDataDir, "Demographics_DATA.csv"))
head(demo.in)

The example concentration data consists of two files, SampleTimes_DATA.csv and SampleConcentration_DATA.csv containing the concentration sampling times and values, respectively.

The sampling times data csv file is read in with read.csv(). Then the function dataTransformation() is used to rename the variable Study.ID to subject_id and to create a new variable called samp, which indexes the sample number, using the modify= argument.

# concentration sampling times data
# read in raw data
samp.raw <- read.csv(file.path(rawDataDir, "SampleTimes_DATA.csv"))
head(samp.raw)

# transform data
samp.in0 <- dataTransformation(samp.raw,
    rename = c('Study.ID' = 'subject_id'),
    modify = list(samp = expression(as.numeric(sub('Sample ', '', Event.Name)))))
head(samp.in0)

Equivalently, the function readTransform() can be used to read in and transform the data with a single function call.

# read in and transform data
samp.in <- readTransform(file.path(rawDataDir, "SampleTimes_DATA.csv"),
    rename = c('Study.ID' = 'subject_id'),
    modify = list(samp = expression(as.numeric(sub('Sample ', '', Event.Name)))))
head(samp.in)
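To see what the modify= expression computes, the same expression can be evaluated by hand against a small data frame. The sketch below uses base R only; the data frame `toy` and its values are hypothetical, and evaluating the expression with eval() is just one plausible way to apply it, not necessarily how dataTransformation() works internally.

```r
# Hypothetical rows in the style of SampleTimes_DATA.csv
toy <- data.frame(
  Study.ID   = c(14, 14, 15),
  Event.Name = c("Sample 1", "Sample 2", "Sample 10")
)

# The unevaluated expression passed through modify=
e <- expression(as.numeric(sub('Sample ', '', Event.Name)))

# Evaluate it using the data frame's columns as variables
toy$samp <- eval(e[[1]], toy)
toy
```

Wrapping the code in expression() delays evaluation until the column Event.Name is available, which is why the argument is not written as a plain function call.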

The same steps can be used for the sample values data csv file. It is read in using read.csv(). Then using dataTransformation() the subject_id variable is created from the name variable using a call to the helper function sampId() in the modify= argument.

# concentration sample values data
# read in raw data
conc.raw <- read.csv(file.path(rawDataDir, "SampleConcentration_DATA.csv"))
head(conc.raw)

# helper function used to make subject_id
sampId <- function(x) {
  # remove leading zeroes or trailing periods
  subid <- gsub('(^0*|\\.$)', '', x)
  # change _ to .
  gsub('_([0-9]+[_].*)$', '.\\1', subid)
}

# transform data
conc.in0 <- dataTransformation(conc.raw,
                    modify = list(
                    subid = expression(sampId(name)),
                    subject_id = expression(as.numeric(sub('[_].*', '', subid))),
                    samp = expression(sub('[^_]*[_]', '', subid)),
                    name = NULL,
                    data_file = NULL,
                    subid = NULL
                    )
                  )
head(conc.in0)

Again, we can perform the same two steps with a single call to readTransform().

# equivalent using readTransform()
conc.in <- readTransform(file.path(rawDataDir, "SampleConcentration_DATA.csv"),
  modify = list(
    subid = expression(sampId(name)),
    subject_id = expression(as.numeric(sub('[_].*', '', subid))),
    samp = expression(sub('[^_]*[_]', '', subid)),
    name = NULL,
    data_file = NULL,
    subid = NULL
    )
  )
head(conc.in)
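The string handling in sampId() and the two sub() calls can be traced on a few values. The sketch below repeats the helper from above and applies it to hypothetical name strings; the actual format of the name column in SampleConcentration_DATA.csv may differ.

```r
# Helper copied from the pre-processing step above
sampId <- function(x) {
  # remove leading zeroes or trailing periods
  subid <- gsub('(^0*|\\.$)', '', x)
  # change the first _ to . when a second _ follows the visit number
  gsub('_([0-9]+[_].*)$', '.\\1', subid)
}

# Hypothetical raw names: subject, optional visit, sample number
name <- c("014_1_3", "014_2", "0015_1.")

subid      <- sampId(name)
subject_id <- as.numeric(sub('[_].*', '', subid))  # part before the first _
samp       <- sub('[^_]*[_]', '', subid)           # part after the first _

data.frame(name, subid, subject_id, samp)
```

Note that samp stays a character vector here, exactly as in the modify= list; it would need as.numeric() if a numeric sample index were required.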

The example drug dosing data consists of files FLOW_DATA.csv and MAR_DATA.csv containing two sources of IV dose information (for details about these data sources, see Pro-Med-Str module below). The FLOW data csv file contains aliases for both ID variables; it is read in with the readTransform() function which renames the variables Subject.Id to subject_id and Subject.Uniq.Id to subject_uid, and creates the required date.time, unit, and rate variables.

# FLOW dosing data
flow.in <- readTransform(file.path(rawDataDir, "FLOW_DATA.csv"),
 rename = c('Subject.Id' = 'subject_id',
            'Subject.Uniq.Id' = 'subject_uid'),
 modify=list(
  date.time = expression(pkdata::parse_dates(EHR:::fixDates(Perform.Date))),
  unit = expression(sub('.*[ ]', '', Final.Rate..NFR.units.)),
  rate = expression(as.numeric(sub('([0-9.]+).*', '\\1', Final.Rate..NFR.units.)))
  )
 ) 
head(flow.in)
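The unit and rate expressions split a single text field into its unit and its numeric value. A minimal base R sketch on hypothetical values (the vector name `final_rate` and its contents are made up for illustration):

```r
# Hypothetical entries in the style of the Final.Rate..NFR.units. column
final_rate <- c("1.5 mcg/kg/hr", "0.5 mcg/kg/hr", "50 mcg/hr")

# Drop everything up to and including the last space: keeps the unit
unit <- sub('.*[ ]', '', final_rate)

# Keep only the leading run of digits and periods: the numeric rate
rate <- as.numeric(sub('([0-9.]+).*', '\\1', final_rate))

data.frame(final_rate, unit, rate)
```

Both patterns assume the value comes first and is separated from the unit by a space; entries in another layout would need to be inspected before relying on the result.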

The MAR data csv file contains several variables with a colon (:) character. To preserve the colon in these variable names, the data can be read in without checking for syntactically valid R variable names. The data is read in using read.csv() with the argument check.names = FALSE and then passed to the dataTransformation() function which renames Uniq.Id to subject_uid.

# MAR dosing data
mar.in0 <- read.csv(file.path(rawDataDir, "MAR_DATA.csv"), check.names = FALSE)
mar.in <- dataTransformation(mar.in0, rename = c('Uniq.Id' = 'subject_uid'))
head(mar.in)
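The effect of check.names = FALSE can be seen by reading the same text twice. This sketch uses an inline, made-up CSV string rather than MAR_DATA.csv; only the column names matter here.

```r
# Hypothetical header and one row with colon-containing column names
csv_text <- "Uniq.Id,Date,med:dosage,med:given\n1,2016-09-30,50 mcg,Given"

# Default behaviour: names are made syntactically valid, so ':' becomes '.'
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the names are kept exactly as written
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

default_names
kept_names
```

Keeping the original names matters later because the mar.columns argument of run_MedStrI() refers to columns such as 'med:dosage' by their colon-containing names.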

The example laboratory data consists of files Creatinine_DATA.csv and Albumin_DATA.csv. Both files are read in using the readTransform() function and Subject.uniq is renamed to subject_uid.

# Serum creatinine lab data
creat.in <- readTransform(file.path(rawDataDir, "Creatinine_DATA.csv"),
    rename = c('Subject.uniq' = 'subject_uid'))
head(creat.in)

# Albumin lab data
alb.in <- readTransform(file.path(rawDataDir, "Albumin_DATA.csv"),
    rename = c('Subject.uniq' = 'subject_uid'))
head(alb.in)

(2) Merge data to create new ID variables

The function idCrosswalk() merges all of the cleaned input datasets and creates new IDs. The data= argument of this function accepts a list of input datasets and the idcols= argument accepts a list of vectors or character strings that identify the ID variables in the corresponding input dataset.

The output of idCrosswalk() is a crosswalk dataset between the original ID variables (subject_id, subject_uid) and the new ID variables (mod_id, mod_visit, and mod_id_visit). The new variable mod_id_visit has a 1-to-1 correspondence to variable subject_id and uniquely identifies each subject's visit/course; the new variable mod_id has a 1-to-1 correspondence to variable subject_uid and uniquely identifies each subject.

# merge all ID datasets
data <-  list(demo.in,
              samp.in,
              conc.in,
              flow.in,
              mar.in,
              creat.in,
              alb.in)

idcols <-  list(c('subject_id', 'subject_uid'), # id vars in demo.in
                'subject_id', # id var in samp.in
                'subject_id', # id var in conc.in
                c('subject_id', 'subject_uid'), # id vars in flow.in
                'subject_uid', # id var in mar.in
                'subject_uid', # id var in creat.in
                'subject_uid') # id var in alb.in

mod.id <- idCrosswalk(data, idcols, visit.id="subject_id", uniq.id="subject_uid")
saveRDS(mod.id, file=file.path(dataDir,"Fentanyl_module_id.rds"))

mod.id
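To make the structure of a crosswalk concrete, the sketch below builds one by hand in base R. The numbering rule used here (subjects numbered in order of subject_uid, visits numbered within subject, and mod_id_visit formed by pasting the two) is an assumption chosen for illustration; idCrosswalk() may assign its IDs differently. The row for subject 15 is hypothetical.

```r
# Toy set of original IDs
ids <- data.frame(
  subject_id  = c(14.0, 14.1, 15.0),
  subject_uid = c(62734832, 62734832, 71234567)
)
ids <- ids[order(ids$subject_uid, ids$subject_id), ]

# One new ID per subject (assumed rule: order of first appearance)
ids$mod_id <- match(ids$subject_uid, unique(ids$subject_uid))

# Visit counter within each subject
ids$mod_visit <- ave(ids$subject_id, ids$mod_id, FUN = seq_along)

# One new ID per subject-visit
ids$mod_id_visit <- paste(ids$mod_id, ids$mod_visit, sep = ".")

ids
```

Whatever rule is used, the two properties that matter are the ones stated above: mod_id is 1-to-1 with subject_uid and mod_id_visit is 1-to-1 with subject_id.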

(3) Make new data for use with modules

The function pullFakeId() replaces the original IDs -- subject_id and subject_uid -- with new IDs -- mod_id, mod_visit, and mod_id_visit -- to create datasets which can be used by the data processing modules. Generally, the function call to pullFakeId() is

pullFakeId(dat, xwalk, firstCols = NULL, orderBy = NULL)

The dat= argument should contain the cleaned input data.frame from pre-processing step (1) and the xwalk= argument should contain the crosswalk data.frame produced in step (2). Additional arguments firstCols= and orderBy= control which variables are in the first columns of the output and the sort order, respectively. The cleaned structured data are saved as R objects for use with the modules.

## demographics data
demo.cln <- pullFakeId(demo.in, mod.id,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
    uniq.id = 'subject_uid')
head(demo.cln)
saveRDS(demo.cln, file=file.path(dataDir,"Fentanyl_demo_mod_id.rds"))

## drug level data
# sampling times
samp.cln <- pullFakeId(samp.in, mod.id,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit', 'samp'), 
    orderBy = c('mod_id_visit','samp'),
    uniq.id = 'subject_uid')
head(samp.cln)
saveRDS(samp.cln, file=file.path(dataDir,"Fentanyl_samp_mod_id.rds"))

# sampling concentrations
conc.cln <- pullFakeId(conc.in, mod.id,
    firstCols = c('record_id', 'mod_id', 'mod_visit', 'mod_id_visit', 'samp'),
    orderBy = 'record_id',
    uniq.id = 'subject_uid')
head(conc.cln)
saveRDS(conc.cln, file=file.path(dataDir,"Fentanyl_conc_mod_id.rds"))

## dosing data
# flow
flow.cln <- pullFakeId(flow.in, mod.id,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
    uniq.id = 'subject_uid')
head(flow.cln)
saveRDS(flow.cln, file=file.path(dataDir,"Fentanyl_flow_mod_id.rds"))

# mar
mar.cln <- pullFakeId(mar.in, mod.id, firstCols = 'mod_id', uniq.id = 'subject_uid')
head(mar.cln)
saveRDS(mar.cln, file=file.path(dataDir,"Fentanyl_mar_mod_id.rds"))

## laboratory data
creat.cln <- pullFakeId(creat.in, mod.id, 'mod_id', uniq.id = 'subject_uid')
head(creat.cln)

alb.cln <- pullFakeId(alb.in, mod.id, 'mod_id', uniq.id = 'subject_uid')
head(alb.cln)

saveRDS(creat.cln, file=file.path(dataDir,"Fentanyl_creat_mod_id.rds"))
saveRDS(alb.cln, file=file.path(dataDir,"Fentanyl_alb_mod_id.rds"))
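Conceptually, replacing the original IDs amounts to a merge with the crosswalk followed by dropping the original ID columns and reordering. The base R sketch below illustrates that idea on made-up data; it is not the implementation of pullFakeId(), and the objects `xwalk_toy`, `dat_toy`, and the weight values are hypothetical.

```r
# Hypothetical crosswalk between original and module IDs
xwalk_toy <- data.frame(
  subject_id   = c(14.0, 14.1, 15.0),
  subject_uid  = c(62734832, 62734832, 71234567),
  mod_id       = c(1, 1, 2),
  mod_visit    = c(1, 2, 1),
  mod_id_visit = c("1.1", "1.2", "2.1")
)

# Hypothetical cleaned dataset keyed by the original visit ID
dat_toy <- data.frame(subject_id = c(15.0, 14.1, 14.0),
                      weight     = c(9.2, 6.1, 5.8))

# Attach the module IDs
merged <- merge(dat_toy, xwalk_toy, by = "subject_id")

# Put module IDs first, drop the original IDs, sort by visit-level ID
first_cols <- c("mod_id", "mod_visit", "mod_id_visit")
drop_cols  <- c("subject_id", "subject_uid")
rest_cols  <- setdiff(names(merged), c(first_cols, drop_cols))
out <- merged[order(merged$mod_id_visit), c(first_cols, rest_cols)]
rownames(out) <- NULL
out
```

The result carries no original identifiers, which is the property the processing modules rely on.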

Before running the processing modules, it is necessary to define several options and parameters. Setting options(pkxwalk = ) to the name of the crosswalk object allows the modules to access the crosswalk. We also create a drug name stub and, optionally, define the lower limit of quantification (LLOQ) for the drug of interest.

# set crosswalk option 
xwalk <- readRDS(file.path(dataDir, "Fentanyl_module_id.rds"))
options(pkxwalk = 'xwalk')

# define parameters
drugname <- 'fent'
LLOQ <- 0.05

Pro-Demographic

The Pro-Demographic module accepts the cleaned structured demographic dataset and a user-defined set of exclusion criteria. It returns a formatted list, suitable for integration with the other modules, containing the demographic data and the records that meet the exclusion criteria. For this example, we exclude subjects with a value of 1 for in_hospital_mortality or add_ecmo and create a new variable called length_of_icu_stay.

The demographic data can be processed by the run_Demo() function using:

# helper function
exclude_val <- function(x, val=1) { !is.na(x) & x == val }

demo.out <- run_Demo(demo.path = file.path(dataDir, "Fentanyl_demo_mod_id.rds"),
    toexclude = expression(exclude_val(in_hospital_mortality) | exclude_val(add_ecmo)),
    demo.mod.list = list(length_of_icu_stay = 
                        expression(daysDiff(surgery_date, date_icu_dc))))

head(demo.out$demo)
demo.out$exclude

See the run_Demo() function documentation for more examples.
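The helper exclude_val() exists mainly to handle missing values. The sketch below applies the same exclusion expression to a small made-up data frame (the object `toy_demo` and its values are hypothetical) and compares it with a naive comparison that does not guard against NA.

```r
# Helper copied from above
exclude_val <- function(x, val = 1) { !is.na(x) & x == val }

# Hypothetical demographic rows; subject 3 has a missing mortality flag
toy_demo <- data.frame(
  mod_id                = 1:4,
  in_hospital_mortality = c(0, 1, NA, 0),
  add_ecmo              = c(0, 0, 0, 1)
)

# NA-safe exclusion flag: a missing value is treated as "do not exclude"
flag <- with(toy_demo,
             exclude_val(in_hospital_mortality) | exclude_val(add_ecmo))

# Naive version: the missing value propagates into the result
naive <- with(toy_demo, in_hospital_mortality == 1 | add_ecmo == 1)

data.frame(mod_id = toy_demo$mod_id, flag, naive)
```

With the naive comparison, subject 3 would get an NA flag, which is ambiguous when the flag is later used to subset records.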

Pro-Med-Str

The Pro-Med-Str module processes structured medication data. Part I handles IV dose data and Part II handles e-prescription dose data.

Part I: IV dose data

Part I handles IV dose data from two sources, Flow data and Medication Administration Records (MAR) data. In this example, the Flow data are patient flow sheets which record infusion rates and changes outside of the operating room, while the MAR data record all bolus and infusion doses administered in the operating room before 11/01/2017. After 11/01/2017, when a new EHR system -- Epic -- was implemented, MAR data recorded all types of medications for inpatients. Thus, while MAR data are required, the use of Flow sheet data is optional for this module. The module can be semi-interactive for data checking (this is not required, but we recommend using this feature); if check.path is provided (the default is NULL), it can generate several files for checking potential data errors and getting feedback from an investigator. If corrected information ('fix' files) is provided, the module should be re-run to incorporate the corrections. The major functions of this module are:

The IV dose data can be processed by the run_MedStrI() function using:

ivdose.out <- run_MedStrI(
    mar.path=file.path(dataDir,"Fentanyl_mar_mod_id.rds"),
    mar.columns = list(id = 'mod_id', datetime = c('Date','Time'), dose = 'med:dosage', 
                       drug = 'med:mDrug', given = 'med:given'),
    medGivenReq = TRUE,
    flow.path=file.path(dataDir,"Fentanyl_flow_mod_id.rds"),
    flow.columns = list(id = 'mod_id', datetime = 'date.time', finalunits = 'Final.Units', 
                        unit = 'unit', rate = 'rate', weight = 'Final.Wt..kg.'),
    medchk.path = file.path(rawDataDir, sprintf('medChecked-%s.csv', drugname)),
    demo.list = NULL,
    demo.columns = list(),
    missing.wgt.path = NULL,
    wgt.columns = list(),
    check.path = checkDir,
    failflow_fn = 'FailFlow',
    failunit_fn = 'Unit',
    failnowgt_fn = 'NoWgt',
    infusion.unit = 'mcg/kg/hr',
    bolus.unit = 'mcg',
    bol.rate.thresh = Inf,
    rateunit = 'mcg/hr',
    ratewgtunit = 'mcg/kg/hr',
    weightunit = 'kg',
    drugname = drugname)
head(ivdose.out)

Part II: e-prescription data

Part II handles e-prescription data. To use this module, all prescriptions must be for only one drug. Different names, such as brand names and generic names, for the same drug are allowed (e.g., Lamictal and lamotrigine). The data used in this module must include columns for ID, date, strength, dose amount, and frequency. The major tasks the module performs are as follows:

There are two underlying functions used in this module. processErx performs the basic cleaning described above. processErxAddl performs some additional processing for more complicated dose expressions.

Below is example e-prescription data including columns for ID, drug name, dose, frequency, date, strength, and description.

(eRX <- read.csv(file.path(rawDataDir,"e-rx_DATA.csv"),stringsAsFactors = FALSE))

The e-prescription data can be processed by the run_MedStrII function using:

eRX.out <- run_MedStrII(file.path(rawDataDir,"e-rx_DATA.csv"),
    dat.columns = list(id = 'GRID', dose = 'RX_DOSE', freq = 'FREQUENCY', date = 'ENTRY_DATE', 
                       str = 'STRENGTH_AMOUNT', desc = 'DESCRIPTION')
)

eRX.out

The following arguments are used in the run_MedStrII function:

In the above example, daily dose was calculated for the first 5 patients by multiplying strength $\times$ dose $\times$ freq.num, and a redundant daily dose was removed for the patient with ID2. In order to calculate a daily dose for the patient with ID3, the strength of 100 from the description was used because STRENGTH_AMOUNT was missing. For the patient with ID6, the dose amounts of 1.5, 1, and 1.5 are added together to get a dose of 4, and the daily dose is calculated as strength $\times$ dose.
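The arithmetic behind the daily dose can be written out directly. The sketch below uses made-up prescriptions; the object names and all numeric values, including the strength of 100 used for the split-dose case, are hypothetical and are not the values in e-rx_DATA.csv.

```r
# Hypothetical prescriptions with a single dose amount and a frequency
rx <- data.frame(
  id       = c("A", "B"),
  strength = c(100, 200),   # strength per unit
  dose     = c(2, 1),       # number of units per administration
  freq.num = c(2, 3)        # administrations per day
)

# Daily dose = strength x dose x freq.num
rx$daily.dose <- rx$strength * rx$dose * rx$freq.num
rx

# Split-dose expression: the dose amounts 1.5, 1, and 1.5 already cover
# the whole day, so they are summed and no frequency multiplier is used
dose_seq    <- c(1.5, 1, 1.5)
total_dose  <- sum(dose_seq)
daily_split <- 100 * total_dose   # assumed strength of 100
daily_split
```

The two cases differ only in where the per-day total comes from: a frequency multiplier in the first, an explicit sum of dose amounts in the second.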

Pro-Drug Level

The Pro-Drug Level module processes drug concentration data that can be merged with medication dose data and other types of data. This module can be semi-interactive for data checking (this is not required, but we recommend using this feature); if check.path is provided (the default is NULL), the module will generate several files for checking missing data and potential data errors and for getting feedback from an investigator. If corrected information ('fix' files) is provided, the module should be re-run to incorporate the corrections. The major functions of this module are:

The drug concentration data can be processed by the run_DrugLevel function using:

conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"Fentanyl_conc_mod_id.rds"),
    conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
    conc.rename=c(fentanyl_calc_conc = 'conc.level', samp= 'event'),
    conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
    samp.path=file.path(dataDir,"Fentanyl_samp_mod_id.rds"),
    samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
    check.path=checkDir,
    failmiss_fn = 'MissingConcDate-',
    multsets_fn = 'multipleSetsConc-',
    faildup_fn = 'DuplicateConc-',
    drugname=drugname,
    LLOQ=LLOQ,
    demo.list=demo.out)
head(conc.out)

The output provides a message that 3 rows are missing a concentration date. The file 'failMissingConcDate-fent.csv' contains the 3 records with missing values for the date.time variable.

( fail.miss.conc.date <- read.csv(file.path(checkDir,"failMissingConcDate-fent.csv")) )

We can correct the missing dates by providing an updated file called 'fixMissingConcDate-fent.csv' that contains the missing data.

fail.miss.conc.date[,"date.time"] <- c("9/30/2016 09:32","10/1/2016 19:20","10/2/2016 02:04")
fail.miss.conc.date

write.csv(fail.miss.conc.date, file.path(checkDir,"fixMissingConcDate-fent.csv"))

After providing the updated file, the same run_DrugLevel() function should be re-run. The output now contains an additional message below the first message saying "fixMissingConcDate-fent.csv read with failures replaced". The conc.out data.frame also contains 3 additional rows with the corrected data.

conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"Fentanyl_conc_mod_id.rds"),
    conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
    conc.rename=c(fentanyl_calc_conc = 'conc.level', samp = 'event'),
    conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
    samp.path=file.path(dataDir,"Fentanyl_samp_mod_id.rds"),
    samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
    check.path=checkDir,
    failmiss_fn = 'MissingConcDate-',
    multsets_fn = 'multipleSetsConc-',
    faildup_fn = 'DuplicateConc-', 
    drugname=drugname,
    LLOQ=LLOQ,
    demo.list=demo.out)
# remove fix file, so running vignette produces warning with first run of run_DrugLevel()
fx <- file.path(checkDir,"fixMissingConcDate-fent.csv")
if (file.exists(fx)) file.remove(fx)

# remove multiplesetsconc file
ms <- file.path(checkDir,paste0("multipleSetsConc-", drugname, Sys.Date(),".csv"))
if (file.exists(ms)) file.remove(ms)

Before discussing the function arguments, we should first describe the expected data: either a single concentration data set, or concentration data plus sampling time data. If the concentration data set includes a date-time variable, then sampling time data is not necessary.

Concentration data should include the following named columns:

  1. mod_id: patient-level ID
  2. mod_id_visit: visit-level ID
  3. event: sample event name
  4. conc.level: drug concentration level
  5. mod_id_event: unique identifier for subject, visit, and event number. As an example, this can be created by pasting together "mod_id_visit" with "event".
  6. date.time (unless provided with sampling time): date-time of concentration sample

If sampling time is provided, it should include the following named columns:

  1. mod_id_event: unique identifier for subject, visit, and event number
  2. Sample.Collection.Date.and.Time: date-time of sample collection

If conc.path and samp.path specify data sets that are formatted as described above, then the conc.select, conc.rename, conc.mod.list, and samp.mod.list arguments can be ignored (and set to NULL). Otherwise they should be used to help create the proper format.
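Before calling the module, it can help to confirm that the inputs already have the expected columns and that the two datasets link on mod_id_event. The sketch below does this in base R on made-up data; the objects `conc_toy` and `samp_toy` and their values are hypothetical.

```r
# Hypothetical concentration data without a date-time column
conc_toy <- data.frame(
  mod_id       = c(1, 1),
  mod_id_visit = c("1.1", "1.1"),
  event        = c(1, 2),
  conc.level   = c(0.9, 0.4)
)

# Build the linking key by pasting the visit-level ID and the event
conc_toy$mod_id_event <- paste(conc_toy$mod_id_visit, conc_toy$event, sep = "_")

# Hypothetical sampling time data carrying the same key
samp_toy <- data.frame(
  mod_id_event = c("1.1_1", "1.1_2"),
  Sample.Collection.Date.and.Time = c("9/30/2016 09:32", "10/1/2016 19:20")
)

# Columns required when date.time comes from the sampling time data
required_conc <- c("mod_id", "mod_id_visit", "event", "conc.level", "mod_id_event")
missing_cols  <- setdiff(required_conc, names(conc_toy))

# Concentration records with no matching sampling time
unmatched <- setdiff(conc_toy$mod_id_event, samp_toy$mod_id_event)

missing_cols
unmatched
```

Empty results for both checks indicate that the renaming and modification arguments are not needed for these inputs.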

Pro-Laboratory

The Pro-Laboratory module processes laboratory data that can be merged with data from other modules. The laboratory data can be processed using:

creat.out <- run_Labs(lab.path=file.path(dataDir,"Fentanyl_creat_mod_id.rds"),
    lab.select = c('mod_id','date.time','creat'),
    lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))

alb.out <- run_Labs(lab.path=file.path(dataDir,"Fentanyl_alb_mod_id.rds"),
    lab.select = c('mod_id','date.time','alb'),
    lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))

lab.out <- list(creat.out, alb.out)

str(lab.out)

Build-PK-IV

The Build-PK-IV module creates PK data for IV medications. Both dose data from Part I of the Pro-Med-Str module and concentration data from the Pro-Drug Level module are required. Demographic data from the Pro-Demographic module and laboratory data from the Pro-Laboratory module may optionally be included. Any of these datasets, including the dose and concentration datasets, can be provided by users without being generated by our modules as long as each is a correctly formatted data.frame. This module can be semi-interactive for data checking (this is not required, but we recommend using this feature); if check.path is provided (the default is NULL), it can generate several files for checking potential data errors and getting feedback from an investigator. If corrected information ('fix' files) is provided, the module should be re-run to incorporate the corrections. The major functions this module performs are:

If demographic data is provided, the demographic variables will also be included.

If pk.vars includes 'date', the output also includes the original date-time to which 'time' is mapped. Users who prefer to provide a single concentration data file (required) can use pk.vars to include demographic or laboratory variables that are already merged with the concentration dataset. A separate dose data file is still required.
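The relationship between an original date-time and a relative time variable can be illustrated in base R. This is a sketch under stated assumptions: it treats time as hours elapsed since the first record, which may not match how run_Build_PK_IV() defines its reference point, and the date-time strings are made up. The format string and time zone are the ones passed as date.format and date.tz in this example.

```r
fmt <- "%m/%d/%y %H:%M:%S"
tz  <- "America/Chicago"

# Hypothetical date-times for one subject
date <- as.POSIXct(
  c("09/30/16 08:00:00", "09/30/16 09:30:00", "10/01/16 08:00:00"),
  format = fmt, tz = tz
)

# Relative time in hours since the first record (assumed reference point)
time <- as.numeric(difftime(date, date[1], units = "hours"))

data.frame(time, date)
```

Keeping the date column alongside time makes it possible to trace any record in the PK data back to the clock time in the source data.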

PK data with IV dosing can be built by the run_Build_PK_IV function using:

pk_dat <- run_Build_PK_IV(
    conc=conc.out,
    conc.columns = list(id = 'mod_id', datetime = 'date.time', druglevel = 'conc.level', 
                        idvisit = 'mod_id_visit'),
    dose=ivdose.out,
    dose.columns = list(id = 'mod_id', date = 'date.dose', infuseDatetime = 'infuse.time', 
                        infuseDose = 'infuse.dose', infuseTimeExact= 'infuse.time.real',
                        bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose', 
                        gap = 'maxint', weight = 'weight'),
    demo.list = demo.out,
    demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'),
    lab.list = lab.out,
    lab.columns = list(id = 'mod_id', datetime = 'date.time'),
    pk.vars=c('date'),
    drugname=drugname,
    check.path=checkDir,
    missdemo_fn='-missing-demo',
    faildupbol_fn='DuplicateBolus-',
    date.format="%m/%d/%y %H:%M:%S",
    date.tz="America/Chicago")

The function pullRealId() appends the original IDs -- subject_id and subject_uid -- to the data. The argument remove.mod.id=TRUE can be used to also remove the module IDs -- mod_id, mod_visit, and mod_id_visit.

# convert id back to original IDs
pk_dat <- pullRealId(pk_dat, remove.mod.id=TRUE)

head(pk_dat)
# normally, you would not delete these files
# CRAN policy states that a package should do proper cleanup
to_delete <- c(file.path(td, 'check1'), file.path(td, 'check2'), file.path(td, 'data2'))
unlink(to_delete, recursive = TRUE)

References

  1. Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X, Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM. Development of a System for Post-marketing Population Pharmacokinetic and Pharmacodynamic Studies using Real-World Data from Electronic Health Records. Clinical Pharmacology & Therapeutics. 2020 Apr; 107(4): 934-943.



EHR documentation built on Oct. 7, 2021, 9:28 a.m.