Description

The proposed open-source dxpr package is a software tool aimed at expediting an integrated analysis of electronic health records (EHRs). By implementing dxpr package, it is easier to integrate, analyze, and visualize clinical data.

In this part, the instruction of how dxpr package workes with procedure records is provided.

Development version

install.packages("remotes")
# Install development version from GitHub
remotes::install_github("DHLab-TSENG/dxpr")
library(dxpr)
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
#remotes::install_github("DHLab-TSENG/dxpr")
library(dxpr)

Data Format

dxpr (procedure part) is used to pre-process procedure codes of EHRs. To execute functions in dxpr, the EHR data input should be a data frame object in R, and contain at least three columns: patient ID, ICD procedure codes and date.

Column names or column order of these three columns does not need to necessarily follow a rule. Each column name will be pass as argument. Detailed information of required data type of every column and argument of functions can be found in the reference section.

Also, in the R ecosystem, DBI, odbc, and other packages provide access to databases within R. As long as the data is retrieved from databases to a data frame in R, the processes are the same as the following example.

con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "[your driver's name]",
                      Server   = "[your server's path]",
                      Database = "[your database's name]",
                      UID      = "[Database user]",
                      PWD      = "[Database password]")
dxDataFile <- dbSendQuery(con, "SELECT * FROM example_table WHERE ID = 'sampleID'")
dbFetch(dxDataFile)

Sample file

An example of the data shows as below:
This data is a simulated medical data set of 3 patients with 170 records.

head(samplePrFile)

ICD-PCS code with two format

dxpr package uses ICD-PCS codes as procedure standard. There are two formats of ICD-9 procedure codes, decimal (with a decimal point separating the code) and short format. And ICD-10 is only with short format.

ICD-9-PCS

# ICD-9-PCS_Short
head(ICD9PrwithTwoFormat$Short)

# ICD-9-PCS_Decimal
head(ICD9PrwithTwoFormat$Decimal)

ICD-10-PCS: only short format

# ICD-10-PCS_Short
head(prICD10$ICD)

I. Code standardization

A. ICD-9-PCS code format transformation:Short <-> Decimal

dxpr package helps users to standardize the ICD-9 procedure codes into a uniform format before further code grouping. The formats used for different grouping methods are shown as Table 1.

Table 1 Format of code classification methods

| |ICD format| |--------|----| |Clinical Classifications Software (CCS)|short format| |Procedure class|short format|

Since formats of ICD-9 codes used within a dataset could be different, users can standardize the codes through this function.

The function only standardizes ICD-9 codes. There are two ways to distinguish the version of ICD code (ICD-9/ICD-10) used in data: one is a specific extra column that records version used (data type in this column should be numeric 9 or 10), the other is a specific date that is the starting date of using ICD-10 in the dataset. For example, reimbursement claims with a date required to use ICD-10 codes in the United States and Taiwan are October 1st, 2015 and January 1st, 2016, respectively.

A-1. Uniform decimal format

decimal <- icdPrShortToDecimal(prDataFile = samplePrFile,
                               icdColName = ICD, 
                               dateColName = Date,
                               icd10usingDate = "2015/10/01")
head(decimal$ICD)

A-2. Uniform short format

icdPrShortToDecimal function converts the procedure codes to the short format, which can be used for grouping to the other classification functions (icdPrToCCS, icdPrToCCSLvl and icdPrToProcedureClass).

short <- icdPrDecimalToShort(prDataFile = samplePrFile,
                             icdColName = ICD, 
                             dateColName = Date,
                             icd10usingDate = "2015/10/01")
head(short$ICD)

Warning message

Besides, code standardization functions generate data of procedure codes with potential error to help researchers identify the potential coding mistake that may affect the result of following clinical data analysis.

There are two error type:wrong format and wrong version. The former one means the ICD code does not exist (maybe because ICD is wrongly coded or with a wrong place of decimal point). And the latter one means the version is wrong (still use ICD 9 after icd10usingDate, etc.).

Users can check data after receiving the warning message. researcher identify the potential coding mistake that may affect clinical data analysis.

II. Data integration

Functions stated below collapse ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD procedure codes are.

dxpr package supports two strategies to group EHR procedure codes, including CCS and procedure classes.

A. Clinical Classifications Software (CCS)

The CCS classification for ICD-9 and ICD-10 codes is a procedure categorization scheme that can employ in many types of projects analyzing data on procedures.

1) single-level: icdPrToCCS

Both ICD-9-PCS and ICD-10-PCS code contains 231 single-level CCS categories which can be corresponded with each other.

## ICD to CCS category
CCS <- icdPrToCCS(prDataFile = samplePrFile,
                  idColName = ID,
                  icdColName = ICD,        
                  dateColName = Date,
                  icd10usingDate = "2015-10-01",
                  isDescription = FALSE)

head(CCS$groupedDT, 5)

2) multi-level: icdPrToCCSLvl

Multi-level CCS in ICD-9-PCS has 3 levels, and multi-level CCS in ICD-10-PCS has two levels.

## ICD to CCS multiple level 2 description
CCSLvl <- icdPrToCCSLvl(prDataFile = samplePrFile,
                       idColName = ID,
                       icdColName = ICD,        
                       dateColName = Date,
                       icd10usingDate = "2015-10-01",
                       CCSLevel = 2,
                       isDescription = TRUE)

head(CCSLvl$groupedDT, 5)

B. Procedure Class

Procedure Classes are part of the family of databases and software tools developed as part of the Healthcare Cost and Utilization Project (HCUP) by AHRQ.
The Procedure Classes assign all ICD procedure codes to one of four categories:

ProcedureClass <- icdPrToProcedureClass(prDataFile = samplePrFile,
                                        idColName = ID,
                                        icdColName = ICD,      
                                        dateColName = Date,
                                        icd10usingDate = "2015-10-01",
                                        isDescription = FALSE)

head(ProcedureClass$groupedDT, 5)

Reference

I. Code standardization

ICD-9-PCS code (2014): https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html

ICD-10-PCS code (2019):https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-PCS.html

https://www.findacode.com/search/search.php

https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Downloads/HospitalAppendix_F.pdf

II. Data integration

Clinical Classifications Software (CCS)

ICD-9-PCS CCS (2015): https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Single_Level_CCS_2015.zip

https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Multi_Level_CCS_2015.zip

ICD-10-PCS CCS (2019): https://www.hcup-us.ahrq.gov/toolssoftware/procedureicd10/procedure_icd10.jsp

Procedure Class

ICD-9-Procedure Class (2015): https://www.hcup-us.ahrq.gov/toolssoftware/procedure/pc2015.csv

ICD-10-Procedure Class (2019): https://www.hcup-us.ahrq.gov/toolssoftware/procedureicd10/procedure_icd10.jsp



DHLab-TSENG/dxpr documentation built on Sept. 1, 2023, 8:12 a.m.