knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(hcup)
library(dplyr)
library(tibble)
library(purrr)
library(tidyr)
The Clinical Classifications Software Refined (CCSR) software for ICD-10-CM
diagnosis codes provides both one-to-one mapping (`classify_ccsr_dx1()`) and one-to-many
mapping (`ccsr_dx()`) to CCSR categories. The one-to-many mapping is necessary because many
ICD-10 codes span multiple meaningful categories, and the identification of all conditions
related to a diagnosis code would be lost by categorizing some ICD-10-CM codes into a single category.
For example, consider the code I11.0 (Hypertensive heart disease with heart failure), which encompasses both heart failure (CCSR category CIR019) and hypertension with complications (CIR008). Classifying this code as heart failure alone would miss meaningful information; on the other hand, some analyses require mutually exclusive categorization.
The `ccsr_dx()` function identifies all CCSR categories associated with an ICD-10 diagnosis code.
Each ICD-10 code can have one or more CCSR categories. In the example used above, I11.0 has two CCSR
categories, so the function returns a character vector of two CCSR categories:
library(hcup)
library(dplyr)

ccsr_dx("I110")
ccsr_dx("I110") %>% explain_ccsr()
If we need a one-to-one mapping (e.g. for descriptive statistics on the number of hospital discharges), we use `classify_ccsr_dx1()`.
Importantly, this algorithm is only valid for the principal diagnosis (for inpatient
records) or the first-listed diagnosis code (for outpatient data):
classify_ccsr_dx1("I110", setting = "inpatient")
The `classify_ccsr_dx1()` function is similar to the other `classify_` functions used in this
package, so we will begin here.
Although a comprehensive review of the CCSR methodology is beyond the scope of this article (I'd highly recommend reading the appendices of the CCSR DX user guide), I do want to highlight a few important points here.
First, this algorithm was developed for use with the principal diagnosis (for inpatient records) or
the first-listed diagnosis code (for outpatient data). Consider the sample of two patients
below, with their principal diagnosis listed in `DX_1` and subsequent diagnoses listed in `DX_n`.
library(dplyr)

df <- tibble::tribble(
  ~Pt_id, ~DX_1, ~DX_2,    ~DX_3,
  "A",    "I10", "F17210", NA,
  "B",    "D62", "Z370",   "E876"
)

## GOOD: DX_1 holds the principal diagnosis
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_1, setting = "ip"))

## BAD: DX_2 is a secondary diagnosis
df %>%
  mutate(prin_CCSR = classify_ccsr_dx1(DX_2, setting = "ip"))
Take care when working with tidy (long-format) data to avoid classifying the non-principal diagnoses:
df_tidy <- df %>%
  pivot_longer(cols = -Pt_id, names_to = "DX_NUM", values_to = "ICD")

## GOOD: keep only the principal diagnosis before classifying
df_tidy %>%
  filter(DX_NUM == "DX_1") %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting = "ip"),
         expln = explain_ccsr(prin_CCSR))

## BAD: classifies every diagnosis, including the secondary ones
df_tidy %>%
  mutate(prin_CCSR = classify_ccsr_dx1(ICD, setting = "ip"),
         expln = explain_ccsr(prin_CCSR))

rm(df, df_tidy)
Second, not all ICD-10 codes are acceptable for default classification. For example, if B95.2
were listed as the principal or first-listed diagnosis, it would map to `XXX000` (for inpatient data)
and `XXX111` (for outpatient data).
# B95.2: Enterococcus as the cause of diseases classified elsewhere
classify_ccsr_dx1("B952", setting = "inpatient")
classify_ccsr_dx1("B952", setting = "outpatient")
This could be because B95.2 was inappropriately listed as the principal diagnosis, and if you run into this problem frequently, it may be helpful to consult the CCSR DX user guide.
As of this version of the CCSR (see `?hcup.data::CCSR_DX_mapping` to check the version),
the setting (i.e. inpatient or outpatient) doesn't drastically change the results. Using
the dataset this function is based on, we can see that the only situations where
`setting = "inpatient"` and `setting = "outpatient"` disagree are codes that are acceptable
for the first-listed diagnosis in the outpatient setting but not for the principal diagnosis in the
inpatient setting.
if (interactive()) {
  hcup.data::CCSR_DX_mapping %>%
    # Recode the Unacceptable category to be the same for ip/op
    mutate(default_CCSR_IP = recode(default_CCSR_IP, "XXX000" = "Unccpt"),
           default_CCSR_OP = recode(default_CCSR_OP, "XXX111" = "Unccpt")) %>%
    # Remove default categories that match across settings
    filter(default_CCSR_IP != default_CCSR_OP) %>%
    count(default_CCSR_IP, default_CCSR_OP, sort = TRUE)
}
`ccsr_dx()` differs from most of the other functions in this package because each
ICD code can return multiple categories. This means that we'll be dealing
with lists. To demonstrate, consider the two ICD-10 codes below:
|pt_id    | DX_NUM|I10_DX   |
|:--------|:------|:--------|
|John Doe | 1     |`A401`   |
|John Doe | 2     |`K432`   |
The code `A401` maps to two CCSR categories (INF002 & INF003), while `K432` only has
one associated CCSR category (DIG010):
ccsr_dx("A401")
ccsr_dx("K432")
Appending this to the previous table gives us:
|pt_id    | DX_NUM|I10_DX   |CCSR           |
|:--------|------:|:--------|:--------------|
|John Doe |      1|`A401`   |INF002, INF003 |
|John Doe |      2|`K432`   |DIG010         |
df.listcol <- dplyr::tibble(pt_id = "John Doe",
                            DX_NUM = 1:2,
                            I10_DX = c("A401", "K432")) %>%
  mutate(CCSR = ccsr_dx(I10_DX))

df.listcol
str(df.listcol$CCSR)
The column `CCSR` in the table above is now a list. For those unfamiliar with list
columns, the book Functional Programming has a helpful overview chapter, as does the
Rectangle vignette in tidyr. You can also use `View(df.listcol)` in RStudio to see the
contents of the list column in a more familiar structure.
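To make the shape of a list column concrete without needing the package installed, here is a minimal base R sketch. The `toy_map` lookup and the `toy_ccsr_dx()` helper are hypothetical stand-ins built only from the two mappings quoted above; they are not the `hcup.data` mapping, and returning `NA` for an unknown code is an assumption of this toy, not a statement about how `ccsr_dx()` behaves.

```r
# Hypothetical toy lookup, using only the mappings mentioned in this article.
toy_map <- list(
  A401 = c("INF002", "INF003"),
  K432 = "DIG010"
)

# Hypothetical helper: returns one list element per input code.
# Unknown codes get NA here (an assumption made for this sketch only).
toy_ccsr_dx <- function(icd) {
  lapply(icd, function(code) {
    if (code %in% names(toy_map)) toy_map[[code]] else NA_character_
  })
}

codes <- c("A401", "K432", "BadCode")
ccsr_list <- toy_ccsr_dx(codes)

# One element per code, but the elements differ in length,
# which is why the result cannot be a plain character vector.
n_categories <- lengths(ccsr_list)
str(ccsr_list)
```

The point of the sketch is only the structure: a vector of codes goes in, and a list of the same length comes out, with each element holding however many categories that code maps to.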
If you want to get this list-column into an atomic vector, `tidyr::unnest_longer()` will
make a new row for every element within the list:
df.listcol %>% unnest_longer(CCSR)
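For readers who want to see what this reshaping amounts to, the following base R sketch reproduces the same long layout by hand. It is an illustration of the idea, not how `tidyr::unnest_longer()` is implemented, and the list column is typed in manually from the table above rather than produced by `ccsr_dx()`.

```r
# Rebuild the example table by hand; the CCSR values are copied from the
# table above rather than looked up with ccsr_dx().
df_listcol <- data.frame(
  pt_id = "John Doe",
  DX_NUM = 1:2,
  I10_DX = c("A401", "K432"),
  stringsAsFactors = FALSE
)
df_listcol$CCSR <- list(c("INF002", "INF003"), "DIG010")

# Number of categories held in each row of the list column.
n <- lengths(df_listcol$CCSR)

# Repeat every ordinary column once per category, then flatten the list.
df_long <- data.frame(
  pt_id = rep(df_listcol$pt_id, n),
  DX_NUM = rep(df_listcol$DX_NUM, n),
  I10_DX = rep(df_listcol$I10_DX, n),
  CCSR = unlist(df_listcol$CCSR),
  stringsAsFactors = FALSE
)
df_long
```

The row for `A401` is duplicated because it carries two categories, so the long table has three rows for two diagnoses.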
df <- tibble::tribble(
  ~pt_id, ~DX_NUM, ~ICD10,
  "A",    1,       "K432",
  "A",    2,       "A401",
  "B",    1,       "I495",
  "B",    2,       "BadCode",
  "C",    1,       "E8771",
  "C",    2,       "A5442",
  "C",    3,       "A564"
)
df %>%
  mutate(CCSR = ccsr_dx(ICD10)) %>%
  tidyr::unnest_longer(CCSR) %>%
  mutate(CCSR_expl = explain_ccsr(CCSR))
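One consequence of the one-to-many mapping is worth keeping in mind when summarising unnested data: category counts add up to more than the number of diagnoses. The base R sketch below illustrates this with a small hand-written long table. The patients are hypothetical, and the code-to-category pairs are limited to the ones quoted earlier in this article (I110, A401, K432).

```r
# Hypothetical long-format data, as it might look after unnesting.
# Only mappings quoted in this article are used.
long <- data.frame(
  pt_id = c("A", "A", "A", "B", "B"),
  ICD10 = c("K432", "A401", "A401", "I110", "I110"),
  CCSR  = c("DIG010", "INF002", "INF003", "CIR019", "CIR008"),
  stringsAsFactors = FALSE
)

# Number of distinct diagnoses recorded across patients.
n_dx <- nrow(unique(long[, c("pt_id", "ICD10")]))

# Count each patient at most once per CCSR category.
pairs <- unique(long[, c("pt_id", "CCSR")])
pts_per_ccsr <- table(pairs$CCSR)

# Total category assignments exceed the number of diagnoses.
total_assignments <- sum(pts_per_ccsr)
pts_per_ccsr
```

Here three diagnoses produce five category assignments, so per-category counts from `ccsr_dx()` output should not be summed and read as a count of diagnoses or discharges; that is the situation the one-to-one `classify_ccsr_dx1()` mapping is meant for.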