The routines in the medicalrisk package [@R-medicalrisk] are designed to help determine comorbidity and medical risk status of a given patient using several popular models published in the peer-reviewed literature.
Administrative healthcare data is frequently the only available source for determining individual risk of mortality when looking at thousands or millions of patient records. Medical chart abstraction just isn't feasible for projects of this scale.
In the United States, the record for every inpatient and outpatient encounter is reviewed by a qualified medical coder, who assigns a set of diagnosis and procedure codes based on phrases within the medical record. The coding system currently in use is ICD-9-CM, an adaptation of the venerable ICD-9 standard, which was developed in 1978. The U.S. National Center for Health Statistics (NCHS) developed ICD-9-CM, which has been required for Medicare and Medicaid claims since 1979. ICD-9-CM is updated annually.
At some point, perhaps as soon as October 2015, ICD-10-CM codes will need to be used instead. It is likely that "dual coding" of claims in both sets will continue for some time.
In the meantime, there is a wealth of administrative data available within the ICD-9-CM diagnostic and procedural codes stored within US healthcare systems.
To demonstrate its use, this package includes data on 100 patients from the 2011 Vermont Uniform Hospital Discharge Data Set (inpatient).

    library(medicalrisk)
    library(plyr)
    data(vt_inp_sample)
    x <- count(vt_inp_sample, c('id'))
    cat("average count of ICD codes per patient is: ", mean(x$freq))
    y <- count(vt_inp_sample, c('icd9cm'))

    library(knitr)
    kable(head(y[order(-y$freq),], n=5), row.names=FALSE,
          caption='Top 5 most popular ICD-9-CM codes in this dataset')

Within this package, ICD-9-CM codes are presented as a string where the first letter is "P" or "D" depending on whether the code is Procedure or Diagnosis. The rest of the code is present as a string of numbers. Periods are omitted. In the list above, the code "D4019" is diagnostic code 401.9 which corresponds to Hypertension.
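The encoding described above is easy to reverse for display purposes. The following is a minimal base R sketch, not part of medicalrisk: `format_icd9cm` is a hypothetical helper, and it assumes plain numeric codes, with the period after the third digit for diagnosis codes and after the second digit for procedure codes.

```r
# Hypothetical helper (not a medicalrisk function): "D4019" -> "401.9".
# Assumption: purely numeric codes; E codes, which break after four
# characters, are not handled in this sketch.
format_icd9cm <- function(code) {
  type <- substr(code, 1, 1)             # "D" (diagnosis) or "P" (procedure)
  body <- substring(code, 2)             # the code itself, period omitted
  split_at <- ifelse(type == "P", 2, 3)  # characters before the period
  head <- substr(body, 1, split_at)
  tail <- substring(body, split_at + 1)
  # Codes with nothing after the split point are returned without a period.
  ifelse(nchar(tail) > 0, paste0(head, ".", tail), head)
}

format_icd9cm(c("D4019", "D25000", "P3995"))
# "401.9" "250.00" "39.95"
```

A helper like this is only cosmetic; the mapping functions in the package expect the prefixed, period-free form.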
The package includes a set of mapping functions that transform a list of ICD-9-CM codes into a comorbidity matrix:
"Charlson" refers to the Charlson Comorbidity Index[@Charlson1987]. The names "Deyo"[@Deyo1992], "Romano"[@Romano1993], and "Quan"[@Quan2005] refer to the primary authors of different methods of determining Charlson comorbidities from ICD-9-CM codes.
"Elixhauser" refers to the Elixhauser comorbidity map, which is a more detailed list than Charlson. "AHRQ37" is an adaptation of the AHRQ version 37 software[@AgencyforHealthcareResearchQuality2013]. "Quan" refers to the same paper by Quan mentioned above.
"RCRI" is the Revised Cardiac Risk Index[@Lee1999] set of categories using a method published by Boersma[@Boersma2005].
For example, the #5 ICD-9-CM code above is D25000, or "250.00", which is for "Diabetes Mellitus Unspecified Type". Here's what happens when that code is passed to a few of the mapping functions listed above:

    kable(icd9cm_charlson_quan(c('D25000')))
    kable(icd9cm_elixhauser_ahrq37(c('D25000')))
    kable(icd9cm_rcri(c('D25000')))

For each of these maps the "dm" column becomes TRUE.
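To make the shape of such a map concrete, here is a toy mapping function in base R. It is a sketch under stated assumptions: `toy_map` and `toy_patterns` are hypothetical names, and the two prefix patterns are illustrative only and far coarser than any of the published maps. It returns one row per code (with the code as the row name) and one logical column per category.

```r
# Toy prefix patterns, for illustration only; not taken from a published map.
toy_patterns <- c(dm = "^D250", htn = "^D401")

# Hypothetical mapping function: rows are codes, columns are categories.
toy_map <- function(codes) {
  flags <- sapply(toy_patterns, function(p) grepl(p, codes))
  # sapply() returns a plain vector for a single code, so force a matrix.
  flags <- matrix(flags, nrow = length(codes),
                  dimnames = list(codes, names(toy_patterns)))
  as.data.frame(flags)
}

toy_map(c("D25000", "D4019", "D486"))
```

Here "D25000" sets only the "dm" column, "D4019" sets only "htn", and "D486" sets neither.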
The most efficient way to use these maps for a set of patients is to generate a single map for all ICD-9-CM codes in the set and then apply that map to each patient. Here's an example that generates a comorbidity matrix for the first five patients in the Vermont dataset:

    cases <- vt_inp_sample[vt_inp_sample$id %in% 1:5, c('id','icd9cm')]
    cases_with_cm <- merge(cases, icd9cm_charlson_quan(levels(cases$icd9cm)),
        by.x="icd9cm", by.y="row.names", all.x=TRUE)

    # generate crude comorbidity summary for each patient
    kable(ddply(cases_with_cm, .(id),
        function(x) { data.frame(lapply(x[,3:ncol(x)], any)) }), row.names=FALSE)

The above process is encapsulated in a single function, `generate_comorbidity_df`. This function also includes an optimization from Van Walraven that reduces "dmcx" to "dm" if the specific diabetic complication is separately coded.

    kable(generate_comorbidity_df(cases, icd9mapfn=icd9cm_charlson_quan))

This function only considers each ICD-9-CM code once and then merges the resulting comorbidity flags together for each patient. This makes the function quite fast for large data sets.
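The "classify each code once, then merge per patient" idea can be sketched in base R on made-up data. This is not the package's implementation: `toy_cases` and `classify` are hypothetical, and the two categories are illustrative.

```r
# Synthetic long-format data: one row per (patient, code) pair.
toy_cases <- data.frame(
  id     = c(1, 1, 2, 2, 3),
  icd9cm = c("D25000", "D4019", "D25000", "D486", "D4019"),
  stringsAsFactors = FALSE
)

# Hypothetical classifier with two illustrative categories.
classify <- function(codes) {
  data.frame(icd9cm = codes,
             dm  = grepl("^D250", codes),
             htn = grepl("^D401", codes),
             stringsAsFactors = FALSE)
}

# Step 1: classify each distinct code exactly once.
code_flags <- classify(unique(toy_cases$icd9cm))

# Step 2: join the flags back onto the patient rows.
joined <- merge(toy_cases, code_flags, by = "icd9cm")

# Step 3: a patient has a comorbidity if any of their codes maps to it.
by_patient <- aggregate(joined[c("dm", "htn")],
                        by = list(id = joined$id), FUN = any)
by_patient
```

The classification cost grows with the number of distinct codes rather than the number of rows, which is why this layout suits large data sets where the same codes recur across many patients.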
Given appropriate arguments, the `generate_comorbidity_df` function will use the parallel backend provided by `foreach` to improve performance.
It is common in the medical literature to see a set of comorbidities reduced to an index. When the Charlson Comorbidity Index was first published it had the following weights for each comorbidity:

    data(charlson_weights_orig)
    kable(t(charlson_weights_orig))

However, these weights have not stood the test of time. For example, the prognosis for HIV/AIDS has dramatically improved. The medicalrisk package offers the revised Charlson weights developed by Schneeweiss[@Schneeweiss2003]:

    data(charlson_weights)
    kable(t(charlson_weights))

The `generate_charlson_index_df` function will sum the weights for each patient to generate a final index:

    kable(generate_charlson_index_df(generate_comorbidity_df(cases)), row.names=FALSE)

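Summing weights per patient amounts to a matrix product between the logical comorbidity flags and the weight vector. The base R sketch below shows the arithmetic on made-up patients. The column names and weight values are assumptions chosen for illustration; the real weights are in `charlson_weights_orig` and `charlson_weights`.

```r
# Illustrative weights; names and values are assumptions, not package data.
toy_weights <- c(mi = 1, chf = 1, dm = 1, dmcx = 2, mets = 6)

# Made-up comorbidity flags, one row per patient.
toy_flags <- data.frame(
  id   = c(1, 2, 3),
  mi   = c(TRUE,  FALSE, FALSE),
  chf  = c(TRUE,  FALSE, FALSE),
  dm   = c(FALSE, TRUE,  FALSE),
  dmcx = c(FALSE, FALSE, FALSE),
  mets = c(FALSE, TRUE,  FALSE)
)

# Convert the flags to 0/1 and take the weighted sum across columns.
flag_matrix <- as.matrix(toy_flags[, names(toy_weights)]) * 1
toy_index <- data.frame(id = toy_flags$id,
                        index = as.numeric(flag_matrix %*% toy_weights))
toy_index
```

Patient 1 scores 1 + 1 = 2, patient 2 scores 1 + 6 = 7, and patient 3, with no flags set, scores 0.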
The Risk Stratification Index uses ICD-9-CM codes to determine four risk estimates:
The author of the paper (Sessler) published SPSS code to perform the calculation. The medicalrisk package implements the RSI calculation using a method based on the SPSS code.

    ddply(cases, .(id), function(x) { icd9cm_sessler_rsi(x$icd9cm) })

The medicalrisk package can be used to generate risk data from ICD-9-CM codes in large datasets. The above discussion describes basic use of the package. Some additional helper functions not covered here are described in the per-function documentation.
The aim of this package is to include future medical risk estimation procedures as they are published in the literature.