Introduction to the ICD10gm Package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

ICD10gm is an R Package for working with the German Modification of the International Statistical Classification of Diseases and Related Health Problems (ICD-10-GM).

The ICD-10 classification is an international standard for the coding of health service data. It is used widely both to document morbidity in healthcare systems, usually in the context of remuneration claims, and to encode mortality statistics. In Germany, the German Instutite of Medical Documentation and Information (DIMDI) releases a German Modification (ICD-10-GM) of the classification that forms a compulsory part of all remuneration claims in the ambulatory and hospital sectors. Further information and historical context can be found in, for example, Graubner (2007) or Jetté et al (2010).

Aims

This package was created to facilitate the analysis of data coded using the ICD-10-GM. In particular, it has the following aims:

  1. Provide convenient access to the extended ICD-10-GM metadata
  2. Identify and extract ICD-10 codes from character strings
  3. Facilitate the specification of ICD codes for analysis, utilising the ICD hierarchy (e.g. given the specification "A0" return all subcodes in the range "A01" to "A09")
  4. Enable the historization of ICD specifications when analysing longitudinal claims data, applying the automatic code transitions provided by DIMDI, identifying potentially problematic codes and enabling the specification of custom transitions

ICD10gm is designed for use in the context of medical and health services research using routinely collected claims data. It is not suitable for use in operative coding as it does not include all relevant metadata (e.g. inclusion and exclusion notes and the detailed definitions of psychiatric diagnoses). The metadata provided in the ICD10gm package is not intended to replace the official DIMDI documentation, which should always be consulted when specifying ICD codes for analysis.

The following presents an overview of the basic functionality provided by the ICD10gm package, illustrated by means of simple examples. To access this vignette in R, type:

vignette("icd10gm_intro", package = "ICD10gm")

Basic Use

Access ICD-10-GM metadata

The ICD-10-GM metadata are provided by four data.frames that form the core of the ICD10gm package:

Documentation for the individual datasets can be accessed using the R help system by typing, for example, either help("icd_meta_codes", package = "ICD10gm") or simply ?icd_meta_codes.

While column names have been translated into English, the ICD-10-GM labels are in German with UTF-8 character encoding throughout.

In addition to this tabular data, several utility functions are provided to perform common queries on the metadata.

Example

First, we load the ICD10gm package alongside some tidyverse packages:

library(dplyr)
library(purrr)
library(tidyr)
library(ICD10gm)

By way of example, we examine the coding of unspecific gastroenteritis (i.e. without identification of a specific cause), a very common diagnosis in primary care. We can look up the appropriate code as follows:

icd_search("gastroenteritis", level = 3)

We see that A09 is used for infectious gastroenteritis, whereas K52 corresponds to non-infectious gastroenteritis. We are interested in A09, but might wish to read up on the details in the official documentation:

icd_browse("A09")

This will open the documentation in our system's default browser.

Now, we check whether whether this code has been affected by code transitions in any revision since 2003:

icd_showchanges_icd3("A09") %>%
  knitr::kable(row.names = FALSE)

Diagnoses that, prior to 2009, were coded as K52.9 are now coded as A09.9. We can investigate exactly what changed by looking the relevant codes for the years 2009 and 2010:

get_icd_labels(icd3 = c("A09", "K52"), year = 2009:2010) %>%
  arrange(year, icd_sub) %>% 
  filter(icd_sub %in% c("K529") | icd3 == "A09") %>% 
  select(year, icd_normcode, label) %>% 
  knitr::kable(row.names = FALSE)

Prior to 2010, A09 had been reserved for gastroenteritis of presumed infectious origin (German: vermutlich infektiösen Ursprungs), with unspecified gastroenteritis coded by K52.9. Since 2010, A09.9 codes any unspecified gastroenteritis, with K52.9 reserved for cases determined to be non-infectious. The effect of this change is that A09.9 has replaced K52.9 as the unspecific code used to document the vast majority of routine cases in primary care. Failure to account for this would constitute a major error in medical or epidemiological research.

Test whether a string represents an ICD-10-GM code

The function is_icd_code tests whether a character vector represents a valid ICD-10-GM code (i.e. a code listed in the data.frame icd_meta_codes, allowing for alternative code specifications). The test may be limited to a particular version of the ICD-10-GM by specifying the year argument.

Examples

The function is_icd_code recognises ICD codes regardless of their formatting, returning TRUE if the string is recognised as an ICD code and FALSE otherwise:

is_icd_code(c("E10.1", "E101", "E10.1-", "J44", "This is not an ICD code"))

Extracting ICD codes from a string

The function icd_parse extracts all ICD-10 codes from an arbitrary character vector. On the one hand, this may be used as in the icd_expand function to convert ICD-10 codes to a standardised format or extract parts of the code. On the other hand, it may be used to extract potentially many ICD-10 codes from any document that can be converted to text format (perhaps using the pdftools package to scrape a PDF document or rvest to scrape a website).

Example: Scraping codes from a website

As an example of how ICD10gm can be used to extract ICD codes from arbitrary text, the following code uses the rvest package to scrape the code block "A00-A09" from the online version of the DIMDI ICD-10-GM reference. We apply the filter to exclude codes below A10, thus revealing which other ICD-10 codes are reference from this block. To simply the package building process, the code has not been evaluated. This is left as an exercise to the reader.

library(dplyr)
library(rvest)

read_html("https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm/kode-suche/htmlgm2018/block-a00-a09.htm") %>% 
  html_text() %>%
  icd_parse(type = "bounded") %>%
  select(-icd_spec) %>% 
  unique() %>% 
  filter(icd_sub >= "A10") %>% 
  arrange(icd_sub) %>% 
  left_join(
    get_icd_labels(year = 2018)[, c("icd_sub", "icd_normcode", "label")],
    by = "icd_sub") %>% 
  select(icd_normcode, label) %>% 
  knitr::kable(row.names = FALSE,
               caption = "Additional ICD-10 codes referred to in block A00-A09 (Intestional infectious diseases) of the ICD-10-GM (2018).")

Expand a ICD specification down the hierarchy

The function icd_expand takes a data.frame containing ICD codes and optional metadata as input. It returns a data.frame containing all ICD codes at or below the specified level of the hierarchy (e.g. the specification "E11" is expanded to include all three, four and five-digit codes beginning with E11). Expansion is done within a specified version of the ICD-10-GM (e.g. year 2018).

Example

Irritable bowel syndrome is coded using either the three-digit code K58 (conceiving IBS as the somatic condition) or the code F45.32 (focussing on IBS as a psychosomatic condition). We can retrieve all subcodes in the year 2019 as follows:

icd_k58 <- data.frame(DIAG_GROUP = c("IBS", "IBS"), ICD_SPEC = c("K58", "F45.32")) %>% 
  icd_expand(col_icd = "ICD_SPEC", year = 2019, col_meta = "DIAG_GROUP")

knitr::kable(icd_k58)

Note that the data.frame containing the specification should normally be stored as a separate metadata file (eg. csv or Excel format) to facilitate maintenance and sharing of the specification. The column DIAG_GROUP is a label that can be allocated to one or multiple rows of the specification and is useful when aggregating diagnoses. This is similar to the concept of diagnosis groupers used, for example, in risk adjustment schemes (e.g. as operated by the German Federal Social Insurance Office). In this case, we may want to treat the two alternative codes as equivalent by allocating the label "IBS" to both. In this way, we overcome the common problem that, in practice, multiple codes are used to document the same underlying disease.

Historise an ICD specification

The function icd_history takes the result of icd_expand, specified for a particular year, and returns a data.frame containing all corresponding codes for the specified years (from 2003). To do this, it applies the ICD-10-GM transition tables to map codes between successive ICD-10-GM versions. Only automatic transitions are followed to ensure that the specification retains its meaning. Custom transitions, tailored to the needs of the project at hand, can be specified to yield a more complete history.

Example

We historise the code K58, specified for the year 2019, backwards to obtain the corresponding codes for the years 2017 to 2019:

icd_history(icd_k58, years = 2017:2019) %>% 
  select(icd_spec, DIAG_GROUP, year, icd_code) %>% 
  arrange(year, icd_code)

Copyright

Program code is released under the MIT license.

The underlying ICD-10-GM metadata is copyright of the German Instutite of Medical Documentation and Information (DIMDI). The source files are available free of charge from the DIMDI Download Centre. I believe that their use in this package is compatible with the copyright restrictions. In particular:

Summary

The ICD10gm package provides a convenient means of accessing and manipulating the German modification of the ICD-10 classification. It is designed for use in medical, epidemiological and health services research.

To the author's knowledge, this package represents the only publicly available repository of pre-processed metadata for the ICD-10-GM. Indeed, a key contribution of the package is the compilation and processing of the metadata provided by DIMDI, which is designed more for the needs of operational use than for the purpose of longitudinal secondary data analysis.

Building on the metadata, the ICD10gm package provides various functions to facilitate the analysis of ICD-10 data. Possible uses include:

Related Packages

The icd package has similar aims but has a slightly different approach and focusses on the US version of the ICD-10 system.

Cite

citation(package = "ICD10gm")


Try the ICD10gm package in your browser

Any scripts or data that you put into this service are public.

ICD10gm documentation built on March 7, 2023, 7:03 p.m.