knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4, tibble.print_max = 4)
library('magrittr')

icd10es is a tool for Spanish-speaking bioinformatics specialists who need to classify written descriptions of diseases, symptoms, injuries, and other health-related issues according to the 10th edition of the International Statistical Classification of Diseases and Related Health Problems (ICD-10 for short), referred to as CIE-10 in Spanish. This package offers the following functionalities:

Printing information of a CIE-10 entry

Let's start with a simple task: say you wish to know what the entry 'S72.1' in the catalog contains. The function printInfo can help with that. By changing the value of the parameter tabular ('single', 'simple', or 'full' in the calls below), you can decide how the information is displayed.

library(icd10es)

printInfo('S72.1', tabular = 'single')

printInfo('S72.1', tabular = 'simple')

printInfo('S72.1', tabular = 'full')

Looking up a string in the catalog

The main function of icd10es, ICDLookUp, takes a string that is expected to match some entry in the CIE-10 and finds that entry. The string does not have to be identical to the entry: herein lies the usefulness of the package.

The function first tries to find an exact match in the catalog. Often, however, the string contains a typo (e.g. 'pnuemonia' instead of 'pneumonia') or uses a more colloquial name for the disease or symptom rather than its 'full name'. When this happens, the function falls back to fuzzy matching using the Jaro-Winkler similarity metric.
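As an illustration of the metric itself, here is a minimal base R sketch of the Jaro-Winkler similarity. The function name jaro_winkler and the prefix scaling factor p = 0.1 (the conventional value) are assumptions made for this example; this is not the package's internal code, which may rely on a dedicated string-distance library.

```r
# Illustrative Jaro-Winkler similarity in base R (not icd10es internals).
jaro_winkler <- function(a, b, p = 0.1) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  nx <- length(x)
  ny <- length(y)
  if (nx == 0 && ny == 0) return(1)
  if (nx == 0 || ny == 0) return(0)

  # Characters count as matching only within this distance of each other.
  window <- max(0, floor(max(nx, ny) / 2) - 1)
  x_matched <- logical(nx)
  y_matched <- logical(ny)
  for (i in seq_len(nx)) {
    lo <- max(1, i - window)
    hi <- min(ny, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!y_matched[j] && x[i] == y[j]) {
        x_matched[i] <- TRUE
        y_matched[j] <- TRUE
        break
      }
    }
  }
  m <- sum(x_matched)
  if (m == 0) return(0)

  # Transpositions: matched characters that appear in a different order.
  t <- sum(x[x_matched] != y[y_matched]) / 2
  jaro <- (m / nx + m / ny + (m - t) / m) / 3

  # Winkler bonus for a shared prefix of up to four characters.
  n_prefix <- min(4, nx, ny)
  same <- x[seq_len(n_prefix)] == y[seq_len(n_prefix)]
  l <- if (all(same)) n_prefix else which(!same)[1] - 1
  jaro + l * p * (1 - jaro)
}

jaro_winkler("MARTHA", "MARHTA")       # about 0.961
jaro_winkler("pnuemonia", "pneumonia") # about 0.970
```

The swapped letters in 'pnuemonia' cost only one transposition, so the score stays well above 0.9 even though the strings are not identical.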

For example, in the CIE-10, all cancers are referred to in a more formal way, such as 'tumor maligno del colon' instead of 'cancer de colon' (in English: 'malignant neoplasm of colon' instead of 'colon cancer'). ICDLookUp would give the following output:

ICDLookUp('cancer de colon', tabular = 'simple')

Note how the tabular parameter is passed on to printInfo.

Fuzzy matching can be more or less strict. This is controlled by the jwBound parameter of ICDLookUp: the Jaro-Winkler similarity ranges from 0 (no similarity) to 1 (exact match), and the default value of jwBound is 0.9. That is, only entries whose similarity to the entered string is equal to or higher than 0.9 are considered. If the function finds no result, one can try lowering the bound:

ICDLookUp('sindrome dandie-waker', jwBound = 0.9, tabular = 'simple')

ICDLookUp('sindrome dandie-waker', jwBound = 0.8, tabular = 'simple')
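The effect of a similarity bound can be sketched in a few lines of base R. Note the assumptions: the example below uses a normalized edit-distance similarity built on utils::adist as a stand-in (it is not the Jaro-Winkler metric, so its scores differ from those ICDLookUp computes), and editSimilarity, bestMatch, and toyCatalog are hypothetical names that are not part of icd10es.

```r
# Stand-in similarity in [0, 1] based on edit distance (NOT Jaro-Winkler).
editSimilarity <- function(query, candidates) {
  d <- drop(utils::adist(query, candidates))
  1 - d / pmax(nchar(query), nchar(candidates))
}

# Hypothetical helper: best candidate at or above `bound`, otherwise NA.
bestMatch <- function(query, candidates, bound = 0.9) {
  scores <- editSimilarity(query, candidates)
  keep <- scores >= bound
  if (!any(keep)) return(NA_character_)
  candidates[keep][which.max(scores[keep])]
}

# Toy catalog with made-up entries, for illustration only.
toyCatalog <- c("sindrome de dandy-walker",
                "sindrome de down",
                "sindrome de marfan")

# Three edits over 24 characters give a stand-in similarity of 0.875.
strictResult <- bestMatch("sindrome de dandie-waker", toyCatalog, bound = 0.9)
looseResult <- bestMatch("sindrome de dandie-waker", toyCatalog, bound = 0.8)
print(strictResult)
print(looseResult)
```

With the stricter bound no candidate qualifies, while the lower bound admits the closest entry. A lower bound also raises the chance of accepting a wrong entry, so it is worth inspecting the matches it returns.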

Using an external catalog

Sometimes the user wants to look up strings in a different, specialized catalog. One example is an auxiliary catalog containing alternative names for some diseases that arise from regional variation (such as when a country or one of its provinces has historically used its own name for a disease).

This can be done by setting the ICDLookUp parameter useExternal = TRUE and passing a data frame to externalCatalog:

auxCatalog <- read.csv('https://raw.githubusercontent.com/mcarmonabaez/icd10es/master/inst/extdata/inputs/diabetes_subcategories.csv',
                       sep = '\t')
ICDLookUp('Diabetes tipo i con coma', tabular = 'simple',
          useExternal = TRUE, externalCatalog = auxCatalog)

Looking up entries within death certificates

It is common to have longer texts that describe a series of diseases and symptoms which could be matched to the CIE-10. Examples include death certificates and medical records, in which a physician may list some or all of the comorbidities a person presents at a medical checkup or at the time of death. One may then wish to match all the listed health-related problems with the CIE-10.

exampleCerificates <-
  tibble::tribble(~id, ~cause,
                  1, 'HEMORRAGIA SUBARACNOIDEA. HIPERTENSION ARTERIAL SISTEMICA. DISLIPIDEMIA.',
                  2, 'INFARTO CEREBRAL, HIPERTENSION ARTERIAL SISTEMICA, TRIGLICERIDEMIA.',
                  3, 'HERIDA PRODUCIDA POR PROYECTIL DE ARMA DE FUEGO PENETRANTE DE TORAX.',
                  4, 'CHOQUE HIPOVOLEMICO, DIARREA CRONICA, INFECCION POR VIRUS DE INMUNODEFICIENCIA HUMANA.',
                  5, 'EVENTO VASCULAR CEREBRAL, ENFERMEDAD RENAL TERMINAL, DIABETES MELLITUS TIPO 2.',
                  6, 'ANEURISMA CEREBRAL, ENCEFALOPATIA HEPATICA.',
                  7, 'INFARTO AGUDO AL MIOCARDIO, CARDIOPATIA HIPERTENSIVA, HIPERTENSION ARTERIAL SISTEMICA',
                  8, 'MENINGIOMA, HIPERTENSION ARTERIAL SISTEMICA.',
                  9, 'ENCEFALOPATIA HEPATICA, CIRROSIS HEPATICA, ALCOHOLISMO CRONICO',
                  10, 'INFARTO AGUDO AL MIOCARDIO, DIABETES MELLITUS TIPO II.'
  )

exampleCerificates

First, each certificate has to be tokenized with tokenizeCertificates, which creates a long data frame:

tokenizedCerificates <- tokenizeCertificates(exampleCerificates)
print(tokenizedCerificates, n = Inf)
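To give an idea of what tokenizing involves, here is a minimal base R sketch that splits each free-text field into individual causes. It is not the package's tokenizeCertificates: the helper name splitCauses and the splitting rule (periods and commas as separators, inferred from the example certificates) are assumptions made for illustration.

```r
# Hypothetical helper: NOT the package's tokenizeCertificates().
splitCauses <- function(certificates) {
  # Split each free-text field on periods and commas (an assumed rule).
  pieces <- strsplit(certificates$cause, "[.,]")
  # Trim whitespace and drop empty fragments left by the separators.
  pieces <- lapply(pieces, function(p) {
    p <- trimws(p)
    p[nzchar(p)]
  })
  # One row per cause, repeating the certificate id as needed.
  data.frame(
    id = rep(certificates$id, lengths(pieces)),
    cause = as.character(unlist(pieces)),
    stringsAsFactors = FALSE
  )
}

# Shortened versions of the first two example certificates.
toy <- data.frame(
  id = c(1, 2),
  cause = c("HEMORRAGIA SUBARACNOIDEA. DISLIPIDEMIA.",
            "INFARTO CEREBRAL, TRIGLICERIDEMIA."),
  stringsAsFactors = FALSE
)
tokens <- splitCauses(toy)
print(tokens)
```

The result has four rows, two per certificate, which is the "long" shape the lookup step expects: one cause per row, keyed by the certificate id.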

One can then use ICDLookUp to try to find a catalog entry for each cause listed in the certificates:

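results <- lapply(unique(tokenizedCerificates$id),
                  function(x) {
                    print(x)  # progress: id of the certificate being processed
                    # Keep only the causes that belong to certificate x
                    certCauses <- dplyr::filter(tokenizedCerificates, id == x)
                    # Look up every cause of this certificate in the catalog
                    lapply(certCauses$cause, ICDLookUp)
                  }) %>%
  dplyr::bind_rows(.id = 'id') %>%
  dplyr::mutate(id = as.numeric(id)) %>%
  dplyr::arrange(id) %>%
  dplyr::group_by(id) %>%
  # Number the matched causes within each certificate
  dplyr::mutate(order = dplyr::row_number()) %>%
  dplyr::ungroup()
# Attach the matched entries; this relies on results having one row per
# tokenized cause, in the same order as tokenizedCerificates
tokenizedCerificates$result <- results$disease
print(tokenizedCerificates, n = Inf)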


mcarmonabaez/icd10es documentation built on June 16, 2021, 11:24 p.m.