knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(codemapper)
library(dplyr)
The CALIBER team have manually curated clinical code lists for 308 common health conditions, providing a rich resource for researchers working with electronic health records.[@kuan2019] All code lists are publicly available in CSV format on GitHub.[^1] These are divided into primary care (Read 2 codes and Medcodes) and secondary care (ICD10 and OPCS4).
[^1]: See also the HDRUK Phenotype Library for even more clinical code lists.
The UK Biobank contains both linked primary and secondary care diagnostic records. Primary care records are in both Read 2 and Read 3 formats, while secondary care records are in both ICD10 and ICD9 formats (although the large majority of these are ICD10) as well as OPCS4. Data analysts may therefore wish to extend the CALIBER resource by mapping from Read 2 to Read 3, and from ICD10 to ICD9. The raw Read 2 and ICD10 CALIBER codes also require reformatting to match the format in UK Biobank data.
The CALIBER repository may be imported into R, reformatted for use with UK Biobank data, and mapped to Read 3 and ICD9 equivalents as follows:
# download CALIBER repository, returning file path
caliber_dir_path <- download_caliber_repo()

# read all CALIBER codes into R
caliber_raw <- read_caliber_raw(caliber_dir_path)

# build all_lkps_maps resource - contains clinical code lookup and mapping tables
all_lkps_maps <- build_all_lkps_maps()

# reformat CALIBER codes for UK Biobank, and map from ICD10 and Read 2 to ICD9
# and Read 3 respectively. Expect various warnings to be raised at this stage
caliber_ukb <- reformat_caliber_for_ukb(
  caliber_raw,
  all_lkps_maps = all_lkps_maps,
  overlapping_disease_categories_csv = default_overlapping_disease_categories_csv()
)
Using dummy data:
# read all codes from dummy CALIBER repo into R
caliber_raw_dummy <- read_caliber_raw(dummy_caliber_dir_path())

# build dummy all_lkps_maps resource
all_lkps_maps_dummy <- build_all_lkps_maps_dummy()

# reformat CALIBER codes for UK Biobank, and map from ICD10 and Read 2 to ICD9
# and Read 3 respectively (warnings suppressed)
caliber_ukb_dummy <- suppressWarnings(reformat_caliber_for_ukb(
  caliber_raw_dummy,
  all_lkps_maps = all_lkps_maps_dummy,
  overlapping_disease_categories_csv = default_overlapping_disease_categories_csv()
))

# view first few rows
caliber_ukb_dummy %>%
  head() %>%
  knitr::kable()
This vignette outlines the steps performed by `read_caliber_raw()` and `reformat_caliber_for_ukb()` to achieve this.
Before:
caliber_raw_dummy$read2 %>% arrange(category) %>% head() %>% knitr::kable()
After:
caliber_ukb_dummy %>% arrange(category) %>% filter(code_type == "read2") %>% head() %>% knitr::kable()
The following Read codes are not present in the Read 2 lookup table provided by UK Biobank resource 592:
| CALIBER disease | Unrecognised Read 2 code |
|-----------------|--------------------------|
| Alcohol Problems | Z191.; Z1911; Z1912; Z4B1. |
| Anxiety disorders | Z481.; Z4L1. |
| Coeliac disease | ZC2C2 |
| Crohn's disease | ZR3S. |
| Dementia | ZS7C5 |
| End stage renal disease | Z1A..; Z1A1.; Z1A2.; Z919.; Z9191; Z9192; Z9193; Z91A. |
| Erectile dysfunction | Z9E9.; ZG436 |
| Hearing loss | Z8B5.; Z8B51; Z8B53; Z8B55; Z911.; Z9111; Z9113; Z9114; Z9115; Z9117; Z9118; Z9119; Z911A; Z911B; Z911E; Z911G; Z9E81; ZE87.; ZL716; ZN569; ZN56A |
| Heart failure | ZRad. |
| Intrauterine hypoxia | Z2648; Z2649; Z264A; Z264B |
| Lupus erythematosus (local and systemic) | ZRq8.; ZRq9. |
| Obesity | ZC2CM |
| Other psychoactive substance misuse | 9G24.; 9K4..; Z1Q62; Z416. |
| Tinnitus | Z9112; ZEB.. |
| Transient ischaemic attack | Z7CE7 |
| Urinary Incontinence | Z9EA. |
| Visual impairment and blindness | Z96..; Z961.; Z962.; ZK74.; ZN568; ZN56A; ZRhO.; ZRr6. |
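A table like the one above can be produced by comparing a code list against a lookup table. The following is a minimal base R sketch with toy data, not the check that codemapper itself performs; the data frame names, the column names (`read_code`, `disease`, `code`) and the lookup contents are assumptions made for illustration.

```r
# Toy stand-in for a Read 2 lookup table (contents are illustrative only)
read2_lkp <- data.frame(
  read_code = c("C10E.", "G58.."),
  stringsAsFactors = FALSE
)

# Toy code list: two codes are in the lookup above, three are not
code_list <- data.frame(
  disease = c("Diabetes", "Heart failure", "Heart failure",
              "Alcohol Problems", "Alcohol Problems"),
  code = c("C10E.", "G58..", "ZRad.", "Z191.", "Z1911"),
  stringsAsFactors = FALSE
)

# Keep only the codes that are absent from the lookup table
unrecognised <- code_list[!(code_list$code %in% read2_lkp$read_code), ]

# Collapse to one row per disease, with codes separated by "; "
unrecognised_summary <- aggregate(
  code ~ disease,
  data = unrecognised,
  FUN = paste,
  collapse = "; "
)

unrecognised_summary
```

Passing the result to `knitr::kable()` would render it in the same layout as the table above.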
CALIBER ICD10 codes are converted to the `ALT_CODE` format used in UK Biobank data. Note that while undivided three character ICD10 codes are flagged by an 'X' suffix in UK Biobank resource 592 (e.g. 'A38X', Scarlet fever), the suffix does not appear in the UK Biobank dataset itself (e.g. 'A38X' should instead appear as 'A38').[^2]

[^2]: See this warning note (under the 'Coding lists' tab)
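The 'X' suffix handling described above can be sketched in base R. This is a minimal illustration on a toy vector, not codemapper's implementation; `strip_x_suffix()` is a hypothetical helper, and the rule it applies (drop a trailing 'X' only from codes of the form letter, two digits, 'X') is an assumption based on the 'A38X' example.

```r
# Toy ICD10 codes in lookup-table style (three-character codes carry an 'X')
icd10_codes <- c("A38X", "A020", "E103", "I10X")

# Hypothetical helper: remove the trailing 'X' from undivided three-character
# codes only, leaving four-character codes untouched
strip_x_suffix <- function(codes) {
  sub("^([A-Z][0-9]{2})X$", "\\1", codes)
}

icd10_dataset_style <- strip_x_suffix(icd10_codes)
icd10_dataset_style
#> [1] "A38"  "A020" "E103" "I10"
```

Anchoring the pattern at both ends avoids altering any code that merely contains an 'X' elsewhere.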
Before:
caliber_raw_dummy$icd10 %>% arrange(category) %>% head() %>% knitr::kable()
After:
caliber_ukb_dummy %>% filter(code_type == "icd10") %>% arrange(category) %>% head() %>% knitr::kable()
The following ICD10 codes are not present in the ICD10 lookup table provided by UK Biobank resource 592:[^3]
[^3]: These were retired after the 4th ICD10 edition, whereas the lookup table in UK Biobank resource 592 is based on the 5th edition.
| CALIBER disease | Unrecognised ICD10 code |
|-----------------|-------------------------|
| Infections of Other or unspecified organs | A90; A91 |
| Viral diseases (excl chronic hepatitis/HIV) | A90; A91 |
Before:
caliber_raw_dummy$opcs4 %>% arrange(category) %>% head() %>% knitr::kable()
After:
caliber_ukb_dummy %>% filter(code_type == "opcs4") %>% arrange(category) %>% head() %>% knitr::kable()
Mapping from Read 2 to Read 3 is performed using the `read_v2_read_ctv3` mapping sheet from UK Biobank resource 592. Points to be aware of:

- Some mappings in `read_v2_read_ctv3` are flagged as 'not assured' (`IS_ASSURED` value '0'). These mappings are excluded by default; this behaviour can be adjusted with the `col_filters` argument to `reformat_caliber_for_ukb()`.

Mapping from ICD10 to ICD9[^4] is performed using the `icd9_icd10` mapping sheet from UK Biobank resource 592. Points to be aware of:
[^4]: Note that there are relatively few ICD9 diagnostic records.
- There are a number of rows with missing values for either `DESCRIPTION_ICD9` or `DESCRIPTION_ICD10`, indicating that these codes have no ICD9/ICD10 equivalent.[^5]
- One-to-many mappings occur in either direction (i.e. ICD9 to ICD10, and ICD10 to ICD9).
[^5]: Although some of these codes look like they should map to each other (e.g. ICD9 '0030' SALMONELLA GASTROENTERITIS and ICD10 'A020' Salmonella enteritis).
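The mapping behaviour described in this section (excluding 'not assured' Read mappings, dropping rows with no equivalent, and one-to-many expansion) can be sketched with toy tables in base R. This is an illustration under stated assumptions, not codemapper's implementation: all codes are made-up placeholders, `apply_col_filters()` is a hypothetical helper, the column names other than `IS_ASSURED` and `DESCRIPTION_ICD9` are assumed, and the list format used for filters is not necessarily what the `col_filters` argument expects.

```r
# Toy Read 2 to Read 3 mapping sheet (placeholder codes)
read_map <- data.frame(
  READV2_CODE = c("A0001", "A0001", "B0002"),
  READV3_CODE = c("X0001", "X0009", "X0002"),
  IS_ASSURED = c("1", "0", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose values fall in the allowed set per column
apply_col_filters <- function(map, filters) {
  for (col in names(filters)) {
    map <- map[map[[col]] %in% filters[[col]], , drop = FALSE]
  }
  map
}

# Exclude 'not assured' mappings (IS_ASSURED of '0')
assured_only <- apply_col_filters(read_map, list(IS_ASSURED = "1"))

# Toy ICD10 to ICD9 mapping sheet: T001 maps to two ICD9 codes,
# T003 has no ICD9 equivalent (missing description)
icd_map <- data.frame(
  ICD10 = c("T001", "T001", "T002", "T003"),
  ICD9 = c("9001", "9002", "9003", NA),
  DESCRIPTION_ICD9 = c("Placeholder A", "Placeholder B", "Placeholder C", NA),
  stringsAsFactors = FALSE
)

# Toy ICD10 code list to be mapped
icd10_list <- data.frame(
  disease = "Toy disease",
  code = c("T001", "T002", "T003"),
  stringsAsFactors = FALSE
)

# Join: a one-to-many mapping yields one output row per matching pair
mapped <- merge(icd10_list, icd_map, by.x = "code", by.y = "ICD10")

# Drop rows with no ICD9 equivalent
mapped <- mapped[!is.na(mapped$DESCRIPTION_ICD9), ]
mapped
```

In this toy example the three input ICD10 codes yield three ICD9 rows, two of which come from the same ICD10 code, while the code without an equivalent is dropped.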
The mapping process results in some codes appearing under more than one disease category within a single disease. As a general rule, subcategories within a clinical code list should be mutually exclusive (e.g. a clinical code list for diabetes may be subcategorised into type 1 and type 2 diabetes; a clinical code for type 1 diabetes should not also be used for type 2 diabetes).[^6]
[^6]: Note that clinical codes may appropriately appear under more than one disease however (e.g. 'E103' Type 1 diabetes mellitus With ophthalmic complications, is listed under both 'Diabetes' and 'Diabetic ophthalmic complications' by CALIBER)
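Codes that fall under more than one category within the same disease can be identified by counting distinct categories per code. The sketch below uses toy data and base R; the `disease` and `code` column names and all code values are assumptions for illustration, and this is not necessarily how codemapper detects overlaps.

```r
# Toy mapped code list: placeholder code D001 appears under two categories
codes <- data.frame(
  disease = rep("Diabetes", 4),
  category = c("Type 1", "Type 2", "Type 2", "Type 1"),
  code_type = rep("icd9", 4),
  code = c("D001", "D001", "D002", "D003"),
  stringsAsFactors = FALSE
)

# Count distinct categories for each disease / code type / code combination
n_categories <- aggregate(
  category ~ disease + code_type + code,
  data = codes,
  FUN = function(x) length(unique(x))
)
names(n_categories)[names(n_categories) == "category"] <- "n_categories"

# Codes listed under more than one category within a single disease
overlapping <- n_categories[n_categories$n_categories > 1, ]
overlapping
```

Grouping by disease as well as code means a code shared between two different diseases is not flagged, which matches the distinction drawn in the footnote above.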
By default, these cases are dealt with by using `default_overlapping_disease_categories_csv()` with `reformat_caliber_for_ukb()`. This uses the following CSV file, which has been manually annotated ('Y' in column 'keep') to indicate which disease category a code should belong to:
read.csv(default_overlapping_disease_categories_csv()) %>% knitr::kable()
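To illustrate how a 'keep' annotation might be applied, the sketch below removes the disease/category/code combinations that are not marked 'Y'. It uses toy data and base R; apart from `keep`, the column names are assumptions, the codes are placeholders, and the actual logic inside `reformat_caliber_for_ukb()` may differ.

```r
# Toy annotation table: for placeholder code D001, keep the 'Type 1' category
annotations <- data.frame(
  disease = c("Diabetes", "Diabetes"),
  category = c("Type 1", "Type 2"),
  code = c("D001", "D001"),
  keep = c("Y", ""),
  stringsAsFactors = FALSE
)

# Toy mapped code list containing the overlapping code
codes <- data.frame(
  disease = rep("Diabetes", 4),
  category = c("Type 1", "Type 2", "Type 2", "Type 1"),
  code = c("D001", "D001", "D002", "D003"),
  stringsAsFactors = FALSE
)

# Build a composite key for each disease / category / code combination
make_key <- function(df) paste(df$disease, df$category, df$code, sep = "|")

# Combinations that are not annotated with 'Y' are to be dropped
drop_keys <- make_key(annotations[annotations$keep != "Y", ])

# Remove the dropped combinations from the code list
resolved <- codes[!(make_key(codes) %in% drop_keys), ]
resolved
```

After this step each code in the toy list belongs to exactly one category within the disease.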