tidy_clinical_events: Tidy clinical events data from a UK Biobank main dataset

Description

Data in a UK Biobank main dataset are stored in wide format, i.e. one row per UK Biobank participant (identified by their 'eid'). Clinical events may be ascertained from numerous sources (e.g. self-reported medical conditions, linked hospital records), with coded events and their associated dates recorded across multiple columns. This function tidies these data into standardised long-format tables, one per clinical events source.
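To make the wide-to-long idea concrete, the base R sketch below stacks a toy wide table into rows with eid, source, index and code columns. It is only an illustration of the reshaping concept, not how tidy_clinical_events() is implemented; the column names (f20002_0_0, f20002_0_1) and code values are hypothetical.

```r
# Toy wide-format data: one row per participant, one column per
# instance-array of a single field (hypothetical column names and values).
wide <- data.frame(
  eid = c(1L, 2L),
  f20002_0_0 = c("1065", "1111"),
  f20002_0_1 = c("1226", NA),
  stringsAsFactors = FALSE
)

# Stack each code column into rows, recording which column it came from.
code_cols <- setdiff(names(wide), "eid")
long <- do.call(rbind, lapply(code_cols, function(col) {
  data.frame(
    eid = wide$eid,
    # "f20002_0_1" -> "f20002"
    source = sub("_.*$", "", col),
    # "f20002_0_1" -> "0-1" (instance-array)
    index = sub("^f[0-9]+_([0-9]+)_([0-9]+)$", "\\1-\\2", col),
    code = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Empty cells carry no clinical event, so drop them.
long <- long[!is.na(long$code), ]
rownames(long) <- NULL
long
```

The result has one row per recorded code (three here), which is the shape described under Details.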

Usage

tidy_clinical_events(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  clinical_events_sources = c("primary_death_icd10", "secondary_death_icd10",
    "self_report_medication", "self_report_non_cancer", "self_report_non_cancer_icd10",
    "self_report_cancer", "self_report_operation", "cancer_register_icd9",
    "cancer_register_icd10", "summary_hes_icd9", "summary_hes_icd10",
    "summary_hes_opcs3", "summary_hes_opcs4"),
  strict = TRUE,
  .details_only = FALSE
)

Arguments

ukb_main

A UK Biobank main dataset.

ukb_data_dict

The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame in which all columns are of type character.

ukb_codings

The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame in which all columns are of type character.
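If you load the data dictionary or codings file yourself rather than using the defaults, one base R way to keep every column as character is colClasses = "character". The file written below is a one-row toy stand-in created only so the example is self-contained; its column names and content are assumptions, not the real Showcase file.

```r
# Write a tiny tab-separated toy file to a temporary location.
path <- tempfile(fileext = ".tsv")
writeLines(
  c("FieldID\tField\tCoding",
    "20002\tNon-cancer illness code, self-reported\t6"),
  path
)

# colClasses = "character" stops read.delim() from converting
# FieldID and Coding to integers, so every column stays character.
dict <- read.delim(path, colClasses = "character")

str(dict)
```

Without colClasses, read.delim() would guess integer types for the numeric-looking columns.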

clinical_events_sources

A character vector of clinical events sources to tidy. By default, all available sources are included.

strict

If TRUE, raise an error if required columns for any clinical events sources listed in clinical_events_sources are not present in ukb_main. If FALSE, a warning is raised instead. Default value is TRUE.
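The strict/non-strict behaviour follows a common R pattern: stop() when strict, warning() otherwise. The sketch below illustrates that pattern with a hypothetical helper, check_required_cols(), which is not a ukbwranglr function; the column names are also illustrative.

```r
# Hypothetical helper (not part of ukbwranglr) illustrating the pattern.
check_required_cols <- function(data, required, strict = TRUE) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    msg <- paste("Missing required columns:",
                 paste(missing_cols, collapse = ", "))
    # Error when strict, otherwise warn and carry on.
    if (strict) stop(msg, call. = FALSE) else warning(msg, call. = FALSE)
  }
  invisible(missing_cols)
}

toy <- data.frame(eid = 1:2, f40001_0_0 = c("I21", NA))

# strict = FALSE: the warning is reported and execution continues.
res <- withCallingHandlers(
  check_required_cols(toy, c("eid", "f40001_0_0", "f40002_0_0"),
                      strict = FALSE),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# strict = TRUE: the same kind of gap raises an error.
err <- tryCatch(
  check_required_cols(toy, c("eid", "f40002_0_0"), strict = TRUE),
  error = function(e) conditionMessage(e)
)
```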

.details_only

If TRUE, return a list detailing the required Field IDs for each clinical events source. Default value is FALSE.

Details

A named list of data frames is returned, with names corresponding to the sources specified by clinical_events_sources. Each data frame has the following columns:

  • eid - participant identifier

  • source - the Field ID (prefixed by 'f') from which clinical codes were extracted. See clinical_events_sources for further details.

  • index - the corresponding instance and array (e.g. '0-1' means instance 0 and array 1)

  • code - clinical code. The clinical coding system used depends on source.

  • date - associated date. Where participants self-reported a medical condition but recorded its date as either 'Date uncertain or unknown' or 'Preferred not to answer' (see data coding 13), the date is set to NA.
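The index column and the date rule can be illustrated on toy rows in base R. This sketch assumes that data coding 13 encodes 'Date uncertain or unknown' as -1 and 'Preferred not to answer' as -3 (check the Showcase), and uses a hypothetical raw column, interpolated_year, to stand in for the untidied date values; it does not reproduce how tidy_clinical_events() converts dates.

```r
# Toy long-format rows (illustrative values; interpolated_year is a
# hypothetical raw column, not an output column of tidy_clinical_events()).
events <- data.frame(
  eid = c(1L, 1L, 2L),
  source = c("f20002", "f20002", "f20002"),
  index = c("0-0", "0-1", "1-0"),
  code = c("1065", "1226", "1111"),
  interpolated_year = c(2001.5, -1, -3),
  stringsAsFactors = FALSE
)

# Split index ("instance-array") into two integer columns.
parts <- do.call(rbind, strsplit(events$index, "-", fixed = TRUE))
events$instance <- as.integer(parts[, 1])
events$array <- as.integer(parts[, 2])

# Assumed data coding 13 special values: -1 (date uncertain or unknown)
# and -3 (preferred not to answer) are treated as missing.
events$interpolated_year[events$interpolated_year %in% c(-1, -3)] <- NA

events
```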

Value

A named list of clinical events data frames.

Other notes

Results may be combined into a single data frame using dplyr::bind_rows().
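Because every data frame in the list shares the same columns, and each row already records its source, the list can be stacked without losing information. The sketch below uses a toy named list shaped like the documented output (the Field IDs and codes are illustrative) and base R's rbind(); the dplyr::bind_rows() equivalent is shown as a comment for when dplyr is installed.

```r
# Toy named list shaped like the function's output (illustrative values).
clinical_events <- list(
  summary_hes_icd10 = data.frame(
    eid = 1L, source = "f41270", index = "0-0",
    code = "I21", date = as.Date("2010-03-01"),
    stringsAsFactors = FALSE
  ),
  primary_death_icd10 = data.frame(
    eid = 2L, source = "f40001", index = "0-0",
    code = "I219", date = as.Date(NA),
    stringsAsFactors = FALSE
  )
)

# Base R: stack the data frames (unname() avoids list names as row names).
combined <- do.call(rbind, unname(clinical_events))

# Equivalent with dplyr, if installed:
# combined <- dplyr::bind_rows(clinical_events)

combined
```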

See Also

Other clinical events: clinical_events_sources(), example_clinical_codes(), extract_phenotypes(), make_clinical_events_db()

Examples

library(ukbwranglr)

# dummy UKB main dataset and metadata
dummy_ukb_main <- get_ukb_dummy("dummy_ukb_main.tsv")
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

# tidy clinical events in a UK Biobank main dataset
clinical_events <- tidy_clinical_events(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)

# returns a named list of data frames, one for each `clinical_events_source`
names(clinical_events)

clinical_events$summary_hes_icd10

# use .details_only = TRUE to return details of required Field IDs for
# specific clinical_events sources
tidy_clinical_events(.details_only = TRUE)

rmgpanw/ukbwranglr documentation built on April 30, 2024, 7:47 a.m.