tidy_clinical_events: Tidy clinical events data from a UK Biobank main dataset
In rmgpanw/ukbwranglr: Exploring UKB Data

tidy_clinical_events

R Documentation

Description

Data in a UK Biobank main dataset are stored in wide format, i.e. a single row per UK Biobank participant (identified by 'eid'). Clinical events may be ascertained from numerous sources (e.g. self-reported medical conditions, linked hospital records), with coded events and their associated dates recorded across multiple columns. This function tidies these data into a standardised long format table.
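The wide-to-long reshaping described above can be illustrated with a small base R sketch. This is not the package's implementation: the column names (`code_0_0`, `date_0_0`, etc.) and values are made up for illustration, and real UK Biobank column names will differ.

```r
# Toy wide-format data: one row per participant, with codes and dates spread
# across instance/array columns (hypothetical column names and values).
wide <- data.frame(
  eid = c(1, 2),
  code_0_0 = c("I10", "E11"),
  code_0_1 = c("J45", NA),
  date_0_0 = c("2001-05-01", "2010-02-14"),
  date_0_1 = c("2003-07-19", NA),
  stringsAsFactors = FALSE
)

# Instance/array suffixes shared by the code and date columns
suffixes <- c("0_0", "0_1")

# Build one long data frame per suffix, then stack them
long <- do.call(rbind, lapply(suffixes, function(s) {
  data.frame(
    eid = wide$eid,
    index = gsub("_", "-", s, fixed = TRUE),
    code = wide[[paste0("code_", s)]],
    date = as.Date(wide[[paste0("date_", s)]]),
    stringsAsFactors = FALSE
  )
}))

# Drop empty slots so that each remaining row is one recorded event
long <- long[!is.na(long$code), ]
long <- long[order(long$eid, long$index), ]
rownames(long) <- NULL
long
```

Each row of `long` pairs one code with its date, so participants with several events occupy several rows rather than several columns.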

Usage

tidy_clinical_events(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  clinical_events_sources = c("primary_death_icd10", "secondary_death_icd10",
    "self_report_medication", "self_report_non_cancer", "self_report_non_cancer_icd10",
    "self_report_cancer", "self_report_operation", "cancer_register_icd9",
    "cancer_register_icd10", "summary_hes_icd9", "summary_hes_icd10",
    "summary_hes_opcs3", "summary_hes_opcs4"),
  strict = TRUE,
  .details_only = FALSE
)

Arguments

`ukb_main`	A UK Biobank main dataset.
`ukb_data_dict`	The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type `character`.
`ukb_codings`	The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type `character`.
`clinical_events_sources`	A character vector of clinical events sources to tidy. By default, all available options are included.
`strict`	If `TRUE`, raise an error if required columns for any clinical events sources listed in `clinical_events_sources` are not present in `ukb_main`. If `FALSE`, a warning message is displayed instead. Default value is `TRUE`.
`.details_only`	If `TRUE`, return a list detailing the required Field IDs.

Details

A named list of data frames is returned, with the names corresponding to the data sources specified by clinical_events_sources. Each data frame has the following columns:

eid - participant identifier
source - the FieldID (prefixed by 'f') where clinical codes were extracted from. See clinical_events_sources for further details.
index - the corresponding instance and array (e.g. '0-1' means instance 0 and array 1)
code - clinical code. The type of clinical coding system used depends on source.
date - associated date. Note that in cases where participants self-reported a medical condition but recorded the date as either 'Date uncertain or unknown' or 'Preferred not to answer' (see data coding 13), the date is set to NA.
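As a rough illustration of the index column, the sketch below splits 'instance-array' strings into separate integer columns. The `events` table is a hand-made stand-in that only mimics the column names listed above; its values and column types are assumptions, not output from the function.

```r
# Toy events table mimicking the documented columns (made-up values)
events <- data.frame(
  eid = c(1, 1, 2),
  source = c("f41270", "f41270", "f41270"),
  index = c("0-0", "0-1", "0-0"),
  code = c("I10", "J45", "E11"),
  date = c("2001-05-01", "2003-07-19", NA_character_),
  stringsAsFactors = FALSE
)

# Split each 'instance-array' string on the hyphen
parts <- strsplit(events$index, "-", fixed = TRUE)

# First piece is the instance, second piece is the array
events$instance <- as.integer(vapply(parts, `[`, character(1), 1))
events$array <- as.integer(vapply(parts, `[`, character(1), 2))
events
```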

Value

A named list of clinical events data frames.

Other notes

Results may be combined into a single data frame using dplyr::bind_rows().
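A minimal sketch of that combining step is shown below, using a hand-made named list in place of real function output (the list contents are made up for illustration). It falls back to base R's rbind() when dplyr is not installed.

```r
# Stand-in for the returned named list: two sources with identical columns
# (hypothetical values, not real function output)
clinical_events_toy <- list(
  summary_hes_icd10 = data.frame(
    eid = c(1, 2), source = "f41270", index = c("0-0", "0-0"),
    code = c("I10", "E11"), date = c("2001-05-01", "2010-02-14"),
    stringsAsFactors = FALSE
  ),
  primary_death_icd10 = data.frame(
    eid = 1, source = "f40001", index = "0-0",
    code = "I21", date = "2015-03-02",
    stringsAsFactors = FALSE
  )
)

# Stack all sources into one data frame; the source column still records
# where each row came from
combined <- if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::bind_rows(clinical_events_toy)
} else {
  do.call(rbind, clinical_events_toy)  # base R fallback
}
combined
```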

Examples

# dummy UKB main dataset and metadata
dummy_ukb_main <- get_ukb_dummy("dummy_ukb_main.tsv")
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

# tidy clinical events in a UK Biobank main dataset
clinical_events <- tidy_clinical_events(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)

# returns a named list of data frames, one for each `clinical_events_source`
names(clinical_events)

clinical_events$summary_hes_icd10

# use .details_only = TRUE to return details of required Field IDs for
# specific clinical_events sources
tidy_clinical_events(.details_only = TRUE)

rmgpanw/ukbwranglr documentation built on April 30, 2024, 7:47 a.m.