harmonize_ukb_data: Prepare harmonized long format data from various data sources

View source: R/prepare_ukb_data.R

harmonize_ukb_data R Documentation

Prepare harmonized long format data from various data sources

Description

This function reads UKB files, including the main dataset (ukbxxxxx.tab) and the record-level (.txt) files from the data portal. It returns a list of 3 elements:

lst.data: event tables by episode, organized by source as well as by classification system

dfukb: the main dataset with the columns required for generating lst.data

vct.identifiers: a vector of participant identifiers from dfukb

Usage

harmonize_ukb_data(
  f.ukbtab = NULL,
  f.html = NULL,
  dfDefinitions = NULL,
  f.hesin = NULL,
  f.hesin_diag = NULL,
  f.hesin_oper = NULL,
  f.death_portal = NULL,
  f.death_cause_portal = NULL,
  f.gp_clinical = NULL,
  f.gp_scripts = NULL,
  f.withdrawal_list = NULL,
  allow_missing_fields = TRUE,
  death_from_portal = TRUE,
  add_extra_hesin_columns = FALSE,
  ...
)

Arguments

f.ukbtab

Path to the main dataset (.tab) file

f.html

Path to the HTML file containing the metadata of the main dataset, which can be generated using the UKB utility

dfDefinitions

A processed and expanded definition table (data.table object), which can be generated by read_defnition_table()

f.hesin

Path to HESIN (master file), RECORD LEVEL DATA

f.hesin_diag

Path to HESIN_DIAG file containing diagnosis codes, RECORD LEVEL DATA

f.hesin_oper

Path to HESIN_OPER file containing operation and procedure codes, RECORD LEVEL DATA

f.death_portal

Path to file with DEATH table, RECORD LEVEL DATA

f.death_cause_portal

Path to file with DEATH_CAUSE table, RECORD LEVEL DATA

f.gp_clinical

Path to GP clinical event records, RECORD LEVEL DATA

f.gp_scripts

Path to GP prescription event records, RECORD LEVEL DATA

f.withdrawal_list

Path to participant withdrawal list (.csv)

allow_missing_fields

Logical flag specifying whether missing data fields are allowed (ignored) by the function. If FALSE, the function halts if any field is missing from the main dataset

death_from_portal

Logical flag specifying whether death records are read from the data portal files rather than from the main dataset. The main dataset is used if the data portal files are not present (readable).

add_extra_hesin_columns

If TRUE, adds the extra columns "ins_index" and "source"
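
The behaviour described for allow_missing_fields and death_from_portal can be illustrated with a minimal sketch. Both helpers below (check_fields and choose_death_source) and the field names are hypothetical and written only for illustration; they are not functions of the package and may differ from how it handles these flags internally.

```r
# Hypothetical helper: keep only the required fields that are available.
# With allow_missing_fields = FALSE, halt as soon as any field is missing.
check_fields <- function(required, available, allow_missing_fields = TRUE) {
  missing_fields <- setdiff(required, available)
  if (length(missing_fields) > 0 && !allow_missing_fields) {
    stop("Missing field(s) in main dataset: ",
         paste(missing_fields, collapse = ", "))
  }
  intersect(required, available)
}

# Hypothetical helper: use the portal files only if the flag is set and
# both files are readable; otherwise fall back to the main dataset.
choose_death_source <- function(f.death_portal = NULL,
                                f.death_cause_portal = NULL,
                                death_from_portal = TRUE) {
  files <- c(f.death_portal, f.death_cause_portal)  # c() drops NULL entries
  portal_ok <- length(files) == 2 && all(file.access(files, mode = 4) == 0)
  if (death_from_portal && portal_ok) "portal" else "main dataset"
}

# Placeholder field names; "field_b" is missing and is ignored by default.
kept <- check_fields(c("field_a", "field_b"), available = "field_a")

# No portal files supplied, so the main dataset is chosen.
death_source <- choose_death_source()
```

Calling check_fields() with allow_missing_fields = FALSE on the same input would stop with an error instead of dropping the missing field.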

Value

A list of 3 elements: lst.data, dfukb (the main dataset as a data frame with only the selected data fields) and vct.identifiers
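
As the Description states, the function returns a list of three named elements. The sketch below shows one way to pull them apart. It uses a mock list with the same three names so that it runs without any UK Biobank files; the contents and the column name "identifier" are placeholders, not real data or the package's actual column names.

```r
# Mock object mirroring the three documented element names.
# All values are placeholders for illustration only.
lst.harmonized.data <- list(
  lst.data = list(),                               # event tables would be here
  dfukb = data.frame(identifier = c(1L, 2L, 3L)),  # hypothetical column name
  vct.identifiers = c(1L, 2L, 3L)
)

# Inspect the top-level structure and extract each element by name.
element_names <- names(lst.harmonized.data)
lst.data <- lst.harmonized.data$lst.data
dfukb <- lst.harmonized.data$dfukb
vct.identifiers <- lst.harmonized.data$vct.identifiers

# Number of participants in the mock object.
n_participants <- length(vct.identifiers)
```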

Examples

# The f* variables are assumed to hold paths to the corresponding files
lst.harmonized.data <- harmonize_ukb_data(
  f.ukbtab = fukbtab,
  f.html = fhtml,
  f.gp_clinical = fgp_clinical,
  f.gp_scripts = fgp_scripts,
  f.hesin = fhesin,
  f.hesin_diag = fhesin_diag,
  f.hesin_oper = fhesin_oper,
  f.death_portal = fdeath_portal,
  f.death_cause_portal = fdeath_cause_portal
)
summary(lst.harmonized.data)
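
The example above relies on path variables that are not defined on this page. A minimal sketch of how they might be set up and checked for readability before the call is given below. The directory and all file names except the ukbxxxxx.tab pattern are placeholders chosen for illustration and need to be replaced with the actual locations of the files.

```r
# Placeholder directory; replace with the folder holding the UKB files.
data_dir <- "path/to/ukb"

# Placeholder file names (assumptions, not names required by the package).
fukbtab             <- file.path(data_dir, "ukbxxxxx.tab")
fhtml               <- file.path(data_dir, "ukbxxxxx.html")
fhesin              <- file.path(data_dir, "hesin.txt")
fhesin_diag         <- file.path(data_dir, "hesin_diag.txt")
fhesin_oper         <- file.path(data_dir, "hesin_oper.txt")
fdeath_portal       <- file.path(data_dir, "death.txt")
fdeath_cause_portal <- file.path(data_dir, "death_cause.txt")
fgp_clinical        <- file.path(data_dir, "gp_clinical.txt")
fgp_scripts         <- file.path(data_dir, "gp_scripts.txt")

paths <- c(fukbtab, fhtml, fhesin, fhesin_diag, fhesin_oper,
           fdeath_portal, fdeath_cause_portal, fgp_clinical, fgp_scripts)

# file.access() returns 0 when a file can be read (mode = 4), -1 otherwise.
readable <- file.access(paths, mode = 4) == 0

# Paths that cannot be read and should be fixed before calling the function.
unreadable_paths <- paths[!readable]
```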

niekverw/ukbpheno documentation built on Oct. 30, 2023, 9:17 p.m.