make_clinical_events_db: Create a SQLite database with a 'clinical_events' table

View source: R/sqlite_db.R

make_clinical_events_db R Documentation

Create a SQLite database with a clinical_events table

Description

Adds a table named clinical_events, and optionally tables named gp_clinical_values and gp_scripts_names_and_quantities, to a SQLite database file (the latter two are only added if gp_clinical_path and gp_scripts_path, respectively, are provided). The clinical_events table is a long format table combining all clinical events data from a UK Biobank main dataset and the UK Biobank primary care clinical events dataset. Use clinical_events_sources() to see a list of all currently included clinical events sources. Expect this to take approximately 1 hour to finish running.

Usage

make_clinical_events_db(
  ukb_main_path,
  ukb_db_path,
  ukb_main_delim = "auto",
  gp_clinical_path = NULL,
  gp_scripts_path = NULL,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  overwrite = FALSE,
  chunk_size = 5e+05
)

Arguments

ukb_main_path

Path to the main UKB dataset file.

ukb_db_path

Path to the SQLite database file. The file name must end with '.db'. If no file with this name exists then one will be created.

ukb_main_delim

Delimiter for ukb_main_path. Default value is "auto".

gp_clinical_path

(Optional) path to the UKB primary care clinical events file (gp_clinical.txt).

gp_scripts_path

(Optional) path to the UKB primary care prescriptions file (gp_scripts.txt).

ukb_data_dict

The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

ukb_codings

The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

overwrite

If TRUE, then tables clinical_events and gp_clinical_values will be overwritten if they already exist in the database. Default value is FALSE.

chunk_size

The number of rows to include in each chunk when processing primary care datasets.
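
To illustrate what chunk_size controls, the sketch below reads a delimited file a fixed number of rows at a time and appends each chunk to a SQLite table, so that the whole file never has to sit in memory. It is not the package's internal code: append_in_chunks(), the toy file and the table name toy_gp_clinical are all hypothetical, and the real function may process chunks differently.

```r
library(DBI)

# Hypothetical helper (not part of ukbwranglr): append a delimited file to a
# SQLite table, reading `chunk_size` data rows at a time.
append_in_chunks <- function(path, con, table, chunk_size = 2, sep = "\t") {
  file_con <- file(path, open = "r")
  on.exit(close(file_con))

  # The first line holds the column names
  header <- strsplit(readLines(file_con, n = 1), sep, fixed = TRUE)[[1]]

  repeat {
    lines <- readLines(file_con, n = chunk_size)
    if (length(lines) == 0) break

    # Parse only the current chunk, keeping every column as character
    chunk <- utils::read.table(
      text = lines, sep = sep, col.names = header,
      colClasses = "character", quote = "", comment.char = "",
      stringsAsFactors = FALSE
    )
    DBI::dbWriteTable(con, table, chunk, append = TRUE)
  }
  invisible(table)
}

# Toy tab-delimited file with 5 data rows (made-up values)
toy_path <- tempfile(fileext = ".txt")
writeLines(
  c("eid\tcode", "1\tA", "2\tB", "3\tC", "4\tD", "5\tE"),
  toy_path
)

chunk_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
append_in_chunks(toy_path, chunk_con, "toy_gp_clinical", chunk_size = 2)

# Count the rows that ended up in the table
n_rows <- DBI::dbGetQuery(
  chunk_con, "SELECT COUNT(*) AS n FROM toy_gp_clinical"
)$n

DBI::dbDisconnect(chunk_con)
```

In a scheme like this, a smaller chunk size lowers peak memory use at the cost of more write operations.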

Details

See the 'Introduction to dbplyr' vignette for getting started with databases and dplyr.

Indexes are set on the source, code and eid columns in the clinical_events table for faster querying.
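
As a rough illustration of the kind of query these indexes are meant to speed up, the sketch below builds a toy in-memory table and filters it on source and code with dplyr (the dbplyr package must be installed). The rows, the date column, the source and code values, and the index names are all made up for the example; use clinical_events_sources() to see the real source values.

```r
library(DBI)
library(dplyr)

# Toy in-memory database standing in for the file at ukb_db_path
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical rows: only the eid, source and code columns are named in this
# page; the date column and all values are placeholders
toy_events <- data.frame(
  eid = c(1L, 1L, 2L, 3L),
  source = c("src_a", "src_b", "src_a", "src_b"),
  code = c("code_1", "code_2", "code_3", "code_2"),
  date = c("2001-01-01", "2002-02-02", "2003-03-03", "2004-04-04"),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(toy_con, "clinical_events", toy_events)

# Mimic the indexes described above (index names are arbitrary)
DBI::dbExecute(toy_con, "CREATE INDEX idx_source ON clinical_events (source)")
DBI::dbExecute(toy_con, "CREATE INDEX idx_code ON clinical_events (code)")
DBI::dbExecute(toy_con, "CREATE INDEX idx_eid ON clinical_events (eid)")

# The filter is translated to SQL and run inside SQLite; collect() then pulls
# only the matching rows into R
matching_events <- dplyr::tbl(toy_con, "clinical_events") %>%
  dplyr::filter(source == "src_b", code == "code_2") %>%
  dplyr::arrange(eid) %>%
  dplyr::collect()

DBI::dbDisconnect(toy_con)
```

Filtering on the indexed columns before calling collect() keeps the work inside the database rather than loading the full table into memory.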

Value

Returns ukb_db_path invisibly.

See Also

Other clinical events: clinical_events_sources(), example_clinical_codes(), extract_phenotypes(), tidy_clinical_events()

Examples

# dummy UKB data dictionary and codings
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

# file paths to dummy UKB main and primary care datasets
dummy_ukb_main_path <- get_ukb_dummy(
  "dummy_ukb_main.tsv",
  path_only = TRUE
)

dummy_gp_clinical_path <- get_ukb_dummy(
  "dummy_gp_clinical.txt",
  path_only = TRUE
)

dummy_gp_scripts_path <- get_ukb_dummy(
  "dummy_gp_scripts.txt",
  path_only = TRUE
)

# file path where SQLite database will be created
dummy_ukb_db_path <- file.path(tempdir(), "ukb.db")

# build database
suppressWarnings(make_clinical_events_db(
  ukb_main_path = dummy_ukb_main_path,
  gp_clinical_path = dummy_gp_clinical_path,
  gp_scripts_path = dummy_gp_scripts_path,
  ukb_db_path = dummy_ukb_db_path,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
))

# connect to database
con <- DBI::dbConnect(
  RSQLite::SQLite(),
  dummy_ukb_db_path
)

ukbdb <- db_tables_to_list(con)

# table names
names(ukbdb)

# view tables
ukbdb$clinical_events

ukbdb$gp_clinical_values

ukbdb$gp_scripts_names_and_quantities

# close database connection
DBI::dbDisconnect(con)

rmgpanw/ukbwranglr documentation built on April 30, 2024, 7:47 a.m.