
ukbschemas

Lifecycle: experimental

This R package can be used to create and/or load a database containing the UK Biobank Data Showcase schemas, which are data dictionaries describing the structure of the UK Biobank main dataset.

Installation

You can install the current version of ukbschemas from GitHub with:

# install.packages("devtools")
devtools::install_github("bjcairns/ukbschemas")

library(ukbschemas)

Examples

The package supports two workflows.

Save-Load workflow (recommended)

The recommended approach is to use ukbschemas_db() to download the schema tables and save them to an SQLite database, then use load_db() to load the tables from the database and store them as tibbles in a named list:

db <- ukbschemas_db(path = tempdir())
sch <- load_db(db = db)

By default, the database is named ukb-schemas-YYYY-MM-DD.sqlite (where YYYY-MM-DD is the current date) and placed in the current working directory; path = tempdir() in the example above puts it in the session's temporary directory instead. When the database was last compiled (3 August 2019), the .sqlite file produced by ukbschemas_db() was approximately 10.1 MB.
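As a rough illustration of the naming convention described above, the default file name can be reconstructed in base R. This is only a sketch of the convention, not the package's internal code; default_name and default_path are hypothetical variables introduced for the example.

```r
# Hypothetical reconstruction of the default file name;
# not taken from the ukbschemas source code
default_name <- paste0("ukb-schemas-", format(Sys.Date(), "%Y-%m-%d"), ".sqlite")

# Full path if the file were written to the session's temporary directory
default_path <- file.path(tempdir(), default_name)

print(default_name)
```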

Note that without further arguments, ukbschemas_db() tidies up the database to give it a more consistent relational structure (the changes are summarised in the output of the first example, above). Alternatively, the raw data can be saved unmodified by setting the as_is argument:

db <- ukbschemas_db(path = tempdir(), overwrite = TRUE, as_is = TRUE)

The overwrite argument controls whether an existing database file is replaced: TRUE overwrites it and FALSE prevents this. If overwrite is not specified and the session is interactive (interactive() == TRUE), the user is prompted to decide.
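The decision logic described above could be sketched as follows. resolve_overwrite() is a hypothetical helper written for illustration, not a function exported by ukbschemas, and the behaviour when overwrite is unspecified in a non-interactive session is an assumption (the text above does not say what happens in that case).

```r
# Hypothetical helper illustrating the overwrite rules; not part of ukbschemas
resolve_overwrite <- function(file, overwrite = NULL) {
  # Nothing to overwrite, so writing is always allowed
  if (!file.exists(file)) return(TRUE)

  # An explicit TRUE or FALSE is respected as given
  if (!is.null(overwrite)) return(isTRUE(overwrite))

  # Unspecified in an interactive session: ask the user
  if (interactive()) {
    answer <- readline(paste0("Overwrite ", file, "? [y/N] "))
    return(tolower(trimws(answer)) %in% c("y", "yes"))
  }

  # Assumption: decline when unspecified in a non-interactive session
  FALSE
}
```

For example, resolve_overwrite("existing.sqlite", overwrite = FALSE) returns FALSE without prompting whenever the file already exists.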

Note: If you have created a schemas database with an earlier version of ukbschemas, it should be possible to load that database with the latest version of load_db(), which (currently) should load any SQLite database, regardless of contents.
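To show what loading every table of an arbitrary SQLite database into a named list can look like, here is a minimal sketch using the DBI and RSQLite packages directly. It assumes both packages are installed, uses made-up toy tables, and is not the implementation of load_db().

```r
library(DBI)

# Build a throwaway in-memory SQLite database with two toy tables
# (table names and contents are invented for this example)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "toy_a", data.frame(id = 1:2, label = c("x", "y")))
dbWriteTable(con, "toy_b", data.frame(id = 1L, value = 3.5))

# Read every table into a list named after the tables
tbl_names <- dbListTables(con)
tables <- lapply(setNames(tbl_names, tbl_names), function(nm) dbReadTable(con, nm))

dbDisconnect(con)

names(tables)
```

Each element of tables is a data frame; wrapping it with tibble::as_tibble() would give tibbles like those described above.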

Load-Save workflow

The second approach is to download the schemas and store them in memory in a list, and save them to a database only as required.

This is not recommended, because it is better (for everyone) not to download the schema files every time they are needed, and because the database assumes a certain structure that should be guaranteed when the database is saved. If you still want to take this approach, use:

sch <- ukbschemas()
db <- save_db(sch, path = tempdir())

Why R?

This package was originally written in bash (a Unix shell scripting language). However, R is more accessible, and all dependencies are installed along with the package; there is no need to install any secondary software (not even SQLite).

Notes

Design notes

Known code issues



bjcairns/ukbschemas documentation built on Nov. 4, 2019, 7:22 a.m.