knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
# Download the files?
do_dl <- FALSE
save_load_file <- "~/ukbschemas-test-data/ukbschemas_db_test.sqlite"
This R package can be used to create and/or load a database containing the UK Biobank Data Showcase schemas, which are data dictionaries describing the structure of the UK Biobank main dataset.
You can install the current version of ukbschemas from GitHub with:
# install.packages("devtools")
devtools::install_github("bjcairns/ukbschemas")
library(ukbschemas)
try(rm(list = c("db", "sch")))
library(ukbschemas)
The package supports two workflows.
The recommended approach is to use ukbschemas_db() to download the schema tables and save them to an SQLite database, then use load_db() to load the tables from the database and store them as tibbles in a named list:
db <- ukbschemas_db(path = tempdir())
sch <- load_db(db = db)
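Once loaded, sch can be handled like any named list of data frames. The sketch below uses a small hand-built stand-in rather than the real download, so it runs offline. The table names fields and categories and the key category_id come from the notes in this README; the other column names and all rows are invented for illustration, and the real tables contain many more columns.

```r
# Stand-in for the list returned by load_db(); every row here is made up.
sch_demo <- list(
  fields = data.frame(
    field_id = c(1L, 2L, 3L),
    title = c("Example field A", "Example field B", "Example field C"),
    category_id = c(10L, 10L, 20L),
    stringsAsFactors = FALSE
  ),
  categories = data.frame(
    category_id = c(10L, 20L),
    title = c("Example category X", "Example category Y"),
    stringsAsFactors = FALSE
  )
)

# List the available tables, then pull one out by name.
names(sch_demo)
fields <- sch_demo[["fields"]]

# Join fields to their categories on the shared key; the suffixes
# disambiguate the two "title" columns.
fields_with_cat <- merge(
  fields, sch_demo[["categories"]],
  by = "category_id", suffixes = c("_field", "_category")
)
fields_with_cat[, c("field_id", "title_field", "title_category")]
```

With the real list, the same pattern should apply after replacing sch_demo with sch, subject to the actual column names in your copy of the schemas.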
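if (do_dl) {
  # Copy the freshly built database to the location used when knitting this README
  file <- file.path(tempdir(), paste0("ukb-schemas-", Sys.Date(), ".sqlite"))
  file.copy(file, save_load_file)
}
# File size (MB) and modification time of the saved database
finfo <- file.info(save_load_file)
fsize <- round(finfo$size / 1e6, 1)
fmtime <- finfo$mtime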
By default, the database is named ukb-schemas-YYYY-MM-DD.sqlite (where YYYY-MM-DD is the current date) and placed in the current working directory. (path = tempdir() in the above example puts it in the current temporary directory instead.) At the most recent compilation of the database (r format(fmtime, "%d %B %Y")), the size of the .sqlite database file produced by ukbschemas_db() was approximately r fsize MB.
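To make the naming convention concrete, here is a minimal base R sketch that builds a path of the same form and reports a file size in MB. Both default_db_file() and size_mb() are hypothetical helpers written for this illustration; they are not part of ukbschemas, and the package may construct the name differently internally.

```r
# Hypothetical helper: build a path of the form <path>/ukb-schemas-YYYY-MM-DD.sqlite
default_db_file <- function(path = ".", date = Sys.Date()) {
  file.path(path, paste0("ukb-schemas-", format(date, "%Y-%m-%d"), ".sqlite"))
}

# Hypothetical helper: file size in MB, rounded to one decimal place
size_mb <- function(file) {
  round(file.info(file)$size / 1e6, 1)
}

default_db_file()            # in the current working directory
default_db_file(tempdir())   # in the session's temporary directory
```

Using file.path() rather than pasting separators by hand keeps the path valid on Windows, macOS and Linux.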
Note that without further arguments, ukbschemas_db() tidies up the database to give it a more consistent relational structure (the changes are summarised in the output of the first example, above). Alternatively, the raw data can be loaded with the as_is argument:
db <- ukbschemas_db(path = tempdir(), overwrite = TRUE, as_is = TRUE)
The overwrite option controls what happens when the database file already exists: TRUE allows the file to be overwritten, FALSE prevents this, and if overwrite is not specified in an interactive session (interactive() == TRUE), the user is prompted to decide.
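The three cases can be sketched as a small decision function. This is an illustration only: may_write() is a hypothetical helper, not the package's code, and its behaviour when overwrite is unspecified in a non-interactive session (it refuses) is an assumption, since the text above does not cover that case.

```r
# Hypothetical helper, not part of ukbschemas: may `file` be written?
may_write <- function(file, overwrite = NULL,
                      is_interactive = interactive(),
                      ask = function() {
                        tolower(readline("File exists. Overwrite? [y/N] ")) == "y"
                      }) {
  if (!file.exists(file)) return(TRUE)    # nothing to overwrite
  if (isTRUE(overwrite)) return(TRUE)     # explicit permission
  if (isFALSE(overwrite)) return(FALSE)   # explicit refusal
  if (is_interactive) return(ask())       # unspecified: prompt the user
  FALSE                                   # assumption: refuse if nobody can be asked
}

existing <- tempfile(fileext = ".sqlite")
file.create(existing)

may_write(existing, overwrite = TRUE)
may_write(existing, overwrite = FALSE)
# Simulate an interactive session in which the user answers "yes"
may_write(existing, is_interactive = TRUE, ask = function() TRUE)
```

Passing overwrite explicitly is the safer choice in scripts, because the outcome then never depends on whether the session is interactive.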
Note: If you have created a schemas database with an earlier version of ukbschemas, it should be possible to load that database with the latest version of load_db(), which (currently) should load any SQLite database, regardless of contents.
The second approach is to download the schemas and store them in memory in a list, and save them to a database only as required.
This is not recommended, because it is better (for everyone) not to download the schema files every time they are needed, and because the database assumes a certain structure that should be guaranteed when the database is saved. If you still want to take this approach, use:
sch <- ukbschemas()
db <- save_db(sch, path = tempdir())
This package was originally written in bash (a Unix shell scripting language). However, R is more accessible and all dependencies are installed along with the package; there is no need to install any secondary software (not even SQLite).
The type-specific encoding value tables (esimpint, esimpstring, esimpreal, esimpdate, ehierint, ehierstring) have been harmonised and combined into a single table encvalues. The value column in encvalues has type TEXT, but a type column has been added in case the type of a value is not clear from context. The original type-specific tables have been deleted.
Category parent information has been moved to table categories, as column parent_id, from table catbrowse (which has been deleted).
The category of each field comes from the main_category column in the fields schema, but the column has been renamed to category_id for consistency with the categories schema.
Lookup values for several properties (value_type, stability, item_type, strata and sexed) are available elsewhere on the Data Showcase. These have been added manually to tables valuetypes, stability, itemtypes, strata and sexed, and appropriate ID references have been renamed with the _id suffix in tables fields and encodings.
Several columns contain encoded values that have no corresponding table (base_type in fields, availability in encodings and categories, and others). Additional tables documenting these encoded values may be included in future versions (and suggestions are welcome).
Because readr::read_csv() reads whole numbers as type double, not integer (which can hold whole numbers beyond R's 32-bit integer range), column types in schemas loaded in R will differ depending on whether the schemas are loaded directly to R or first saved to a database. This should make little or no difference for most applications.
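The merging of the type-specific encoding tables into encvalues can be illustrated with toy data in base R. This is a sketch of the general idea, not the package's implementation: the rows are invented, the column names encoding_id and meaning are assumptions, and the labels stored in the type column here ("integer", "string") may differ from those used in the real database.

```r
# Toy stand-ins for two of the type-specific tables; all rows are invented.
esimpint <- data.frame(
  encoding_id = c(1L, 1L), value = c(0L, 1L),
  meaning = c("No", "Yes"), stringsAsFactors = FALSE
)
esimpstring <- data.frame(
  encoding_id = 2L, value = "A",
  meaning = "Example code", stringsAsFactors = FALSE
)

# Stack the tables: store every value as text and record its original type.
harmonise <- function(tables) {
  parts <- lapply(names(tables), function(nm) {
    tbl <- tables[[nm]]
    data.frame(
      encoding_id = tbl$encoding_id,
      value = as.character(tbl$value),
      type = nm,  # assumption: the real type column may use other labels
      meaning = tbl$meaning,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, parts)
}

encvalues <- harmonise(list(integer = esimpint, string = esimpstring))
encvalues

# The type column lets typed values be recovered when needed.
as.integer(encvalues$value[encvalues$type == "integer"])
```

Storing value as text is what allows integer, real, date and string encodings to share one column, and the type column records what was lost in the conversion.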