This R package can be used to create and/or load a database containing the UK Biobank Data Showcase schemas, which are data dictionaries describing the structure of the UK Biobank main dataset.
You can install the current version of ukbschemas from GitHub with:
# install.packages("devtools")
devtools::install_github("bjcairns/ukbschemas")
library(ukbschemas)
The package supports two workflows.
The recommended approach is to use ukbschemas_db() to download the schema tables and save them to an SQLite database, then use load_db() to load the tables from the database and store them as tibbles in a named list:
db <- ukbschemas_db(path = tempdir())
sch <- load_db(db = db)
By default, the database is named ukb-schemas-YYYY-MM-DD.sqlite (where YYYY-MM-DD is the current date) and placed in the current working directory. (path = tempdir() in the above example puts it in the current temporary directory instead.) At the most recent compilation of the database (03 August 2019), the size of the .sqlite database file produced by ukbschemas_db() was approximately 10.1MB.
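As a rough illustration of the naming convention described above, the default-style file name can be reconstructed in base R. This is only a sketch of the documented pattern, not code from the package; the helper name default_db_file is made up for this example.

```r
# Hypothetical helper: builds a file name following the documented
# ukb-schemas-YYYY-MM-DD.sqlite pattern. It mirrors the description in this
# README only and is not taken from the ukbschemas source code.
default_db_file <- function(path = getwd(), date = Sys.Date()) {
  file.path(path, paste0("ukb-schemas-", format(date, "%Y-%m-%d"), ".sqlite"))
}

# Where a database created today with path = tempdir() would be expected
default_db_file(path = tempdir())
```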
Note that without further arguments, ukbschemas_db() tidies up the database to give it a more consistent relational structure (the changes are summarised in the output of the first example, above). Alternatively, the raw data can be loaded with the as_is argument:
db <- ukbschemas_db(path = tempdir(), overwrite = TRUE, as_is = TRUE)
The overwrite option controls whether an existing database file may be overwritten: TRUE allows it and FALSE prevents it. If overwrite is not specified and the session is interactive (interactive() == TRUE), the user is prompted to decide.
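The decision described above can be sketched in base R as follows. This is not the package's implementation: the helper name may_write is hypothetical, and the behaviour when overwrite is unspecified in a non-interactive session is an assumption, since it is not documented here.

```r
# Hypothetical helper illustrating the documented `overwrite` behaviour.
may_write <- function(file, overwrite = NULL) {
  if (!file.exists(file)) return(TRUE)      # nothing to overwrite
  if (isTRUE(overwrite)) return(TRUE)       # overwrite explicitly allowed
  if (isFALSE(overwrite)) return(FALSE)     # overwrite explicitly prevented
  if (interactive()) {
    # askYesNo() returns TRUE, FALSE or NA (cancel); only TRUE counts as yes
    return(isTRUE(utils::askYesNo(paste0("Overwrite ", file, "?"))))
  }
  FALSE  # ASSUMPTION: refuse when unspecified and the session is not interactive
}

existing <- tempfile(fileext = ".sqlite")
file.create(existing)
may_write(existing, overwrite = TRUE)
may_write(existing, overwrite = FALSE)
```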
Note: If you have created a schemas database with an earlier version of ukbschemas, it should be possible to load that database with the latest version of load_db(), which (currently) should load any SQLite database, regardless of contents.
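To give a sense of what loading every table from an SQLite database into a named list involves, here is a minimal sketch using the DBI and RSQLite packages directly. It is not the code behind load_db(): the function name read_all_tables and the toy table are made up for this example, and it returns plain data frames rather than tibbles.

```r
library(DBI)

# Hypothetical helper: read every table in an SQLite file into a named list.
read_all_tables <- function(file) {
  con <- dbConnect(RSQLite::SQLite(), file)
  on.exit(dbDisconnect(con), add = TRUE)    # always close the connection
  tbl_names <- dbListTables(con)
  tables <- lapply(tbl_names, function(nm) dbReadTable(con, nm))
  names(tables) <- tbl_names
  tables
}

# Toy database with made-up contents, so the sketch runs without a download
toy_file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), toy_file)
dbWriteTable(con, "fields", data.frame(field_id = c(1L, 2L), category_id = c(10L, 10L)))
dbDisconnect(con)

toy <- read_all_tables(toy_file)
names(toy)
```

Because nothing in this sketch depends on the table names or columns, it would read any SQLite file in the same way.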
The second approach is to download the schemas and store them in memory in a list, and save them to a database only as required.
This is not recommended, because it is better (for everyone) not to download the schema files every time they are needed, and because the database assumes a certain structure that should be guaranteed when the database is saved. If you still want to take this approach, use:
sch <- ukbschemas()
db <- save_db(sch, path = tempdir())
This package was originally written in bash (a Unix shell scripting language). However, R is more accessible and all dependencies are loaded when you install the package; there is no need to install any secondary software (not even SQLite).
- The type-specific encoding value tables (esimpint, esimpstring, esimpreal, esimpdate, ehierint, ehierstring) have been harmonised and combined into a single table encvalues. The value column in encvalues has type TEXT, but a type column has been added in case the value is not clear from context. The original type-specific tables have been deleted.
- Parent category information has been added to table categories, as column parent_id, from table catbrowse (which has been deleted).
- The reference to each field's category was originally the main_category column in the fields schema, but has been renamed to category_id for consistency with the categories schema.
- The values of several field properties (value_type, stability, item_type, strata and sexed) are available elsewhere on the Data Showcase. These have been added manually to tables valuetypes, stability, itemtypes, strata and sexed, and appropriate ID references have been renamed with the _id suffix in tables fields and encodings.
- Some columns contain encoded values that are not yet documented (base_type in fields, availability in encodings and categories, and others). Additional tables documenting these encoded values may be included in future versions (and suggestions are welcome).
- Because readr::read_csv() reads whole numbers as type double, not integer (allowing 64-bit integers without loss of information), column types in schemas loaded in R will differ depending on whether the schemas are loaded directly to R or first saved to a database. This should make little or no difference for most applications.
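The idea of combining type-specific tables into a single encvalues table, with text values and a type column, can be sketched in base R as below. The toy tables, their columns and the type labels are illustrative assumptions; the real schema contents and the package's own tidying code may differ.

```r
# Toy stand-ins for two of the type-specific tables (made-up rows and columns)
esimpint <- data.frame(encoding_id = c(7L, 7L), value = c(0L, 1L),
                       meaning = c("No", "Yes"), stringsAsFactors = FALSE)
esimpstring <- data.frame(encoding_id = c(8L, 8L), value = c("A", "B"),
                          meaning = c("First", "Second"), stringsAsFactors = FALSE)

# Hypothetical helper: stack the tables, storing every value as text and
# recording the original value type in a `type` column.
combine_encvalues <- function(tables) {
  parts <- lapply(names(tables), function(type_label) {
    tbl <- tables[[type_label]]
    tbl$value <- as.character(tbl$value)  # corresponds to TEXT in the database
    tbl$type <- type_label                # label taken from the list names
    tbl
  })
  do.call(rbind, parts)
}

encvalues <- combine_encvalues(list(integer = esimpint, string = esimpstring))
str(encvalues)
```

Keeping the type column means the original value type can still be recovered after everything is stored as text.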