read_ukb | R Documentation |
Reads a UK Biobank main dataset file into R using either fread or read_dta.
Optionally renames variables with descriptive names, adds variable labels and
labels coded values of type character as factors.
read_ukb(
  path,
  delim = "auto",
  data_dict = NULL,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  descriptive_colnames = TRUE,
  label = TRUE,
  max_n_labels = 30,
  na.strings = c("", "NA"),
  nrows = Inf,
  ...
)
path |
The path to a UK Biobank main dataset file. |
delim |
Delimiter for the UKB main dataset file. Default is "auto" (see
|
data_dict |
A data dictionary specific to the UKB main dataset file,
generated by |
ukb_data_dict |
The UKB data dictionary (available online at the UK Biobank data showcase).
This should be a data frame where all columns are of type
|
ukb_codings |
The UKB codings file (available online at the UK Biobank data showcase).
This should be a data frame where all columns are of type
|
descriptive_colnames |
If |
label |
If |
max_n_labels |
Coded variables whose number of associated value labels is less than or
equal to this threshold will be labelled as factors. If |
na.strings |
A character vector of strings which are to be interpreted as |
nrows |
The maximum number of rows to read. Unlike |
... |
Additional parameters are passed on to either
|
Note that na.strings is not recognised by read_dta. Reading in a STATA file
may therefore require careful checking for empty strings that need converting
to NA.
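The check described in the note can be sketched in base R. This is a minimal illustration, not part of the package: the toy data frame stands in for the result of reading a STATA file, and its column names and values are invented for the example.

```r
# Toy data frame standing in for a dataset read from a STATA file.
# Column names and values are hypothetical, not real UKB fields.
df <- data.frame(
  eid = c(1, 2, 3),
  sex = c("Female", "", "Male"),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA in character columns only.
# `%in%` never returns NA, so existing missing values are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(x) {
  x[x %in% ""] <- NA_character_
  x
})

# Count any empty strings that remain in each column.
empty_counts <- vapply(df, function(x) sum(x %in% ""), integer(1))
print(empty_counts)
```

Restricting the replacement to character columns leaves numeric columns such as `eid` unchanged.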
A UK Biobank phenotype dataset as a data table with human-readable variable labels and data values.
library(magrittr)
# get dummy UKB data dictionary and codings
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")
# file path to dummy UKB main dataset
dummy_ukb_main_path <- get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE)
# read dummy UKB main dataset into R
read_ukb(
  path = dummy_ukb_main_path,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
) %>%
  # (convert to tibble for concise print method)
  tibble::as_tibble()
# to read only a subset of variables, create a data dictionary and filter
# for selected variables, then supply to `read_ukb()`
data_dict_selected <- make_data_dict(
  ukb_main = dummy_ukb_main_path,
  ukb_data_dict = dummy_ukb_data_dict
) %>%
  dplyr::filter(FieldID %in% c("eid", "31", "34", "21001"))
read_ukb(
  path = dummy_ukb_main_path,
  data_dict = data_dict_selected,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
# set `descriptive_colnames` and `label` to FALSE to read the raw dataset as is
read_ukb(
  path = dummy_ukb_main_path,
  data_dict = data_dict_selected,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings,
  descriptive_colnames = FALSE,
  label = FALSE
)