ukbnmr | R Documentation |
This package provides utilities for working with the UK Biobank NMR metabolomics data.
Details are provided below, and in the package vignette (type vignette("ukbnmr") to view).
There are three groups of functions in this package: (1) data extraction, (2) removal of technical variation, and (3) recomputing derived biomarkers and biomarker ratios.
All functions can be applied directly to raw data extracted from UK Biobank.
This package also provides a data.frame of biomarker information, loaded as nmr_info, and a data.frame of sample processing information, loaded as sample_qc_info.
The extract_biomarkers() function will take a phenotype dataset extracted on the UK Biobank Research Analysis Platform by the Table Exporter tool, extract the NMR biomarker fields, and give them short comprehensible column names as described in nmr_info.
Measurements are also split into multiple rows where a participant has measurements at both baseline and first repeat assessment.
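The row splitting described above can be illustrated with a small base R sketch. This is not the package's implementation: the "p<field>_i<visit>" column pattern, the field ID 99999, the short name Biomarker_A, the visit_index column, and the helper split_visits() are all assumptions made for illustration only.

```r
# Toy wide-format extract: one row per participant, one column per
# field and visit. The field ID and short name are placeholders.
wide <- data.frame(
  eid = c(1001, 1002, 1003),
  p99999_i0 = c(4.8, 5.2, 6.1),  # baseline assessment
  p99999_i1 = c(NA, 5.0, NA)     # first repeat assessment
)

# Hypothetical lookup from field ID to short column name.
short_names <- c("99999" = "Biomarker_A")

# Hypothetical helper: stack the visits into rows, one row per
# participant and visit that has at least one measurement.
split_visits <- function(wide, short_names) {
  rows <- list()
  for (visit in 0:1) {
    cols <- paste0("p", names(short_names), "_i", visit)
    present <- cols %in% names(wide)
    if (!any(present)) next
    new_names <- unname(short_names[present])
    part <- wide[, c("eid", cols[present]), drop = FALSE]
    names(part) <- c("eid", new_names)
    part$visit_index <- visit
    has_data <- rowSums(!is.na(part[, new_names, drop = FALSE])) > 0
    rows[[length(rows) + 1]] <- part[has_data, , drop = FALSE]
  }
  long <- do.call(rbind, rows)
  biomarker_cols <- setdiff(names(long), c("eid", "visit_index"))
  long <- long[order(long$eid, long$visit_index),
               c("eid", "visit_index", biomarker_cols)]
  rownames(long) <- NULL
  long
}

long <- split_visits(wide, short_names)
print(long)
```

In this toy input only participant 1002 has a repeat measurement, so only that participant contributes two rows to the result.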
The extract_biomarker_qc_flags() function will take a phenotype dataset extracted on the UK Biobank Research Analysis Platform by the Table Exporter tool, extract the Nightingale quality control flags for each biomarker measurement, and return a single column per biomarker, corresponding to the respective column output by extract_biomarkers().
The extract_sample_qc_flags() function will take a phenotype dataset extracted on the UK Biobank Research Analysis Platform by the Table Exporter tool and extract the sample quality control tags for the Nightingale NMR metabolomics data.
These functions will also work with older datasets predating the UK Biobank Research Analysis Platform, e.g. those extracted by ukbconv, and/or by the ukbtools R package.
The remove_technical_variation() function will take a phenotype dataset extracted on the UK Biobank Research Analysis Platform by the Table Exporter tool, extract all the biomarkers and QC flags, remove the effects of technical variation on biomarker concentrations, and return a list containing the adjusted NMR biomarker data, biomarker QC flags, and sample quality control and processing information.
This applies a multistep process as described in Ritchie et al. 2023:
1. First, biomarker data is filtered to the 107 biomarkers that cannot be derived from any combination of other biomarkers.
2. Absolute concentrations are log transformed, with a small offset applied to biomarkers with concentrations of 0.
3. Each biomarker is adjusted for the time between sample preparation and sample measurement (hours) on a log scale.
4. Each biomarker is adjusted for systematic differences between rows (A-H) on the 96-well shipment plates.
5. Each biomarker is adjusted for remaining systematic differences between columns (1-12) on the 96-well shipment plates.
6. Each biomarker is adjusted for drift over time within each of the six spectrometers. To do so, samples are grouped into 10 bins, within each spectrometer, by the date the majority of samples on their respective 96-well plates were measured.
7. Regression residuals after the sequential adjustments are transformed back to absolute concentrations.
8. Samples belonging to shipment plates that are outliers of non-biological origin are identified and set to missing.
9. The 61 composite biomarkers and 81 biomarker ratios are recomputed from their adjusted parts.
10. An additional 76 biomarker ratios of potential biological significance are computed.
Further details can be found in Ritchie S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants, Sci Data 10, 64 (2023). doi: 10.1038/s41597-023-01949-y
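The sequential adjustment steps (log transform, time, plate row, plate column, spectrometer drift, back-transform) can be sketched on simulated data for a single biomarker. This is a rough illustration under stated assumptions, not the package's code: it uses ordinary least squares via lm(), whereas the published method may use a different estimator. The offset rule, the re-centring of residuals on the mean, the back-transform, all column names, and the helper adjust() are hypothetical. Outlier plate detection and recomputation of derived biomarkers are not shown.

```r
set.seed(1)
n <- 2000

# Simulated sample processing information; every column name is hypothetical.
samples <- data.frame(
  hours_to_measure = runif(n, 4, 48),
  plate_row = factor(sample(LETTERS[1:8], n, replace = TRUE)),
  plate_col = factor(sample(1:12, n, replace = TRUE)),
  spectrometer = factor(sample(1:6, n, replace = TRUE)),
  measured_on = as.numeric(as.Date("2019-01-01")) + sample(0:364, n, replace = TRUE)
)

# Simulated concentrations with a planted dependence on processing time,
# plus a few exact zeros to motivate the offset.
conc <- exp(rnorm(n, sd = 0.3) - 0.2 * log(samples$hours_to_measure))
conc[1:5] <- 0

# Log transform. Assumption: the offset is half the smallest non-zero value.
offset <- min(conc[conc > 0]) / 2
log_conc <- log(conc + offset)

# Group samples into 10 date-ordered bins within each spectrometer.
samples$drift_bin <- ave(
  samples$measured_on, samples$spectrometer,
  FUN = function(d) cut(rank(d, ties.method = "min"), breaks = 10, labels = FALSE)
)

# Replace y with regression residuals, re-centred on the mean of y so the
# overall level of the biomarker is preserved (an assumption of this sketch).
adjust <- function(y, rhs, data) {
  data$y <- y
  fit <- lm(reformulate(rhs, response = "y"), data = data)
  unname(residuals(fit)) + mean(y)
}

# Apply the adjustments one after another, each on the previous residuals.
adj <- log_conc
adj <- adjust(adj, "log(hours_to_measure)", samples)
adj <- adjust(adj, "plate_row", samples)
adj <- adjust(adj, "plate_col", samples)
adj <- adjust(adj, "interaction(spectrometer, drift_bin)", samples)

# Transform back to the concentration scale.
adjusted_conc <- exp(adj) - offset
```

Each step is fitted to the output of the previous one, so later adjustments only remove variation that earlier covariates did not explain.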
The compute_extended_ratios() function will compute an extended set of biomarker ratios expanding on the biomarkers available directly from the Nightingale platform. A companion function, compute_extended_ratio_qc_flags(), will aggregate the QC flags for the biomarkers underlying each ratio.
The recompute_derived_biomarkers() function will recompute all composite biomarkers and ratios from the 107 non-derived biomarkers, which is useful for ensuring data consistency when adjusting for unwanted biological variation. This includes the extended biomarker ratios computed by the compute_extended_ratios() function. A companion function, recompute_derived_biomarker_qc_flags(), will aggregate the QC flags for the biomarkers underlying each composite biomarker and ratio.
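The idea of recomputing derived values and aggregating their QC flags can be illustrated with a toy base R sketch. It assumes, for illustration only, that a composite biomarker is the sum of its parts and that a ratio is a percentage of that composite. The biomarker names, the flag strings, the "; " separator, and the helper collapse_flags() are hypothetical and do not reflect the package's actual definitions.

```r
# Toy adjusted concentrations for three non-derived parts (hypothetical names).
parts <- data.frame(
  Part_A = c(1.2, 0.8, NA),
  Part_B = c(0.6, 0.4, 0.5),
  Part_C = c(0.2, 0.3, 0.1)
)

# Recompute a composite as the sum of its parts. A missing part makes the
# composite missing, so derived values stay consistent with their inputs.
derived <- parts
derived$Total <- parts$Part_A + parts$Part_B + parts$Part_C

# Recompute a ratio from the same adjusted values, as a percentage.
derived$Part_A_pct <- 100 * parts$Part_A / derived$Total

# Toy QC flags for the same parts (illustrative flag text).
flags <- data.frame(
  Part_A = c(NA, "Below limit of quantification", NA),
  Part_B = c(NA_character_, NA_character_, NA_character_),
  Part_C = c(NA, "Below limit of quantification", "Technical error"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine the distinct flags of the underlying
# biomarkers into one string per sample, or NA when there are none.
collapse_flags <- function(...) {
  apply(cbind(...), 1, function(f) {
    f <- unique(f[!is.na(f)])
    if (length(f) == 0) NA_character_ else paste(f, collapse = "; ")
  })
}

flags$Total <- collapse_flags(flags$Part_A, flags$Part_B, flags$Part_C)
```

Deriving both the values and the flags from the same underlying columns keeps each composite or ratio traceable to the measurements it depends on.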
Maintainer: Scott C Ritchie <sritchie73@gmail.com> (ORCID: 0000-0002-8454-9548)
Useful links:
Report bugs at https://github.com/sritchie73/ukbnmr/issues