R/ukbnmr.R

#' Tools for processing the UK Biobank NMR metabolomics biomarker data
#'
#' @description
#' This package provides utilities for working with the
#' \href{https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=220}{UK Biobank NMR metabolomics data}.
#' Details are provided below, and in the package vignette (type \code{vignette("ukbnmr")} to view).
#'
#' @details
#' There are three groups of functions in this package: (1) data extraction,
#' (2) removal of technical variation, and (3) recomputing derived biomarkers
#' and biomarker ratios.
#'
#' All functions can be applied directly to raw data extracted from UK Biobank.
#'
#' This package also provides a \code{data.frame} of biomarker information, loaded
#' as \code{\link{nmr_info}}, and a \code{data.frame} of sample processing information,
#' loaded as \code{\link{sample_qc_info}}.
#'
#' @section Data Extraction Functions:
#' The \code{\link{extract_biomarkers}()} function will take a phenotype dataset extracted on
#' the UK Biobank Research Analysis Platform by the Table Exporter tool, extract the
#' \href{https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=220}{NMR biomarker fields}
#' and give them short comprehensible column names as described in \code{\link{nmr_info}}.
#' Measurements are also split into multiple rows where a participant has measurements
#' at both baseline and first repeat assessment.
#'
#' The \code{\link{extract_biomarker_qc_flags}()} function will take a phenotype
#' dataset extracted on the UK Biobank Research Analysis Platform by the Table
#' Exporter tool, extract the \href{https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=221}{Nightingale quality control flags}
#' for each biomarker measurement, returning a single column per biomarker
#' (corresponding to respective columns output by \code{\link{extract_biomarkers}()}).
#'
#' The \code{\link{extract_sample_qc_flags}()} function will take a phenotype
#' dataset extracted on the UK Biobank Research Analysis Platform by the Table
#' Exporter tool and extract the \href{https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=222}{sample quality control tags}
#' for the Nightingale NMR metabolomics data.
#'
#' These functions will also work with older datasets predating the UK Biobank
#' Research Analysis Platform, e.g. those extracted by \href{https://biobank.ctsu.ox.ac.uk/crystal/exinfo.cgi?src=accessing_data_guide}{ukbconv},
#' and/or by the ukbtools R package.
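#'
#' As a minimal sketch of how these three functions might be used together
#' (the file path is hypothetical, and reading the exported file with
#' \code{data.table::fread()} is an assumption rather than a requirement):
#'
#' \preformatted{
#' library(ukbnmr)
#'
#' # Hypothetical path to a dataset exported by the Table Exporter tool
#' exported <- data.table::fread("path/to/exported_ukb_data.csv")
#'
#' # Biomarker concentrations with short column names
#' nmr <- extract_biomarkers(exported)
#'
#' # One QC flag column per biomarker
#' biomarker_qc_flags <- extract_biomarker_qc_flags(exported)
#'
#' # Sample quality control tags
#' sample_qc_flags <- extract_sample_qc_flags(exported)
#' }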
#'
#' @section Removal of technical variation:
#' The \code{\link{remove_technical_variation}()} function will take a phenotype
#' dataset extracted on the UK Biobank Research Analysis Platform by the Table
#' Exporter tool, extract all the biomarkers and QC flags, remove the effects of
#' technical variation on biomarker concentrations, and return a list
#' containing the adjusted NMR biomarker data, biomarker QC flags, and sample quality control
#' and processing information.
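#'
#' A minimal usage sketch, assuming a hypothetical file path. The components
#' of the returned list are inspected here rather than assumed by name:
#'
#' \preformatted{
#' library(ukbnmr)
#'
#' # Hypothetical path to a dataset exported by the Table Exporter tool
#' exported <- data.table::fread("path/to/exported_ukb_data.csv")
#'
#' processed <- remove_technical_variation(exported)
#'
#' # List the components of the result and their top-level structure
#' names(processed)
#' str(processed, max.level = 1)
#' }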
#'
#' The function applies the multistep process described in Ritchie \emph{et al.} 2023:
#'
#' \enumerate{
#'   \item{First, biomarker data are filtered to the 107 biomarkers that
#'   cannot be derived from any combination of other biomarkers.}
#'   \item{Absolute concentrations are log transformed, with a small offset
#'   applied to biomarkers with concentrations of 0.}
#'   \item{Each biomarker is adjusted for the time between sample preparation
#'   and sample measurement (hours) on a log scale.}
#'   \item{Each biomarker is adjusted for systematic differences between rows
#'   (A-H) on the 96-well shipment plates.}
#'   \item{Each biomarker is adjusted for remaining systematic differences
#'   between columns (1-12) on the 96-well shipment plates.}
#'   \item{Each biomarker is adjusted for drift over time within each of the six
#'   spectrometers. To do so, samples are grouped into 10 bins, within each
#'   spectrometer, by the date the majority of samples on their respective
#'   96-well plates were measured.}
#'   \item{Regression residuals after the sequential adjustments are
#'   transformed back to absolute concentrations.}
#'   \item{Samples belonging to shipment plates that are outliers of
#'   non-biological origin are identified and set to missing.}
#'   \item{The 61 composite biomarkers and 81 biomarker ratios are recomputed
#'   from their adjusted parts.}
#'   \item{An additional 76 biomarker ratios of potential biological
#'   significance are computed.}
#' }
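#'
#' The log transform, adjustment, and back-transform steps above can be
#' illustrated with a small base R sketch on simulated data. This is not the
#' package's implementation: it uses ordinary least squares via \code{lm()},
#' a single simulated biomarker, and an offset of half the smallest non-zero
#' value, all of which are assumptions made for illustration only.
#'
#' \preformatted{
#' set.seed(1)
#' n <- 200
#'
#' # Simulated hours between sample preparation and measurement
#' hours <- runif(n, min = 1, max = 48)
#'
#' # Simulated concentrations that depend weakly on log(hours)
#' conc <- exp(1 + 0.05 * log(hours) + rnorm(n, sd = 0.2))
#' conc[1:3] <- 0  # a few zero concentrations
#'
#' # Small offset so that log() is defined for zeros (illustrative choice)
#' offset <- min(conc[conc > 0]) / 2
#' log_conc <- log(conc + offset)
#'
#' # Adjust for time between preparation and measurement on a log scale
#' fit <- lm(log_conc ~ log(hours))
#'
#' # Re-centre residuals on the original log-scale mean, then transform
#' # back to absolute concentrations and remove the offset
#' adjusted <- exp(resid(fit) + mean(log_conc)) - offset
#'
#' summary(adjusted)
#' }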
#'
#' Further details can be found in Ritchie S. C. \emph{et al.} Quality control
#' and removal of technical variation of NMR metabolic biomarker data in
#' ~120,000 UK Biobank participants, \emph{Sci Data} \strong{10}, 64 (2023). doi:
#' \href{https://www.nature.com/articles/s41597-023-01949-y}{10.1038/s41597-023-01949-y}
#'
#' @section Methods for computing biomarker ratios:
#' The \code{\link{compute_extended_ratios}()} function will compute an extended
#' set of biomarker ratios expanding on the biomarkers available directly from
#' the Nightingale platform. A companion function, \code{\link{compute_extended_ratio_qc_flags}()},
#' will aggregate the QC flags for the biomarkers underlying each ratio.
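#'
#' A minimal usage sketch, assuming a hypothetical file path and relying on
#' the statement above that all functions can be applied directly to raw data
#' extracted from UK Biobank:
#'
#' \preformatted{
#' library(ukbnmr)
#'
#' # Hypothetical path to a dataset exported by the Table Exporter tool
#' exported <- data.table::fread("path/to/exported_ukb_data.csv")
#'
#' # Biomarker data including the extended set of ratios
#' nmr_extended <- compute_extended_ratios(exported)
#'
#' # QC flags aggregated across the biomarkers underlying each ratio
#' extended_qc_flags <- compute_extended_ratio_qc_flags(exported)
#' }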
#'
#' The \code{\link{recompute_derived_biomarkers}()} function will recompute all
#' composite biomarkers and ratios from the 107 non-derived biomarkers, which is
#' useful for ensuring data consistency when adjusting for unwanted biological
#' variation. This includes the extended biomarker ratios computed by the
#' \code{\link{compute_extended_ratios}()} function. A companion function,
#' \code{\link{recompute_derived_biomarker_qc_flags}()}, will aggregate the QC
#' flags for the biomarkers underlying each composite biomarker and ratio.
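#'
#' A minimal sketch of where recomputation might fit in a workflow. The file
#' path is hypothetical, the adjustment step is a placeholder for the user's
#' own method, and passing data in the format returned by
#' \code{extract_biomarkers()} is an assumption to check against the function's
#' own documentation:
#'
#' \preformatted{
#' library(ukbnmr)
#'
#' # Hypothetical path to a dataset exported by the Table Exporter tool
#' exported <- data.table::fread("path/to/exported_ukb_data.csv")
#' nmr <- extract_biomarkers(exported)
#'
#' # Placeholder: adjust the non-derived biomarkers for unwanted
#' # biological variation using a method of your choosing
#' adjusted <- nmr
#'
#' # Recompute composite biomarkers and ratios from the adjusted parts
#' recomputed <- recompute_derived_biomarkers(adjusted)
#'
#' # Aggregate QC flags for each composite biomarker and ratio
#' recomputed_qc_flags <- recompute_derived_biomarker_qc_flags(exported)
#' }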
#'
#' @name ukbnmr
#' @import data.table
#' @importFrom stats na.omit
#' @keywords package
"_PACKAGE"
sritchie73/ukbnmr documentation built on Nov. 24, 2024, 8:51 p.m.