extract_sample_qc_flags: Extract NMR sample QC flags from a data.frame of UK Biobank...

View source: R/extractor_functions.R

extract_sample_qc_flags    R Documentation

Extract NMR sample QC flags from a data.frame of UK Biobank fields

Description

Given an input data.frame loaded from a dataset of NMR metabolomics processing fields extracted by the Table Exporter tool on the UK Biobank Research Analysis Platform, this function extracts the sample quality control flags for the NMR metabolomics biomarker data. The flags are given short variable names as listed in the sample_qc_info information data sheet available in this package.

Usage

extract_sample_qc_flags(x)

Arguments

x

data.frame with column names "eid" followed by extracted fields e.g. "p23649_i0", "p23649_i1", ..., "p23655_i1".

Details

Data sets extracted on the UK Biobank Research Analysis Platform have one row per UK Biobank participant, whose project specific sample identifier is given in the first column named "eid". The columns that follow are named after the unique identifier of each field, the assessment visit, and (where relevant) the repeated measurement, using the scheme "p<field_id>_i<visit_index>_a<repeat_index>". For example, the Shipment Plate for each sample collected at baseline assessment has the column name "p23649_i0". For the UKB NMR data, measurements are available at baseline assessment and the first repeat assessment (e.g. "p23649_i1"). For the UKB NMR data, the <repeat_index> is reserved for cases where biomarker measurements have more than one QC Flag (see extract_biomarker_qc_flags()).
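The naming scheme above can be illustrated with a small base R sketch that splits column names into their parts. This is not how the package parses columns internally; the regular expression and the variable names are assumptions made for illustration, and the last column name in the example ("p99999_i0_a1") is a hypothetical one included only to show the optional repeat suffix.

```r
# Example column names following the scheme described above.
# "p99999_i0_a1" is a made-up name used only to show the "_a<repeat_index>" suffix.
cols <- c("eid", "p23649_i0", "p23649_i1", "p23655_i1", "p99999_i0_a1")

# p<field_id>_i<visit_index> with an optional _a<repeat_index> suffix
pattern <- "^p([0-9]+)_i([0-9]+)(_a([0-9]+))?$"

# Keep only the columns that follow the field naming scheme (drops "eid")
field_cols <- grep(pattern, cols, value = TRUE)

parsed <- data.frame(
  column       = field_cols,
  field_id     = as.integer(sub(pattern, "\\1", field_cols)),
  visit_index  = as.integer(sub(pattern, "\\2", field_cols)),
  # An absent suffix yields an empty string, which converts to NA
  repeat_index = as.integer(sub(pattern, "\\4", field_cols)),
  stringsAsFactors = FALSE
)

print(parsed)
```

Columns without a repeat suffix get a missing repeat_index, which matches the description of the suffix as optional.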

The data.frame returned by this function gives each field a unique recognizable name, with measurements from baseline and repeat assessment given in separate rows. The "visit_index" column immediately after the "eid" column indicates whether the biomarker measurement was quantified from the blood samples taken at baseline assessment (visit_index == 0) or first repeat assessment (visit_index == 1). Rows are uniquely identifiable by the combination of entries in columns "eid" and "visit_index".
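The change of layout described above, from one column per visit to one row per visit, can be sketched in base R as follows. This is a toy illustration rather than the package's implementation: the input values are invented, only the Shipment Plate field is handled, and dropping rows with no data at a visit is an assumption of this sketch, not documented behaviour.

```r
# Toy wide-format input; eids and plate values are made up for illustration
wide <- data.frame(
  eid       = c(1001, 1002),
  p23649_i0 = c("plateA", "plateB"),
  p23649_i1 = c(NA, "plateC"),
  stringsAsFactors = FALSE
)

# Build one block of rows per assessment visit and stack them
long <- do.call(rbind, lapply(0:1, function(visit) {
  data.frame(
    eid            = wide$eid,
    visit_index    = visit,
    Shipment.Plate = wide[[paste0("p23649_i", visit)]],
    stringsAsFactors = FALSE
  )
}))

# Assumption of this sketch: rows with no data at a visit are dropped
long <- long[!is.na(long$Shipment.Plate), ]

# Each row is uniquely identified by the pair (eid, visit_index)
stopifnot(!any(duplicated(long[, c("eid", "visit_index")])))

print(long)
```

In this layout a participant with data at both visits appears in two rows, distinguished by visit_index.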

This function will also work with data predating the Research Analysis Platform, including data sets extracted by the ukbconv tool and/or the ukbtools R package.

If your UK Biobank project only has access to a subset of the NMR metabolomics fields, then this function will only return the subset of sample QC flags present in the data provided.

A data.table will be returned instead of a data.frame if the user has loaded the data.table package into their R session.

Value

A data.frame or data.table with column names "eid" and "visit_index", followed by one column for each sample QC flag, e.g. "Shipment.Plate", ..., "Low.Protein".

Examples

ukb_data <- ukbnmr::test_data # Toy example dataset for testing package
sample_qc_flags <- extract_sample_qc_flags(ukb_data)


sritchie73/ukbnmr documentation built on Nov. 24, 2024, 8:51 p.m.