
---
title: "MotrpacBicQC: Metabolomics QC"
date: "2024-01-04"
output:
  rmdformats::downcute:
    code_folding: show
    self_contained: true
    thumbnails: false
    lightbox: true
pkgdown:
  as_is: true
vignette: >
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{MotrpacBicQC: Metabolomics QC}
  %\usepackage[UTF-8]{inputenc}
---

Expected submission

The folder/file structure of a required metabolomics submission is as follows:

Example:

PASS1A-06/
  T55/
    HILICPOS/
      metadata-phase.txt  ## Note: "new" required file
      file_manifest_YYYYMMDD.txt
      BATCH1_20190725/
        RAW/
          Manifest.txt
          file1.raw
          file2.raw
          etc
      PROCESSED_20190725/
        metadata_failedsamples_[cas_specific_labeling].txt
        NAMED/
          results_metabolites_named_[cas_specific_labeling].txt
          metadata_metabolites_named_[cas_specific_labeling].txt
          metadata_sample_named_[cas_specific_labeling].txt
          metadata_experimentalDetails_named_[cas_specific_labeling].txt
        UNNAMED/  ## Note: Only required for untargeted assays
          results_metabolites_unnamed_[cas_specific_labeling].txt
          metadata_metabolites_unnamed_[cas_specific_labeling].txt
          metadata_sample_unnamed_[cas_specific_labeling].txt
          metadata_experimentalDetails_unnamed_[cas_specific_labeling].txt
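Before running the QC, it can help to confirm that a PROCESSED folder actually contains the NAMED files listed in the tree. The following is a minimal base R sketch, not part of MotrpacBicQC: the helper name `missing_named_files` is hypothetical, and the file-name prefixes are taken only from the example tree.

```r
# Hypothetical helper (not part of MotrpacBicQC): report which of the four
# expected NAMED file prefixes have no matching .txt file in a PROCESSED folder.
missing_named_files <- function(processed_dir) {
  expected_prefixes <- c(
    "results_metabolites_named_",
    "metadata_metabolites_named_",
    "metadata_sample_named_",
    "metadata_experimentalDetails_named_"
  )
  found <- list.files(file.path(processed_dir, "NAMED"), pattern = "\\.txt$")
  has_match <- vapply(
    expected_prefixes,
    function(prefix) any(startsWith(found, prefix)),
    logical(1)
  )
  unname(expected_prefixes[!has_match])
}

# Demonstration on a throwaway folder that lacks the experimentalDetails file
demo_dir <- file.path(tempfile("submission_"), "PROCESSED_20190725")
dir.create(file.path(demo_dir, "NAMED"), recursive = TRUE)
for (prefix in c("results_metabolites_named_",
                 "metadata_metabolites_named_",
                 "metadata_sample_named_")) {
  file.create(file.path(demo_dir, "NAMED", paste0(prefix, "demo.txt")))
}
missing_named_files(demo_dir)
```

An empty result means every expected prefix was found. The same idea could be extended to the UNNAMED folder for untargeted assays.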

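The files are related to one another: the sample IDs in the results file must be present in the metadata_sample file, and the metabolite names in the results file must match those in the metadata_metabolites file.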
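These file relations can be illustrated with toy data. The sketch below is an assumption-laden illustration, not the package's implementation: the column layout (a `metabolite_name` column plus one column per `sample_id` in the results file) is inferred from the `check_results` messages shown in the Usage section, and all values are invented.

```r
# Toy data frames; the column layout is an assumption inferred from the
# QC messages, and the values are made up for illustration.
metadata_metabolites <- data.frame(
  metabolite_name = c("alanine", "glucose"),
  stringsAsFactors = FALSE
)
metadata_sample <- data.frame(
  sample_id = c("90001", "90002", "90003"),
  stringsAsFactors = FALSE
)
results <- data.frame(
  metabolite_name = c("alanine", "glucose"),
  `90001` = c(1.2, 3.4),
  `90002` = c(5.6, 7.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Relation 1: every sample column in results has a row in metadata_sample
result_samples <- setdiff(colnames(results), "metabolite_name")
samples_without_metadata <- setdiff(result_samples, metadata_sample$sample_id)

# Relation 2: metabolite names match between results and metadata_metabolites
same_metabolites <- identical(results$metabolite_name,
                              metadata_metabolites$metabolite_name)
```

Here `samples_without_metadata` is empty and `same_metabolites` is `TRUE`, which is the situation the QC functions report as OK.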

Install MotrpacBicQC

First, download and install R and RStudio.

Then, open RStudio and install the devtools package:

install.packages("devtools")

Finally, install the MotrpacBicQC package. Important: reinstall it each time you run the QCs so that you are always using the latest version.

library(devtools)
devtools::install_github("MoTrPAC/MotrpacBicQC", build_vignettes = FALSE)
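To confirm which version ended up installed, base R's `packageVersion` can be queried. This is a small sketch; the `requireNamespace` guard is there only so the snippet does not fail when the package is absent.

```r
# Report the installed MotrpacBicQC version, or NA if it is not installed
installed_version <- if (requireNamespace("MotrpacBicQC", quietly = TRUE)) {
  as.character(utils::packageVersion("MotrpacBicQC"))
} else {
  NA_character_
}
installed_version
```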

Usage

Load the library:

library(MotrpacBicQC)

Then run any of the following tests to check that the package is correctly installed and working. For example:

# Just copy and paste into the RStudio console

check_metadata_metabolites(df = metadata_metabolites_named, name_id = "named")
check_metadata_samples(df = metadata_sample_named, cas = "umichigan")
check_results(r_m = results_named, m_s = metadata_sample_named, m_m = metadata_metabolites_named)

which should generate the following output:

check_metadata_metabolites(df = metadata_metabolites_named, name_id = "named")
##   + (+) All required columns present
##   + (+) `metabolite_name` OK
##   + (+) `refmet_name` unique values: OK
##   + Validating `refmet_name` (it might take some time)
##   + (+) `refmet_name` ids found in refmet: OK
##   + (+) {rt} all numeric: OK
##   + (+) {mz} all numeric: OK
##   + (+) {`neutral_mass`} all numeric values OK
##   + (+) {formula} available: OK
check_metadata_samples(df = metadata_sample_named, cas = "umichigan")
##    - (-) `metadata_sample`: Expected COLUMN NAMES are missed: FAIL
##   The following required columns are not present: `extraction_date, acquisition_date, lc_column_id`
##   + (+) `sample_id` seems OK
##   + (+) `sample_type` seems OK
##   + (+) `sample_order` is numeric
##   + (+) `sample_order` unique values OK
##   + (+) `raw_file` unique values: OK
##    - (-) `extraction_date` column missed: FAIL
##    - (-) `acquisition_date` column missed: FAIL
##    - (-) `lc_column_id` column missed: FAIL
check_results(r_m = results_named, m_s = metadata_sample_named, m_m = metadata_metabolites_named)
##   + (+) All samples from `results_metabolite` are available in `metadata_sample`
##   + (+) `metabolite_name` is identical in both [results] and `metadata_metabolites` files: OK
##   + (+) `sample_id` columns are numeric: OK
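The FAIL lines reported by `check_metadata_samples` above concern three columns that the bundled example data does not contain. A quick way to look for such gaps in your own file before running the QC might be the following base R sketch. The column list is taken only from the messages shown above, so it is an assumption and may not be the full specification; the helper name `missing_sample_columns` is hypothetical.

```r
# Column names taken from the check_metadata_samples() messages shown above;
# this list is an assumption and may not be the full specification.
expected_sample_columns <- c(
  "sample_id", "sample_type", "sample_order", "raw_file",
  "extraction_date", "acquisition_date", "lc_column_id"
)

# Hypothetical helper (not part of MotrpacBicQC)
missing_sample_columns <- function(df) {
  setdiff(expected_sample_columns, colnames(df))
}

# Toy data frame (invented values) lacking the same three columns
toy_metadata_sample <- data.frame(
  sample_id = c("90001", "90002"),
  sample_type = c("Sample", "Sample"),
  sample_order = c(1, 2),
  raw_file = c("file1.raw", "file2.raw"),
  stringsAsFactors = FALSE
)
missing_sample_columns(toy_metadata_sample)
```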

How to process a metabolomics dataset

Two approaches are available:

Check full PROCESSED_YYYYMMDD folder (recommended)

To run the tests on the full submission, use the following command:

validate_metabolomics(input_results_folder = "/full/path/to/PROCESSED_YYYYMMDD", 
                      cas = "your_site_code")

cas is the code of your site, for example "umichigan" as used in the tests above. See ?validate_metabolomics for details.

This function can also generate a number of QC plots. To produce them, run it like this:

validate_metabolomics(input_results_folder = "/full/path/to/PROCESSED_YYYYMMDD", 
                      cas = "your_site_code",
                      f_proof = TRUE,
                      out_qc_folder = "/path/to/the/folder/to/save/plots/",
                      printPDF = TRUE)

We recommend providing the path to the folder where the PDF files should be saved (argument: out_qc_folder). If the folder does not exist, it will be created.
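When a submission contains several assays, each with its own PROCESSED_YYYYMMDD folder, the folders can be located first and then validated one by one. The sketch below uses only base R to find them; the helper name `find_processed_folders` and the second assay folder name are hypothetical, and the call to `validate_metabolomics` is left commented out so the snippet runs without the package.

```r
# Hypothetical helper (not part of MotrpacBicQC): find every
# PROCESSED_YYYYMMDD folder below a submission root.
find_processed_folders <- function(submission_root) {
  all_dirs <- list.dirs(submission_root, recursive = TRUE, full.names = TRUE)
  all_dirs[grepl("^PROCESSED_[0-9]{8}$", basename(all_dirs))]
}

# Demonstration on a throwaway directory tree (OTHER_ASSAY is a placeholder)
root <- tempfile("PASS1A-06_")
dir.create(file.path(root, "T55", "HILICPOS", "PROCESSED_20190725", "NAMED"),
           recursive = TRUE)
dir.create(file.path(root, "T55", "OTHER_ASSAY", "PROCESSED_20190801", "NAMED"),
           recursive = TRUE)
processed_folders <- find_processed_folders(root)

# With the package loaded, each folder could then be validated in turn:
# for (folder in processed_folders) {
#   validate_metabolomics(input_results_folder = folder, cas = "your_site_code")
# }
```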

Check individual files

In the rare case that you need to process individual files, that can also be done. The cases are:

Check metadata metabolites:

# Open the metadata_metabolites file(s)

metadata_metabolites_named <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)
metadata_metabolites_unnamed <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)

check_metadata_metabolites(df = metadata_metabolites_named, name_id = "named")
check_metadata_metabolites(df = metadata_metabolites_unnamed, name_id = "unnamed")

Check metadata samples:

# Open your files
metadata_sample_named <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)
metadata_sample_unnamed <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)

check_metadata_samples(df = metadata_sample_named, cas = "your_site_code")
check_metadata_samples(df = metadata_sample_unnamed, cas = "your_site_code")

Check results, which needs both the metadata metabolites and the metadata samples:

# Open your files
metadata_metabolites_named <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)
metadata_sample_named <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)
results_named <- read.delim(file = "/path/to/your/file", stringsAsFactors = FALSE)

check_results(r_m = results_named, 
              m_s = metadata_sample_named, 
              m_m = metadata_metabolites_named)
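One base R detail is worth knowing when loading a results file by hand: with its default settings, `read.delim` rewrites column names that are not syntactically valid R names, so a purely numeric header such as a sample ID gets an `X` prefix. The sketch below demonstrates this on a toy file with invented values. Whether the prefix matters for `check_results` is not stated in this document, so treat it as something to verify against your own `sample_id` values rather than as a required setting.

```r
# Write a tiny tab-delimited file whose sample columns have numeric names
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("metabolite_name\t90001\t90002",
             "alanine\t1.2\t3.4"), tsv_path)

# Default: R makes the names syntactically valid by prefixing "X"
default_names <- colnames(read.delim(tsv_path, stringsAsFactors = FALSE))

# check.names = FALSE keeps the header exactly as written in the file
raw_names <- colnames(read.delim(tsv_path, stringsAsFactors = FALSE,
                                 check.names = FALSE))
```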

Help

Additional details for each function can be found by typing, for example:

?validate_metabolomics

Need extra help? Please contact the BIC at motrpac-helpdesk@lists.stanford.edu or submit an issue in the MoTrPAC/MotrpacBicQC GitHub repository, providing as many details as possible.



MoTrPAC/MotrpacBicQC documentation built on Sept. 26, 2024, 11:10 a.m.