In YasinEl/mzRAPP: Benchmark Dataset Generation and Non-Targeted Data Pre-Processing Assessment

library(mzRAPP)
library(data.table)
library(dplyr)
trash <- capture.output({
    result_list <- suppressWarnings(derive_performance_metrics(params$comp_d))
})

Assessment of non-targeted data pre-processing (NPP) was performed via mzRAPP (version 1.0). mzRAPP starts from user-provided, manually curated information on chromatographic peaks of known molecular composition, which it automatically checks and extends to generate a benchmark data set (BM). The BM then serves as a reference against which the NPP output is compared, yielding performance metrics for the overall NPP output as well as for individual steps. Additional information on the underlying processes is provided here.

NPP performance metrics

mzRAPP assesses several NPP performance metrics automatically. During evaluation, it draws on multiple orthogonal sources of information, both for benchmark generation and for determining the NPP performance metrics. This avoids relying completely on user-provided or mzRAPP-generated information.

Peak/feature detection
Peaks recovered from the benchmark after peak picking (inner donut) and after peak alignment (outer donut). Peaks appearing only after alignment could be due to gap filling, errors in NPP peak alignment, or differences in peak matching between benchmark and NPP output after alignment. [Click here](https://github.com/YasinEl/mzRAPP#matching-between-bm-and-npp-output-background) for more details.
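To make the two recovery figures concrete, the following is a minimal sketch in base R. The table, its column names (`found_picking`, `found_alignment`) and all values are invented for illustration; they are not mzRAPP output columns.

```r
# Toy benchmark table: one row per benchmark peak (all values are invented).
bm_peaks <- data.frame(
  molecule        = c("A", "A", "B", "B", "C"),
  sample          = c("s1", "s2", "s1", "s2", "s1"),
  found_picking   = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  found_alignment = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Fraction of benchmark peaks recovered at each stage.
recovery <- c(
  picking   = mean(bm_peaks$found_picking),
  alignment = mean(bm_peaks$found_alignment)
)

# Benchmark peaks that are matched only after alignment (e.g. via gap filling).
only_after_alignment <- sum(!bm_peaks$found_picking & bm_peaks$found_alignment)

print(round(100 * recovery, 1))
print(only_after_alignment)
```

In this toy table both stages recover the same fraction of peaks, yet one peak is found only after alignment and another is found only before it, which is why the two donuts are reported separately.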


Peak/feature quality
The quality of reported NPP peak abundances is assessed via the increase of the isotopic ratio (IR) bias relative to the IR bias of the BM. The inner donut corresponds to IRs calculated from abundances reported after peak picking, the outer donut to IRs calculated from abundances reported after alignment. Changes in IR quality downstream of peak picking can be due to errors in peak alignment (see below). [Click here](https://github.com/YasinEl/mzRAPP#peak-abundance-qualitydegenerated-ir) for more details.
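The idea of an IR bias increase can be illustrated with a small worked example. This is a sketch under stated assumptions, not mzRAPP's implementation: the helper `ir_bias()`, the peak areas, the predicted ratio and the 20 percentage point cut-off are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: relative deviation (in %) of a measured isotopologue
# ratio from the ratio predicted by the molecular formula.
ir_bias <- function(area_iso, area_main, ir_predicted) {
  (area_iso / area_main / ir_predicted - 1) * 100
}

ir_predicted <- 0.10  # assumed theoretical ratio of the two isotopologues

# Bias of the benchmark (BM) areas and of the areas reported by NPP.
bias_bm  <- ir_bias(area_iso = 10500, area_main = 100000, ir_predicted)
bias_npp <- ir_bias(area_iso = 13000, area_main = 100000, ir_predicted)

# Increase of the absolute bias relative to the benchmark, in percentage points.
bias_increase <- abs(bias_npp) - abs(bias_bm)

# Illustrative cut-off (an assumption, not taken from this report).
threshold   <- 20
degenerated <- bias_increase > threshold

print(c(bias_bm = bias_bm, bias_npp = bias_npp, increase = bias_increase))
print(degenerated)
```

Comparing against the BM bias rather than against zero means that a ratio which is already imperfect in the benchmark is not counted against the NPP tool.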


Peak alignment
The alignment process is responsible for assembling peaks of different samples into features. mzRAPP counts alignment errors by checking whether those assignments are performed symmetrically over different isotopologues of the same compound. This way, alignment errors in the benchmark do not affect this count (confirmable errors). The number of additional benchmark divergences which cannot be verified via isotopologues is also shown. However, its accuracy depends on the correct (user-provided) alignment of the benchmark data set. "Lost peaks" are peaks that were detected with a specific area before alignment but are no longer present (with the same area) after alignment. [Click here](https://github.com/YasinEl/mzRAPP#alignment-error-counting) for more details.
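The symmetry check can be sketched on a toy example. The code below is a simplification and not mzRAPP's actual error-counting procedure: it only tests whether two isotopologues of one compound have their samples grouped into features in the same way. The table, the feature IDs and the helper `sample_partition()` are hypothetical.

```r
# Toy alignment result for one compound: two isotopologues in four samples.
aligned <- data.frame(
  sample       = rep(c("s1", "s2", "s3", "s4"), times = 2),
  isotopologue = rep(c("M0", "M1"), each = 4),
  feature      = c("F1", "F1", "F1", "F1",   # M0: all samples in one feature
                   "F2", "F2", "F3", "F2"),  # M1: s3 assigned to another feature
  stringsAsFactors = FALSE
)

# Hypothetical helper: describe how samples are grouped into features,
# independent of the feature IDs themselves.
sample_partition <- function(d) {
  groups <- split(d$sample, d$feature)
  sort(unname(vapply(groups,
                     function(s) paste(sort(s), collapse = "+"),
                     character(1))))
}

partitions <- lapply(split(aligned, aligned$isotopologue), sample_partition)

# Alignment is symmetric if both isotopologues group the samples identically.
symmetric <- identical(partitions$M0, partitions$M1)

print(partitions)
print(symmetric)
```

Because the comparison uses only the NPP output for two isotopologues of the same compound, a disagreement like the one for `s3` can be flagged without trusting the benchmark's own alignment.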


Generated benchmark

Key metrics of the mzRAPP-generated benchmark are summarized below. In addition, histograms show how the successful detection of peaks in the aligned NPP results relates to selected benchmark variables.

bm <- data.table::rbindlist(list(params$comp_d$Matches_BM_NPPpeaks, params$comp_d$Unmatched_BM_NPPpeaks), fill = TRUE, use.names = TRUE)

# Rows summarized below: unsplit main peaks, or rows with a missing peak_area_ug
bm_main <- bm[(Split_peak == FALSE & main_peak == TRUE) | is.na(peak_area_ug)]

dt_benchmark <- data.table(metrics = c("molecule count", "sample count", "peak count", "med_scan_acqu._rate [1/s]"),
                           values = c(length(unique(bm_main$molecule_b)),
                                      length(unique(bm_main$sample_name_b)),
                                      nrow(bm_main),
                                      # median scan acquisition rate, ignoring missing values
                                      round(median(bm_main[!is.na(peaks.data_rate_b)]$peaks.data_rate_b), 1)))


kableExtra::kable(dt_benchmark) %>%
  kableExtra::kable_styling(bootstrap_options = "striped", full_width = FALSE, position="left")


YasinEl/mzRAPP documentation built on Feb. 18, 2024, 11:49 a.m.
