RawHumms Data Quality Control Report
In RawHummus: Raw Data Quality Control Tool for LC-MS System

#| 1.Setup, 
#| echo = FALSE, 
#| warning = FALSE, 
#| message = FALSE

## load necessary library
library(dplyr)
library(ggplot2)
library(plotly)
library(kableExtra)
library(RaMS)
library(data.table)
library(purrr)

## get data
msdata <- params$msdata
mynoise <- params$mynoise
mypeaks <- params$mypeaks
myppm <- params$myppm
myrt <- params$myrt
startRT <- params$startRT
endRT <- params$endRT

## remove rows with m/z being NA in mypeaks and rename RT column
mypeaks <- as.data.frame(mypeaks) %>% 
  rename(rt = Expected_RT) %>%
  filter(!is.na(mz))

Introduction

Robust and reproducible data is essential to ensure confidence in analytical results and is particularly important for large-scale metabolomics studies. Therefore raw data need to be inspected before data processing and statistical analysis in order to detect measurement bias and verify system consistency. In liquid chromatography mass spectrometry (LCMS) based metabolomics studies, proper quality control (QC) checks are particularly important to ensure reliable and comparable results within experimental measurements [1-2].

RawHummus is an user-friendly web application for rapid data quality check based on raw QC samples. It generates an HTML report with interactive plots, statistics and illustrations that help users evaluate their data quality and LCMS system performance.

Chromatogram

TIC plot

Total Ion Current (TIC) chromatogram represents the summed ion intensity across each scan in the analysis. The interactive overlaid TIC plot can be used for rapid inspection of retention time (RT) and ion intensity fluctuations.

#| 2.TIC, 
#| echo = FALSE, 
#| out.width = "100%"

# trim msdata
if(startRT != 0 | endRT != 0) {
  if((min(msdata$MS1$rt) + startRT) > max(msdata$MS1$rt)){
    print(paste("Since the first", startRT, "minutes fall out of the data retention time range, it's not possible to trim the data. Consequently, no trimming will be performed at the beginning of the data.", sep = " "))
    startRT = 0
  }
  if(endRT > max(msdata$MS1$rt)){
    print(paste("Since the last", endRT, "minutes fall out of the data retention time range. it's not possible to trim the data. Consequently, no trimming will be performed at the end of data", sep = " "))
    endRT = 0
  }
  msdata <- lapply(msdata,
                 function(x) x %>%
                   dplyr::filter(rt > (min(rt) + startRT)) %>%
                   dplyr::filter(rt < (max(rt) - endRT))
                 )
  }

plotTIC <- ggplot2::ggplot(msdata$TIC) + 
  ggplot2::geom_line(aes(x = rt, y = int, color = filename)) +
  ggplot2::labs(x = "RT (min)", y = "Intensity", color = "File name: ") +
  ggplot2::theme_bw()

plotly::ggplotly(plotTIC)

Summed TIC Plot

Summed TIC plot is another quick-and-dirty way to overview global ion intensity variations among QC samples. It summed TIC across the entire points (scans) from the analysis. Dashed red line is mean of summed TIC and blue lines represent mean + 2SD and mean - 2SD, respectively.

#| 3.sumTIC, 
#| echo = FALSE, 
#| warning = FALSE, 
#| message = FALSE, 
#| out.width = "100%"

sumTIC <- msdata$TIC %>% 
  dplyr::group_by(filename) %>%
  dplyr::summarise(peakArea = sum(int)) %>%
  dplyr::mutate(Mean = mean(peakArea), 
                `Mean+2SD` = mean(peakArea) + 2*sd(peakArea),
                `Mean-2SD` = mean(peakArea) - 2*sd(peakArea)
                )

plotSumTIC <- ggplot2::ggplot(sumTIC, aes(x = filename, y = peakArea, group = 1)) +
  ggplot2::geom_bar(stat = "identity", aes(fill = filename), alpha = 0.5) +
  ggplot2::geom_hline(aes(yintercept = Mean), linetype = 'dotted', colour = "red", alpha = 0.5) +
  ggplot2::geom_hline(aes(yintercept = `Mean+2SD`), colour = "blue", linetype = 'dotted', alpha = 0.5) +
  ggplot2::geom_hline(aes(yintercept = `Mean-2SD`), colour = "blue", linetype = 'dotted', alpha = 0.5) +
  ggplot2::labs(x = "File name", y = "Summed TIC", fill = "File name") +
  ggplot2::theme_bw() +
  ggplot2::theme(axis.text.x = element_blank())

plotly::ggplotly(plotSumTIC)

TIC Correlation Analysis

Pearson correlation is used to quantify the metabolic profile similarity among QC samples. Pearson correlation coefficient (R) over 0.85 indicate high metabolic profile similarity in RT and chromatogram peak shape. If the value is below 0.85, it will be highlight in red in Table 1.

Note that RT were binned by 0.1 min for Pearson correlation calculation.

#| 4.corTIC, 
#| echo = FALSE, 
#| warning = FALSE, 
#| message = FALSE, 
#| out.width = "100%"

## if scan No. is the same, corTIC is a DF, otherwise it is a list (this case is not common after RT binning)
corTIC <- msdata$TIC %>%
  dplyr::filter(rt >= 0.5) %>% # trim first 0.5 min to avoid unequal scans at the beginning
  dplyr::mutate(rt_bins = cut(rt, breaks = floor(max(rt)/0.1))) %>% # bin RT 0.1 min, it is important to bin raw data
  dplyr::group_by(rt_bins, filename) %>%
  dplyr::mutate(int2 = sum(int), rt2 = mean(rt)) %>%
  dplyr::ungroup() %>%
  dplyr::select(filename, int2, rt2) %>%
  dplyr::distinct_all() %>%
  dplyr::select(int2, filename) %>%
  unstack()

## calculate correlation.
if(is.data.frame(corTIC)){
  as.data.frame(round(cor(corTIC), 3)) %>%
    dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85, "#ff8080", "#FFFFFF"))) %>%
    kableExtra::kbl(escape = FALSE, caption = "Table 1: Metabolic profile similarity") %>%
    kableExtra::kable_classic("hover", full_width = TRUE) %>%
    kableExtra::scroll_box(width = "100%")
} else {
  ## get the minimum scan number
  minScan <- min(map_df(corTIC, length))

  ## trim scans
  corTIC2 <- map_df(corTIC, `[`, 1: minScan)

  ## correlation table
  as.data.frame(round(cor(corTIC2), 2)) %>%
    dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85 , "#ff8080", "#FFFFFF"))) %>%
    kableExtra::kbl(escape = FALSE, 
                    caption = paste0("Table 1: Metabolic profile similarity (Sample trimmed to ", minScan, " TIC scans)")) %>%
    kableExtra::kable_classic("hover", full_width = TRUE) %>%
    kableExtra::scroll_box(width = "100%")
}

Note that TIC Correlation Analysis is mainly used to evaluate peak shape similarity and RT consistency. QC files with ion intensity drift but similar profiles could still have good Pearson correlation coefficient.

MS1

Auto Peaks Evaluation

In order to accurately monitor variations in mass, RT and ion intensity, RawHummus automatically selects 6 peaks across the entire RT range, and use them to evaluate LCMS system.

Below are the Extracted ion chromatogram (EIC) of the 6 selected ions. You can interactively view, inspect and compare them among different QC samples.

#| 5.MS1Plot, 
#| echo = FALSE, 
#| out.width = "100%"

# get lists of peaks for evaluation
MS1Filter <- msdata$MS1[int > mynoise]
# get min and max RT of msdata
minRT <- min(MS1Filter$rt)
maxRT <- max(MS1Filter$rt)

##############################################################################
# topPeaks <- MS1Filter %>% 
#   mutate(bins = as.factor(cut(rt, 6, labels = FALSE))) %>%
#   group_by(bins) %>%
#   slice_max(int, n = 1, with_ties = FALSE) %>%
#   ungroup() %>%
#   select(rt, mz)
##############################################################################

## use data.table instead of dplyr
topPeaks <- MS1Filter[, bins := cut(rt, 6, labels = FALSE)] %>%
  .[, .SD[which.max(int)], by = bins] %>%
  .[, list(rt, mz)]

# search in raw data
autoTarget <- vector(mode = "list", length = dim(topPeaks)[1])

##############################################################################
# for (i in 1:dim(topPeaks)[1]){
#   autoTarget[[i]] <- MS1Filter %>%
#     filter(between(mz, pmppm(topPeaks$mz[i], myppm)[1], pmppm(topPeaks$mz[i], myppm)[2])) %>%
#     filter(between(rt, max(topPeaks$rt[i] - myrt, minRT), min(topPeaks$rt[i] + myrt, maxRT)))
#   names(autoTarget)[i] <- paste0("RT: ", round(topPeaks$rt[i], 2), " mz: ", round(topPeaks$mz[i], 3))
# }
##############################################################################

## use data.table instead of dplyr
for (i in 1:dim(topPeaks)[1]){
  autoTarget[[i]] <- MS1Filter[mz %between% pmppm(topPeaks$mz[i]) & rt %between% c(max(topPeaks$rt[i] - myrt, minRT), min(topPeaks$rt[i] + myrt, maxRT))]
  names(autoTarget)[i] <- paste0("RT: ", round(topPeaks$rt[i], 2), " mz: ", round(topPeaks$mz[i], 3))
}


##############################################################################
# unlist
# autoTarget <- dplyr::bind_rows(autoTarget, .id = "id")
##############################################################################

## unlist
autoTarget <- data.table::rbindlist(autoTarget, idcol = T)
data.table::setnames(autoTarget, ".id", "id") ## keep the same name as in dplyr
autoTarget <- autoTarget %>%
  dplyr::mutate(id = factor(id)) %>%
  dplyr::mutate(id = factor(id, levels = levels(id)[order(read.table(text = levels(id), fill = TRUE)[[2]])])) # order by RT

# plot
if (nrow(autoTarget) > 0) {
    aPlot <- ggplot2::ggplot(autoTarget) + 
      ggplot2::geom_line(aes(x = rt, y = int, color = filename)) +
      ggplot2::facet_wrap(~ id, scales = "free_y", ncol = 2) +
      ggplot2::labs(x = "RT (min)", y = "Intensity", color = "File name: ") +
      ggplot2::theme_bw()
    plotly::ggplotly(aPlot)
    } else {
      print("Peasks of interests are not found")
    }

RawHummus performs a simple statistics to make rapid evaluation. The table below summarized the comparison result, in which maximum RT difference, mass difference and ion intensity difference, and Intensity CV are given.

Max RT Diff (min): is the maximum retention time variation (in min unit). Small value indicates a good retention time consistency. If the maximum retention time variation is over 1 min, the value will be highlight in red in Table 2.

Max Mass Diff (ppm): is the maximum mass variation (in ppm unit). Small values indicate good mass accuracy. If the maximum mass variation is over 5 ppm, the value will be highlight in red in Table 2.

Max Intensity Fold Change: is the maximum ion intensity variation. The value closer to 1 suggests that the ion intensity is stable. If the maximum intensity ratio is over 2, the value will be highlight in red in Table 2.

If a peak is missing in some of your samples, NA values will be give in the table. You need to carefully inspect the peak so as to evaluate the reproducibility.

Intensity CV (%): is the ion intensity coefficient of variation (also termed as relative standard deviation, STD). Smaller value indicates better ion intensity consistency. If the intensity CV is over 30%, the value will be highlight in red in Table 2.

if (nrow(autoTarget) > 0) {

##############################################################################
 # autoTargetTable <- autoTarget %>%
 #  group_by(id, filename) %>%
 #  slice_max(int) %>%
 #  tidyr::complete(id, filename) %>%
 #  group_by(id) %>%
 #  mutate("Max RT Diff (min)" = round(max(rt) - min(rt), 2),
 #         "Max Mass Diff (ppm)" = round((max(mz) - min(mz))/min(mz) * 10^6, 2),
 #         "Max Intensity Ratio" = round(max(int) / min(int), 2)) %>%
 #  mutate(Peak = id) %>%
 #  ungroup() %>%
 #  select(Peak, "Max RT Diff (min)", "Max Mass Diff (ppm)", "Max Intensity Ratio") %>%
 #  distinct_all() %>% 
 #  mutate("Max RT Diff (min)" = kableExtra::cell_spec(`Max RT Diff (min)`,
 #                                                     background = ifelse(`Max RT Diff (min)` > 1, "#ff8080", "#FFFFFF")),
 #         "Max Mass Diff (ppm)" = kableExtra::cell_spec(`Max Mass Diff (ppm)`, 
 #                                                       background = ifelse(`Max Mass Diff (ppm)` > 5, "#ff8080", "#FFFFFF")),
 #         "Max Intensity Ratio" = kableExtra::cell_spec(`Max Intensity Ratio`, 
 #                                                       background = ifelse(`Max Intensity Ratio` > 3, "#ff8080", "#FFFFFF")))
##############################################################################

  ## fix bug#2
  autoTargetTable  <- autoTarget[, .SD[which.max(int)], by = .(id, filename)] %>% 
  ## data.table fill missing rows 
  .[CJ(id = unique(id), filename = unique(msdata$TIC$filename)), on = c('id','filename')] %>%
  .[, `:=`("Max RT Diff (min)" =  round(max(rt) - min(rt), 2),
           "Max Mass Diff (ppm)" = round((max(mz) - min(mz))/min(mz) * 10^6, 2),
           "Max Intensity Ratio" = round(max(int) / min(int), 2),
           "Intensity CV (%)" = round(sd(int)/mean(int) * 100, 2)), by = id] %>%
  mutate(Peak = id) %>%
  select(Peak, "Max RT Diff (min)", "Max Mass Diff (ppm)", "Max Intensity Ratio", "Intensity CV (%)") %>%
  distinct_all() %>%
  mutate("Max RT Diff (min)" = kableExtra::cell_spec(`Max RT Diff (min)`, 
                                                     background = ifelse((`Max RT Diff (min)` > 1 | is.na(`Max RT Diff (min)`)) , "#ff8080", "#FFFFFF")),
         "Max Mass Diff (ppm)" = kableExtra::cell_spec(`Max Mass Diff (ppm)`, 
                                                       background = ifelse((`Max Mass Diff (ppm)` > 5 | is.na(`Max Mass Diff (ppm)` )), "#ff8080", "#FFFFFF")),
         "Max Intensity Ratio" = kableExtra::cell_spec(`Max Intensity Ratio`, 
                                                       background = ifelse(c(`Max Intensity Ratio` > 2 | is.na(`Max Intensity Ratio`)), "#ff8080", "#FFFFFF")),
         "Intensity CV (%)" = kableExtra::cell_spec(`Intensity CV (%)`, 
                                                       background = ifelse(c(`Intensity CV (%)` > 30 | is.na(`Intensity CV (%)`)), "#ff8080", "#FFFFFF")))

 ## use kableExtra instead of DT
 kableExtra::kbl(autoTargetTable, escape = FALSE, caption = "Table 2: Summary of auto-selected peaks") %>%
  kableExtra::kable_classic("hover", full_width = TRUE)

} else {
  print("Peasks of interests are not found")
}

User defined peaks

Additionally, users could add their peaks of interests for inspection and comparison. If these peaks are defined in RawHummus and are found in the data. Similar plots and a data summary table will be given below. Otherwise, this section will be left blank.

Note that noise peaks can be used to minitor the mass accuracy variation, but they may not work well to evalute RT and ion intensity variation.

#| 6.userMS1Plot, 
#| echo = FALSE, 
#| out.width = "100%"

if(!all(is.na(mypeaks))) {

  # search according to mz and RT with tolerances
  getTarget <- vector(mode = "list", length = dim(mypeaks)[1])

  ##############################################################################
  # for (i in 1:dim(mypeaks)[1]){
  #   getTarget[[i]] <- MS1Filter %>%
  #     filter(between(mz, pmppm(mypeaks$mz[i], myppm)[1], pmppm(mypeaks$mz[i], myppm)[2])) %>%
  #     filter(between(rt, max(mypeaks$rt[i] - myrt, minRT, na.rm = TRUE), min(mypeaks$rt[i] + myrt, maxRT, na.rm = TRUE)))
  #   names(getTarget)[i] <- paste0(" RT: ", round(mypeaks$rt[i], 2), " mz: ", round(mypeaks$mz[i], 3))
  # }

  # unlist
  # myTarget <- dplyr::bind_rows(getTarget, .id = "id")
  ##############################################################################

  ## use data.table instead of dplyr
  for (i in 1:dim(mypeaks)[1]){
  getTarget[[i]] <- MS1Filter[mz %between% pmppm(mypeaks$mz[i], myppm) & 
                                rt %between% c(max(mypeaks$rt[i] - myrt, minRT, na.rm = TRUE), min(mypeaks$rt[i] + myrt, maxRT, na.rm = TRUE))]
  names(getTarget)[i] <- paste0(" RT: ", round(mypeaks$rt[i], 2), " mz: ", round(mypeaks$mz[i], 3))
  }

  # unlist
  myTarget <- data.table::rbindlist(getTarget, idcol = T)
  data.table::setnames(myTarget, ".id", "id") ## keep the same name as in dplyr

  # plot
  if (nrow(myTarget) > 0) {
    uPlot <- ggplot2::ggplot(myTarget) + 
      ggplot2::geom_line(aes(x = rt, y = int, color = filename)) +
      ggplot2::facet_wrap(~ id, scales = "free_y", ncol = 2) +
      ggplot2::labs(x = "RT (min)", y = "Intensity", color = "File name: ") +
      ggplot2::theme_bw()
    plotly::ggplotly(uPlot)
    }

} else {
  myTarget <- NULL
  print("User defined peaks are not found")
}

#| 7.userMS1Table, 
#| echo = FALSE, 
#| out.width = "100%"

if (!is.null(myTarget)) {

   ##############################################################################

 # myTargetTable <- myTarget %>%
 #  group_by(id, filename) %>%
 #  slice_max(int) %>%
 #  tidyr::complete(id, filename) %>%
 #  group_by(id) %>%
 #  mutate("Max RT Diff (min)" = round(max(rt) - min(rt), 2),
 #         "Max Mass Diff (ppm)" = round((max(mz) - min(mz))/min(mz) * 10^6, 2),
 #         "Max Intensity Ratio" = round(max(int) / min(int), 2)) %>%
 #  mutate(Peak = id) %>%
 #  ungroup() %>%
 #  select(Peak, "Max RT Diff (min)", "Max Mass Diff (ppm)", "Max Intensity Ratio") %>%
 #  distinct_all() %>%
 #  mutate("Max RT Diff (min)" = kableExtra::cell_spec(`Max RT Diff (min)`,
 #                                                     background = ifelse(`Max RT Diff (min)` > 1, "#ff8080", "#FFFFFF")),
 #         "Max Mass Diff (ppm)" = kableExtra::cell_spec(`Max Mass Diff (ppm)`, 
 #                                                       background = ifelse(`Max Mass Diff (ppm)` > 5, "#ff8080", "#FFFFFF")),
 #         "Max Intensity Ratio" = kableExtra::cell_spec(`Max Intensity Ratio`, 
 #                                                       background = ifelse(`Max Intensity Ratio` > 3, "#ff8080", "#FFFFFF")))

   ##############################################################################

   myTargetTable  <- myTarget[, .SD[which.max(int)], by = .(id, filename)] %>% 
  ## data.table fill missing rows 
     .[CJ(id = unique(id), filename = unique(msdata$MS1$filename)), on = c('id','filename')] %>%
     .[, `:=`("Max RT Diff (min)" =  round(max(rt) - min(rt), 2),
              "Max Mass Diff (ppm)" = round((max(mz) - min(mz)) / min(mz) * 10^6, 2),
              "Max Intensity Ratio" = round(max(int) / min(int), 2),
              "Intensity CV (%)" = round(sd(int) / mean(int) * 100, 2)), by = id] %>%
     mutate(Peak = id) %>%
     select(Peak, "Max RT Diff (min)", "Max Mass Diff (ppm)", "Max Intensity Ratio", "Intensity CV (%)") %>%
     distinct_all() %>%
     mutate("Max RT Diff (min)" = kableExtra::cell_spec(`Max RT Diff (min)`,
                                                        background = ifelse((`Max RT Diff (min)` > 1 | is.na(`Max RT Diff (min)`)) , "#ff8080", "#FFFFFF")),
            "Max Mass Diff (ppm)" = kableExtra::cell_spec(`Max Mass Diff (ppm)`, 
                                                        background = ifelse((`Max Mass Diff (ppm)` > 5 | is.na(`Max Mass Diff (ppm)` )), "#ff8080", "#FFFFFF")),
            "Max Intensity Ratio" = kableExtra::cell_spec(`Max Intensity Ratio`, 
                                                        background = ifelse(c(`Max Intensity Ratio` > 3 | is.na(`Max Intensity Ratio`)), "#ff8080", "#FFFFFF")),
            "Intensity CV (%)" = kableExtra::cell_spec(`Intensity CV (%)`, 
                                                        background = ifelse(c(`Intensity CV (%)` > 30 | is.na(`Intensity CV (%)`)), "#ff8080", "#FFFFFF")))

## use kableExtra instead of DT 
 kableExtra::kbl(myTargetTable, escape = FALSE, caption = "Table 3: Summary of user-defined peaks") %>%
  kableExtra::kable_classic("hover", full_width = TRUE)

 }

MS2

MS/MS fragmentation is important for metabolite identification. RawHummus is also able to identify problems with regard to fragmentation.

Note that if your data files do not contain any MS/MS information, this section will be left blank.

Number of MS2 Events

Number of triggered MS/MS: is the total number of MS/MS events in the sample. Similar number of triggered MS/MS events indicates good reproducibility.

#| 8.MS2, 
#| echo = FALSE, 
#| warning = FALSE

if(nrow(msdata$MS2) == 0){
  getMS2 <- NULL
  MS2_mz <- NULL
  MS2_RT <- NULL
  TableMS2Pmass <- NULL
  TableMS2PRT <- NULL
} else {
  getMS2 <- msdata$MS2 %>% 
    dplyr::group_by(filename) %>%
    dplyr::distinct(rt, premz) %>%
    dplyr::summarise("MS2 Events" = n())

  MS2unique <- msdata$MS2 %>%
    dplyr::distinct(rt, premz, voltage, filename)

  ## define cosine similarity function
  cosSim <- function(X){
  X2 <- as.matrix(X)
  denom <- solve(diag(sqrt(diag(t(X2)%*%X2))))
  Sim <- as.data.frame(denom%*%(t(X2)%*%X2)%*%denom)
  colnames(Sim) <- colnames(X)
  rownames(Sim) <- colnames(X)
  return(Sim)
  } 

  ##1.1 plot precursor ion across mz range. Split subplot for better visulization for large data files.
  ## here nsub is used to better split the plot, 8 is the max plot in one subplot
  nfile <- length(levels(as.factor(MS2unique$filename)))
  nsub <- ceiling(nfile/ceiling(nfile/8))

  MS2_mz <- MS2unique %>% 
  split(ceiling(group_indices(., filename)/nsub)) %>% 
  map(~ggplot(., aes(premz)) +
        geom_density(aes(fill = filename), alpha = 0.3) +
        facet_wrap(~ filename, scales = "free_y", ncol = 2) + 
        labs(x = "Mass Range", y = "Density", fill = "File name") +
        theme_bw() +
        theme(legend.position = "none"))

  ##1.2 for MS2_mz table
  corPmass <- MS2unique %>%
    mutate(mz_bins = cut(premz, breaks = floor(max(premz)/10))) %>% ## bin by 10 Da
    group_by(filename) %>%
    count(mz_bins) %>%
    select(n, filename) %>%
    unstack()

  if(is.data.frame(corPmass)){
  TableMS2Pmass <- as.data.frame(round(cosSim(corPmass), 3)) %>%
    dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85 , "#ff8080", "#FFFFFF"))) %>%
    kableExtra::kbl(escape = FALSE, caption = "Table 5: Precursor Distribution across mass similarity") %>%
    kableExtra::kable_classic("hover", full_width = TRUE) %>%
    kableExtra::scroll_box(width = "100%")
    } else {
      ## get the minimum scan number
      minScan <- min(map_df(corPmass, length))
      ## trim scans
      corPmass2 <- map_df(corPmass, `[`, 1: minScan)
      ## correlation table
      TableMS2Pmass <- as.data.frame(round(cosSim(corPmass2), 3)) %>%
        dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85 , "#ff8080", "#FFFFFF"))) %>%
        kableExtra::kbl(escape = FALSE, 
                    caption = "Table 5: Precursor Distribution across mass similarity") %>%
        kableExtra::kable_classic("hover", full_width = TRUE) %>%
        kableExtra::scroll_box(width = "100%")
      }


  ##2.1 plot precursor ions across RT range. Split subplot for better visualization of large data files.
  MS2_RT <- MS2unique %>% 
  split(ceiling(group_indices(., filename)/nsub)) %>% 
  map(~ggplot(., aes(rt, fill = filename)) +
        geom_density(alpha = 0.3) +
        facet_wrap(~ filename, scales = "free_y", ncol = 2) + 
        labs(x = "RT (min)", y = "Density", fill = "File name") +
        theme_bw() +
        theme(legend.position = "none"))

  ##2.2 for MS2_RT table
  corPRT <- MS2unique %>%
    mutate(rt_bins = cut(rt, breaks = floor(max(rt)/0.05))) %>% ## bin by 0.05 min
    group_by(filename) %>%
    count(rt_bins) %>%
    select(n, filename) %>%
    unstack()

  if(is.data.frame(corPRT)){
  TableMS2PRT <- as.data.frame(round(cosSim(corPRT), 3)) %>%
    dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85 , "#ff8080", "#FFFFFF"))) %>%
    kableExtra::kbl(escape = FALSE, caption = "Table 6: Precursor Distribution across RT similarity") %>%
    kableExtra::kable_classic("hover", full_width = TRUE) %>%
    kableExtra::scroll_box(width = "100%")
    } else {
      ## get the minimum scan number
      minScan <- min(map_df(corPRT, length))
      ## trim scans
      corPRT2 <- map_df(corPRT, `[`, 1: minScan)
      ## correlation table
      TableMS2PRT <- as.data.frame(round(cosSim(corPRT2), 3)) %>%
        dplyr::mutate_all(~ kableExtra::cell_spec(.x, background = ifelse(.x < 0.85 , "#ff8080", "#FFFFFF"))) %>%
        kableExtra::kbl(escape = FALSE, 
                    caption = "Table 6: Precursor Distribution across RT similarity") %>%
        kableExtra::kable_classic("hover", full_width = TRUE) %>%
        kableExtra::scroll_box(width = "100%")
      }
}

#| 8.MS2Table, 
#| echo = FALSE, 
#| out.width = "100%"

if(!is.null(getMS2)){
  kableExtra::kbl(getMS2, escape = FALSE, caption = "Table 4: Summary of MS2 events") %>%
    kableExtra::kable_classic("hover", full_width = TRUE)
} else {
  print("not exist")
}

Precursor Distribution across Mass Range

Precursor Distribution across Mass plot is used to visualize the mass range proportion of fragmented precursors.

It can be used to check peaks at which mass ranges are mainly fragmented.
High similarity in Precursor Distribution across Mass among QC samples indicates good reproducibility in MS2.

#| 9.MS2MZ, 
#| echo = FALSE, 
#| out.width = "100%"

if(!is.null(MS2_mz)){
   MS2_mz 
} else {
  print("not exist")
}

Cosine Similarity is used to quantify the similarity of precursor distribution across mass range plots among QC samples. Cosine similarity over 0.85 indicate high similarity. If the value is below 0.85, it will be highlight in red in Table 5.

Note that precursor ions were binned by 10 Da for Pearson correlation calculation.

if(!is.null(TableMS2Pmass)){
   TableMS2Pmass 
} else {
  print("not exist")
}

Precursor Distribution across RT

Precursor Distribution across RT plot is used to visualize the number of fragmented precursors at each RT (or scan). It can be useful to spot any signal dropouts during data acquisition.

In both data-dependent acquisition (DDA) and data-independent acquisition (DIA) mode, you are expected to see a continuous precursor ions distribution across the entire RT range.
High similarity in Precursor Distribution across RT among QC samples indicates good reproducibility in MS2.

#| 10.MS2RT, 
#| echo = FALSE, 
#| out.width = "100%"

if(!is.null(MS2_RT)){
   MS2_RT
} else {
  print("not exist")
}

Cosine Similarity is used to quantify the similarity of precursor distribution across RT plots among QC samples. Cosine similarity over 0.85 indicate high similarity. If the value is below 0.85, it will be highlight in red in Table 6.

Note that RT were binned by 0.05 min for Pearson correlation calculation.

#| 11.MS2RTTable, 
#| echo = FALSE, 
#| out.width = "100%"

if(!is.null(TableMS2Pmass)){
   TableMS2PRT
} else {
  print("not exist")
}

Reference

[1] Scalbert, A., Brennan, L., Fiehn, O., Hankemeier, T., Kristal, B.S., van Ommen, B., Pujos-Guillot, E., Verheij, E., Wishart, D. and Wopereis, S., 2009. Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research. Metabolomics, 5(4), pp.435-458.

[2] Begou, O., Gika, H.G., Theodoridis, G.A. and Wilson, I.D., 2018. Quality Control and Validation Issues in LC-MS Metabolomics. Methods in molecular biology (Clifton, NJ), 1738, pp.15-26.

Any scripts or data that you put into this service are public.

RawHummus documentation built on April 20, 2023, 9:11 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

RawHummus
Raw Data Quality Control Tool for LC-MS System

RawHumms Data Quality Control Report
In RawHummus: Raw Data Quality Control Tool for LC-MS System

Introduction

Chromatogram

TIC plot

Summed TIC Plot

TIC Correlation Analysis

MS1

Auto Peaks Evaluation

User defined peaks

MS2

Number of MS2 Events

Precursor Distribution across Mass Range

Precursor Distribution across RT

Reference

Try the RawHummus package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

RawHummus Raw Data Quality Control Tool for LC-MS System

RawHumms Data Quality Control Report In RawHummus: Raw Data Quality Control Tool for LC-MS System

Introduction

Chromatogram

TIC plot

Summed TIC Plot

TIC Correlation Analysis

MS1

Auto Peaks Evaluation

User defined peaks

MS2

Number of MS2 Events

Precursor Distribution across Mass Range

Precursor Distribution across RT

Reference

Try the RawHummus package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

RawHummus
Raw Data Quality Control Tool for LC-MS System

RawHumms Data Quality Control Report
In RawHummus: Raw Data Quality Control Tool for LC-MS System